# Retail Store Time Series Analysis

This notebook walks through a complete time series workflow for retail store sales data: exploratory data analysis (EDA), preprocessing, visualization, modeling, and forecasting. The goal is to help you understand trends and seasonality in the data and produce sales forecasts that support data-driven retail decisions.

## Table of Contents
1. [Introduction](#introduction)
2. [Data Loading & Overview](#data-loading)
3. [Exploratory Data Analysis (EDA)](#eda)
4. [Data Preprocessing](#preprocessing)
5. [Time Series Visualization](#visualization)
6. [Stationarity & Decomposition](#stationarity)
7. [Modeling & Forecasting](#modeling)
8. [Evaluation & Conclusions](#evaluation)

---

## 1. Introduction <a id='introduction'></a>

Retail businesses rely on accurate sales forecasts for inventory management, staffing, and marketing. Time series analysis helps uncover hidden trends, seasonality, and patterns in sales data. In this notebook, we will:

- Explore and visualize retail store sales data
- Prepare the data for time series modeling
- Build forecasting models (ARIMA, Exponential Smoothing, Prophet)
- Evaluate model performance
- Generate actionable insights for retail planning

---
## 2. Data Loading & Overview <a id='data-loading'></a>

In [None]:
import pandas as pd
import numpy as np

# Load your data: replace 'your_data.csv' as needed
df = pd.read_csv('your_data.csv', parse_dates=['date'])
df.head()

**Key Columns to Expect:**
- `date`: Date of sales record
- `store`: Store identifier
- `item`: Item identifier (if applicable)
- `sales`: Number of items sold
- (Optional) Other features: promotions, holidays, etc.
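
Before going further, it can help to confirm that the loaded frame actually matches the columns listed above. The following is a minimal sketch, not part of the original workflow: `REQUIRED_COLUMNS` and `validate_sales_frame` are hypothetical names, and the rule that `sales` should be non-negative is an assumption that may not hold if your data records returns.

```python
import pandas as pd

# Assumed schema, taken from the column list above; adjust to your data.
REQUIRED_COLUMNS = {"date", "store", "sales"}

def validate_sales_frame(frame: pd.DataFrame) -> list[str]:
    """Return a list of readable problems; an empty list means the frame looks usable."""
    problems = []
    missing = REQUIRED_COLUMNS - set(frame.columns)
    if missing:
        # Without the required columns the remaining checks cannot run.
        problems.append(f"missing columns: {sorted(missing)}")
        return problems
    if not pd.api.types.is_datetime64_any_dtype(frame["date"]):
        problems.append("'date' is not a datetime column")
    if not pd.api.types.is_numeric_dtype(frame["sales"]):
        problems.append("'sales' is not numeric")
    elif (frame["sales"] < 0).any():
        problems.append("'sales' contains negative values")
    return problems

# Tiny synthetic frame, for illustration only
example = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02"]),
    "store": [1, 1],
    "sales": [10, 12],
})
print(validate_sales_frame(example))
```

In the notebook you would call `validate_sales_frame(df)` right after loading and inspect any reported problems before continuing.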

In [None]:
# Quick overview
print('Shape:', df.shape)
print('Columns:', df.columns.tolist())
df.describe(include='all')

---
## 3. Exploratory Data Analysis (EDA) <a id='eda'></a>

In [None]:
# Check for missing values
df.isnull().sum()

In [None]:
# Sales over time
import matplotlib.pyplot as plt
plt.figure(figsize=(14, 6))
plt.plot(df.groupby('date')['sales'].sum())
plt.title('Total Sales Over Time')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.show()

In [None]:
# Sales by store (if applicable)
df.groupby('store')['sales'].sum().sort_values(ascending=False).plot(kind='bar', figsize=(10,4))
plt.title('Total Sales by Store')
plt.xlabel('Store')
plt.ylabel('Sales')
plt.show()

---
## 4. Data Preprocessing <a id='preprocessing'></a>

In [None]:
# Set date as index
df = df.set_index('date')

# Aggregate sales (example: daily total sales)
daily_sales = df['sales'].resample('D').sum()
daily_sales.head()

In [None]:
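# Fill missing dates with 0 sales (only appropriate if a missing day truly means no sales).
# Note: resample('D').sum() above already returns 0 for days with no rows,
# so this step mainly makes the daily frequency explicit.
daily_sales = daily_sales.asfreq('D', fill_value=0)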
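
Zero-filling hides the difference between "the store sold nothing" and "no record exists for that day". The sketch below, on a small synthetic series, shows one way to keep that distinction visible by passing `min_count=1` to the resampled sum so that empty days become `NaN` instead of 0. The variable names are illustrative and not part of the notebook.

```python
import pandas as pd

# Synthetic series with no record for 2023-01-03
raw = pd.Series(
    [5, 7, 3],
    index=pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-04"]),
    name="sales",
)

zero_filled = raw.resample("D").sum()             # the empty day becomes 0
gap_marked = raw.resample("D").sum(min_count=1)   # the empty day stays NaN

# Dates that had no underlying records at all
missing_days = gap_marked[gap_marked.isna()].index
print(missing_days)
```

Listing `missing_days` first lets you decide per dataset whether zero-filling, interpolation, or dropping the gap is the right treatment.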

---
## 5. Time Series Visualization <a id='visualization'></a>

In [None]:
plt.figure(figsize=(14,6))
plt.plot(daily_sales)
plt.title('Daily Sales Time Series')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.show()

In [None]:
# Rolling mean and std
rolling_mean = daily_sales.rolling(window=30).mean()
rolling_std = daily_sales.rolling(window=30).std()

plt.figure(figsize=(14,6))
plt.plot(daily_sales, label='Daily Sales')
plt.plot(rolling_mean, label='30-day Mean')
plt.plot(rolling_std, label='30-day Std')
plt.legend()
plt.title('Rolling Mean and Standard Deviation')
plt.show()

---
## 6. Stationarity & Decomposition <a id='stationarity'></a>

In [None]:
# ADF test for stationarity
from statsmodels.tsa.stattools import adfuller
result = adfuller(daily_sales)
print('ADF Statistic:', result[0])
print('p-value:', result[1])
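
The ADF output is easy to misread because the null hypothesis is that the series has a unit root (is non-stationary). The helper below is a small sketch for turning the p-value into a plain-language reading; `interpret_adf` is a hypothetical name, and the 0.05 threshold is a conventional default rather than something the notebook prescribes.

```python
def interpret_adf(p_value: float, alpha: float = 0.05) -> str:
    """Translate an ADF p-value into a plain-language reading.

    The ADF null hypothesis is that the series has a unit root, so a small
    p-value is evidence in favor of stationarity.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return (f"p={p_value:.3f} < {alpha}: reject the unit-root null; "
                "the series looks stationary")
    return (f"p={p_value:.3f} >= {alpha}: cannot reject the unit-root null; "
            "consider differencing")

print(interpret_adf(0.01))
print(interpret_adf(0.40))
```

In the notebook this would be called as `interpret_adf(result[1])`. If differencing is suggested, `daily_sales.diff().dropna()` gives the first-differenced series to retest.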

In [None]:
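# Decomposition
# period=365 assumes yearly seasonality in daily data; seasonal_decompose needs
# at least two full cycles (730 days). Use period=7 to inspect a weekly pattern.
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(daily_sales, model='additive', period=365)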
fig = decomposition.plot()
fig.set_size_inches(14, 10)
plt.show()

---
## 7. Modeling & Forecasting <a id='modeling'></a>

In [None]:
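# Train/test split: hold out the last 20% of days.
# Positional slicing keeps the two sets disjoint; label slicing with a
# Timestamp is inclusive on both ends and would put the split date in both.
n_test = int(0.2 * len(daily_sales))
train = daily_sales.iloc[:-n_test]
test = daily_sales.iloc[-n_test:]
print('Train shape:', train.shape)
print('Test shape:', test.shape)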

### ARIMA Model

In [None]:
from statsmodels.tsa.arima.model import ARIMA
import warnings
warnings.filterwarnings('ignore')

model = ARIMA(train, order=(1,1,1))
arima_result = model.fit()
arima_forecast = arima_result.forecast(steps=len(test))

plt.figure(figsize=(14,6))
plt.plot(train, label='Train')
plt.plot(test, label='Test')
plt.plot(test.index, arima_forecast, label='ARIMA Forecast')
plt.legend()
plt.title('ARIMA Forecast vs Actuals')
plt.show()

### Exponential Smoothing

In [None]:
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# seasonal_periods=365 assumes a yearly cycle and needs at least two full years of training data
hw_model = ExponentialSmoothing(train, trend='add', seasonal='add', seasonal_periods=365).fit()
hw_forecast = hw_model.forecast(len(test))

plt.figure(figsize=(14,6))
plt.plot(test, label='Test')
plt.plot(test.index, hw_forecast, label='Holt-Winters Forecast')
plt.legend()
plt.title('Holt-Winters Forecast vs Actuals')
plt.show()

### Prophet (Optional, if available)
Prophet is an open-source forecasting library originally developed at Facebook (now Meta). Install it with `pip install prophet` if it is not already present.

In [None]:
try:
    from prophet import Prophet
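    # Prophet expects a DataFrame with columns 'ds' (dates) and 'y' (values)
    prophet_df = train.reset_index().rename(columns={'date':'ds','sales':'y'})
    # Use a separate name so the ARIMA `model` object is not overwritten
    prophet_model = Prophet(yearly_seasonality=True, daily_seasonality=False)
    prophet_model.fit(prophet_df)

    # Extend the training dates by len(test) days and predict over the full range
    future = prophet_model.make_future_dataframe(periods=len(test))
    forecast = prophet_model.predict(future)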

    plt.figure(figsize=(14,6))
    plt.plot(test.index, test, label='Test')
    plt.plot(test.index, forecast['yhat'][-len(test):].values, label='Prophet Forecast')
    plt.legend()
    plt.title('Prophet Forecast vs Actuals')
    plt.show()
except ImportError:
    print('Prophet not installed, skipping this section.')

---
## 8. Evaluation & Conclusions <a id='evaluation'></a>

In [None]:
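import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

def print_metrics(true, pred, name):
    """Print MAE and RMSE for one model's forecast against the hold-out set."""
    print(f'--- {name} ---')
    print('MAE:', mean_absolute_error(true, pred))
    # Take the root explicitly: the `squared` argument was deprecated in
    # scikit-learn 1.4 and removed in later releases.
    print('RMSE:', np.sqrt(mean_squared_error(true, pred)))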

print_metrics(test, arima_forecast, 'ARIMA')
print_metrics(test, hw_forecast, 'Holt-Winters')
# For Prophet, if used:
# print_metrics(test, forecast['yhat'][-len(test):].values, 'Prophet')

### Key Takeaways
- Visualizations and decomposition help understand seasonality and trends.
- ARIMA and Exponential Smoothing provide good benchmarks for retail forecasting.
- Prophet, when installed, can model multiple seasonalities and holiday effects with little manual tuning.
- Always use hold-out/test data to evaluate forecasting accuracy.
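
To judge whether ARIMA or Holt-Winters is actually adding value, it helps to compare them against a trivial baseline. The sketch below implements a seasonal naive forecast, which simply repeats the last observed season. This is an addition, not part of the original notebook: `seasonal_naive_forecast` is a hypothetical helper, and the weekly `season_length=7` is an assumption about daily retail data.

```python
import numpy as np
import pandas as pd

def seasonal_naive_forecast(history: pd.Series, horizon: int, season_length: int = 7) -> np.ndarray:
    """Repeat the last `season_length` observations of `history` out to `horizon` steps."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history.to_numpy()[-season_length:]
    repeats = int(np.ceil(horizon / season_length))
    return np.tile(last_season, repeats)[:horizon]

# Synthetic daily series with an exact weekly pattern (illustrative only)
idx = pd.date_range("2023-01-01", periods=28, freq="D")
series = pd.Series(np.tile([10, 12, 11, 13, 20, 30, 25], 4), index=idx, name="sales")
demo_train, demo_test = series.iloc[:21], series.iloc[21:]

baseline = seasonal_naive_forecast(demo_train, horizon=len(demo_test))
mae = np.mean(np.abs(demo_test.to_numpy() - baseline))
print("Seasonal naive MAE:", mae)
```

In the notebook, `print_metrics(test, seasonal_naive_forecast(train, len(test)), 'Seasonal naive')` would put the baseline alongside the other models. A model that cannot beat it is probably not worth deploying.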

For production, consider tuning model hyperparameters, adding external regressors (promotions, holidays), and automating retraining.
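
One common way to combine hold-out evaluation with periodic retraining is an expanding-window (rolling-origin) backtest: refit on a growing history, forecast a fixed horizon, then move the cutoff forward. The generator below is a minimal sketch of the split logic only; `rolling_origin_splits` and its parameters are hypothetical names, and the model fitting is left to whichever approach you use.

```python
def rolling_origin_splits(n_obs: int, initial_train: int, horizon: int, step: int):
    """Yield (train_end, test_end) positions for expanding-window backtests.

    Fold k trains on positions [0, train_end) and tests on [train_end, test_end).
    """
    if min(initial_train, horizon, step) <= 0:
        raise ValueError("initial_train, horizon and step must be positive")
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield train_end, train_end + horizon
        train_end += step

# Example: 10 observations, start with 4 for training, forecast 2 ahead, advance by 2
for train_end, test_end in rolling_origin_splits(10, initial_train=4, horizon=2, step=2):
    print(f"train [0, {train_end})  test [{train_end}, {test_end})")
```

With the notebook's series, each fold would use `daily_sales.iloc[:train_end]` for fitting and `daily_sales.iloc[train_end:test_end]` for scoring. Averaging the per-fold errors gives a more stable estimate than a single split.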