
# Time‑Series Forecasting — Python Notebook

**When to Use**  
- You need to **predict future values** (sales, demand, traffic) from historical observations and optional exogenous drivers (price, promos, holidays).  
- Planning **inventory**, **staffing**, and **media pacing** where lead times matter.

**Best Application**  
- Series with a clear trend and **stable seasonal patterns** (daily, weekly, or monthly).  
- **Scenario forecasts** with exogenous regressors (promotions, macro indicators).  
- **Forecast combinations** to reduce model risk.

**When Not to Use**  
- Highly volatile series with **frequent structural breaks** and minimal history (consider judgmental forecasting or causal models/experiments).  
- When the business question is **incrementality/causality** rather than prediction error—use causal inference instead.

**How to Interpret Results**  
- Compare models via **out‑of‑sample error** (MAE/MAPE/RMSE) from **rolling cross‑validation**. Note that scikit‑learn reports MAPE as a fraction (0.05 means 5%), not a percentage.  
- Inspect **residual diagnostics** for autocorrelation and bias.  
- Coefficients in regression‑with‑ARIMA errors (SARIMAX) show **associations** with the target, not necessarily causality.
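
The residual check mentioned above can be sketched with NumPy alone. This is a minimal illustration, not the notebook's method: `residual_checks` is a hypothetical helper, the residuals are simulated, and the `1.96 / sqrt(n)` band is the usual large-sample approximation for white noise. A mean far from zero suggests bias; autocorrelations outside the band suggest structure the model has not captured.

```python
import numpy as np

def residual_checks(resid, max_lag=10):
    """Return the mean residual (bias) and autocorrelations at lags 1..max_lag."""
    r = np.asarray(resid, dtype=float)
    n = len(r)
    centered = r - r.mean()
    denom = np.sum(centered ** 2)
    # Sample autocorrelation at each lag k
    acf = np.array([np.sum(centered[k:] * centered[:-k]) / denom
                    for k in range(1, max_lag + 1)])
    bound = 1.96 / np.sqrt(n)  # approximate 95% band under white noise
    return {
        'bias': r.mean(),
        'acf': acf,
        'bound': bound,
        'n_outside': int(np.sum(np.abs(acf) > bound)),
    }

# Simulated residuals (assumption): white noise vs. an autocorrelated AR(1) series
rng_demo = np.random.default_rng(0)
white = rng_demo.normal(0, 1, size=200)
ar1 = np.zeros(200)
for i in range(1, 200):
    ar1[i] = 0.8 * ar1[i - 1] + white[i]

white_check = residual_checks(white)
ar1_check = residual_checks(ar1)
```

On real output, the same helper could be applied to `test['sales'] - sarimax_fc` or to in-sample residuals from a fitted model.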


In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from statsmodels.tsa.seasonal import STL
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error

pd.set_option('display.max_columns', 120)
plt.rcParams['figure.figsize'] = (8,4)
rng = np.random.default_rng(123)


### Data: Synthetic weekly retail sales with seasonality, promo, price index, and holiday spikes

In [None]:

# 3 years of weekly data
weeks = pd.date_range('2022-01-02', periods=156, freq='W-SUN')
t = np.arange(len(weeks))

# Components: trend, seasonality, noise
trend = 200 + 0.6 * t
seasonal = 20 * np.sin(2 * np.pi * t / 52) + 10 * np.sin(2 * np.pi * t / 26)

# Promo flag in ~20% of weeks; price index ~ N(100, sd=4); macro index ~ 50 with a mild yearly cycle plus noise (sd=1.5)
promo = (rng.random(len(t)) < 0.2).astype(int)
price_index = 100 + rng.normal(0, 4, size=len(t))
macro_index = 50 + 0.3*np.sin(2*np.pi*t/52) + rng.normal(0,1.5,size=len(t))

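# Holiday spikes around late Nov/Dec weeks
# DatetimeIndex.week was removed in pandas 2.0; isocalendar().week is the replacement
iso_week = weeks.isocalendar().week.astype(int).to_numpy()
holiday = ((weeks.month == 11) & (iso_week >= 47)) | (weeks.month == 12)
holiday = holiday.astype(int)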

# Generate sales with effects
sales = (trend + seasonal
         + 15*promo
         - 0.8*(price_index-100)
         + 0.9*(macro_index-50)
         + 25*holiday
         + rng.normal(0, 8, size=len(t)))

df = pd.DataFrame({
    'date': weeks,
    'sales': sales,
    'promo': promo,
    'price_index': price_index,
    'macro_index': macro_index,
    'holiday': holiday
}).set_index('date')

df.head()


### STL Decomposition (diagnostics)

In [None]:

stl = STL(df['sales'], period=52)
res = stl.fit()
res.trend.head(), res.seasonal.head(), res.resid.head()


In [None]:

plt.plot(df.index, res.trend)
plt.title('STL Trend')
plt.show()

plt.plot(df.index, res.seasonal)
plt.title('STL Seasonal')
plt.show()

plt.plot(df.index, res.resid)
plt.title('STL Residual')
plt.show()


### Train/Test Split

In [None]:

train = df.iloc[:-26].copy()
test = df.iloc[-26:].copy()
len(train), len(test)


### Baselines: Naive and Seasonal Naive
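
The seasonal naive baseline repeats the value observed one season earlier. The notebook's version relies on the test window (26 weeks) being shorter than one season (52 weeks). A minimal sketch of a more general form, which tiles the last season for longer horizons, might look like the following; `seasonal_naive` is a hypothetical helper and the toy numbers are made up.

```python
import numpy as np
import pandas as pd

def seasonal_naive(history, horizon, period):
    """Repeat the last `period` observations until `horizon` values are produced."""
    values = np.asarray(history, dtype=float)
    if len(values) < period:
        # Less than one full season of history: fall back to the naive forecast
        return np.full(horizon, values[-1])
    last_season = values[-period:]
    reps = int(np.ceil(horizon / period))
    return np.tile(last_season, reps)[:horizon]

# Toy series with a period of 4 (hypothetical numbers)
toy = pd.Series([10, 20, 30, 40, 11, 21, 31, 41])
fc = seasonal_naive(toy, horizon=6, period=4)
print(fc)  # [11. 21. 31. 41. 11. 21.]
```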

In [None]:

# Naive: last observed value
naive_fc = pd.Series(train['sales'].iloc[-1], index=test.index)

# Seasonal naive: repeat the value from the same week one year earlier (period=52).
# This slicing is valid because len(test) <= 52; with less than one season of history, fall back to naive.
if len(train) >= 52:
    seas_naive_fc = pd.Series(train['sales'].iloc[-52:].values[:len(test)], index=test.index)
else:
    seas_naive_fc = naive_fc.copy()

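def eval_forecast(y_true, y_pred):
    return {
        'MAE': mean_absolute_error(y_true, y_pred),
        # scikit-learn returns MAPE as a fraction (0.05 == 5%)
        'MAPE': mean_absolute_percentage_error(y_true, y_pred),
        # squared=False was deprecated and later removed from mean_squared_error; take the root explicitly
        'RMSE': np.sqrt(mean_squared_error(y_true, y_pred)),
    }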

baseline_scores = {
    'naive': eval_forecast(test['sales'], naive_fc),
    'seasonal_naive': eval_forecast(test['sales'], seas_naive_fc),
}
baseline_scores


In [None]:

plt.plot(train.index, train['sales'], label='Train')
plt.plot(test.index, test['sales'], label='Test')
plt.plot(test.index, naive_fc, label='Naive FC')
plt.title('Naive Forecast vs Actual')
plt.legend()
plt.show()

plt.plot(train.index, train['sales'], label='Train')
plt.plot(test.index, test['sales'], label='Test')
plt.plot(test.index, seas_naive_fc, label='Seasonal Naive FC')
plt.title('Seasonal Naive Forecast vs Actual')
plt.legend()
plt.show()


### Exponential Smoothing (Holt‑Winters)

In [None]:

hw = ExponentialSmoothing(
    train['sales'],
    trend='add',
    seasonal='add',
    seasonal_periods=52
).fit(optimized=True)

hw_fc = hw.forecast(len(test))
hw_scores = eval_forecast(test['sales'], hw_fc)
hw_scores


In [None]:

plt.plot(train.index, train['sales'], label='Train')
plt.plot(test.index, test['sales'], label='Test')
plt.plot(test.index, hw_fc, label='HW Forecast')
plt.title('Holt-Winters Forecast vs Actual')
plt.legend()
plt.show()


### SARIMAX (ARIMA with Exogenous Regressors)
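
Forecasting with exogenous regressors requires values of those regressors over the forecast horizon. In the hold-out test they are taken from `test[exog_cols]`, but for real future dates they must be planned or assumed, which is what makes scenario forecasts possible. The sketch below builds a few hypothetical scenario frames with pandas; the dates, scenario names, and values are illustrative assumptions, and the column order mirrors `exog_cols`.

```python
import pandas as pd

exog_cols_demo = ['promo', 'price_index', 'macro_index', 'holiday']

# Four hypothetical future weeks following the synthetic data
future_idx = pd.date_range('2024-12-29', periods=4, freq='W-SUN')

# Baseline assumption: no promo, neutral price and macro indices, no holiday
base = pd.DataFrame(
    {'promo': 0, 'price_index': 100.0, 'macro_index': 50.0, 'holiday': 0},
    index=future_idx,
)[exog_cols_demo]

scenarios = {
    'no_promo': base,
    'promo_every_week': base.assign(promo=1),
    'price_up_5': base.assign(price_index=105.0),
}
```

Each frame could then be passed as `exog=` to the fitted model's `forecast` call with `steps=len(future_idx)`, assuming the model was fit with the same columns in the same order.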

In [None]:

# Simple manual order; in practice use AIC grid search / pmdarima auto_arima
order = (1,1,1)
seasonal_order = (1,1,1,52)

exog_cols = ['promo','price_index','macro_index','holiday']
sarimax = SARIMAX(
    train['sales'],
    exog=train[exog_cols],
    order=order,
    seasonal_order=seasonal_order,
    enforce_stationarity=False,
    enforce_invertibility=False,
)
sarimax_res = sarimax.fit(disp=False)

sarimax_fc = sarimax_res.forecast(steps=len(test), exog=test[exog_cols])
sarimax_scores = eval_forecast(test['sales'], sarimax_fc)
sarimax_scores


In [None]:

plt.plot(train.index, train['sales'], label='Train')
plt.plot(test.index, test['sales'], label='Test')
plt.plot(test.index, sarimax_fc, label='SARIMAX Forecast')
plt.title('SARIMAX Forecast vs Actual')
plt.legend()
plt.show()


### Rolling-Origin Cross‑Validation (Walk‑Forward)
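
Walk-forward validation refits the model on an expanding training window and scores it on the next `h` observations. Before running the (slow) refits, it can help to see exactly which splits a given configuration produces. The generator below is a hypothetical helper that uses the same cutpoint logic as the `walk_forward` function in this section.

```python
def walk_forward_splits(n_obs, initial, h):
    """Yield (train_end, test_end) positions for an expanding-window scheme.

    Training uses rows [0, train_end); testing uses rows [train_end, test_end).
    """
    for cp in range(initial, n_obs - h + 1, h):
        yield cp, cp + h

splits = list(walk_forward_splits(n_obs=156, initial=78, h=4))
n_splits = len(splits)
first_split, last_split = splits[0], splits[-1]
print(n_splits, first_split, last_split)  # 19 (78, 82) (150, 154)
```

With 156 weeks, `initial=78`, and `h=4`, this gives 19 refits, and the final two observations are never scored because they do not fill a complete 4-week window.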

In [None]:

def walk_forward(df, exog_cols, h=4, initial=78):
    """Expanding-window evaluation: refit SARIMAX at each cutpoint and score the next h weeks."""
    # initial ~ 1.5 years of training data; forecast horizon h weeks; roll forward by h
    cutpoints = list(range(initial, len(df)-h+1, h))
    evals = []
    for cp in cutpoints:
        tr = df.iloc[:cp]
        te = df.iloc[cp:cp+h]
        # Refit SARIMAX on all data up to the cutpoint
        mod = SARIMAX(
            tr['sales'], exog=tr[exog_cols],
            order=(1,1,1), seasonal_order=(1,1,1,52),
            enforce_stationarity=False, enforce_invertibility=False,
        )
        res = mod.fit(disp=False)
        fc = res.forecast(steps=h, exog=te[exog_cols])
        ev = eval_forecast(te['sales'], fc)
        ev['start'] = tr.index[-1]  # forecast origin: last date in the training window
        evals.append(ev)
    return pd.DataFrame(evals)

cv_scores = walk_forward(df, exog_cols, h=4, initial=78)
cv_scores[['MAE','MAPE','RMSE']].mean().to_dict()


In [None]:

plt.plot(cv_scores['start'], cv_scores['RMSE'])
plt.title('Walk-Forward RMSE by Split')
plt.show()


### Model Comparison
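
The comparison table reports MAE, MAPE, and RMSE. As a reminder of what each one measures, here is a plain NumPy sketch of the three formulas. `forecast_errors` is a hypothetical helper that should agree with the scikit-learn functions used in `eval_forecast` when all actual values are nonzero; the demo numbers are made up.

```python
import numpy as np

def forecast_errors(y_true, y_pred):
    """MAE, MAPE (as a fraction), and RMSE for aligned actuals and forecasts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        'MAE': np.mean(np.abs(err)),                    # average absolute miss, in sales units
        'MAPE': np.mean(np.abs(err) / np.abs(y_true)),  # relative miss; undefined if any actual is 0
        'RMSE': np.sqrt(np.mean(err ** 2)),             # penalizes large misses more heavily
    }

demo = forecast_errors([100, 200, 400], [110, 180, 400])
```

Because RMSE squares the errors, a model with a few large misses can rank worse on RMSE than on MAE, so it is worth reading the columns together.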

In [None]:

summary = pd.DataFrame({
    'MAE': [baseline_scores['naive']['MAE'], baseline_scores['seasonal_naive']['MAE'], hw_scores['MAE'], sarimax_scores['MAE']],
    'MAPE': [baseline_scores['naive']['MAPE'], baseline_scores['seasonal_naive']['MAPE'], hw_scores['MAPE'], sarimax_scores['MAPE']],
    'RMSE': [baseline_scores['naive']['RMSE'], baseline_scores['seasonal_naive']['RMSE'], hw_scores['RMSE'], sarimax_scores['RMSE']],
}, index=['Naive','SeasonalNaive','HoltWinters','SARIMAX'])

summary.round(3)



---

### Practical Guidance
- Start with **baselines** (naive/seasonal naive) and beat them.  
- Use **Holt‑Winters** for level+trend+seasonality; upgrade to **SARIMAX** to include exogenous drivers.  
- Apply **walk‑forward validation**; report average MAE/MAPE/RMSE across splits.  
- Combine forecasts (for example, with a simple average) to reduce the risk of relying on any single model.
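
A simple-average combination can be sketched as follows. The forecasts here are invented to show the case where two models err in opposite directions; `combine_forecasts` and `mae` are hypothetical helpers, and a combination is not guaranteed to beat the best individual model on any given split.

```python
import numpy as np
import pandas as pd

def combine_forecasts(forecasts):
    """Equal-weight average of aligned forecasts given as {name: values}."""
    return pd.DataFrame(forecasts).mean(axis=1)

def mae(actual, forecast):
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(forecast))))

# Hypothetical 3-step forecasts from two models
actual = np.array([100.0, 110.0, 120.0])
fc_a = np.array([90.0, 100.0, 110.0])    # biased low
fc_b = np.array([108.0, 118.0, 128.0])   # biased high

combo = combine_forecasts({'a': fc_a, 'b': fc_b})
print(mae(actual, fc_a), mae(actual, fc_b), mae(actual, combo))  # 10.0 8.0 1.0
```

In this notebook, the same idea would apply to `hw_fc` and `sarimax_fc`, assuming both are indexed by the test dates.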

### References (non‑link citations)
1. Hyndman & Athanasopoulos — *Forecasting: Principles and Practice*.  
2. Box, Jenkins, Reinsel & Ljung — *Time Series Analysis*.  
3. Greene — *Econometric Analysis*.
