[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/danpele/Time-Series-Analysis/blob/main/chapter4_lecture_notebook.ipynb)

---

# Chapter 4: SARIMA Models for Seasonal Time Series

**Course:** Time Series Analysis and Forecasting  
**Program:** Bachelor program, Faculty of Cybernetics, Statistics and Economic Informatics, Bucharest University of Economic Studies, Romania  
**Academic Year:** 2025-2026

---

## Learning Objectives

By the end of this notebook, you will be able to:
1. Identify and characterize seasonal patterns in time series
2. Apply seasonal differencing to remove stochastic seasonality
3. Understand the SARIMA$(p,d,q) \times (P,D,Q)_s$ notation
4. Fit SARIMA models using Python
5. Interpret seasonal ACF/PACF patterns
6. Generate seasonal forecasts with confidence intervals

## Setup and Imports

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# Time series specific
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller, kpss, acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy import stats

# Plotting style - clean, professional
plt.rcParams['figure.figsize'] = (12, 5)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.facecolor'] = 'none'
plt.rcParams['figure.facecolor'] = 'none'
plt.rcParams['savefig.facecolor'] = 'none'
plt.rcParams['axes.grid'] = False
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.right'] = False

# Colors (IDA color scheme)
COLORS = {
    'blue': '#1A3A6E',
    'red': '#DC3545',
    'green': '#2E7D32',
    'orange': '#E67E22',
    'gray': '#666666'
}

print("All libraries loaded successfully!")

## 1. Introduction to Seasonality

**Seasonality** is a repeating pattern that occurs at regular intervals:
- Monthly retail sales peak in December
- Quarterly GDP shows annual patterns
- Daily electricity demand varies by day of week

### Types of Seasonality
- **Deterministic**: Fixed, repeating pattern (use seasonal dummies)
- **Stochastic**: Evolving seasonal pattern (use SARIMA)
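
To make the deterministic case concrete, the sketch below builds a synthetic monthly series with a fixed seasonal pattern and recovers it by least squares on twelve monthly dummy variables. The data, the seed, and the names `pattern`, `dummies`, and `y_det` are illustrative assumptions, not part of the airline dataset used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, s = 10, 12

# Hypothetical fixed monthly pattern (deterministic seasonality)
pattern = 10 * np.sin(2 * np.pi * np.arange(s) / s)
month = np.tile(np.arange(s), n_years)      # month index 0..11, repeated each year
y_det = 50 + pattern[month] + rng.normal(0, 1, n_years * s)

# One dummy column per month (no separate intercept, so no collinearity)
dummies = np.eye(s)[month]
coef, *_ = np.linalg.lstsq(dummies, y_det, rcond=None)

# Each coefficient estimates that month's mean level, 50 + pattern
max_err = np.max(np.abs(coef - (50 + pattern)))
print(f"Largest error in estimated monthly means: {max_err:.2f}")
```

When the seasonal pattern itself drifts from year to year, fixed dummies no longer capture it, and seasonal differencing within a SARIMA model is the usual alternative.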

In [None]:
# Load the classic airline passengers dataset (monthly totals, 1949-1960)
try:
    # Download the CSV from GitHub
    url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv'
    airline = pd.read_csv(url, index_col=0, parse_dates=True)
    airline.columns = ['Passengers']
    # Month-end index with an explicit frequency ('ME' requires pandas >= 2.2)
    airline.index = pd.date_range('1949-01', periods=len(airline), freq='ME')
except Exception:
    # Download failed (e.g., no internet access): generate synthetic airline-like data
    np.random.seed(42)
    n = 144  # 12 years monthly
    t = np.arange(n)
    trend = 100 + 2.5 * t
    noise = np.random.randn(n) * 10
    # Multiplicative seasonality: the amplitude grows with the trend level
    passengers = trend * (1 + 0.3 * np.sin(2 * np.pi * t / 12)) + noise
    airline = pd.DataFrame(
        {'Passengers': passengers},
        index=pd.date_range('1949-01', periods=n, freq='ME')
    )

print(f"Airline Passengers Data: {len(airline)} monthly observations")
print(f"Period: {airline.index[0].date()} to {airline.index[-1].date()}")
print(airline.head())

In [None]:
# Plot the airline passengers data
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Original data
axes[0].plot(airline.index, airline['Passengers'], color=COLORS['blue'], linewidth=1, label='Monthly Passengers')
axes[0].set_title('Airline Passengers (1949-1960)', fontweight='bold')
axes[0].set_xlabel('Date')
axes[0].set_ylabel('Passengers (thousands)')
axes[0].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), frameon=False)

# Seasonal subseries plot - show each month's pattern
monthly_means = airline.groupby(airline.index.month).mean()
axes[1].bar(range(1, 13), monthly_means['Passengers'].values, color=COLORS['blue'], alpha=0.7, label='Monthly Average')
axes[1].set_title('Average Passengers by Month', fontweight='bold')
axes[1].set_xlabel('Month')
axes[1].set_ylabel('Average Passengers')
axes[1].set_xticks(range(1, 13))
axes[1].set_xticklabels(['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D'])
axes[1].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), frameon=False)

plt.tight_layout()
plt.show()

print("Key observations:")
print("- Clear upward trend over time")
print("- Strong seasonal pattern (peaks in summer)")
print("- Seasonal amplitude GROWS with the level (multiplicative seasonality)")

## 2. Seasonal Decomposition

Time series with seasonality can be decomposed into:
- **Trend** ($T_t$): Long-term direction
- **Seasonal** ($S_t$): Repeating pattern
- **Residual** ($R_t$): Random noise

### Decomposition Types
- **Additive**: $Y_t = T_t + S_t + R_t$ (constant seasonal amplitude)
- **Multiplicative**: $Y_t = T_t \times S_t \times R_t$ (growing seasonal amplitude)
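
A quick way to see why a log transform is used for multiplicative seasonality: taking logs turns $T_t \times S_t$ into $\log T_t + \log S_t$, so the seasonal swing becomes constant. The sketch below uses a noise-free synthetic series (all names and numbers are assumptions chosen for illustration).

```python
import numpy as np

t = np.arange(48)
trend = 100 + 5 * t
season = 1 + 0.2 * np.sin(2 * np.pi * t / 12)   # multiplicative factors around 1
y_mult = trend * season                          # noise-free for clarity

dev_raw = y_mult - trend                    # additive deviation: grows with the level
dev_log = np.log(y_mult) - np.log(trend)    # log deviation: equals log(season)

def swing(x):
    """Peak-to-trough range of a one-year window."""
    return x.max() - x.min()

print("Raw swing, first vs last year:", swing(dev_raw[:12]), swing(dev_raw[-12:]))
print("Log swing, first vs last year:", swing(dev_log[:12]), swing(dev_log[-12:]))
```

The raw seasonal swing widens as the trend rises, while the log-scale swing is the same in every year, which is the additive structure that differencing and SARIMA assume.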

In [None]:
# Perform seasonal decomposition
decomposition = seasonal_decompose(airline['Passengers'], model='multiplicative', period=12)

fig, axes = plt.subplots(4, 1, figsize=(14, 10), sharex=True)

# Original
axes[0].plot(airline.index, airline['Passengers'], color=COLORS['blue'], linewidth=1, label='Observed')
axes[0].set_title('Original Series', fontweight='bold')
axes[0].legend(loc='upper left', frameon=False)

# Trend
axes[1].plot(airline.index, decomposition.trend, color=COLORS['green'], linewidth=1, label='Trend')
axes[1].set_title('Trend Component', fontweight='bold')
axes[1].legend(loc='upper left', frameon=False)

# Seasonal
axes[2].plot(airline.index, decomposition.seasonal, color=COLORS['orange'], linewidth=1, label='Seasonal')
axes[2].set_title('Seasonal Component (multiplicative factors)', fontweight='bold')
axes[2].axhline(y=1, color='black', linestyle='--', alpha=0.3)
axes[2].legend(loc='upper left', frameon=False)

# Residual
axes[3].plot(airline.index, decomposition.resid, color=COLORS['red'], linewidth=1, label='Residual')
axes[3].set_title('Residual Component', fontweight='bold')
axes[3].axhline(y=1, color='black', linestyle='--', alpha=0.3)
axes[3].legend(loc='upper left', frameon=False)
axes[3].set_xlabel('Date')

plt.tight_layout()
plt.show()

print("\nSeasonal factors (multiplicative):")
# The series starts in January, so the first 12 factors map to Jan..Dec
seasonal_factors = decomposition.seasonal.iloc[:12]
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
for month, factor in zip(month_names, seasonal_factors):
    print(f"  {month}: {factor:.3f} ({(factor-1)*100:+.1f}% from average)")

## 3. The Seasonal Differencing Operator

For data with period $s$, the **seasonal difference** is:

$$(1 - L^s)Y_t = Y_t - Y_{t-s}$$

### Examples
- Monthly data ($s=12$): $(1-L^{12})Y_t = Y_t - Y_{t-12}$
- Quarterly data ($s=4$): $(1-L^4)Y_t = Y_t - Y_{t-4}$

### Full Differencing for Trend + Seasonality
$$(1-L)(1-L^s)Y_t = (Y_t - Y_{t-1}) - (Y_{t-s} - Y_{t-s-1})$$
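
The combined operator can be checked numerically. The sketch below, on a hypothetical quarterly random walk ($s=4$), applies the two differences in both orders and compares them with the expanded form $Y_t - Y_{t-1} - Y_{t-s} + Y_{t-s-1}$; the series and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
s = 4
x = rng.normal(size=40).cumsum()   # hypothetical quarterly series

# Route A: seasonal difference, then first difference
seasonal = x[s:] - x[:-s]
both_a = np.diff(seasonal)

# Route B: first difference, then seasonal difference
first = np.diff(x)
both_b = first[s:] - first[:-s]

# Route C: expanded form Y_t - Y_{t-1} - Y_{t-s} + Y_{t-s-1}
both_c = x[s + 1:] - x[s:-1] - x[1:-s] + x[:-s - 1]

print("A == B:", np.allclose(both_a, both_b))
print("A == C:", np.allclose(both_a, both_c))
print("Observations lost:", len(x) - len(both_a))
```

The order of differencing does not matter, and the combined operator costs $s+1$ observations at the start of the sample.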

In [None]:
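# Apply different differencing operators
y = airline['Passengers'].values

# Log transform first (for multiplicative seasonality).
# Keep log_y as a pandas Series: the fitted results then carry named parameters
# (e.g. 'ma.L1') and a date index, which later cells access via [], .index and .iloc
log_y = np.log(airline['Passengers'])
log_vals = log_y.values  # plain array, so subtraction is positional, not index-aligned

# First difference (removes trend)
diff1 = np.diff(log_vals)

# Seasonal difference (removes seasonality)
diff12 = log_vals[12:] - log_vals[:-12]

# Both differences (removes both)
diff1_12 = diff12[1:] - diff12[:-1]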

fig, axes = plt.subplots(2, 2, figsize=(14, 8))

# Original log series
axes[0, 0].plot(log_y, color=COLORS['blue'], linewidth=1, label='log(Y_t)')
axes[0, 0].set_title('Log Passengers (Original)', fontweight='bold')
axes[0, 0].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), frameon=False)

# First difference only
axes[0, 1].plot(diff1, color=COLORS['green'], linewidth=1, label='(1-L) log(Y_t)')
axes[0, 1].axhline(y=0, color='red', linestyle='--', alpha=0.5)
axes[0, 1].set_title('First Difference (trend removed, seasonality remains)', fontweight='bold')
axes[0, 1].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), frameon=False)

# Seasonal difference only
axes[1, 0].plot(diff12, color=COLORS['orange'], linewidth=1, label='(1-L^12) log(Y_t)')
axes[1, 0].axhline(y=0, color='red', linestyle='--', alpha=0.5)
axes[1, 0].set_title('Seasonal Difference (seasonality removed, trend remains)', fontweight='bold')
axes[1, 0].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), frameon=False)

# Both differences
axes[1, 1].plot(diff1_12, color=COLORS['red'], linewidth=1, label='(1-L)(1-L^12) log(Y_t)')
axes[1, 1].axhline(y=0, color='black', linestyle='--', alpha=0.5)
axes[1, 1].set_title('Both Differences (stationary!)', fontweight='bold')
axes[1, 1].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), frameon=False)

plt.tight_layout()
plt.show()

In [None]:
# Unit root tests to verify stationarity
def run_adf_test(series, name):
    """Run ADF test and print results."""
    series_clean = series[~np.isnan(series)]
    result = adfuller(series_clean, autolag='AIC')
    print(f"\n{name}:")
    print(f"  ADF Statistic: {result[0]:.4f}")
    print(f"  p-value: {result[1]:.6f}")
    print(f"  Conclusion: {'STATIONARY' if result[1] < 0.05 else 'NON-STATIONARY'}")

print("="*60)
print("ADF Tests for Different Transformations")
print("="*60)
run_adf_test(log_y, "Log Passengers")
run_adf_test(diff1, "First Difference of Log")
run_adf_test(diff12, "Seasonal Difference of Log")
run_adf_test(diff1_12, "Both Differences")

## 4. SARIMA Model Structure

**SARIMA**$(p,d,q) \times (P,D,Q)_s$ combines:

| Component | Meaning |
|-----------|--------|
| $p$ | Non-seasonal AR order |
| $d$ | Non-seasonal differencing |
| $q$ | Non-seasonal MA order |
| $P$ | Seasonal AR order |
| $D$ | Seasonal differencing |
| $Q$ | Seasonal MA order |
| $s$ | Seasonal period |

### The Full Model
$$\phi(L)\Phi(L^s)(1-L)^d(1-L^s)^D Y_t = c + \theta(L)\Theta(L^s)\varepsilon_t$$

where:
- $\phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$ (non-seasonal AR)
- $\Phi(L^s) = 1 - \Phi_1 L^s - \cdots - \Phi_P L^{Ps}$ (seasonal AR)
- $\theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q$ (non-seasonal MA)
- $\Theta(L^s) = 1 + \Theta_1 L^s + \cdots + \Theta_Q L^{Qs}$ (seasonal MA)

## 5. The Airline Model: SARIMA$(0,1,1) \times (0,1,1)_{12}$

The classic **airline model**, introduced by Box & Jenkins (1970); in this notebook $Y_t$ is the log of the passenger count:

$$(1-L)(1-L^{12})Y_t = (1 + \theta_1 L)(1 + \Theta_1 L^{12})\varepsilon_t$$

### Expanded Form
$$Y_t - Y_{t-1} - Y_{t-12} + Y_{t-13} = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \Theta_1 \varepsilon_{t-12} + \theta_1 \Theta_1 \varepsilon_{t-13}$$

With only **2 MA parameters** ($\theta_1, \Theta_1$), plus the innovation variance $\sigma^2$, the model still captures rich seasonal dynamics.
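
The expanded right-hand side can be reproduced by multiplying the two lag polynomials, which amounts to convolving their coefficient vectors. The values of `theta1` and `Theta1` below are illustrative assumptions, not estimates from the data.

```python
import numpy as np

theta1, Theta1 = -0.4, -0.6    # illustrative values, not fitted estimates
s = 12

# Coefficients in ascending powers of L: index k holds the weight on L^k
nonseasonal = np.array([1.0, theta1])        # 1 + theta1 * L
seasonal = np.zeros(s + 1)
seasonal[0], seasonal[s] = 1.0, Theta1       # 1 + Theta1 * L^12

# Multiplying lag polynomials is a convolution of their coefficients
product = np.convolve(nonseasonal, seasonal)

for lag in np.flatnonzero(product):
    print(f"L^{lag}: {product[lag]:+.2f}")
```

Only lags 0, 1, 12, and 13 carry weight, and the lag-13 coefficient is not a free parameter: it equals the product $\theta_1 \Theta_1$.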

In [None]:
# Fit the airline model to log passengers
model = SARIMAX(log_y, 
                order=(0, 1, 1), 
                seasonal_order=(0, 1, 1, 12),
                enforce_stationarity=False,
                enforce_invertibility=False)

results = model.fit(disp=False)

print("Airline Model: SARIMA(0,1,1)(0,1,1)[12]")
print("="*60)
print(results.summary())

In [None]:
# Interpret the coefficients
print("\nCoefficient Interpretation:")
print("="*50)
print(f"θ₁ (ma.L1) = {results.params['ma.L1']:.4f}")
print(f"Θ₁ (ma.S.L12) = {results.params['ma.S.L12']:.4f}")
print(f"σ² = {results.params['sigma2']:.6f}")
print("\nBoth MA coefficients are negative (typical for this data):")
print("- Negative θ₁: Positive shocks partially reverse next month")
print("- Negative Θ₁: Positive shocks partially reverse next year")

## 6. ACF/PACF Analysis for Seasonal Data

For seasonal data, look for patterns at:
- **Non-seasonal lags**: 1, 2, 3, ... (regular ARMA patterns)
- **Seasonal lags**: $s$, $2s$, $3s$, ... (e.g., 12, 24, 36 for monthly)

### Pattern Recognition
| If you see... | Consider... |
|---------------|------------|
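| ACF spike at lag $s$ that then cuts off, PACF decaying at $s, 2s, \dots$ | Seasonal MA ($Q$) |
| PACF spike at lag $s$ that then cuts off, ACF decaying at $s, 2s, \dots$ | Seasonal AR ($P$) |
| Both ACF and PACF decay at seasonal lags | Mixed seasonal ARMA |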
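
As a rough check on the first row of the table, the sketch below computes the theoretical ACF of a multiplicative MA structure $(1+\theta_1 L)(1+\Theta_1 L^{12})\varepsilon_t$ directly from its MA weights. The helper `ma_acf` and the parameter values are assumptions written for this illustration, not library functions.

```python
import numpy as np

def ma_acf(psi, max_lag):
    """Theoretical ACF of an MA process with weights psi (psi[0] = 1)."""
    psi = np.asarray(psi, dtype=float)
    q = len(psi) - 1
    # Autocovariance at lag k is the sum of psi_j * psi_{j+k}; zero beyond lag q
    gamma = np.array([psi[:q + 1 - k] @ psi[k:] if k <= q else 0.0
                      for k in range(max_lag + 1)])
    return gamma / gamma[0]

theta1, Theta1, s = -0.4, -0.6, 12   # illustrative values

seasonal = np.zeros(s + 1)
seasonal[0], seasonal[s] = 1.0, Theta1
ma_weights = np.convolve([1.0, theta1], seasonal)   # (1 + theta1 L)(1 + Theta1 L^12)

rho = ma_acf(ma_weights, max_lag=24)
nonzero_lags = np.flatnonzero(np.abs(rho) > 1e-12)
print("Lags with nonzero ACF:", nonzero_lags)
print(f"rho(1) = {rho[1]:.3f}, rho(12) = {rho[12]:.3f}")
```

The ACF cuts off after lag 13, with the seasonal spike at lag 12 and smaller "satellite" spikes at lags 11 and 13 produced by the cross term, so a nonzero value at $s \pm 1$ does not by itself call for extra parameters.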

In [None]:
# ACF and PACF of differenced series
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# ACF of (1-L)(1-L^12) log Y
plot_acf(diff1_12, ax=axes[0, 0], lags=48, color=COLORS['blue'])
axes[0, 0].set_title('ACF of (1-L)(1-L^12) log(Y)', fontweight='bold')
# Mark seasonal lags
for lag in [12, 24, 36, 48]:
    axes[0, 0].axvline(x=lag, color='red', linestyle=':', alpha=0.5)

# PACF of (1-L)(1-L^12) log Y
plot_pacf(diff1_12, ax=axes[0, 1], lags=48, color=COLORS['blue'])
axes[0, 1].set_title('PACF of (1-L)(1-L^12) log(Y)', fontweight='bold')
for lag in [12, 24, 36, 48]:
    axes[0, 1].axvline(x=lag, color='red', linestyle=':', alpha=0.5)

# Zoom in on first 15 lags
plot_acf(diff1_12, ax=axes[1, 0], lags=15, color=COLORS['green'])
axes[1, 0].set_title('ACF: Non-seasonal lags (1-15)', fontweight='bold')

# Show only seasonal lags in a bar chart
acf_vals = acf(diff1_12, nlags=48)
seasonal_lags = [12, 24, 36, 48]
seasonal_acf = [acf_vals[lag] for lag in seasonal_lags]
axes[1, 1].bar(seasonal_lags, seasonal_acf, color=COLORS['orange'], width=2, label='ACF at seasonal lags')
axes[1, 1].axhline(y=0, color='black', linestyle='-')
axes[1, 1].axhline(y=1.96/np.sqrt(len(diff1_12)), color='blue', linestyle='--', alpha=0.5)
axes[1, 1].axhline(y=-1.96/np.sqrt(len(diff1_12)), color='blue', linestyle='--', alpha=0.5)
axes[1, 1].set_title('ACF at Seasonal Lags (12, 24, 36, 48)', fontweight='bold')
axes[1, 1].set_xlabel('Lag')
axes[1, 1].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), frameon=False)

plt.tight_layout()
plt.show()

print("\nACF/PACF Interpretation:")
print("- Significant spike at lag 1 in ACF → MA(1) component")
print("- Significant spike at lag 12 in ACF → Seasonal MA(1) component")
print("- This suggests SARIMA(0,1,1)(0,1,1)[12] - the airline model!")

## 7. Model Diagnostics

In [None]:
# Diagnostic plots
# Skip the first d + D*s = 1 + 1*12 = 13 residuals: they are distorted by the
# initialization of the differenced model (burn-in)
residuals_full = results.resid
residuals = residuals_full[13:]  # Skip burn-in period

fig, axes = plt.subplots(2, 2, figsize=(14, 8))

# Residuals over time
axes[0, 0].plot(residuals.index, residuals.values, color=COLORS['blue'], linewidth=0.5, label='Residuals')
axes[0, 0].axhline(y=0, color='red', linestyle='--')
axes[0, 0].set_title('Residuals Over Time (after burn-in)', fontweight='bold')
axes[0, 0].set_xlabel('Time')
axes[0, 0].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), frameon=False)

# Histogram
axes[0, 1].hist(residuals, bins=20, color=COLORS['blue'], edgecolor='black', 
                alpha=0.7, density=True, label='Residuals')
x = np.linspace(residuals.min(), residuals.max(), 100)
axes[0, 1].plot(x, stats.norm.pdf(x, residuals.mean(), residuals.std()), 
                color=COLORS['red'], linewidth=2, label='Normal')
axes[0, 1].set_title('Residual Distribution', fontweight='bold')
axes[0, 1].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=2, frameon=False)

# ACF of residuals
plot_acf(residuals, ax=axes[1, 0], lags=36, color=COLORS['blue'])
axes[1, 0].set_title('ACF of Residuals', fontweight='bold')
for lag in [12, 24, 36]:
    axes[1, 0].axvline(x=lag, color='red', linestyle=':', alpha=0.3)

# Q-Q plot
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
axes[1, 1].scatter(osm, osr, color=COLORS['blue'], s=20, alpha=0.5, label='Sample')
axes[1, 1].plot(osm, slope*osm + intercept, color=COLORS['red'], linewidth=2, label='Theoretical')
axes[1, 1].set_title('Q-Q Plot', fontweight='bold')
axes[1, 1].set_xlabel('Theoretical Quantiles')
axes[1, 1].set_ylabel('Sample Quantiles')
axes[1, 1].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=2, frameon=False)

plt.tight_layout()
plt.show()

print(f"\nNote: First 13 residuals skipped (model burn-in period for seasonal order s=12)")

In [None]:
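# Ljung-Box test
# model_df=2 adjusts the degrees of freedom for the two estimated MA parameters
lb_test = acorr_ljungbox(residuals, lags=[12, 24, 36], model_df=2, return_df=True)
print("Ljung-Box Test for Residual Autocorrelation:")
print("="*50)
print(lb_test)
print("\nInterpretation:")
print("If all p-values > 0.05, we cannot reject the null of no residual autocorrelation")
print("(the residuals are consistent with white noise, so the model looks adequate)")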

## 8. Model Selection with Auto-SARIMA

In [None]:
# Install pmdarima if needed
try:
    import pmdarima as pm
    print("pmdarima is available")
except ImportError:
    print("Installing pmdarima...")
    !pip install pmdarima -q
    import pmdarima as pm
    print("pmdarima installed successfully")

In [None]:
# Use auto_arima to find the best SARIMA model
import pmdarima as pm

auto_model = pm.auto_arima(
    log_y,
    start_p=0, start_q=0,
    max_p=2, max_q=2,
    d=1,           # Non-seasonal differencing
    start_P=0, start_Q=0,
    max_P=2, max_Q=2,
    D=1,           # Seasonal differencing
    m=12,          # Seasonal period
    seasonal=True,
    stepwise=True,
    suppress_warnings=True,
    trace=True
)

print("\n" + "="*60)
print("Auto-SARIMA Selected Model:")
print("="*60)
print(auto_model.summary())

In [None]:
# Compare different SARIMA specifications
print("Model Comparison:")
print("="*70)
print(f"{'Model':<35} {'AIC':>12} {'BIC':>12}")
print("-"*70)

models_to_try = [
    ((0, 1, 1), (0, 1, 1, 12)),  # Airline model
    ((1, 1, 0), (1, 1, 0, 12)),  # Pure AR
    ((1, 1, 1), (0, 1, 1, 12)),  # Mixed
    ((0, 1, 1), (1, 1, 1, 12)),  # Extended seasonal
    ((1, 1, 1), (1, 1, 1, 12)),  # Full model
]

best_aic = float('inf')
best_model_name = None

for order, seasonal_order in models_to_try:
    try:
        model = SARIMAX(log_y, order=order, seasonal_order=seasonal_order,
                        enforce_stationarity=False, enforce_invertibility=False)
        res = model.fit(disp=False)
        model_name = f"SARIMA{order}x{seasonal_order[:3]}[{seasonal_order[3]}]"
        print(f"{model_name:<35} {res.aic:>12.2f} {res.bic:>12.2f}")
        if res.aic < best_aic:
            best_aic = res.aic
            best_model_name = model_name
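    except Exception as e:
        # Report the failure instead of silently dropping the model from the table
        print(f"SARIMA{order}x{seasonal_order} failed to fit: {e}")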

print("-"*70)
print(f"Best model by AIC: {best_model_name}")

## 9. Forecasting with SARIMA

In [None]:
# Generate forecasts
forecast_steps = 24  # 2 years ahead
forecast = results.get_forecast(steps=forecast_steps)
forecast_mean = forecast.predicted_mean
forecast_ci = forecast.conf_int()

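# Convert back from log scale. Quantiles survive the exp transform, so the interval
# bounds carry over; exp of the log-scale point forecast is a median, not a mean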
forecast_passengers = np.exp(forecast_mean)
ci_lower = np.exp(forecast_ci.iloc[:, 0])
ci_upper = np.exp(forecast_ci.iloc[:, 1])

# Create forecast dates
last_date = airline.index[-1]
forecast_dates = pd.date_range(start=last_date, periods=forecast_steps+1, freq='ME')[1:]

# Plot
fig, ax = plt.subplots(figsize=(14, 6))

# Historical data
ax.plot(airline.index, airline['Passengers'], color=COLORS['blue'], linewidth=1, label='Historical')

# Forecasts
ax.plot(forecast_dates, forecast_passengers, color=COLORS['red'], linewidth=2, label='Forecast')

# Confidence interval
ax.fill_between(forecast_dates, ci_lower, ci_upper,
                color=COLORS['red'], alpha=0.2, label='95% CI')

ax.axvline(x=last_date, color='black', linestyle='-', alpha=0.3)
ax.set_xlabel('Date')
ax.set_ylabel('Passengers (thousands)')
ax.set_title('Airline Passengers: SARIMA(0,1,1)(0,1,1)[12] Forecasts', fontweight='bold')
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), ncol=3, frameon=False)
plt.tight_layout()
plt.show()

print(f"\nForecast Summary (next 12 months):")
print("="*50)
for i in range(12):
    print(f"{forecast_dates[i].strftime('%Y-%m')}: {forecast_passengers.iloc[i]:.0f} "
          f"[{ci_lower.iloc[i]:.0f}, {ci_upper.iloc[i]:.0f}]")

## 10. In-Sample Fit Evaluation

In [None]:
# Compare fitted values with actual
fitted = np.exp(results.fittedvalues)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Actual vs Fitted
axes[0].plot(airline.index, airline['Passengers'], color=COLORS['blue'], 
             linewidth=1, alpha=0.7, label='Actual')
axes[0].plot(airline.index, fitted, color=COLORS['red'], 
             linewidth=1, alpha=0.7, label='Fitted')
axes[0].set_title('Actual vs Fitted Values', fontweight='bold')
axes[0].set_xlabel('Date')
axes[0].set_ylabel('Passengers')
axes[0].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=2, frameon=False)

# Scatter plot
axes[1].scatter(airline['Passengers'], fitted, alpha=0.5, color=COLORS['blue'], s=20, label='Observations')
axes[1].plot([airline['Passengers'].min(), airline['Passengers'].max()],
             [airline['Passengers'].min(), airline['Passengers'].max()],
             'r--', linewidth=2, label='Perfect Fit')
axes[1].set_xlabel('Actual Passengers')
axes[1].set_ylabel('Fitted Passengers')
axes[1].set_title('Actual vs Fitted (Scatter)', fontweight='bold')
axes[1].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=2, frameon=False)

plt.tight_layout()
plt.show()

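# Calculate fit statistics, skipping the first 13 fitted values
# (d + D*s = 13 burn-in observations, consistent with the residual diagnostics)
actual = airline['Passengers'].values[13:]
fitted_eval = np.asarray(fitted)[13:]
mape = np.mean(np.abs((actual - fitted_eval) / actual)) * 100
rmse = np.sqrt(np.mean((actual - fitted_eval)**2))
r2 = 1 - np.sum((actual - fitted_eval)**2) / np.sum((actual - actual.mean())**2)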

print(f"\nModel Fit Statistics:")
print(f"  MAPE: {mape:.2f}%")
print(f"  RMSE: {rmse:.2f}")
print(f"  R²: {r2:.4f}")

## 11. Rolling Forecast Evaluation

In [None]:
# Rolling origin forecast evaluation
train_size = 120  # Use first 10 years for training
test_size = len(log_y) - train_size

forecasts = []
actuals = []

print(f"Rolling forecast evaluation ({test_size} forecasts)...")

for i in range(test_size):
    # Fit model on expanding window
    train = log_y[:train_size + i]
    model = SARIMAX(train, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12),
                    enforce_stationarity=False, enforce_invertibility=False)
    res = model.fit(disp=False)
    
    # 1-step ahead forecast
    fc = res.forecast(steps=1)
    forecasts.append(np.exp(fc.values[0]))
    actuals.append(y[train_size + i])

forecasts = np.array(forecasts)
actuals = np.array(actuals)

# Calculate accuracy metrics
mape_oos = np.mean(np.abs((actuals - forecasts) / actuals)) * 100
rmse_oos = np.sqrt(np.mean((actuals - forecasts)**2))

# Plot
fig, ax = plt.subplots(figsize=(14, 5))

test_dates = airline.index[train_size:]
ax.plot(test_dates, actuals, color=COLORS['blue'], linewidth=1.5, label='Actual')
ax.plot(test_dates, forecasts, color=COLORS['red'], linewidth=1.5, label='1-Step Forecast')
ax.set_title('Rolling 1-Step Ahead Forecasts', fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Passengers')
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=2, frameon=False)

plt.tight_layout()
plt.show()

print(f"\nOut-of-Sample Forecast Accuracy:")
print(f"  MAPE: {mape_oos:.2f}%")
print(f"  RMSE: {rmse_oos:.2f}")

## 12. Comparing SARIMA to Naive Seasonal Benchmark

In [None]:
# Seasonal naive: forecast = same month last year
naive_forecasts = y[train_size - 12: -12]  # Shift by 12 months

mape_naive = np.mean(np.abs((actuals - naive_forecasts) / actuals)) * 100
rmse_naive = np.sqrt(np.mean((actuals - naive_forecasts)**2))

print("Forecast Accuracy Comparison:")
print("="*50)
print(f"{'Method':<25} {'MAPE':>10} {'RMSE':>10}")
print("-"*50)
print(f"{'Seasonal Naive':<25} {mape_naive:>10.2f}% {rmse_naive:>10.2f}")
print(f"{'SARIMA(0,1,1)(0,1,1)[12]':<25} {mape_oos:>10.2f}% {rmse_oos:>10.2f}")
print("-"*50)

improvement = (mape_naive - mape_oos) / mape_naive * 100
print(f"\nSARIMA improvement over naive: {improvement:.1f}%")

## Summary

### Key Takeaways

1. **Seasonality** is a common feature in economic time series
   - Monthly, quarterly, weekly patterns
   - Can be deterministic or stochastic

2. **Seasonal differencing** $(1-L^s)$ removes stochastic seasonality
   - For monthly data: $(1-L^{12})Y_t = Y_t - Y_{t-12}$
   - Often combined with regular differencing

3. **SARIMA**$(p,d,q) \times (P,D,Q)_s$ combines:
   - Regular ARIMA components
   - Seasonal ARIMA components
   - Multiplicative structure for interactions

4. **The airline model** SARIMA$(0,1,1) \times (0,1,1)_{12}$:
   - Only 2 parameters
   - Works well for many seasonal series

5. **ACF/PACF patterns** appear at seasonal lags
   - Look for spikes at $s$, $2s$, $3s$, ...

6. **Auto-SARIMA** automates model selection
   - `pmdarima` package in Python

### Next Chapter
Chapter 5 will cover **multivariate time series**: VAR models, Granger causality, and cointegration.