[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/danpele/Time-Series-Analysis/blob/main/chapter10_lecture_notebook.ipynb)

---

# Chapter 10: Comprehensive Review

**Complete Time Series Analysis Workflow**

**Course:** Time Series Analysis and Forecasting  
**Program:** Bachelor program, Faculty of Cybernetics, Statistics and Economic Informatics, Bucharest University of Economic Studies, Romania  
**Academic Year:** 2025-2026

---

## Learning Objectives

This comprehensive review demonstrates the complete time series analysis workflow using **simulated data based on real-world patterns**:

1. **Case Study 1: Bitcoin** - Cryptocurrency volatility analysis with ARIMA-GARCH
2. **Case Study 2: Sunspots** - Long-cycle seasonal data (11-year Schwabe cycle)
3. **Case Study 3: US Unemployment** - Economic data with COVID-19 structural break

We will apply ALL methods covered in the course:
- Stationarity testing (ADF, KPSS)
- ARIMA/SARIMA modeling
- GARCH for volatility clustering
- Prophet with changepoint detection
- Fourier terms for long seasonality
- Model comparison and evaluation

**Note:** Data is simulated to replicate real-world patterns (regime changes, cycles, structural breaks) for educational purposes.

## Setup and Imports

In [None]:
# Install required packages (for Colab)
import sys
if 'google.colab' in sys.modules:
    !pip install prophet arch statsmodels -q

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# Statistical tests and models
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.seasonal import seasonal_decompose

# GARCH
try:
    from arch import arch_model
    HAS_ARCH = True
except ImportError:
    HAS_ARCH = False
    print("arch not installed. Install with: pip install arch")

# Prophet
try:
    from prophet import Prophet
    HAS_PROPHET = True
except ImportError:
    HAS_PROPHET = False
    print("Prophet not installed. Install with: pip install prophet")

from sklearn.metrics import mean_squared_error, mean_absolute_error

# Plotting style - CONSISTENT WITH OTHER CHAPTERS
plt.rcParams['figure.figsize'] = (12, 5)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.facecolor'] = 'none'
plt.rcParams['figure.facecolor'] = 'none'
plt.rcParams['axes.grid'] = False
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.right'] = False
plt.rcParams['legend.frameon'] = False

COLORS = {'blue': '#1A3A6E', 'red': '#DC3545', 'green': '#2E7D32', 'orange': '#E67E22', 'gray': '#666666'}

print("Setup complete!")
print(f"ARCH/GARCH available: {HAS_ARCH}")
print(f"Prophet available: {HAS_PROPHET}")

## Simulated Data Loading Functions

These functions generate data that mimics real-world patterns for educational purposes.

In [None]:
def get_bitcoin_data():
    """Simulated Bitcoin daily prices with regime-based volatility (2019-2024)
    
    Mimics real Bitcoin behavior: COVID crash, 2021 bull run, 2022 crypto winter.
    """
    np.random.seed(42)
    dates = pd.date_range('2019-01-01', '2024-01-01', freq='D')
    n = len(dates)
    
    price = 3700  # Bitcoin starting price Jan 2019
    prices = [price]
    
    for i in range(1, n):
        date = dates[i]
        
        # 2019: Recovery from crypto winter
        if date < pd.Timestamp('2020-01-01'):
            drift, vol = 0.0008, 0.035
        # Early 2020: Pre-COVID stability
        elif date < pd.Timestamp('2020-03-01'):
            drift, vol = 0.001, 0.03
        # COVID crash (March 2020)
        elif date < pd.Timestamp('2020-04-01'):
            drift, vol = -0.02, 0.08
        # Apr 2020 - Oct 2021: Bull run to ATH
        elif date < pd.Timestamp('2021-11-01'):
            drift, vol = 0.003, 0.04
        # Nov 2021 - Dec 2022: Crypto winter/bear market
        elif date < pd.Timestamp('2023-01-01'):
            drift, vol = -0.002, 0.045
        # 2023: Recovery
        else:
            drift, vol = 0.0015, 0.025
        
        ret = drift + vol * np.random.randn()
        price = max(prices[-1] * (1 + ret), 1000)  # Floor at $1000
        prices.append(price)
    
    df = pd.DataFrame({'ds': dates, 'price': prices})
    df['returns'] = df['price'].pct_change() * 100
    return df


def get_sunspot_data():
    """Simulated monthly sunspot numbers with 11-year Schwabe cycle (1990-2023)
    
    Mimics real sunspot behavior: asymmetric cycles, varying amplitudes.
    """
    np.random.seed(123)
    dates = pd.date_range('1990-01-01', '2023-12-01', freq='MS')
    n = len(dates)
    
    # 11-year (132 months) solar cycle
    cycle_period = 132
    sunspots = []
    
    for i in range(n):
        # Base cycle (asymmetric: rapid rise to the maximum, slow decline to the next minimum)
        phase = (i % cycle_period) / cycle_period
        if phase < 0.4:  # Rising phase (40% of cycle): 0 -> 150
            cycle_value = 150 * np.sin(0.5 * np.pi * phase / 0.4) ** 1.5
        else:  # Declining phase (60% of cycle): 150 -> 0
            cycle_value = 150 * np.cos(0.5 * np.pi * (phase - 0.4) / 0.6) ** 0.8
        
        # Add variations between cycles
        cycle_num = i // cycle_period
        cycle_amplitude = [1.0, 0.7, 1.2][cycle_num % 3]
        cycle_value *= cycle_amplitude
        
        # Add noise
        noise = np.random.exponential(10) * np.random.choice([-1, 1])
        sunspots.append(max(0, cycle_value + noise))
    
    return pd.DataFrame({'ds': dates, 'y': sunspots})


def get_unemployment_data():
    """Simulated US Unemployment Rate with COVID-19 structural break (2015-2023)
    
    Mimics real unemployment: pre-COVID decline, April 2020 spike to 14.7%, recovery.
    """
    np.random.seed(456)
    dates = pd.date_range('2015-01-01', '2023-12-01', freq='MS')
    n = len(dates)
    
    unemployment = []
    
    for i in range(n):
        date = dates[i]
        
        # Pre-COVID trend (declining unemployment)
        if date < pd.Timestamp('2020-03-01'):
            base = 5.5 - (i / 60) * 2  # From 5.5% to ~3.5%
            noise = np.random.normal(0, 0.1)
            unemployment.append(max(3.5, base + noise))
        # COVID spike (March-April 2020)
        elif date < pd.Timestamp('2020-05-01'):
            if date.month == 3:
                unemployment.append(4.4)
            else:
                unemployment.append(14.7)  # Peak
        # Rapid recovery (May-Dec 2020)
        elif date < pd.Timestamp('2021-01-01'):
            months_since_peak = (date.year - 2020) * 12 + date.month - 4
            rate = 14.7 - months_since_peak * 1.2
            unemployment.append(max(6.7, rate + np.random.normal(0, 0.2)))
        # Continued recovery (2021-2022)
        elif date < pd.Timestamp('2023-01-01'):
            months_since_2021 = (date.year - 2021) * 12 + date.month
            rate = 6.7 - months_since_2021 * 0.15
            unemployment.append(max(3.5, rate + np.random.normal(0, 0.1)))
        # Stabilization (2023)
        else:
            unemployment.append(3.5 + np.random.normal(0, 0.1))
    
    return pd.DataFrame({'ds': dates, 'y': unemployment})


print("Data loading functions defined (simulated data):")
print("- Bitcoin: Extreme volatility, regime changes")
print("- Sunspots: 11-year (132-month) seasonal cycle")
print("- US Unemployment: COVID-19 structural break")

---
# Case Study 1: Bitcoin Volatility Analysis

Cryptocurrency data is characterized by:
- **Extreme volatility** (much higher than traditional assets)
- **Regime changes** (bull markets, crypto winters)
- **Fat tails** (more extreme events than normal distribution)
- **Volatility clustering** (GARCH is essential)
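Fat tails do not require non-Gaussian shocks. `get_bitcoin_data` draws Gaussian returns within each regime, but pooling calm and turbulent regimes produces a scale mixture of normals, whose excess kurtosis is $3\,E[\sigma^4]/E[\sigma^2]^2 - 3 > 0$. A minimal sketch, assuming two equally long regimes with daily volatilities of 3% and 8% (the pre-COVID and crash settings above); the sample sizes and seed are arbitrary:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
n_per_regime = 100_000

# Two regimes with Gaussian shocks but different daily volatility
calm = rng.normal(0.0, 0.03, n_per_regime)
crash = rng.normal(0.0, 0.08, n_per_regime)
pooled = np.concatenate([calm, crash])

# scipy's kurtosis() returns EXCESS kurtosis by default (fisher=True): normal -> 0
k_calm = kurtosis(calm)
k_pooled = kurtosis(pooled)

# Theoretical excess kurtosis of an equal-weight scale mixture of normals
sig = np.array([0.03, 0.08])
k_theory = 3 * np.mean(sig**4) / np.mean(sig**2) ** 2 - 3

print(f"Excess kurtosis, single regime: {k_calm:.3f}")
print(f"Excess kurtosis, pooled:        {k_pooled:.3f}")
print(f"Theoretical (mixture):          {k_theory:.3f}")
```

So part of the kurtosis reported for the simulated Bitcoin returns comes from the regime structure itself, which is also what volatility models such as GARCH try to capture.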

In [None]:
# Load Bitcoin data
btc = get_bitcoin_data()

print("Bitcoin Data Overview (Simulated)")
print("="*50)
print(f"Period: {btc['ds'].min().date()} to {btc['ds'].max().date()}")
print(f"Observations: {len(btc)} days")
print(f"\nPrice Statistics:")
print(f"  Min: ${btc['price'].min():,.2f}")
print(f"  Max: ${btc['price'].max():,.2f}")
print(f"\nReturn Statistics (Daily %):")
print(f"  Mean: {btc['returns'].mean():.4f}%")
print(f"  Std: {btc['returns'].std():.4f}%")
print(f"  Skewness: {btc['returns'].skew():.4f}")
print(f"  Kurtosis: {btc['returns'].kurtosis():.4f}")
print(f"\nNote: pandas .kurtosis() reports EXCESS kurtosis (normal = 0); large positive values indicate fat tails")

In [None]:
# Visualize Bitcoin prices and returns
fig, axes = plt.subplots(2, 1, figsize=(12, 8))

# Prices with regime annotations
axes[0].plot(btc['ds'], btc['price'], color=COLORS['blue'], linewidth=1)
axes[0].axvspan(pd.Timestamp('2020-03-01'), pd.Timestamp('2020-04-01'),
                alpha=0.3, color=COLORS['red'], label='COVID-19 Crash')
axes[0].axvspan(pd.Timestamp('2021-11-01'), pd.Timestamp('2022-12-31'),
                alpha=0.2, color=COLORS['orange'], label='Crypto Winter 2022')
axes[0].set_title('Bitcoin Daily Prices (2019-2024) - Simulated', fontweight='bold')
axes[0].set_ylabel('Price (USD)')
axes[0].set_yscale('log')  # Log scale for better visualization
axes[0].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=2, frameon=False)

# Returns
axes[1].plot(btc['ds'], btc['returns'], color=COLORS['green'], linewidth=0.5)
axes[1].axhline(y=0, color='black', linewidth=0.5, alpha=0.3)
axes[1].set_title('Bitcoin Daily Returns (%)', fontweight='bold')
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Return (%)')

plt.tight_layout()
plt.subplots_adjust(hspace=0.35)
plt.show()

print("Key Observations:")
print("- COVID-19 crash (March 2020): negative drift, daily volatility jumps from 3% to 8%")
print("- Apr 2020 - Oct 2021 bull run (real BTC peaked near $69K in Nov 2021; the simulated path need not reach it)")
print("- 2022 crypto winter: Extended decline")
print("- Clear volatility clustering: Large moves cluster together")

## Step 1: Stationarity Testing
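ADF and KPSS have opposite null hypotheses (ADF: unit root; KPSS: stationarity), so reading them together gives four possible outcomes. A small helper, with the hypothetical name `combine_adf_kpss` and an assumed 5% level, encodes one common reading. The two "conflicting" cases are hints for further checks, not verdicts:

```python
def combine_adf_kpss(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Hypothetical helper: interpret ADF (H0: unit root) and KPSS (H0: stationary) jointly."""
    adf_rejects = adf_pvalue < alpha    # evidence AGAINST a unit root
    kpss_rejects = kpss_pvalue < alpha  # evidence AGAINST stationarity
    if adf_rejects and not kpss_rejects:
        return "stationary"
    if not adf_rejects and kpss_rejects:
        return "non-stationary (unit root): difference the series"
    if adf_rejects and kpss_rejects:
        return "conflicting: check for structural breaks, changing variance, or difference-stationarity"
    return "conflicting: low test power or a deterministic trend; retry with regression='ct'"

print(combine_adf_kpss(0.001, 0.10))
print(combine_adf_kpss(0.60, 0.01))
```

In the notebook it could be called as `combine_adf_kpss(adfuller(x)[1], kpss(x)[1])`. Note that statsmodels' KPSS p-values are interpolated from a table and truncated to the range [0.01, 0.10].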

In [None]:
def test_stationarity(series, name):
    """Run ADF and KPSS tests"""
    print(f"\nStationarity Tests for {name}")
    print("-" * 40)
    
    # ADF Test (H0: unit root exists = non-stationary)
    adf_result = adfuller(series.dropna(), autolag='AIC')
    print(f"ADF Test:")
    print(f"  Statistic: {adf_result[0]:.4f}")
    print(f"  p-value: {adf_result[1]:.4f}")
    print(f"  Conclusion: {'Stationary' if adf_result[1] < 0.05 else 'Non-stationary'}")
    
    # KPSS Test (H0: stationary)
    kpss_result = kpss(series.dropna(), regression='c', nlags='auto')
    print(f"\nKPSS Test:")
    print(f"  Statistic: {kpss_result[0]:.4f}")
    print(f"  p-value: {kpss_result[1]:.4f}")
    print(f"  Conclusion: {'Stationary' if kpss_result[1] > 0.05 else 'Non-stationary'}")
    
    return adf_result[1] < 0.05 and kpss_result[1] > 0.05

# Test prices (expect non-stationary)
prices_stationary = test_stationarity(btc['price'], 'Bitcoin Prices')

# Test returns (expect stationary)
returns_stationary = test_stationarity(btc['returns'].dropna(), 'Bitcoin Returns')

print("\n" + "="*50)
print("CONCLUSION: Use RETURNS for mean modeling (stationary)")
print("But volatility of returns is NOT constant -> GARCH needed!")

## Step 2: ACF/PACF Analysis

In [None]:
returns = btc['returns'].dropna()

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# ACF/PACF for returns (mean equation)
plot_acf(returns, ax=axes[0, 0], lags=30, alpha=0.05)
axes[0, 0].set_title('ACF: Returns (Mean)', fontweight='bold')

plot_pacf(returns, ax=axes[0, 1], lags=30, alpha=0.05, method='ywm')
axes[0, 1].set_title('PACF: Returns (Mean)', fontweight='bold')

# ACF/PACF for squared returns (volatility)
plot_acf(returns**2, ax=axes[1, 0], lags=30, alpha=0.05)
axes[1, 0].set_title('ACF: Squared Returns (Volatility)', fontweight='bold')

plot_pacf(returns**2, ax=axes[1, 1], lags=30, alpha=0.05, method='ywm')
axes[1, 1].set_title('PACF: Squared Returns (Volatility)', fontweight='bold')

plt.tight_layout()
plt.show()

print("Key Findings:")
print("- Returns: Weak autocorrelation (hard to predict mean)")
print("- Squared returns: STRONG persistence -> GARCH essential!")
print("- This is typical of financial returns: 'weak mean, strong variance'")

## Step 3: ARIMA-GARCH Model

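**AR(1)-GARCH(1,1) Model** (the specification fitted below):
$$r_t = \mu + \phi\, r_{t-1} + \epsilon_t, \qquad \epsilon_t = \sigma_t z_t, \quad z_t \sim N(0,1)$$
$$\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$

Where:
- $r_t$ = daily return (%), $\epsilon_t$ = shock from the mean equation
- $\omega > 0$ = constant; when $\alpha + \beta < 1$ the long-run (unconditional) variance is $\omega / (1 - \alpha - \beta)$
- $\alpha$ = ARCH effect (reaction to recent shock)
- $\beta$ = GARCH effect (persistence of volatility)
- $\alpha + \beta$ = persistence: deviations of the expected variance from its long-run level decay geometrically at rate $\alpha + \beta$ per period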
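Two derived quantities make GARCH(1,1) estimates easier to read. The first is the long-run variance $\omega/(1-\alpha-\beta)$. The second is the volatility half-life $\ln(0.5)/\ln(\alpha+\beta)$, the number of periods it takes a variance shock to decay by half. A minimal sketch with a hypothetical helper and illustrative parameter values (not estimates from the Bitcoin data):

```python
import math

def garch11_summary(omega, alpha, beta):
    """Hypothetical helper: long-run variance, long-run volatility and half-life of a GARCH(1,1)."""
    persistence = alpha + beta
    if not 0 < persistence < 1:
        raise ValueError("alpha + beta must lie in (0, 1) for a finite long-run variance")
    long_run_var = omega / (1 - persistence)
    half_life = math.log(0.5) / math.log(persistence)  # periods until a variance shock halves
    return long_run_var, math.sqrt(long_run_var), half_life

var_lr, vol_lr, hl = garch11_summary(omega=0.1, alpha=0.10, beta=0.85)
print(f"Long-run variance: {var_lr:.2f}, volatility: {vol_lr:.3f}, half-life: {hl:.1f} days")
```

With the fitted model below, the same helper can be applied to `results.params['omega']`, `results.params['alpha[1]']` and `results.params['beta[1]']`. Because returns are in percent, the variance is in %² per day. As $\alpha+\beta$ approaches 1 the half-life grows without bound, which is the IGARCH limit.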

In [None]:
if HAS_ARCH:
    # Fit GARCH(1,1) model with AR(1) mean
    print("Fitting GARCH(1,1) Model for Bitcoin")
    print("="*50)
    
    model = arch_model(returns, vol='Garch', p=1, q=1, mean='AR', lags=1)
    results = model.fit(disp='off')
    
    print(results.summary())
else:
    print("ARCH package not available. Install with: pip install arch")

In [None]:
if HAS_ARCH:
    # Plot conditional volatility
    fig, axes = plt.subplots(2, 1, figsize=(12, 8))
    
    # Returns with volatility bands
    cond_vol = results.conditional_volatility
    dates = btc['ds'][1:]
    
    axes[0].plot(dates, returns, color=COLORS['blue'], linewidth=0.5, alpha=0.7, label='Returns')
    axes[0].plot(dates, 2*cond_vol, color=COLORS['red'], linewidth=1, label='+2\u03c3')
    axes[0].plot(dates, -2*cond_vol, color=COLORS['red'], linewidth=1, label='-2\u03c3')
    axes[0].axhline(y=0, color='black', linewidth=0.5, alpha=0.3)
    axes[0].set_title('Bitcoin Returns with GARCH(1,1) Volatility Bands', fontweight='bold')
    axes[0].set_ylabel('Return (%)')
    axes[0].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=3, frameon=False)
    
    # Conditional volatility
    axes[1].fill_between(dates, 0, cond_vol, color=COLORS['orange'], alpha=0.7, label='Conditional Volatility')
    axes[1].axvspan(pd.Timestamp('2020-03-01'), pd.Timestamp('2020-04-01'),
                    alpha=0.3, color=COLORS['red'], label='COVID Crash')
    axes[1].set_title('GARCH(1,1) Conditional Volatility', fontweight='bold')
    axes[1].set_xlabel('Date')
    axes[1].set_ylabel('Volatility (\u03c3)')
    axes[1].legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=2, frameon=False)
    
    plt.tight_layout()
    plt.subplots_adjust(hspace=0.35)
    plt.show()
    
    # Interpret parameters
    alpha = results.params.get('alpha[1]', 0)
    beta = results.params.get('beta[1]', 0)
    
    print("\nGARCH Model Interpretation:")
    print(f"- \u03b1 (ARCH): {alpha:.4f} - Reaction to recent shocks")
    print(f"- \u03b2 (GARCH): {beta:.4f} - Persistence of past volatility")
    print(f"- Persistence (\u03b1+\u03b2): {alpha + beta:.4f}")
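    print("\nHigh persistence indicates volatility shocks are long-lasting.")
    print("Caution: this series was simulated with discrete volatility regimes, and unmodeled")
    print("regime shifts are known to bias estimated GARCH persistence toward 1.")
    if alpha + beta > 0.95:
        print("Warning: Very high persistence - consider IGARCH, long-memory, or regime-switching models.")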

---
# Case Study 2: Sunspot Cycle Analysis

Sunspots demonstrate:
- **Very long seasonality** (11-year Schwabe cycle = 132 months)
- **Asymmetric pattern** (rapid rise, slow decline)
- **Variable cycle amplitude**
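- Challenge: a seasonal ARIMA with period 132 is impractical. The state-space model grows with the seasonal period, and 34 years of data contain only about three full cycles for estimating seasonal coefficients.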

In [None]:
# Load Sunspot data
sunspots = get_sunspot_data()

print("Sunspot Data Overview (Simulated)")
print("="*50)
print(f"Period: {sunspots['ds'].min().strftime('%Y-%m')} to {sunspots['ds'].max().strftime('%Y-%m')}")
print(f"Observations: {len(sunspots)} months")
print(f"\nStatistics:")
print(f"  Mean: {sunspots['y'].mean():.1f}")
print(f"  Std: {sunspots['y'].std():.1f}")
print(f"  Min: {sunspots['y'].min():.0f}")
print(f"  Max: {sunspots['y'].max():.0f}")
print(f"\nKey Feature: 11-year (132-month) Schwabe solar cycle")

In [None]:
# Visualize sunspot data
fig, axes = plt.subplots(2, 1, figsize=(12, 8))

# Time series
axes[0].plot(sunspots['ds'], sunspots['y'], color=COLORS['blue'], linewidth=1, label='Sunspot Count')
axes[0].set_title('Monthly Sunspot Numbers (1990-2023) - Simulated', fontweight='bold')
axes[0].set_ylabel('Sunspot Count')

# Mark simulated cycle minima: the generator starts a new 132-month cycle every 11 years from Jan 1990
for k, cycle_start in enumerate(sunspots['ds'].iloc[::132], start=1):
    axes[0].axvline(cycle_start, color=COLORS['red'], linestyle='--', alpha=0.5, linewidth=1)
    label_x = cycle_start + pd.DateOffset(years=1)
    if label_x < sunspots['ds'].iloc[-1]:  # skip labels that would fall past the data
        axes[0].text(label_x, sunspots['y'].max()*0.9, f'Sim. cycle {k}', fontsize=10)

# Rolling statistics
window = 24  # 2-year window
rolling_mean = sunspots['y'].rolling(window=window).mean()
rolling_std = sunspots['y'].rolling(window=window).std()

axes[1].plot(sunspots['ds'], rolling_mean, color=COLORS['green'], 
             linewidth=2, label=f'{window}-month Rolling Mean')
axes[1].fill_between(sunspots['ds'], rolling_mean - rolling_std, rolling_mean + rolling_std,
                     color=COLORS['green'], alpha=0.2, label='\u00b11 Std Dev')
axes[1].set_title('Rolling Statistics', fontweight='bold')
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Sunspot Count')
axes[1].legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=2, frameon=False)

plt.tight_layout()
plt.subplots_adjust(hspace=0.35)
plt.show()

print("Key Observations:")
print("- Clear 11-year periodicity (solar cycle)")
print("- Asymmetric shape: rapid rise, gradual decline")
print("- Cycle amplitudes vary (the simulated 2001-2012 cycle is scaled down to 70%)")

## Spectral Analysis: Periodogram and ACF
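With 408 monthly observations, the FFT evaluates the spectrum only at periods of $408/k$ months. The bins around the 11-year cycle are therefore 204, 136 and 102 months, and a pure 132-month sine peaks in the 136-month bin. This is why the spectral peak should be read as "around 132 months". A minimal numpy sketch on a synthetic, noise-free sine:

```python
import numpy as np

n = 408                          # months, Jan 1990 - Dec 2023
t = np.arange(n)
y = np.sin(2 * np.pi * t / 132)  # synthetic 132-month cycle, no noise

freqs = np.fft.rfftfreq(n, d=1.0)               # cycles per month: k / n
power = np.abs(np.fft.rfft(y - y.mean())) ** 2
periods = 1.0 / freqs[1:]                       # skip the zero frequency

peak_period = periods[np.argmax(power[1:])]
print("Candidate periods near 132:", np.round(periods[:4], 1))
print("Peak period:", round(peak_period, 1))
```

Longer samples or zero-padding (for example `scipy.signal.periodogram(..., nfft=...)`) give a finer grid, but the underlying resolution is still limited by the roughly three cycles in the data.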

In [None]:
# Spectral analysis (FFT periodogram) plus ACF/PACF for the 132-month cycle
from scipy.fft import fft, fftfreq

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# FFT to find the dominant period

y = sunspots['y'].values - sunspots['y'].mean()
n = len(y)
yf = np.abs(fft(y))[:n//2]
xf = fftfreq(n, 1)[:n//2]  # Monthly frequency

# Convert to period in months
period_months = 1 / xf[1:]  # Skip zero frequency
power = yf[1:]

# Panel 1: original series (the spectrum is in panel 2)
axes[0, 0].plot(sunspots['ds'], sunspots['y'], color=COLORS['blue'], linewidth=0.8)
axes[0, 0].set_title('Original Sunspot Series', fontweight='bold')
axes[0, 0].set_ylabel('Sunspot Count')

axes[0, 1].plot(period_months[:200], power[:200], color=COLORS['orange'], linewidth=1, label='Power Spectrum')
axes[0, 1].axvline(x=132, color=COLORS['red'], linestyle='--', 
                   label='11-year cycle (132 months)')
axes[0, 1].set_title('Power Spectrum (Periodogram)', fontweight='bold')
axes[0, 1].set_xlabel('Period (months)')
axes[0, 1].set_ylabel('Spectral Power')
axes[0, 1].set_xlim([0, 200])
axes[0, 1].legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=2, frameon=False)

# ACF showing long cycle
plot_acf(sunspots['y'], ax=axes[1, 0], lags=150, alpha=0.05)
axes[1, 0].axvline(x=66, color=COLORS['red'], linestyle='--', alpha=0.5)
axes[1, 0].axvline(x=132, color=COLORS['red'], linestyle='--', alpha=0.5)
axes[1, 0].set_title('ACF: Shows 132-month Periodicity', fontweight='bold')

plot_pacf(sunspots['y'], ax=axes[1, 1], lags=50, alpha=0.05, method='ywm')
axes[1, 1].set_title('PACF', fontweight='bold')

plt.tight_layout()
plt.show()

print("Spectral Analysis confirms:")
print("- Dominant spectral peak near 132 months (11 years); with 408 months the FFT bins are coarse (periods 408/k)")
print("- ACF has a local peak near lag 132")
print("- Challenge: standard SARIMA is impractical for such a long seasonal period")

## Modeling Long Seasonality with Fourier Terms

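For very long seasonal periods (132 months), we use **Fourier terms** as regressors instead of seasonal ARIMA terms. The model fitted below regresses on an intercept and $K$ harmonics, with ARMA(2,1) errors:

$$y_t = \beta_0 + \sum_{k=1}^{K} \left[ a_k \sin\left(\frac{2\pi k t}{m}\right) + b_k \cos\left(\frac{2\pi k t}{m}\right) \right] + \eta_t$$

Where $m$ = 132 (seasonal period), $K$ = number of harmonics (here $K = 3$), and $\eta_t$ follows an ARMA(2,1) process.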
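Because the Fourier terms enter the model linearly, their coefficients can be estimated by ordinary least squares. A noise-free numpy sketch with made-up coefficients shows that regressing on an intercept plus $K=3$ harmonics recovers them exactly, even though 408 months do not span a whole number of 132-month cycles:

```python
import numpy as np

n, m, K = 408, 132, 3
t = np.arange(n)

# Design matrix: intercept, then sin/cos pairs for k = 1..K (same construction as add_fourier_terms)
cols = [np.ones(n)]
for k in range(1, K + 1):
    cols += [np.sin(2 * np.pi * k * t / m), np.cos(2 * np.pi * k * t / m)]
X = np.column_stack(cols)

# Hypothetical "true" coefficients: [beta0, a1, b1, a2, b2, a3, b3]
true_coef = np.array([60.0, 40.0, 10.0, -5.0, 3.0, 0.0, 1.5])
y = X @ true_coef

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 6))
```

Each extra harmonic adds two columns and lets the fitted cycle be less sinusoidal (for example, a rapid rise and slow decline). $K$ is usually chosen by AIC or out-of-sample error.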

In [None]:
def add_fourier_terms(df, period, K):
    """Add Fourier terms for seasonality"""
    t = np.arange(len(df))
    for k in range(1, K + 1):
        df[f'sin_{k}'] = np.sin(2 * np.pi * k * t / period)
        df[f'cos_{k}'] = np.cos(2 * np.pi * k * t / period)
    return df

# Prepare data with Fourier terms
sunspot_model = sunspots.copy()
sunspot_model = add_fourier_terms(sunspot_model, period=132, K=3)  # 3 harmonics

# Train/test split
train_size = len(sunspots) - 36  # Hold out last 3 years
train = sunspot_model.iloc[:train_size]
test = sunspot_model.iloc[train_size:]

print(f"Training: {train_size} months")
print(f"Testing: {len(test)} months")
print(f"\nFourier terms added: sin_1, cos_1, sin_2, cos_2, sin_3, cos_3")

In [None]:
# Fit ARIMA with Fourier regressors
exog_cols = ['sin_1', 'cos_1', 'sin_2', 'cos_2', 'sin_3', 'cos_3']

print("Fitting ARIMA(2,0,1) with Fourier Regressors")
print("="*50)

model = SARIMAX(train['y'], 
                exog=train[exog_cols],
                order=(2, 0, 1),
                trend='c',  # constant: the Fourier terms have (near) zero mean, so the level needs its own term
                enforce_stationarity=False,
                enforce_invertibility=False)
results = model.fit(disp=False)

print(results.summary().tables[1])

In [None]:
# Forecast
forecast = results.get_forecast(steps=len(test), exog=test[exog_cols])
forecast_mean = forecast.predicted_mean
forecast_ci = forecast.conf_int()

# Metrics (sunspot counts can be 0, so this MAPE variant divides by y + 1 to avoid division by zero)
rmse = np.sqrt(mean_squared_error(test['y'], forecast_mean))
mape = np.mean(np.abs((test['y'].values - forecast_mean.values) / (test['y'].values + 1))) * 100

print(f"\nForecast Performance:")
print(f"  RMSE: {rmse:.2f}")
print(f"  MAPE (denominator y+1): {mape:.2f}%")

# Plot forecast
fig, ax = plt.subplots(figsize=(12, 6))

ax.plot(train['ds'], train['y'], color=COLORS['blue'], linewidth=1, label='Training')
ax.plot(test['ds'], test['y'], color=COLORS['blue'], linewidth=1.5, label='Actual')
ax.plot(test['ds'], forecast_mean, color=COLORS['red'], linewidth=1.5, 
        linestyle='--', label=f'Forecast (RMSE={rmse:.1f})')
ax.fill_between(test['ds'], forecast_ci.iloc[:, 0], forecast_ci.iloc[:, 1],
                color=COLORS['red'], alpha=0.2, label='95% CI')
ax.axvline(x=train['ds'].iloc[-1], color='black', linestyle=':', alpha=0.5)

ax.set_title('Sunspot Forecast: ARIMA with Fourier Terms', fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Sunspot Count')
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=4, frameon=False)

plt.tight_layout()
plt.show()

print("\nNote: Fourier terms model the 11-year cycle with only 2K = 6 regressors,")
print("avoiding SARIMA(p,d,q)(P,D,Q)[132], which is impractical to estimate at this period length.")

---
# Case Study 3: US Unemployment with Structural Break

The COVID-19 pandemic created an unprecedented structural break:
- Pre-COVID: 3.5% (50-year low)
- Peak: 14.7% (April 2020)
- Recovery: Back to ~3.5% by 2023

This challenges traditional time series models.
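When the break date is known, a Chow test checks whether a regression's coefficients differ before and after it. The test compares the residual sum of squares of one pooled fit with the sum from two separate fits. A minimal sketch, assuming a linear-trend regression, a hypothetical helper `chow_test`, and a synthetic series with a level shift at month 62 (standing in for March 2020):

```python
import numpy as np
from scipy import stats

def chow_test(y, X, break_idx):
    """Chow F-test for a known break at row break_idx (hypothetical helper)."""
    def rss(yy, XX):
        beta, *_ = np.linalg.lstsq(XX, yy, rcond=None)
        resid = yy - XX @ beta
        return resid @ resid
    n_obs, k = X.shape
    rss_pooled = rss(y, X)
    rss_split = rss(y[:break_idx], X[:break_idx]) + rss(y[break_idx:], X[break_idx:])
    f_stat = ((rss_pooled - rss_split) / k) / (rss_split / (n_obs - 2 * k))
    p_value = stats.f.sf(f_stat, k, n_obs - 2 * k)
    return f_stat, p_value

rng = np.random.default_rng(7)
n, break_idx = 108, 62
t = np.arange(n)
# Declining trend, then a +5 pp level shift after the break (synthetic data)
y = 5.5 - 0.03 * t + 5.0 * (t >= break_idx) + rng.normal(0, 0.2, n)
X = np.column_stack([np.ones(n), t])

f_stat, p_value = chow_test(y, X, break_idx)
print(f"Chow F = {f_stat:.1f}, p-value = {p_value:.2e}")
```

The Chow test needs the break date in advance. When the date is unknown, sup-F or changepoint methods (such as Prophet's, below) search over candidate dates instead.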

In [None]:
# Load unemployment data
unemp = get_unemployment_data()

print("US Unemployment Data Overview (Simulated)")
print("="*50)
print(f"Period: {unemp['ds'].min().strftime('%Y-%m')} to {unemp['ds'].max().strftime('%Y-%m')}")
print(f"Observations: {len(unemp)} months")
print(f"\nKey Statistics:")
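pre_covid_rates = unemp[unemp['ds'] < '2020-03-01']
print(f"  Pre-COVID Min: {pre_covid_rates['y'].min():.1f}% ({pre_covid_rates.loc[pre_covid_rates['y'].idxmin(), 'ds'].strftime('%b %Y')})")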
print(f"  COVID Peak: {unemp['y'].max():.1f}% (April 2020)")
print(f"  Latest: {unemp['y'].iloc[-1]:.1f}%")
print(f"\nStructural Break: March-April 2020")

In [None]:
# Visualize unemployment with COVID impact
fig, axes = plt.subplots(2, 1, figsize=(12, 8))

# Main plot
axes[0].plot(unemp['ds'], unemp['y'], color=COLORS['blue'], linewidth=1.5, label='Unemployment Rate')
axes[0].axvspan(pd.Timestamp('2020-03-01'), pd.Timestamp('2020-05-01'),
                alpha=0.3, color=COLORS['red'], label='COVID-19 Shock')

# Pre-COVID trend
pre_covid = unemp[unemp['ds'] < '2020-03-01']
z1 = np.polyfit(range(len(pre_covid)), pre_covid['y'], 1)
trend_pre = np.polyval(z1, range(len(pre_covid)))
axes[0].plot(pre_covid['ds'], trend_pre, color=COLORS['gray'], linewidth=2, 
             linestyle='--', label='Pre-COVID Trend')

# Extend pre-COVID trend (counterfactual)
trend_counter = np.polyval(z1, range(len(pre_covid), len(unemp)))
axes[0].plot(unemp['ds'][len(pre_covid):], trend_counter, color=COLORS['gray'], 
             linewidth=1, linestyle=':', alpha=0.5, label='Counterfactual')

axes[0].set_title('US Unemployment Rate (2015-2023): COVID-19 Structural Break - Simulated', fontweight='bold')
axes[0].set_ylabel('Unemployment Rate (%)')
axes[0].set_ylim([0, 16])
axes[0].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=4, frameon=False)

# Month-over-month change
unemp_change = unemp['y'].diff()
colors_bar = [COLORS['green'] if x < 0 else COLORS['red'] for x in unemp_change]
axes[1].bar(unemp['ds'], unemp_change, color=colors_bar, width=20, alpha=0.7)
axes[1].axhline(y=0, color='black', linewidth=0.5)
axes[1].set_title('Month-over-Month Change', fontweight='bold')
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Change (pp)')

plt.tight_layout()
plt.subplots_adjust(hspace=0.35)
plt.show()

print("Key Observations:")
print("- April 2020: Largest single-month increase in history (+10.3pp)")
print("- Recovery was rapid: back near pre-COVID levels (~3.5%) by late 2022, about 2.5 years after the peak")
print("- Traditional ARIMA would struggle with such an extreme outlier")

## Prophet Model for Structural Breaks

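Prophet handles structural breaks through a **piecewise linear trend with changepoints**:
- By default it places 25 potential changepoints in the first 80% of the history and shrinks most of their rate changes toward zero
- `changepoint_prior_scale` (default 0.05) controls flexibility: larger values allow larger trend changes
- Known break dates (like COVID) can be passed via `changepoints=[...]`; this replaces the automatic grid, so only those dates are used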
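Prophet's default trend is piecewise linear. The growth rate $k$ changes by $\delta_j$ at each changepoint $s_j$, and the offset is adjusted by $\gamma_j = -s_j\delta_j$ so the line stays continuous: $g(t) = (k + \mathbf{a}(t)^\top\boldsymbol{\delta})\,t + (m + \mathbf{a}(t)^\top\boldsymbol{\gamma})$, where $a_j(t)=1$ once $t \ge s_j$. A numpy sketch of that formula, with a hypothetical function name and made-up values. Prophet itself works on rescaled time and fits $\delta_j$ under a Laplace prior whose scale is `changepoint_prior_scale`:

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Prophet-style piecewise linear trend (hypothetical helper):
    g(t) = (k + a(t) @ delta) * t + (m + a(t) @ gamma), with gamma_j = -s_j * delta_j."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    delta = np.asarray(deltas, dtype=float)
    A = (t[:, None] >= s[None, :]).astype(float)  # a_j(t) = 1 once t passes changepoint j
    gamma = -s * delta                             # keeps the trend continuous at each s_j
    return (k + A @ delta) * t + (m + A @ gamma)

t = np.arange(0, 10.5, 0.5)
g = piecewise_linear_trend(t, k=-0.2, m=5.0, changepoints=[4.0, 6.0], deltas=[3.0, -3.5])
print(np.round(g, 2))
```

Here the slope is -0.2 before $t=4$, 2.8 between the changepoints and -0.7 afterwards, and the trend has no jumps. A piecewise linear trend can therefore represent an abrupt level shift only approximately, as a steep but continuous segment.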

In [None]:
if HAS_PROPHET:
    # Prepare data for Prophet
    unemp_prophet = unemp.copy()
    
    # Train/test split (hold out last 12 months)
    train_unemp = unemp_prophet.iloc[:-12]
    test_unemp = unemp_prophet.iloc[-12:]
    
    print("Fitting Prophet Model with Specified COVID Changepoints")
    print("="*50)
    
    # Supplying `changepoints` replaces Prophet's automatic changepoint grid; only these dates are used
    prophet_model = Prophet(
        changepoint_prior_scale=0.5,  # More flexible for extreme changes
        yearly_seasonality=False,
        weekly_seasonality=False,
        daily_seasonality=False,
        changepoints=['2020-03-01', '2020-04-01', '2020-06-01']  # COVID events
    )
    prophet_model.fit(train_unemp)
    
    # Forecast
    future = prophet_model.make_future_dataframe(periods=12, freq='MS')
    forecast = prophet_model.predict(future)
    
    # Extract test predictions
    pred_test = forecast['yhat'].iloc[-12:].values
    
    # Metrics
    rmse = np.sqrt(mean_squared_error(test_unemp['y'], pred_test))
    mape = np.mean(np.abs((test_unemp['y'].values - pred_test) / test_unemp['y'].values)) * 100
    
    print(f"\nProphet Forecast Performance:")
    print(f"  RMSE: {rmse:.2f}")
    print(f"  MAPE: {mape:.2f}%")
else:
    print("Prophet not available. Install with: pip install prophet")

In [None]:
if HAS_PROPHET:
    # Plot Prophet results
    fig, axes = plt.subplots(2, 1, figsize=(12, 8))
    
    # Forecast plot
    axes[0].plot(unemp['ds'], unemp['y'], color=COLORS['blue'], linewidth=1.5, label='Actual')
    axes[0].plot(forecast['ds'], forecast['yhat'], color=COLORS['orange'], 
                 linewidth=1.5, linestyle='--', label='Prophet Forecast')
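    # Prophet's uncertainty interval defaults to 80% (interval_width=0.8), not 95%
    axes[0].fill_between(forecast['ds'], forecast['yhat_lower'], forecast['yhat_upper'],
                        color=COLORS['orange'], alpha=0.2,
                        label=f"{prophet_model.interval_width:.0%} interval")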
    axes[0].axvline(x=train_unemp['ds'].iloc[-1], color='black', linestyle=':', alpha=0.5)
    
    # Mark changepoints
    for i, cp in enumerate(prophet_model.changepoints):
        if i == 0:
            axes[0].axvline(x=cp, color=COLORS['red'], linestyle='--', alpha=0.5, linewidth=1, label='Changepoints')
        else:
            axes[0].axvline(x=cp, color=COLORS['red'], linestyle='--', alpha=0.5, linewidth=1)
    
    axes[0].set_title('Prophet Model with Specified Changepoints', fontweight='bold')
    axes[0].set_ylabel('Unemployment Rate (%)')
    axes[0].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=4, frameon=False)
    
    # Trend component
    axes[1].plot(forecast['ds'], forecast['trend'], color=COLORS['green'], linewidth=2, label='Trend')
    for i, cp in enumerate(prophet_model.changepoints):
        if i == 0:
            axes[1].axvline(x=cp, color=COLORS['red'], linestyle='--', alpha=0.5, linewidth=1, label='Changepoints')
        else:
            axes[1].axvline(x=cp, color=COLORS['red'], linestyle='--', alpha=0.5, linewidth=1)
    axes[1].set_title('Prophet Trend Component (with Changepoints)', fontweight='bold')
    axes[1].set_xlabel('Date')
    axes[1].set_ylabel('Trend')
    axes[1].legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=2, frameon=False)
    
    plt.tight_layout()
    plt.subplots_adjust(hspace=0.35)
    plt.show()
    
    print("\nProphet changepoints (manually specified):")
    for i, cp in enumerate(prophet_model.changepoints):
        print(f"  {i+1}. {cp.strftime('%Y-%m-%d')}")
    print("\nRed dashed lines mark the specified changepoints, where the trend slope is allowed to change.")

## ARIMA Comparison (Post-COVID only)

For comparison, let's fit ARIMA to post-COVID data only:

In [None]:
# Post-COVID data only (from June 2020)
post_covid = unemp[unemp['ds'] >= '2020-06-01'].copy()
post_covid_train = post_covid.iloc[:-12]
post_covid_test = post_covid.iloc[-12:]

print("Fitting ARIMA to Post-COVID Data Only")
print("="*50)

arima_model = ARIMA(post_covid_train['y'], order=(2, 1, 1))
arima_results = arima_model.fit()

# Forecast
arima_forecast = arima_results.get_forecast(steps=12)
arima_pred = arima_forecast.predicted_mean
arima_ci = arima_forecast.conf_int()

# Metrics
arima_rmse = np.sqrt(mean_squared_error(post_covid_test['y'], arima_pred))
arima_mape = np.mean(np.abs((post_covid_test['y'].values - arima_pred.values) / post_covid_test['y'].values)) * 100

print(f"\nARIMA(2,1,1) Performance (Post-COVID):")
print(f"  RMSE: {arima_rmse:.2f}")
print(f"  MAPE: {arima_mape:.2f}%")

In [None]:
# Compare Prophet vs ARIMA
fig, ax = plt.subplots(figsize=(12, 6))

ax.plot(unemp['ds'], unemp['y'], color=COLORS['blue'], linewidth=1.5, label='Actual')

if HAS_PROPHET:
    ax.plot(test_unemp['ds'], pred_test, color=COLORS['orange'], linewidth=2,
            linestyle='--', label=f'Prophet (RMSE={rmse:.2f})')

ax.plot(post_covid_test['ds'], arima_pred, color=COLORS['green'], linewidth=2,
        linestyle=':', label=f'ARIMA Post-COVID (RMSE={arima_rmse:.2f})')

ax.axvline(x=post_covid_train['ds'].iloc[-1], color='black', linestyle=':', alpha=0.5)  # end of training (same cutoff as Prophet)
ax.axvspan(pd.Timestamp('2020-03-01'), pd.Timestamp('2020-05-01'),
           alpha=0.2, color=COLORS['red'], label='COVID Shock')

ax.set_title('Model Comparison: US Unemployment Forecast', fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Unemployment Rate (%)')
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=4, frameon=False)

plt.tight_layout()
plt.show()

print("\nComparison:")
print("- Prophet is fitted to the full series, with the break encoded as specified changepoints")
print("- ARIMA sidesteps the break by training only on post-COVID data, so the choice of training window matters")
print("- For extreme events, domain knowledge + flexible models are key")

---
# Summary: Model Selection Guide

## Decision Flowchart

```
Time Series Data
    |
    v
[Is data stationary?] --No--> Apply differencing/transformations
    |
    Yes
    v
[Financial returns?] --Yes--> ARIMA-GARCH
    |
    No
    v
[Seasonality present?]
    |         |
   Yes        No
    |         |
    v         v
[Long season?]   ARIMA
(>12 periods)
    |    |
   Yes   No
    |    |
    v    v
Fourier  SARIMA
Terms
    |
    v
[Structural breaks?] --Yes--> Prophet/Piecewise models
    |
    No
    v
[Multiple seasonality?] --Yes--> Prophet/TBATS
    |
    No
    v
Select based on AIC/BIC and out-of-sample performance
```
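The final box, "out-of-sample performance", is typically checked with rolling-origin evaluation: refit on an expanding window, forecast h steps ahead, and pool the errors. A minimal numpy sketch, with hypothetical helper names and a synthetic trending series, scores two simple benchmarks (naive and drift) that any candidate model such as ARIMA or Prophet should beat:

```python
import numpy as np

def naive_forecast(train, h):
    """Repeat the last observed value."""
    return np.repeat(train[-1], h)

def drift_forecast(train, h):
    """Extend the line joining the first and last observations."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return train[-1] + slope * np.arange(1, h + 1)

def rolling_origin_rmse(y, forecaster, initial, h=1):
    """Expanding-window evaluation: fit on y[:origin], score y[origin:origin+h]."""
    errors = []
    for origin in range(initial, len(y) - h + 1):
        pred = forecaster(y[:origin], h)
        errors.extend(y[origin:origin + h] - pred)
    return float(np.sqrt(np.mean(np.square(errors))))

rng = np.random.default_rng(3)
y = 100 + 1.0 * np.arange(120) + rng.normal(0, 1, 120)  # linear trend + noise

scores = {name: rolling_origin_rmse(y, f, initial=60, h=3)
          for name, f in [("naive", naive_forecast), ("drift", drift_forecast)]}
for name, score in scores.items():
    print(f"{name:>5}: RMSE = {score:.3f}")
```

To evaluate a statsmodels or Prophet model the same way, wrap its fit-and-forecast step in a function with the same `(train, h)` signature. Refitting at every origin is slower than a single train/test split, but it averages over many forecast origins instead of relying on one.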

In [None]:
# Model selection guide drawn as a text box; W is the inner width between borders
W = 70

def row(text=""):
    """Left-align text inside the box and add the vertical borders."""
    return "║" + text.ljust(W) + "║"

guide = [
    "╔" + "═" * W + "╗",
    row("TIME SERIES MODEL SELECTION GUIDE".center(W)),
    "╠" + "═" * W + "╣",
    row(),
    row("  DATA TYPE              RECOMMENDED MODEL         SPECIAL CASE"),
    row("  " + "─" * (W - 4)),
    row(),
    row("  Crypto/Financial       ARIMA-GARCH               Extreme volatility"),
    row("  (Bitcoin, stocks)      Focus on volatility       clustering"),
    row(),
    row("  Long seasonality       ARIMA + Fourier           Period > 12"),
    row("  (Sunspots, climate)    terms as regressors       (SARIMA too costly)"),
    row(),
    row("  Structural breaks      Prophet with              Changepoint"),
    row("  (COVID, crises)        changepoints              detection"),
    row(),
    row("  Standard seasonality   SARIMA                    Period ≤ 12"),
    row("  (monthly, quarterly)"),
    row(),
    row("  Multiple seasonality   Prophet, TBATS            Daily + weekly +"),
    row("  (hourly data)                                    annual"),
    row(),
    "╠" + "═" * W + "╣",
    row("WORKFLOW STEPS".center(W)),
    row("  1. Visualize and explore the data"),
    row("  2. Test for stationarity (ADF, KPSS)"),
    row("  3. Apply transformations if needed (log, diff)"),
    row("  4. Identify patterns (ACF/PACF, decomposition)"),
    row("  5. Fit candidate models"),
    row("  6. Check diagnostics (residuals, Ljung-Box)"),
    row("  7. Compare models on out-of-sample forecasts"),
    row("  8. Select the best model for the task"),
    "╚" + "═" * W + "╝",
]
print("\n" + "\n".join(guide))
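Step 2 of the workflow pairs two tests with opposite null hypotheses. ADF assumes a unit root under the null, while KPSS assumes stationarity, so their verdicts have to be read together. The sketch below encodes the usual four-way reading. The helper name `interpret_adf_kpss` is hypothetical. The two mixed cases follow a common rule of thumb and should be confirmed by plotting the series, because low test power is another possible explanation.

```python
def interpret_adf_kpss(adf_pvalue: float, kpss_pvalue: float, alpha: float = 0.05) -> str:
    """Combine ADF (H0: unit root) and KPSS (H0: stationary) p-values."""
    adf_rejects = adf_pvalue < alpha    # evidence against a unit root
    kpss_rejects = kpss_pvalue < alpha  # evidence against stationarity

    if adf_rejects and not kpss_rejects:
        return "stationary"
    if not adf_rejects and kpss_rejects:
        return "non-stationary: difference or transform the series"
    if adf_rejects and kpss_rejects:
        # Tests disagree; rule of thumb only
        return "difference-stationary? consider differencing"
    # Neither test rejects; rule of thumb only
    return "trend-stationary? consider detrending"


print(interpret_adf_kpss(adf_pvalue=0.01, kpss_pvalue=0.10))
```

With statsmodels, the p-values are the second element of the returned tuples, i.e. `adfuller(y)[1]` and `kpss(y)[1]`.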

In [None]:
# Final summary table (WIDTH matches the widest row: 20 + 25 + 20 + 29 characters plus 3 spaces)
WIDTH = 97
print("\nChapter 10 Case Studies Summary")
print("=" * WIDTH)
print(f"{'Dataset':<20} {'Challenge':<25} {'Best Model':<20} {'Key Insight'}")
print("-" * WIDTH)
print(f"{'Bitcoin':<20} {'Extreme volatility':<25} {'ARIMA-GARCH(1,1)':<20} {'Focus on volatility, not mean'}")
print(f"{'Sunspots':<20} {'132-month cycle':<25} {'ARIMA + Fourier':<20} {'Long seasonality handling'}")
print(f"{'US Unemployment':<20} {'COVID structural break':<25} {'Prophet':<20} {'Changepoint detection'}")
print("-" * WIDTH)

print("\n" + "=" * WIDTH)
print("KEY TAKEAWAYS")
print("=" * WIDTH)
print("""
1. VOLATILITY MODELING (Bitcoin)
   - Financial returns show weak mean predictability but strong volatility patterns
   - GARCH captures volatility clustering: large moves (of either sign) tend to follow large moves
   - Always check squared returns for ARCH effects (ACF of squared returns, ARCH-LM test)

2. LONG SEASONALITY (Sunspots)
   - For long seasonal periods (rule of thumb: > 12), SARIMA becomes slow and numerically fragile
   - Fourier terms used as regressors offer a compact alternative
   - Use spectral analysis (periodogram) to identify the dominant period

3. STRUCTURAL BREAKS (Unemployment)
   - Extreme events (COVID, financial crises) call for flexible models
   - Prophet's changepoint detection lets the trend adapt to regime changes
   - For ARIMA, consider fitting on post-break data only or adding intervention dummies

4. GENERAL PRINCIPLES
   - Always start with visualization and stationarity testing
   - Match the model to the data characteristics
   - Validate with out-of-sample forecasting
   - Simple models often forecast as well as, or better than, complex ones
""")
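Takeaway 2 rests on a simple construction. A cycle of length $m$ is represented by $K$ sine/cosine pairs $\sin(2\pi k t/m)$, $\cos(2\pi k t/m)$ for $k = 1,\dots,K$, and these columns enter the model as ordinary regressors. The sketch below assumes a fixed period of 132 months, which is only an approximation because the real sunspot cycle varies in length. It fits the terms by least squares on a synthetic series, not on the notebook's data. The helper `fourier_terms` is hypothetical.

```python
import numpy as np


def fourier_terms(n_obs: int, period: float, K: int, start: int = 0) -> np.ndarray:
    """Return an (n_obs, 2K) matrix of sin/cos pairs for harmonics k = 1..K."""
    t = np.arange(start, start + n_obs)
    cols = []
    for k in range(1, K + 1):
        angle = 2 * np.pi * k * t / period
        cols.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(cols)


# Synthetic monthly series with a 132-month cycle (NOT real sunspot data)
rng = np.random.default_rng(0)
n, period, K = 600, 132.0, 2
y = 80 + 60 * np.sin(2 * np.pi * np.arange(n) / period) + rng.normal(0, 10, n)

# Ordinary least squares on an intercept plus the Fourier columns
X = np.column_stack([np.ones(n), fourier_terms(n, period, K)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Future regressors must continue the time index (start=n) to stay in phase
X_future = np.column_stack([np.ones(24), fourier_terms(24, period, K, start=n)])
forecast = X_future @ beta
print(beta.round(2))
```

In an ARIMA + Fourier model, the Fourier matrix (without the intercept column) would instead be passed as `exog` to statsmodels' `ARIMA`, so the ARMA part models what the cycle leaves behind. Varying `K` trades smoothness against flexibility in the shape of the cycle.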

---

## Practice Exercises

1. **Bitcoin Analysis**: Try fitting EGARCH or GJR-GARCH to capture asymmetric volatility (the leverage effect: negative shocks raise future volatility more than positive shocks of the same size).

2. **Sunspot Forecasting**: Experiment with different numbers of Fourier harmonics (K=1, 2, 4, 6). How does it affect forecast accuracy?

3. **Unemployment Modeling**: Add COVID-19 as a regressor (dummy variable) to an ARIMA model. Compare with Prophet.

4. **Model Diagnostics**: For each case study, examine the residuals using the Ljung-Box test and Q-Q plots.
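For Exercise 1, it helps to see where the asymmetry comes from before fitting anything. GJR-GARCH(1,1) adds one term to the GARCH(1,1) variance equation: $\sigma_t^2 = \omega + (\alpha + \gamma\,\mathbb{1}[\varepsilon_{t-1} < 0])\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2$. The sketch below runs that recursion directly with made-up parameter values, not values estimated from the Bitcoin data. It shows that a negative shock raises next-period variance more than a positive shock of the same size when $\gamma > 0$. The helper `gjr_garch_variance` is hypothetical.

```python
import numpy as np


def gjr_garch_variance(eps, omega, alpha, gamma, beta, sigma2_0):
    """Conditional variance path of a GJR-GARCH(1,1) driven by the shocks eps."""
    sigma2 = np.empty(len(eps) + 1)
    sigma2[0] = sigma2_0
    for t, e in enumerate(eps):
        leverage = gamma if e < 0 else 0.0  # extra weight only for negative shocks
        sigma2[t + 1] = omega + (alpha + leverage) * e**2 + beta * sigma2[t]
    return sigma2


# Illustrative parameters (hypothetical, not estimated from any data)
params = dict(omega=0.05, alpha=0.05, gamma=0.10, beta=0.85, sigma2_0=1.0)

up = gjr_garch_variance(np.array([2.0]), **params)     # positive shock of size 2
down = gjr_garch_variance(np.array([-2.0]), **params)  # negative shock of size 2
print(f"after +2 shock: {up[1]:.2f}, after -2 shock: {down[1]:.2f}")
# after +2 shock: 1.10, after -2 shock: 1.50
```

Setting `gamma=0` recovers the symmetric GARCH(1,1). In the `arch` package, the same asymmetric model is requested with `arch_model(returns, vol="GARCH", p=1, o=1, q=1)`, and EGARCH with `vol="EGARCH"`.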

---

**End of Chapter 10: Comprehensive Review**

*Course: Time Series Analysis and Forecasting*  
*Bucharest University of Economic Studies*