# Week 10: Time Series - ARIMA, VAR, GARCH

## 🎯 Learning Objectives

By the end of this week, you will understand:
- **ARIMA**: Autoregressive Integrated Moving Average
- **VAR**: Vector Autoregression for multiple series
- **GARCH**: Volatility modeling
- **Stationarity**: Why it matters and how to test

---

## Why Time Series in Finance?

Financial data is inherently temporal:
- Past prices influence future prices
- Volatility clusters (high vol follows high vol)
- Serial correlation in returns
- Cross-asset dependencies

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from arch import arch_model
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)
print("✅ Libraries loaded!")
print("📚 Week 10: Time Series Theory")

---

## Part 1: Stationarity

### Definition

A time series is (weakly) **stationary** if its first two moments don't change over time:
- Constant mean: $E[Y_t] = \mu$
- Constant variance: $Var(Y_t) = \sigma^2$
- Autocovariance depends only on lag: $Cov(Y_t, Y_{t-k}) = \gamma_k$
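
One way to see these conditions fail is to compare white noise with a random walk across many simulated paths. The sketch below is illustrative only: the path count, path length, and variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 500

# White noise: i.i.d. shocks, so the mean and variance are the same at every t
noise = rng.normal(0.0, 1.0, size=(n_paths, n_steps))

# Random walk: cumulative sum of the same shocks, so Var(Y_t) grows like t * sigma^2
walk = noise.cumsum(axis=1)

# Compare the cross-sectional variance early and late in the sample
for t in (49, 499):
    print(f"t={t + 1:>3}: var(noise)={noise[:, t].var():.2f}, "
          f"var(walk)={walk[:, t].var():.1f}")
```

The white-noise variance should stay near 1 at both dates, while the random-walk variance grows roughly in proportion to `t`, which violates the constant-variance condition.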

### Why It Matters

Most time series models assume stationarity. Non-stationary data leads to:
- Spurious correlations
- Unreliable forecasts
- Invalid statistical tests
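
The spurious-correlation problem can be sketched with pairs of random walks that are independent by construction. This is a minimal illustration, not part of the course code; `mean_abs_corr` is a hypothetical helper and the sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs, n_steps = 500, 500

def mean_abs_corr(x, y):
    """Average |correlation| across rows of two (n_pairs, n_steps) arrays."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    corr = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))
    return np.abs(corr).mean()

# Two sets of shocks with no relationship between them
shocks_a = rng.normal(size=(n_pairs, n_steps))
shocks_b = rng.normal(size=(n_pairs, n_steps))

# Levels (random walks) versus differences (the shocks themselves)
corr_levels = mean_abs_corr(shocks_a.cumsum(axis=1), shocks_b.cumsum(axis=1))
corr_diffs = mean_abs_corr(shocks_a, shocks_b)

print(f"Mean |corr| of independent random walks: {corr_levels:.2f}")
print(f"Mean |corr| of their differences:        {corr_diffs:.2f}")
```

The levels typically show sizeable correlations even though the series are unrelated, while the differenced series do not. This is why returns are usually modeled rather than prices.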

### Testing: ADF Test

Null hypothesis: Series has a unit root (non-stationary)

If p-value < 0.05 → Reject null → Stationary

In [None]:
# Generate price series and returns
n = 1000
returns = np.random.normal(0.0002, 0.01, n)
prices = 100 * np.cumprod(1 + returns)

# ADF Test
def adf_test(series, name):
    result = adfuller(series.dropna())
    print(f"{name}:")
    print(f"  ADF Statistic: {result[0]:.4f}")
    print(f"  p-value: {result[1]:.4f}")
    print(f"  Stationary: {'Yes' if result[1] < 0.05 else 'No'}")

print("Stationarity Tests")
print("="*50)
adf_test(pd.Series(prices), "Prices")
print()
adf_test(pd.Series(returns), "Returns")

# Visualize
fig, axes = plt.subplots(2, 1, figsize=(12, 6))
axes[0].plot(prices)
axes[0].set_title('Prices (Non-Stationary)')
axes[1].plot(returns)
axes[1].set_title('Returns (Stationary)')
plt.tight_layout()
plt.show()

---

## Part 2: ARIMA Models

### Components

**ARIMA(p, d, q)**:
- **AR(p)**: Autoregressive - depends on past p values
- **I(d)**: Integrated - differencing to achieve stationarity
- **MA(q)**: Moving Average - depends on past q errors

### The Math

$$Y_t = c + \sum_{i=1}^{p}\phi_i Y_{t-i} + \sum_{j=1}^{q}\theta_j \epsilon_{t-j} + \epsilon_t$$

Here $Y_t$ is the series after differencing $d$ times, and $\epsilon_t$ is white noise.

### 🤔 Simple Explanation

- **AR**: "Today's return depends on yesterday's return"
- **MA**: "Today's return depends on yesterday's unexpected shock"
- **I**: "Take differences until stationary"
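
The three components can be sketched with plain NumPy. The coefficients `phi` and `theta` below are assumed values for illustration, and `lag1_autocorr` is a hypothetical helper rather than a library function.

```python
import numpy as np

rng = np.random.default_rng(2)
n, phi, theta = 20_000, 0.6, 0.5   # assumed AR and MA coefficients
eps = rng.normal(size=n)

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return (x[1:] * x[:-1]).sum() / (x * x).sum()

# AR(1): today's value depends on yesterday's value
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + eps[t]

# MA(1): today's value depends on yesterday's shock
ma = eps[1:] + theta * eps[:-1]

# I(1): cumulating the AR series makes it non-stationary; one difference undoes that
integrated = ar.cumsum()
recovered = np.diff(integrated)

print(f"AR(1) lag-1 autocorrelation: {lag1_autocorr(ar):.3f} (theory: {phi})")
print(f"MA(1) lag-1 autocorrelation: {lag1_autocorr(ma):.3f} "
      f"(theory: {theta / (1 + theta**2):.3f})")
print(f"Differencing recovers the AR series: {np.allclose(recovered, ar[1:])}")
```

For an AR(1) the lag-1 autocorrelation equals `phi`, and for an MA(1) it equals `theta / (1 + theta**2)`, so the sample values should land close to those numbers.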

In [None]:
# ACF and PACF to determine order
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(returns, lags=20, ax=axes[0])
axes[0].set_title('ACF')
plot_pacf(returns, lags=20, ax=axes[1])
axes[1].set_title('PACF')
plt.tight_layout()
plt.show()

print("Interpretation:")
print("- ACF tails off gradually, PACF cuts off at lag k → AR(k)")
print("- ACF cuts off at lag k, PACF tails off gradually → MA(k)")

In [None]:
# Fit ARIMA model
# For returns, d=0 (already stationary)
model = ARIMA(returns, order=(1, 0, 1))
fit = model.fit()

print("ARIMA(1,0,1) Results")
print("="*50)
print(fit.summary().tables[1])

# Forecast (a NumPy array here, because the model was fit on a NumPy array)
forecast = fit.forecast(steps=5)
print(f"\n5-step forecast: {np.asarray(forecast)}")

---

## Part 3: GARCH - Volatility Modeling

### The Problem

Returns may be uncorrelated, but squared returns are correlated → Volatility clusters!
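
A minimal sketch of this pattern, using a made-up two-regime volatility schedule rather than a GARCH process. The regime lengths and volatility levels are assumptions for illustration, and `lag1_autocorr` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(3)
n, block = 20_000, 250

# Hypothetical regimes: calm (0.5% daily vol) and turbulent (2% daily vol),
# alternating every `block` days
regime = (np.arange(n) // block) % 2
vol = np.where(regime == 0, 0.005, 0.02)
r = vol * rng.standard_normal(n)

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return (x[1:] * x[:-1]).sum() / (x * x).sum()

print(f"lag-1 autocorr of returns:         {lag1_autocorr(r):+.3f}")
print(f"lag-1 autocorr of squared returns: {lag1_autocorr(r**2):+.3f}")
```

The sign of each return is unpredictable, so raw returns show almost no autocorrelation. Their magnitude is persistent, though, and that shows up as clearly positive autocorrelation in the squared returns.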

### GARCH(p, q) Model

$$\sigma_t^2 = \omega + \sum_{i=1}^{p}\alpha_i \epsilon_{t-i}^2 + \sum_{j=1}^{q}\beta_j \sigma_{t-j}^2$$

Where:
- $\omega$: Constant term; it sets the long-run variance $\omega / (1 - \sum_i \alpha_i - \sum_j \beta_j)$
- $\alpha_i$: ARCH terms (past shocks)
- $\beta_j$: GARCH terms (past variance)
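
For the GARCH(1,1) case, the long-run (unconditional) variance follows from taking expectations of both sides of the variance equation. Assume the process is covariance-stationary, so that $E[\epsilon_t^2] = E[\sigma_t^2] = \bar{\sigma}^2$ for every $t$ (the symbol $\bar{\sigma}^2$ is introduced here for illustration):

$$\bar{\sigma}^2 = \omega + \alpha_1 \bar{\sigma}^2 + \beta_1 \bar{\sigma}^2 \quad\Longrightarrow\quad \bar{\sigma}^2 = \frac{\omega}{1 - \alpha_1 - \beta_1}, \qquad \alpha_1 + \beta_1 < 1$$

As a worked example, with $\omega = 10^{-5}$, $\alpha_1 = 0.1$, and $\beta_1 = 0.85$:

$$\bar{\sigma}^2 = \frac{10^{-5}}{1 - 0.1 - 0.85} = 2 \times 10^{-4}, \qquad \bar{\sigma} = \sqrt{2 \times 10^{-4}} \approx 1.41\% \text{ per period}$$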

### 🤔 Simple Explanation

GARCH says: "Tomorrow's volatility depends on today's shock AND today's volatility." This captures the "volatility clustering" we see in markets.
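
The recursion can be traced by hand for a single shock. The sketch below assumes GARCH(1,1) parameters and a hypothetical 3% shock; it uses the standard result that multi-step variance forecasts decay geometrically, at rate $\alpha + \beta$, towards the long-run variance.

```python
import numpy as np

omega, alpha, beta = 1e-5, 0.10, 0.85      # assumed GARCH(1,1) parameters
long_run_var = omega / (1 - alpha - beta)  # unconditional variance

# Hypothetical state today: a 3% shock while variance sat at its long-run level
shock_today = 0.03
var_today = long_run_var

# One-step-ahead variance from the GARCH(1,1) recursion
var_next = omega + alpha * shock_today**2 + beta * var_today

# h-step-ahead forecasts revert geometrically towards the long-run variance
horizons = np.arange(1, 61)
var_forecast = long_run_var + (alpha + beta) ** (horizons - 1) * (var_next - long_run_var)

for h in (1, 5, 20, 60):
    print(f"h={h:>2}: forecast vol = {np.sqrt(var_forecast[h - 1]):.4f}")
print(f"long-run vol = {np.sqrt(long_run_var):.4f}")
```

With persistence of 0.95, the extra variance from the shock fades slowly: after 60 steps only about 5% of it remains.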

In [None]:
# Simulate GARCH-like returns
n = 2000
omega = 0.00001
alpha = 0.1
beta = 0.85

sigma2 = np.zeros(n)
returns_garch = np.zeros(n)
sigma2[0] = omega / (1 - alpha - beta)  # Unconditional variance

for t in range(1, n):
    sigma2[t] = omega + alpha * returns_garch[t-1]**2 + beta * sigma2[t-1]
    returns_garch[t] = np.sqrt(sigma2[t]) * np.random.randn()

# Fit GARCH model
garch_model = arch_model(returns_garch * 100, vol='Garch', p=1, q=1)
garch_fit = garch_model.fit(disp='off')

print("GARCH(1,1) Results")
print("="*50)
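# The model is fit on returns * 100, so omega is scaled by 100**2
print(f"omega: {garch_fit.params['omega']:.6f} (true, rescaled: {omega * 100**2:.6f})")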
print(f"alpha: {garch_fit.params['alpha[1]']:.4f} (true: {alpha})")
print(f"beta:  {garch_fit.params['beta[1]']:.4f} (true: {beta})")
print(f"\nPersistence (α+β): {garch_fit.params['alpha[1]'] + garch_fit.params['beta[1]']:.4f}")

In [None]:
# Visualize volatility clustering
fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)

axes[0].plot(returns_garch)
axes[0].set_ylabel('Returns')
axes[0].set_title('Returns with Volatility Clustering')

axes[1].plot(np.sqrt(sigma2), label='True Vol')
axes[1].plot(garch_fit.conditional_volatility / 100, alpha=0.7, label='Fitted Vol')
axes[1].set_ylabel('Volatility')
axes[1].set_title('Conditional Volatility')
axes[1].legend()

axes[2].plot(returns_garch**2, alpha=0.5)
axes[2].set_ylabel('Squared Returns')
axes[2].set_xlabel('Time')
axes[2].set_title('Squared Returns (Volatility Proxy)')

plt.tight_layout()
plt.show()

---

## Part 4: VAR - Vector Autoregression

### Multi-Asset Dynamics

VAR models multiple time series together, capturing cross-dependencies:

$$Y_t = c + \sum_{i=1}^{p}A_i Y_{t-i} + \epsilon_t$$

Where $Y_t$ is a vector of variables.
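
For two variables and one lag, each equation is a linear regression on yesterday's values of both series, so the coefficient matrix can be recovered by ordinary least squares. The sketch below is a NumPy-only illustration under assumed coefficients; the matrix `A`, the shock scales, and the sample size are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

# Assumed VAR(1) coefficient matrix for Y_t = (stock, bond)':
# row i holds the loadings of variable i on yesterday's (stock, bond)
A = np.array([[0.05, 0.00],
              [-0.20, 0.10]])
shock_scale = np.array([0.01, 0.005])

# Simulate Y_t = A @ Y_{t-1} + eps_t
Y = np.zeros((n, 2))
for t in range(1, n):
    Y[t] = A @ Y[t - 1] + shock_scale * rng.standard_normal(2)

# Equation-by-equation OLS: regress Y_t on a constant and Y_{t-1}
X = np.column_stack([np.ones(n - 1), Y[:-1]])
coef, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
A_hat = coef[1:].T   # drop the intercept row, transpose back to A's layout

print("True A:\n", A)
print("Estimated A:\n", A_hat.round(3))
```

The off-diagonal entry `A[1, 0]` is the cross-dependency: it measures how yesterday's stock return feeds into today's bond return.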

### Finance Applications

- Lead-lag relationships between assets
- Spillover effects
- Granger causality testing

In [None]:
from statsmodels.tsa.api import VAR

# Simulate related assets
n = 500
stock_returns = np.zeros(n)
bond_returns = np.zeros(n)

# Stock leads bond by 1 day
for t in range(1, n):
    stock_returns[t] = 0.05 * stock_returns[t-1] + np.random.randn() * 0.01
    bond_returns[t] = -0.2 * stock_returns[t-1] + 0.1 * bond_returns[t-1] + np.random.randn() * 0.005

# Fit VAR
data = pd.DataFrame({'Stock': stock_returns, 'Bond': bond_returns})
var_model = VAR(data)
var_fit = var_model.fit(maxlags=2)

print("VAR Results")
print("="*50)
print(var_fit.summary())  # Coefficient tables for the Stock and Bond equations

In [None]:
# Granger causality test: does the second column (Stock) help predict the first (Bond)?
from statsmodels.tsa.stattools import grangercausalitytests

print("\nGranger Causality: Stock → Bond")
print("="*50)
result = grangercausalitytests(data[['Bond', 'Stock']], maxlag=2, verbose=False)
for lag, test in result.items():
    p_val = test[0]['ssr_ftest'][1]
    print(f"Lag {lag}: p-value = {p_val:.4f} {'*' if p_val < 0.05 else ''}")

print("\n* indicates Stock Granger-causes Bond at 5% level")

---

## Interview Questions

### Conceptual
1. Why must time series be stationary for ARIMA?
2. What does "volatility clustering" mean?
3. How do you interpret Granger causality?

### Technical
1. How do you determine the order (p, d, q) for ARIMA?
2. What's the difference between ARCH and GARCH?
3. When would you use VAR vs. separate univariate models?

### Finance-Specific
1. Can you use ARIMA to predict stock prices? Why or why not?
2. How would you use GARCH for VaR estimation?
3. What lead-lag relationships might exist in markets?

---

## Key Takeaways

| Model | Purpose | Input | Output |
|-------|---------|-------|--------|
| ARIMA | Return forecasting | Univariate | Point forecast |
| GARCH | Volatility modeling | Returns | Conditional vol |
| VAR | Multi-asset dynamics | Multiple series | Vector forecast |