In [None]:
# === Environment Setup ===
!pip install yfinance arch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from arch import arch_model
from IPython.display import display, Markdown

# --- Configuration ---
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({'font.size': 14, 'figure.figsize': (12, 8), 'figure.dpi': 150})
np.set_printoptions(suppress=True, linewidth=120, precision=4)

# --- Utility Functions ---
def note(msg): display(Markdown(f"<div class='alert alert-info'>📝 {msg}</div>"))
def sec(title): print(f'\n{80*"="}\n| {title.upper()} |\n{80*"="}')

note("Environment initialized for Volatility Modeling (ARCH/GARCH).")

# Chapter 8.5: Volatility Modeling with ARCH and GARCH

---

### Table of Contents

1.  [**Introduction: The Stylized Facts of Financial Returns**](#intro)
2.  [**The ARCH Model: Modeling Conditional Heteroskedasticity**](#arch)
3.  [**The GARCH Model: A More Parsimonious Approach**](#garch)
4.  [**Case Study: Modeling S&P 500 Volatility**](#case-study)
5.  [**Exercises**](#exercises)
6.  [**Summary and Key Takeaways**](#summary)

<a id='intro'></a>
## 1. Introduction: The Stylized Facts of Financial Returns

While many economic time series are characterized by their conditional mean (their trend and serial correlation), financial asset returns are primarily characterized by their **conditional variance**, or **volatility**. The stylized facts of daily asset returns include:

- **Near-zero serial correlation:** The return on one day is not very predictive of the return on the next day.
- **Leptokurtosis (Fat Tails):** The distribution of returns has fatter tails than a normal distribution; extreme events are more common than a Gaussian model would suggest.
- **Volatility Clustering:** This is the most important stylized fact for this chapter. As Mandelbrot (1963) observed, "large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes." In other words, volatility is persistent.

Standard time series models like ARMA assume constant variance (homoskedasticity) and cannot capture this volatility clustering. This chapter introduces the **Autoregressive Conditional Heteroskedasticity (ARCH)** and **Generalized ARCH (GARCH)** models, which are designed specifically to model this time-varying volatility.

In [None]:
sec("Visualizing Volatility Clustering in S&P 500 Returns")

note("Attempting to download S&P 500 data from yfinance.")
try:
    sp500 = yf.download('^GSPC', start='2000-01-01', end='2022-12-31', progress=False)
    if sp500.empty:
        raise ValueError("Downloaded data is empty.")
    note("Data downloaded successfully.")
except Exception as e:
    note(f"Could not download data from yfinance ({e}). Falling back to local CSV.")
    sp500 = pd.read_csv('data/SP500.csv', index_col='Date', parse_dates=True)

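# Recent yfinance versions may return 'Close' as a one-column DataFrame; squeeze() reduces it to a Series
sp500_returns = 100 * sp500['Close'].squeeze().pct_change().dropna()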

plt.figure(figsize=(15, 7))
sp500_returns.plot()
plt.title('Daily Returns of S&P 500')
plt.ylabel('Percentage Return')
plt.show()
note("The plot clearly shows volatility clustering. Periods of high volatility (e.g., 2008 financial crisis, 2020 COVID-19 pandemic) are characterized by large price swings, while other periods are much calmer.")


### Visualizing Fat Tails (Leptokurtosis)

Another key stylized fact is that the distribution of financial returns is not normal; it exhibits **leptokurtosis**, meaning it has 'fatter tails'. Extreme positive or negative returns occur more frequently than a normal distribution would predict. We can see this by plotting a histogram of the returns against a normal distribution with the same mean and variance.

In [None]:
import scipy.stats as stats

sec("Visualizing the Fat-Tailed Distribution of Returns")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Histogram vs. Normal Distribution
mu, std = stats.norm.fit(sp500_returns)
ax1.hist(sp500_returns, bins=100, density=True, alpha=0.7, label='S&P 500 Returns')
xmin, xmax = ax1.get_xlim()
x = np.linspace(xmin, xmax, 100)
p = stats.norm.pdf(x, mu, std)
ax1.plot(x, p, 'k', linewidth=2, label='Normal Distribution')
ax1.set_title('Histogram of Returns vs. Normal Distribution')
ax1.legend()

# Quantile-Quantile (Q-Q) Plot
stats.probplot(sp500_returns, dist="norm", plot=ax2)
ax2.set_title('Q-Q Plot of S&P 500 Returns')

plt.show()
note("The histogram shows that the actual returns have a much higher peak and fatter tails than the normal distribution. The Q-Q plot confirms this: the data points deviate significantly from the red line at the tails, which indicates that extreme events are far more common than predicted by a normal distribution.")
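Fat tails can also be summarized with a single number, the sample excess kurtosis, which is 0 for a normal distribution and positive for leptokurtic data. The sketch below is a minimal illustration on simulated data rather than the S&P 500 series; the helper name `excess_kurtosis` and the use of a Student's t with 5 degrees of freedom as a fat-tailed stand-in are assumptions made for the example.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: fourth central moment / variance^2, minus 3."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered**2)   # second central moment (variance)
    m4 = np.mean(centered**4)   # fourth central moment
    return m4 / m2**2 - 3.0

rng = np.random.default_rng(42)
normal_sample = rng.standard_normal(200_000)
# Hypothetical fat-tailed stand-in for daily returns
fat_tailed_sample = rng.standard_t(df=5, size=200_000)

print(f"Normal sample excess kurtosis: {excess_kurtosis(normal_sample):.2f}")
print(f"Student-t(5) excess kurtosis:  {excess_kurtosis(fat_tailed_sample):.2f}")
```

In the notebook, calling `excess_kurtosis(sp500_returns)` would give the corresponding figure for the actual returns.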

<a id='arch'></a>
## 2. The ARCH Model: Modeling Conditional Heteroskedasticity

The **ARCH(q)** model, introduced in a groundbreaking 1982 paper by **Robert Engle** (who later received the Nobel Prize in 2003 for this work), was the first to formalize the idea of time-varying volatility. It models the **conditional variance** of the error term $\epsilon_t$ as a function of past squared error terms.

Let $y_t = \mu + \epsilon_t$. The ARCH(q) model specifies:
$$ \sigma_t^2 = \operatorname{Var}(\epsilon_t \mid \mathcal{F}_{t-1}) = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \dots + \alpha_q \epsilon_{t-q}^2 $$

where $\mathcal{F}_{t-1}$ is the information set at time $t-1$. The conditional variance $\sigma_t^2$ is a constant plus a weighted sum of the past $q$ squared shocks. If a large shock occurred yesterday (a large $\epsilon_{t-1}^2$), the conditional variance for today will be high, capturing the essence of volatility clustering. For the variance to be strictly positive, we require $\alpha_0 > 0$ and $\alpha_i \ge 0$ for $i = 1, \dots, q$.
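To see how this recursion produces volatility clustering, here is a minimal simulation sketch of an ARCH(1) process, assuming Gaussian innovations. The function names and the parameter values ($\alpha_0 = 0.6$, $\alpha_1 = 0.4$) are hypothetical choices for illustration, not estimates from any data.

```python
import numpy as np

def simulate_arch1(alpha0, alpha1, n, seed=0):
    """Simulate eps_t = sigma_t * z_t with sigma_t^2 = alpha0 + alpha1 * eps_{t-1}^2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    eps = np.zeros(n)
    sigma2 = np.zeros(n)
    # Start at the unconditional variance (assumes alpha1 < 1)
    sigma2[0] = alpha0 / (1.0 - alpha1)
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2
        eps[t] = np.sqrt(sigma2[t]) * z[t]
    return eps, sigma2

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))

eps, sigma2 = simulate_arch1(alpha0=0.6, alpha1=0.4, n=20_000)
print(f"Lag-1 autocorrelation of shocks:         {lag1_autocorr(eps):.3f}")
print(f"Lag-1 autocorrelation of squared shocks: {lag1_autocorr(eps**2):.3f}")
```

The shocks themselves should show almost no serial correlation, while the squared shocks should be clearly autocorrelated, which mirrors the first and third stylized facts from the introduction.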

<a id='garch'></a>
## 3. The GARCH Model: A More Parsimonious Approach

In practice, ARCH models often require a large number of lags ($q$) to adequately capture the persistence of volatility. The **Generalized ARCH (GARCH)** model, developed by Tim Bollerslev, provides a more parsimonious and effective solution by including lagged values of the conditional variance itself in the equation.

The standard **GARCH(1,1)** model is:
$$ \sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2 $$

- The **ARCH term ($\alpha_1 \epsilon_{t-1}^2$)**: Represents the news about volatility from the previous period.
- The **GARCH term ($\beta_1 \sigma_{t-1}^2$)**: Represents the persistence of volatility. A large $\beta_1$ means that volatility is slow to die out.

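The sum $\alpha_1 + \beta_1$ measures the persistence of shocks to volatility. The process is covariance stationary when $\alpha_1 + \beta_1 < 1$, in which case the unconditional variance is $\alpha_0 / (1 - \alpha_1 - \beta_1)$; the closer the sum is to 1, the more slowly shocks die out. In practice, the GARCH(1,1) model captures the volatility dynamics of many financial time series well.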
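The recursion and the role of persistence can be sketched in a few lines. The example below assumes hypothetical parameters ($\alpha_0 = 0.02$, $\alpha_1 = 0.10$, $\beta_1 = 0.88$), starts the recursion at the long-run variance $\alpha_0 / (1 - \alpha_1 - \beta_1)$, and feeds in a single artificial shock to show how slowly its effect decays. The helper names are illustrative, not part of the `arch` library.

```python
import numpy as np

def garch11_variance(eps, alpha0, alpha1, beta1):
    """Conditional variance path implied by GARCH(1,1) for a given shock series."""
    eps = np.asarray(eps, dtype=float)
    sigma2 = np.empty(eps.shape[0])
    # Initialize at the long-run variance (an assumption; requires alpha1 + beta1 < 1)
    sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)
    for t in range(1, eps.shape[0]):
        sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return sigma2

def half_life(alpha1, beta1):
    """Periods until the expected effect of a variance shock has halved."""
    return np.log(0.5) / np.log(alpha1 + beta1)

# One artificial shock of size 3 at t = 2, zero shocks elsewhere
eps = np.zeros(10)
eps[2] = 3.0
sigma2 = garch11_variance(eps, alpha0=0.02, alpha1=0.10, beta1=0.88)

print(np.round(sigma2, 4))
print(f"Half-life of a volatility shock: {half_life(0.10, 0.88):.1f} periods")
```

The variance jumps in the period after the shock and then decays by a factor of $\beta_1$ per period here, because the later shocks are set to zero. The half-life uses the expected decay rate $\alpha_1 + \beta_1$.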

<a id='case-study'></a>
## 4. Case Study: Modeling S&P 500 Volatility

In [None]:
sec("Case Study: Modeling S&P 500 Volatility with GARCH")

# We use the returns data from before

# 1. Specify and fit a GARCH(1,1) model
note("Fitting a GARCH(1,1) model to the S&P 500 daily returns.")
# We assume a constant mean and a GARCH(1,1) process for the variance
garch_model = arch_model(sp500_returns, vol='Garch', p=1, q=1)
results = garch_model.fit(disp='off')
print(results.summary())

# 2. Plot the estimated conditional volatility
note("The model's estimate of conditional volatility clearly tracks the high-volatility periods.")
fig = results.plot(annualize='D')
fig.set_size_inches(14, 10)
plt.suptitle('GARCH(1,1) Model of S&P 500 Volatility', fontsize=16)
plt.tight_layout(rect=[0, 0, 1, 0.97])
plt.show()

# 3. Diagnostic Checking
from statsmodels.graphics.tsaplots import plot_acf  # ACF plotting helper

note("A good model should leave behind standardized residuals that are i.i.d. Let's check.")
std_resid = results.std_resid
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
ax1.plot(std_resid)
ax1.set_title('Standardized Residuals')
# ACF of the standardized residuals (checks for leftover serial correlation)
plot_acf(std_resid, lags=40, ax=ax2)
plt.show()
note("If the model is adequate, the standardized residuals should look much more like white noise than the raw returns, and the ACF plot should show no significant autocorrelation. Repeating the ACF on the squared standardized residuals checks more directly whether any volatility clustering remains.")
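A complementary numerical check is the Ljung-Box Q statistic applied to the squared standardized residuals: a large value signals leftover volatility clustering. The sketch below implements the statistic with NumPy and applies it to a simulated GARCH(1,1) series with hypothetical parameters, using the true simulated volatility as a stand-in for the fitted `results.conditional_volatility`. The 5% critical value of 18.307 is for a chi-squared distribution with 10 degrees of freedom and is used here only as a rough benchmark.

```python
import numpy as np

def ljung_box_q(x, lags=10):
    """Ljung-Box Q statistic for autocorrelation up to `lags`."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.shape[0]
    denom = np.dot(x, x)
    q = 0.0
    for k in range(1, lags + 1):
        rho_k = np.dot(x[k:], x[:-k]) / denom  # sample autocorrelation at lag k
        q += rho_k**2 / (n - k)
    return n * (n + 2) * q

# Simulate GARCH(1,1) shocks with hypothetical parameters so the true volatility is known
rng = np.random.default_rng(0)
n, alpha0, alpha1, beta1 = 5000, 0.05, 0.10, 0.85
z = rng.standard_normal(n)
sigma2 = np.empty(n)
eps = np.empty(n)
sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)
eps[0] = np.sqrt(sigma2[0]) * z[0]
for t in range(1, n):
    sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * z[t]

std_resid_sim = eps / np.sqrt(sigma2)  # standardized by the true volatility
CHI2_95_DF10 = 18.307                  # 5% critical value, chi-squared with 10 df

q_raw = ljung_box_q(eps**2, lags=10)
q_std = ljung_box_q(std_resid_sim**2, lags=10)
print(f"Q(10) on squared raw shocks:             {q_raw:.1f}")
print(f"Q(10) on squared standardized residuals: {q_std:.1f}")
print(f"5% critical value:                       {CHI2_95_DF10}")
```

On the fitted model, the same function could be applied to `results.std_resid**2`; statsmodels also provides `acorr_ljungbox`, which reports p-values directly.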

### 4.1 Capturing Asymmetry: The GJR-GARCH Model

A well-known stylized fact is the **leverage effect**: negative returns tend to be followed by higher volatility than positive returns of the same magnitude. The standard GARCH model cannot capture this, as the variance equation depends on *squared* returns, making the effect of positive and negative shocks symmetric.

The **GJR-GARCH** model (named after Glosten, Jagannathan, and Runkle) extends the GARCH model to account for this asymmetry by adding a term for negative shocks:
$$ \sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \gamma_1 I_{t-1} \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2 $$
where $I_{t-1}$ is an indicator variable that equals 1 if $\epsilon_{t-1} < 0$ and 0 otherwise. If $\gamma_1$ is positive and statistically significant, it means that negative shocks (bad news) have a larger impact on volatility than positive shocks (good news), thus capturing the leverage effect.
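The asymmetry is easiest to see by plugging a positive and a negative shock of the same size into the variance equation. The sketch below does this with hypothetical parameter values (not estimates from the S&P 500 fit); the function name is illustrative.

```python
def gjr_next_variance(eps_prev, sigma2_prev, alpha0, alpha1, gamma1, beta1):
    """Next-period conditional variance under GJR-GARCH(1,1)."""
    indicator = 1.0 if eps_prev < 0 else 0.0  # I_{t-1}: 1 only for negative shocks
    return (alpha0
            + (alpha1 + gamma1 * indicator) * eps_prev**2
            + beta1 * sigma2_prev)

# Hypothetical parameters chosen for illustration
params = dict(alpha0=0.02, alpha1=0.03, gamma1=0.12, beta1=0.88)

var_after_gain = gjr_next_variance(+2.0, 1.0, **params)
var_after_loss = gjr_next_variance(-2.0, 1.0, **params)

print(f"Variance after a +2% shock: {var_after_gain:.2f}")
print(f"Variance after a -2% shock: {var_after_loss:.2f}")
print(f"Extra impact of bad news:   {var_after_loss - var_after_gain:.2f}")
```

The gap between the two values equals $\gamma_1 \epsilon_{t-1}^2$, which is the quantity the significance test on $\gamma_1$ is about.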

In [None]:
sec("Fitting a GJR-GARCH Model")

note("Fitting a GJR-GARCH(1,1) model to test for the leverage effect.")
# The 'o=1' argument includes the asymmetric term
gjr_garch_model = arch_model(sp500_returns, vol='Garch', p=1, o=1, q=1)
gjr_results = gjr_garch_model.fit(disp='off')
print(gjr_results.summary())

note("The coefficient on the asymmetric term, `gamma[1]`, is positive and highly statistically significant (p-value is close to zero). This provides strong evidence for the leverage effect in S&P 500 returns.")

<a id='exercises'></a>
## 5. Exercises

1.  **Volatility Persistence:** In the GARCH(1,1) results for the S&P 500, what is the sum of the `alpha[1]` and `beta[1]` coefficients? What does this value tell you about the persistence of volatility shocks in the stock market?
2.  **Interpreting GJR-GARCH:** Look at the GJR-GARCH results. By how much does a negative shock increase the impact on next-period's conditional variance compared to a positive shock of the same size? (Hint: Compare the `alpha[1]` and `gamma[1]` coefficients).
3.  **Alternative Distributions:** The `arch` library allows you to assume different distributions for the error term $\epsilon_t$, such as Student's t-distribution, to better capture fat tails. Refit the GJR-GARCH model assuming a Student's t-distribution (add `dist='t'` to the `arch_model` call). Does this improve the model's log-likelihood?
4.  **Forecasting Volatility:** Explain how you would use a fitted GARCH(1,1) model to produce a one-step-ahead forecast for the conditional variance, $\hat{\sigma}_{t+1}^2$.

<a id='summary'></a>
## 6. Summary and Key Takeaways

This chapter introduced ARCH and GARCH models, the standard tools for modeling and forecasting time-varying volatility, a key feature of financial time series.

**Key Concepts**:
- **Volatility Clustering**: The empirical observation that periods of high volatility and low volatility tend to be clustered together. This is the primary stylized fact that GARCH models are designed to capture.
- **Conditional Heteroskedasticity**: The variance of the series is not constant, but changes over time depending on past information.
- **ARCH(q) Model**: Models conditional variance as a weighted average of past squared error terms.
- **GARCH(p,q) Model**: A more parsimonious and effective model that includes lagged conditional variances in its equation, allowing volatility shocks to be more persistent. The GARCH(1,1) is the most widely used variant.

### Solutions to Exercises

---

**1. Volatility Persistence:**
The sum is typically very close to 1 (usually > 0.95). For example, if $\alpha_1 + \beta_1 = 0.99$, this indicates that shocks to volatility are highly persistent and die out very slowly. A large shock today will still have a significant impact on the conditional variance many periods into the future.

---

**2. Interpreting GJR-GARCH:**
The standard GARCH(1,1) model cannot capture the leverage effect. Its variance equation depends on the *squared* error term, $\epsilon_{t-1}^2$, so the sign of the past return does not matter; a large positive return has the exact same impact on next-period's volatility as a large negative return of the same magnitude. The **GJR-GARCH** model (named after Glosten, Jagannathan, and Runkle) adds an interaction term to the variance equation:
$$ \sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \gamma_1 I_{t-1} \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2 $$
where $I_{t-1}$ is an indicator variable that equals 1 if $\epsilon_{t-1} < 0$ and 0 otherwise. A positive shock therefore enters next period's conditional variance with coefficient $\alpha_1$, while a negative shock of the same size enters with coefficient $\alpha_1 + \gamma_1$. The additional impact of a negative shock is $\gamma_1 \epsilon_{t-1}^2$; read the `alpha[1]` and `gamma[1]` estimates from the GJR-GARCH summary to put a number on it. If $\gamma_1$ is positive and statistically significant, it means that negative shocks (bad news) have a larger impact on volatility than positive shocks (good news), thus capturing the leverage effect.

---

**3. Alternative Distributions:**
Refit the model with `arch_model(sp500_returns, vol='Garch', p=1, o=1, q=1, dist='t')` and compare the log-likelihood reported in the two summaries. Because the normal distribution is the limiting case of the Student's t-distribution as the degrees of freedom grow, the maximized log-likelihood under `dist='t'` should not be lower, and with fat-tailed returns it is typically higher. Since the t-distribution adds a degrees-of-freedom parameter, also compare the AIC or BIC, which penalize the extra parameter.

---

**4. Forecasting Volatility:**
To make a one-step-ahead forecast, you use the fitted GARCH(1,1) equation. You need two pieces of information from time $t$: the most recent squared residual, $\epsilon_t^2$, and the model's estimate of the conditional variance at time $t$, $\sigma_t^2$. The forecast for the next period's variance is then simply:
$$ \hat{\sigma}_{t+1}^2 = \hat{\alpha}_0 + \hat{\alpha}_1 \epsilon_t^2 + \hat{\beta}_1 \sigma_t^2 $$
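The one-step formula can be sketched numerically as follows. The parameter values and inputs are hypothetical, and the function name is illustrative. The multi-step part goes slightly beyond the exercise: it uses the standard mean-reversion result that, for $\alpha_1 + \beta_1 < 1$, the $h$-step forecast equals the long-run variance plus $(\alpha_1 + \beta_1)^{h-1}$ times the one-step deviation from it.

```python
import numpy as np

def garch11_forecast(eps_t, sigma2_t, alpha0, alpha1, beta1, horizon=1):
    """Variance forecasts for steps 1..horizon from a GARCH(1,1) model."""
    persistence = alpha1 + beta1
    long_run = alpha0 / (1.0 - persistence)  # requires persistence < 1
    # One-step-ahead forecast from the variance equation
    one_step = alpha0 + alpha1 * eps_t**2 + beta1 * sigma2_t
    # Later steps revert geometrically toward the long-run variance
    steps = np.arange(horizon)
    return long_run + persistence**steps * (one_step - long_run)

# Hypothetical fitted values: last residual of -2%, last conditional variance of 1.5
forecasts = garch11_forecast(eps_t=-2.0, sigma2_t=1.5,
                             alpha0=0.02, alpha1=0.10, beta1=0.88, horizon=500)

print(f"1-step forecast:   {forecasts[0]:.4f}")
print(f"2-step forecast:   {forecasts[1]:.4f}")
print(f"500-step forecast: {forecasts[-1]:.4f}")
```

With a fitted model from the `arch` library, `results.forecast(horizon=...)` produces the corresponding forecasts from the estimated parameters.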