# ARCH Models

## What is ARCH?

ARCH, which stands for Autoregressive Conditional Heteroskedasticity, is a model for the variance of a time series conditional on past errors. It was introduced by Robert Engle in 1982, work for which he was awarded the 2003 Nobel Memorial Prize in Economic Sciences. ARCH models are used particularly in financial time series analysis, where the goal is to model and predict future volatility based on past patterns of volatility. An ARCH(q) model expresses the variance at time t as a function of the squared errors from the previous q periods:

$$
\sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \alpha_2 \epsilon_{t-2}^2 + \dots + \alpha_q \epsilon_{t-q}^2
$$

Where $ \sigma_t^2 $ is the conditional variance (a proxy for volatility), $ \epsilon_{t-i} $ are the residuals from a mean equation, $ \alpha_0 $ is a constant term, and $ \alpha_1, \dots, \alpha_q $ are the weights on the squared residuals. These models are particularly useful when large and small errors tend to cluster, meaning that high or low volatility in one period is typically followed by a similar level of volatility in subsequent periods.
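
As a minimal sketch of the variance equation above, the following Python function evaluates $ \sigma_t^2 $ for an ARCH(q) model from a sequence of residuals. The function name `arch_conditional_variance`, the residual values, and the coefficients are illustrative assumptions, not estimates from real data.

```python
import numpy as np

def arch_conditional_variance(eps, alpha0, alphas):
    """Return sigma_t^2 for t = q, ..., len(eps) - 1 under an ARCH(q) model."""
    eps = np.asarray(eps, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    q = len(alphas)
    sq = eps ** 2
    sigma2 = np.empty(len(eps) - q)
    for i, t in enumerate(range(q, len(eps))):
        # sq[t - q:t][::-1] is (eps_{t-1}^2, ..., eps_{t-q}^2)
        sigma2[i] = alpha0 + alphas @ sq[t - q:t][::-1]
    return sigma2

# Hypothetical residuals containing one large shock (-2.0)
residuals = [0.1, -2.0, 0.3, 0.2]
sigma2 = arch_conditional_variance(residuals, alpha0=0.2, alphas=[0.5, 0.2])
print(sigma2)
```

The first value is computed right after the large shock and is therefore much larger than the constant term alone. The second value is still elevated because the shock remains inside the two-lag window.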

Unlike models that predict the actual values in a series, ARCH focuses on the variance of the series. Its forecasts identify periods of expected volatility, which is crucial for risk management and option pricing in financial markets.

## What is Volatility?

- Autoregressive models can be developed for univariate time-series data that is stationary (AR), has a trend (ARIMA), or has a seasonal component (SARIMA). However, these models do not capture changes in the variance over time.

- They assume that the error terms of the stochastic process generating the time series are homoscedastic, i.e. have constant variance.

- In some time series the variance itself changes over time. In a financial time series, this appears as **increasing and decreasing volatility**.

- High volatility indicates that the price of the security can change dramatically over a short time period in either direction, which signifies a higher risk. Conversely, low volatility implies that a security's value does not fluctuate dramatically and tends to be more stable.
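
To illustrate changing volatility, here is a small sketch that measures it as a rolling standard deviation. The series is synthetic: a calm regime followed by a turbulent one. The regime lengths, standard deviations, and the 20-observation window are assumptions chosen only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

# Synthetic "returns": 250 calm observations followed by 250 turbulent ones
calm = rng.normal(0.0, 0.5, size=250)
turbulent = rng.normal(0.0, 2.0, size=250)
returns = np.concatenate([calm, turbulent])

# Rolling sample standard deviation over a 20-observation window
window = 20
rolling_vol = sliding_window_view(returns, window).std(axis=1, ddof=1)

# Average rolling volatility over windows lying entirely inside each regime
calm_vol = rolling_vol[:250 - window + 1].mean()
turbulent_vol = rolling_vol[250:].mean()
print(f"calm: {calm_vol:.2f}, turbulent: {turbulent_vol:.2f}")
```

A model with constant error variance would describe both halves of this series with a single variance, which is the limitation that ARCH addresses.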

## What is Heteroskedasticity?


Heteroskedasticity occurs when the variance of the errors (the spread of the residuals) is not constant across all levels of an independent variable. In simple terms, the typical size of the error differs across values of that variable.

In the context of time series, heteroskedasticity often manifests as **volatility clustering**, where periods of high volatility tend to be followed by high volatility and periods of low volatility tend to be followed by low volatility.
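
One informal way to see volatility clustering is to look at the autocorrelation of the squared series. The sketch below compares a constant-variance series with a simulated ARCH(1) series. The helper `lag1_autocorr`, the seed, and the parameter values are assumptions made for this illustration.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D array."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d ** 2))

rng = np.random.default_rng(7)
T = 5000
shocks = rng.standard_normal(T)

# Homoskedastic benchmark: constant variance, so no clustering is expected
iid_series = shocks.copy()

# ARCH(1) series: the variance depends on the previous squared value
alpha0, alpha1 = 0.2, 0.5  # illustrative values
arch_series = np.zeros(T)
for t in range(1, T):
    arch_series[t] = np.sqrt(alpha0 + alpha1 * arch_series[t - 1] ** 2) * shocks[t]

iid_acf = lag1_autocorr(iid_series ** 2)
arch_acf = lag1_autocorr(arch_series ** 2)
print(f"lag-1 autocorrelation of squares, iid:  {iid_acf:.3f}")
print(f"lag-1 autocorrelation of squares, ARCH: {arch_acf:.3f}")
```

The squared ARCH(1) series should show clearly positive autocorrelation, while the squared constant-variance series should stay close to zero.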

## Mathematical Foundation of ARCH

The mathematical representation of an ARCH model is fairly straightforward. At its core, the ARCH model posits that the current period's error variance can be expressed as a function of past period squared errors. 

Let's take the ARCH(1) model as an example:

$$
\sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2
$$

Where:

- $ \sigma_t^2 $ is the current period's variance
- $ \alpha_0 $ is a constant (which must be positive to ensure positive variance)
- $ \alpha_1 $ is the coefficient that measures the impact of the last period's squared error on current variance (it must be non-negative, and less than 1 for the long-run variance to be finite)
- $ \epsilon_{t-1}^2 $ is the previous period's squared error

This model illustrates how past volatility influences current expectations. If, for instance, $ \alpha_1 $ is high, then a large error in the last period will significantly increase the variance forecast for the current period, indicating high volatility.
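
To make the role of $ \alpha_1 $ concrete, here is a short worked example with values $ \alpha_0 = 0.2 $ and $ \alpha_1 = 0.5 $, assumed purely for illustration. A small shock $ \epsilon_{t-1} = 0.1 $ and a large shock $ \epsilon_{t-1} = 3 $ give, respectively:

$$
\sigma_t^2 = 0.2 + 0.5 \, (0.1)^2 = 0.205, \qquad \sigma_t^2 = 0.2 + 0.5 \, (3)^2 = 4.7
$$

Taking expectations of both sides of the ARCH(1) equation, and assuming $ 0 \le \alpha_1 < 1 $ so that the unconditional variance $ \bar{\sigma}^2 = E[\epsilon_t^2] $ is finite and constant over time:

$$
\bar{\sigma}^2 = \alpha_0 + \alpha_1 \bar{\sigma}^2 \quad \Longrightarrow \quad \bar{\sigma}^2 = \frac{\alpha_0}{1 - \alpha_1} = \frac{0.2}{1 - 0.5} = 0.4
$$

The large shock therefore pushes the variance forecast far above its long-run level of 0.4, while the small shock leaves it below that level.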

## Code Implementation

```python
import numpy as np
import matplotlib.pyplot as plt
from arch import arch_model
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Set the random seed for reproducibility
np.random.seed(42)

# Define parameters for the ARCH process
omega = 0.2  # Baseline volatility
alpha = 0.5  # Reaction to past squared shocks

# Simulate some data for the example
T = 1000
e = np.random.randn(T)  # Random shocks
y = np.zeros_like(e)    # Initialize the series

# Generate an ARCH(1) process
for t in range(1, T):
    y[t] = np.sqrt(omega + alpha * y[t-1]**2) * e[t]

# Define the ARCH(1) model
arch1_model = arch_model(y, mean='Zero', vol='ARCH', p=1)

# Fit the ARCH model
arch1_results = arch1_model.fit(disp='off')

# Print the summary of the ARCH model fit
print("ARCH Model")
print(arch1_results.summary())

# Fit an ARMA model
arma_model = ARIMA(y, order=(1, 0, 1))
arma_results = arma_model.fit()

# Fit an ARIMA model
arima_model = ARIMA(y, order=(1, 1, 1))
arima_results = arima_model.fit()

# Fit a SARIMAX model
sarimax_model = SARIMAX(y, order=(1, 0, 1), seasonal_order=(1, 0, 1, 12))
sarimax_results = sarimax_model.fit(disp=False)

# Print the summary of the ARMA, ARIMA and SARIMAX model fits
print("\nARMA Model")
print(arma_results.summary())
print("\nARIMA Model")
print(arima_results.summary())
print("\nSARIMAX Model")
print(sarimax_results.summary())

# Compare Log Likelihood values
print("\nComparison of Log Likelihood:")
print(f"ARCH Log Likelihood: {arch1_results.loglikelihood}")
print(f"ARMA Log Likelihood: {arma_results.llf}")
print(f"ARIMA Log Likelihood: {arima_results.llf}")
print(f"SARIMAX Log Likelihood: {sarimax_results.llf}")
```

```text
ARCH Model
                        Zero Mean - ARCH Model Results                        
Dep. Variable:                      y   R-squared:                       0.000
Mean Model:                 Zero Mean   Adj. R-squared:                  0.001
Vol Model:                       ARCH   Log-Likelihood:               -822.840
Distribution:                  Normal   AIC:                           1649.68
Method:            Maximum Likelihood   BIC:                           1659.50
                                        No. Observations:                 1000
Date:                Mon, Apr 15 2024   Df Residuals:                     1000
Time:                        15:29:48   Df Model:                            0
                            Volatility Model                            
                 coef    std err          t      P>|t|  95.0% Conf. Int.
------------------------------------------------------------------------
omega          0.1974  1.430e-02     13.806  2.340e-43 [  0
```

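As the log-likelihood values and p-values show, the ARCH model fits the data well and produces better results than the other models. Note that the ARIMA(1, 1, 1) log-likelihood is computed on the differenced series, so it is not directly comparable with the others.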

## Practical Considerations

In applying ARCH models to real-world data, there are several considerations:

- **Stationarity**: ARCH models should be applied to stationary series. Non-stationary data, such as price levels, can often be made stationary by differencing, taking logarithms, or both (for example, log returns).
- **Model Order**: The order (q) of an ARCH model should be determined from the data, for example by inspecting the autocorrelation and partial autocorrelation of the squared residuals.
- **Mean Equation**: Often, ARCH models are used in tandem with an ARMA model for the mean equation. This combined approach is known as ARMA-ARCH, or ARMA-GARCH when the variance equation is generalized to GARCH.
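
As a rough sketch of how the squared residuals can be checked for ARCH effects at a given order, the block below computes Engle's LM statistic by hand. It regresses the squared residuals on q of their own lags and returns n times the R-squared, which is asymptotically chi-square with q degrees of freedom under the null of no ARCH effects. The function name `arch_lm_statistic`, the seed, and the simulated series are assumptions for illustration. A library routine would normally be preferred in practice.

```python
import numpy as np

def arch_lm_statistic(resid, q):
    """Engle's LM statistic: n * R^2 from regressing e_t^2 on q of its own lags."""
    sq = np.asarray(resid, dtype=float) ** 2
    n = len(sq) - q
    y = sq[q:]
    # Column k holds e_{t-k}^2, aligned with y = e_t^2
    lags = [sq[q - k:len(sq) - k] for k in range(1, q + 1)]
    X = np.column_stack([np.ones(n)] + lags)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_ss = np.sum((y - X @ beta) ** 2)
    total_ss = np.sum((y - y.mean()) ** 2)
    return n * (1.0 - resid_ss / total_ss)

rng = np.random.default_rng(1)
T = 2000
shocks = rng.standard_normal(T)

# Simulated ARCH(1) series with illustrative parameters 0.2 and 0.5
series = np.zeros(T)
for t in range(1, T):
    series[t] = np.sqrt(0.2 + 0.5 * series[t - 1] ** 2) * shocks[t]

CHI2_1DF_5PCT = 3.841  # 5% critical value, chi-square with 1 degree of freedom
lm_arch = arch_lm_statistic(series, q=1)
lm_iid = arch_lm_statistic(shocks, q=1)
print(f"LM statistic, ARCH(1) series: {lm_arch:.1f}")
print(f"LM statistic, iid shocks:     {lm_iid:.2f}")
print(f"Reject 'no ARCH effects' for the ARCH(1) series: {lm_arch > CHI2_1DF_5PCT}")
```

Repeating the test for several values of q, and comparing against the matching chi-square critical value, gives one data-driven way to choose the order.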

Real-world series to which ARCH models have been applied include stock prices, oil prices, bond prices, inflation rates, GDP, and unemployment rates.


## Conclusion

ARCH models are essential in quantifying and forecasting the variance and volatility in time series data. They allow practitioners to make informed decisions on pricing and hedging financial instruments, thereby managing risks more effectively.