## GARCH(p, q)
$$\sigma_t^2=\omega+\sum_{i=1}^{p}\alpha_i\epsilon_{t-i}^2+\sum_{j=1}^{q}\beta_j\sigma_{t-j}^2$$
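
To make the recursion concrete, here is a minimal plain-Python sketch of the variance equation, with $p$ weights on past squared residuals and $q$ weights on past conditional variances. The function name `garch_variance`, the `init_var` start-up value and the sample numbers are assumptions made for illustration; this is not how the `arch` library computes its variances internally.

```python
def garch_variance(eps, omega, alphas, betas, init_var):
    """Conditional variances from the GARCH(p, q) recursion.

    eps      : list of residuals epsilon_t
    alphas   : [alpha_1, ..., alpha_p], weights on past squared residuals
    betas    : [beta_1, ..., beta_q], weights on past conditional variances
    init_var : value used for any lag that falls before the start of the
               sample (an assumption of this sketch)
    """
    sigma2 = []
    for t in range(len(eps)):
        var_t = omega
        # ARCH part: past squared residuals
        for i, alpha in enumerate(alphas, start=1):
            past_sq = eps[t - i] ** 2 if t - i >= 0 else init_var
            var_t += alpha * past_sq
        # GARCH part: past conditional variances
        for j, beta in enumerate(betas, start=1):
            past_var = sigma2[t - j] if t - j >= 0 else init_var
            var_t += beta * past_var
        sigma2.append(var_t)
    return sigma2


# Hypothetical residuals and parameters, for illustration only
eps = [0.5, -1.2, 0.3, 2.0, -0.7]
sigma2 = garch_variance(eps, omega=0.1, alphas=[0.1], betas=[0.8], init_var=1.0)
print([round(v, 4) for v in sigma2])
```

Each new variance reuses the previous one, so a single lagged variance already carries information from every earlier residual.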

## Import Relevant Packages

In [1]:
# Install arch library
!pip install arch

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting arch
  Downloading arch-5.3.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (907 kB)
Collecting property-cached>=1.6.4
  Downloading property_cached-1.6.4-py2.py3-none-any.whl (7.8 kB)
Installing collected packages: property-cached, arch
Successfully installed arch-5.3.1 property-cached-1.6.4


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.graphics.tsaplots as sgt
import statsmodels.tsa.stattools as sts
from statsmodels.tsa.arima.model import ARIMA
from scipy.stats.distributions import chi2
from math import sqrt
import seaborn as sns
from google.colab import drive
import warnings
from statsmodels.tsa.statespace.sarimax import SARIMAX
from arch import arch_model

warnings.filterwarnings("ignore")
sns.set()

In [3]:
drive.mount("/content/drive")

Mounted at /content/drive


## Importing the Data and Pre-Processing

In [4]:
raw_csv_data = pd.read_csv("/content/drive/MyDrive/Formations/Time Series/Index2018.csv", index_col="date", parse_dates=True, dayfirst=True)
df_comp = raw_csv_data.copy()
df_comp = df_comp.asfreq("b")
df_comp = df_comp.ffill()  # forward-fill missing values on business days
df_comp["market_value"] = df_comp.ftse

size = int(len(df_comp)*0.8)
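# Copy the slices so that adding columns later does not write to a view of df_comp
df, df_test = df_comp.iloc[:size].copy(), df_comp.iloc[size:].copy()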

## LLR-Test

In [5]:
"""mod_1, mod_2 : models we want to compare
DF : degrees of freedom"""
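def LLR_test(mod_1, mod_2, DF=1):
  L1 = mod_1.fit().llf  # log-likelihood of the simpler (restricted) model
  L2 = mod_2.fit().llf  # log-likelihood of the more complex model
  LR = 2*(L2-L1)        # likelihood-ratio test statistic
  p = chi2.sf(LR, DF).round(3)  # p-value from the chi-squared survival function
  return p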
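
The function above refits both models and reads `llf`, the attribute name used by statsmodels results. As a hedged alternative, the same test can be sketched from two log-likelihoods that are already available, for example from fitted `arch` results, which (as far as I know) report the value as `loglikelihood`. The helper name `llr_p_value` and the numbers below are hypothetical.

```python
from scipy.stats import chi2


def llr_p_value(llf_restricted, llf_unrestricted, df=1):
    """p-value of the log-likelihood ratio test from two log-likelihoods.

    llf_restricted   : log-likelihood of the simpler model
    llf_unrestricted : log-likelihood of the more complex model
    df               : number of extra parameters in the complex model
    """
    lr = 2 * (llf_unrestricted - llf_restricted)
    # chi2.sf is the survival function, i.e. 1 - CDF
    return float(chi2.sf(lr, df))


# Hypothetical log-likelihoods, for illustration only
print(round(llr_p_value(-1000.0, -995.0, df=1), 4))
```

A small p-value suggests that the extra parameters of the more complex model improve the fit by more than chance alone would explain.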

## Creating returns

In [6]:
df["returns"] = df.market_value.pct_change(1)*100

## The Simple GARCH Model
- The GARCH(1, 1) gives us a better model than one built on past squared residuals alone.
- Including past conditional variances as a form of baseline provides much greater accuracy.

In [8]:
model_garch_1_1 = arch_model(df.returns[1:], mean="Constant", vol="GARCH", p=1, q=1)
results_garch_1_1 = model_garch_1_1.fit(update_freq = 5)
print(results_garch_1_1.summary())

Iteration:      5,   Func. Count:     35,   Neg. LLF: 7010.712887007633
Iteration:     10,   Func. Count:     64,   Neg. LLF: 6970.058478413694
Optimization terminated successfully    (Exit mode 0)
            Current function value: 6970.058366189882
            Iterations: 13
            Function evaluations: 78
            Gradient evaluations: 13
                     Constant Mean - GARCH Model Results                      
Dep. Variable:                returns   R-squared:                       0.000
Mean Model:             Constant Mean   Adj. R-squared:                  0.000
Vol Model:                      GARCH   Log-Likelihood:               -6970.06
Distribution:                  Normal   AIC:                           13948.1
Method:            Maximum Likelihood   BIC:                           13974.2
                                        No. Observations:                 5020
Date:                Wed, Mar 29 2023   Df Residuals:                     5019
Time:          

## Higher-Lag GARCH Models
- No higher-order GARCH model outperforms the GARCH(1, 1) when it comes to the variance of market returns. This is due to the recursive way in which conditional variances are computed: including one past conditional variance not only makes further past squared residuals redundant, since it already captures their effect, but also makes additional lagged conditional variances obsolete. From a mathematical point of view, the effect of the conditional variance two days ago is already contained in yesterday's conditional variance, so there is no need to include more than one GARCH component.
- We observe a p-value of $1$ for **beta[2]**. This indicates a case of full multicollinearity, caused by the relationship between conditional variances explained above.
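
The recursive argument can be made explicit by substituting the variance equation into itself. The derivation below is a sketch for the GARCH(1, 1) case only, and it assumes $0\le\beta_1<1$ so that the geometric series converges:

$$\begin{aligned}
\sigma_t^2 &= \omega+\alpha_1\epsilon_{t-1}^2+\beta_1\sigma_{t-1}^2 \\
&= \omega+\alpha_1\epsilon_{t-1}^2+\beta_1\left(\omega+\alpha_1\epsilon_{t-2}^2+\beta_1\sigma_{t-2}^2\right) \\
&= \omega(1+\beta_1)+\alpha_1\epsilon_{t-1}^2+\alpha_1\beta_1\epsilon_{t-2}^2+\beta_1^2\sigma_{t-2}^2 \\
&\;\;\vdots \\
&= \frac{\omega}{1-\beta_1}+\alpha_1\sum_{k=1}^{\infty}\beta_1^{k-1}\epsilon_{t-k}^2
\end{aligned}$$

Yesterday's conditional variance therefore already carries every earlier squared residual, with geometrically decaying weights, which is why additional lags tend to add little.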

In [9]:
model_garch_1_2 = arch_model(df.returns[1:], mean="Constant", vol="GARCH", p=1, q=2)
results_garch_1_2 = model_garch_1_2.fit(update_freq = 5)
print(results_garch_1_2.summary())

Iteration:      5,   Func. Count:     40,   Neg. LLF: 6974.173831538361
Iteration:     10,   Func. Count:     71,   Neg. LLF: 6970.058391826686
Optimization terminated successfully    (Exit mode 0)
            Current function value: 6970.05836622724
            Iterations: 12
            Function evaluations: 83
            Gradient evaluations: 12
                     Constant Mean - GARCH Model Results                      
Dep. Variable:                returns   R-squared:                       0.000
Mean Model:             Constant Mean   Adj. R-squared:                  0.000
Vol Model:                      GARCH   Log-Likelihood:               -6970.06
Distribution:                  Normal   AIC:                           13950.1
Method:            Maximum Likelihood   BIC:                           13982.7
                                        No. Observations:                 5020
Date:                Wed, Mar 29 2023   Df Residuals:                     5019
Time:           

In [10]:
model_garch_1_3 = arch_model(df.returns[1:], mean="Constant", vol="GARCH", p=1, q=3)
results_garch_1_3 = model_garch_1_3.fit(update_freq = 5)
print(results_garch_1_3.summary())

Iteration:      5,   Func. Count:     47,   Neg. LLF: 7044.91400453258
Iteration:     10,   Func. Count:     88,   Neg. LLF: 6973.179513958814
Iteration:     15,   Func. Count:    124,   Neg. LLF: 6970.058367403908
Optimization terminated successfully    (Exit mode 0)
            Current function value: 6970.05836622959
            Iterations: 17
            Function evaluations: 137
            Gradient evaluations: 17
                     Constant Mean - GARCH Model Results                      
Dep. Variable:                returns   R-squared:                       0.000
Mean Model:             Constant Mean   Adj. R-squared:                  0.000
Vol Model:                      GARCH   Log-Likelihood:               -6970.06
Distribution:                  Normal   AIC:                           13952.1
Method:            Maximum Likelihood   BIC:                           13991.2
                                        No. Observations:                 5020
Date:                We

Even though we don't get a p-value of one, the additional coefficient is still not significantly different from zero at the $5\%$ significance level, so we should avoid using this model. Regardless, we can decide to be brave and go a step further by trying the GARCH(3, 1).

In [11]:
model_garch_2_1 = arch_model(df.returns[1:], mean="Constant", vol="GARCH", p=2, q=1)
results_garch_2_1 = model_garch_2_1.fit(update_freq = 5)
print(results_garch_2_1.summary())

Iteration:      5,   Func. Count:     40,   Neg. LLF: 8793.711867692436
Iteration:     10,   Func. Count:     76,   Neg. LLF: 6967.73124749643
Optimization terminated successfully    (Exit mode 0)
            Current function value: 6967.731020076215
            Iterations: 12
            Function evaluations: 87
            Gradient evaluations: 12
                     Constant Mean - GARCH Model Results                      
Dep. Variable:                returns   R-squared:                       0.000
Mean Model:             Constant Mean   Adj. R-squared:                  0.000
Vol Model:                      GARCH   Log-Likelihood:               -6967.73
Distribution:                  Normal   AIC:                           13945.5
Method:            Maximum Likelihood   BIC:                           13978.1
                                        No. Observations:                 5020
Date:                Wed, Mar 29 2023   Df Residuals:                     5019
Time:           