$$AIC=-2\frac{\ln L}{T}+2\frac{2}{T}k$$
$$BIC=-2\frac{\ln L}{T}+\frac{\ln T}{T}k$$
- $\ln L$ : log likelihood of estimated model
- $k$ : number of parameters
- $T$ : length of time series

<table>
  <tr>
    <th>Pros</th>
    <th>Cons</th>
  </tr>
  <tr>
    <th>1.Saves time</th>
    <th>1. Blindly putting our faith into one criterion</th>
  </tr>
  <tr>
    <th>2. Removes ambiguity</th>
    <th>2. Never really see how well the other models perform</th>
  </tr>
  <tr>
    <th>3. Reduces risk of human error</th>
    <th>3. Topic expertise</th>
  </tr>
  <tr>
    <th></th>
    <th>4. Human error</th>
  </tr>
</table>

## Packages

In [19]:
# Install arch library
!pip install arch
!pip install pmdarima

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pmdarima
  Downloading pmdarima-2.0.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (1.9 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.9/1.9 MB[0m [31m23.4 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: pmdarima
Successfully installed pmdarima-2.0.3


In [22]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.graphics.tsaplots as sgt
import statsmodels.tsa.stattools as sts
from statsmodels.tsa.arima.model import ARIMA
from scipy.stats.distributions import chi2
from math  import sqrt
import seaborn as sns
from google.colab import drive
import warnings
from statsmodels.tsa.statespace.sarimax import SARIMAX
from arch import arch_model
import yfinance
from pmdarima.arima import auto_arima

warnings.filterwarnings("ignore")
sns.set()

In [3]:
drive.mount("/content/drive")

Mounted at /content/drive


## Loading the data

In [13]:
raw_data = yfinance.download(tickers = "^GSPC ^FTSE ^N225 ^GDAXI", start = "1994-01-07", end = "2018-01-29", 
                             interval = "1d", group_by = "ticker", auto_adjust = True)

[*********************100%***********************]  4 of 4 completed


In [14]:
df_comp = raw_data.copy()

In [15]:
df_comp['spx'] = df_comp['^GSPC'].Close[:]
df_comp['dax'] = df_comp['^GDAXI'].Close[:]
df_comp['ftse'] = df_comp['^FTSE'].Close[:]
df_comp['nikkei'] = df_comp['^N225'].Close[:]

In [16]:
df_comp = df_comp.iloc[1:]
del df_comp['^N225']
del df_comp['^GSPC']
del df_comp['^GDAXI']
del df_comp['^FTSE']
df_comp = df_comp.asfreq('b')
df_comp = df_comp.fillna(method='ffill')

## Creating Returns

In [17]:
df_comp['ret_spx'] = df_comp.spx.pct_change(1)*100
df_comp['ret_ftse'] = df_comp.ftse.pct_change(1)*100
df_comp['ret_dax'] = df_comp.dax.pct_change(1)*100
df_comp['ret_nikkei'] = df_comp.nikkei.pct_change(1)*100

## Splitting the Data

In [18]:
size = int(len(df_comp)*0.8)
df, df_test = df_comp.iloc[:size], df_comp.iloc[size:]

## Fitting a Model
1. The rules of model selection are rather "rules of thumb" thn fixed
2. Auto ARIMA only considers a single feature - the AIC
3. We could have easily overfitted while going through the models in our previous sections
4. The default arguments of the method restrict the number of AR and MA components

In [23]:
model_auto = auto_arima(df.ret_ftse[1:])

In [29]:
model_auto

In [26]:
print(model_auto.summary())

                               SARIMAX Results                                
Dep. Variable:                      y   No. Observations:                 5019
Model:               SARIMAX(4, 0, 5)   Log Likelihood               -7882.776
Date:                Wed, 29 Mar 2023   AIC                          15785.552
Time:                        14:54:15   BIC                          15850.762
Sample:                    01-11-1994   HQIC                         15808.403
                         - 04-05-2013                                         
Covariance Type:                  opg                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.0120      0.082      0.147      0.883      -0.148       0.172
ar.L2         -0.6543      0.077     -8.457      0.000      -0.806      -0.503
ar.L3         -0.1628      0.071     -2.290      0.0

## Important Arguments
- Since there are 5 business days in the week, we set the length of the cycle to 5.
- `n_jobs` : how many models to fit simultaneously (number of CPUs).

In [31]:
model_auto = auto_arima(df_comp.ret_ftse[1:], exogeneous = df_comp[['ret_spx', 'ret_dax', 'ret_nikkei']][1:], m = 5,
                        max_order=None, max_p = 7, max_q = 7, max_d = 2, max_P = 4, max_Q = 4, max_D = 2, 
                        max_iter = 50, alpha = 0.05, n_job = -1, trend = 'ct', information_criterion = 'oob',
                        out_of_sample_size = int(len(df_comp)*0.2))

In [32]:
model_auto.summary()

0,1,2,3
Dep. Variable:,y,No. Observations:,6274.0
Model:,"SARIMAX(0, 0, 3)x(2, 0, [1, 2], 5)",Log Likelihood,-9581.139
Date:,"Wed, 29 Mar 2023",AIC,19182.278
Time:,16:05:30,BIC,19249.719
Sample:,0,HQIC,19205.645
,- 6274,,
Covariance Type:,opg,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
intercept,0.0283,0.039,0.716,0.474,-0.049,0.106
drift,-2.361e-06,1.06e-05,-0.223,0.824,-2.31e-05,1.84e-05
ma.L1,-0.0242,0.009,-2.752,0.006,-0.041,-0.007
ma.L2,-0.0503,0.008,-6.351,0.000,-0.066,-0.035
ma.L3,-0.0840,0.008,-10.746,0.000,-0.099,-0.069
ar.S.L5,-0.0949,0.724,-0.131,0.896,-1.514,1.324
ar.S.L10,-0.1821,0.203,-0.899,0.369,-0.579,0.215
ma.S.L5,0.0420,0.724,0.058,0.954,-1.377,1.461
ma.S.L10,0.1657,0.231,0.717,0.473,-0.287,0.619

0,1,2,3
Ljung-Box (L1) (Q):,0.14,Jarque-Bera (JB):,9004.63
Prob(Q):,0.71,Prob(JB):,0.0
Heteroskedasticity (H):,0.86,Skew:,-0.23
Prob(H) (two-sided):,0.0,Kurtosis:,8.85
