# GARCH Model

- While autoregressive models are good at predicting univariate time series data with trends (ARIMA) and seasonality (SARIMA), they assume that the variance of the errors does not change over time.

- When the variance of a time series changes over time because of volatility, the series is said to be heteroscedastic. This is typical of financial data.

- Heteroscedastic time series can sometimes be adjusted by transforming the data (log-transform, power transform).

- If the variance is itself correlated over time, it can be modelled with an autoregressive process such as ARCH or GARCH.
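
As a minimal sketch of the transformation point above: the series below is synthetic (exponential growth with multiplicative noise, an assumption made purely for illustration and unrelated to the case data). The spread of its day-to-day changes grows with the level, while the log-transformed series has a roughly constant spread.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(400)

# Hypothetical series: exponential growth with multiplicative noise
y = np.exp(0.01 * t) * np.exp(rng.normal(0.0, 0.1, size=t.size))

def half_std_ratio(x):
    """Std of the second half of the differenced series divided by that of the first half."""
    d = np.diff(x)
    mid = d.size // 2
    return d[mid:].std() / d[:mid].std()

raw_ratio = half_std_ratio(y)          # well above 1: spread grows with the level
log_ratio = half_std_ratio(np.log(y))  # close to 1: spread is roughly constant
print(f"raw: {raw_ratio:.2f}, log: {log_ratio:.2f}")
```

A transform like this only helps when the variance scales with the level of the series. It does not remove volatility clustering, which is what ARCH and GARCH are meant to capture.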

In [None]:
!pip install arch

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
daily_cases_india = pd.read_csv('../../cleaned_datasets/india/daily_cases_india.csv', parse_dates=['Date'], index_col=0)
cum_cases_india = pd.read_csv('../../cleaned_datasets/india/cum_cases_india.csv', parse_dates=['Date'])

In [None]:
cum_cases_india

In [None]:
daily_cases_india

In [None]:
daily_cases_india.iloc[0] = cum_cases_india.iloc[0]

In [None]:
daily_cases_india.to_csv('../../cleaned_datasets/india/daily_cases_india.csv')
daily_cases_india

In [None]:
daily_confirmed = daily_cases_india.drop(['Recovered', 'Deaths', 'Active'], axis=1)
daily_confirmed

### Plotting the daily confirmed cases over time

In [None]:
daily_confirmed.plot(figsize=(6,6))

### Checking for stationarity

We will plot the rolling mean of the series at different window sizes and compare the plots to see whether the mean is constant.

We will also plot the rolling standard deviation to check whether the variance is constant.

In [None]:
daily_ts = daily_confirmed.set_index(['Date'])['Confirmed']

def roll_stats(ts, window, title):
    '''Compute the rolling mean and rolling std dev of a series and plot them with the original'''
    rollmean = ts.rolling(window=window).mean()
    rollstd = ts.rolling(window=window).std()
    print(rollmean, rollstd)

    plt.figure(figsize=(8, 8))
    plt.plot(ts, color='blue', label='Original')
    plt.plot(rollmean, color='red', label='Rolling Mean')
    plt.plot(rollstd, color='green', label='Rolling Standard Dev')
    plt.legend(loc='best')
    plt.title(title)
    plt.show()

In [None]:
# 7-day moving average
roll_stats(daily_ts, 7, '7-day rolling statistics for Daily cases')

In [None]:
# 30-day moving average
roll_stats(daily_ts, 30, '30-day rolling statistics for Daily cases')

From visual inspection, the time series is not stationary and needs to be differenced.

### Augmented Dickey-Fuller Test

- $H_0:$ Presence of a unit root (Time Series is not stationary)
- $H_1:$ There is no unit root (Time-series is stationary)

$DF_t = \frac{\hat{\gamma}}{SE(\hat{\gamma})}$

The more negative $DF_t$ is, the stronger the evidence for rejecting $H_0$.   
If the p-value < 0.001, we can reject $H_0$ and conclude that the time series is stationary.
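
To make the test statistic concrete, here is a minimal sketch of the plain (non-augmented) Dickey-Fuller regression $\Delta y_t = \gamma y_{t-1} + e_t$, with no constant and no lagged differences. The helper `df_statistic` and the two synthetic series are assumptions for illustration. `adfuller` additionally includes lagged differences and, by default, a constant, so its statistic will generally differ from this one.

```python
import numpy as np

def df_statistic(y):
    """t-ratio of gamma in the regression  dy_t = gamma * y_{t-1} + e_t  (no constant, no lags)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)      # dy_t = y_t - y_{t-1}
    y_lag = y[:-1]       # y_{t-1}
    gamma = (y_lag @ dy) / (y_lag @ y_lag)   # OLS estimate of gamma
    resid = dy - gamma * y_lag
    s2 = (resid @ resid) / (dy.size - 1)     # residual variance
    se = np.sqrt(s2 / (y_lag @ y_lag))       # standard error of gamma
    return gamma / se

rng = np.random.default_rng(1)
shocks = rng.normal(size=500)

stat_wn = df_statistic(shocks)             # white noise: stationary
stat_rw = df_statistic(np.cumsum(shocks))  # random walk: has a unit root

print(f"white noise: {stat_wn:.2f}, random walk: {stat_rw:.2f}")
```

The stationary series should give a strongly negative statistic, while the random walk gives a value much closer to zero.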

In [None]:
from statsmodels.tsa.stattools import adfuller

def run_dicky_fuller(ts):
    '''Run the Augmented Dickey-Fuller test on the passed time series and report the test statistics'''
    print("Observations of Dickey-Fuller test")
    dftest = adfuller(ts, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#lags used', 'number of observations used'])

    # Append the critical values at each significance level
    for key, value in dftest[4].items():
        dfoutput['critical value (%s)' % key] = value
    print(dfoutput)

run_dicky_fuller(daily_ts)

Since the p-value > 0.001, we cannot reject $H_0$: the time series is treated as non-stationary and needs to be transformed or differenced.

### Check for trend and seasonality

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

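def decomp_mult(ts, period):
    '''Plot a multiplicative seasonal decomposition of the series for the given period'''
    decomp = seasonal_decompose(ts, model='multiplicative', period=period)
    fig = decomp.plot()
    fig.set_size_inches(16, 9)

# Some values are incorrect (negative), so keep only the non-negative ones
daily_pos = daily_ts[daily_ts >= 0]

# A multiplicative decomposition needs strictly positive values, so add a constant for the 0's
const_added = daily_pos + 1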

# Period = 5
decomp_mult(const_added, 5)

In [None]:
# Period = 30
decomp_mult(const_added, 30)

In [None]:
# Period = 100
decomp_mult(const_added, 100)

As we can see, there are two peaks in the trend and noticeable seasonality.

### First-order differencing

The time series is differenced to try and make the mean constant.
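
A small sketch of what differencing does, using a hypothetical quadratic trend rather than the case data: the first difference still trends, and the second difference is constant.

```python
import numpy as np

t = np.arange(10)
y = t ** 2               # hypothetical series with a quadratic trend

d1 = np.diff(y)          # first-order difference: 1, 3, 5, ... (still trending)
d2 = np.diff(y, n=2)     # second-order difference: constant

print(d1)
print(d2)
```

Note that `np.diff` drops the leading element at each order, whereas subtracting a shifted pandas series keeps the original length and leaves `NaN` at the start.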

In [None]:
daily_ts_diff_1 = daily_ts - daily_ts.shift(1)
daily_ts_diff_1.plot(figsize=(8,8))

In [None]:
# 7-day moving average
roll_stats(daily_ts_diff_1, 7, '7-day rolling statistics for first-order differenced cases')

### Second-order differencing

In [None]:
daily_ts_diff_2 = daily_ts_diff_1 - daily_ts_diff_1.shift(1)
daily_ts_diff_2.plot(figsize=(8,8))

In [None]:
# 7-day moving average
roll_stats(daily_ts_diff_2, 7, '7-day rolling statistics for second-order differenced cases')

The second-order differenced series looks stationary in the mean. Its volatility is conditional, clustering into calm and turbulent periods, which makes it a good candidate for a GARCH model.

## GARCH Model

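- $GARCH(p, q)$
    - $p$: number of lagged squared residuals (ARCH terms)
    - $q$: number of lagged conditional variances (GARCH terms)

- Formula
    - $a_t = \varepsilon_t \sigma_t$, where $\sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i a_{t-i}^2  + \sum_{i=1}^{q} \beta_i \sigma_{t-i}^2$
    - $\varepsilon_t$ is white noise with unit variance and $\sigma_t^2$ is the conditional variance of $a_t$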
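
The recursion in the formula can be sketched by simulating the simplest case, $GARCH(1, 1)$. The function name `simulate_garch11` and the parameter values are assumptions chosen for illustration, not estimates from the case data.

```python
import numpy as np

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate a_t = eps_t * sigma_t with sigma_t^2 = omega + alpha * a_{t-1}^2 + beta * sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)   # unit-variance white noise
    a = np.zeros(n)
    sigma2 = np.zeros(n)

    # Start the recursion at the unconditional variance (requires alpha + beta < 1)
    sigma2[0] = omega / (1.0 - alpha - beta)
    a[0] = eps[0] * np.sqrt(sigma2[0])

    for t in range(1, n):
        sigma2[t] = omega + alpha * a[t - 1] ** 2 + beta * sigma2[t - 1]
        a[t] = eps[t] * np.sqrt(sigma2[t])
    return a, sigma2

a, sigma2 = simulate_garch11(5000, omega=0.1, alpha=0.1, beta=0.8)
print(f"sample variance: {a.var():.3f}, unconditional variance: {0.1 / (1 - 0.1 - 0.8):.3f}")
```

A large shock raises the next conditional variance through the $\alpha$ term, and the $\beta$ term makes that increase decay gradually, which produces the volatility clusters seen in the differenced series.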

#### ACF and PACF plots

- The ACF and PACF of the squared differenced series are used to choose the orders $p$ and $q$ of the GARCH model

In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf 

fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = plot_acf(daily_ts_diff_2.dropna()**2, lags=100, ax = ax1)
ax2 = fig.add_subplot(212)
fig = plot_pacf(daily_ts_diff_2.dropna()**2, lags=100, ax = ax2)

Possible model: GARCH(2, 2)
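
The reason the plots use the squared series can be sketched on a synthetic $ARCH(1)$ process (an assumption for illustration, with a hand-written `autocorr` helper rather than the statsmodels functions): the raw values are close to uncorrelated, while the squared values are clearly autocorrelated.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float((x[lag:] @ x[:-lag]) / (x @ x))

# Hypothetical ARCH(1) process: sigma_t^2 = omega + alpha * a_{t-1}^2
rng = np.random.default_rng(2)
n, omega, alpha = 5000, 1.0, 0.3
a = np.zeros(n)
for t in range(1, n):
    sigma2_t = omega + alpha * a[t - 1] ** 2
    a[t] = rng.standard_normal() * np.sqrt(sigma2_t)

acf_raw = autocorr(a, 1)       # near 0: no linear dependence in the values
acf_sq = autocorr(a ** 2, 1)   # clearly positive: dependence in the variance
print(f"lag-1 ACF of a: {acf_raw:.3f}, of a^2: {acf_sq:.3f}")
```

Significant spikes in the ACF and PACF of the squared series are what suggest candidate orders such as the one chosen above.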

### Train-test split

In [None]:
percent_80 = int(len(daily_ts_diff_2)*0.8)

train = daily_ts_diff_2.iloc[:percent_80].dropna()
test = daily_ts_diff_2.iloc[percent_80:]

fig, ax = plt.subplots()
fig.set_size_inches(8, 8)

ax.plot(train, color = 'blue', label = 'Train')
ax.plot(test, color = 'red', label = 'Test')
ax.legend(loc = 'best')
plt.show()

In [None]:
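from arch import arch_model

# Zero-mean GARCH(2, 2) fitted on the training portion of the differenced series
model = arch_model(train, mean='Zero', vol='GARCH', p=2, q=2)
model_fit = model.fit()

# Multi-step variance forecast from the end of the training data, one step per test observation
yhat = model_fit.forecast(horizon=len(test))

yhatvar = pd.DataFrame({'Date': test.index, 'var': yhat.variance.values[-1, :]})

# Compare the forecast variance with the squared test values
plt.plot(yhatvar['Date'], yhatvar['var'], label='Forecast variance')
plt.plot(test**2, label='Squared test values')
plt.legend(loc='best')
plt.show()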