# VAR India

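Vector autoregressive models (VAR, VMA and VARIMA) fitted to daily Covid-19 cases and cumulative vaccine doses in India. All models are trained on the first-differenced data (d = 1), which is why they are labelled VARI, VIMA and VARIMA below.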
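
For reference, here is the textbook form of a VARIMA(p, 1, q) model for the two-dimensional series y_t = (Confirmed_t, Total_Doses_t). VARI and VIMA are the special cases q = 0 and p = 0. The A_i and M_j are 2x2 coefficient matrices, c is an optional intercept and ε_t is white noise. This is the standard formulation, not a description of how `darts` parameterizes the model internally.

```latex
\Delta y_t = y_t - y_{t-1}, \qquad
\Delta y_t = c + \sum_{i=1}^{p} A_i \, \Delta y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} M_j \, \varepsilon_{t-j}
```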

## [Setup](#setup)
1. [Imports](#imports)
2. [Results table](#results_init)
3. [Ingestion](#ingestion)
4. [Plotting](#plotting_init)
5. [Statistical tests](#stattests1)
    1. [Johansen co-integration test](#jci_init)
    2. [Augmented DF Test](#adf_init)
6. [Differencing](#diff_init)
7. [Train test split](#traintest_init)

## Long-Term Forecasting

### [VAR Model](#var_model)
1. [Find order p of VAR](#var_p)
2. [VARI(7, 1) Model](#var1)
3. [Store metrics - MAE, MAPE, MSE](#store_var)
4. [Plot Train, Test, Forecast](#plot_final_var)

### [VIMA Model](#vma_model)
1. [Find order q of VMA](#vma_q)
2. [VIMA(1, 1) Model](#vma1)
3. [Store metrics - MAE, MAPE, MSE](#store_vma)
4. [Plot Train, Test, Forecast](#plot_final_vma)

### [VARIMA Model](#varma_model)
1. [Find order (p, d, q) of VARIMA](#varma_pq)
2. [VARIMA(7, 1, 1) Model](#varma11)
3. [Store metrics - MAE, MAPE, MSE](#store_varma)
4. [Plot Train, Test, Forecast](#plot_final_varma)

## [Short-Term/Rolling Forecasting](#shortterm)

1. [VARI(9, 1)](#var_roll)
2. [VIMA(1, 1)](#vma_roll)
3. [VARIMA(7, 1, 1)](#varma_roll)

## [Final Metrics](#final_results)

<a name=setup></a>

# Setup

Install `darts` with the command below (or uncomment and run the next cell):

1. `pip install darts`

If the installation fails, check the [darts installation guide on GitHub](https://github.com/unit8co/darts#installation-guide) or try running:

1. `conda install -c conda-forge prophet`
2. `conda install pytorch torchvision torchaudio -c pytorch`
3. `pip install darts`

In [None]:
# Uncomment to install darts (skip if it is already installed)
# !pip install darts

<a name=imports></a>
## Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import importlib

from darts import TimeSeries
from darts.models.forecasting.varima import VARIMA

# Project-local helper modules
import preprocessing
import plotting
import stats_testing
import metrics

warnings.filterwarnings("ignore")

<a name=results_init></a>
## Results table

In [None]:
results_columns = ['model', 'mse', 'mape', 'mae']
results_table = pd.DataFrame(columns=results_columns)

<a name=ingestion></a>

## Ingestion

In [None]:
daily_cases_india = pd.read_csv('../../cleaned_datasets/india/daily_cases_india.csv', parse_dates=['Date'])
cum_vacc_india = pd.read_csv('../../cleaned_datasets/india/cum_vacc_india_cleaned.csv', parse_dates=['Date'])

In [None]:
daily_cases_india.dtypes

In [None]:
daily_cases_india

In [None]:
cum_vacc_india.dtypes

In [None]:
first_vacc = cum_vacc_india.iloc[0].Date
last_vacc = cum_vacc_india.iloc[-1].Date

vacc_dates = pd.date_range(start=first_vacc, end=last_vacc)

In [None]:
cases_vacc = daily_cases_india.merge(cum_vacc_india, how='outer', on='Date')
cases_vacc = cases_vacc[["Date", "Confirmed", "Total_Doses"]]
cases_vacc

In [None]:
indexed = cases_vacc.set_index('Date')
preprocessing.fill_date_gaps(indexed, method='ffill', dates_range=vacc_dates)
indexed
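
`preprocessing.fill_date_gaps` is project-local and its implementation is not shown. Assuming it reindexes the frame to the full daily range `vacc_dates` and forward-fills missing values (as `method='ffill'` suggests), the idea can be sketched in plain Python. `daily_range` and `forward_fill` are hypothetical helpers, and the values are toy data:

```python
from datetime import date, timedelta


def daily_range(start, end):
    """Every calendar day from start to end, inclusive (like pd.date_range)."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def forward_fill(observations, days):
    """Map each day to the most recent observed value (None before the first one).

    Hypothetical stand-in for preprocessing.fill_date_gaps(..., method='ffill').
    """
    filled, last = {}, None
    for day in days:
        if observations.get(day) is not None:
            last = observations[day]
        filled[day] = last
    return filled


# Toy data: two observations with a two-day gap between them
toy_obs = {date(2021, 1, 16): 100, date(2021, 1, 19): 250}
toy_filled = forward_fill(toy_obs, daily_range(date(2021, 1, 16), date(2021, 1, 20)))
print(list(toy_filled.values()))  # [100, 100, 100, 250, 250]
```

Forward filling suits a cumulative series such as `Total_Doses`, where a missing day means "no new report" rather than zero.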

In [None]:
series = TimeSeries.from_dataframe(indexed)

<a name=plotting_init></a>
## Plot initial data

In [None]:
importlib.reload(plotting)

plotting.plot_dataframe(indexed.Confirmed, title='India', color='b', figpath='../../figures/data/india_daily_cases.eps')

In [None]:
plotting.plot_dataframe(indexed.Total_Doses, title='India', color='b', figpath='../../figures/data/india_cum_vacc.eps')


In [None]:
plotting.plot_side_by_side(train=series.pd_dataframe())

<a name=stattests1></a>
## Statistical tests

<a name=jci_init></a>

### Johansen co-integration test

In [None]:
stats_testing.cointegration_test(series.pd_dataframe())

According to the Johansen test, the two series are not cointegrated.

<a name=adf_init></a>

### Augmented DF Test

In [None]:
# ADF Test on each column
for name, column in series.pd_dataframe().items():
    stats_testing.run_dicky_fuller(column)
    print('\n')

The ADF test does not reject the unit-root hypothesis, so the series is not stationary and needs differencing.
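
`stats_testing.run_dicky_fuller` is project-local. To show what the test checks, here is a sketch of the plain (non-augmented) Dickey-Fuller regression in pure Python. It omits the lagged-difference terms that the augmented test adds and uses the asymptotic 5% critical value of about -2.86 for the constant-only case, so it is an illustration only, not a replacement for `statsmodels.tsa.stattools.adfuller`:

```python
import math
import random


def dickey_fuller_stat(y):
    """t-statistic of gamma in dy_t = alpha + gamma * y_{t-1} + e_t.

    Simplified (non-augmented) Dickey-Fuller regression: no lagged differences.
    """
    lagged = y[:-1]                                      # y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]          # y_t - y_{t-1}
    n = len(dy)
    mean_x, mean_dy = sum(lagged) / n, sum(dy) / n
    sxx = sum((v - mean_x) ** 2 for v in lagged)
    gamma = sum((v - mean_x) * (d - mean_dy) for v, d in zip(lagged, dy)) / sxx
    alpha = mean_dy - gamma * mean_x
    resid = [d - alpha - gamma * v for v, d in zip(lagged, dy)]
    sigma2 = sum(r * r for r in resid) / (n - 2)         # two estimated parameters
    return gamma / math.sqrt(sigma2 / sxx)               # gamma / standard error


# Asymptotic 5% critical value for the constant-only case: about -2.86
CRITICAL_5PCT = -2.86

random.seed(0)
white_noise = [random.gauss(0, 1) for _ in range(500)]   # stationary
quadratic = [float(t * t) for t in range(100)]           # non-stationary trend

print(dickey_fuller_stat(white_noise) < CRITICAL_5PCT)   # True: unit root rejected
print(dickey_fuller_stat(quadratic) < CRITICAL_5PCT)     # False: not rejected
```

A statistic above the critical value (as for the trending series) means the unit root cannot be rejected, which is the situation described for the raw case and dose counts.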

<a name=diff_init></a>
## Differencing

### First-order differencing

In [None]:
df_diff_1 = series.pd_dataframe().diff().dropna()
df_diff_1
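
First differencing and its inverse are simple arithmetic. The `VARIMA` models below are given `d=1` and their forecasts are compared directly with the undifferenced test series, so the inversion happens inside the model. This pure-Python sketch (hypothetical helper names, toy values) just makes the arithmetic explicit:

```python
from itertools import accumulate


def first_difference(values):
    """dy_t = y_t - y_{t-1}; one element shorter, like .diff().dropna()."""
    return [b - a for a, b in zip(values[:-1], values[1:])]


def undifference(diffs, last_level):
    """Invert first differencing by cumulating diffs onto the last known level."""
    return list(accumulate(diffs, initial=last_level))[1:]


levels = [10, 12, 15, 15, 20]
diffs = first_difference(levels)                # [2, 3, 0, 5]
restored = undifference(diffs, levels[0])       # [12, 15, 15, 20]
print(diffs, [levels[0]] + restored == levels)  # [2, 3, 0, 5] True
```

The inverse needs the last observed level, which is why a forecast of differences must be anchored at the final training value before it can be compared with the test data.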

In [None]:
plotting.plot_side_by_side(train=df_diff_1)


In [None]:
stats_testing.cointegration_test(df_diff_1)

In [None]:
# ADF Test on each column
for name, column in df_diff_1.items():
    stats_testing.run_dicky_fuller(column)
    print('\n')

After first differencing, the Johansen test again indicates that the two series are not cointegrated.

<a name=traintest_init></a>
## Train-test split


In [None]:
importlib.reload(plotting)

train, test = preprocessing.train_test_split(series, fraction=0.9)

plotting.plot_side_by_side(train=train.pd_dataframe(), test=test.pd_dataframe())
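
`preprocessing.train_test_split` is project-local. For time series the split must be chronological, never shuffled, so that the test period lies strictly after the training period. A minimal sketch, assuming the helper keeps the first `fraction` of the points for training and truncates with `int` (both assumptions). `darts` also offers `TimeSeries.split_after`, which accepts a fraction.

```python
def chronological_split(values, fraction=0.9):
    """Split a time-ordered sequence into (train, test) without shuffling.

    Hypothetical stand-in for preprocessing.train_test_split; the int()
    truncation is an assumption.
    """
    cut = int(len(values) * fraction)
    return values[:cut], values[cut:]


toy_days = list(range(100))
train_part, test_part = chronological_split(toy_days, fraction=0.9)
print(len(train_part), len(test_part))  # 90 10
```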

<a name=var_model></a>

# VAR Model

<a name=var_p></a>
## Finding the order p for VAR(p)

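Candidate values of p are read off the partial autocorrelation function (PACF) of the first-differenced `Confirmed` series.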

In [None]:
from statsmodels.graphics.tsaplots import plot_pacf, plot_acf

pacf_var_confirmed = plot_pacf(train['Confirmed'].pd_dataframe().diff().dropna(), lags=25)
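
To make the PACF-based choice of p concrete: the PACF at lag k is the last coefficient of an AR(k) fit, and for an AR(p) process it should fall inside the approximate ±1.96/√n band beyond lag p. The pure-Python sketch below computes it with the Durbin-Levinson recursion on a simulated univariate AR(1) series (toy data, not the notebook's). Results may differ slightly from `plot_pacf` depending on the `method` it uses.

```python
import math
import random


def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags (biased estimator: divide by n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(nlags + 1)]


def pacf_durbin_levinson(r):
    """Partial autocorrelations from r[0..K] via the Durbin-Levinson recursion."""
    pacf = [1.0]
    phi = []  # AR(k-1) coefficients phi_{k-1,1..k-1}
    for k in range(1, len(r)):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(len(phi)))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(len(phi)))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(len(phi))] + [phi_kk]
        pacf.append(phi_kk)  # last AR(k) coefficient = PACF at lag k
    return pacf


# Simulated AR(1) series with coefficient 0.7 (toy data)
random.seed(1)
ar1 = [0.0]
for _ in range(499):
    ar1.append(0.7 * ar1[-1] + random.gauss(0, 1))

r = sample_acf(ar1, nlags=5)
partial = pacf_durbin_levinson(r)
band = 1.96 / math.sqrt(len(ar1))  # approximate 95% band drawn by PACF plots
print([k for k in range(1, 6) if abs(partial[k]) > band])
```

For an AR(1) series, lag 1 should stand out and later lags should mostly stay inside the band. Spikes at later lags in the plot above are what put values such as 7 to 10 into the grid search.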

In [None]:
VARIMA.gridsearch(parameters={'p': [1, 7, 8, 9, 10], 'd': [1]}, series=train, n_jobs=2, val_series=test)
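
Conceptually, a grid search tries every combination of the listed hyper-parameters and keeps the one with the lowest validation error. `darts`' `VARIMA.gridsearch` fits each candidate and scores it against `val_series`. The sketch below strips that down to pure Python, with a hypothetical `toy_score` standing in for "fit and score a model":

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; return (best_params, best_score).

    score_fn(params) should fit a model with those params and return its
    validation error (lower is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float('inf')
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for "fit VARIMA(**params) on train, score on validation"
    return abs(params['p'] - 7) + params['d']


print(grid_search({'p': [1, 7, 8, 9, 10], 'd': [1]}, toy_score))  # ({'p': 7, 'd': 1}, 1)
```

The cost grows with the product of the list lengths, which is why the grids here stay short and `n_jobs=2` is passed to parallelize the fits.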

<a name=var1></a>
## VARI(7, 1) Model

In [None]:
model = VARIMA(p=7, d=1)
model.fit(train)
forecasted = model.predict(len(test))
forecasted.pd_dataframe()
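
Multi-step VAR forecasts are built recursively: each predicted difference is fed back as input for the next step. With stable coefficients, the predicted differences decay toward their mean (I - A)^-1 c, so the level forecast flattens into a straight line. This helps explain why long-horizon forecasts cannot follow new waves. A pure-Python sketch for a two-variable VAR(1) on first differences, with hypothetical coefficients (not fitted values):

```python
def var1_forecast_levels(A, c, last_diff, last_level, horizon):
    """Recursive multi-step forecast of a 2-variable VAR(1) on first differences.

    Each step computes dy_hat = c + A @ dy_prev, feeds it back as dy_prev, and
    adds it to the running level so forecasts come out on the original scale.
    """
    forecasts = []
    diff, level = list(last_diff), list(last_level)
    for _ in range(horizon):
        diff = [c[i] + sum(A[i][j] * diff[j] for j in range(2)) for i in range(2)]
        level = [level[i] + diff[i] for i in range(2)]
        forecasts.append(level)
    return forecasts


# Hypothetical, stable coefficients (eigenvalues 0.5), not fitted values
A_toy = [[0.5, 0.0], [0.0, 0.5]]
c_toy = [0.0, 0.0]
print(var1_forecast_levels(A_toy, c_toy, last_diff=[8, 4], last_level=[100, 50], horizon=3))
# [[104.0, 52.0], [106.0, 53.0], [107.0, 53.5]]
```

With `c_toy` set to zero the increments halve at each step and the levels level off. A non-zero intercept would instead settle into a constant daily increment.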

<a name=store_var></a>

## Store metrics

In [None]:
importlib.reload(metrics)

# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
results_table = pd.concat([results_table, pd.DataFrame([{
    'model': 'VARI(7,1)',
    'mse': metrics.mean_squared_error(test['Confirmed'].pd_dataframe(), forecasted['Confirmed'].pd_dataframe()),
    'mape': metrics.MAPE(test['Confirmed'].pd_dataframe(), forecasted['Confirmed'].pd_dataframe()),
    'mae': metrics.mean_absolute_error(test['Confirmed'].pd_dataframe(), forecasted['Confirmed'].pd_dataframe())
}])], ignore_index=True)

results_table
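
The `metrics` module is project-local and its implementation is not shown. The standard definitions it presumably follows are sketched below. Whether the project's `MAPE` returns a fraction or a percentage is an assumption (percent here). MAPE is undefined when an actual value is zero.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; fails if any y_true is 0."""
    return 100 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)


actual_toy, forecast_toy = [100, 200], [110, 180]
print(mse(actual_toy, forecast_toy), mae(actual_toy, forecast_toy), mape(actual_toy, forecast_toy))  # 250.0 15.0 10.0
```

MSE is in squared case counts, so its magnitude is not comparable to MAE. This is why the final cell prints it in scientific notation.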

<a name=plot_final_var></a>

## Plot Train, Test, Forecast

In [None]:
importlib.reload(plotting)
# Plot of daily cases

plotting.plot_train_test_fore(train=train.pd_dataframe().Confirmed, test=test[['Confirmed']].pd_dataframe(), fore=forecasted[['Confirmed']].pd_dataframe(), title='India - Long Term', start_date='2021-03-01', figpath='../../figures/vari/india_vari.eps')


As the plot shows, the VARI(7, 1) model is not good enough for long-term forecasting of daily cases.

<a name=vma_model></a>

# VIMA Model

<a name=vma_q></a>
## Find order q of VMA

In [None]:
acf_vma_confirmed = plot_acf(train['Confirmed'].pd_dataframe().diff().dropna(), lags=25)
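
The ACF is used for q because of a standard property of moving-average processes: for a univariate MA(q), the autocorrelation is zero beyond lag q. For MA(1), with x_t = ε_t + θ ε_{t-1} and white noise of variance σ², the autocovariances are γ_0 = σ²(1 + θ²) and γ_1 = θσ², giving:

```latex
\rho_1 = \frac{\gamma_1}{\gamma_0} = \frac{\theta}{1 + \theta^2}, \qquad \rho_k = 0 \quad \text{for } k \ge 2
```

An ACF that drops inside the confidence band after lag 1 therefore points to q = 1. Here this is a univariate heuristic applied to the differenced `Confirmed` series only, not a formal order test for the vector model.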

In [None]:
VARIMA.gridsearch(parameters={'p': [0], 'd': [1], 'q': [1, 7, 14]}, series=train, n_jobs=2, val_series=test)

<a name=vma1></a>

## VIMA(1, 1)

In [None]:
model_ma = VARIMA(p=0, d=1, q=1)
model_ma.fit(train)
forecasted_ma = model_ma.predict(len(test))
forecasted_ma.pd_dataframe()

<a name=store_vma></a>

## Store metrics

In [None]:
results_table = pd.concat([results_table, pd.DataFrame([{
    'model': 'VIMA(1,1)',
    'mse': metrics.mean_squared_error(test['Confirmed'].pd_dataframe(), forecasted_ma['Confirmed'].pd_dataframe()),
    'mape': metrics.MAPE(test['Confirmed'].pd_dataframe(), forecasted_ma['Confirmed'].pd_dataframe()),
    'mae': metrics.mean_absolute_error(test['Confirmed'].pd_dataframe(), forecasted_ma['Confirmed'].pd_dataframe())
}])], ignore_index=True)

results_table

<a name=plot_final_vma></a>

## Plot Train, Test, Forecast

In [None]:
importlib.reload(plotting)

plotting.plot_train_test_fore(train=train.pd_dataframe().Confirmed, test=test[['Confirmed']].pd_dataframe(), fore=forecasted_ma[['Confirmed']].pd_dataframe(), title='Daily cases')


<a name=varma_model></a>

# VARIMA Model

<a name=varma_pq></a>

## Find order (p,d,q) of VARIMA

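Candidate orders are read off the PACF (for p) and the ACF (for q) of the first-differenced `Confirmed` series.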

In [None]:
pacf_varma_confirmed = plot_pacf(train.pd_dataframe()['Confirmed'].diff().dropna(), lags=25)
acf_varma_confirmed = plot_acf(train.pd_dataframe()['Confirmed'].diff().dropna(), lags=25)

VARIMA(8, 2)

In [None]:
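# The number of values returned by gridsearch depends on the darts version
# (newer releases also return the best score), so keep only the first two
_, order, *_ = VARIMA.gridsearch(parameters={'p': [1, 7, 8, 9], 'd': [1], 'q': [1, 7, 14]}, series=train, n_jobs=2, val_series=test)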
order

<a name=varma11></a>

## VARIMA(7, 1, 1)

In [None]:
model_varima = VARIMA(**order)
model_varima.fit(train)
forecasted_varima = model_varima.predict(len(test))
forecasted_varima.pd_dataframe()

<a name=store_varma></a>

## Store metrics

In [None]:
results_table = pd.concat([results_table, pd.DataFrame([{
    'model': 'VARIMA(7,1,1)',
    'mse': metrics.mean_squared_error(test['Confirmed'].pd_dataframe(), forecasted_varima['Confirmed'].pd_dataframe()),
    'mape': metrics.MAPE(test['Confirmed'].pd_dataframe(), forecasted_varima['Confirmed'].pd_dataframe()),
    'mae': metrics.mean_absolute_error(test['Confirmed'].pd_dataframe(), forecasted_varima['Confirmed'].pd_dataframe())
}])], ignore_index=True)

results_table

<a name=plot_final_varma></a>

## Plot Train, Test, Forecast

In [None]:
# Plot of daily cases
plotting.plot_train_test_fore(train=train.pd_dataframe().Confirmed, test=test[['Confirmed']].pd_dataframe(), fore=forecasted_varima[['Confirmed']].pd_dataframe(), title='India - Long Term', start_date='2021-03-01', figpath='../../figures/varima/india_varima.eps')


<a name=shortterm></a>

# Rolling forecasts

In [None]:
history = train.copy()
test_index = test.pd_dataframe().index
predicted = pd.DataFrame(columns=[
    'VAR_Confirmed',
    'VAR_Total_Doses',
    'VMA_Confirmed',
    'VMA_Total_Doses',
    'VARIMA_Confirmed',
    'VARIMA_Total_Doses'
], index=test_index, dtype=float)

# One-step-ahead (rolling) forecasts: refit each model on all data seen so far,
# predict the next day, then add the observed value for that day to the history.
for t in range(len(test)):
    rolling_models = {
        'VAR': VARIMA(p=9, d=1, q=0),
        'VMA': VARIMA(p=0, d=1, q=1),
        'VARIMA': VARIMA(p=7, d=1, q=1),
    }

    for name, rolling_model in rolling_models.items():
        rolling_model.fit(history)
        yhat = rolling_model.predict(n=1)
        # Assign through .loc on the frame itself; chained predicted.iloc[t][col] = ...
        # may write to a temporary copy and be lost
        predicted.loc[test_index[t], f'{name}_Confirmed'] = yhat['Confirmed'].values()[0][0]
        predicted.loc[test_index[t], f'{name}_Total_Doses'] = yhat['Total_Doses'].values()[0][0]

    history = history.append(test[t])
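
The rolling loop is a walk-forward (expanding-window) evaluation. Every forecast is only one step ahead and uses the observed values of all earlier test days, so rolling errors are typically much lower than long-term ones. The same pattern in pure Python, with a hypothetical naive drift model standing in for `VARIMA`:

```python
def walk_forward(train_values, test_values, fit_predict_one):
    """One-step-ahead (rolling) evaluation.

    For each test point: fit on everything seen so far, predict the next value,
    then append the *observed* value to the history before the next step.
    fit_predict_one(history) -> next-value forecast (hypothetical callback).
    """
    seen = list(train_values)
    forecasts = []
    for actual in test_values:
        forecasts.append(fit_predict_one(seen))
        seen.append(actual)
    return forecasts


def drift_forecast(seen):
    # Hypothetical stand-in model: last value plus the last observed change
    return seen[-1] + (seen[-1] - seen[-2])


print(walk_forward([1, 2, 4], [7, 11], drift_forecast))  # [6, 10]
```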

In [None]:
predicted

<a name=var_roll></a>

## VARI(9, 1)

In [None]:
# plotting.plot_fore_test(test=test.pd_dataframe()['Confirmed'], fore=predicted[['VAR_Confirmed']], title='VAR')

In [None]:
importlib.reload(metrics)

metrics.MAPE(test['Confirmed'].pd_dataframe(), predicted['VAR_Confirmed'])

In [None]:
results_table = pd.concat([results_table, pd.DataFrame([{
    'model': 'VARI(9,1) - rolling',
    'mse': metrics.mean_squared_error(test['Confirmed'].pd_dataframe(), predicted['VAR_Confirmed']),
    'mape': metrics.MAPE(test['Confirmed'].pd_dataframe(), predicted['VAR_Confirmed']),
    'mae': metrics.mean_absolute_error(test['Confirmed'].pd_dataframe(), predicted['VAR_Confirmed'])
}])], ignore_index=True)

results_table

In [None]:
plotting.plot_train_test_fore(train=train.pd_dataframe().Confirmed, test=test[['Confirmed']].pd_dataframe(), fore=predicted[['VAR_Confirmed']], title='India - Short Term', figpath='../../figures/vari/india_vari_rolling.eps')


<a name=vma_roll></a>

## VIMA(1, 1)

In [None]:
# plotting.plot_fore_test(test=test.pd_dataframe()['Confirmed'], fore=predicted[['VMA_Confirmed']], title='VMA')

In [None]:
results_table = pd.concat([results_table, pd.DataFrame([{
    'model': 'VIMA(1,1) - rolling',
    'mse': metrics.mean_squared_error(test['Confirmed'].pd_dataframe(), predicted[['VMA_Confirmed']]),
    'mape': metrics.MAPE(test['Confirmed'].pd_dataframe(), predicted[['VMA_Confirmed']]),
    'mae': metrics.mean_absolute_error(test['Confirmed'].pd_dataframe(), predicted[['VMA_Confirmed']])
}])], ignore_index=True)

results_table

In [None]:
# plotting.plot_train_test_fore(train=train.pd_dataframe().Confirmed, test=test[['Confirmed']].pd_dataframe(), fore=predicted[['VMA_Confirmed']], title='VIMA - Daily cases')



<a name=varma_roll></a>

## VARIMA(7,1,1)

In [None]:
results_table = pd.concat([results_table, pd.DataFrame([{
    'model': 'VARIMA(7,1,1) - rolling',
    'mse': metrics.mean_squared_error(test['Confirmed'].pd_dataframe(), predicted[['VARIMA_Confirmed']]),
    'mape': metrics.MAPE(test['Confirmed'].pd_dataframe(), predicted[['VARIMA_Confirmed']]),
    'mae': metrics.mean_absolute_error(test['Confirmed'].pd_dataframe(), predicted[['VARIMA_Confirmed']])
}])], ignore_index=True)

results_table

In [None]:
importlib.reload(metrics)
metrics.RMSE(test['Confirmed'].pd_dataframe(), predicted[['VARIMA_Confirmed']], title='VARIMA(7,1,1) - rolling')
metrics.MSE(test['Confirmed'].pd_dataframe(), predicted[['VARIMA_Confirmed']], title='VARIMA(7,1,1) - rolling')

In [None]:
plotting.plot_fore_test(test=test.pd_dataframe()['Confirmed'], fore=predicted[['VARIMA_Confirmed']], title='VARIMA')

In [None]:
importlib.reload(plotting)

plotting.plot_train_test_fore(train=train.pd_dataframe().Confirmed, test=test[['Confirmed']].pd_dataframe(), fore=predicted[['VARIMA_Confirmed']], title='India - Short Term', figpath='../../figures/varima/india_varima_rolling.eps')


<a name=final_results></a>

# Final Results

In [None]:
results_table.to_csv('var_india_results.csv')

In [None]:
results_table

In [None]:
first_mse = results_table.loc[0, 'mse']
print(f"MSE of {results_table.loc[0, 'model']} is {first_mse:.5e}")