# VAR USA

Vector auto-regressive models

1. [Imports](#imports)
2. [Ingestion](#ingestion)
3. [Plotting](#plotting)
4. [Statistical tests](#stattests1)
5. [Differencing](#diff)


### VAR with First-Order Differencing
1. [Train test split - first order differencing](#traintest1)
2. [Find order p of VAR](#var_p_1)
3. [VARI(3,1) Model](#var8_1)
4. [Plots of first differenced predictions](#diff1_plot)
5. [Undifferencing and predicting](#undiff_1)
6. [MAPE](#mape1)
7. [Rolling forecasts](#roll1)


### VARMA
1. [VARMA](#varma)
2. [Rolling forecasts](#roll2)

<a name=setup></a>

# Setup

Install `darts` by running the command below (or run the next cell):

1. `pip install darts`

If the installation fails, check the [GitHub installation guide](https://github.com/unit8co/darts#installation-guide) or try running:

1. `conda install -c conda-forge prophet`
2. `conda install pytorch torchvision torchaudio -c pytorch`
3. `pip install darts`

In [None]:
# Comment out if already installed
# !pip install darts

<a name=imports></a>
## Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import importlib

from darts import TimeSeries
from darts.models.forecasting.varima import VARIMA

import preprocessing
import plotting
import stats_testing
import metrics

warnings.filterwarnings("ignore")

## Results table

In [None]:
results_columns = ['model', 'mse', 'mape', 'mae']
results_table = pd.DataFrame(columns=results_columns)

<a name=ingestion></a>
## Ingestion

In [None]:
daily_cases_usa = pd.read_csv('../../cleaned_datasets/usa/daily_cases_usa.csv', parse_dates=['Date'])
cum_vacc_usa = pd.read_csv('../../cleaned_datasets/usa/cum_vacc_usa_cleaned.csv', parse_dates=['Date'])

In [None]:
daily_cases_usa.dtypes

In [None]:
daily_cases_usa

In [None]:
cum_vacc_usa.dtypes

In [None]:
first_vacc = cum_vacc_usa.iloc[0].Date
last_vacc = cum_vacc_usa.iloc[-1].Date

vacc_dates = pd.date_range(start=first_vacc, end=last_vacc)

In [None]:
cases_vacc = daily_cases_usa.merge(cum_vacc_usa, how='outer', left_on='Date', right_on='Date')
cases_vacc = cases_vacc[["Date", "Confirmed", "Total_Doses"]]
cases_vacc

In [None]:
indexed = cases_vacc.set_index('Date')
preprocessing.fill_date_gaps(indexed, method='ffill', dates_range=vacc_dates)
indexed
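
`preprocessing.fill_date_gaps` is a project-local helper whose implementation is not shown in this notebook. As a rough sketch of what forward-filling missing dates over a range could look like in plain pandas (the sample values and the names `sample`, `full_range` and `filled` are made up for illustration):

```python
import pandas as pd

# Hypothetical sample with one missing day (2021-01-03)
sample = pd.DataFrame(
    {"Confirmed": [100.0, 120.0, 150.0], "Total_Doses": [10.0, 25.0, 60.0]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-04"]),
)

# Reindex onto a complete daily range, then carry the last known value forward
full_range = pd.date_range(start="2021-01-01", end="2021-01-04")
filled = sample.reindex(full_range).ffill()
print(filled)
```

Forward filling suits cumulative quantities such as `Total_Doses`; for daily counts it is only an approximation.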

In [None]:
series = TimeSeries.from_dataframe(indexed)

<a name=plotting></a>
## Plot initial data

In [None]:
plotting.plot_side_by_side(train=series.pd_dataframe())

<a name=stattests1></a>
## Statistical tests

### Johansen co-integration test

In [None]:
stats_testing.cointegration_test(series.pd_dataframe())

The Johansen test indicates that the two series are cointegrated.

### Augmented DF Test

In [None]:
# ADF test on each column
# (DataFrame.iteritems was removed in pandas 2.0; items() works in all versions)
for name, column in series.pd_dataframe().items():
    stats_testing.run_dicky_fuller(column)
    print('\n')

The time series is not stationary, so it is differenced before modelling.

<a name=diff></a>
## Differencing

### First-order differencing

In [None]:
df_diff_1 = series.pd_dataframe().diff().dropna()
df_diff_1
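
Forecasts made on differenced data have to be mapped back to the original scale. A small sketch of that inversion on made-up numbers (the names `levels`, `diffs` and `restored` are illustrative only): the cumulative sum of the differences, offset by the first observed level, recovers the original series.

```python
import numpy as np
import pandas as pd

levels = pd.Series([100.0, 130.0, 125.0, 160.0, 210.0])

# First-order differencing drops the first observation
diffs = levels.diff().dropna()

# Undifferencing: first level + running sum of the differences
restored = pd.concat([levels.iloc[:1], levels.iloc[0] + diffs.cumsum()])

print(np.allclose(restored.values, levels.values))
```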

In [None]:
plotting.plot_side_by_side(train=df_diff_1)

In [None]:
stats_testing.cointegration_test(df_diff_1)

In [None]:
# ADF test on each differenced column
# (DataFrame.iteritems was removed in pandas 2.0; items() works in all versions)
for name, column in df_diff_1.items():
    stats_testing.run_dicky_fuller(column)
    print('\n')

<a name=traintest1></a>
## Train-test split

### Train-test split - first order differenced

In [None]:
train, test = preprocessing.train_test_split(series, fraction=0.9)

plotting.plot_side_by_side(train=train.pd_dataframe(), test=test.pd_dataframe())
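
`preprocessing.train_test_split` is a project-local helper. For time series the split must preserve order, so a plausible reading is a chronological cut at the given fraction. The sketch below shows that idea on a toy frame; `chronological_split` is a hypothetical name, not the helper's actual implementation.

```python
import pandas as pd

def chronological_split(frame, fraction):
    """Keep the first `fraction` of rows for training, the rest for testing."""
    cut = int(len(frame) * fraction)
    return frame.iloc[:cut], frame.iloc[cut:]

sample = pd.DataFrame(
    {"Confirmed": range(10)},
    index=pd.date_range("2021-01-01", periods=10),
)
train_part, test_part = chronological_split(sample, fraction=0.9)
print(len(train_part), len(test_part))
```

No shuffling takes place, so every test date lies after every training date.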

<a name=var_model></a>

# VAR Model

<a name=var_p_1></a>
## Finding the best value of p for VAR(p)

Candidate lag orders are read off the partial autocorrelation (PACF) plot of the differenced `Confirmed` series.

In [None]:
from statsmodels.graphics.tsaplots import plot_pacf, plot_acf

pacf_var_confirmed = plot_pacf(df_diff_1['Confirmed'], lags=25)

Candidate values of p: 3, 6, 10

In [None]:
from darts.models.forecasting.varima import VARIMA

VARIMA.gridsearch(parameters={'p': [3, 6, 10], 'd': [1]}, series=train, n_jobs=2, val_series=test)
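
Conceptually, a grid search fits one model per parameter combination and keeps the combination with the lowest validation error. The sketch below illustrates that loop with a stand-in scoring function; `grid_search` and `toy_score` are hypothetical names and do not reproduce how `darts` implements `gridsearch`.

```python
from itertools import product

def grid_search(parameters, score):
    """Return the parameter combination with the lowest score."""
    names = list(parameters)
    best_params, best_score = None, float("inf")
    for values in product(*(parameters[name] for name in names)):
        candidate = dict(zip(names, values))
        candidate_score = score(candidate)
        if candidate_score < best_score:
            best_params, best_score = candidate, candidate_score
    return best_params, best_score

def toy_score(params):
    # Stand-in for "fit on train, measure error on validation data"
    return abs(params["p"] - 6) + params["d"]

best_params, best_score = grid_search({"p": [3, 6, 10], "d": [1]}, toy_score)
print(best_params, best_score)
```

Note that scoring candidates on the test set, as the cell above does with `val_series=test`, means the reported test metrics are no longer a fully independent estimate.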

<a name=var8_1></a>
## VARI(3,1) Model

In [None]:
model = VARIMA(p=3, d=1)
model.fit(train)
forecasted = model.predict(len(test))
forecasted.pd_dataframe()

## Store metrics

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
results_table = pd.concat([results_table, pd.DataFrame([{
    'model': 'VARI(3,1)',
    'mse': metrics.mean_squared_error(test['Confirmed'].pd_dataframe(), forecasted['Confirmed'].pd_dataframe()),
    'mape': metrics.MAPE(test['Confirmed'].pd_dataframe(), forecasted['Confirmed'].pd_dataframe()),
    'mae': metrics.mean_absolute_error(test['Confirmed'].pd_dataframe(), forecasted['Confirmed'].pd_dataframe())
}])], ignore_index=True)

results_table
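
The `metrics` module is project-local and not shown here. For reference, the three error measures stored in the table can be sketched with NumPy as follows, using the standard definitions; the function names and sample values are illustrative and may differ from the module's own.

```python
import numpy as np

def mse(actual, forecast):
    """Mean squared error."""
    return np.mean((np.asarray(actual) - np.asarray(forecast)) ** 2)

def mae(actual, forecast):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(actual) - np.asarray(forecast)))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Undefined if any actual value is 0."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.mean(np.abs((actual - forecast) / actual)) * 100

y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

MAPE is scale-free, which makes it convenient for comparing countries, while MSE and MAE stay in the units of daily cases.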

<a name=plot_final_var></a>

## Plot Train, Test, Forecast

In [None]:
importlib.reload(plotting)
# Plot of daily cases

plotting.plot_train_test_fore(train=train.pd_dataframe().Confirmed, test=test[['Confirmed']].pd_dataframe(), fore=forecasted[['Confirmed']].pd_dataframe(), title='USA', start_date='2021-03-01', figpath='../../figures/vari/usa_vari.eps')


# VMA Model

In [None]:
acf_varma_confirmed = plot_acf(train['Confirmed'].pd_dataframe().diff().dropna(), lags=25)

Candidate values of q: 2, 6, 7, 9, 13, 14

In [None]:
VARIMA.gridsearch(parameters={'p': [0], 'd': [1], 'q': [2,6,7]}, series=train, n_jobs=2, val_series=test)

## VIMA(1,7)

In [None]:
model_ma = VARIMA(p=0, d=1, q=7)
model_ma.fit(train)
forecasted_ma = model_ma.predict(len(test))
forecasted_ma.pd_dataframe()

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
results_table = pd.concat([results_table, pd.DataFrame([{
    'model': 'VIMA(1,7)',
    'mse': metrics.mean_squared_error(test['Confirmed'].pd_dataframe(), forecasted_ma['Confirmed'].pd_dataframe()),
    'mape': metrics.MAPE(test['Confirmed'].pd_dataframe(), forecasted_ma['Confirmed'].pd_dataframe()),
    'mae': metrics.mean_absolute_error(test['Confirmed'].pd_dataframe(), forecasted_ma['Confirmed'].pd_dataframe())
}])], ignore_index=True)

results_table

<a name=plot_final_vma></a>

## Plot Train, Test, Forecast

In [None]:
plotting.plot_train_test_fore(train=train.pd_dataframe().Confirmed, test=test[['Confirmed']].pd_dataframe(), fore=forecasted_ma[['Confirmed']].pd_dataframe(), title='Daily cases')


<a name=varma></a>

## VARIMA

In [None]:
pacf_varma_confirmed = plot_pacf(train.pd_dataframe()['Confirmed'].diff().dropna(), lags=25)
acf_varma_confirmed = plot_acf(train.pd_dataframe()['Confirmed'].diff().dropna(), lags=25)

In [None]:
_, order = VARIMA.gridsearch(parameters={'p': [3, 6, 8, 9], 'd': [1], 'q': [2, 6, 7]}, series=train, n_jobs=2, val_series=test)
order

<a name=varma11></a>

## VARIMA(3, 1, 6)

In [None]:
model_varima = VARIMA(**order)
model_varima.fit(train)
forecasted_varima = model_varima.predict(len(test))
forecasted_varima.pd_dataframe()

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
results_table = pd.concat([results_table, pd.DataFrame([{
    'model': 'VARIMA(3,1,6)',
    'mse': metrics.mean_squared_error(test['Confirmed'].pd_dataframe(), forecasted_varima['Confirmed'].pd_dataframe()),
    'mape': metrics.MAPE(test['Confirmed'].pd_dataframe(), forecasted_varima['Confirmed'].pd_dataframe()),
    'mae': metrics.mean_absolute_error(test['Confirmed'].pd_dataframe(), forecasted_varima['Confirmed'].pd_dataframe())
}])], ignore_index=True)

results_table

<a name=plot_final_varma></a>

## Plot Train, Test, Forecast

In [None]:
# Plot of daily cases
plotting.plot_train_test_fore(train=train.pd_dataframe().Confirmed, test=test[['Confirmed']].pd_dataframe(), fore=forecasted_varima[['Confirmed']].pd_dataframe(), title='USA', start_date='2021-03-01', figpath='../../figures/varima/usa_varima.eps')


<a name=roll2></a>

## Rolling Forecasts

In [None]:
history = train.copy()
# One row per test date; float dtype so the metric functions receive numeric data
predicted = pd.DataFrame(columns=[
    'VAR_Confirmed',
    'VAR_Total_Doses',
    'VMA_Confirmed',
    'VMA_Total_Doses',
    'VARIMA_Confirmed',
    'VARIMA_Total_Doses'
], index=test.pd_dataframe().index, dtype=float)


# Walk forward through the test set: refit on everything seen so far,
# predict one step ahead, then add the true observation to the history
for t in range(len(test)):

    # 3 models
    var = VARIMA(p=3, d=1, q=0)
    vma = VARIMA(p=0, d=1, q=7)
    varima = VARIMA(p=3, d=1, q=6)

    var.fit(history)
    vma.fit(history)
    varima.fit(history)

    # One-step-ahead predictions
    yhat_var = var.predict(n=1)
    yhat_vma = vma.predict(n=1)
    yhat_varima = varima.predict(n=1)

    # Write with a single .loc call; chained predicted.iloc[t][col] = ...
    # may assign to a temporary copy and leave the table unchanged
    row = predicted.index[t]

    # Confirmed cases
    predicted.loc[row, 'VAR_Confirmed'] = yhat_var['Confirmed'].values()[0][0]
    predicted.loc[row, 'VMA_Confirmed'] = yhat_vma['Confirmed'].values()[0][0]
    predicted.loc[row, 'VARIMA_Confirmed'] = yhat_varima['Confirmed'].values()[0][0]

    # Total doses
    predicted.loc[row, 'VAR_Total_Doses'] = yhat_var['Total_Doses'].values()[0][0]
    predicted.loc[row, 'VMA_Total_Doses'] = yhat_vma['Total_Doses'].values()[0][0]
    predicted.loc[row, 'VARIMA_Total_Doses'] = yhat_varima['Total_Doses'].values()[0][0]

    history = history.append(test[t])
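
The rolling (walk-forward) pattern in the cell above is independent of the model used. The sketch below isolates it with a deliberately simple stand-in forecaster so it runs without `darts`; `drift_forecast` and `walk_forward` are hypothetical names and the numbers are made up.

```python
import numpy as np

def drift_forecast(history):
    """Stand-in one-step forecaster: last value plus the mean past change."""
    history = np.asarray(history, dtype=float)
    return history[-1] + np.mean(np.diff(history))

def walk_forward(train, test, forecaster):
    """Predict each test point from all data observed before it."""
    history = list(train)
    predictions = []
    for actual in test:
        predictions.append(forecaster(history))  # forecast before seeing `actual`
        history.append(actual)                   # then reveal the true value
    return np.array(predictions)

train_values = [10.0, 12.0, 14.0, 16.0]
test_values = [18.0, 20.0, 22.0]
rolling_predictions = walk_forward(train_values, test_values, drift_forecast)
print(rolling_predictions)
```

Because each prediction only looks one step ahead and uses the latest observations, rolling errors are typically lower than those of a single multi-step forecast over the whole test window.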

<a name=plot_final_varma></a>

## Plot Train, Test, Forecast

In [None]:
# Plot of daily cases
plotting.plot_train_test_fore(train=train.pd_dataframe().Confirmed, test=test[['Confirmed']].pd_dataframe(), fore=forecasted_varima[['Confirmed']].pd_dataframe(), title='Daily cases')


In [None]:
predicted

## VARI(3,1)

In [None]:
plotting.plot_fore_test(test=test.pd_dataframe()['Confirmed'], fore=predicted[['VAR_Confirmed']], title='VAR')


In [None]:
# The rolling VAR model above is fitted with p=3, d=1
# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
results_table = pd.concat([results_table, pd.DataFrame([{
    'model': 'VARI(3,1) - rolling',
    'mse': metrics.mean_squared_error(test['Confirmed'].pd_dataframe(), predicted['VAR_Confirmed']),
    'mape': metrics.MAPE(test['Confirmed'].pd_dataframe(), predicted['VAR_Confirmed']),
    'mae': metrics.mean_absolute_error(test['Confirmed'].pd_dataframe(), predicted['VAR_Confirmed'])
}])], ignore_index=True)

results_table

In [None]:
plotting.plot_train_test_fore(train=train.pd_dataframe().Confirmed, test=test[['Confirmed']].pd_dataframe(), fore=predicted[['VAR_Confirmed']], title='USA', start_date='2021-03-01', figpath='../../figures/vari/usa_vari_rolling.eps')


## VIMA(1,7)

In [None]:
plotting.plot_fore_test(test=test.pd_dataframe()['Confirmed'], fore=predicted[['VMA_Confirmed']], title='VMA')


In [None]:
# The rolling VMA model above is fitted with d=1, q=7
# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
results_table = pd.concat([results_table, pd.DataFrame([{
    'model': 'VIMA(1,7) - rolling',
    'mse': metrics.mean_squared_error(test['Confirmed'].pd_dataframe(), predicted[['VMA_Confirmed']]),
    'mape': metrics.MAPE(test['Confirmed'].pd_dataframe(), predicted[['VMA_Confirmed']]),
    'mae': metrics.mean_absolute_error(test['Confirmed'].pd_dataframe(), predicted[['VMA_Confirmed']])
}])], ignore_index=True)

results_table

In [None]:
importlib.reload(plotting)

plotting.plot_train_test_fore(train=train.pd_dataframe().Confirmed, test=test[['Confirmed']].pd_dataframe(), fore=predicted[['VMA_Confirmed']], title='VMA - Daily cases')




## VARIMA(3,1,6)

In [None]:
# The rolling VARIMA model above is fitted with p=3, d=1, q=6
# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
results_table = pd.concat([results_table, pd.DataFrame([{
    'model': 'VARIMA(3,1,6) - rolling',
    'mse': metrics.mean_squared_error(test['Confirmed'].pd_dataframe(), predicted[['VARIMA_Confirmed']]),
    'mape': metrics.MAPE(test['Confirmed'].pd_dataframe(), predicted[['VARIMA_Confirmed']]),
    'mae': metrics.mean_absolute_error(test['Confirmed'].pd_dataframe(), predicted[['VARIMA_Confirmed']])
}])], ignore_index=True)

results_table

In [None]:
plotting.plot_fore_test(test=test.pd_dataframe()['Confirmed'], fore=predicted[['VARIMA_Confirmed']], title='VARIMA')



In [None]:
plotting.plot_train_test_fore(train=train.pd_dataframe().Confirmed, test=test[['Confirmed']].pd_dataframe(), fore=predicted[['VARIMA_Confirmed']], title='USA', start_date='2021-03-01', figpath='../../figures/varima/usa_varima_rolling.eps')


In [None]:
results_table