<a href="https://colab.research.google.com/github/danielbauer1979/MSDIA_PredictiveModelingAndMachineLearning/blob/main/GB886_VII_11_VAR.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multivariate Time Series Example

To consider a multivariate time series example, let's go back to the interest rate data we introduced at the beginning of this module.

As a quick background, when governments issue treasury securities to borrow money, investors are willing to accept different interest rates depending on when the government will pay the money back (i.e., the *maturity* of the bonds).

Motivated by insights from finance that are beyond this course, let's consider a selection of the data where we only use three maturities: a short maturity of three months, a medium maturity of five years, and a long maturity of ten years.

Let's look at the data:

In [None]:
import pandas as pd
!git clone https://github.com/danielbauer1979/MSDIA_PredictiveModelingAndMachineLearning.git
dat_yields = pd.read_csv('MSDIA_PredictiveModelingAndMachineLearning/GB886_VII_11_Yields.csv')
dat_yields.head()

In [None]:
dat_yields.tail()

In [None]:
import matplotlib.pyplot as plt
plt.plot(dat_yields['date'], dat_yields['DGS3MO'], label='3 Mo')
plt.plot(dat_yields['date'], dat_yields['DGS5'], label='5 Yr')
plt.plot(dat_yields['date'], dat_yields['DGS10'], label='10 Yr')
plt.xlabel('Date')
plt.ylabel('Yield')
plt.legend()
plt.show()


So, we have data between 1993 and 2022, and the yields clearly vary quite a bit over that period.

We want to forecast the interest rates, which clearly may be helpful for banks and other financial institutions.

It appears, in line with the stock example from before, that there is a stochastic trend (this looks similar to a random walk). So, let's take a look at differences:

In [None]:
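# Plot the changes against the date column so the x-axis matches its 'Date' label
plt.plot(dat_yields['date'], dat_yields['DGS3MO'].diff(), label='3 Mo')
plt.plot(dat_yields['date'], dat_yields['DGS5'].diff(), label='5 Yr')
plt.plot(dat_yields['date'], dat_yields['DGS10'].diff(), label='10 Yr')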
plt.xlabel('Date')
plt.ylabel('Change in Yield')
plt.legend()
plt.show()

This seems more like white noise!
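
To put a number on that visual impression, one simple check is the lag-1 autocorrelation: for a random walk it is close to 1 in levels and close to 0 in differences. Here is a minimal sketch on a *synthetic* random walk (not the yield data); the helper `lag1_autocorr` is a hypothetical name introduced just for this illustration.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D array."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))

rng = np.random.default_rng(0)
shocks = rng.normal(size=2000)
walk = shocks.cumsum()       # synthetic random walk, a stand-in for a yield level
changes = np.diff(walk)      # first differences, the same operation as .diff()

acf_level = lag1_autocorr(walk)
acf_change = lag1_autocorr(changes)
print(f"level: {acf_level:.3f}, change: {acf_change:.3f}")
```

On the actual data, the same helper could be applied to, e.g., `dat_yields['DGS10'].dropna()` and `dat_yields['DGS10'].diff().dropna()`.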

Our objective is to forecast the interest rates for the next year. Again motivated by finance concepts, let's run a **Vector Autoregression** (VAR) model for the differences in the three rates:
$$
dy_t = c + A_1 \, dy_{t-1} + A_2 \, dy_{t-2} + \dots + A_p \, dy_{t-p} + \varepsilon_t
$$
where:
$$
dy_t = (dy_{1t}, dy_{2t}, dy_{3t})'
$$
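and $dy_{1t} = y_{1,t} - y_{1,t-1}$ is the change in the 3-month rate, $dy_{2t}$ is the change in the 5-year rate, and $dy_{3t}$ is the change in the 10-year rate. Here, $c$ is a vector of intercepts, each $A_i$ is a $3 \times 3$ coefficient matrix, and $\varepsilon_t$ is the vector of errors.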
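
To make the recursion concrete, here is a small sketch of a single VAR step with two lags. All numbers (`c`, `A1`, `A2`, and the lagged changes) are made up for illustration; they are not estimates from the yield data, and `var2_step` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical coefficients for a 3-variable VAR(2); NOT estimated from the yield data.
c = np.array([0.01, 0.00, -0.01])
A1 = np.array([[0.3, 0.1, 0.0],
               [0.0, 0.2, 0.1],
               [0.0, 0.1, 0.2]])
A2 = 0.1 * np.eye(3)

def var2_step(dy_prev1, dy_prev2, eps):
    """One step of dy_t = c + A1 dy_{t-1} + A2 dy_{t-2} + eps_t."""
    return c + A1 @ dy_prev1 + A2 @ dy_prev2 + eps

dy_tm1 = np.array([0.10, 0.05, 0.02])    # changes one period ago
dy_tm2 = np.array([-0.05, 0.00, 0.01])   # changes two periods ago

# With the error set to zero, this is the model's one-step-ahead point forecast.
dy_t = var2_step(dy_tm1, dy_tm2, eps=np.zeros(3))
print(dy_t)
```

Note how each component of `dy_t` depends on the lagged changes of *all* three rates through the rows of `A1` and `A2`; that cross-dependence is what distinguishes a VAR from three separate univariate models.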

We use VAR functionality from statsmodels:

In [None]:
from statsmodels.tsa.api import VAR
model = VAR(dat_yields[['DGS3MO', 'DGS5', 'DGS10']].diff().dropna())
results = model.fit(maxlags=5, ic='aic')
print(results.summary())

So, we note that we essentially get a model for each of the components. And we also note that the errors $\varepsilon$ are quite correlated!

In settling on five lags, I (somewhat informally) ran the model for different values of `maxlags` in the command above. With `ic='aic'`, statsmodels picks the lag order up to `maxlags` that has the lowest AIC, and 5 lags appeared to produce the lowest AIC:

In [None]:
results.aic

You can check this by changing `maxlags` in the fitting cell and re-running it.
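
To see what such a lag comparison does under the hood, here is a hedged NumPy sketch on *synthetic* data. It fits a VAR by least squares for each lag order and computes a textbook-style AIC (log-determinant of the residual covariance plus a parameter penalty). This is an assumption about the form of the criterion, not statsmodels' own code, so the values need not match `results.aic` exactly; `var_aic` is a hypothetical helper.

```python
import numpy as np

def var_aic(data, p, max_p):
    """AIC of a least-squares VAR(p) with intercept, skipping max_p start-up rows
    so that every lag order is compared on the same sample."""
    data = np.asarray(data, dtype=float)
    T, k = data.shape
    y = data[max_p:]                                        # targets
    lags = [data[max_p - j:T - j] for j in range(1, p + 1)]  # lag-j regressors
    X = np.hstack([np.ones((T - max_p, 1))] + lags)         # intercept + p lag blocks
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    nobs = len(y)
    sigma = resid.T @ resid / nobs                          # ML residual covariance
    _, logdet = np.linalg.slogdet(sigma)
    n_params = k * (k * p + 1)                              # k equations, k*p + 1 coefficients each
    return logdet + 2.0 * n_params / nobs

# Synthetic 3-variable VAR(1) data (made-up coefficients).
rng = np.random.default_rng(1)
A = np.array([[0.5, 0.1, 0.0],
              [0.0, 0.5, 0.1],
              [0.0, 0.0, 0.5]])
x = np.zeros((600, 3))
for t in range(1, 600):
    x[t] = A @ x[t - 1] + rng.normal(size=3)

max_p = 5
aics = {p: var_aic(x, p, max_p) for p in range(max_p + 1)}
best_p = min(aics, key=aics.get)
print(aics)
print("lowest AIC at lag order:", best_p)
```

The penalty term is why adding lags does not automatically win: each extra lag adds nine coefficients here, and the fit has to improve enough to pay for them.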

We can now use the model to generate forecasts for the yields. Let's forecast over twelve months:

In [None]:
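# forecast() starts from the most recent observed differences, one row per fitted lag
dy = dat_yields[['DGS3MO', 'DGS5', 'DGS10']].diff().dropna().values
forecast = results.forecast(dy[-results.k_ar:], steps=12)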
print(forecast)

However, we modeled the differences in yields, not the yields themselves. So let's cumulate the forecasted changes, starting from the last observed yields, to obtain forecasts of the yields:

In [None]:
last_observed = dat_yields[['DGS3MO', 'DGS5', 'DGS10']].iloc[-1]
forecasted_yields = last_observed.values + forecast.cumsum(axis=0)
print(forecasted_yields)
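
As a sanity check on this aggregation step, here is a toy sketch (made-up numbers, not the yield data) showing that adding cumulated changes to the last known level recovers the later levels exactly:

```python
import numpy as np

# Toy "yields": rows are months, columns are maturities (made-up numbers).
levels = np.array([[1.0, 2.0, 3.0],
                   [1.1, 2.1, 2.9],
                   [1.3, 2.0, 3.0],
                   [1.2, 2.2, 3.2]])

history, future = levels[:2], levels[2:]

# Changes that lead from the last "historical" month to the two "future" months.
future_changes = np.diff(levels, axis=0)[1:]

# Last known level plus cumulated changes gives the future levels back.
rebuilt = history[-1] + future_changes.cumsum(axis=0)
print(rebuilt)
```

This is the same pattern as `last_observed.values + forecast.cumsum(axis=0)` above, with forecasted changes in place of known ones.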

In [None]:
plt.plot(forecasted_yields[:, 0], label='3 Mo Forecast')
plt.plot(forecasted_yields[:, 1], label='5 Yr Forecast')
plt.plot(forecasted_yields[:, 2], label='10 Yr Forecast')
plt.xlabel('Month')
plt.ylabel('Yield')
plt.legend()
plt.show()

So, the forecasts are a little boring in that the most likely forecasts are fairly steady. However, and this is a key advantage of this multivariate time series model, we can look at the likely joint evolution of the yields going forward. For example, let's generate 10 possible scenarios:

In [None]:
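n_steps = 12           # forecast horizon in months
k_ar = results.k_ar    # lag order of the fitted model

simulations = []
for i in range(10):
  # Pass a seed to simulate_var directly so that the scenarios are reproducible.
  # The first k_ar rows it returns serve as start-up values, so we simulate
  # k_ar extra rows and drop them.
  sim = results.simulate_var(steps=k_ar + n_steps, seed=21 + i)[k_ar:]
  simulated_yields = last_observed.values + sim.cumsum(axis=0)
  simulations.append(simulated_yields)

# Plot the simulations (each scenario contributes one line per maturity)
for sim in simulations:
  plt.plot(sim)
plt.xlabel('Month')
plt.ylabel('Yield')
plt.title('Simulated Forecasted Yields')
plt.show()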


This may be helpful for our bank!!
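
With more scenarios, one common way to summarize them is a band of percentiles at each horizon. Here is a hedged sketch; to keep it self-contained it uses i.i.d. normal monthly changes as a *stand-in* for VAR simulations, and the starting yields and volatility are made-up numbers.

```python
import numpy as np

rng = np.random.default_rng(21)
n_sims, n_steps, n_series = 1000, 12, 3
start = np.array([4.0, 3.8, 3.9])    # hypothetical last observed yields

# Stand-in for VAR simulations: i.i.d. normal monthly changes (an assumption).
changes = rng.normal(scale=0.2, size=(n_sims, n_steps, n_series))
paths = start + changes.cumsum(axis=1)          # shape (n_sims, n_steps, n_series)

# 5th, 50th, and 95th percentile across scenarios, per month and maturity.
lower, median, upper = np.percentile(paths, [5, 50, 95], axis=0)
print(np.round(upper[-1] - lower[-1], 2))       # width of the 90% band after 12 months
```

With the notebook's own scenarios, `paths` could instead be built as `np.stack(simulations)`, after raising the number of simulated scenarios well above 10 so that the percentiles are meaningful.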