![QuantConnect Logo](https://cdn.quantconnect.com/web/i/icon.png)
<hr>

### Kalman Filters and Pairs Trading

Several Python packages implement Kalman filters, but here we adapt the example and the Kalman filter class code from [this article](https://www.quantstart.com/articles/kalman-filter-based-pairs-trading-strategy-in-qstrader) to show how you can implement similar ideas using QuantConnect.

Briefly, a Kalman filter is a [state-space model](https://en.wikipedia.org/wiki/State-space_representation) applicable to linear dynamic systems: systems whose state changes over time and whose state transitions are linear. The model estimates the unknown state of a variable from a series of past observations. The procedure has two steps. First, the filter predicts the current state of the variable along with the uncertainty of that prediction. Then, when new data becomes available, it updates both estimates. A great deal has been written about Kalman filters and their applications are remarkably varied, but for now we're going to use one to estimate the hedge ratio between a pair of equities.
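
The predict-then-update cycle can be sketched in one dimension. This is a minimal illustration, not the filter used later in this notebook: the function name `kalman_1d`, the noise variances, and the synthetic data are all assumptions chosen for the example.

```python
import numpy as np

def kalman_1d(observations, process_var=1e-5, obs_var=0.1 ** 2):
    """Track a slowly drifting scalar state from noisy observations."""
    estimate, variance = 0.0, 1.0  # initial guess and its uncertainty
    estimates = []
    for z in observations:
        # Predict: a random-walk state keeps its value, but uncertainty grows.
        variance += process_var
        # Update: blend the prediction with the new observation.
        gain = variance / (variance + obs_var)
        estimate += gain * (z - estimate)
        variance *= 1.0 - gain
        estimates.append(estimate)
    return np.array(estimates), variance

# Synthetic data (hypothetical): a constant value of 0.9 observed with noise.
rng = np.random.default_rng(0)
true_value = 0.9
noisy = true_value + rng.normal(0.0, 0.1, size=200)
estimates, final_variance = kalman_1d(noisy)
print(estimates[-1], final_variance)
```

The gain decides how much weight each new observation gets: it is close to 1 while the filter is uncertain and shrinks as the estimate settles.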

The idea behind the strategy is pretty straightforward: take two equities that are cointegrated and create a long-short portfolio. The premise is that the spread between the values of our two positions should be mean-reverting. Whenever the spread deviates from its expected value, one of the assets has moved in an unexpected direction and is due to revert. When the spread diverges, you can take advantage of this by going long or short on the spread.

To illustrate, imagine you have a long position in AAPL worth \\$2000 and a short position in IBM worth \\$2000. This gives you a net spread of \\$0. Because you expect AAPL and IBM to move together, if the spread rises significantly above \\$0, you would short the spread in the expectation that it will return to \\$0, its natural equilibrium. Similarly, if the spread drops significantly below \\$0, you would go long the spread and capture the profit as its value returns to \\$0. In our application, the Kalman filter tracks the hedge ratio between our equities so that the portfolio value stays stationary, which means it will continue to exhibit mean-reverting behavior.
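
The arithmetic in this illustration can be written out as a small sketch. The function name `spread_signal` and the \\$50 threshold are hypothetical, chosen only to make "significantly above or below" concrete.

```python
def spread_signal(long_value, short_value, threshold):
    """Return the spread and a naive action for a dollar-neutral pair."""
    spread = long_value - short_value
    if spread > threshold:
        return spread, "short the spread"
    if spread < -threshold:
        return spread, "long the spread"
    return spread, "hold"

# Position values in dollars; the 50-dollar threshold is an assumption.
print(spread_signal(2000.0, 2000.0, 50.0))  # (0.0, 'hold')
print(spread_signal(2100.0, 1980.0, 50.0))  # (120.0, 'short the spread')
print(spread_signal(1950.0, 2040.0, 50.0))  # (-90.0, 'long the spread')
```

A fixed dollar threshold only works while the relationship between the two assets is stable, which is why the rest of this notebook estimates the hedge ratio dynamically.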

##### Note: Run the final cell first so the remaining cells will execute

In [2]:
# QuantBook Analysis Tool 
# For more information see [https://www.quantconnect.com/docs/research/overview]
import numpy as np
from datetime import datetime
from math import floor
import matplotlib.pyplot as plt
from KalmanFilter import KalmanFilter

qb = QuantBook()
symbols = [qb.AddEquity(x).Symbol for x in ['VIA', 'VIAB']]

Now, we initialize the Kalman Filter, grab our data, and then run the Kalman Filter update process over the data.

In [3]:
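kf = KalmanFilter()
# Daily history for both symbols over the first trading days of 2019
history = qb.History(qb.Securities.Keys, datetime(2019, 1, 1), datetime(2019, 1, 11), Resolution.Daily)
# Reshape to one row per date and one column per symbol, holding closing prices
prices = history.unstack(level=1).close.transpose()
for index, row in prices.iterrows():
    via = row.loc[str(symbols[0].ID)]
    viab = row.loc[str(symbols[1].ID)]
    # VIA is the regressor and VIAB is the observed value in the filter
    forecast_error, prediction_std_dev, hedge_quantity = kf.update(via, viab)
    print(f'{forecast_error} :: {prediction_std_dev} :: {hedge_quantity}')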

[26.19257561] :: [[0.03162278]] :: 0
[25.84020912] :: [[0.28741791]] :: 1786
[0.13404521] :: [[0.29639079]] :: 1795
[-0.42714423] :: [[0.31514705]] :: 1768
[0.24662073] :: [[0.31807877]] :: 1783
[0.38152379] :: [[0.31687043]] :: 1807
[0.12279125] :: [[0.31913823]] :: 1815


In an algorithm, <em>kf.qty</em> is the number of shares to invest in VIAB, and <em>hedge_quantity</em> is the number of VIA shares to trade in the opposite direction.
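
One way to turn these outputs into trades is to compare the forecast error with the prediction standard deviation. The rule below is an assumption for illustration, not something this notebook defines: the function name `spread_action`, the one-standard-deviation bands, and the convention that "long" means buying VIAB and selling VIA are all hypothetical.

```python
import numpy as np

def spread_action(forecast_error, prediction_std_dev, invested=None):
    """Map one Kalman update to a hypothetical action.

    invested is None (flat), "long", or "short".
    """
    # update() returns small arrays, so reduce them to plain floats.
    e = float(np.squeeze(forecast_error))
    s = float(np.squeeze(prediction_std_dev))
    if invested is None:
        if e < -s:
            return "enter long"   # buy kf.qty VIAB, sell hedge_quantity VIA
        if e > s:
            return "enter short"  # sell kf.qty VIAB, buy hedge_quantity VIA
    elif invested == "long" and e > -s:
        return "exit"
    elif invested == "short" and e < s:
        return "exit"
    return "hold"

# Values copied from rows of the printed output above.
print(spread_action(np.array([-0.42714423]), np.array([[0.31514705]])))
print(spread_action(np.array([0.13404521]), np.array([[0.29639079]])))
```

The first two rows of the printed output show very large forecast errors because the state estimate starts at zero, so a rule like this would probably need to skip a warm-up period before acting.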

##### Code for the Kalman Filter

In [1]:
import numpy as np
from math import floor

class KalmanFilter:
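    def __init__(self):
        # delta controls how quickly the hedge ratio and intercept may drift
        self.delta = 1e-4
        # Covariance of the state (system) noise
        self.wt = self.delta / (1 - self.delta) * np.eye(2)
        # Variance of the observation noise
        self.vt = 1e-3
        # State estimate: [hedge ratio, intercept]
        self.theta = np.zeros(2)
        # Posterior state covariance C_t
        self.C = np.zeros((2, 2))
        # Prior state covariance R_t, set on the first update
        self.R = None
        # Number of shares of the second asset to trade
        self.qty = 2000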

    def update(self, price_one, price_two):
        # Create the observation matrix from the latest price
        # of the first asset and the intercept value (1.0)
        F = np.asarray([price_one, 1.0]).reshape((1, 2))
        y = price_two

        # The prior value of the states \theta_t is
        # distributed as a multivariate Gaussian with
        # mean a_t and variance-covariance R_t
        if self.R is not None:
            self.R = self.C + self.wt
        else:
            self.R = np.zeros((2, 2))

        # Calculate the Kalman Filter update
        # ----------------------------------
        # Calculate prediction of new observation
        # as well as forecast error of that prediction
        yhat = F.dot(self.theta)
        et = y - yhat

        # Q_t is the variance of the prediction of
        # observations and hence \sqrt{Q_t} is the
        # standard deviation of the predictions
        Qt = F.dot(self.R).dot(F.T) + self.vt
        sqrt_Qt = np.sqrt(Qt)

        # The posterior value of the states \theta_t is
        # distributed as a multivariate Gaussian with mean
        # m_t and variance-covariance C_t
        At = self.R.dot(F.T) / Qt
        self.theta = self.theta + At.flatten() * et
        self.C = self.R - At * F.dot(self.R)
        # Shares of the first asset that hedge self.qty shares of the second
        hedge_quantity = int(floor(self.qty*self.theta[0]))
        
        return et, sqrt_Qt, hedge_quantity