<h2>
Backtesting a Pairs Trading Strategy
</h2>
<p>
This notebook is a sequel to the notebook Exploratory Statistics of Pairs Trading
(https://github.com/IanLKaplan/pairs_trading/blob/master/pairs_trading.ipynb). The previous notebook explores the algorithms for selecting
pairs and the statistics of pairs trading. This statistical exploration provides the foundation for the strategy that is backtested in
this notebook. For a discussion of pairs trading, the algorithms used to select pairs, and the background for the strategy that is
backtested in this notebook, please see the previous notebook.
</p>
<h3>
Pairs Trading Strategy
</h3>
<h4>
In-sample and out-of-sample time periods
</h4>
<ul>
<li>
<p>
In-sample period: six months (126 trading days)
</p>
</li>
<li>
<p>
Out-of-sample (trading) period: three months (63 trading days)
</p>
</li>
</ul>
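<p>
The two periods can be laid out as rolling index windows over the price history. The sketch below is only an illustration: the function name
<code>rolling_windows</code> is hypothetical, and it assumes that the window advances by one out-of-sample period at a time, so that the trading
periods are contiguous and do not overlap.
</p>

```python
from typing import Iterator, Tuple

IN_SAMPLE_DAYS = 126      # six months of trading days
OUT_OF_SAMPLE_DAYS = 63   # three months of trading days


def rolling_windows(num_days: int,
                    in_sample: int = IN_SAMPLE_DAYS,
                    out_of_sample: int = OUT_OF_SAMPLE_DAYS
                    ) -> Iterator[Tuple[range, range]]:
    """Yield (in-sample, out-of-sample) row index ranges over num_days rows.

    Assumption (not stated in the text): the window steps forward by one
    out-of-sample period, so trading periods never overlap.
    """
    start = 0
    while start + in_sample + out_of_sample <= num_days:
        split = start + in_sample
        yield range(start, split), range(split, split + out_of_sample)
        start += out_of_sample


# One year of data (252 trading days) yields two complete windows.
windows = list(rolling_windows(252))
for in_sample_idx, out_of_sample_idx in windows:
    print(in_sample_idx, out_of_sample_idx)
```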
<h4>
Strategy
</h4>
<p>
For each in-sample period:
</p>
<ol>
<li>
Get pairs for each S&P 500 industry sector
</li>
<li>
Select the pairs with close price series correlation greater than or equal to 0.75
</li>
<li>
From the high-correlation pairs, select those that show cointegration under the Engle-Granger test
</li>
<li>
Sort the selected pairs by the volatility (standard deviation) of their spread time series, from high to low.
Pairs whose spread has high volatility are more likely to be profitable.
</li>
<li>
Select the M most volatile pairs.
</li>
<li>
Remove pairs that share a stock with another pair, so that each stock appears in only one pair
</li>
<li>
Select N pairs from the unique pair list
</li>
</ol>
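<p>
The last four steps (ranking by spread volatility, keeping the top M, removing pairs that reuse a stock, and selecting N pairs) can be sketched
as follows. This is a minimal illustration, not the notebook's implementation. The function name <code>select_pairs</code> and the ticker symbols
are made up, the spread standard deviations are assumed to have been calculated already, and two choices are assumptions: when two pairs share a
stock the more volatile pair is kept, and the N pairs are taken from the front of the unique list.
</p>

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def select_pairs(spread_stdev: Dict[Pair, float],
                 top_m: int,
                 n_pairs: int) -> List[Pair]:
    """Rank pairs by spread volatility, keep the top M, drop pairs that
    reuse a stock, and return at most N pairs."""
    # High to low volatility, truncated to the M most volatile pairs.
    ranked = sorted(spread_stdev, key=spread_stdev.get, reverse=True)[:top_m]
    used: Set[str] = set()
    unique: List[Pair] = []
    for stock_a, stock_b in ranked:
        # Assumption: the more volatile pair wins when a stock repeats.
        if stock_a in used or stock_b in used:
            continue
        unique.append((stock_a, stock_b))
        used.update((stock_a, stock_b))
    # Assumption: "select N pairs" means the first N of the unique list.
    return unique[:n_pairs]


# Made-up symbols and spread standard deviations, for illustration only.
candidates = {
    ('AAA', 'BBB'): 4.2,
    ('AAA', 'CCC'): 3.9,
    ('DDD', 'EEE'): 3.1,
    ('FFF', 'GGG'): 2.5,
    ('EEE', 'HHH'): 1.0,
}
selected = select_pairs(candidates, top_m=4, n_pairs=2)
print(selected)
```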
<h4>
Out-of-sample trading period
</h4>
<p>
This is not an academic exercise.  The pairs trading backtest is intended to be as close to actual trading as possible.
This backtest is intended to help understand whether this strategy is worth pursuing for actual trading.
</p>
<p>
At the start date of the backtest, there is an initial investment (e.g., $100,000). At the end of each trading period,
all positions are closed. The resulting cash is used in the next trading period.
</p>
<p>
For each pair (in the N pair set) in the out-of-sample trading period:
</p>
<ol>
<li>
Calculate the spread value for the current trading day.
</li>
<li>
If the spread value has returned to the in-sample mean and there is an open long/short position, close the position and update profit and loss.
"Return to mean" is complicated by the fact that the mean may be overshot.
</li>
<li>
If there is no open position for the pair and the spread value is above or below the mean by more than the trading threshold
(e.g., 0.75 times the spread standard deviation), open a long/short position.
</li>
<li>
If the end of the trading period is reached, close all open positions and update the profit and loss.
</li>
</ol>
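<p>
The daily logic for a single pair can be sketched as a small state machine. This is an illustration under stated assumptions, not the notebook's
code: the function name <code>trade_signals</code> and the action labels are hypothetical, the spread series is made up, and the overshoot
problem is handled by closing as soon as the spread reaches or crosses the mean rather than waiting for it to equal the mean.
</p>

```python
from typing import List, Optional, Tuple


def trade_signals(spread: List[float], mean: float, stdev: float,
                  delta: float = 0.75) -> List[Tuple[int, str]]:
    """Return (day index, action) events for one pair in a trading period.

    mean and stdev are the spread statistics used for the thresholds.
    """
    upper = mean + delta * stdev
    lower = mean - delta * stdev
    position: Optional[str] = None  # 'short_spread', 'long_spread' or None
    events: List[Tuple[int, str]] = []
    for day, value in enumerate(spread):
        # Close when the spread has reached or overshot the mean.
        if position == 'short_spread' and value <= mean:
            events.append((day, 'close'))
            position = None
        elif position == 'long_spread' and value >= mean:
            events.append((day, 'close'))
            position = None
        elif position is None:
            if value >= upper:
                position = 'short_spread'
                events.append((day, 'open_short_spread'))
            elif value <= lower:
                position = 'long_spread'
                events.append((day, 'open_long_spread'))
    # End of the trading period: close anything still open.
    if position is not None:
        events.append((len(spread) - 1, 'close'))
    return events


# Made-up spread values with mean 0 and standard deviation 1.
example_spread = [0.0, 1.0, 0.4, -0.1, -0.9, -0.2]
events = trade_signals(example_spread, mean=0.0, stdev=1.0)
print(events)
```

<p>
In this sketch a position is never reopened on the day it is closed, which keeps each day to at most one action per pair.
</p>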
<p>
Positions are opened in whole shares.
</p>
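<p>
Whole-share sizing means rounding each leg down to an integer share count. The sketch below assumes, purely for illustration, that the capital
allocated to a pair is split equally in dollars between the long and the short leg. The function name <code>whole_share_position</code> and the
prices are hypothetical, and margin requirements for the short leg are ignored.
</p>

```python
import math
from typing import Tuple


def whole_share_position(capital: float,
                         price_long: float,
                         price_short: float) -> Tuple[int, int]:
    """Return (long shares, short shares) for one pair.

    Assumption: capital is split equally in dollars between the two legs,
    and each leg is rounded down to whole shares.
    """
    half = capital / 2
    return math.floor(half / price_long), math.floor(half / price_short)


# Made-up prices, for illustration only.
long_shares, short_shares = whole_share_position(10_000.0, 48.0, 131.0)
print(long_shares, short_shares)
```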
<p>
The results of the backtest should provide the following statistics:
</p>
<h4>
Trading Period Statistics
</h4>
<ol>
<li>
Position for each trade and P/L for each trade.
</li>
<li>
Return for each pair in the trading period
</li>
<li>
Overall return for the trading period
</li>
<li>
Standard deviation for the trading period
</li>
<li>
Number of pairs that had a profit and number of pairs that had a loss
</li>
<li>
Maximum drawdown for the trading period
</li>
</ol>
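<p>
Several of these statistics can be calculated from the daily portfolio value over the trading period. The following is a minimal sketch that
uses only the Python standard library. The function names and the portfolio values are made up for illustration, and the standard deviation is
assumed to be that of the daily returns.
</p>

```python
import statistics
from typing import List


def period_return(values: List[float]) -> float:
    """Simple return from the first to the last portfolio value."""
    return values[-1] / values[0] - 1


def daily_returns(values: List[float]) -> List[float]:
    """Simple day-over-day returns."""
    return [curr / prev - 1 for prev, curr in zip(values, values[1:])]


def max_drawdown(values: List[float]) -> float:
    """Largest peak-to-trough decline, as a negative fraction (0.0 if none)."""
    peak = values[0]
    worst = 0.0
    for value in values:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1)
    return worst


# Made-up daily portfolio values, for illustration only.
portfolio = [100.0, 110.0, 99.0, 105.0, 120.0]
ret = period_return(portfolio)
stdev = statistics.stdev(daily_returns(portfolio))
drawdown = max_drawdown(portfolio)
print(ret, stdev, drawdown)
```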
<h4>
Yearly Results
</h4>
<ol>
<li>
Yearly return
</li>
<li>
Yearly standard deviation
</li>
<li>
Yearly maximum drawdown
</li>
<li>
Sharpe Ratio
</li>
<li>
VaR and CVaR
</li>
</ol>
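<p>
The Sharpe ratio, VaR and CVaR can be estimated from a series of daily returns. The sketch below is one possible approach under stated
assumptions, not the notebook's method: the Sharpe ratio is annualized with the square root of 252 trading days, VaR and CVaR use the historical
(empirical) method, and the function names and return values are made up.
</p>

```python
import math
import statistics
from typing import List, Tuple


def sharpe_ratio(returns: List[float],
                 risk_free_annual: float = 0.0,
                 trading_days: int = 252) -> float:
    """Annualized Sharpe ratio from daily returns."""
    daily_rf = risk_free_annual / trading_days
    excess = [r - daily_rf for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(trading_days)


def historical_var_cvar(returns: List[float],
                        level: float = 0.05) -> Tuple[float, float]:
    """Historical VaR and CVaR at the given tail level.

    VaR is the best return within the worst `level` fraction of days;
    CVaR is the mean return over that tail. Both are returns (negative
    values are losses).
    """
    ordered = sorted(returns)
    tail_size = max(1, int(len(ordered) * level))
    tail = ordered[:tail_size]
    return tail[-1], statistics.mean(tail)


# Made-up daily returns, for illustration only.
sample_returns = [-0.05, -0.03, -0.02, -0.01, 0.0, 0.01, 0.01, 0.02, 0.02, 0.03]
sharpe = sharpe_ratio(sample_returns)
var, cvar = historical_var_cvar(sample_returns, level=0.2)
print(sharpe, var, cvar)
```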
<h4>
Data structures
</h4>
<ul>
<li>
Pairs list for the trading period
</li>
<li>
Current trading capital balance
</li>
<li>
Trade position and P/L for each trade in the trading period.
</li>
<li>
Quarterly and yearly statistics. Once the statistics are calculated, the trade position data can be discarded.
</li>
</ul>
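<p>
One way to hold this state is a pair of small data classes. The class names, field names and values below are hypothetical and are meant only
to show how the pairs list, the capital balance and the per-trade records relate to each other.
</p>

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Pair = Tuple[str, str]


@dataclass
class Trade:
    """Position and profit/loss for one round-trip trade."""
    pair: Pair
    open_day: int
    close_day: int
    long_shares: int
    short_shares: int
    profit_loss: float


@dataclass
class TradingPeriod:
    """State for one out-of-sample trading period."""
    pairs: List[Pair]
    starting_capital: float
    trades: List[Trade] = field(default_factory=list)

    def ending_capital(self) -> float:
        """Capital carried into the next trading period."""
        return self.starting_capital + sum(t.profit_loss for t in self.trades)


# Made-up values, for illustration only.
period = TradingPeriod(pairs=[('AAA', 'BBB')], starting_capital=100_000.0)
period.trades.append(Trade(('AAA', 'BBB'), 3, 17, 104, 38, 250.0))
period.trades.append(Trade(('AAA', 'BBB'), 25, 40, 100, 37, -100.0))
print(period.ending_capital())
```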

In [None]:
import os
from datetime import datetime
from multiprocessing import Pool
from typing import List, Tuple, Dict

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
from numpy import log
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.vector_ar.vecm import coint_johansen
from tabulate import tabulate

#
# Local libraries
#
from coint_analysis.coint_analysis_result import CointAnalysisResult, CointInfo
from coint_data_io.coint_matrix_io import CointMatrixIO
from plot_ts.plot_time_series import plot_ts, plot_two_ts
from read_market_data.MarketData import MarketData

from s_and_p_filter import s_and_p_directory, s_and_p_stock_file
s_and_p_file = s_and_p_directory + os.path.sep + s_and_p_stock_file

start_date_str = '2007-01-03'
start_date: datetime = datetime.fromisoformat(start_date_str)

trading_days = 252  # trading days in a year
half_year = trading_days // 2  # in-sample period: 126 trading days
quarter = trading_days // 4  # out-of-sample period: 63 trading days
