<h2>
Pairs Trading
</h2>
<p>
This document is available on the GitHub repository https://github.com/IanLKaplan/pairs_trading
</p>
<blockquote>
<p>
Pairs trading is an approach that takes advantage of the
mispricing between two (or more) co-moving assets, by
taking a long position in one(many) and shorting the
other(s), betting that the relationship will hold and that
prices will converge back to an equilibrium level.
</p>
<p>
<i>Definitive Guide to Pairs Trading</i> available from <a href="https://hudsonthames.org/">Hudson and Thames</a>
</p>
</blockquote>
<p>
Pairs trading is a statistical arbitrage trading strategy.
</p>
<blockquote>
<p>
Statistical arbitrage and pairs trading tries to solve this problem using price relativity. If two assets share the same
characteristics and risk exposures, then we can assume that their behavior would be similar as well. This has
the benefit of not having to estimate the intrinsic value of an asset but rather just if it is under or overvalued
relative to a peer(s). We only have to focus on the relationship between the two, and if the spread happens
to widen, it could be that one of the securities is overpriced, the other is underpriced, or the mispricing is a
combination of both.
</p>
<p>
<i>Definitive Guide to Pairs Trading</i> available from <a href="https://hudsonthames.org/">Hudson and Thames</a>
</p>
</blockquote>
<p>
Pairs trading algorithms have been reported to yield portfolios with Sharpe ratios in excess of 1.0 and returns of 10% or
higher. Pairs trading takes both long and short positions, so the portfolio tends to be market neutral. A pairs trading portfolio
can have drawdowns, but the drawdowns are much smaller than those of a benchmark like the S&P 500.
</p>
<p>
Markets tend toward efficiency and many quantitative approaches fade over time as they are adopted by hedge funds. Pairs trading
goes back to the mid-1980s. The approach still seems to be profitable. The reason for this could be that there are a vast
number of possible pairs and pairs portfolios tend to be fairly small (5 to 20 pairs, in most cases). This may always
leave unexploited pairs in the market.
</p>
<p>
This Python notebook investigates pairs trading algorithms in an attempt to reproduce the reported results.
</p>
<h3>
Overview
</h3>
<p>
Pairs trading algorithms attempt to identify pairs of stocks that have mean reverting behavior. Mean reversion takes place
around a common mean between the two stocks. The strategy is based on the idea that when one stock of the pair is above or below
the mean, it will tend to revert back to the mean.
</p>
<p>
The stock that is above the mean is shorted, on the speculation that its price will decrease as it reverts toward the mean.
The stock below the mean is bought, on the speculation that its price will increase as it reverts toward the mean.
</p>
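<p>
The long/short rule above can be sketched in a few lines. This is only an illustration, not the notebook's trading logic: it
assumes the spread is the simple price difference between the two stocks (a hedge ratio of one), and the function name
<code>pair_positions</code> and the one standard deviation <code>threshold</code> are hypothetical choices.
</p>

```python
import statistics
from typing import Sequence, Tuple


def pair_positions(price_a: Sequence[float], price_b: Sequence[float],
                   threshold: float = 1.0) -> Tuple[str, str]:
    """Return the (position in A, position in B) suggested by the latest spread."""
    # Assumption: the spread is the plain price difference (hedge ratio of one).
    spread = [a - b for a, b in zip(price_a, price_b)]
    mean = statistics.fmean(spread)
    stdev = statistics.stdev(spread)
    if stdev == 0:
        return 'flat', 'flat'  # a constant spread gives no signal
    # Distance of the latest spread from its mean, in standard deviations.
    z = (spread[-1] - mean) / stdev
    if z > threshold:
        return 'short', 'long'   # A is above the mean relative to B
    if z < -threshold:
        return 'long', 'short'   # A is below the mean relative to B
    return 'flat', 'flat'


# Stock A has jumped relative to stock B, so A is shorted and B is bought.
print(pair_positions([10, 10, 10, 10, 13], [10, 10, 10, 10, 10]))
```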
<p>
Pairs trading is a "market neutral" strategy since both long and short positions are taken in related stocks.
</p>
<p>
One of the primary references used in this notebook is the book <i>Pairs Trading</i> by Ganapathy Vidyamurthy.
</p>
<p>
Pairs trading involves two logical steps:
</p>
<ol>
<li>
<p>
Identify a pair of stocks that are likely to have mean reverting behavior using a lookback period.
</p>
</li>
<li>
<p>
Trade the stocks using the long/short strategy over the trading period.
</p>
</li>
</ol>
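<p>
One way to picture the two steps is as a pair of adjacent windows that roll forward through the price history: pairs are
selected in the lookback window and traded in the window that follows. The sketch below is an assumption about how such
windows could be laid out. The function name <code>rolling_windows</code> and the window lengths are hypothetical.
</p>

```python
from typing import Iterator, Tuple


def rolling_windows(n_obs: int, lookback: int, trade: int) -> Iterator[Tuple[range, range]]:
    """Yield (lookback index range, trading index range) pairs over n_obs observations."""
    start = 0
    while start + lookback + trade <= n_obs:
        lookback_ix = range(start, start + lookback)
        trade_ix = range(start + lookback, start + lookback + trade)
        yield lookback_ix, trade_ix
        # Advance by one trading period so the trading windows do not overlap.
        start += trade


# Hypothetical lengths: 10 observations, 4 for pair selection, 2 for trading.
for lookback_ix, trade_ix in rolling_windows(10, 4, 2):
    print(list(lookback_ix), list(trade_ix))
```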
<h2>
Pairs Selection
</h2>
<h3>
S&P 500 Industry Sectors
</h3>
<p>
In choosing pairs, we want to find stocks that have similar market characteristics. Such stocks are more likely to have a
common mean with a relatively narrow spread. In the book <i>Pairs Trading</i> the author suggests using a factor model
to select stock pairs. The book discusses factor models and arbitrage pricing theory (APT), but the author does not
suggest which factors would be useful in pairs selection.
</p>
<p>
Macroeconomic and microeconomic factors are used to build factor models.
</p>
<p>
Macroeconomic factors include a company's exposure to interest rates and to economic cycles (e.g., recession,
economic expansion). These factors may be reasonable choices for long term investment portfolios, but are not appropriate
for a pairs trading portfolio, which relies on short term trading.
</p>
<p>
Microeconomic factors include features like momentum, price earnings, gross profits to assets, EBITDA and other corporate factors.
These factors may be effective in building an investment portfolio but they are not strong indicators of market similarity.
</p>
<p>
In this notebook the S&P 500 industry sectors are used as an initial filter for pairs selection. Stocks within a sector
are more likely to exhibit related behavior, making them better candidates for pairs trading. Limiting pairs to industry
sectors also limits the number of possible pair combinations.
</p>
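<p>
As a rough illustration of how the sector filter limits the candidate set, the sketch below forms every unique pair within
each sector. The function name <code>sector_pairs</code> and the ticker symbols are hypothetical. The input is assumed to be a
dictionary that maps a sector name to a list of stock symbols.
</p>

```python
from itertools import combinations


def sector_pairs(sector_info: dict) -> dict:
    """Map each sector name to the list of unique stock pairs within that sector."""
    return {sector: list(combinations(symbols, 2))
            for sector, symbols in sector_info.items()}


# Hypothetical symbols, used only for illustration.
example = {'energies': ['XOM', 'CVX', 'COP'], 'utilities': ['DUK', 'SO']}
pairs = sector_pairs(example)
# A sector with n stocks yields n * (n - 1) / 2 unique pairs.
for sector, symbols in example.items():
    n = len(symbols)
    print(sector, len(pairs[sector]), n * (n - 1) // 2)
```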
<p>
The S&P 500 stocks and their related industries were downloaded from barchart.com.  The files are included in this GitHub repository.
</p>
<p>
The S&P 500 sectors are:
</p>
<ol>
<li>
Consumer discretionary
</li>
<li>
Consumer staples
</li>
<li>
Energy
</li>
<li>
Financials
</li>
<li>
Health care
</li>
<li>
Industrials
</li>
<li>
Info tech
</li>
<li>
Materials
</li>
<li>
Real estate
</li>
<li>
Communication services
</li>
<li>
Utilities
</li>
</ol>


In [12]:
import os
import pandas as pd
from tabulate import tabulate

s_and_p_file = 's_and_p_sector_components/sp_stocks.csv'


def read_stock_data(path: str) -> pd.DataFrame:
    """Read the stock list from a CSV file. An empty DataFrame is returned on failure."""
    s_and_p_stocks = pd.DataFrame()
    if os.access(path, os.R_OK):
        s_and_p_stocks = pd.read_csv(path)
    else:
        print(f'Could not read file {path}')
    return s_and_p_stocks


def extract_sectors(stocks_df: pd.DataFrame) -> dict:
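    """
    Group the stock symbols by sector.

    Columns in the DataFrame are Symbol,Name,Sector. The rows are expected to be grouped by sector.
    :param stocks_df: the stock list, one row per stock
    :return: a dictionary that maps each sector name to a list of its stock symbols
    """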
    sector: str = ''
    sector_l: list = list()
    stock_sectors = dict()
    for t, stock_info in stocks_df.iterrows():
        if sector != stock_info['Sector']:
            if len(sector_l) > 0:
                stock_sectors[sector] = sector_l
                sector_l = list()
            sector = stock_info['Sector']
        sector_l.append(stock_info['Symbol'])
    stock_sectors[sector] = sector_l
    return stock_sectors


def calc_pair_counts(sector_info: dict) -> pd.DataFrame:
    column_label = ['num stocks', 'num pairs']
    sectors = list(sector_info.keys())
    counts_l: list = list()
    n_l: list = list()
    for sector in sectors:
        n = len(sector_info[sector])
        n_l.append(n)
        # n stocks form n * (n - 1) / 2 unique pairs
        count = n * (n - 1) // 2
        counts_l.append(count)
    info_df = pd.DataFrame(n_l)
    info_df = pd.concat([info_df, pd.DataFrame(counts_l)], axis=1)
    info_df.columns = column_label
    sum_pairs = sum(counts_l)
    blank_df = pd.DataFrame([' '])
    sum_df = pd.DataFrame([sum_pairs])
    row_df = pd.concat([blank_df, sum_df], axis=1)
    row_df.columns = column_label
    info_df = pd.concat([info_df, row_df], axis=0)
    sectors.append('Sum')
    info_df.index = sectors
    return info_df


stock_info_df = read_stock_data(s_and_p_file)
sectors_df = extract_sectors(stock_info_df)
info_df = calc_pair_counts(sectors_df)


<p>
The table below shows the number of unique pairs for each sector and the total number of pairs. By drawing pairs from within sectors, rather than
from the whole S&P 500 set of stocks, the number of possible pairs is reduced from the 124,750 unique pairs that 500 stocks would form.
</p>

In [13]:

print(tabulate(info_df, headers=[*info_df.columns], tablefmt='fancy_grid'))

╒════════════════════════╤══════════════╤═════════════╕
│                        │ num stocks   │   num pairs │
╞════════════════════════╪══════════════╪═════════════╡
│ information-technology │ 76           │        2850 │
├────────────────────────┼──────────────┼─────────────┤
│ communication-services │ 26           │         325 │
├────────────────────────┼──────────────┼─────────────┤
│ energies               │ 21           │         210 │
├────────────────────────┼──────────────┼─────────────┤
│ utilities              │ 29           │         406 │
├────────────────────────┼──────────────┼─────────────┤
│ real-estate            │ 31           │         465 │
├────────────────────────┼──────────────┼─────────────┤
│ consumer-discretionary │ 58           │        1653 │
├────────────────────────┼──────────────┼─────────────┤
│ financials             │ 66           │        2145 │
├────────────────────────┼──────────────┼─────────────┤
│ industrials            │ 72           │       



<h3>
Linear Regression and Cointegration
</h3>
<p>
In their paper, the authors Quynh Bui and Robert Ślepaczuk write that they found correlation (on the stock prices) to be a
stronger indicator for identifying pairs than either cointegration or the
<a href="http://bearcave.com/misl/misl_tech/wavelets/hurst/index.html">Hurst exponent</a>.
</p>
<p>
The problem with using correlation alone is that stocks with high correlation do not necessarily have mean reverting
behavior. A better approach is to use correlation as the first filter and then apply cointegration to determine whether
the stocks are mean reverting.
</p>
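<p>
The first filter might look something like the sketch below, which keeps the pairs whose price correlation is at or above
a cutoff. The function name <code>correlated_pairs</code>, the 0.75 cutoff and the toy price data are assumptions made for
illustration. The surviving pairs would then be passed to a cointegration test.
</p>

```python
import pandas as pd


def correlated_pairs(close_df: pd.DataFrame, cutoff: float = 0.75) -> list:
    """Return (symbol_a, symbol_b, correlation) for each pair at or above the cutoff."""
    corr_m = close_df.corr()  # Pearson correlation between the price columns
    symbols = list(corr_m.columns)
    selected = []
    for i, sym_a in enumerate(symbols):
        # Only look at symbols after sym_a so each pair is visited once.
        for sym_b in symbols[i + 1:]:
            c = float(corr_m.loc[sym_a, sym_b])
            if c >= cutoff:
                selected.append((sym_a, sym_b, c))
    return selected


# Toy prices: A and B move together, C moves against them.
close_df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0],
                         'B': [2.0, 4.0, 6.0, 8.0, 11.0],
                         'C': [5.0, 3.0, 4.0, 1.0, 2.0]})
candidates = correlated_pairs(close_df)
print([(a, b) for a, b, _ in candidates])
```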
<h3>
Cointegration
</h3>
<p>
Cointegration is used to recognize asset pairs that are candidates for pairs trading. See the reference section for discussions on cointegration.
</p>
<p>
There are two common cointegration tests:
</p>
<ul>
<li>
Engle-Granger (linear regression approach).
</li>
<li>
Johansen
</li>
</ul>
<p>
A number of authors prefer the Johansen test. Kris Longmore (of Robot Wealth, see references)
prefers the Engle-Granger linear regression test. Engle-Granger is easier to understand and is used in this notebook.
</p>
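<p>
The two steps of the Engle-Granger approach can be sketched with NumPy alone: regress one price series on the other, then
check whether the regression residuals (the spread) are mean reverting. This is a simplified illustration rather than the
code used in this notebook. The function name <code>engle_granger_stat</code> and the synthetic data are assumptions, and the
statistic it returns has no p-value attached. In practice a library routine such as <code>statsmodels.tsa.stattools.coint</code>
would likely be used, since it supplies critical values suited to regression residuals.
</p>

```python
import numpy as np


def engle_granger_stat(y: np.ndarray, x: np.ndarray) -> tuple:
    """Return (hedge_ratio, t_stat) from a simplified two-step Engle-Granger check."""
    # Step 1: regress y on x (with an intercept) to estimate the hedge ratio.
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    # Step 2: Dickey-Fuller style regression on the residuals,
    #   delta_e[t] = gamma * e[t-1] + error
    # A strongly negative t-statistic for gamma suggests a mean reverting spread.
    lagged = resid[:-1]
    delta = np.diff(resid)
    gamma = np.dot(lagged, delta) / np.dot(lagged, lagged)
    err = delta - gamma * lagged
    std_err = np.sqrt(np.dot(err, err) / (len(delta) - 1) / np.dot(lagged, lagged))
    return float(slope), float(gamma / std_err)


# Synthetic data: y tracks 2 * x plus noise, while z is an unrelated random walk.
rng = np.random.default_rng(42)
x = 100.0 + np.cumsum(rng.normal(size=500))
y = 5.0 + 2.0 * x + rng.normal(size=500)
z = 100.0 + np.cumsum(rng.normal(size=500))
hedge_ratio, t_coint = engle_granger_stat(y, x)
_, t_indep = engle_granger_stat(z, x)
print(f'hedge ratio {hedge_ratio:.2f}, related pair {t_coint:.1f}, unrelated pair {t_indep:.1f}')
```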


<h2>
References
</h2>
<ol>
<li>
<i>Pairs Trading: Quantitative Methods and Analysis</i> by Ganapathy Vidyamurthy, 2004, John Wiley and Sons
</li>
<li>
<a href="https://www.researchgate.net/publication/5217081_Pairs_Trading_Performance_of_a_Relative_Value_Arbitrage_Rule">Pairs Trading: Performance of a Relative Value Arbitrage Rule</a>, February 2006
</li>
<li>
<a href="https://www.quantconnect.com/tutorials/strategy-library/intraday-dynamic-pairs-trading-using-correlation-and-cointegration-approach">Intraday Dynamic Pairs Trading using Correlation and Cointegration</a>
</li>
<li>
<a href="https://bsic.it/pairs-trading-building-a-backtesting-environment-with-python/">Pairs Trading: building a backtesting environment with Python</a>
</li>
<li>
<a href="https://www.sciencedirect.com/science/article/pii/S037843712100964X">Applying Hurst Exponent in pair trading strategies
on Nasdaq 100 index</a>
by Quynh Bui and Robert Ślepaczuk
</li>
<li>
<a href="https://www.sciencedirect.com/science/article/pii/S2214845021000880">Pairs trading: is it applicable to exchange-traded funds?</a>
</li>
<li>
<a href="https://hudsonthames.org/an-introduction-to-cointegration/">An Introduction to Cointegration for Pairs Trading By Yefeng Wang</a>
</li>
<li>
<a href="https://www.tradelikeamachine.com/blog/cointegration-pairs-trading/part-1-using-cointegration-for-a-pairs-trading-strategy"><i>Using Cointegration for a Pairs Trading Strategy</i> Martyn Tinsley</a>
</li>
<li>
<a href="https://letianzj.github.io/cointegration-pairs-trading.html">Quantitative Trading and Systematic Investing by Letian Wang</a> This
post includes a discussion on how the results of Johansen cointegration can be interpreted.
</li>
<li>
<a href="https://robotwealth.com/practical-pairs-trading/">Pairs Trading on the Robot Wealth blog by Kris Longmore</a>
</li>
<li>
<a href="https://pykalman.github.io/">pykalman: the dead-simple Kalman Filter, Kalman Smoother, and EM library for Python</a>
</li>
</ol>