## Investigating a Pairs Trading Strategy

Pairs trading is a mean-reversion strategy built on the idea that if two stocks that normally move together diverge from their usual relationship, they should in theory move back toward their average. There are many ways to measure that relationship. In this script we use the ratio between the two stocks' daily percentage changes to determine the mean; many implementations use raw price data instead, which is another option we can test later.

To backtest this strategy, we will use data from the Yahoo Finance API and structure the data using Pandas. 

In [118]:
#Installing all Libraries Needed
!pip install yfinance
import yfinance as yf
import pandas as pd
import numpy as np



We need historical data from yfinance for two separate purposes. The first half of the download is used to calculate the statistical values the strategy is based on (get_p_data), and the second half is used to backtest the strategy and see how well it performs (get_b_data).
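
The split described above can be illustrated without touching the network. The sketch below uses a made-up price table in place of a real download, and `split_history` is a hypothetical helper, not part of the script. It shows the same half-and-half split done once on a single DataFrame.

```python
import numpy as np
import pandas as pd


def split_history(history: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split a price history into (parameter window, backtest window)."""
    split = len(history) // 2
    return history.iloc[:split], history.iloc[split:]


# Synthetic stand-in for a downloaded price history; the prices are made up.
dates = pd.bdate_range("2023-01-02", periods=10)
history = pd.DataFrame({"Close": np.linspace(100.0, 109.0, 10)}, index=dates)

p_data, b_data = split_history(history)
print(len(p_data), len(b_data))
```

Splitting one download keeps the two windows adjacent and non-overlapping by construction, so the statistics are never fitted on days that are later used for the backtest.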

In [119]:
def get_p_data(stock):
    # First half of the 340d download: used to fit the statistics.
    stockhist = yf.Ticker(stock).history(period='340d')
    split = len(stockhist) // 2
    return stockhist[:split]

def get_b_data(stock):
    # Second half of the same download: used for the backtest.
    stockhist = yf.Ticker(stock).history(period='340d')
    split = len(stockhist) // 2
    return stockhist[split:]

Next, we need the ratio of the two stocks' daily percent changes for every day.
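
To make the ratio concrete, here is a small worked example on made-up closing prices. The tickers AAA and BBB are hypothetical and the numbers are chosen only for illustration.

```python
import pandas as pd

# Made-up closing prices for two hypothetical tickers, AAA and BBB.
closes = pd.DataFrame({
    "AAA": [100.0, 102.0, 101.0, 104.0],
    "BBB": [50.0, 50.5, 50.0, 51.0],
})

# Daily percent change of each close; the first row is undefined and dropped.
changes = closes.pct_change().dropna()

# Ratio of the two daily changes, one value per day.
ratio = changes["AAA"] / changes["BBB"]
print(ratio.round(2).tolist())
```

On the first day AAA rises 2% while BBB rises 1%, so the ratio is about 2.0. Note that a day on which the second stock's close is unchanged makes the denominator zero, and pandas then returns an infinite ratio for that day.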

In [120]:
def calc_ratios(stock1, stock2, porb='P'):
    # porb selects the data window: 'B' for backtest data, anything else for parameter data.
    if porb == 'B':
        stock1hist = get_b_data(stock1)
        stock2hist = get_b_data(stock2)
    else:
        stock1hist = get_p_data(stock1)
        stock2hist = get_p_data(stock2)
    pct_change = pd.DataFrame()
    name1 = stock1 + ' Change'
    name2 = stock2 + ' Change'
    pct_change[name1] = stock1hist['Close'].pct_change()
    pct_change[name2] = stock2hist['Close'].pct_change()
    pct_change['Ratio'] = np.divide(pct_change[name1], pct_change[name2])
    return pct_change.drop(pct_change.index[0])

This strategy takes the average of those ratios and flags any day where the ratio diverges significantly from that average. This part of the script calculates the mean, removes significant outliers, recalculates the mean on the cleaner data, and then sets the upper and lower boundaries that identify the divergences.
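
The steps above can be sketched in vectorized form on synthetic data. `trimmed_bounds` is a hypothetical name; the 1.5 and 2.25 multipliers are the ones used in this script, and the random ratios are made up so the example runs without a download.

```python
import numpy as np
import pandas as pd


def trimmed_bounds(ratio: pd.Series, trim: float = 1.5, band: float = 2.25):
    """Return (mean, upper, lower) after dropping the extremes and outliers."""
    # Drop the single largest and single smallest value by index label.
    ratio = ratio.drop([ratio.idxmax(), ratio.idxmin()])
    # Drop anything further than `trim` standard deviations from the mean.
    mean, std = ratio.mean(), ratio.std(ddof=0)
    ratio = ratio[(ratio > mean - trim * std) & (ratio < mean + trim * std)]
    # Recompute on the cleaned data and build the entry band.
    mean, std = ratio.mean(), ratio.std(ddof=0)
    return mean, mean + band * std, mean - band * std


# Synthetic ratios: random noise around 1.0 plus two planted extremes.
rng = np.random.default_rng(0)
ratio = pd.Series(rng.normal(1.0, 0.5, 200))
ratio.iloc[10], ratio.iloc[20] = 40.0, -35.0

mean, upper, lower = trimmed_bounds(ratio)
print(round(mean, 2), round(upper, 2), round(lower, 2))
```

Because the two planted extremes are removed before the first standard deviation is taken, they do not widen the outlier filter, and the final mean stays close to the 1.0 the noise was centered on.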

In [125]:
def calc_stats(stock1,stock2):
    ratios = calc_ratios(stock1, stock2)
    # Remove the largest and smallest ratio so the eventual mean is based on more representative data.
    # idxmax/idxmin return index labels, so dropping one row cannot shift the position of the other.
    ratios = ratios.drop([ratios['Ratio'].idxmax(), ratios['Ratio'].idxmin()])

    # Calculate the mean and standard deviation in order to remove any further outliers.
    mean = np.average(ratios['Ratio'])
    std = np.std(ratios['Ratio'])
    upperout = mean + (1.5*std)
    lowerout = mean - (1.5*std)
    
    ratios = ratios[ratios['Ratio'] < upperout]
    ratios = ratios[ratios['Ratio'] > lowerout]
    
    # Once the outliers are removed, recalculate the mean along with the upper and lower boundaries for the strategy.
    mean = np.average(ratios['Ratio'])
    std = (np.std(ratios['Ratio']))
    upperstd = mean + (2.25*std)
    lowerstd = mean - (2.25*std)
    return (mean,upperstd, lowerstd)

Now that we have our mean, upper bound, and lower bound, we can use those values to backtest whether the strategy works, which is what this section of the script does. During the backtest we buy one of the stocks and short the other, then check whether the two trades together result in a profit.
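
As a reference point for what "the two trades together result in a profit" can mean, here is a minimal sketch of a signed combined return. It assumes equal dollar amounts in both legs and ignores trading costs; `pair_return` is a hypothetical helper and is not how the backtest in this script scores trades.

```python
def pair_return(long_entry: float, long_exit: float,
                short_entry: float, short_exit: float) -> float:
    """Combined fractional return of one long leg and one short leg."""
    long_ret = (long_exit - long_entry) / long_entry
    # A short gains when the price falls and loses when it rises.
    short_ret = (short_entry - short_exit) / short_entry
    return long_ret + short_ret


# Made-up prices: the long leg rises 2%, the shorted leg rises 1%.
print(round(pair_return(100.0, 102.0, 50.0, 50.5), 4))
```

Keeping the sign on the short leg means a rise in the shorted stock reduces the combined return, so here the 2% gain on the long leg is partly offset by a 1% loss on the short leg.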

*Credits to Aryan Padmanabhan for guiding me on how to simplify this backtesting code 

In [133]:
def back_testing(stock1, stock2):
    pos = "none"
    win = 0
    total = 0

    stock1Data = get_b_data(stock1)
    stock2Data = get_b_data(stock2)
    ratioData = calc_ratios(stock1,stock2,'B')
    # Thresholds come from the parameter window, not from the backtest window.
    mean, upperstd, lowerstd = calc_stats(stock1,stock2)
    name1 = stock1 + ' Change'
    name2 = stock2 + ' Change'

    dayData1 = stock1Data["Close"]
    dayData2 = stock2Data['Close']
    nextData1 = stock1Data["Close"].shift(-4)
    nextData2 = stock2Data["Close"].shift(-4)
    
    buy = []
    sell = []
    for i in ratioData.index:
        if ratioData[name1][i] > ratioData[name2][i] :
            sell.append(stock1)
            buy.append(stock2)
        else:
            sell.append(stock2)
            buy.append(stock1)
    ratioData['Buy']=buy
    ratioData['Sell']=sell
    
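    # ratioData has one row fewer than the price data because the first percent change is undefined.
    for i in range(len(dayData1.index)-1):
        if pos == 'none':
            # A ratio outside the band opens a position that is evaluated on the next iteration.
            if ratioData['Ratio'].iloc[i] > upperstd or ratioData['Ratio'].iloc[i] < lowerstd:
                pos = 'entered'
        elif pos == 'entered':
            price1 = dayData1.iloc[i]
            price2 = dayData2.iloc[i]
            # nextData is shifted by 4 rows, so the exit price is the close 4 trading days later.
            nextPrice1 = nextData1.iloc[i]
            nextPrice2 = nextData2.iloc[i]
            # abs() scores any move in the shorted stock as a gain, whichever direction it went.
            if ratioData['Buy'].iloc[i] == stock1:
                buy_pct_change = (nextPrice1 - price1)/price1
                sell_pct_change = abs(price2-nextPrice2)/price2
            else:
                buy_pct_change = (nextPrice2 - price2)/price2
                sell_pct_change = abs(price1-nextPrice1)/price1
            # A trade counts as a win when the combined change exceeds 1%.
            if (buy_pct_change + sell_pct_change) > 0.01:
                win += 1
            total += 1
            pos = 'none'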
    try:
        winrate = (win/total) * 100
    except ZeroDivisionError:
        # No trades were opened, so there is no rate to report.
        winrate = 'NA'

    print(f'Strategy Results with {stock1} and {stock2} --  Trades: {win}/{total} | Success Rate: {winrate}%')
                
            
    
    

Choosing which pairs to use with this strategy is a whole different task. Ideally, you want two stocks that are similar but still move differently enough to produce divergences the strategy can identify as trading opportunities. A heat map is one way to identify such pairs, and adding one is a possible future upgrade to this strategy.
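
One possible starting point for such a screen, assuming the heat map would show correlations between daily returns, is a correlation matrix. The sketch below uses synthetic returns for three hypothetical tickers (AAA, BBB, CCC) rather than real data.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns for three hypothetical tickers: AAA and BBB share
# a common driver, CCC is independent noise.
rng = np.random.default_rng(1)
common = rng.normal(0.0, 0.01, 250)
returns = pd.DataFrame({
    "AAA": common + rng.normal(0.0, 0.003, 250),
    "BBB": common + rng.normal(0.0, 0.003, 250),
    "CCC": rng.normal(0.0, 0.01, 250),
})

# Pearson correlation between every pair of columns: the numbers a heat map would color.
corr = returns.corr()
print(corr.round(2))
```

In this toy data AAA and BBB come out strongly correlated while CCC does not, which is the kind of contrast a heat map would make visible at a glance across many tickers.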

Here are the stocks I chose to test:

In [142]:
back_testing('SPY','QQQ')
back_testing('SPY','SLB')
back_testing('QQQ','XLE')
back_testing('UNH','AAPL')
back_testing('SPY','XLE')



Strategy Results with SPY and QQQ --  Trades: 10/18 | Success Rate: 55.55555555555556%
Strategy Results with SPY and SLB --  Trades: 16/25 | Success Rate: 64.0%
Strategy Results with QQQ and XLE --  Trades: 12/17 | Success Rate: 70.58823529411765%
Strategy Results with UNH and AAPL --  Trades: 17/23 | Success Rate: 73.91304347826086%
Strategy Results with SPY and XLE --  Trades: 10/12 | Success Rate: 83.33333333333334%


Across this specific time frame, the script reports success rates ranging from 55.6% to 83.3% (depending on the stocks used), which could possibly be profitable in the long term, especially in combination with other strategies. However, there are some things we need to take into consideration before calling this viable.

1: This strategy relies heavily on a stable market. In times of extremely high volatility, the statistical calculations the strategy currently uses fall apart. This could be fixed with more rigorous statistical models, but for now we must be wary of volatility as a factor.

2: This strategy also relies heavily on the time frame of data we choose, both for calculating our statistical values and for backtesting. In my experimentation, a longer time frame includes data points that aren't relevant enough to the present, and a shorter time frame tends not to include enough data to calculate an accurate mean. The time frame used in this script therefore sits in a Goldilocks zone that is finicky and will need to be adjusted over time.
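
One direction that touches both points, offered only as a sketch and not as something this script does, is to recompute the mean and bounds over a rolling window so they follow changes in volatility. `rolling_bounds` and the 60-day window are assumptions made for illustration; the 2.25 multiplier is the one used earlier, and the ratio series is synthetic.

```python
import numpy as np
import pandas as pd


def rolling_bounds(ratio: pd.Series, window: int = 60, band: float = 2.25):
    """Per-day mean and bounds from the previous `window` observations."""
    # shift(1) keeps each day's bounds based only on earlier days.
    past = ratio.shift(1).rolling(window)
    mean, std = past.mean(), past.std(ddof=0)
    return pd.DataFrame({
        "mean": mean,
        "upper": mean + band * std,
        "lower": mean - band * std,
    })


# Synthetic ratio series whose volatility doubles halfway through.
rng = np.random.default_rng(2)
ratio = pd.Series(np.concatenate([
    rng.normal(1.0, 0.5, 200),
    rng.normal(1.0, 1.0, 200),
]))

bounds = rolling_bounds(ratio)
width = bounds["upper"] - bounds["lower"]
print(round(width.iloc[150], 2), round(width.iloc[390], 2))
```

The band is wider in the second, more volatile half of the series, so a move that would have counted as a divergence in calm conditions no longer triggers an entry there. The window length is still a free parameter and would need the same kind of tuning described above.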

## Conclusion
Overall, the results don't let me confidently conclude whether this strategy is a success or a failure, as there are too many variable factors, especially in long-term implementation. Using pairs trading to add an extra layer of confidence to another trading strategy would be better than using it by itself. With more rigorous statistical models and more experimentation, I hope to improve this script over time and reach a stronger conclusion in the future.