# Pairs Trading with the PCA Approach: Theory Part


## References
- Advanced Pairs Trading: The Principal Component Analysis (PCA) Approach. https://docs.google.com/presentation/d/18zzKkUS_YKczjl6EoaDUmHAQ7isHz0F_I_6lcc_u_rA/edit#slide=id.gb9c45b0df4_0_90
- Applying Research: PCA and Pairs Trading. https://www.quantconnect.com/docs/v2/research-environment/applying-research/pca-and-pairs-trading
- Statistical Arbitrage in the U.S. Equities Market (2008). https://math.nyu.edu/~avellane/AvellanedaLeeStatArb071108.pdf
- Ornstein Uhlenbeck Mean Reversion Process. https://medium.com/the-quant-journey/ornstein-uhlenbeck-mean-reversion-process-3da2f8d19a0
- ETF Screener. https://finance.yahoo.com/etfs/?count=25&offset=0

## 1. Stock Returns Decomposition

### Idiosyncratic Components

#### Returns of Stocks

- Formula: $R_i=\beta_i R_{mkt}+\tilde{R_i}$
  - $\tilde{R_i}$: the idiosyncratic component, uncorrelated with the market
  - $\beta_i R_{mkt}$: the part of the return related to the systematic component
  - $i$: stock index
- Rewritten for multiple factors: $R_i=\sum_{j=1}^m \beta_{i,j}R_{mkt_j}+\tilde{R_i}$
  - $j$: factor index, assuming there are $m$ factors
- The idiosyncratic component can then be written as: $\tilde{R_i}=R_i-\sum_{j=1}^m \beta_{i,j}R_{mkt_j}$
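
The decomposition above can be sketched numerically. The example below is a minimal illustration on synthetic data (the factor returns, the "true" betas and the noise level are all assumptions made up for the example): it estimates the betas by ordinary least squares and then forms the idiosyncratic component as the stock return minus its factor exposure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): 250 days, m = 2 factors.
n_days, m = 250, 2
factor_returns = rng.normal(0.0, 0.01, size=(n_days, m))          # R_{mkt_j}
true_betas = np.array([1.2, -0.4])
stock_returns = factor_returns @ true_betas + rng.normal(0.0, 0.005, size=n_days)  # R_i

# OLS with an intercept column: R_i = a + sum_j beta_{i,j} * R_{mkt_j} + residual
X = np.column_stack([np.ones(n_days), factor_returns])
coef, *_ = np.linalg.lstsq(X, stock_returns, rcond=None)
intercept, betas = coef[0], coef[1:]

# Idiosyncratic component: the return left after removing the factor exposure.
idiosyncratic = stock_returns - factor_returns @ betas

print("estimated betas:", betas)
```

Because the regression includes an intercept, `idiosyncratic` still contains that constant term; subtracting `intercept` gives the zero-mean regression residual.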

#### Market-neutral Portfolio
- Formula: $\bar{\beta_j}=\sum_{i=1}^N\beta_{ij}Q_i=0$, for $j=1,2,...,m$
  - Notation
    - $Q_i$: amount invested in stock $i$, for $i=1,2,...,N$
  - Intuition: a market-neutral portfolio has zero net beta to every factor. As a consequence, the portfolio returns are <u>only affected by the idiosyncratic components</u>.
    - Mathematical expression: $\sum_{i=1}^NQ_iR_i=\sum_{j=1}^m\bar{\beta_j}R_{mkt_j}+\sum_{i=1}^NQ_i\tilde{R_i}=\sum_{i=1}^NQ_i\tilde{R_i}$
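
As a quick numerical check of the market-neutral identity, the sketch below builds dollar amounts $Q_i$ with zero exposure to every factor and confirms that the portfolio return equals the weighted sum of idiosyncratic returns. All inputs are randomly generated assumptions, and projecting an arbitrary vector onto the zero-beta subspace is only one of many ways to obtain such a $Q$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stocks, m = 6, 2

betas = rng.normal(1.0, 0.3, size=(n_stocks, m))    # beta_{ij}
factor_ret = rng.normal(0.0, 0.01, size=m)          # R_{mkt_j} for one day
idio_ret = rng.normal(0.0, 0.005, size=n_stocks)    # idiosyncratic returns
stock_ret = betas @ factor_ret + idio_ret           # R_i

# Start from arbitrary dollar amounts and remove the part that carries factor exposure.
q0 = rng.normal(0.0, 1.0, size=n_stocks)
proj, *_ = np.linalg.lstsq(betas, q0, rcond=None)
Q = q0 - betas @ proj                               # least-squares residual, so betas.T @ Q is ~0

portfolio_beta = betas.T @ Q                        # one net beta per factor
lhs = Q @ stock_ret                                 # sum_i Q_i * R_i
rhs = Q @ idio_ret                                  # sum_i Q_i * idiosyncratic return

print(portfolio_beta, lhs, rhs)
```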


## 2. PCA Approach

### Stock Return Matrix
- General idea: use historical share price data on a <u>cross-section</u> of $N$ stocks over $M$ trading days.
- Formula: $R_{ik}=\frac{S_{i(t_0-(k-1)\Delta t)}-S_{i(t_0-k\Delta t)}}{S_{i(t_0-k\Delta t)}}$, for $k=1,2,...,M$ and $i=1,2,...,N$
  - $t_0$: the given date; the matrix is built from prices going back $M+1$ days from $t_0$
  - $S_{it}$: price of stock $i$ at time $t$
  - $\Delta t=1/252$: daily observations
  - $k$: time index, with $k=1$ the most recent day
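
To make the indexing concrete, here is a small sketch with made-up prices for $N=2$ stocks and $M+1=4$ days. The array names are illustrative; the point is that $M+1$ prices give $M$ returns, and that $k=1$ refers to the most recent day.

```python
import numpy as np

# Hypothetical closing prices: rows are days in chronological order, columns are stocks.
prices = np.array([
    [100.0,    50.0],
    [101.0,    49.0],
    [103.02,   49.98],
    [101.9898, 50.9796],
])  # M + 1 = 4 rows, so M = 3 returns per stock

# Simple daily returns in chronological order, shape (M, N).
chrono_returns = prices[1:] / prices[:-1] - 1.0

# R[i, k-1] follows the formula's indexing: i is the stock, k = 1 is the most recent day.
R = chrono_returns[::-1].T

print(R)
```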

### Return Standardization
- Formula: $Y_{ik}=\frac{R_{ik}-\bar{R_i}}{\bar{\sigma_i}}$
  - $\bar{R_{i}}=\frac{1}{M}\sum_{k=1}^MR_{ik}$
  - $\bar{\sigma}_i^2=\frac{1}{M-1}\sum_{k=1}^M(R_{ik}-\bar{R_i})^2$
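
A minimal sketch of the standardization step, assuming a randomly generated return matrix `R` of shape $(N, M)$. Note the `ddof=1` argument, which matches the $\frac{1}{M-1}$ normalization in the variance formula.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 4, 252
R = rng.normal(0.0005, 0.02, size=(N, M))          # hypothetical return matrix R_{ik}

R_bar = R.mean(axis=1, keepdims=True)              # mean return of each stock
sigma_bar = R.std(axis=1, ddof=1, keepdims=True)   # sample volatility of each stock
Y = (R - R_bar) / sigma_bar                        # standardized returns Y_{ik}

print(Y.mean(axis=1), Y.std(axis=1, ddof=1))
```

After this step each row of `Y` has zero mean and unit sample variance, so stocks with very different volatilities become comparable.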

### Correlation Matrix Eigenvectors
- Correlation matrix: $\rho_{ij}=\frac{1}{M-1}\sum_{k=1}^MY_{ik}Y_{jk}$
- `Eigenvalues` ranking list: $N\geq\lambda_1>\lambda_2\geq\lambda_3\geq...\geq\lambda_N\geq0$
  - each eigenvalue is the amount of variance along its eigenvector
  - the **first** eigenvector therefore accounts for **the largest spread** in the data, the second eigenvector accounts for the second largest spread, and so on
- Corresponding `eigenvectors`: $v^{(j)}=(v_1^{(j)},...,v_N^{(j)})$, for $j=1,2,...,N$
  - represent the *directions* of maximum variance
  - within a given eigenvector, nearby coefficients tend to belong to firms in the same industry, but this is sometimes not true because noise makes the pattern incoherent.
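
The correlation matrix and its ranked eigen-decomposition can be sketched as follows. The returns are synthetic, with one shared driver added so that the first eigenvalue stands out (an assumption of the example). `np.linalg.eigh` is used because the matrix is symmetric; it returns eigenvalues in ascending order, so they are re-sorted.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 5, 252
common = rng.normal(size=M)                        # shared "market" driver (assumed)
R = 0.01 * (common + rng.normal(size=(N, M)))      # hypothetical returns, shape (N, M)

Y = (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, ddof=1, keepdims=True)
rho = Y @ Y.T / (M - 1)                            # correlation matrix rho_{ij}

eigenvalues, eigenvectors = np.linalg.eigh(rho)    # ascending order for symmetric input
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]                   # lambda_1 >= ... >= lambda_N
eigenvectors = eigenvectors[:, order]              # column j-1 holds v^{(j)}

explained = eigenvalues / eigenvalues.sum()        # share of total variance per eigenvector
print(explained)
```

The eigenvalues sum to $N$ because the diagonal of a correlation matrix is all ones. The sign of each eigenvector is arbitrary, so it may differ between libraries or runs.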

### Eigenportfolio Creation
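- The eigenportfolio returns are:
  - Formula: $F_{jk}=\sum_{i=1}^N\frac{v_i^{(j)}}{\bar{\sigma_i}}R_{ik}=\sum_{i=1}^NQ_i^{(j)}R_{ik}$, where $j=1,2,...,m$
    - $Q_i^{(j)}=v_i^{(j)}/\bar{\sigma_i}$: amount invested in stock $i$ by eigenportfolio $j$
    - firms with larger market cap tend to have smaller volatility, so dividing by $\bar{\sigma_i}$ tends to give them larger weights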
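
A short sketch of the eigenportfolio construction, again on synthetic returns (the data and the choice $m=2$ are assumptions for illustration). Each of the top $m$ eigenvectors is scaled by the inverse volatilities to obtain the weights $Q_i^{(j)}$, and the factor returns $F_{jk}$ are the weighted sums of stock returns.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, m = 5, 252, 2
R = 0.01 * (rng.normal(size=M) + rng.normal(size=(N, M)))   # hypothetical returns, shape (N, M)

sigma_bar = R.std(axis=1, ddof=1)                  # sample volatility per stock
rho = np.corrcoef(R)                               # correlation matrix of the N stocks
eigenvalues, eigenvectors = np.linalg.eigh(rho)
top = np.argsort(eigenvalues)[::-1][:m]            # indices of the m largest eigenvalues
V = eigenvectors[:, top]                           # shape (N, m), column j-1 is v^{(j)}

Q = V / sigma_bar[:, None]                         # Q_i^{(j)} = v_i^{(j)} / sigma_bar_i
F = Q.T @ R                                        # F[j-1, k-1] = sum_i Q_i^{(j)} * R_{ik}

print(F.shape)
```

With this construction the in-sample returns of different eigenportfolios are uncorrelated, and the sample variance of eigenportfolio $j$ equals $\lambda_j$.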


## 3. PCA Trading Strategy

### Modeling the Idiosyncratic Components with OU Process
- Formula: $d \tilde{X_i}(t)=\alpha_idt+dX_i(t)$
  - $\alpha_i$: drift of the idiosyncratic component
  - $\alpha_idt$: the *excess rate of return* of the stock relative to the market or sector benchmark over a certain period.
    - measures <u>systematic deviations</u> from the sector
  - $dX_i(t)$: the <u>increment</u> of a stationary stochastic process that models price fluctuations caused by over-reactions or other idiosyncratic changes in the stock price that are <u>unrelated to the industry sector</u>.
    - OU process, whose discrete-time form is an AR(1) model (lag = 1):
      - $dX_i(t)=\kappa_i(m_i-X_i(t))dt+\sigma_idW_i(t)$
        - $\kappa_i$: speed of mean reversion. The trading strategy only considers stocks with <u>fast mean reversion</u> and rejects the others.
          - if $\kappa_i\gg1$: the stock reverts **quickly** to its mean, and the <u>effect of the drift</u> is negligible.
        - $m_i$: mean value
        - $\sigma_i$: volatility of the process
        - $W_i(t)$: standard Brownian motion
  - assume $\alpha_i$, $\sigma_i$, $\kappa_i$, $m_i$ vary slowly relative to the Brownian motion increments $dW_i(t)$
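
One common way to estimate the OU parameters is through the AR(1) form $X_{n+1}=a+bX_n+\zeta_{n+1}$, where $b=e^{-\kappa\Delta t}$, $a=m(1-b)$ and $\mathrm{Var}(\zeta)=\sigma^2\frac{1-b^2}{2\kappa}$. The sketch below is an illustration under assumptions: the helper name `fit_ou`, the simulated path and the "true" parameter values are all made up for the example, and the path is much longer than a 60-day window so that the estimates are stable.

```python
import numpy as np

def fit_ou(x, dt=1.0 / 252):
    """Estimate (kappa, m, sigma, sigma_eq) from a path via x[n+1] = a + b*x[n] + noise."""
    x = np.asarray(x, dtype=float)
    b, a = np.polyfit(x[:-1], x[1:], 1)        # slope b, intercept a
    resid = x[1:] - (a + b * x[:-1])
    var_noise = resid.var(ddof=2)              # two fitted parameters
    kappa = -np.log(b) / dt                    # only meaningful when 0 < b < 1
    m = a / (1.0 - b)
    sigma_eq = np.sqrt(var_noise / (1.0 - b**2))
    sigma = sigma_eq * np.sqrt(2.0 * kappa)
    return kappa, m, sigma, sigma_eq

# Simulate an OU path with its exact discretization (parameter values are assumptions).
rng = np.random.default_rng(5)
kappa_true, m_true, sigma_true, dt = 50.0, 0.0, 0.3, 1.0 / 252
b_true = np.exp(-kappa_true * dt)
noise_sd = sigma_true * np.sqrt((1.0 - b_true**2) / (2.0 * kappa_true))
x = np.empty(20_000)
x[0] = m_true
for n in range(len(x) - 1):
    x[n + 1] = m_true + b_true * (x[n] - m_true) + noise_sd * rng.normal()

kappa_hat, m_hat, sigma_hat, sigma_eq_hat = fit_ou(x)
print(kappa_hat, m_hat, sigma_hat, sigma_eq_hat)
```

If the fitted slope `b` is not between 0 and 1, the series shows no usable mean reversion and `kappa` is undefined or negative; a real implementation would need to guard against that case.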

### The S-Score

- Formula: $s_i=\frac{X_i(t)-m_i}{\sigma_{eq,i}}$
  - $\sigma_{eq,i}=\frac{\sigma_i}{\sqrt{2\kappa_i}}$, the equilibrium standard deviation
- Intuition:
  - the s-score measures how far away a given stock or eigenportfolio is from the <u>theoretical equilibrium value</u> associated with the model.
  - it is the distance of the cointegrated residual from equilibrium, in units of standard deviation.
- Use case: the s-score only needs to be calculated when the **eigenportfolio shows a mean reversion speed ($\kappa$) above a threshold**.
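
The s-score calculation with the mean reversion filter can be sketched as below. The function name, the input values and in particular the threshold `KAPPA_MIN` are assumptions chosen for illustration; this document does not state a specific cutoff.

```python
import math

def s_score(x_t, m, sigma, kappa, kappa_min):
    """Return (X(t) - m) / sigma_eq, or None if kappa does not exceed kappa_min."""
    if kappa <= kappa_min:
        return None                          # mean reversion too slow: no score
    sigma_eq = sigma / math.sqrt(2.0 * kappa)
    return (x_t - m) / sigma_eq

KAPPA_MIN = 252 / 30   # hypothetical threshold, in units of 1/year

fast = s_score(x_t=0.05, m=0.01, sigma=0.3, kappa=50.0, kappa_min=KAPPA_MIN)
slow = s_score(x_t=0.05, m=0.01, sigma=0.3, kappa=2.0, kappa_min=KAPPA_MIN)
print(fast, slow)
```

In the first call $\sigma_{eq}=0.3/\sqrt{100}=0.03$, so the residual sits about 1.33 equilibrium standard deviations above its mean; the second call is filtered out because $\kappa$ is below the threshold.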

### Trading Signal Generation Based on Mean Reversion and `S-Score`
- $s_i<-\bar{s_{bo}}$: open a long position (buy to open)
- $s_i>-\bar{s_{sc}}$: close the long position (sell to close)
- $s_i>+\bar{s_{so}}$: open a short position (sell to open)
- $s_i<+\bar{s_{bc}}$: close the short position (buy to close)
- In the original paper, the open and close thresholds were set based on 2000~2004 ETF factors:
  - $\bar{s_{bo}}=\bar{s_{so}}=1.25$
  - $\bar{s_{bc}}=0.75$
  - $\bar{s_{sc}}=0.50$
- Opening a short position means **selling \$1** of the corresponding stock and **buying the respective beta** amounts of the stocks from the scaled eigenvectors. Opening a long position is the reverse trade.
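
The threshold rules can be read as a small state machine over the position in one stock. The sketch below is an illustration, not the original implementation: it assumes a long is opened when the s-score is very negative and closed once it rises above $-0.50$, and a short is opened when the s-score is very positive and closed once it falls below $+0.75$. The function name and the sample s-score path are made up.

```python
def next_position(position, s, open_thr=1.25, close_short_thr=0.75, close_long_thr=0.50):
    """position is -1 (short), 0 (flat) or +1 (long); returns the position after seeing s."""
    if position == 0:
        if s < -open_thr:
            return 1                      # open long
        if s > open_thr:
            return -1                     # open short
    elif position == 1 and s > -close_long_thr:
        return 0                          # close long
    elif position == -1 and s < close_short_thr:
        return 0                          # close short
    return position                       # otherwise keep the current position

# Hypothetical s-score path for one stock.
path = [-1.4, -0.9, -0.4, 0.3, 1.3, 1.0, 0.7]
positions = []
pos = 0
for s in path:
    pos = next_position(pos, s)
    positions.append(pos)

print(positions)
```

The gap between the opening and closing thresholds means a position is held through small fluctuations of the s-score rather than being flipped on every move.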

### Strategy Rationale
- Open a trade on an eigenportfolio when it:
  - shows good mean reversion speed ($\kappa$)
  - has an S-score far from equilibrium.
- Close the trade on an eigenportfolio:
  - when its S-score is near 0

### Strategy Application in Detail
- Timing:
  - Formation period = 252 trading days, for correlation matrix estimation.
  - Trading period = 60 trading days, roughly one earnings cycle, for idiosyncratic component estimation.
- Slippage/transaction cost: 5 bps
- Number of PCA factors: $m=15$ in the original paper.
  - Return ranking: 15-PCA-factor method > actual ETF factors > synthetic ETF factors



In [None]:
# Install third-party packages (notebook shell commands)
!pip install yfinance
!pip install backtrader
!pip install pyfolio

# Import Python libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
import backtrader as bt
import pyfolio as pf

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting backtrader
  Downloading backtrader-1.9.76.123-py2.py3-none-any.whl (410 kB)
Installing collected packages: backtrader
Successfully installed backtrader-1.9.76.123
Collecting pyfolio
  Downloading pyfolio-0.9.2.tar.gz (91 kB)
  Preparing metadata (setup.py) ... done
Collecting empyrical>=0.5.0
  Downloading empyrical-0.5.5.tar.gz (52 kB)



In [None]:
# Download data for ETFs from different industries with yfinance
ETF=['BLOK','QLD','XLY','QQQ','XLG','XLK','ESG','XLE','PTH','GLD','SLV','XLF','SPY','SECT','XLB','VO','XLU','EMLP','KURE','GOAU','PTF','IGM','IWY','LIT','CHIS','KGRN']
startdate='2019-01-01'
enddate='2023-03-31'

In [None]:
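# auto_adjust=False keeps the separate 'Adj Close' column used in the next cell
data=yf.download(ETF,start=startdate,end=enddate,auto_adjust=False,progress=False)
data.head()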

| Date | Adj Close BLOK | Adj Close CHIS | Adj Close EMLP | Adj Close ESG | Adj Close GLD | Adj Close GOAU | Adj Close IGM | Adj Close IWY | Adj Close KGRN | Adj Close KURE | ... | Volume SLV | Volume SPY | Volume VO | Volume XLB | Volume XLE | Volume XLF | Volume XLG | Volume XLK | Volume XLU | Volume XLY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-01-02 | 12.655462 | 13.388494 | 18.128712 | 56.25832 | 121.330002 | 11.061152 | 169.879044 | 69.095276 | 16.076799 | 17.013 | ... | 12310400 | 126925200 | 631200 | 9096300 | 24892600 | 62945000 | 136100 | 15442900 | 25173900 | 6840800 |
| 2019-01-03 | 12.385299 | 13.388494 | 18.204924 | 54.757732 | 122.43 | 11.244219 | 162.628052 | 66.829712 | 15.675898 | 16.42 | ... | 12933200 | 144140700 | 843900 | 9834000 | 18024100 | 65729700 | 64500 | 24946700 | 21587500 | 6346000 |
| 2019-01-04 | 12.934068 | 13.388494 | 18.763769 | 56.726494 | 121.440002 | 11.234584 | 170.399857 | 69.586975 | 16.322517 | 17.070999 | ... | 17854300 | 142628800 | 668500 | 8950600 | 21351500 | 64638400 | 68100 | 20767800 | 19003500 | 7269100 |
| 2019-01-07 | 13.254887 | 13.388494 | 18.941587 | 57.242691 | 121.860001 | 11.203752 | 172.610504 | 70.194321 | 16.322517 | 16.969999 | ... | 7432300 | 103139100 | 3359300 | 7468300 | 18056700 | 48167000 | 66900 | 11908600 | 16267700 | 6263100 |
| 2019-01-08 | 13.322428 | 13.388494 | 19.237942 | 57.593117 | 121.529999 | 11.224948 | 174.693481 | 70.965584 | 16.322517 | 17.252001 | ... | 6776400 | 102512600 | 624200 | 10328600 | 18692300 | 90114700 | 20200 | 13002600 | 16643500 | 9391000 |




In [None]:
# Convert adjusted close prices to daily simple returns
prices=data['Adj Close']
returns=prices.pct_change().dropna()
returns.head()

| Date | BLOK | CHIS | EMLP | ESG | GLD | GOAU | IGM | IWY | KGRN | KURE | ... | SLV | SPY | VO | XLB | XLE | XLF | XLG | XLK | XLU | XLY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-01-03 | -0.021348 | 0.0 | 0.004204 | -0.026673 | 0.009066 | 0.01655 | -0.042683 | -0.032789 | -0.024937 | -0.034856 | ... | 0.013049 | -0.023863 | -0.019888 | -0.028358 | -0.009918 | -0.022481 | -0.029296 | -0.050468 | -0.000192 | -0.021652 |
| 2019-01-04 | 0.044308 | 0.0 | 0.030697 | 0.035954 | -0.008086 | -0.000857 | 0.047789 | 0.041258 | 0.041249 | 0.039647 | ... | -0.001356 | 0.033496 | 0.032808 | 0.039319 | 0.034024 | 0.03322 | 0.035715 | 0.04432 | 0.014808 | 0.033094 |
| 2019-01-07 | 0.024804 | 0.0 | 0.009477 | 0.0091 | 0.003458 | -0.002744 | 0.012973 | 0.008728 | 0.0 | -0.005916 | ... | -0.004073 | 0.007885 | 0.013265 | 0.00351 | 0.014865 | 0.001237 | 0.006114 | 0.008943 | -0.006822 | 0.022612 |
| 2019-01-08 | 0.005096 | 0.0 | 0.015646 | 0.006122 | -0.002708 | 0.001892 | 0.012068 | 0.010988 | 0.0 | 0.016618 | ... | 0.001363 | 0.009395 | 0.011889 | 0.010494 | 0.007735 | 0.000823 | 0.00887 | 0.00838 | 0.012402 | 0.011056 |
| 2019-01-09 | 0.0064 | 0.0 | 0.002641 | 0.00458 | 0.006418 | 0.018198 | 0.009055 | 0.006385 | 0.0 | 0.02191 | ... | 0.006127 | 0.004673 | 0.008811 | 0.000385 | 0.015842 | 0.004936 | 0.00293 | 0.012946 | -0.006031 | 0.005084 |




In [None]:
#calculate the S_scores and betas
