Consider two regressions that swap the dependent and independent variables:

$$
\begin{align}
y = \alpha_{y}+\beta_{y}x+\epsilon_{y} \tag{1}\\
x = \alpha_{x}+\beta_{x}y+\epsilon_{x} \tag{2}\\
\end{align}
$$

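If both equations held exactly, so that one is simply the inverse of the other, we would get $\beta_{x}\beta_{y} = 1$ and $\alpha_{y} = -\alpha_{x}\beta_{y}$.
Taking expectations of both equations, with $E(\epsilon_{y}) = E(\epsilon_{x}) = 0$, we also have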

$$
\begin{align}
E(y) = \alpha_{y}+\beta_{y}E(x) \tag{1}\\
E(x) = \alpha_{x}+\beta_{x}E(y) \tag{2}\\
\end{align}
$$

Substituting the second equation into the first:

$E(y)= \alpha_{y}+\beta_{y}(\alpha_{x}+\beta_{x}E(y))$

$E(y)(1-\beta_{x}\beta_{y}) = \alpha_{y}+\beta_{y}\alpha_{x}$

So, if we want this equation to hold for every value of $E(y)$, both sides must vanish:

$$
\begin{align}
\alpha_{y} + \beta_{y}\alpha_{x} = 0 \tag{1}\\
\beta_{x}\beta_{y} = 1 \tag{2}\\
\end{align}
$$

These conditions (with $\alpha_{y} = \alpha_{x} = 0$ as a special case) do not hold in general, so $\beta_{y}\beta_{x}$ is usually not equal to 1.
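
One way to see why the product generally falls short of 1 is to solve the first regression for $x$ directly (a sketch, assuming $\beta_{y} \neq 0$):

$$
\begin{align}
x = -\frac{\alpha_{y}}{\beta_{y}} + \frac{1}{\beta_{y}}y - \frac{\epsilon_{y}}{\beta_{y}}
\end{align}
$$

This matches the second regression with $\beta_{x} = 1/\beta_{y}$ only if $-\epsilon_{y}/\beta_{y}$ can serve as its error term. An OLS error term must be uncorrelated with the regressor, which here is $y$, but $\operatorname{cov}(\epsilon_{y}, y) = \sigma_{\epsilon_{y}}^2$, which is zero only when the fit is perfect.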

Let's find out what $\beta_{y}\beta_{x}$ exactly is:

$$
\begin{align}
y - E(y) = \beta_{y}(x-E(x)) + \epsilon_{y} \tag{1}\\
x - E(x) = \beta_{x}(y-E(y)) + \epsilon_{x} \tag{2}\\
\end{align}
$$

Multiplying the two equations gives

$$
\begin{align}
(y-E(y))(x-E(x)) &= [\beta_{y}(x-E(x)) + \epsilon_{y}][\beta_{x}(y-E(y)) + \epsilon_{x}] \\
&= \beta_{y}\beta_{x}(x-E(x))(y-E(y)) + \beta_{y}\epsilon_{x}(x-E(x)) + \beta_{x}\epsilon_{y}(y-E(y)) + \epsilon_{x}\epsilon_{y}
\end{align}
$$

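Taking the expectation of the left-hand side, and substituting the first equation for $y-E(y)$:

$$
\begin{align}
E[(x-E(x))(y-E(y))] &= E\{(x-E(x))[\beta_{y}(x-E(x)) + \epsilon_{y}]\} \\
&= E[\beta_{y}(x-E(x))^2 + (x-E(x))\epsilon_{y}] \\
&= \beta_{y}\sigma_{x}^2
\end{align}
$$

The last step uses $E[(x-E(x))\epsilon_{y}] = 0$: the OLS error is uncorrelated with the regressor.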

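By symmetry, substituting the second equation for $x-E(x)$ gives $\beta_{x}\sigma_{y}^2$, so

$\operatorname{cov}(x,y) = \rho_{xy}\sigma_{x}\sigma_{y} = \beta_{y}\sigma_{x}^2 = \beta_{x}\sigma_{y}^2$

Multiplying the two slopes: $\beta_{y}\beta_{x} = \frac{\operatorname{cov}(x,y)^2}{\sigma_{x}^2\sigma_{y}^2} = \rho_{xy}^2$

The coefficients of determination are $R_{y}^2 = 1 - \frac{\sigma_{\epsilon_{y}}^2}{\sigma_{y}^2}$ and $R_{x}^2 = 1 - \frac{\sigma_{\epsilon_{x}}^2}{\sigma_{x}^2}$, and in a regression with a single regressor both equal $\rho_{xy}^2$.

**Finally we can get $\beta_{y}\beta_{x} = R^2$.**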
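
The identity can be checked numerically. The following is a minimal sketch on synthetic data; the seed, sample size, slope, and noise level are arbitrary assumptions, not values from the stock data used later.

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed (assumption)
n = 1_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)         # hypothetical slope and noise level

cov_xy = np.cov(x, y, ddof=1)[0, 1]
beta_y = cov_xy / np.var(x, ddof=1)      # OLS slope of y regressed on x
beta_x = cov_xy / np.var(y, ddof=1)      # OLS slope of x regressed on y
r_squared = np.corrcoef(x, y)[0, 1] ** 2

print(f"beta_y * beta_x = {beta_y * beta_x:.6f}")
print(f"R^2             = {r_squared:.6f}")
```

The two printed numbers should agree up to floating-point error, and both stay below 1 whenever the data are noisy.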

When we trade a spread, we go long one stock and short another. If we run an OLS regression on the two stocks to calculate the hedge ratio $\beta$, the result is not consistent when we switch the independent and dependent variables: $\beta_{1}$ is not equal to $\frac{1}{\beta_{2}}$.

The OLS fit is not symmetrical because of a critical mathematical assumption behind the OLS algorithm: Y is a random variable and the sole source of variance, while the X values are fixed constants with zero variance. In a trading strategy this asymmetry is undesirable, because we want the algorithm to treat the two stocks symmetrically.

As a result, we can replace OLS with total least squares (TLS), which treats X and Y symmetrically.

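In practice, we can compute the TLS line with PCA: the first principal component minimizes the perpendicular (orthogonal) distances between the points and the line, whereas OLS minimizes the vertical distances.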
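
As an illustration of the symmetry, here is a minimal sketch that takes the TLS slope from the leading eigenvector of the covariance matrix, which is the first principal axis. The helper name `tls_slope` and the synthetic data are assumptions made for this example, not part of the notebook.

```python
import numpy as np

def tls_slope(x, y):
    """Slope of the orthogonal-regression line of y on x (hypothetical helper)."""
    cov = np.cov(x, y)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    vx, vy = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue
    return vy / vx                           # the eigenvector's sign cancels in the ratio

rng = np.random.default_rng(1)               # arbitrary seed (assumption)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)           # hypothetical relationship

b_yx = tls_slope(x, y)                       # x as the independent variable
b_xy = tls_slope(y, x)                       # y as the independent variable
print(b_yx, b_xy, b_yx * b_xy)
```

Because both slopes come from the same axis, swapping the variables just inverts the ratio, so the product is 1 up to floating-point error. The ratio is undefined when the principal axis is exactly vertical (`vx == 0`).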

In [69]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
import pandas as pd
import numpy as np
import pandas_datareader.data as web
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA

In [70]:
# the first pair
AAPL = web.DataReader(name='AAPL',data_source='yahoo',start='2010-01-01')
GOOG = web.DataReader(name='GOOG',data_source='yahoo',start='2010-01-01')
# the second pair
IBM = web.DataReader(name='IBM',data_source='yahoo',start='2010-01-01')
SPY = web.DataReader(name='SPY',data_source='yahoo',start='2010-01-01')
# the third pair
DIA = web.DataReader(name='DIA',data_source='yahoo',start='2010-01-01')
# SPY is reused from the second pair

### AAPL & GOOG

#### OLS

In [71]:
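def test_ols(data1,data2,name1,name2):
    # Fit OLS in both directions on adjusted close prices and compare the slopes.
    data1 = data1[['Adj Close']]
    data2 = data2[['Adj Close']]
    res11 = LinearRegression().fit(data1,data2)  # data1 is the independent variable
    res12 = LinearRegression().fit(data2,data1)  # data2 is the independent variable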
    print('When '+ name1 +' is independent variable, beta = {:.4}'.format(res11.coef_[0][0]))
    print('When '+ name2 +' is independent variable, beta = {:.4}'.format(res12.coef_[0][0]))
    print('beta1 * beta2 = {:.2}'.format(res11.coef_[0][0] * res12.coef_[0][0]))
    
test_ols(AAPL,GOOG,'AAPL','GOOG')

When AAPL is independent variable, beta = 5.139
When GOOG is independent variable, beta = 0.1759
beta1 * beta2 = 0.9


#### TLS

In [72]:
def test_tls(data1,data2,name1,name2):
    # Estimate the TLS slope in both directions with PCA on adjusted close prices.
    data1 = data1[['Adj Close']]
    data2 = data2[['Adj Close']]
    pca = PCA(n_components = 2)
    _ = pca.fit(pd.concat([data1,data2],axis=1))
    # The sqrt(explained_variance_) scaling cancels in the ratios below.
    loadings = pca.components_ * np.sqrt(pca.explained_variance_)

    # Repeat with the column order swapped.
    pca2 = PCA(n_components = 2)
    _ = pca2.fit(pd.concat([data2,data1],axis=1))
    loadings2 = pca2.components_ * np.sqrt(pca2.explained_variance_)
    # Caveat: each ratio mixes the first and second component, whose signs are
    # arbitrary, so the sign of the printed beta can flip (see IBM & SPY).
    print('When ' + name1 + ' is independent variable, beta = {:.4}'.format(-loadings2[0][0] / loadings2[1][0]))
    print('When ' + name2 + ' is independent variable, beta = {:.4}'.format(loadings[0][0] / loadings[1][0]))
    print('beta1 * beta2 = {:.2}'.format(loadings[0][0] / -loadings[1][0] * loadings2[0][0] / loadings2[1][0]))
    
test_tls(AAPL,GOOG,'AAPL','GOOG')

When AAPL is independent variable, beta = 5.668
When GOOG is independent variable, beta = 0.1764
beta1 * beta2 = 1.0


### IBM & SPY

In [73]:
print('OLS : ')
test_ols(IBM,SPY,'IBM','SPY')
print('===============================================')
print('TLS : ')
test_tls(IBM,SPY,'IBM','SPY')

OLS : 
When IBM is independent variable, beta = 0.8462
When SPY is independent variable, beta = 0.05505
beta1 * beta2 = 0.047
===============================================
TLS : 
When IBM is independent variable, beta = -17.04
When SPY is independent variable, beta = -0.05868
beta1 * beta2 = 1.0
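
The negative TLS betas here sit oddly next to the positive OLS betas. A likely explanation is that `test_tls` divides an entry of the first principal component by an entry of the second, and the orientation of each component is arbitrary. The sketch below uses synthetic data with a weak positive relationship (all values are assumptions, not the IBM and SPY prices). It compares that cross-component ratio with a ratio taken inside the first component only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)                        # arbitrary seed (assumption)
x = rng.normal(size=2_000)
y = 4.0 * x + rng.normal(scale=8.0, size=2_000)       # weak but positive relationship

pca = PCA(n_components=2).fit(np.column_stack([x, y]))
first, second = pca.components_                       # rows are the principal axes

cross_ratio = first[0] / second[0]    # form used in test_tls; sign depends on orientation
within_ratio = first[0] / first[1]    # same magnitude; sign follows the covariance

print(f"cross-component ratio  = {cross_ratio:.4f}")
print(f"within-component ratio = {within_ratio:.4f}")
```

The two ratios have the same magnitude because the second axis is perpendicular to the first. Only the within-component ratio is guaranteed to carry the sign of the covariance, so it may be the safer form for a hedge ratio.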


### DIA & SPY

In [74]:
print('OLS : ')
test_ols(DIA,SPY,'DIA','SPY')
print('===============================================')
print('TLS : ')
test_tls(DIA,SPY,'DIA','SPY')

OLS : 
When DIA is independent variable, beta = 1.117
When SPY is independent variable, beta = 0.8875
beta1 * beta2 = 0.99
===============================================
TLS : 
When DIA is independent variable, beta = 1.122
When SPY is independent variable, beta = 0.8911
beta1 * beta2 = 1.0


In [75]:
print('Done !')

Done !
