# Tutorial 6 - Linear Regression between two variables
- Models how a dependent variable Y changes with an independent variable X; a good fit on its own does not establish that X causes Y

### Correlation vs Linear Regression

![image.png](attachment:image.png)

**Similarities:**
- Quantify the direction and strength of the relationship

**Differences:**
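- Correlation is a single statistic (r, between -1 and 1) that treats X and Y symmetrically
- Linear regression produces an equation, Y = alpha + beta * X, that can be used to predict Y from X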
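
To make the difference concrete, here is a minimal sketch on synthetic data. The data, the seed, and the variable names are assumptions chosen for illustration; they are not part of the tutorial's dataset. It computes the correlation coefficient as one number, and the regression line as a slope and an intercept.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 1 + 2x plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)

# Correlation: a single number between -1 and 1
r = np.corrcoef(x, y)[0, 1]

# Linear regression: slope and intercept of the least-squares line
slope, intercept = np.polyfit(x, y, deg=1)

print(f"correlation r = {r:.3f}")
print(f"regression:  y = {intercept:.3f} + {slope:.3f} * x")
```

The two are linked: for a single predictor, the slope equals r multiplied by the ratio of the standard deviations of y and x.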

**Resources:**
1.	The difference between correlation and linear regression. https://www.graphpad.com/support/faq/what-is-the-difference-between-correlation-and-linear-regression/
2.	Linear regression model in Python. https://www.kdnuggets.com/2019/03/beginners-guide-linear-regression-python-scikit-learn.html


In [5]:
#pip install scikit-learn




In [6]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt
%matplotlib notebook

## Generate some random data with `np.random.randn()`

In [15]:
X = np.random.randn(50)
X.mean()
X.max(0)
X

array([-1.29120668,  0.89876875,  0.62136298, -1.77485735, -1.32332358,
        0.44460251,  0.34618638,  0.2274354 ,  1.01687967, -1.61773737,
       -0.79597595,  0.73135185, -0.3230113 , -0.34359892,  0.7604484 ,
        0.27383557,  0.2067572 ,  0.58148857, -0.78694845,  0.95124811,
        0.58560806,  0.78643257,  0.45970807, -0.6711942 , -0.16192873,
        0.96953064,  0.86044378,  1.44559703, -1.50096065, -0.45781373,
        0.63862038,  0.36806444, -0.16382281,  0.47196025, -0.24560493,
       -0.71465557,  0.82533663,  0.21642004,  0.35961498,  0.65395731,
        1.28014845, -0.10775422, -0.98605364,  0.76457436, -0.06721022,
       -1.21225775,  1.51450854, -0.65910376,  0.18893577,  0.74196295])

In [7]:
X = np.random.randn(5000)
Y = np.random.randn(5000)

fig, ax = plt.subplots()
ax.scatter(X, Y, alpha=.2)
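
Because X and Y above are drawn independently, the scatter plot should show a round cloud with no linear trend. A quick numerical check, sketched here with a seeded generator (the seed and the use of `np.random.default_rng` are assumptions for reproducibility, not what the cell above uses), is that both the correlation and the fitted slope come out close to zero.

```python
import numpy as np

# Independent standard-normal samples, seeded for reproducibility (assumption)
rng = np.random.default_rng(1)
X = rng.standard_normal(5000)
Y = rng.standard_normal(5000)

# With no real relationship, both values should be near zero
r = np.corrcoef(X, Y)[0, 1]
slope, intercept = np.polyfit(X, Y, deg=1)

print(f"r = {r:.4f}, slope = {slope:.4f}, intercept = {intercept:.4f}")
```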


## Calculate Linear Regression

In [16]:
tickers = ['AAPL', 'TWTR', 'IBM', 'MSFT', '^GSPC']
start = dt.datetime(2020, 1, 1)

data = pdr.get_data_yahoo(tickers, start)

In [17]:
data = data['Adj Close']

In [18]:
data.head()

| Date | AAPL | TWTR | IBM | MSFT | ^GSPC |
|---|---|---|---|---|---|
| 2020-01-02 | 73.894325 | 32.299999 | 115.726624 | 157.289886 | 3257.850098 |
| 2020-01-03 | 73.175934 | 31.52 | 114.80368 | 155.331345 | 3234.850098 |
| 2020-01-06 | 73.758995 | 31.639999 | 114.598587 | 155.732849 | 3246.280029 |
| 2020-01-07 | 73.412117 | 32.540001 | 114.675468 | 154.312927 | 3237.179932 |
| 2020-01-08 | 74.593033 | 33.049999 | 115.632599 | 156.770889 | 3253.050049 |


In [19]:
log_returns = np.log(data/data.shift())
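
The expression above computes the daily log return, log(P_t / P_{t-1}), for every column. The first row is NaN because there is no previous price to divide by. A small sketch with made-up prices (the numbers are assumptions, not market data) shows this, along with the useful property that log returns add up over time.

```python
import numpy as np
import pandas as pd

# Made-up prices for illustration only
prices = pd.Series([100.0, 110.0, 99.0])

# Same expression as in the tutorial, applied to one series
log_ret = np.log(prices / prices.shift())

# Simple (percentage) returns for comparison
simple_ret = prices.pct_change()

# Log returns are additive: their sum is the log of last price / first price
total_log_ret = log_ret.sum()  # NaN in the first row is skipped by default

print(log_ret)
print(simple_ret)
print(total_log_ret, np.log(prices.iloc[-1] / prices.iloc[0]))
```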

In [20]:
log_returns

| Date | AAPL | TWTR | IBM | MSFT | ^GSPC |
|---|---|---|---|---|---|
| 2020-01-02 | NaN | NaN | NaN | NaN | NaN |
| 2020-01-03 | -0.009769 | -0.024445 | -0.008007 | -0.012530 | -0.007085 |
| 2020-01-06 | 0.007936 | 0.003800 | -0.001788 | 0.002581 | 0.003527 |
| 2020-01-07 | -0.004714 | 0.028048 | 0.000671 | -0.009159 | -0.002807 |
| 2020-01-08 | 0.015958 | 0.015551 | 0.008312 | 0.015803 | 0.004890 |
| ... | ... | ... | ... | ... | ... |
| 2022-02-18 | -0.009400 | -0.031831 | -0.004974 | -0.009678 | -0.007192 |
| 2022-02-22 | -0.017973 | -0.041344 | -0.003464 | -0.000730 | -0.010195 |
| 2022-02-23 | -0.026205 | -0.005176 | -0.015042 | -0.026234 | -0.018584 |
| 2022-02-24 | 0.016543 | 0.065568 | -0.000820 | 0.049831 | 0.014846 |


In [32]:
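def linear_regression(ticker_a, ticker_b):
    """Regress the log returns of ticker_b (Y) on those of ticker_a (X) and plot the fit."""
    # Skip the first row, which is NaN because it has no previous price;
    # scikit-learn expects 2-D arrays, hence reshape(-1, 1)
    X = log_returns[ticker_a].iloc[1:].to_numpy().reshape(-1, 1)
    Y = log_returns[ticker_b].iloc[1:].to_numpy().reshape(-1, 1)

    lin_regr = LinearRegression()
    lin_regr.fit(X, Y)

    # Fitted values on the regression line
    Y_pred = lin_regr.predict(X)

    # Intercept (alpha) and slope (beta) of Y = alpha + beta * X
    alpha = lin_regr.intercept_[0]
    beta = lin_regr.coef_[0, 0]

    fig, ax = plt.subplots()
    ax.set_title("Alpha: " + str(round(alpha, 5)) + ", Beta: " + str(round(beta, 3)))
    ax.set_xlabel(ticker_a + " log return")
    ax.set_ylabel(ticker_b + " log return")

    # Scatter plot of the X, Y returns
    ax.scatter(X, Y)
    # Regression line: X against Y_pred
    ax.plot(X, Y_pred, c='r')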
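
With a single predictor, the intercept and slope that `LinearRegression` fits are the ordinary least squares estimates, which also have a closed form: beta = Cov(X, Y) / Var(X) and alpha = mean(Y) - beta * mean(X). The sketch below checks this on synthetic "returns". The data, seed, and variable names are assumptions for illustration, so it runs without downloading any prices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic returns (assumed): y = 0.001 + 0.7 * x plus noise
rng = np.random.default_rng(42)
x = rng.normal(scale=0.02, size=500)
y = 0.001 + 0.7 * x + rng.normal(scale=0.01, size=500)

# Closed-form least-squares estimates
beta_cf = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_cf = y.mean() - beta_cf * x.mean()

# Same fit with scikit-learn (X must be 2-D, y may stay 1-D)
model = LinearRegression().fit(x.reshape(-1, 1), y)
beta_sk = model.coef_[0]
alpha_sk = model.intercept_

print(f"closed form:  alpha = {alpha_cf:.5f}, beta = {beta_cf:.3f}")
print(f"scikit-learn: alpha = {alpha_sk:.5f}, beta = {beta_sk:.3f}")
```

Both routes should agree to floating-point precision. The closed form is convenient for a quick check, while `LinearRegression` extends naturally to several predictors.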

In [33]:
linear_regression("AAPL", "^GSPC")


In [34]:
linear_regression("AAPL", "MSFT")


In [35]:
linear_regression("AAPL", "TWTR")


# END