# Tutorial 5 - Correlation (Correlation Analysis of Assets and Risk)
- Correlation is a statistic that measures the degree to which two variables move in relation to each other.
- Correlation measures association only; it does not show whether $X$ causes $Y$ or vice versa.
- In finance, correlation can measure how a stock moves relative to a benchmark index, such as the S&P 500.
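
To illustrate that correlation does not imply causation, here is a minimal sketch with synthetic data. The variable names, sample size, and noise scale are assumptions chosen for illustration. Two series that are both driven by a hidden common factor end up strongly correlated, even though neither one influences the other.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is reproducible

n = 5_000
common = rng.normal(size=n)            # hidden factor that drives both series
x = common + 0.5 * rng.normal(size=n)  # x depends on the factor, not on y
y = common + 0.5 * rng.normal(size=n)  # y depends on the factor, not on x

# np.corrcoef returns the 2x2 correlation matrix; [0, 1] is corr(x, y)
r = np.corrcoef(x, y)[0, 1]
print(f"corr(x, y) = {r:.2f}")  # close to the theoretical 1 / (1 + 0.25) = 0.8
```

Stocks often behave in a similar way: two tickers can be highly correlated simply because both follow the overall market.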


### Formula
- $r = \frac{\sum(X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum(X - \bar{X})^2 \sum(Y - \bar{Y})^2}}$
- $r$: the correlation coefficient
- $\bar{X}$: the mean of the observations of $X$
- $\bar{Y}$: the mean of the observations of $Y$

What does it mean?
- $r$ ranges between -1 and 1 (both inclusive)
- $r = 1$: Perfect positive correlation
- $r = -1$: Perfect negative correlation
- $r = 0$: No linear correlation (the variables may still be related in a non-linear way)
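
The formula and the three special cases can be checked with a small sketch in plain Python. The helper name `pearson_r` and the sample values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed directly from the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # numerator: sum of the products of the deviations from the means
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # denominator: square root of the product of the summed squared deviations
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs) *
                    sum((y - y_bar) ** 2 for y in ys))
    return num / den  # undefined (division by zero) if either series is constant

x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(x, [2 * v + 1 for v in x])       # y rises linearly with x
r_down = pearson_r(x, [-3 * v for v in x])        # y falls linearly with x
r_flat = pearson_r(x, [4.0, 1.0, 0.0, 1.0, 4.0])  # y = (x - 3)^2, a U-shape
print(r_up, r_down, r_flat)  # 1.0 -1.0 0.0
```

The last case shows why $r = 0$ should not be read as "unrelated": $Y$ is fully determined by $X$ there, but the relationship is not linear.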

### Resources
- Correlation https://www.investopedia.com/terms/c/correlation.asp
- SP500 by Market Cap https://www.slickcharts.com/sp500

In [4]:
import pandas as pd
import yfinance as yf
import datetime as dt
import numpy as np

In [8]:
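tickers = ['AAPL', 'MSFT', 'TWTR', 'IBM']
start = "2020-01-01"

# Download daily price data for all tickers from `start` until today.
# Depending on the installed yfinance version, auto_adjust=False may be
# needed to get the 'Adj Close' column shown in the output below.
data = yf.download(tickers, start=start)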

[*********************100%***********************]  4 of 4 completed


In [7]:
data.head()

| Date | Adj Close (AAPL) | Adj Close (IBM) | Adj Close (MSFT) | Adj Close (TWTR) | Close (AAPL) | Close (IBM) | Close (MSFT) | Close (TWTR) | High (AAPL) | High (IBM) | ... | Low (MSFT) | Low (TWTR) | Open (AAPL) | Open (IBM) | Open (MSFT) | Open (TWTR) | Volume (AAPL) | Volume (IBM) | Volume (MSFT) | Volume (TWTR) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-01-02 | 73.449402 | 110.232506 | 155.76178 | 32.299999 | 75.087502 | 129.46463 | 160.619995 | 32.299999 | 75.150002 | 129.942642 | ... | 158.330002 | 31.959999 | 74.059998 | 129.063095 | 158.779999 | 32.310001 | 135480400 | 3293436 | 22622100 | 10694420.0 |
| 2020-01-03 | 72.735313 | 109.353386 | 153.822281 | 31.52 | 74.357498 | 128.432129 | 158.619995 | 31.52 | 75.144997 | 128.92926 | ... | 158.059998 | 31.26 | 74.287498 | 127.695984 | 158.320007 | 31.709999 | 146322800 | 2482890 | 21116200 | 14440378.0 |
| 2020-01-06 | 73.31488 | 109.158012 | 154.21991 | 31.639999 | 74.949997 | 128.202682 | 159.029999 | 31.639999 | 74.989998 | 128.336517 | ... | 156.509995 | 31.16 | 73.447502 | 127.552582 | 157.080002 | 31.23 | 118387200 | 2537073 | 20813700 | 12585831.0 |
| 2020-01-07 | 72.970078 | 109.23127 | 152.813751 | 32.540001 | 74.597504 | 128.288712 | 157.580002 | 32.540001 | 75.224998 | 129.024857 | ... | 157.320007 | 31.719999 | 74.959999 | 127.810707 | 159.320007 | 31.799999 | 108872000 | 3232977 | 21634100 | 13484461.0 |
| 2020-01-08 | 74.143898 | 110.142975 | 155.247818 | 33.049999 | 75.797501 | 129.359467 | 160.089996 | 33.049999 | 76.110001 | 129.885284 | ... | 157.949997 | 32.349998 | 74.290001 | 128.59465 | 158.929993 | 32.349998 | 132079200 | 4545916 | 27746500 | 14637344.0 |


## Extract the adjusted close prices of the tickers

In [9]:
data = data['Adj Close']

In [10]:
data.head()

| Date | AAPL | IBM | MSFT | TWTR |
|---|---|---|---|---|
| 2020-01-02 | 73.449379 | 110.232506 | 155.761795 | 32.299999 |
| 2020-01-03 | 72.735313 | 109.353386 | 153.822311 | 31.52 |
| 2020-01-06 | 73.314888 | 109.158028 | 154.219894 | 31.639999 |
| 2020-01-07 | 72.9701 | 109.231285 | 152.813782 | 32.540001 |
| 2020-01-08 | 74.143906 | 110.14296 | 155.247849 | 33.049999 |


## Calculate the log returns for the tickers

In [11]:
log_returns = np.log(data/data.shift())

In [12]:
log_returns

| Date | AAPL | IBM | MSFT | TWTR |
|---|---|---|---|---|
| 2020-01-02 | NaN | NaN | NaN | NaN |
| 2020-01-03 | -0.009769 | -0.008007 | -0.012530 | -0.024445 |
| 2020-01-06 | 0.007937 | -0.001788 | 0.002581 | 0.003800 |
| 2020-01-07 | -0.004714 | 0.000671 | -0.009159 | 0.028048 |
| 2020-01-08 | 0.015958 | 0.008312 | 0.015803 | 0.015551 |
| ... | ... | ... | ... | ... |
| 2023-02-13 | 0.018632 | 0.012823 | 0.030765 | NaN |
| 2023-02-14 | -0.004234 | -0.009804 | 0.003128 | NaN |
| 2023-02-15 | 0.013808 | 0.002863 | -0.008025 | NaN |
| 2023-02-16 | -0.010484 | -0.010317 | -0.026983 | NaN |
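
The cell above computes log returns, $\ln(P_t / P_{t-1})$, which is why the first row is empty: there is no previous price to compare against. The following sketch uses made-up prices (an assumption, not market data) to show how log returns relate to simple percentage returns and why they are convenient to work with.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, made up for illustration (not market data)
prices = pd.Series([100.0, 102.0, 99.0, 105.0])

simple_returns = prices.pct_change()            # P_t / P_{t-1} - 1
log_returns = np.log(prices / prices.shift())   # ln(P_t / P_{t-1})

# The first entry of both is NaN because there is no previous price.
# The two are linked by log_return = ln(1 + simple_return).
assert np.allclose(log_returns.dropna(), np.log1p(simple_returns.dropna()))

# Log returns add up over time: their sum is the log of the total growth.
total_log_return = log_returns.sum()            # NaN is skipped by default
assert np.isclose(total_log_return, np.log(prices.iloc[-1] / prices.iloc[0]))
print(total_log_return)
```

For small daily moves the two kinds of return are nearly identical, so the correlations computed from either one are usually very close.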


## Calculate the correlation with the `.corr()` method

In [13]:
log_returns.corr()

|  | AAPL | IBM | MSFT | TWTR |
|---|---|---|---|---|
| AAPL | 1.0 | 0.457332 | 0.804531 | 0.442221 |
| IBM | 0.457332 | 1.0 | 0.467942 | 0.269426 |
| MSFT | 0.804531 | 0.467942 | 1.0 | 0.453954 |
| TWTR | 0.442221 | 0.269426 | 0.453954 | 1.0 |
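
The TWTR column has missing values at the end of the sample, yet `.corr()` still returns a number for it. By default, `DataFrame.corr()` computes the Pearson coefficient and drops missing values pair by pair. The sketch below uses made-up return values (an assumption for illustration) to confirm this against a manual calculation.

```python
import numpy as np
import pandas as pd

# Made-up return series; column "b" is missing its last two observations,
# similar to a ticker that stops trading partway through the sample.
df = pd.DataFrame({
    "a": [0.01, -0.02, 0.03, 0.00, -0.01, 0.02],
    "b": [0.02, -0.01, 0.02, 0.01, np.nan, np.nan],
})

corr_matrix = df.corr()  # Pearson by default; NaN rows are dropped per pair

# Same number computed by hand on the rows where both columns are present
both = df.dropna()
manual = np.corrcoef(both["a"], both["b"])[0, 1]

assert np.isclose(corr_matrix.loc["a", "b"], manual)
print(corr_matrix)
```

One consequence is that each pair of columns may be based on a different number of observations, so the TWTR correlations above cover a shorter period than the others.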


## Get the S&P 500 index values and calculate the correlations

In [14]:
sp500 = yf.download("^GSPC", start)

[*********************100%***********************]  1 of 1 completed


In [15]:
log_returns['SP500'] = np.log(sp500['Adj Close']/sp500['Adj Close'].shift())

In [16]:
log_returns.corr()

|  | AAPL | IBM | MSFT | TWTR | SP500 |
|---|---|---|---|---|---|
| AAPL | 1.0 | 0.457332 | 0.804531 | 0.442221 | 0.824752 |
| IBM | 0.457332 | 1.0 | 0.467942 | 0.269426 | 0.672019 |
| MSFT | 0.804531 | 0.467942 | 1.0 | 0.453954 | 0.855459 |
| TWTR | 0.442221 | 0.269426 | 0.453954 | 1.0 | 0.510814 |
| SP500 | 0.824752 | 0.672019 | 0.855459 | 0.510814 | 1.0 |
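
Adding the index as a new column works because pandas aligns the assigned series with the existing rows by date. A minimal sketch of that behaviour, with hypothetical column names and made-up values:

```python
import pandas as pd

# Hypothetical frames with slightly different dates (values are made up)
stocks = pd.DataFrame(
    {"AAA": [0.01, 0.02, -0.01]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)
index_returns = pd.Series(
    [0.005, -0.002, 0.004],
    index=pd.to_datetime(["2020-01-03", "2020-01-06", "2020-01-07"]),
)

# Column assignment aligns on the row index of `stocks`:
# dates missing from `index_returns` become NaN, extra dates are dropped.
stocks["INDEX"] = index_returns
print(stocks)
```

So the two downloads do not need to cover exactly the same trading days; any date without a match simply ends up as a missing value.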


## Define a function to calculate correlations

In [17]:
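def test_correlation(ticker):
    """Return the correlation matrix of log_returns extended with `ticker`."""
    df = yf.download(ticker, start)
    lr = log_returns.copy()  # copy so the global log_returns is not modified
    # log returns of the new ticker, aligned with lr on the date index
    lr[ticker] = np.log(df['Adj Close']/df['Adj Close'].shift())
    return lr.corr()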

In [18]:
test_correlation("LQD")

[*********************100%***********************]  1 of 1 completed


|  | AAPL | IBM | MSFT | TWTR | SP500 | LQD |
|---|---|---|---|---|---|---|
| AAPL | 1.0 | 0.457332 | 0.804531 | 0.442221 | 0.824752 | 0.2897 |
| IBM | 0.457332 | 1.0 | 0.467942 | 0.269426 | 0.672019 | 0.220319 |
| MSFT | 0.804531 | 0.467942 | 1.0 | 0.453954 | 0.855459 | 0.3169 |
| TWTR | 0.442221 | 0.269426 | 0.453954 | 1.0 | 0.510814 | 0.216833 |
| SP500 | 0.824752 | 0.672019 | 0.855459 | 0.510814 | 1.0 | 0.360486 |
| LQD | 0.2897 | 0.220319 | 0.3169 | 0.216833 | 0.360486 | 1.0 |


In [19]:
test_correlation("TLT")

[*********************100%***********************]  1 of 1 completed


|  | AAPL | IBM | MSFT | TWTR | SP500 | TLT |
|---|---|---|---|---|---|---|
| AAPL | 1.0 | 0.457332 | 0.804531 | 0.442221 | 0.824752 | -0.15546 |
| IBM | 0.457332 | 1.0 | 0.467942 | 0.269426 | 0.672019 | -0.267279 |
| MSFT | 0.804531 | 0.467942 | 1.0 | 0.453954 | 0.855459 | -0.129329 |
| TWTR | 0.442221 | 0.269426 | 0.453954 | 1.0 | 0.510814 | -0.097514 |
| SP500 | 0.824752 | 0.672019 | 0.855459 | 0.510814 | 1.0 | -0.223706 |
| TLT | -0.15546 | -0.267279 | -0.129329 | -0.097514 | -0.223706 | 1.0 |


Notice that TLT is negatively correlated with every other ticker in the table, including the S&P 500.

## Define a visualization function to plot any two tickers

In [20]:
import matplotlib.pyplot as plt
%matplotlib notebook

In [24]:
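def visualize_correlation(ticker1, ticker2):
    """Plot the normalized adjusted close prices of two tickers on one chart."""
    df = yf.download([ticker1, ticker2], start)
    df = df['Adj Close']
    df = df/df.iloc[0]  # rebase both series so that each starts at 1.0
    fig, ax = plt.subplots()
    df.plot(ax=ax)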
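
The division by `df.iloc[0]` is what makes the two lines comparable on one chart. A small sketch with made-up prices on very different scales (the column names and values are assumptions) shows the effect:

```python
import pandas as pd

# Made-up prices on very different scales (not market data)
prices = pd.DataFrame({"AAA": [50.0, 55.0, 60.0],
                       "BBB": [4000.0, 3800.0, 4200.0]})

# Divide every row by the first row so that each column starts at 1.0
normalized = prices / prices.iloc[0]
print(normalized)
```

After rebasing, each value reads as growth relative to the first day, so a ticker priced near 50 and an index near 4000 can share the same axis.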

In [26]:
visualize_correlation("AAPL", "TLT")

[*********************100%***********************]  2 of 2 completed


[Figure: line chart of the normalized adjusted close prices of AAPL and TLT]

In [27]:
visualize_correlation("^GSPC", "TLT")

[*********************100%***********************]  2 of 2 completed


[Figure: line chart of the normalized adjusted close prices of ^GSPC and TLT]

# End

In [None]:
2023.2.18