# Estimating Correlation - Applied


$$\rho_{j,k} = \frac{\sigma_{j,k}}{\sigma_j \sigma_k}$$

Where:  
$\rho_{j,k} = $ Correlation between stocks $j$ and $k$  
$\sigma_{j,k} = $ Covariance between stocks $j$ and $k$  
$\sigma_j = $ Standard deviation (total risk) of stock $j$ (and likewise $\sigma_k$ for stock $k$)

In [2]:
# Import packages
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv(r"C:\Users\rokhs\OneDrive\Courses\Investment Analysis & Portfolio Management with Python\43\estimating_correlation\data\15stocks_price.csv")  # stock price data

# Convert dates to timestamps and set date column as the index
df['date_gsheets'] = pd.to_datetime(df['date_gsheets'])
df.set_index('date_gsheets', inplace=True)

# Calculate returns for all securities
returns_df = df.pct_change(1)

# Drop / delete missing observations
returns_df.dropna(inplace=True)

In [4]:
returns_df.head()

date_gsheets,AAPL,KO,NFLX,BRK.B,DIS,IBM,VZ,WMT,GE,TSLA,MA,AMZN,MSFT,UN,V
2012-01-04 16:00:00,0.005277,-0.006273,0.113372,-0.011329,0.014096,-0.004079,-0.013088,-0.010277,0.010893,-0.013177,-0.032845,-0.00849,0.023534,-0.013459,-0.017864
2012-01-05 16:00:00,0.011175,-0.004591,-0.013925,0.001693,0.016731,-0.004743,-0.006886,-0.004857,-0.000539,-0.021292,-0.010946,0.000563,0.010219,0.001161,0.007513
2012-01-06 16:00:00,0.010382,-0.006342,0.088261,-0.007019,0.01038,-0.011481,-0.015665,-0.007068,0.005391,-0.007743,-0.026958,0.028152,0.015535,-0.035373,-0.011774
2012-01-09 16:00:00,-0.001492,0.0,0.137875,-0.001309,-0.004009,-0.005204,0.001044,0.003051,0.01126,0.012635,0.008457,-0.022178,-0.013163,0.013526,-0.007943
2012-01-10 16:00:00,0.003485,0.005802,-0.024234,0.014812,-0.003019,-0.001542,0.005212,-0.002366,-0.007423,0.013578,0.008676,0.004368,0.003605,0.005635,-0.001201


In [5]:
# Estimate the covariance between Apple and Coca Cola
cov_aapl_ko = np.cov(returns_df['AAPL'], returns_df['KO'])[0][1]

In [6]:
cov_aapl_ko

2.7988347139283865e-05

In [7]:
# Estimate the standard deviation of Apple and Coca Cola
std_aapl = returns_df['AAPL'].std()
std_ko = returns_df['KO'].std()

In [8]:
# Estimate the correlation between Apple and Coca Cola
corr_aapl_ko = cov_aapl_ko / (std_aapl * std_ko)

In [9]:
corr_aapl_ko

0.20541727242368524
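As a sanity check, the manual estimate can be compared against the built-in estimators. The sketch below uses synthetic returns rather than the course data: the seed, the sample size, and the column names `A` and `B` are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns for two hypothetical stocks (not the course data)
rng = np.random.default_rng(0)
common = rng.normal(0, 0.01, 500)  # shared component that induces correlation
demo = pd.DataFrame({
    "A": common + rng.normal(0, 0.01, 500),
    "B": common + rng.normal(0, 0.01, 500),
})

# Manual estimate: covariance divided by the product of standard deviations.
# np.cov and Series.std both default to the sample estimator (ddof=1),
# so the numerator and denominator are consistent.
cov_ab = np.cov(demo["A"], demo["B"])[0, 1]
manual_corr = cov_ab / (demo["A"].std() * demo["B"].std())

# Built-in estimates of the same quantity
pandas_corr = demo["A"].corr(demo["B"])
numpy_corr = np.corrcoef(demo["A"], demo["B"])[0, 1]

print(manual_corr, pandas_corr, numpy_corr)
```

All three values should agree up to floating-point error. The same check on the notebook's data would compare `corr_aapl_ko` with `returns_df['AAPL'].corr(returns_df['KO'])`.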

In [21]:
# Estimate the correlations across all securities
corr_matrix = returns_df.corr().round(2)

In [22]:
corr_matrix

,AAPL,KO,NFLX,BRK.B,DIS,IBM,VZ,WMT,GE,TSLA,MA,AMZN,MSFT,UN,V
AAPL,1.0,0.21,0.12,0.31,0.27,0.26,0.17,0.17,0.26,0.2,0.35,0.26,0.33,0.25,0.3
KO,0.21,1.0,0.09,0.47,0.37,0.32,0.37,0.31,0.35,0.14,0.36,0.23,0.32,0.41,0.33
NFLX,0.12,0.09,1.0,0.19,0.15,0.12,0.04,0.1,0.16,0.23,0.25,0.3,0.21,0.15,0.22
BRK.B,0.31,0.47,0.19,1.0,0.54,0.48,0.42,0.33,0.55,0.22,0.53,0.33,0.43,0.42,0.5
DIS,0.27,0.37,0.15,0.54,1.0,0.34,0.34,0.27,0.43,0.23,0.43,0.31,0.35,0.33,0.41
IBM,0.26,0.32,0.12,0.48,0.34,1.0,0.3,0.23,0.43,0.18,0.38,0.23,0.39,0.3,0.36
VZ,0.17,0.37,0.04,0.42,0.34,0.3,1.0,0.29,0.37,0.11,0.28,0.17,0.29,0.31,0.27
WMT,0.17,0.31,0.1,0.33,0.27,0.23,0.29,1.0,0.24,0.11,0.25,0.14,0.22,0.23,0.25
GE,0.26,0.35,0.16,0.55,0.43,0.43,0.37,0.24,1.0,0.19,0.4,0.26,0.34,0.33,0.38
TSLA,0.2,0.14,0.23,0.22,0.23,0.18,0.11,0.11,0.19,1.0,0.27,0.26,0.21,0.18,0.23


#### Most of the correlations are weak. The strongest is corr(V, MA) = 0.77 (Visa and Mastercard).
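Rather than scanning the matrix by eye, the strongest pair can be located programmatically. This is a minimal sketch on a small hypothetical matrix: the tickers `X`, `Y`, `Z` and their values are made up for illustration. The same steps should carry over to `corr_matrix`.

```python
import numpy as np
import pandas as pd

# Small hypothetical correlation matrix (values made up for illustration)
demo_corr = pd.DataFrame(
    [[1.00, 0.20, 0.30],
     [0.20, 1.00, 0.77],
     [0.30, 0.77, 1.00]],
    index=["X", "Y", "Z"],
    columns=["X", "Y", "Z"],
)

# Keep only the upper triangle (k=1 excludes the diagonal), so each pair
# appears once and the self-correlations of 1.0 are ignored
mask = np.triu(np.ones(demo_corr.shape, dtype=bool), k=1)
pairs = demo_corr.where(mask).stack().dropna()

strongest_pair = pairs.idxmax()   # (row label, column label)
strongest_value = pairs.max()
print(strongest_pair, strongest_value)
```

Sorting `pairs` with `sort_values(ascending=False)` would give the full ranking instead of only the top pair.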

In [28]:
# Explore the correlations of Netflix and Tesla with all other securities
corr_matrix[['NFLX', 'TSLA']]



# Regardless of which security you look at, the correlation of Netflix with that security
# is weaker than the correlation of Tesla with that security (except for AMZN).

,NFLX,TSLA
AAPL,0.12,0.2
KO,0.09,0.14
NFLX,1.0,0.23
BRK.B,0.19,0.22
DIS,0.15,0.23
IBM,0.12,0.18
VZ,0.04,0.11
WMT,0.1,0.11
GE,0.16,0.19
TSLA,0.23,1.0
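The column-by-column comparison can also be made explicit in code. The sketch below copies only the rows displayed in the output above (the NFLX and TSLA rows themselves are left out, since they hold each stock's self-correlation). The name `pair` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Correlations copied from the displayed rows of the NFLX / TSLA output
pair = pd.DataFrame(
    {
        "NFLX": [0.12, 0.09, 0.19, 0.15, 0.12, 0.04, 0.10, 0.16],
        "TSLA": [0.20, 0.14, 0.22, 0.23, 0.18, 0.11, 0.11, 0.19],
    },
    index=["AAPL", "KO", "BRK.B", "DIS", "IBM", "VZ", "WMT", "GE"],
)

# True where Netflix is less correlated with the security than Tesla is
nflx_weaker = pair["NFLX"] < pair["TSLA"]
print(nflx_weaker.all())

# Size of the gap for each security
print((pair["TSLA"] - pair["NFLX"]).round(2))
```

On the full matrix, the equivalent comparison would start from `corr_matrix[['NFLX', 'TSLA']].drop(index=['NFLX', 'TSLA'])`.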


In [26]:
# Explore the correlations of Apple, Mastercard, and Microsoft with all other securities
corr_matrix[['AAPL', 'MA', 'MSFT']]

# Regardless of which security you look at, the correlation of Apple with that security is weaker than the correlation of Mastercard or Microsoft with that security.

,AAPL,MA,MSFT
AAPL,1.0,0.35,0.33
KO,0.21,0.36,0.32
NFLX,0.12,0.25,0.21
BRK.B,0.31,0.53,0.43
DIS,0.27,0.43,0.35
IBM,0.26,0.38,0.39
VZ,0.17,0.28,0.29
WMT,0.17,0.25,0.22
GE,0.26,0.4,0.34
TSLA,0.2,0.27,0.21


In [31]:
# Explore the correlations of Coca Cola and Berkshire Hathaway with all other securities
corr_matrix[['KO', 'BRK.B']]



,KO,BRK.B
AAPL,0.21,0.31
KO,1.0,0.47
NFLX,0.09,0.19
BRK.B,0.47,1.0
DIS,0.37,0.54
IBM,0.32,0.48
VZ,0.37,0.42
WMT,0.31,0.33
GE,0.35,0.55
TSLA,0.14,0.22


### It makes perfect sense that we chose to invest more of our money in Netflix over Tesla, and in Apple over Mastercard and Microsoft, when we're minimising the risk of the portfolio.

#### The risk of a portfolio of weakly correlated stocks is lower than the risk of a portfolio of strongly correlated stocks.
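For two assets, this follows from the standard portfolio variance formula $\sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2 w_1 w_2 \rho_{1,2} \sigma_1 \sigma_2$. The sketch below evaluates it for two hypothetical stocks with equal risk. The volatility of 2%, the 50/50 weights, and the two correlation values are assumptions chosen only to illustrate the effect.

```python
import math


def portfolio_std(w1, sigma1, sigma2, rho):
    """Standard deviation of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2 * w1 * w2 * rho * sigma1 * sigma2
    )
    return math.sqrt(variance)


sigma = 0.02  # hypothetical daily volatility, identical for both stocks

weak = portfolio_std(0.5, sigma, sigma, rho=0.1)    # weakly correlated pair
strong = portfolio_std(0.5, sigma, sigma, rho=0.8)  # strongly correlated pair
print(weak, strong)
```

Both portfolios are less risky than either stock held alone, but the weakly correlated pair diversifies away more of the risk.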

##### Minimising risk resulted in some (seemingly) counterintuitive results.

##### Weights can be allocated to riskier securities despite trying to minimise the risk of the portfolio.

##### Higher weights can be allocated to one stock over another, despite both stocks having approximately equal risk.

##### These “counterintuitive” allocations occurred because those securities have weaker correlations with the rest of the assets in the portfolio, compared to the alternative / “intuitive” securities.
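One way to see this effect is with the closed-form minimum variance weights $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \Sigma^{-1} \mathbf{1})$. This is a sketch under stated assumptions, not necessarily the optimisation used earlier in the course: it places no constraint on short selling, and the three assets, their equal volatility of 2%, and their correlations are hypothetical.

```python
import numpy as np

# Three hypothetical assets with identical risk
sigma = np.array([0.02, 0.02, 0.02])

# Assets 0 and 1 are strongly correlated with each other;
# asset 2 is only weakly correlated with both
corr = np.array([
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.2],
    [0.2, 0.2, 1.0],
])

# Covariance matrix: cov[j, k] = rho[j, k] * sigma[j] * sigma[k]
cov = corr * np.outer(sigma, sigma)

# Unconstrained minimum variance weights: solve cov @ x = 1, then normalise
ones = np.ones(len(sigma))
raw = np.linalg.solve(cov, ones)
weights = raw / raw.sum()

print(weights.round(4))
```

Although all three assets carry the same risk, the weakly correlated asset receives the largest weight, which mirrors the allocation pattern described above.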

### To minimise risk, greater weights were allocated to stocks which had weaker correlations with the other securities.