
# Rolling portfolio correlations - Python   

This short notebook introduces a method for calculating rolling correlations of time series data in Python.  The aim is not a rolling correlation for a single pair of series, but the rolling average of all pairwise correlations within a group of series.  The data is a set of stock prices forming two portfolios.  
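
To make the target quantity concrete, here is the average written out. The symbols $n$ (number of stocks), $\rho_{ij}$ (correlation between the returns of stocks $i$ and $j$) and $\bar{\rho}$ are introduced here for illustration only; the average is unweighted, as in the rest of the notebook.

$$
\bar{\rho} = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \rho_{ij}
$$

Because a correlation matrix is symmetric, averaging the $n(n-1)/2$ entries above the diagonal gives the same value as averaging every off-diagonal entry.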

This is a companion to the R implementation of the same functionality.

The data comes from a CSV file saved in my GitHub repository.

In [1]:
import pandas as pd
import numpy as np 
import datetime
import random

### The data 

As in the R version of this exercise, I am pulling some stock data from my GitHub repository. 

In [2]:
csv = 'https://github.com/Brent-Morrison/Misc_scripts/raw/master/daily_price_ts_vw_20201018.csv'
daily_price_ts_vw_20201018 = pd.read_csv(csv)

daily_price_ts_vw_20201018.tail()

Unnamed: 0,symbol,sector,date_stamp,close,adjusted_close,volume,sp500
1751,AMD,2,2020-07-29,76.09,76.09,132969679.0,3258.439941
1752,AMD,2,2020-07-30,78.2,78.2,80286888.0,3246.219971
1753,AMD,2,2020-07-31,77.43,77.43,71699667.0,3271.120117
1754,AMD,2,2020-08-03,77.67,77.67,42628817.0,3294.610107
1755,AMD,2,2020-08-04,85.04,85.04,155676106.0,3306.51001


In [3]:
daily_price_ts_vw_20201018.head()

Unnamed: 0,symbol,sector,date_stamp,close,adjusted_close,volume,sp500
0,AAL,1,2020-01-02,29.09,28.988,6275633.0,3257.850098
1,AAL,1,2020-01-03,27.65,27.5531,14020066.0,3234.850098
2,AAL,1,2020-01-06,27.32,27.2242,6108646.0,3246.280029
3,AAL,1,2020-01-07,27.22,27.1246,6197079.0,3237.179932
4,AAL,1,2020-01-08,27.84,27.7424,10497296.0,3253.050049


# Averaging a correlation matrix  

Construct a small data frame to serve as dummy data for development.

In [4]:
mtrx = np.array([[1,2,3,4], [2,1,5,6], [3,5,1,7], [4,6,7,1]])
mtrx

array([[1, 2, 3, 4],
       [2, 1, 5, 6],
       [3, 5, 1, 7],
       [4, 6, 7, 1]])

In [5]:
# To data frame
mtrx_df = pd.DataFrame(data=mtrx, columns=['Col1', 'Col2', 'Col3', 'Col4'])
mtrx_df

Unnamed: 0,Col1,Col2,Col3,Col4
0,1,2,3,4
1,2,1,5,6
2,3,5,1,7
3,4,6,7,1


In the R version I worked through constructing a function to return the average of a correlation matrix.  With numpy and pandas this can be done in one step: extract the values in the upper triangle (excluding the diagonal) and take their mean.


In [6]:
print('***** MEAN OF NP.TRIU_INDICES *****')
print(mtrx_df.values[np.triu_indices_from(mtrx_df.values,1)].mean())
print()
print('-------------------------------------------------')
print()
print('***** MEAN OF NP.TRIU_INDICES - NP.NANMEAN *****')
print(np.nanmean(mtrx_df.values[np.triu_indices_from(mtrx_df.values,1)]))

***** MEAN OF NP.TRIU_INDICES *****
4.5

-------------------------------------------------

***** MEAN OF NP.TRIU_INDICES - NP.NANMEAN *****
4.5
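
The one-liner above can also be wrapped in a small helper. This is only a sketch: the name `mean_offdiag` is made up for illustration and is not part of the R companion. The cross-check relies on the matrix being symmetric, so the off-diagonal total is the full sum minus the trace.

```python
import numpy as np
import pandas as pd


def mean_offdiag(corr):
    """Mean of the entries strictly above the diagonal, ignoring NaN."""
    values = np.asarray(corr, dtype=float)
    upper = values[np.triu_indices_from(values, k=1)]
    return np.nanmean(upper)


mtrx_df = pd.DataFrame(
    [[1, 2, 3, 4], [2, 1, 5, 6], [3, 5, 1, 7], [4, 6, 7, 1]],
    columns=['Col1', 'Col2', 'Col3', 'Col4'],
)

# The entries above the diagonal are 2, 3, 4, 5, 6, 7
result = mean_offdiag(mtrx_df)

# Cross-check for a symmetric matrix: (sum - trace) / (n * (n - 1))
n = mtrx_df.shape[0]
check = (mtrx_df.values.sum() - np.trace(mtrx_df.values)) / (n * (n - 1))

print(result, check)  # both 4.5
```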


# Getting back to the stock data.  

The code below creates a correlation matrix of the returns of the stocks in the ```daily_price_ts_vw_20201018``` data frame loaded earlier.

In [7]:
# Add one-day log returns per symbol (transform keeps the result aligned to the original rows)
daily_price_ts_vw_20201018['rtn_log_1d'] = daily_price_ts_vw_20201018.groupby('symbol')['adjusted_close'].transform(lambda x: np.log(x).diff(periods=1))

# Correlation matrix
daily_price_ts_vw_20201018.pivot(index = 'date_stamp', columns='symbol', values='rtn_log_1d').corr()

symbol,AAL,AAN,AAPL,AAWW,ABM,ACCO,ACM,ADBE,ADI,ADT,AKAM,AMD
AAL,1.0,0.534266,0.325731,0.305814,0.470569,0.504536,0.509805,0.219577,0.373639,0.428278,0.101893,0.236381
AAN,0.534266,1.0,0.511547,0.206917,0.490153,0.565205,0.606753,0.428532,0.5753,0.554222,0.1824,0.410408
AAPL,0.325731,0.511547,1.0,0.333001,0.581298,0.361228,0.601103,0.825177,0.723448,0.478252,0.581029,0.682798
AAWW,0.305814,0.206917,0.333001,1.0,0.419544,0.352761,0.515143,0.455668,0.531586,0.368625,0.059849,0.331483
ABM,0.470569,0.490153,0.581298,0.419544,1.0,0.501236,0.630795,0.549705,0.560218,0.387388,0.33346,0.411126
ACCO,0.504536,0.565205,0.361228,0.352761,0.501236,1.0,0.688587,0.276039,0.48533,0.525182,0.100282,0.272751
ACM,0.509805,0.606753,0.601103,0.515143,0.630795,0.688587,1.0,0.615331,0.707587,0.588721,0.246442,0.494776
ADBE,0.219577,0.428532,0.825177,0.455668,0.549705,0.276039,0.615331,1.0,0.772001,0.432399,0.542109,0.6959
ADI,0.373639,0.5753,0.723448,0.531586,0.560218,0.48533,0.707587,0.772001,1.0,0.544029,0.334023,0.63337
ADT,0.428278,0.554222,0.478252,0.368625,0.387388,0.525182,0.588721,0.432399,0.544029,1.0,0.208034,0.413764


Note that the ```corr``` method excludes NAs pairwise by default.  This is in contrast to R's ```cor```, which requires the ```use = 'pairwise.complete.obs'``` argument to do so.  
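
A small check of that default on made-up numbers (the frame `toy` and its values are purely illustrative): the correlation pandas reports for a pair of columns should match one computed by hand from only the rows where both columns are present, and it generally differs from dropping every row that has any missing value.

```python
import numpy as np
import pandas as pd

# Toy returns with a missing value in two different columns
toy = pd.DataFrame({
    'a': [0.01, -0.02, 0.03, np.nan, 0.02, -0.01],
    'b': [0.02, -0.01, np.nan, 0.01, 0.03, -0.02],
    'c': [0.00, 0.01, -0.01, 0.02, 0.01, 0.03],
})

# Default behaviour: each pair of columns uses the rows where both are present
corr_default = toy.corr()

# Manual version for the (a, b) pair
ab = toy[['a', 'b']].dropna()
corr_ab_manual = np.corrcoef(ab['a'], ab['b'])[0, 1]

# Listwise deletion: drop any row with a missing value before correlating
corr_listwise = toy.dropna().corr()

print(corr_default.loc['a', 'b'], corr_ab_manual)
print(corr_default.loc['a', 'c'], corr_listwise.loc['a', 'c'])
```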

Now to calculate the mean of the correlation matrix.

In [8]:
daily_price_ts_cor = daily_price_ts_vw_20201018.pivot(index = 'date_stamp', columns='symbol', values='rtn_log_1d').corr()

np.nanmean(daily_price_ts_cor.values[np.triu_indices_from(daily_price_ts_cor.values,1)])

0.45738142042743374

That agrees with the result we got from R.  All good so far.  

# As a function  

In R we began by implementing a rolling IPC (intra-portfolio correlation) function, and then applied it on a rolling basis.  
  
Below, the rolling functionality is included in the initial function. 

In [9]:
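def roll_pfol_corr(x, window=120):
    # Wide frame of returns: one row per date, one column per symbol
    rtn_wide = x.pivot(index='date_stamp', columns='symbol', values='rtn_log_1d')
    # Rolling pairwise correlations, indexed by (date_stamp, symbol)
    roll_corr_mtx = rtn_wide.rolling(window).corr()
    dates = roll_corr_mtx.index.get_level_values(0).unique()
    # Mean of the upper triangle (excluding the diagonal) of each date's matrix
    mean_corr = []
    for date in dates:
        corr = roll_corr_mtx.loc[date].values
        mean_corr.append(np.nanmean(corr[np.triu_indices_from(corr, 1)]))
    return pd.Series(mean_corr, index=dates)

roll_pfol_corr(daily_price_ts_vw_20201018)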



date_stamp
2020-01-02         NaN
2020-01-03         NaN
2020-01-06         NaN
2020-01-07         NaN
2020-01-08         NaN
                ...   
2020-07-29    0.469060
2020-07-30    0.470194
2020-07-31    0.466010
2020-08-03    0.466050
2020-08-04    0.465898
Length: 149, dtype: float64
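
To see why the function slices by date, it helps to look at what a rolling pairwise correlation returns. The sketch below uses random toy data (the frame `toy_rtn` and the window of 3 are arbitrary choices for illustration): the result stacks one correlation matrix per date under a two-level index, so selecting a date gives back a square matrix.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range('2020-01-01', periods=6, freq='D').strftime('%Y-%m-%d')
toy_rtn = pd.DataFrame(rng.normal(size=(6, 3)), index=dates, columns=['X', 'Y', 'Z'])
toy_rtn.index.name = 'date_stamp'

# One 3 x 3 correlation matrix per date, stacked under a (date, column) index
roll = toy_rtn.rolling(3).corr(pairwise=True)

print(roll.index.nlevels)  # 2
print(roll.shape)          # (18, 3): 6 dates * 3 columns rows, 3 columns

# Selecting a date drops the first index level and returns that date's matrix
last = roll.loc[dates[-1]]
first = roll.loc[dates[0]]  # all NaN: the window is not yet full

print(last)
```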

Now to apply by group.

In [10]:
roll_daily = daily_price_ts_vw_20201018.groupby('sector').apply(roll_pfol_corr).T

# Month-end dates to filter by.
# pd.date_range returns a DatetimeIndex, which would need converting to a list of strings:
# me_dates = pd.date_range(start='2020-02-11', end='2020-07-31', freq='BM')  

# Use a hand-written list instead
me_dates = ['2020-02-28', '2020-03-31', '2020-04-30', '2020-05-29', '2020-06-30', '2020-07-31']

roll_daily.loc[me_dates]



date_stamp,sector 1,sector 2
2020-02-28,,
2020-03-31,,
2020-04-30,,
2020-05-29,,
2020-06-30,0.491526,0.672605
2020-07-31,0.495892,0.637774
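
The hand-written date list could instead be generated. A minimal sketch, assuming `date_stamp` holds `YYYY-MM-DD` strings as in the CSV: build business month-end dates and convert them to a list of strings. `pd.offsets.BMonthEnd()` is used here rather than the `'BM'` alias because newer pandas versions deprecate that alias. Note that business month-ends ignore exchange holidays, so a generated date is not guaranteed to be a trading day in the data.

```python
import pandas as pd

# Business month-end dates over the sample period, as 'YYYY-MM-DD' strings
me_dates = (
    pd.date_range(start='2020-02-11', end='2020-07-31', freq=pd.offsets.BMonthEnd())
    .strftime('%Y-%m-%d')
    .tolist()
)

print(me_dates)
# ['2020-02-28', '2020-03-31', '2020-04-30', '2020-05-29', '2020-06-30', '2020-07-31']
```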


These results differ from those returned in R.  

The July IPC for sector 1 is fine: 0.495892 agrees with the R result (after equalising the look-back period).  Sector 2, however, is out by a significant amount: 0.5700418 in R versus 0.637774 above. 

This is due to the pandas rolling correlation function treating incomplete pairs of observations differently from R.  To my knowledge there is no equivalent to R's ```pairwise.complete.obs``` argument.  From the [documentation](https://pandas.pydata.org/docs/reference/api/pandas.core.window.rolling.Rolling.corr.html): *in the case of missing elements, only complete pairwise observations will be used*.
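
One possible way to control the missing-value treatment is to skip the rolling method and call ```DataFrame.corr``` on each window slice, since that method uses pairwise complete observations. The sketch below is an assumption-laden illustration on toy data, not a verified reproduction of the R figure: the function name `roll_mean_corr_pairwise` and the parameters `wide`, `window` and `min_periods` are made up here, and only full-length windows are evaluated.

```python
import numpy as np
import pandas as pd


def roll_mean_corr_pairwise(wide, window, min_periods=2):
    """Mean off-diagonal correlation per window, via DataFrame.corr on each slice.

    `wide` has one row per date and one column per symbol (hypothetical names).
    """
    out = pd.Series(np.nan, index=wide.index, dtype=float)
    for end in range(window, len(wide) + 1):
        chunk = wide.iloc[end - window:end]
        # Pairwise-complete correlations within this window
        corr = chunk.corr(min_periods=min_periods).values
        upper = corr[np.triu_indices_from(corr, k=1)]
        # Leave NaN in place when no pair has enough observations
        if not np.isnan(upper).all():
            out.iloc[end - 1] = np.nanmean(upper)
    return out


# Toy returns with a single missing observation
rng = np.random.default_rng(1)
wide = pd.DataFrame(rng.normal(size=(30, 3)), columns=['X', 'Y', 'Z'])
wide.loc[5, 'Z'] = np.nan

manual = roll_mean_corr_pairwise(wide, window=10, min_periods=5)
print(manual.tail())
```

The explicit loop is slower than the built-in rolling method, but it makes the handling of incomplete pairs visible and adjustable through `min_periods`.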
