## **Douglas Barley, DATA618 Quantitative Finance**

### **Correlation of Two Datasets**

**September 6, 2022**

```python
import pandas as pd
import datetime as dt
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
```

### **GDP v Fed Funds Effective Rate**

```python
# get the GDP data
gdp = pd.read_csv("https://raw.githubusercontent.com/douglasbarley/DATA618/main/WeeklyData/Week2/united-states-gdp-growth-rate.csv")
# characters 6 onward of 'date' hold the four-digit year
gdp['year'] = gdp['date'].str[6:].astype(int)
# the second column is the growth rate in percent; convert it to a decimal fraction
gdp['GDP Growth Rate'] = gdp.iloc[:, 1] / 100
# keep only the columns needed for the correlation
gdp = gdp[['year', 'GDP Growth Rate']]
gdp.head(100)
```

|  | year | GDP Growth Rate |
|---|---|---|
| 0 | 1961 | 0.023000 |
| 1 | 1962 | 0.061000 |
| 2 | 1963 | 0.044000 |
| 3 | 1964 | 0.058000 |
| 4 | 1965 | 0.064000 |
| ... | ... | ... |
| 56 | 2017 | 0.022557 |
| 57 | 2018 | 0.029189 |
| 58 | 2019 | 0.022889 |
| 59 | 2020 | -0.034046 |
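The cell above pulls the year out of `date` by position, which only works if every value has its four-digit year starting at character 6. The layout of the CSV's `date` column is not shown in this notebook, so `MM/DD/YYYY` (for example `12/31/1961`) is an assumption inferred from the `str[6:]` slice. A minimal sketch of a format-aware alternative, on made-up rows in that assumed layout, parses the strings with `pd.to_datetime` and reads `.dt.year`:

```python
import pandas as pd

# hypothetical sample rows; the real CSV's date format is assumed to be MM/DD/YYYY
sample = pd.DataFrame({
    "date": ["12/31/1961", "12/31/1962"],
    "growth_pct": [2.3, 6.1],  # hypothetical column name, values in percent
})

# parse the strings as dates (raises ValueError on a mismatch), then take the year
sample["year"] = pd.to_datetime(sample["date"], format="%m/%d/%Y").dt.year
# convert percent to a decimal fraction, as the notebook does
sample["GDP Growth Rate"] = sample["growth_pct"] / 100

print(sample[["year", "GDP Growth Rate"]])
```

The design point is failure mode: a date written as `1/1/1961` would make `str[6:]` return `"61"` without complaint, while the explicit `format` makes parsing fail loudly instead.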


```python
# get the fed funds effective rate data
fed = pd.read_csv("https://raw.githubusercontent.com/douglasbarley/DATA618/main/WeeklyData/Week2/FEDFUNDS.csv")
# the first four characters of 'DATE' are the year
fed['year'] = fed['DATE'].str[:4].astype(int)
fed = fed[['year', 'FEDFUNDS']]
# average all observations within each year into one annual rate
fed = fed.groupby('year', as_index=False).agg(avg_FedFunds=pd.NamedAgg(column='FEDFUNDS', aggfunc='mean'))
fed.head(100)
```

|  | year | avg_FedFunds |
|---|---|---|
| 0 | 1954 | 1.008333 |
| 1 | 1955 | 1.785000 |
| 2 | 1956 | 2.728333 |
| 3 | 1957 | 3.105000 |
| 4 | 1958 | 1.572500 |
| ... | ... | ... |
| 64 | 2018 | 1.831667 |
| 65 | 2019 | 2.158333 |
| 66 | 2020 | 0.375833 |
| 67 | 2021 | 0.080000 |
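The `NamedAgg` call averages every `FEDFUNDS` value that falls in the same year and names the result `avg_FedFunds`. A small sketch on invented numbers shows that aggregation next to an equivalent, shorter spelling that selects the column, calls `.mean()`, and renames:

```python
import pandas as pd

# hypothetical observations: two per year, values invented for illustration
rates = pd.DataFrame({
    "year": [1961, 1961, 1962, 1962],
    "FEDFUNDS": [1.0, 2.0, 3.0, 5.0],
})

# the notebook's form: a named aggregation producing avg_FedFunds
named = rates.groupby("year", as_index=False).agg(
    avg_FedFunds=pd.NamedAgg(column="FEDFUNDS", aggfunc="mean")
)

# an equivalent form: select the column, average it, then rename it
short = (
    rates.groupby("year", as_index=False)["FEDFUNDS"]
    .mean()
    .rename(columns={"FEDFUNDS": "avg_FedFunds"})
)

print(named)
print(named.equals(short))
```

The named form is handy when several aggregates are computed at once; for a single mean the shorter form reads more directly.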


```python
# only use rows that exist in both dataframes for the correlation
# the date range overlap is 1961 to 2021
# gdp only covers that range, so trim fed to match
fed_limit = fed.loc[(fed['year'] > 1960) & (fed['year'] < 2022)]
# display with a 0-based index (the result is not assigned back to fed_limit)
fed_limit.reset_index(drop=True)
```


|  | year | avg_FedFunds |
|---|---|---|
| 0 | 1961 | 1.955000 |
| 1 | 1962 | 2.708333 |
| 2 | 1963 | 3.178333 |
| 3 | 1964 | 3.496667 |
| 4 | 1965 | 4.075000 |
| ... | ... | ... |
| 56 | 2017 | 1.001667 |
| 57 | 2018 | 1.831667 |
| 58 | 2019 | 2.158333 |
| 59 | 2020 | 0.375833 |
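Two pandas behaviors matter in this cell. A boolean `.loc` filter keeps the original row labels, so in `fed` the year 1961 sits at label 7 (1954 is label 0), and `reset_index` returns a new DataFrame rather than changing `fed_limit`, so the 0-based labels in the output are only what was displayed. A small sketch with invented values and hypothetical names (`annual`, `annual_limit`):

```python
import pandas as pd

# hypothetical annual averages starting in 1958, labelled 0, 1, 2, ...
annual = pd.DataFrame({"year": [1958, 1959, 1960, 1961, 1962],
                       "avg_FedFunds": [1.6, 3.3, 3.2, 2.0, 2.7]})

# a boolean .loc filter keeps the original labels: 1961 and 1962 stay at 3 and 4
annual_limit = annual.loc[annual["year"] > 1960]
print(annual_limit.index.tolist())  # [3, 4]

# reset_index returns a new DataFrame; annual_limit itself keeps labels 3 and 4
annual_limit.reset_index(drop=True)
print(annual_limit.index.tolist())  # [3, 4]

# assigning the result is what actually relabels the rows
annual_limit = annual_limit.reset_index(drop=True)
print(annual_limit.index.tolist())  # [0, 1]
```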


```python
# find the correlation between the gdp and fed_limit datasets
gdp['GDP Growth Rate'].corr(fed_limit['avg_FedFunds'])
```

0.18433209492580302

The Pearson correlation (the default method of `Series.corr`) between the GDP growth rate and the average Fed Funds rate is 0.1843: slightly positive, but negligible.
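`Series.corr` pairs the two series by index label, not by position or by year. `gdp` is labelled from 0, while `fed_limit` keeps the labels it had in `fed` (the `reset_index` result in the previous cell was displayed but not assigned), and `cpi_limit` keeps labels that start at 48, so lining the series up on `year` explicitly is the safer route. A minimal sketch, with invented numbers, of how label alignment can pair the wrong years and how an inner merge on `year` avoids it:

```python
import pandas as pd

# hypothetical annual series; values are invented for illustration
gdp = pd.DataFrame({"year": [1961, 1962, 1963, 1964],
                    "GDP Growth Rate": [0.02, 0.06, 0.04, 0.06]})
fed_all = pd.DataFrame({"year": [1959, 1960, 1961, 1962, 1963, 1964],
                        "avg_FedFunds": [3.3, 3.2, 2.0, 2.7, 3.2, 3.5]})

# a filter without an assigned reset_index keeps labels 2..5, while gdp uses 0..3
fed_limit = fed_all.loc[fed_all["year"] > 1960]

# label alignment: only labels 2 and 3 overlap,
# pairing gdp 1963-1964 with fed 1961-1962
misaligned = gdp["GDP Growth Rate"].corr(fed_limit["avg_FedFunds"])

# year alignment: an inner merge keeps only years present in both frames
merged = gdp.merge(fed_limit, on="year", how="inner")
aligned = merged["GDP Growth Rate"].corr(merged["avg_FedFunds"])

print(len(merged), misaligned, aligned)
```

With the notebook's own frames the same pattern would be `gdp.merge(fed_limit, on='year')`, and for CPI, `cpi_limit.rename(columns={'Year': 'year'})` first so the key names match.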

### **Add the Consumer Price Index (CPI) dataset**

```python
# get the consumer price index data
cpi = pd.read_csv("https://raw.githubusercontent.com/douglasbarley/DATA618/main/WeeklyData/Week2/BLS%20CPI%201913-2022.csv")
# keep the year and the annual average CPI
cpi = cpi[['Year', 'Avg']]
# display indexed by year (the result is not assigned back to cpi)
cpi.set_index('Year')
```

| Year | Avg |
|---|---|
| 1913 | 9.900 |
| 1914 | 10.000 |
| 1915 | 10.100 |
| 1916 | 10.900 |
| 1917 | 12.800 |
| ... | ... |
| 2018 | 251.107 |
| 2019 | 255.657 |
| 2020 | 258.811 |
| 2021 | 270.970 |


```python
# trim cpi to the same year range as the other datasets
cpi_limit = cpi.loc[(cpi['Year'] > 1960) & (cpi['Year'] < 2022)]
cpi_limit.head(100)
```


|  | Year | Avg |
|---|---|---|
| 48 | 1961 | 29.900 |
| 49 | 1962 | 30.200 |
| 50 | 1963 | 30.600 |
| 51 | 1964 | 31.000 |
| 52 | 1965 | 31.500 |
| ... | ... | ... |
| 104 | 2017 | 245.120 |
| 105 | 2018 | 251.107 |
| 106 | 2019 | 255.657 |
| 107 | 2020 | 258.811 |


```python
# find the correlation between the gdp and cpi_limit datasets
gdp['GDP Growth Rate'].corr(cpi_limit['Avg'])
```

0.16941530593536577

The correlation between the GDP growth rate and the average CPI is 0.1694: also negligible, and slightly weaker than the correlation between GDP and Fed Funds.
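`Avg` is a price level that trends upward over the whole sample, while `GDP Growth Rate` is a year-over-year rate, so the two are measured on different footings. If a rate-to-rate comparison is wanted, one option (an alternative the notebook does not use) is to convert the CPI level to annual inflation with `pct_change`. A minimal sketch, using the 1961 to 1964 CPI averages shown in the `cpi_limit` output:

```python
import pandas as pd

# the 1961-1964 annual average CPI values shown in the cpi_limit output
cpi = pd.DataFrame({"Year": [1961, 1962, 1963, 1964],
                    "Avg": [29.9, 30.2, 30.6, 31.0]})

# pct_change compares each row with the previous one, so sort by year first
cpi = cpi.sort_values("Year")
# year-over-year inflation as a decimal fraction; the first year has no prior value (NaN)
cpi["Inflation"] = cpi["Avg"].pct_change()

print(cpi.round(4))
```

The resulting `Inflation` column is on the same decimal-fraction scale as `GDP Growth Rate`, at the cost of losing the first year of the range.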

```python
# find the correlation between the fed_limit and cpi_limit datasets
fed_limit['avg_FedFunds'].corr(cpi_limit['Avg'])
```

-0.345214447515283

The correlation between the average Fed Funds rate and the average CPI is -0.3452, a low negative correlation.
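`scipy.stats` is imported at the top of the notebook but never used. One way to use it here is `scipy.stats.pearsonr`, which returns the same Pearson coefficient as `Series.corr` plus a two-sided p-value, a rough check on whether a coefficient such as -0.3452 could plausibly arise by chance (the p-value assumes independent observations, which trending annual series satisfy only loosely). A minimal sketch on the five 1961 to 1965 rows shown in the outputs above, which also computes all three pairwise correlations at once with `DataFrame.corr`; with only five rows the coefficients are illustrative and will not match the full-sample values:

```python
import pandas as pd
import scipy.stats

# the 1961-1965 rows from the gdp, fed_limit and cpi_limit outputs, aligned by year
data = pd.DataFrame({
    "year": [1961, 1962, 1963, 1964, 1965],
    "GDP Growth Rate": [0.023, 0.061, 0.044, 0.058, 0.064],
    "avg_FedFunds": [1.955, 2.708333, 3.178333, 3.496667, 4.075],
    "CPI": [29.9, 30.2, 30.6, 31.0, 31.5],
})

# Pearson correlation for every pair of columns, excluding the year key
matrix = data.drop(columns="year").corr()
print(matrix.round(4))

# coefficient and two-sided p-value for one pair
r, p = scipy.stats.pearsonr(data["avg_FedFunds"], data["CPI"])
print(f"r = {r:.4f}, p = {p:.4f}")
```

Building one year-aligned frame first means each pair is computed over exactly the same years, which keeps the three coefficients comparable.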