# Irrational Capital Participant Notebook

## Public Data Exploration

This notebook gives the participant a chance to work with a small subset of our (non-proprietary) human capital data. The data consists of employee ratings pulled from Glassdoor by our data aggregation partner, Thinknum. After gaining an understanding of the data, we will implement and test a naive investment strategy based on it. The participant is encouraged to try out other strategies and test them against the returns data provided.

Begin by reading in the data from GitHub.

In [None]:
import pandas as pd
import plotly.express as px

In [None]:
# read the Glassdoor ratings sample from GitHub
gd = pd.read_csv('https://raw.githubusercontent.com/IrrationalCapital/DeepFin/main/glassdoor_DeepFin_sample.csv')

### Exploration of the Public Data

Basic exploratory data analysis (EDA) of the public data, similar to what we did with the private data.

In [None]:
# define Trade Year as Survey Year + 1 then get list of all columns
gd['TradeYear'] = gd['SurveyYear']+1
gd.columns
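
The one-year shift means ratings from a given survey year would only inform trades in the following year. The strategy later in this notebook averages ratings across all rows for an asset and does not use `TradeYear`. The following is a minimal sketch of how the column could be used to line ratings up with prices. The toy values, the column name `Rating: Example`, and the assumption that `Date` can be parsed by `pd.to_datetime` are all illustrative and not taken from the real files.

```python
import pandas as pd

# Hypothetical ratings: one row per asset and survey year (values invented)
toy_gd = pd.DataFrame({
    'AssetID': ['A', 'A', 'B'],
    'SurveyYear': [2018, 2019, 2018],
    'Rating: Example': [4.0, 3.0, 2.5],
})
toy_gd['TradeYear'] = toy_gd['SurveyYear'] + 1

# Hypothetical prices; Date is assumed to be parseable by pd.to_datetime
toy_ret = pd.DataFrame({
    'AssetID': ['A', 'A', 'B', 'B'],
    'Date': ['2019-06-03', '2020-06-01', '2019-06-03', '2020-06-01'],
    'Close': [10.0, 12.0, 20.0, 19.0],
})
toy_ret['TradeYear'] = pd.to_datetime(toy_ret['Date']).dt.year

# Each price row picks up only the rating surveyed in the prior calendar year;
# rows with no matching survey (asset B in 2020) get NaN
aligned = toy_ret.merge(toy_gd, on=['AssetID', 'TradeYear'], how='left')
print(aligned[['AssetID', 'Date', 'SurveyYear', 'Rating: Example']])
```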

In [None]:
# look at an example row
pd.set_option('display.max_rows',75)
gd.sample(1,random_state = 34).T

In [None]:
rating_cols = gd.columns[gd.columns.str.match('Rating:')]
fig = px.bar(gd[rating_cols].count(), title = 'Number of Ratings Per Category',height = 400)
fig.update_layout(showlegend = False,xaxis_title=None,yaxis_title=None)
fig.show()

In [None]:
# count how many times each rating value appears across all categories
figdat3 = (gd.melt(value_vars = rating_cols, value_name = 'Rating')
          .groupby('Rating')
          .size()
          .rename('Total Responses')
          .to_frame())
fig = px.bar(figdat3,y = 'Total Responses',title = 'Distribution of Glassdoor Ratings')
fig.update_layout(showlegend = False,xaxis_title=None)
fig.show()

In [None]:
figdat4 = (gd.melt(value_vars = rating_cols)
          .rename(columns = {'value':'Rating','variable':'Category'})
          .groupby(['Category','Rating'])
          .size()
          .rename('Total Responses')
          .to_frame()
          .reset_index())
fig = px.bar(figdat4,x = 'Rating',y = 'Total Responses',facet_col = 'Category',facet_col_wrap = 3,height = 500,
            title = 'Rating Distribution by Category')
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
fig.show()

In [None]:
plotdat = gd[sorted(rating_cols)].dropna(axis = 0, how='all').dropna(axis = 1, how = 'all').corr()
for i,r in enumerate(plotdat.index):
    for j,c in enumerate(plotdat.columns):
        if i < j:
            plotdat.loc[r,c] = None

px.imshow(plotdat
          ,height = 800,color_continuous_scale='Portland'
          ,title = 'Correlation Between Ratings')
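
The nested loop above blanks out the upper triangle of the correlation matrix so each pair appears only once in the heatmap. As a sketch of an alternative, the same mask can be built with NumPy and applied in one step. The toy columns and values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings in three made-up categories
toy = pd.DataFrame({
    'Rating: X': [1, 2, 3, 4, 5],
    'Rating: Y': [2, 1, 4, 3, 5],
    'Rating: Z': [5, 3, 4, 1, 2],
})
corr = toy.corr()

# Loop version, as in the cell above: blank out entries above the diagonal
looped = corr.copy()
for i, r in enumerate(looped.index):
    for j, c in enumerate(looped.columns):
        if i < j:
            looped.loc[r, c] = None

# Vectorized version: keep the lower triangle (including the diagonal), NaN elsewhere
keep = np.tril(np.ones(corr.shape, dtype=bool))
masked = corr.where(keep)

print(masked.round(2))
```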

### Incorporate Returns

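Next, we'll bring in the return stream for the public data and test a naive strategy based on it. The strategy splits the assets into a top half and a bottom half by mean rating and compares the cumulative return of each group.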

In [None]:
# read the returns data from GitHub
ret = pd.read_csv('https://raw.githubusercontent.com/IrrationalCapital/DeepFin/main/glassdoor_DeepFin_returns.csv')
ret.head(10)

In [None]:
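def run_strategy(ret, top_asset_ids):
    # work on a copy so the caller's DataFrame is not modified
    ret = ret.copy()
    # label each asset by whether it is in the top-rated group
    ret['strat'] = 'Bottom Half'
    ret.loc[ret.AssetID.isin(top_asset_ids),'strat'] = 'Top Half'
    # total close price per group per day
    ret1 = ret.groupby(['strat','Date'])['Close'].sum().reset_index().sort_values('Date')
    # daily growth factor; the first day of each group has no prior day, so use 1
    ret1['DayChange'] = (ret1.groupby('strat')['Close'].pct_change() + 1).fillna(1)
    # compound the growth factors into a cumulative return
    ret1['Return'] = (ret1.groupby('strat')['DayChange'].cumprod())-1

    return ret1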
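
To see what `run_strategy` computes, here is a minimal sketch on made-up prices. The asset IDs, dates, and prices are invented for illustration, and `run_strategy_toy` is a hypothetical stand-in that repeats the same steps so the example runs on its own.

```python
import pandas as pd

# Hypothetical toy prices: two assets, three days (values invented for illustration)
toy_ret = pd.DataFrame({
    'AssetID': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Date': ['2020-01-01', '2020-01-02', '2020-01-03'] * 2,
    'Close': [100.0, 110.0, 121.0, 50.0, 50.0, 45.0],
})

def run_strategy_toy(ret, top_asset_ids):
    # same steps as run_strategy above, applied to a copy of the input
    ret = ret.copy()
    ret['strat'] = 'Bottom Half'
    ret.loc[ret.AssetID.isin(top_asset_ids), 'strat'] = 'Top Half'
    ret1 = ret.groupby(['strat', 'Date'])['Close'].sum().reset_index().sort_values('Date')
    ret1['DayChange'] = (ret1.groupby('strat')['Close'].pct_change() + 1).fillna(1)
    ret1['Return'] = ret1.groupby('strat')['DayChange'].cumprod() - 1
    return ret1

toy_result = run_strategy_toy(toy_ret, ['A'])
# one column per group, one row per date
print(toy_result.pivot(index='Date', columns='strat', values='Return'))
```

Because the function sums `Close` within each group before computing changes, higher-priced assets carry more weight in a group's return.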

In [None]:
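# average every rating for each asset, across all rating categories and all of that asset's rows
gdm = gd.melt(id_vars = ['AssetID'], value_vars = gd.columns[gd.columns.str.match('Rating')])
ar = gdm.groupby(['AssetID'])['value'].agg(['mean'])
# percentile rank with the highest mean first, so the best-rated assets get the smallest values
ar['rank'] = ar['mean'].rank(ascending = False,pct = True)
ar = ar.reset_index()
# strictly below .5, so an asset ranked exactly at .5 lands in the bottom half
ro_top = ar.loc[ar['rank'] < .5,'AssetID']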
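
The selection above relies on how `rank(ascending = False, pct = True)` assigns percentile ranks. Here is a small sketch on invented mean ratings. It also shows how a strict `< .5` cutoff differs from an inclusive `<= .5` cutoff when the number of assets is even.

```python
import pandas as pd

# Hypothetical mean ratings for four assets (values invented for illustration)
toy_ar = pd.DataFrame({'AssetID': ['A', 'B', 'C', 'D'],
                       'mean': [4.5, 3.0, 3.9, 2.1]})

# Highest mean gets the smallest percentile rank: 1/4, 2/4, 3/4, 4/4
toy_ar['rank'] = toy_ar['mean'].rank(ascending=False, pct=True)

strict_top = toy_ar.loc[toy_ar['rank'] < .5, 'AssetID'].tolist()
inclusive_top = toy_ar.loc[toy_ar['rank'] <= .5, 'AssetID'].tolist()
print(strict_top, inclusive_top)
```

With four assets, the strict cutoff keeps only one asset while the inclusive cutoff keeps two, so the choice of comparison affects how many assets end up in each group.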

In [None]:
strat1 = run_strategy(ret,ro_top)
strat1 = strat1.rename(columns = {'strat':'Mean Overall Rating'})
fig = px.line(strat1,x='Date',y='Return',color = 'Mean Overall Rating'
              ,title = 'Performance of Top Half vs Bottom Half by Overall Rating')
fig.update_layout(yaxis_tickformat=',.0%',xaxis_title=None
                  ,legend={'orientation':"h",'yanchor':"bottom", 'y':-.35, 'xanchor':'center', 'x':.5})
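
As a starting point for trying other strategies, one option is to rank assets on a single rating category and keep a smaller top group. This is only a sketch: `top_assets_by_column` is a hypothetical helper, and the column name `Rating: Example` and the toy values are invented, so substitute a real column from `rating_cols`.

```python
import pandas as pd

def top_assets_by_column(df, rating_col, top_fraction=0.25):
    """Return AssetIDs whose mean value in rating_col falls in the top fraction."""
    # mean rating per asset, ignoring assets with no ratings in this column
    mean_rating = df.groupby('AssetID')[rating_col].mean().dropna()
    # highest mean gets the smallest percentile rank
    pct_rank = mean_rating.rank(ascending=False, pct=True)
    return pct_rank[pct_rank <= top_fraction].index.tolist()

# Hypothetical ratings data (values invented for illustration)
toy_gd = pd.DataFrame({
    'AssetID': ['A', 'A', 'B', 'C', 'D'],
    'Rating: Example': [5.0, 4.0, 3.0, 2.0, 4.0],
})
print(top_assets_by_column(toy_gd, 'Rating: Example', top_fraction=0.25))
```

The returned list can be passed as `top_asset_ids` to `run_strategy`. Note that its group labels would still read 'Top Half' and 'Bottom Half' even though the split is no longer at the median.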