<a href="https://colab.research.google.com/github/danmartin25/Hockey_Model/blob/main/Hockey_Model_Mark_I.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Notes**
1. On-ice EV xG+/-

2. On-ice EV G+/-

3. On-ice PP G+/- above average

4. On-ice SH G+/- above average

5. GSAx

6. Individual points above average (depending on position and role)

In [1]:
# Import Dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
#Import Even-Strength On-Ice Totals Data
skater_EV_totals_raw = pd.read_csv('https://raw.githubusercontent.com/danmartin25/Hockey_Model/main/EV%20On-Ice%20Totals.csv')
#print(skater_EV_totals_raw)
#skater_EV_totals_raw.head()

In [3]:
#Restrict Totals Data to GP,TOI,GF%,xGF%,GF,GA,xGF,xGA
skater_EV_totals = skater_EV_totals_raw.loc[:,['Player','Season','Team','Position','GP','TOI','GF%','xGF%','GF','GA','xGF','xGA']]
#skater_EV_totals

In [4]:
#Add columns for G+/- and xG+/-
skater_EV_totals['G+/-'] = skater_EV_totals['GF'] - skater_EV_totals['GA']
skater_EV_totals['xG+/-'] = skater_EV_totals['xGF'] - skater_EV_totals['xGA']
#skater_EV_totals

In [5]:
#Import PP On-Ice Totals Data
skater_PP_totals_raw = pd.read_csv('https://raw.githubusercontent.com/danmartin25/Hockey_Model/main/PP%20On-Ice%20Totals.csv')
#print(skater_PP_totals_raw)
#skater_PP_totals_raw.head()

In [6]:
#Reduce PP On-Ice Totals Data
skater_PP_totals = skater_PP_totals_raw.loc[:,['Player','Season','Team','Position','GP','TOI','GF%','xGF%','GF','GA','xGF','xGA']]
#skater_PP_totals

In [7]:
#Add columns for G+/- and xG+/-
skater_PP_totals['G+/-'] = skater_PP_totals['GF'] - skater_PP_totals['GA']
skater_PP_totals['xG+/-'] = skater_PP_totals['xGF'] - skater_PP_totals['xGA']
#skater_PP_totals

In [8]:
#Compute mean G+/- and xG+/- across all PP rows
G_mean = skater_PP_totals['G+/-'].mean()
xG_mean = skater_PP_totals['xG+/-'].mean()
#G_mean
#xG_mean

In [9]:
#Add column for stats above average for G+/- and xG+/-
skater_PP_totals['GAA'] = skater_PP_totals['G+/-'] - G_mean
skater_PP_totals['xGAA'] = skater_PP_totals['xG+/-'] - xG_mean
#skater_PP_totals

In [10]:
#Drop columns
skater_PP_totals = skater_PP_totals.drop(columns = ['xGF%','xGF','xGA','xG+/-','xGAA'])
#skater_PP_totals

In [11]:
#Import SH On-Ice Totals Data
skater_SH_totals_raw = pd.read_csv('https://raw.githubusercontent.com/danmartin25/Hockey_Model/main/SH%20On-Ice%20Totals.csv')
#print(skater_SH_totals_raw)
#skater_SH_totals_raw.head()

In [12]:
#Reduce SH On-Ice Totals Data
skater_SH_totals = skater_SH_totals_raw.loc[:,['Player','Season','Team','Position','GP','TOI','GF%','xGF%','GF','GA','xGF','xGA']]
#skater_SH_totals

In [13]:
#Add columns for G+/-,xG+/-
skater_SH_totals['G+/-'] = skater_SH_totals['GF'] - skater_SH_totals['GA']
skater_SH_totals['xG+/-'] = skater_SH_totals['xGF'] - skater_SH_totals['xGA']
#skater_SH_totals

In [14]:
#Compute mean G+/- and xG+/- across all SH rows (overwrites the PP means)
G_mean = skater_SH_totals['G+/-'].mean()
xG_mean = skater_SH_totals['xG+/-'].mean()
#G_mean
#xG_mean

In [15]:
#Add column for stats above average for G+/- and xG+/-
skater_SH_totals['GAA'] = skater_SH_totals['G+/-'] - G_mean
skater_SH_totals['xGAA'] = skater_SH_totals['xG+/-'] - xG_mean
#skater_SH_totals

In [16]:
#Drop columns
skater_SH_totals = skater_SH_totals.drop(columns = ['xGF%','xGF','xGA','xG+/-','xGAA'])
#skater_SH_totals
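
The PP and SH cells above repeat the same steps: compute G+/- and then subtract the table-wide mean. A minimal sketch of a helper that does this once, assuming the `GF` and `GA` column names used above. The function name `add_goals_above_average` is hypothetical and the toy numbers are made up.

```python
import pandas as pd

def add_goals_above_average(totals: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with G+/- and GAA (G+/- minus the table-wide mean)."""
    out = totals.copy()
    out['G+/-'] = out['GF'] - out['GA']
    out['GAA'] = out['G+/-'] - out['G+/-'].mean()
    return out

# Toy data (made up) standing in for the PP or SH on-ice totals
toy = pd.DataFrame({
    'Player': ['A', 'B', 'C'],
    'GF': [10.0, 6.0, 2.0],
    'GA': [1.0, 3.0, 2.0],
})
toy_aa = add_goals_above_average(toy)
print(toy_aa)
```

The same function could be called once on the PP table and once on the SH table, which also avoids reusing the `G_mean` name for two different means.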

In [17]:
#Import Player Box Stats
skater_box_totals_raw = pd.read_csv('https://raw.githubusercontent.com/danmartin25/Hockey_Model/main/Player%20Stats%20Box.csv')
#print(skater_box_totals_raw)
#skater_box_totals_raw.head()

In [18]:
#Reduce Data Stats
skater_box_totals = skater_box_totals_raw.loc[:,['Player','Season','Team','Position','GP','TOI','G','A1','A2','Points']]
#skater_box_totals.head()

In [19]:
#Look at positions to get correct F/D average points
#print(skater_box_totals['Position'].unique())
#print(skater_box_totals.count())

forward_positions = ["L", "C", "C/L", "R", "L/R", "C/R"]
forwards_table = skater_box_totals.loc[skater_box_totals.Position.isin(forward_positions)]
#forwards_table.head(10)
#print(forwards_table.count())

defensemen_table = skater_box_totals.loc[skater_box_totals.Position.isin(["D", "D/L"])]
#defensemen_table.head(10)
#print(defensemen_table.count())

In [20]:
#Get Mean of Points
Points_mean = skater_box_totals['Points'].mean()
#print(Points_mean)

Forwards_mean = forwards_table['Points'].mean()
#print(Forwards_mean)

Defensemen_mean = defensemen_table['Points'].mean()
#print(Defensemen_mean)

In [21]:
#Get PAA
#skater_box_totals['PAA'] = skater_box_totals['Points'] - Points_mean
#skater_box_totals

#Get PAA for Fwds and Dmen and combine

#assign returns a new DataFrame, which avoids the pandas SettingWithCopyWarning on the filtered tables
forwards_table = forwards_table.assign(PAA=forwards_table['Points'] - Forwards_mean)
#forwards_table.head(10)

defensemen_table = defensemen_table.assign(PAA=defensemen_table['Points'] - Defensemen_mean)
#defensemen_table.head(10)


#Add dataframes back together and sort again

concat_frames = [forwards_table, defensemen_table]
skater_box_totals = pd.concat(concat_frames)
skater_box_totals = skater_box_totals.sort_index()
#skater_box_totals.head(10)
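
An alternative to splitting the table into forwards and defensemen and concatenating them again is to label each row with a position group and subtract the group mean in place. This is only a sketch under the same position codes as above; `position_group` is a hypothetical helper and the toy points are made up.

```python
import pandas as pd

def position_group(position: str) -> str:
    """Hypothetical helper: 'D' if any listed position is D, otherwise 'F'."""
    return 'D' if 'D' in position.split('/') else 'F'

# Toy data (made up) standing in for skater_box_totals
toy = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D'],
    'Position': ['C', 'L/R', 'D', 'D/L'],
    'Points': [60, 40, 30, 10],
})
toy['Group'] = toy['Position'].map(position_group)

# Points above the average of the player's own position group
group_mean = toy.groupby('Group')['Points'].transform('mean')
toy['PAA'] = toy['Points'] - group_mean
print(toy)
```

Because the rows never leave the original table, no concat or re-sort is needed afterwards.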


In [22]:
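#Merge EV and PP tables
#Note: joining on Player alone pairs every EV season of a player with every PP season of that player
combined_player_df = pd.merge(skater_EV_totals, skater_PP_totals, how="left", on="Player")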
#combined_player_df

In [23]:
combined_player_df = combined_player_df.drop(columns = ['GF%_x','GF_x','GA_x','Season_y','Team_y','Position_y','GP_y','GF%_y','GF_y','GA_y','G+/-_y'])
#combined_player_df

In [24]:
#Merge EV,PP,SH tables together
full_player_totals = pd.merge(combined_player_df, skater_SH_totals, how="left", on="Player")
#full_player_totals.head(10)

In [25]:
#Clean up data
full_player_totals = full_player_totals.drop(columns = ['Season','Team','Position','GP','GF%','GF','GA','G+/-'])
#full_player_totals.head(10)

In [26]:
#Merge EV,PP,SH and Points tables together
full_player_totals = pd.merge(full_player_totals, skater_box_totals, how="left", on="Player")
#full_player_totals.head(10)

  


In [27]:
#Drop Repeated Columns
full_player_totals = full_player_totals.drop(columns = ['Season','Team','Position','GP','G','A1','A2'])
#full_player_totals.head(10)


In [28]:
#Rename Columns
#Note: 'TOI_x' is listed three times below; a dict keeps only the last entry, so only 'TOI_x':'TOI_SH' is applied
full_player_totals = full_player_totals.rename(columns = {'Season_x':'Season','Team_x':'Team','Position_x':'Position','GP_x':'GP','TOI_x':'TOI_EV','G+/-_x':'G_EV','xG+/-':'xG_EV','TOI_x':'TOI_PP','GAA_x':'GAA_PP','TOI_x':'TOI_SH','GAA_y':'GAA_SH','TOI_y':'TOI'})
#full_player_totals

In [29]:
#Replace NaN with 0
full_player_totals = full_player_totals.fillna(0)
full_player_totals

Unnamed: 0,Player,Season,Team,Position,GP,TOI_SH,xGF%,xGF,xGA,G_EV,xG_EV,TOI,GAA_PP,TOI_SH.1,GAA_SH,TOI.1,Points,PAA
0,A.J. Greer,21-22,N.J,L,9,70.45,58.74,3.16,2.22,0.06,0.94,0.00,0.000000,0.00,0.000000,70.70,2,-20.633276
1,Aaron Ekblad,19-20,FLA,D,67,1220.55,50.83,49.88,48.26,16.49,1.62,95.90,-2.792102,134.87,-13.441789,1537.92,41,26.890075
2,Aaron Ekblad,19-20,FLA,D,67,1220.55,50.83,49.88,48.26,16.49,1.62,95.90,-2.792102,134.87,-13.441789,878.18,22,7.890075
3,Aaron Ekblad,19-20,FLA,D,67,1220.55,50.83,49.88,48.26,16.49,1.62,95.90,-2.792102,134.87,-13.441789,1519.70,57,42.890075
4,Aaron Ekblad,19-20,FLA,D,67,1220.55,50.83,49.88,48.26,16.49,1.62,95.90,-2.792102,86.47,2.418211,1537.92,41,26.890075
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29852,Zemgus Girgensons,19-20,BUF,C,69,783.73,49.54,25.39,25.86,-4.34,-0.47,54.37,-5.272102,81.28,-4.731789,828.20,18,-4.633276
29853,Zemgus Girgensons,21-22,BUF,C/L,56,668.30,45.90,20.39,24.03,-5.14,-3.64,54.37,-5.272102,147.87,-13.281789,951.77,19,-3.633276
29854,Zemgus Girgensons,21-22,BUF,C/L,56,668.30,45.90,20.39,24.03,-5.14,-3.64,54.37,-5.272102,147.87,-13.281789,828.20,18,-4.633276
29855,Zemgus Girgensons,21-22,BUF,C/L,56,668.30,45.90,20.39,24.03,-5.14,-3.64,54.37,-5.272102,81.28,-4.731789,951.77,19,-3.633276
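
The repeated rows in the output above (for example, the Aaron Ekblad 19-20 line paired with several different Points totals) are consistent with joining on Player alone when each table holds one row per player per season. A minimal sketch with made-up toy data comparing a merge on `Player` with a merge on `Player` and `Season`. Whether `Team` should also be a key (for players traded mid-season) is an open assumption not settled here.

```python
import pandas as pd

# Toy tables (made up): one row per player per season
ev = pd.DataFrame({
    'Player': ['A', 'A', 'B'],
    'Season': ['19-20', '20-21', '19-20'],
    'xG_EV': [1.5, -0.5, 2.0],
})
box = pd.DataFrame({
    'Player': ['A', 'A', 'B'],
    'Season': ['19-20', '20-21', '19-20'],
    'Points': [40, 55, 20],
})

# Joining on Player only pairs each of A's seasons with both of A's box rows
by_player = pd.merge(ev, box, how='left', on='Player')

# Joining on Player and Season keeps one row per player-season
by_player_season = pd.merge(ev, box, how='left', on=['Player', 'Season'])

print(len(by_player))         # 5
print(len(by_player_season))  # 3
```

Passing `validate='one_to_one'` to `pd.merge` is one way to have pandas raise an error when the join keys are not unique on both sides.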


In [30]:
#Import Goalie Data
goalie_stats_raw = pd.read_csv('https://raw.githubusercontent.com/danmartin25/Hockey_Model/main/Goalie%20Stats.csv')
#print(goalie_stats_raw)
#goalie_stats_raw.head()

In [31]:
#Reduce Goalie Stats
goalie_stats = goalie_stats_raw.loc[:,['Player','Season','Team','Position','GP','TOI','GA','Sv%','GSAx']]
goalie_stats.head()

Unnamed: 0,Player,Season,Team,Position,GP,TOI,GA,Sv%,GSAx
0,Aaron Dell,19-20,S.J,G,33,1834.23,91.06,90.84,1.1
1,Aaron Dell,20-21,N.J,G,7,319.17,21.45,86.01,-9.54
2,Aaron Dell,21-22,BUF,G,12,565.38,37.95,89.36,-7.35
3,Adam Huska,21-22,NYR,G,1,59.7,7.53,82.31,-4.03
4,Adam Werner,19-20,COL,G,2,87.83,4.85,91.05,-1.09


In [32]:
#skater_PP_rates_raw = pd.read_csv('https://raw.githubusercontent.com/danmartin25/Hockey_Model/main/PP%20On-Ice%20Rates.csv')
#print(skater_PP_rates_raw)
#skater_PP_rates_raw.head()

**Stuff to Do Next** (out of time, so these are quick thoughts I have not checked; the first one would only take about 5 minutes to verify)

In [33]:
#Are these right? 'TOI_SH' and 'TOI' each appear twice, and there is no 'TOI_EV' or 'TOI_PP'
full_player_totals.columns

Index(['Player', 'Season', 'Team', 'Position', 'GP', 'TOI_SH', 'xGF%', 'xGF',
       'xGA', 'G_EV', 'xG_EV', 'TOI', 'GAA_PP', 'TOI_SH', 'GAA_SH', 'TOI',
       'Points', 'PAA'],
      dtype='object')
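
On the question in the cell above: a Python dict literal keeps only the last value for a repeated key, so a rename mapping that lists `'TOI_x'` three times only applies the final entry. A small sketch of that behavior, followed by one possible way around it, which is to rename `TOI` in each table before merging. The toy frames and numbers are made up, and this is a suggestion rather than a checked fix for the full notebook.

```python
import pandas as pd

# A repeated key silently collapses to its last value
mapping = {'TOI_x': 'TOI_EV', 'TOI_x': 'TOI_PP', 'TOI_x': 'TOI_SH'}
print(mapping)  # {'TOI_x': 'TOI_SH'}

# Toy tables (made up): give TOI a situation-specific name before merging
ev = pd.DataFrame({'Player': ['A'], 'TOI': [1200.0]}).rename(columns={'TOI': 'TOI_EV'})
pp = pd.DataFrame({'Player': ['A'], 'TOI': [95.0]}).rename(columns={'TOI': 'TOI_PP'})
sh = pd.DataFrame({'Player': ['A'], 'TOI': [130.0]}).rename(columns={'TOI': 'TOI_SH'})

merged = ev.merge(pp, how='left', on='Player').merge(sh, how='left', on='Player')
print(list(merged.columns))  # ['Player', 'TOI_EV', 'TOI_PP', 'TOI_SH']
```

With distinct names going in, the merges never need the `_x` and `_y` suffixes for TOI, so the later rename step has nothing ambiguous to untangle.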

Need to create a dataframe for each team. Use the loc function on the player table first to get all the players for one team, then do the same thing with goalies. We will probably need to keep them as separate tables since they do not share the same variables.
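
A minimal sketch of the per-team split, using `groupby` to build every team's table in one pass instead of one `loc` call per team. The toy frames stand in for `full_player_totals` and `goalie_stats`; the dictionary names are hypothetical.

```python
import pandas as pd

# Toy data (made up) standing in for the skater and goalie tables
skaters = pd.DataFrame({
    'Player': ['A', 'B', 'C'],
    'Team': ['BUF', 'FLA', 'BUF'],
    'PAA': [1.0, 2.0, -3.0],
})
goalies = pd.DataFrame({
    'Player': ['G1', 'G2'],
    'Team': ['BUF', 'FLA'],
    'GSAx': [1.1, -4.0],
})

# One DataFrame per team, keyed by team code; skaters and goalies stay separate
skaters_by_team = {team: rows.reset_index(drop=True)
                   for team, rows in skaters.groupby('Team')}
goalies_by_team = {team: rows.reset_index(drop=True)
                   for team, rows in goalies.groupby('Team')}

print(skaters_by_team['BUF'])
print(goalies_by_team['BUF'])
```

Each entry holds the same rows as `skaters.loc[skaters.Team == 'BUF']` would, so a single team can still be inspected on its own while checking for errors.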

Not necessary at this stage, but eventually we are going to want to consolidate these code blocks. For example, the code to create all of the team rosters should be in one block with no output. For now you can do 64 blocks (32 for skaters, 32 for goalies) and show output so we can see if there are any errors, but once we get past that stage we will comment out the output and put it all in one block. If we ever need to go back and check, just take out the '#' and run the code to see the output.

Need to add a column for 'status'. I am thinking the two states we use are 'active' and 'inactive' for the starting lineups. There will be 18 skaters and 1 goalie when we update daily, but that doesn't matter now. This should go in before all of the loc functions so it shows up in the team dataframes. Note: it will have to be added to both the full_player_totals dataframe and the goalie_stats dataframe.
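
A minimal sketch of the 'status' column, assuming every player starts as 'inactive' and a daily lineup list flips the starters to 'active'. The `active_players` list and the toy frame are hypothetical; the same two steps would be repeated for the goalie table.

```python
import pandas as pd

# Toy data (made up) standing in for full_player_totals
skaters = pd.DataFrame({
    'Player': ['A', 'B', 'C'],
    'Team': ['BUF', 'BUF', 'FLA'],
})

# Default everyone to inactive, then mark the day's lineup as active
skaters['status'] = 'inactive'
active_players = ['A', 'C']  # hypothetical starting lineup
skaters.loc[skaters['Player'].isin(active_players), 'status'] = 'active'

print(skaters)
```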

Add a folder in GitHub for the CSVs and update the GitHub links in the code to pull the data from it.