In [None]:
dt['F'].value_counts()

In [None]:
dt['D'].value_counts()

- **keep only games that have 12 forwards and 6 defensemen.**

In [None]:
dt = dt.groupby(['Season', 'GameNumber']).filter(lambda x: ((x['F'] == 12) & (x['D'] == 6)).all())
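
As a minimal sketch of how `groupby(...).filter` behaves in the step above, the toy frame below uses made-up values (not the project data). Because the condition is checked with `.all()` over every row of a game, one non-standard roster removes both teams' rows for that game.

```python
import pandas as pd

# Toy team-game rows (made-up values): game 1 has two standard rosters,
# game 2 has one team dressing 11 forwards and 7 defensemen.
toy = pd.DataFrame({
    'Season': [2011, 2011, 2011, 2011],
    'GameNumber': [1, 1, 2, 2],
    'TeamCode': ['BOS', 'MTL', 'NYR', 'TOR'],
    'F': [12, 12, 11, 12],
    'D': [6, 6, 7, 6],
})

# filter() keeps a group only when the function returns True for it,
# so game 2 is dropped entirely, including the team with a 12/6 roster.
kept = toy.groupby(['Season', 'GameNumber']).filter(
    lambda g: ((g['F'] == 12) & (g['D'] == 6)).all()
)
print(kept)
```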

In [None]:
dt.shape

### summary analysis

In [None]:
dt.describe()

In [None]:
dt.groupby(['Win'])[['F1', 'F2', 'D1', 'D2']].describe()

### estimate roster model

- regress **win** on the difference in number of players by position and quality per team. Add a constant to the predictors and use **OLS**. The purpose is to determine the impact each roster position has on home team success.

- **pivot table using season and game number as the index, with columns split by whether a team is visitor (1) or home (2)**. The table displays the number of players of each quality tier per position and team. The next step is to join the column levels by player quality and team, which gives 4 columns per team (2 positions x 2 quality tiers). We rename the columns as follows: VF1 is the number of elite forwards for the visitor team, HF1 is the number of elite forwards for the home team, and so on. Finally, we order the columns by team, position and quality, and sort the rows by season and game number.

In [None]:
dy = pd.pivot_table(dy, index=['Season', 'GameNumber'], columns=['A'], values=['F1', 'F2', 'D1', 'D2'])
dy = dy.reset_index()
dy.columns = ['_'.join(str(s).strip() for s in col if s) for col in dy.columns]
dy = dy.rename(columns={'F1_1.0': 'VF1', 'F2_1.0': 'VF2', 'D1_1.0': 'VD1', 'D2_1.0': 'VD2', 'F1_2.0': 'HF1', 'F2_2.0': 'HF2', 'D1_2.0': 'HD1', 'D2_2.0': 'HD2'})
dy = dy[['Season', 'GameNumber', 'VF1', 'VF2', 'VD1', 'VD2', 'HF1', 'HF2', 'HD1', 'HD2']]
dy.sort_values(['Season', 'GameNumber'], ascending=[True, True], inplace=True)
dy.head()
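
The pivot-and-flatten step can be sketched on a toy frame. The values below are made up, and only `F1` and `D1` are included to keep the output short; the real data has more value columns. The sketch shows where flat names such as `F1_1.0` come from before they are renamed.

```python
import pandas as pd

# Toy long-format roster (made-up values): one row per team per game,
# with A = 1.0 for the visitor and A = 2.0 for the home team.
long_df = pd.DataFrame({
    'Season': [2011, 2011],
    'GameNumber': [1, 1],
    'A': [1.0, 2.0],
    'F1': [3, 5],
    'D1': [1, 2],
})

wide = pd.pivot_table(long_df, index=['Season', 'GameNumber'],
                      columns=['A'], values=['F1', 'D1']).reset_index()

# Columns are now tuples such as ('F1', 1.0) and ('Season', '').
# Join the non-empty levels into flat names like 'F1_1.0'.
wide.columns = ['_'.join(str(s) for s in col if s != '') for col in wide.columns]
wide = wide.rename(columns={'F1_1.0': 'VF1', 'F1_2.0': 'HF1',
                            'D1_1.0': 'VD1', 'D1_2.0': 'HD1'})
print(wide)
```

Note that `pivot_table` aggregates with the mean by default, so the counts come back as floats.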

In [None]:
dy.shape

In [None]:
dy.to_csv('season_game_roster.csv', index=False, sep=',')
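
A small sketch of how the `index` argument of `to_csv` affects the written file, using a made-up one-row frame and an in-memory buffer instead of a real file. It also shows why a quoted `'False'` does not act like the boolean `False`: any non-empty string is truthy in Python.

```python
import io
import pandas as pd

toy = pd.DataFrame({'Season': [2011], 'GameNumber': [1]})

# index=False (a boolean) leaves the row index out of the file.
buf = io.StringIO()
toy.to_csv(buf, index=False)
header_without_index = buf.getvalue().splitlines()[0]

# index=True writes the row index as an unnamed first column, which is
# read back as 'Unnamed: 0' by pd.read_csv.
buf = io.StringIO()
toy.to_csv(buf, index=True)
header_with_index = buf.getvalue().splitlines()[0]

# A non-empty string is truthy, so 'False' is not the same as False.
string_is_truthy = bool('False')

print(header_without_index)
print(header_with_index)
print(string_is_truthy)
```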

# Roster Analysis

## season_level_analysis

#### $WinPc = \beta_{0} + \beta_{1}MeanF_{1} + \beta_{2}MeanF_{2} + \beta_{3}MeanD_{1} + \beta_{4}MeanD_{2} + e_{s}$

- merge season_team dataset (dz) and season_team_roster_ranking (dv) for roster analysis at the season level. Use **ds** as the merging dataset.

In [None]:
ds = dv.merge(dz, on=['Season', 'TeamCode'], how='left')
ds.head()
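
A left merge keeps every row of the left frame, so teams without a match on the right end up with missing values. The sketch below uses made-up stand-ins for `dv` and `dz` (the names `dv_toy` and `dz_toy` are hypothetical) and the `indicator=True` option to list such rows.

```python
import pandas as pd

# Toy stand-ins for dv (roster ranking) and dz (team results); values are made up.
dv_toy = pd.DataFrame({'Season': [2011, 2011], 'TeamCode': ['BOS', 'MTL'],
                       'MeanF1': [4.0, 3.0]})
dz_toy = pd.DataFrame({'Season': [2011], 'TeamCode': ['BOS'], 'WinPc': [0.6]})

# how='left' keeps every row of dv_toy; indicator=True adds a '_merge'
# column that flags rows with no match in dz_toy as 'left_only'.
merged = dv_toy.merge(dz_toy, on=['Season', 'TeamCode'], how='left', indicator=True)
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['Season', 'TeamCode']])
```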

- display the difference in quality of forwards (DF) and defensemen (DD) per team.

In [None]:
ds['DF'] = ds['MeanF1'] - ds['MeanF2']
ds['DD'] = ds['MeanD1'] - ds['MeanD2']

- mean goals for and mean goals against per team.

In [None]:
result.params

- regress **win** on the difference in number of players by position and quality per team. Add a constant to the predictors and use **Logit**. The purpose is to determine the impact each roster position has on home team success.

In [None]:
y = dt['Win']  
X = sm.add_constant(dt[['F1', 'D1', 'F2', 'D2']] )
result = sm.Logit(y, X).fit()
result.summary()

- regress **goal differential** on the difference in number of players by position and quality per team. Add a constant to the predictors and use **OLS**. The purpose is to determine the impact each roster position has on home team success.

In [None]:
y = dt['GD']  
X = sm.add_constant(dt[['F1', 'D1', 'F2', 'D2']] )
result = sm.OLS(y, X).fit()
result.summary()
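
To make the "add a constant and use OLS" step concrete, here is a minimal sketch on synthetic data. It swaps in plain NumPy least squares for statsmodels, so it reproduces only the point estimates and not the standard errors or the summary table. The variable names and coefficient values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Synthetic predictors standing in for F1 and D1 (hypothetical values).
F1 = rng.integers(0, 6, size=n).astype(float)
D1 = rng.integers(0, 3, size=n).astype(float)
# Synthetic outcome built from known coefficients plus noise.
y = 0.5 + 0.3 * F1 - 0.2 * D1 + rng.normal(0, 0.5, size=n)

# Prepending a column of ones plays the role of sm.add_constant.
X = np.column_stack([np.ones(n), F1, D1])
# Least-squares solution of X @ beta ~ y, i.e. the OLS point estimates.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)
```

The estimates should land close to the coefficients used to generate `y`, with the first entry being the intercept.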

- regress **win** on the differential of forwards and defensemen per team. Add a constant to the predictors and use **OLS**.

In [None]:
y = dt['Win']  
X = sm.add_constant(dt[['DF', 'DD']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **win** on the differential of forwards and defensemen per team. Add a constant to the predictors and use **Logit**.

In [None]:
y = dt['Win']  
X = sm.add_constant(dt[['DF', 'DD']] )
result = sm.Logit(y, X).fit()
result.summary()
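
For readers who want to see what a logit fit does, the sketch below estimates one by Newton-Raphson on synthetic data. `fit_logit` is a hypothetical helper written for this illustration, not statsmodels code, and the data and coefficients are made up.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Maximum-likelihood logit fit by Newton-Raphson (illustrative helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # predicted win probabilities
        grad = X.T @ (y - p)                         # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)    # Newton step
    return beta

rng = np.random.default_rng(1)
n = 5000
DF = rng.integers(-4, 5, size=n).astype(float)   # synthetic differential
X = np.column_stack([np.ones(n), DF])            # constant plus predictor
true_beta = np.array([0.1, 0.4])
win = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat = fit_logit(X, win)
print(beta_hat)
```

Unlike OLS coefficients, logit coefficients are on the log-odds scale, so `np.exp(beta_hat[1])` is the multiplicative change in the odds of a win per unit of the predictor.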

- regress **win** on the differential of forwards per team. Add a constant to the predictors and use **OLS**.

In [None]:
y = dt['Win']  
X = sm.add_constant(dt[['DF']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **win** on the differential of forwards per team. Add a constant to the predictors and use **Logit**.

In [None]:
dl['VF'].value_counts()

In [None]:
dl['VD'].value_counts()

In [None]:
dl['HF'].value_counts()

In [None]:
dl['HD'].value_counts()

In [None]:
dl.describe()

In [None]:
dl = dl[['Season', 'GameNumber', 'VTeamCode', 'HTeamCode', 'HGF', 'VGF', 'GD','WinTeam',
         'VF1', 'VF2', 'VD1', 'VD2', 
         'HF1', 'HF2', 'HD1', 'HD2']]

- determine if a game was won by the home or visitor team.
- compute the difference in quality of forwards and defensemen between home and visitor team per game (DF1, DF2, DD1, DD2). 

In [None]:
dl['HomeWin'] = dl.apply(lambda x: 1 if x['WinTeam']=='HOME' else 0, axis=1)
dl['DF1'] = dl['HF1'] - dl['VF1']
dl['DF2'] = dl['HF2'] - dl['VF2']
dl['DD1'] = dl['HD1'] - dl['VD1']
dl['DD2'] = dl['HD2'] - dl['VD2']
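
The same columns can be built without a row-wise `apply`. This is a sketch on a toy frame with made-up values, using `np.where` for the home-win flag:

```python
import numpy as np
import pandas as pd

# Toy games (made-up values) with elite-forward counts for each side.
games = pd.DataFrame({
    'WinTeam': ['HOME', 'AWAY', 'HOME'],
    'HF1': [5, 3, 4],
    'VF1': [3, 4, 4],
})

# Vectorized equivalent of the row-wise apply: 1 for a home win, else 0.
games['HomeWin'] = np.where(games['WinTeam'] == 'HOME', 1, 0)
# Positive DF1 means the home team dressed more elite forwards.
games['DF1'] = games['HF1'] - games['VF1']
print(games)
```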

In [None]:
dl.groupby(['WinTeam'])[['DF1', 'DF2', 'DD1', 'DD2']].describe()

- regress **home win** on the difference in number of home and visitor players by position and quality (DF1, DF2, DD1, DD2). Add a constant to the predictors and use OLS. The purpose is to determine the impact each roster position has on home team success.

In [None]:
dm.shape

In [None]:
dm.columns

In [None]:
dm = dm.rename(columns={'PlayerNumber': 'EventPlayerNumber', 'TeamCode': 'EventTeamCode', 'PlayerName': 'EventPlayerName' })
dm = dm[['Season', 'GameNumber', 'GameDate', 'Period', 'AdvantageType', 'Zone', 'EventNumber', 'EventType', 'EventDetail', 'EventTeamCode', 'EventPlayerNumber', 'EventPlayerName', 'EventTimeFromZero', 'EventTimeFromTwenty', 'VTeamCode', 'VPlayer', 'VPosition', 'HTeamCode', 'HPlayer', 'HPosition', 'ShotType', 'ShotResult', 'Length', 'PenaltyType']]
dm = dm.sort_values(['Season', 'GameNumber', 'Period', 'EventNumber'], ascending=[True, True, True, True])

- fill in advantage type with even strength 'EV' and event player number with 'TEAM'

In [None]:
dm['AdvantageType'] = dm['AdvantageType'].fillna('EV')
dm['EventPlayerNumber'] = dm['EventPlayerNumber'].fillna('TEAM')

- save the new dataset as play by play

In [None]:
dm.to_csv('play_by_play.csv', index=False, sep=',')

#### create new data set and keep variables: 
- (a) game number.
- (b) visitor team information.
- (c) home team information.

In [None]:
df = dm[['Season', 'GameNumber', 'VTeamCode', 'VPlayer', 'VPosition', 'HTeamCode', 'HPlayer', 'HPosition']]
df = df.sort_values(['Season', 'GameNumber'], ascending=[True, True])
df.head()

- merge season_game_data (dg) on new dataset

In [None]:
df = pd.merge(df, dg, on=['Season', 'GameNumber', 'VTeamCode', 'HTeamCode'], how='left')
df.head()    

- reshape the data to have home and visitor team observations under the same columns.

In [None]:
a = [col for col in df.columns if 'Player' in col]
b = [col for col in df.columns if 'Position' in col]
c = [col for col in df.columns if 'TeamCode' in col]
d = [col for col in df.columns if 'GF' in col]
e = [col for col in df.columns if 'GA' in col]
df = pd.lreshape(df, {'PlayerNumber' : a, 'PlayerPosition' : b, 'TeamCode' : c, 'GF' : d, 'GA' : e })
df.head()
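
A toy illustration of `pd.lreshape`, with made-up values and only two column groups: each target column stacks its source columns in the order listed, and the remaining columns are repeated to match.

```python
import pandas as pd

# Toy wide frame (made-up values): one row per game, visitor and home side by side.
wide = pd.DataFrame({
    'Season': [2011, 2011],
    'GameNumber': [1, 2],
    'VTeamCode': ['BOS', 'NYR'],
    'HTeamCode': ['MTL', 'TOR'],
    'VPlayer': [37, 30],
    'HPlayer': [76, 81],
})

# Each target column stacks its listed source columns in order,
# so the visitor rows come first and the home rows follow.
long_df = pd.lreshape(wide, {
    'TeamCode': ['VTeamCode', 'HTeamCode'],
    'PlayerNumber': ['VPlayer', 'HPlayer'],
})
print(long_df)
```

Every group must list the same number of source columns, here one visitor and one home column each.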

- import player rankings

In [None]:
dp = pd.read_csv('player_rank_manual.csv')
dp = dp.drop('Unnamed: 0', axis=1)

- **display each player by team per game. Drop duplicates.**

In [None]:
dw = pd.merge(df, dp, on=['Season', 'TeamCode', 'PlayerNumber', 'PlayerPosition'], how='left')

- create column that displays the position and roster count by team per game. 

In [None]:
dw = dw[dw.PlayerPosition!='G']
dw = dw.drop_duplicates(['Season', 'GameNumber', 'TeamCode', 'PlayerNumber'])
dw['RosterCount'] = dw.groupby(['Season', 'GameNumber', 'TeamCode'])['PlayerNumber'].transform('count')
dw['Position'] = dw.apply(lambda x: 'D' if x['PlayerPosition']=='D' else 'F', 1)
dw['PositionCount'] = dw.groupby(['Season', 'GameNumber', 'TeamCode', 'Position'])['PlayerNumber'].transform('count')
dw.head()

- count the amount of forwards and defensemen by team per game.

In [None]:
dw['FCount'] = dw.apply(lambda x: x['PositionCount'] if x['Position']=='F' else np.nan, 1)
dw['DCount'] = dw.apply(lambda x: x['PositionCount'] if x['Position']=='D' else np.nan, 1)
dw['FCount'] = dw.groupby(['Season','GameNumber', 'TeamCode'])['FCount'].transform(lambda x: x.ffill().bfill())
dw['DCount'] = dw.groupby(['Season','GameNumber', 'TeamCode'])['DCount'].transform(lambda x: x.ffill().bfill())
dw.head()
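
An alternative way to get the same per-team counts, sketched on a toy frame with made-up values: sum a boolean position flag within each team-game and broadcast the result with `transform`. The helper column name `flag` is hypothetical.

```python
import pandas as pd

# Toy skater rows (made-up values) for one team in one game.
skaters = pd.DataFrame({
    'Season': [2011] * 5,
    'GameNumber': [1] * 5,
    'TeamCode': ['BOS'] * 5,
    'Position': ['F', 'F', 'F', 'D', 'D'],
})

keys = ['Season', 'GameNumber', 'TeamCode']
# Summing a boolean flag within each team-game counts the matching rows,
# and transform broadcasts that count back to every row of the group.
skaters['FCount'] = (skaters.assign(flag=skaters['Position'].eq('F'))
                     .groupby(keys)['flag'].transform('sum'))
skaters['DCount'] = (skaters.assign(flag=skaters['Position'].eq('D'))
                     .groupby(keys)['flag'].transform('sum'))
print(skaters)
```

This avoids the intermediate missing values and the forward/backward fill.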

### keep only games that have 12 F and 6 D per team

In [None]:
y = dl['HomeWin']  
X = sm.add_constant(dl[['DF1', 'DD1', 'DF2', 'DD2']] )
result = sm.OLS(y, X).fit()
result.summary()

- Increasing the differential (home team minus visitor team) of **elite** player quality in forwards and defense by one unit **increases** home win by 0.4 and 1 game, respectively.
- Increasing the differential (home team minus visitor team) of **secondary** player quality in forwards and defense by one unit **decreases** home win by 0.4 and 1 game, respectively.
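
For reference, and assuming the OLS regression of the binary home-win indicator is read as a linear probability model, each coefficient is the change in the probability of a home win per one-unit change in the corresponding differential, holding the others fixed:

$$P(HomeWin = 1 \mid DF_{1}, DF_{2}, DD_{1}, DD_{2}) = \beta_{0} + \beta_{1}DF_{1} + \beta_{2}DF_{2} + \beta_{3}DD_{1} + \beta_{4}DD_{2}$$

$$\frac{\partial P(HomeWin = 1)}{\partial DF_{1}} = \beta_{1}$$

The same reading applies to $\beta_{2}$, $\beta_{3}$ and $\beta_{4}$ for $DF_{2}$, $DD_{1}$ and $DD_{2}$.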

- regress **home win** on the difference in number of elite home and visitor players by position (DF1, DD1). Add a constant to the predictors and use **OLS**. The purpose is to determine the impact each roster position has on home team success.

In [None]:
y = dl['HomeWin']  
X = sm.add_constant(dl[['DF1', 'DD1']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **home win** on the difference in number of elite home and visitor players by position (DF1, DD1). Add a constant to the predictors and use **Logit**. The purpose is to determine the impact each roster position has on home team success.

In [None]:
y = ds.WinPc
y1 = ds.meanGF
y2 = ds.meanGA
x1 = ds.DF
x2 = ds.DD

f, axs = plt.subplots(2,3,figsize=(15,8))

plt.subplot(2, 3, 1)
plt.scatter(x1, y)
plt.xlabel('DF')
plt.ylabel('Win %')

plt.subplot(2, 3, 2)
plt.scatter(x1, y1)
plt.ylabel('MeanGF')
plt.xlabel('DF')

plt.subplot(2, 3, 3)
plt.scatter(x1, y2)
plt.ylabel('MeanGA')
plt.xlabel('DF')

plt.subplot(2, 3, 4)
plt.scatter(x2,y, color = 'r')
plt.xlabel('DD')
plt.ylabel('Win %')

plt.subplot(2, 3, 5)
plt.scatter(x2, y1, color = 'r')
plt.ylabel('MeanGF')
plt.xlabel('DD')

plt.subplot(2, 3, 6)
plt.scatter(x2, y2, color = 'r')
plt.ylabel('MeanGA')
plt.xlabel('DD')

plt.tight_layout()

plt.show()



## season_game_level_analysis

#### $HomeWin = \beta_{0} + \beta_{1}DF_{1} + \beta_{2}DF_{2} + \beta_{3}DD_{1} + \beta_{4}DD_{2} + e_{s,g}$

- merge season game data (dg) and season game roster (dy).

In [None]:
dl = dg.merge(dy, on=['Season', 'GameNumber'], how='left')
dl.head()

- determine if the home or away team won the game.

In [None]:
dl['WinTeam'] = dl.apply(lambda x: 'HOME' if x['GD'] > 0 else 'AWAY', axis=1)

- Calculate the difference in player quality per game for each position with respect to the home team (Home Team - Visitor Team). With 2 positions (forwards, defensemen) and 2 types of player quality, this gives a total of 4 differences (DF1, DF2, DD1, DD2).

In [None]:
dl.shape

- total of forwards and defensemen by team per game.

In [None]:
dl['VF'] = dl['VF1'] + dl['VF2']
dl['VD'] = dl['VD1'] + dl['VD2']
dl['HF'] = dl['HF1'] + dl['HF2']
dl['HD'] = dl['HD1'] + dl['HD2']

- total of forwards and defensemen per game.

In [None]:
y = dl['GD']  
X = sm.add_constant(dl[['DF1', 'DD1', 'DF2', 'DD2']] )
result = sm.OLS(y, X).fit()
result.summary()

## season_game_team_level_analysis

#### $Win = \beta_{0} + \beta_{1}F_{1} + \beta_{2}F_{2} + \beta_{3}D_{1} + \beta_{4}D_{2} + e_{s,g,t}$

- use season game data (dg) and season game team roster (dx) to conduct season game team level analysis (dt).

In [None]:
dg.head()

In [None]:
dt = dg.merge(dx, on=['Season', 'GameNumber'], how='left')
dt.head()

- Sum up goals for and against by team per game and find the goal differential (GD) per game. Assign a value of 1 to the team that won the game. 

In [None]:
dt['GD'] = dt.apply(lambda x: (x['HGF'] - x['VGF']) if x['HTeamCode']== x['TeamCode'] else (x['VGF'] - x['HGF']), 1)
dt['Win'] = dt.apply(lambda x: 1 if x['WinTeam']== x['TeamCode'] else 0, 1)
dt['GF'] = dt.apply(lambda x: x['HGF'] if x['HTeamCode']== x['TeamCode'] else x['VGF'], 1)
dt['GA'] = dt.apply(lambda x: x['HGF'] if x['HTeamCode']!= x['TeamCode'] else x['VGF'], 1)
dt.head()

In [None]:
dt['F'] = dt['F1'] + dt['F2']
dt['D'] = dt['D1'] + dt['D2']

- display the difference of quality per position.

In [None]:
y = dt['Win']  
X = sm.add_constant(dt[['F1', 'D1', 'F2', 'D2']] )
result = sm.OLS(y, X).fit()
result.summary()

### scatter plots

- display how team win percent, mean goals for and mean goals against vary with the difference in quality of forwards (DF) and defensemen (DD).

In [None]:
ds['meanGF'] = ds['GF']/ ds['GP']
ds['meanGA'] = ds['GA']/ ds['GP']

In [None]:
ds.shape

In [None]:
ds.describe()

### estimate roster model 

- regress **team win percent** on the mean of players by position and quality (predictor variables). Add a constant to the predictors and use **OLS**. The purpose is to determine the impact each roster position has on team winning percent.

In [None]:
y = ds['WinPc']  
X = sm.add_constant(ds[['MeanF1', 'MeanD1', 'MeanF2', 'MeanD2']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **team win percent** on the mean of top forwards. Add a constant to the predictors and use **OLS**. The purpose is to determine the impact top forwards have on team winning percent.

In [None]:
y = ds['WinPc']  
X = sm.add_constant(ds[['MeanF1']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **team win percent** on the mean of players by position and quality (predictor variables). Add a constant to the predictors and use **Logit**. The purpose is to determine the impact each roster position has on team winning percent.

In [None]:
y = ds['WinPc']  
X = sm.add_constant(ds[['MeanF1', 'MeanD1', 'MeanF2', 'MeanD2']] )
result = sm.Logit(y, X).fit()
result.summary()

- regress **team win percent** on the mean of top forwards. Add a constant to the predictors and use **Logit**. The purpose is to determine the impact top forwards have on team winning percent.

In [None]:
y = ds['WinPc']  
X = sm.add_constant(ds[['MeanF1']] )
result = sm.Logit(y, X).fit()
result.summary()

- regress **team win percent** on the difference in the mean quality of forwards (DF). Add a constant to the predictors and use **OLS**. The purpose is to determine the impact the forward quality differential has on team win percent.

In [None]:
y = ds['WinPc']  
X = sm.add_constant(ds[['DF']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **team win percent** on the difference in the mean quality of forwards (DF). Add a constant to the predictors and use **Logit**. The purpose is to determine the impact the forward quality differential has on team win percent.

In [None]:
y = dt['Win']  
X = sm.add_constant(dt[['DF']] )
result = sm.Logit(y, X).fit()
result.summary()

- regress **win** on the differential of defensemen per team. Add a constant to the predictors and use **OLS**.

In [None]:
y = dt['Win']  
X = sm.add_constant(dt[['DD']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **win** on the differential of defensemen per team. Add a constant to the predictors and use **Logit**.

In [None]:
y = dt['Win']  
X = sm.add_constant(dt[['DD']] )
result = sm.Logit(y, X).fit()
result.summary()

- regress **goal differential** on the differential of forwards and defensemen per team. Add a constant to the predictors and use **OLS**.

In [None]:
y = dt['GD']
X = sm.add_constant(dt[['DF', 'DD']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **goal differential** on the differential of forwards per team. Add a constant to the predictors and use **OLS**.

In [None]:
y = dt['GD']  
X = sm.add_constant(dt[['DF']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **goal differential** on the differential of defensemen per team. Add a constant to the predictors and use **OLS**.

In [None]:
y = dt['GD']  
X = sm.add_constant(dt[['DD']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **goals for** on the differential of forwards per team. Add a constant to the predictors and use **OLS**.

In [None]:
dt['DF'] = dt['F1'] - dt['F2']
dt['DD'] = dt['D1'] - dt['D2']

In [None]:
y = ds['WinPc']  
X = sm.add_constant(ds[['DF']] )
result = sm.Logit(y, X).fit()
result.summary()

- regress **team win percent** on the difference in the mean quality of defensemen (DD). Add a constant to the predictors and use **OLS**. The purpose is to determine the impact the defensemen quality differential has on team win percent.

In [None]:
y = ds['WinPc']  
X = sm.add_constant(ds[['DD']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **team win percent** on the difference in the mean quality of defensemen (DD). Add a constant to the predictors and use **Logit**. The purpose is to determine the impact the defensemen quality differential has on team win percent.

In [None]:
y = ds['WinPc']  
X = sm.add_constant(ds[['DD']] )
result = sm.Logit(y, X).fit()
result.summary()

- regress **mean goals for** on the mean of players by position and quality (predictor variables). Add a constant to the predictors and use **OLS**. The purpose is to determine the impact each roster position has on mean goals for.

In [None]:
y = ds['meanGF']  
X = sm.add_constant(ds[['MeanF1', 'MeanD1', 'MeanF2', 'MeanD2']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **mean goals against** on the mean of players by position and quality (predictor variables). Add a constant to the predictors and use **OLS**. The purpose is to determine the impact each roster position has on mean goals against.

In [None]:
y = ds['meanGA']  
X = sm.add_constant(ds[['MeanF1', 'MeanD1', 'MeanF2', 'MeanD2']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **mean goals for** on the differential of players by position and quality (predictor variables). Add a constant to the predictors and use **OLS**. The purpose is to determine the impact each quality differential has on mean goals for.

In [None]:
y = ds['meanGF']  
X = sm.add_constant(ds[['DF', 'DD']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **mean goals against** on the differential of players by position and quality (predictor variables). Add a constant to the predictors and use **OLS**. The purpose is to determine the impact each quality differential has on mean goals against.

In [None]:
y = dt['GF']  
X = sm.add_constant(dt[['DF']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **goals for** on the differential of defensemen per team. Add a constant to the predictors and use **OLS**.

In [None]:
y = dt['GF']  
X = sm.add_constant(dt[['DD']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **goals against** on the differential of forwards per team. Add a constant to the predictors and use **OLS**.

In [None]:
y = dt['GA']  
X = sm.add_constant(dt[['DF']] )
result = sm.OLS(y, X).fit()
result.summary()

- regress **goals against** on the differential of defensemen per team. Add a constant to the predictors and use **OLS**.

In [None]:
y = ds['meanGA']  
X = sm.add_constant(ds[['DF', 'DD']] )
result = sm.OLS(y, X).fit()
result.summary()

In [None]:
dy = dx.copy()  # copy so that the changes below do not modify dx

In [None]:
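# flag the first row of each game (within a season) as 1; the remaining rows get 2
dy.loc[dy.groupby(['Season', 'GameNumber'], as_index=False).head(1).index, 'A'] = 1
dy['A'] = dy['A'].fillna(2)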
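
The flag can also be derived in one step with `cumcount`. This is a sketch on made-up rows, and it assumes, as the step above does, that the first row of each game belongs to the visitor. The result is cast to float so the values read `1.0` and `2.0`.

```python
import pandas as pd

# Toy team-game rows (made-up values): the same game number in two seasons.
toy = pd.DataFrame({
    'Season': [2011, 2011, 2012, 2012],
    'GameNumber': [1, 1, 1, 1],
    'TeamCode': ['BOS', 'MTL', 'NYR', 'TOR'],
})

# cumcount numbers the rows within each season-game from 0, so adding 1
# gives 1.0 for the first row and 2.0 for the second.
toy['A'] = (toy.groupby(['Season', 'GameNumber']).cumcount() + 1).astype(float)
print(toy)
```

Grouping on both `Season` and `GameNumber` matters when game numbers repeat across seasons.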

In [None]:
dv['TeamWin'] =  dv.apply(lambda x: 1 if x['TeamCode']==x['WinTeam'] else 0, 1)
dv['TeamLos'] =  dv.apply(lambda x: 1 if x['TeamCode']!=x['WinTeam'] else 0, 1)
dv.head()

- display games played, games won, games lost, goals for and goals against by team for the season.

In [None]:
dv['GP'] = dv.groupby(['Season', 'Position', 'TeamCode'])['GameNumber'].transform('count')
dv['GW'] = dv.groupby(['Season', 'Position', 'WinTeam'])['TeamWin'].transform('sum')
dv['GL'] = dv.groupby(['Season', 'Position', 'LossTeam'])['TeamLos'].transform('sum')
dv['GF'] = dv.groupby(['Season', 'Position', 'TeamCode'])['GF'].transform('sum')
dv['GA'] = dv.groupby(['Season', 'Position', 'TeamCode'])['GA'].transform('sum')
dv.head()

- create columns with the mean ranking for forward and defenseman by team per game.

In [None]:
dv['Rank_F'] = dv.apply(lambda x: x['Rank'] if x['Position']=='F' else np.nan, 1)
dv['Rank_D'] = dv.apply(lambda x: x['Rank'] if x['Position']=='D' else np.nan, 1)
dv['Rank_F'] = dv.groupby(['Season','GameNumber', 'TeamCode'])['Rank_F'].transform(lambda x: x.ffill().bfill())
dv['Rank_D'] = dv.groupby(['Season','GameNumber', 'TeamCode'])['Rank_D'].transform(lambda x: x.ffill().bfill())
dv.head()

- compute the mean per position by team for the season.

In [None]:
dv['Mean_F']= dv.groupby(['Season', 'TeamCode'])['Rank_F'].transform('mean')
dv['Mean_D']= dv.groupby(['Season', 'TeamCode'])['Rank_D'].transform('mean')
dv.head()

- display the number of wins and losses per team (rosters of 12 forwards and 6 defensemen).

In [None]:
dv['L'] = dv.apply(lambda x: x['GL'] if x['TeamCode']== x['LossTeam'] else (x['GP'] - x['GW']), 1)
dv['W'] = dv.apply(lambda x: x['GW'] if x['TeamCode']== x['WinTeam'] else (x['GP'] - x['GL']), 1)
dv.head()

- compute win and loss percent by team. Drop duplicate observations.

In [None]:
dv = dv[['Season', 'TeamCode', 'GP', 'L', 'W', 'GF', 'GA', 'Mean_F', 'Mean_D']]
dv = dv.drop_duplicates(['Season', 'TeamCode'])
dv['WinPc'] = dv['W']/ dv['GP']
dv['LossPc'] = dv['L']/ dv['GP']

dv = dv[['Season', 'TeamCode', 'GP','W', 'L', 'GF', 'GA', 'WinPc', 'LossPc', 'Mean_F', 'Mean_D']]
dv.head()
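
The team-season summary can also be sketched as a single aggregation. The toy frame below uses made-up values and a hypothetical `Win` indicator per team-game; it is an illustration of the idea rather than a drop-in replacement for the cells above.

```python
import pandas as pd

# Toy team-game results (made-up values): Win is 1 when the team won.
results = pd.DataFrame({
    'Season': [2011] * 5,
    'TeamCode': ['BOS', 'BOS', 'BOS', 'MTL', 'MTL'],
    'Win': [1, 0, 1, 0, 1],
})

# One row per team-season: games played and wins, then losses and shares.
summary = (results.groupby(['Season', 'TeamCode'])['Win']
           .agg(GP='count', W='sum')
           .reset_index())
summary['L'] = summary['GP'] - summary['W']
summary['WinPc'] = summary['W'] / summary['GP']
summary['LossPc'] = summary['L'] / summary['GP']
print(summary)
```

Because the aggregation already returns one row per team and season, no `drop_duplicates` step is needed.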

- rank teams based on win percent, mean forwards and mean defensemen. 

In [None]:
y = dl['HomeWin']  
X = sm.add_constant(dl[['DF1', 'DD1']] )
result = sm.Logit(y, X).fit()
result.summary()

- regress **home win** on the difference in number of secondary quality home and visitor players by position (DF2, DD2). Add a constant to the predictors and use **Logit**. The purpose is to determine the impact each roster position has on home team success.

In [None]:
dl['F'] = dl['VF'] + dl['HF']
dl['D'] = dl['VD'] + dl['HD']
dl.head()

- **keep games with 12 forwards and 6 defensemen per team.**

In [None]:
dl = dl[((dl['VF'] == 12) & (dl['VD'] == 6) & (dl['HF'] == 12) & (dl['HD'] == 6))]

In [None]:
y = dl['HomeWin']  
X = sm.add_constant(dl[['DF2', 'DD2']] )
result = sm.Logit(y, X).fit()
result.summary()

- regress **goal differential** on the difference in number of home and visitor players by position and quality (DF1, DF2, DD1, DD2). Add a constant to the predictors and use OLS. The purpose is to determine the impact each roster position has on goal differential.