# NHL Player Projections

This notebook uses machine learning and previous seasons' statistics to predict each player's statistics for the upcoming season.

Note: Data is from Natural Stat Trick.

Note 2: Ignoring goalies (for now).

## Notes for me

1. Need to handle NaNs and `-` placeholders. Filling everything with 0 would probably work, but that needs care when a player did not play in a particular season. Now using a left join, which might help with this.
2. The 2018-2019 bio file has one more player than the 2018-2019 stats. Rookie? Retired player? Injury?
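
One possible way to handle the first note is sketched below on a toy frame. The column names are borrowed from the real output, but the player names, the values, and the `clean_placeholders` helper are hypothetical. The idea is to turn `-` into NaN first and decide about zero-filling only afterwards, so that "did not play" stays distinguishable from "played and recorded zero".

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the raw export (all values are made up):
# '-' marks an undefined percentage, NaN marks a season with no data.
raw = pd.DataFrame(
    {
        "2019 Team": ["FLA", "ANA", np.nan],
        "2019 ES Faceoffs %": ["-", "40.22", np.nan],
        "2019 ES Goals": [12.0, 5.0, np.nan],
    },
    index=pd.Index(["Player A", "Player B", "Player C"], name="Player"),
)

def clean_placeholders(df):
    """Hypothetical helper: replace '-' with NaN, then convert columns that parse as numbers."""
    out = df.replace("-", np.nan)
    for col in out.select_dtypes(exclude="number").columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            pass  # genuinely textual column such as Team; leave it untouched
    return out

cleaned = clean_placeholders(raw)

# Flag players with no data for the season before any zero-fill is applied.
cleaned["2019 missing"] = cleaned["2019 ES Goals"].isna()
print(cleaned)
```

Whether the remaining NaNs should then be filled with 0 is a modelling decision; the flag column keeps that choice reversible.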

## Table of Contents

1. [Importing Python Libraries](#Importing-Python-Libraries)
2. [Reading the Data](#Reading-the-Data)


## Importing Python Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 1000)

## Reading the Data

In [3]:
path = '../data/'

In [4]:
# Function that will create a total dataframe for each type of situation and for a specified season
def read_situation_data(sit, season, index=True):
    
    '''
    Helper function that will create a total dataframe for each type of situation and for a specified season.
    Note: Index variable is really just there to make merging all the situations together easier.
    
    '''

    # Read individual data
    df_individual = pd.read_csv(path + season + ' ' + sit + ' ' + 'Individual.csv')
    df_individual.drop('Unnamed: 0', axis=1, inplace=True)

    # Read on ice data
    df_on_ice = pd.read_csv(path + season + ' ' + sit + ' ' + 'On-Ice.csv')
    df_on_ice.drop('Unnamed: 0', axis=1, inplace=True)

    # List of overlapping columns that we will merge on
    overlapping_cols = ['Player', 'Team', 'Position', 'GP', 'TOI']

    # Merge
    df_sit = pd.merge(df_individual, df_on_ice, how='outer', on=overlapping_cols)

    # Prefix each column name with a situation code so that the situation can easily be told from the name.
    # 'Player', 'Team', 'Position' and 'GP' are not prefixed because they do not vary with the situation.
    codes_dict = {'All Strengths':'AS', 'Even Strength':'ES', 'Power Play':'PP', 'Penalty Kill':'PK'}
    df_sit.columns = [codes_dict[sit] + ' ' + str(col) for col in df_sit.columns]
    df_sit.rename(columns={codes_dict[sit] + ' Player':'Player',
                           codes_dict[sit] + ' Team':'Team',
                           codes_dict[sit] + ' Position':'Position',
                           codes_dict[sit] + ' GP':'GP'},inplace=True)
    
    # Set index to 'Player' based on condition
    if index:
        df_sit.set_index('Player', inplace=True)

    return df_sit

In [5]:
# read_situation_data('All Strengths', '2018-2019')
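
The prefixing step inside `read_situation_data` can be tried in isolation on a toy frame. This is a minimal sketch, not the notebook's own code: `prefix_columns` is a hypothetical helper that reaches the same column names in a single `rename` call instead of prefixing everything and then renaming the key columns back, and the row values are made up.

```python
import pandas as pd

def prefix_columns(df, prefix, keep=("Player", "Team", "Position", "GP")):
    """Hypothetical helper: prepend `prefix` to every column name not listed in `keep`."""
    return df.rename(columns=lambda col: col if col in keep else f"{prefix} {col}")

# Toy single-row frame with the key columns plus two stat columns.
toy = pd.DataFrame(
    {
        "Player": ["Player A"],
        "Team": ["FLA"],
        "Position": ["D"],
        "GP": [78],
        "TOI": [1411.2],
        "Goals": [12],
    }
)

prefixed = prefix_columns(toy, "ES")
print(list(prefixed.columns))
# ['Player', 'Team', 'Position', 'GP', 'ES TOI', 'ES Goals']
```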

In [6]:
def read_season_data(season, index=True):
    
    '''
    Helper function that will create a total dataframe for a specified season.
    Stats are for the following situations: Even Strength (ES), Power Play (PP), and Penalty Kill (PK)
    For now, we do not add the aggregate All Strengths (AS) situation.
    Note: Similarly, index variable is really just there to make merging all the situations together easier.
    
    '''
    
    # List of overlapping columns that we will merge on
    overlapping_cols = ['Player', 'Team', 'Position', 'GP']
    
    # Use helper function to create dataframes for all 3 situations and merge on the overlapping columns.
    df_season = read_situation_data('Even Strength', season, index=False)
    df_season = pd.merge(df_season, read_situation_data('Power Play', season, index=False),
                         how='outer', on=overlapping_cols)
    df_season = pd.merge(df_season, read_situation_data('Penalty Kill', season, index=False),
                         how='outer', on=overlapping_cols)
    
    # Add season prefix to each column. For readability, we will only use the later year.
    # (i.e. for the 2015-2016 season, each column will start with 2016)
    later_year = season.split('-')[1]
    df_season.columns = [later_year + ' ' + str(col) for col in df_season.columns]
    df_season.rename(columns={later_year + ' Player':'Player'},inplace=True)
    
    # Set index to 'Player' based on condition
    if index:
        df_season.set_index('Player', inplace=True)
    
    return df_season

In [7]:
# read_season_data('2018-2019')
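
The chained outer merges in `read_season_data` follow a pattern that generalises to any number of situations. The sketch below shows it on three toy frames with made-up values; using `functools.reduce` is an alternative way to write the chain, not what the notebook does. A player who appears in only some situations ends up with NaN in the others.

```python
from functools import reduce

import pandas as pd

keys = ["Player", "Team", "Position", "GP"]

# Toy situation frames (hypothetical players and values).
es = pd.DataFrame({"Player": ["Player A", "Player B"], "Team": ["FLA", "ANA"],
                   "Position": ["D", "R"], "GP": [78, 52], "ES Goals": [12, 5]})
pp = pd.DataFrame({"Player": ["Player A"], "Team": ["FLA"],
                   "Position": ["D"], "GP": [78], "PP Goals": [3]})
pk = pd.DataFrame({"Player": ["Player B"], "Team": ["ANA"],
                   "Position": ["R"], "GP": [52], "PK Goals": [0]})

# Fold the list of frames into one frame, outer-merging on the shared key columns.
merged = reduce(
    lambda left, right: pd.merge(left, right, how="outer", on=keys),
    [es, pp, pk],
).set_index("Player")

print(merged)
```

Here `Player A` has no penalty-kill row, so `PK Goals` is NaN for that player, which is the same kind of gap the left join in `read_data` produces for seasons a player missed.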

In [8]:
def read_data(seasons_list=['2015-2016','2016-2017','2017-2018','2018-2019']):
    
    '''
    This is the function that will read the data required for building the model.
    We will use the previous 3 years of the data to predict stats for the upcoming season.
    Unlike the helper functions above, this one also adds bio information, but only for the most recent season in the list.
    This means the dataset will have each player's biography for the season in which we will predict stats.
    Note: Codes for different situations: Even Strength (ES), Power Play (PP), and Penalty Kill (PK)
    Note 2: Because I'm only interested in players playing in the upcoming year, going to use a left join.
    
    '''
    
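    # Just in case, sort the seasons. Use a sorted copy so that neither the caller's list
    # nor the shared default argument is mutated.
    seasons_list = sorted(seasons_list)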
    
    # Read bio
    max_year = max(seasons_list)
    df = pd.read_csv(path + max_year + ' ' + 'Player Bios.csv', index_col='Player')
    df.drop('Unnamed: 0', axis=1, inplace=True)
    
    # Iterate through the seasons and merge their corresponding dataframes to df
    for season in seasons_list:
        df_season = read_season_data(season, index=True)
        df = pd.merge(df, df_season, how='left', left_index=True, right_index=True)
        
    return df

In [9]:
df = read_data()
display(df.head())
display(df.shape)

Player,Team,Position,Age,Date of Birth,Birth City,Birth State/Province,Birth Country,Nationality,Height (in),Weight (lbs),Draft Year,Draft Team,Draft Round,Round Pick,Overall Draft Position,2016 Team,2016 Position,2016 GP,2016 ES TOI,2016 ES Goals,2016 ES Total Assists,2016 ES First Assists,2016 ES Second Assists,2016 ES Total Points,2016 ES IPP,2016 ES Shots,2016 ES SH%,2016 ES ixG,2016 ES iCF,2016 ES iFF,2016 ES iSCF,2016 ES iHDCF,2016 ES Rush Attempts,2016 ES Rebounds Created,2016 ES PIM,2016 ES Total Penalties,2016 ES Minor,2016 ES Major,2016 ES Misconduct,2016 ES Penalties Drawn,2016 ES Giveaways,2016 ES Takeaways,2016 ES Hits,2016 ES Hits Taken,2016 ES Shots Blocked,2016 ES Faceoffs Won,2016 ES Faceoffs Lost,2016 ES Faceoffs %,2016 ES CF,2016 ES CA,2016 ES CF%,2016 ES FF,2016 ES FA,2016 ES FF%,2016 ES SF,2016 ES SA,2016 ES SF%,2016 ES GF,2016 ES GA,2016 ES GF%,2016 ES xGF,2016 ES xGA,2016 ES xGF%,2016 ES SCF,2016 ES SCA,2016 ES SCF%,2016 ES HDCF,2016 ES HDCA,2016 ES HDCF%,2016 ES HDGF,2016 ES HDGA,2016 ES HDGF%,2016 ES MDCF,2016 ES MDCA,2016 ES MDCF%,2016 ES MDGF,2016 ES MDGA,2016 ES MDGF%,2016 ES LDCF,2016 ES LDCA,2016 ES LDCF%,2016 ES LDGF,2016 ES LDGA,2016 ES LDGF%,2016 ES On-Ice SH%,2016 ES On-Ice SV%,2016 ES PDO,2016 ES Off. Zone Starts,2016 ES Neu. Zone Starts,2016 ES Def. Zone Starts,2016 ES On The Fly Starts,2016 ES Off. Zone Start %,2016 ES Off. Zone Faceoffs,2016 ES Neu. Zone Faceoffs,2016 ES Def. Zone Faceoffs,2016 ES Off. Zone Faceoff %,2016 PP TOI,2016 PP Goals,2016 PP Total Assists,2016 PP First Assists,...,2019 PP MDCF%,2019 PP MDGF,2019 PP MDGA,2019 PP MDGF%,2019 PP LDCF,2019 PP LDCA,2019 PP LDCF%,2019 PP LDGF,2019 PP LDGA,2019 PP LDGF%,2019 PP On-Ice SH%,2019 PP On-Ice SV%,2019 PP PDO,2019 PP Off. Zone Starts,2019 PP Neu. Zone Starts,2019 PP Def. Zone Starts,2019 PP On The Fly Starts,2019 PP Off. Zone Start %,2019 PP Off. Zone Faceoffs,2019 PP Neu. Zone Faceoffs,2019 PP Def. Zone Faceoffs,2019 PP Off. Zone Faceoff %,2019 PK TOI,2019 PK Goals,2019 PK Total Assists,2019 PK First Assists,2019 PK Second Assists,2019 PK Total Points,2019 PK IPP,2019 PK Shots,2019 PK SH%,2019 PK ixG,2019 PK iCF,2019 PK iFF,2019 PK iSCF,2019 PK iHDCF,2019 PK Rush Attempts,2019 PK Rebounds Created,2019 PK PIM,2019 PK Total Penalties,2019 PK Minor,2019 PK Major,2019 PK Misconduct,2019 PK Penalties Drawn,2019 PK Giveaways,2019 PK Takeaways,2019 PK Hits,2019 PK Hits Taken,2019 PK Shots Blocked,2019 PK Faceoffs Won,2019 PK Faceoffs Lost,2019 PK Faceoffs %,2019 PK CF,2019 PK CA,2019 PK CF%,2019 PK FF,2019 PK FA,2019 PK FF%,2019 PK SF,2019 PK SA,2019 PK SF%,2019 PK GF,2019 PK GA,2019 PK GF%,2019 PK xGF,2019 PK xGA,2019 PK xGF%,2019 PK SCF,2019 PK SCA,2019 PK SCF%,2019 PK HDCF,2019 PK HDCA,2019 PK HDCF%,2019 PK HDGF,2019 PK HDGA,2019 PK HDGF%,2019 PK MDCF,2019 PK MDCA,2019 PK MDCF%,2019 PK MDGF,2019 PK MDGA,2019 PK MDGF%,2019 PK LDCF,2019 PK LDCA,2019 PK LDCF%,2019 PK LDGF,2019 PK LDGA,2019 PK LDGF%,2019 PK On-Ice SH%,2019 PK On-Ice SV%,2019 PK PDO,2019 PK Off. Zone Starts,2019 PK Neu. Zone Starts,2019 PK Def. Zone Starts,2019 PK On The Fly Starts,2019 PK Off. Zone Start %,2019 PK Off. Zone Faceoffs,2019 PK Neu. Zone Faceoffs,2019 PK Def. Zone Faceoffs,2019 PK Off. Zone Faceoff %
A.J. Greer,COL,L,22,1996-12-14,Joliette,QC,CAN,CAN,75,210,2015,COL,2,9,39,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Aaron Ekblad,FLA,D,23,1996-02-07,Windsor,ON,CAN,CAN,76,220,2014,FLA,1,1,1,FLA,D,78.0,1411.183333,12.0,15.0,6.0,9.0,27.0,39.13,143.0,8.39,6.37,242.0,186.0,66.0,18.0,5.0,11.0,37.0,17.0,16.0,1.0,0.0,6.0,37.0,20.0,81.0,116.0,54.0,0.0,0.0,-,1176.0,1107.0,51.51,932.0,874.0,51.61,690.0,613.0,52.95,69.0,49.0,58.47,54.04,47.12,53.42,548.0,496.0,52.49,216.0,212.0,50.47,35.0,28.0,55.56,332.0,284.0,53.9,21.0,14.0,60.0,561.0,525.0,51.66,11.0,4.0,73.33,10.0,92.01,1.02,233.0,337.0,181.0,793.0,56.28,495.0,534.0,395.0,55.62,221.533333,3.0,6.0,2.0,...,88.89,6.0,2.0,75.00,98.0,10.0,90.74,6.0,0.0,100.00,12.98,87.10,1.001,46.0,26.0,7.0,111.0,86.79,88.0,44.0,9.0,90.72,190.45,0.0,0.0,0.0,0.0,0.0,0.00,12.0,0.00,0.36,16.0,13.0,6.0,0.0,4.0,1.0,12.0,6.0,6.0,0.0,0.0,4.0,9.0,9.0,9.0,1.0,11.0,0.0,0.0,-,74.0,266.0,21.76,66.0,202.0,24.63,57.0,136.0,29.53,2.0,24.0,7.69,3.62,17.69,16.99,37.0,140.0,20.90,16.0,52.0,23.53,0.0,10.0,0.00,21.0,88.0,19.27,2.0,8.0,20.00,23.0,117.0,16.43,0.0,6.0,0.00,3.51,82.35,0.859,3.0,32.0,137.0,91.0,2.14,12.0,52.0,199.0,5.69
Adam Clendening,CBJ,D,26,1992-10-26,Niagara Falls,NY,USA,USA,72,196,2011,CHI,2,6,36,"EDM, PIT",D,29.0,405.333333,1.0,6.0,1.0,5.0,7.0,36.84,41.0,2.44,1.44,76.0,56.0,13.0,1.0,1.0,0.0,20.0,10.0,10.0,0.0,0.0,4.0,18.0,5.0,19.0,55.0,33.0,0.0,0.0,-,374.0,398.0,48.45,277.0,305.0,47.59,204.0,217.0,48.46,19.0,13.0,59.38,14.94,16.63,47.31,175.0,180.0,49.3,68.0,72.0,48.57,13.0,10.0,56.52,107.0,108.0,49.77,4.0,0.0,100.0,164.0,190.0,46.33,2.0,3.0,40.0,9.31,94.01,1.033,52.0,69.0,41.0,353.0,55.91,123.0,114.0,101.0,54.91,23.366667,0.0,0.0,0.0,...,-,0.0,0.0,-,1.0,0.0,100.0,0.0,0.0,-,-,-,-,0.0,0.0,0.0,1.0,-,1.0,0.0,0.0,100.0,6.0,0.0,0.0,0.0,0.0,0.0,-,0.0,-,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,-,2.0,10.0,16.67,2.0,8.0,20.00,1.0,7.0,12.50,0.0,1.0,0.00,0.1,0.6,14.29,1.0,8.0,11.11,0.0,0.0,-,0.0,0.0,-,1.0,8.0,11.11,0.0,1.0,0.00,1.0,2.0,33.33,0.0,0.0,-,0.00,85.71,0.857,0.0,0.0,1.0,9.0,0.00,0.0,0.0,1.0,0.00
Adam Cracknell,ANA,R,34,1985-07-15,Prince Albert,SK,CAN,CAN,74,209,2004,CGY,9,21,279,"EDM, VAN",R,52.0,595.983333,5.0,5.0,4.0,1.0,10.0,76.92,67.0,7.46,5.71,119.0,91.0,56.0,27.0,4.0,6.0,18.0,9.0,9.0,0.0,0.0,10.0,13.0,23.0,123.0,60.0,25.0,146.0,217.0,40.22,481.0,551.0,46.61,351.0,398.0,46.86,251.0,287.0,46.65,13.0,14.0,48.15,19.51,22.05,46.95,207.0,283.0,42.24,86.0,104.0,45.26,7.0,8.0,46.67,121.0,179.0,40.33,3.0,5.0,37.5,218.0,216.0,50.23,1.0,1.0,50.0,5.18,95.12,1.003,79.0,110.0,97.0,489.0,44.89,145.0,169.0,187.0,43.67,6.066667,0.0,0.0,0.0,...,,,,,,,,,,,,,,,,,,,,,,,0.333333,0.0,0.0,0.0,0.0,0.0,-,0.0,-,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,-,-,-,0.0,0.0,0.0,2.0,-,0.0,0.0,0.0,-
Adam Erne,T.B,L,24,1995-04-20,New Haven,CT,USA,USA,73,214,2013,T.B,2,3,33,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,100.00,3.0,0.0,100.00,13.0,5.0,72.22,1.0,0.0,100.00,21.05,100.00,1.211,14.0,6.0,0.0,30.0,100.00,17.0,10.0,1.0,94.44,19.566667,0.0,0.0,0.0,0.0,0.0,0.00,2.0,0.00,0.12,3.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,2.0,4.0,4.0,50.00,7.0,22.0,24.14,6.0,20.0,23.08,6.0,15.0,28.57,1.0,1.0,50.00,0.3,2.0,12.88,4.0,15.0,21.05,3.0,7.0,30.00,1.0,1.0,50.00,1.0,8.0,11.11,0.0,0.0,-,0.0,4.0,0.00,0.0,0.0,-,16.67,93.33,1.100,0.0,6.0,4.0,32.0,0.00,0.0,9.0,5.0,0.00


(907, 963)
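
The second note at the top (one more player in the bio file than in the stats) could be investigated with the `indicator` option of `pd.merge`. The sketch below uses toy frames with hypothetical players rather than the real files. Adding `validate="one_to_one"` is an extra safeguard not present in the notebook: it raises if a player name appears more than once on either side, which would otherwise silently duplicate rows when merging on the `Player` index.

```python
import pandas as pd

# Toy stand-ins for the bio file and one season of stats (made-up values).
bios = pd.DataFrame(
    {"Age": [22, 23, 26]},
    index=pd.Index(["Player A", "Player B", "Player C"], name="Player"),
)
stats = pd.DataFrame(
    {"2019 GP": [78, 29]},
    index=pd.Index(["Player B", "Player C"], name="Player"),
)

checked = pd.merge(
    bios, stats,
    how="left", left_index=True, right_index=True,
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
    validate="one_to_one",   # raises MergeError if a player name is duplicated
)

# Players present in the bios but absent from the stats.
bio_only = checked.index[checked["_merge"] == "left_only"].tolist()
print(bio_only)
# ['Player A']
```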