This is my Fantasy Hockey Analyzer. The purpose of this project is to predict the fantasy hockey output of individual skaters based on stats from previous years.


Section 1: Modules Used

The following is a list of modules that I used and the reason why they were used:

-os: to allow the program to read data in the repository

-numpy: basic math operations

-pandas: all dataframe operations/data storage/data cleaning

-various sklearn: all machine learning operations/analysis

In addition to these modules, I also have a custom module, "my_module.py", that contains helper functions for data cleaning and accuracy evaluation. If you are interested in taking a look at these functions, the file is available in the repository at https://github.com/chrisberry888/FantasyHockeyAnalyzer.

In [1]:
#Import block
import os
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

import my_module as mx

Section 2: Data Gathering and Cleaning

The ultimate goal of this project is to predict the number of fantasy points that a given player will have in the 2022-2023 season. Different fantasy leagues have different points breakdowns, but my current league has a points breakdown as described in the "points_dict" variable in the following code block. For example, each player gets 5 fantasy points for a goal, 3 for an assist, and so on. Of course, this can be changed if a different league has a different points breakdown.

To accomplish all of this, I have gathered data from rotowire.com and moneypuck.com. The Moneypuck data contains just about every advanced stat you could think of in a variety of different situations (5-on-5, 5-on-4, etc). However, for the sake of this project we will only use the data that describes a player's output across all situations. The only stat we need that can't be calculated using the Moneypuck data is +/-; Rotowire has +/- available, so I'm using that dataset as well.

Data from these sets start from the 2010-2011 season and stretch to the 2021-2022 season. All of the major data gathering and cleaning occurs in the next three code blocks.


The following block does all of the prep work before we start to read the data. It first establishes where the data is stored in the repository so that it can be read by the program. It then creates a list of labels that will be used by the rotowire data (the rotowire dataset is formatted differently than the moneypuck dataset, so we need to do this extra step before proceeding). It then establishes a points breakdown for each relevant stat; this is used later on to calculate fantasy points for each player.

In [2]:
#The current working directory is the main repository directory; these lines set the path to where the data is
#os.path.join is used so the path also works outside of Windows
path = os.getcwd()
data_path = os.path.join(path, 'data')

#This array makes it easier to format the rotowire data
rw_labels = ["name", "Team", "Pos", "Games", "Goals", "Assists", "Pts", "+/-", "PIM", "SOG", "GWG", "PP_Goals", "PP_Assists", "SH_Goals", "SH_Assists", "Hits", "Blocked_Shots"]

#This is the breakdown of how many fantasy points a player gets for each category
points_dict = {"Goals":5, "Assists":3, "+/-":1.5, "PIM":-0.25, "PP_Goals":4, "PP_Assists":2, "SH_Goals":6, "SH_Assists":4, "Faceoffs_Won":0.25, "Faceoffs_Lost":-0.15, "Hits":0.5, "Blocked_Shots":0.75 }
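As a quick check of how this breakdown is applied, here is a minimal sketch that scores a single stat line by hand. The `score_stat_line` helper is a name introduced only for this illustration (it is not part of my_module.py), and the stat line is Elias Lindholm's 2021-2022 row as it appears in the notebook output.

```python
# Same breakdown as points_dict in the notebook
points_dict = {"Goals": 5, "Assists": 3, "+/-": 1.5, "PIM": -0.25,
               "PP_Goals": 4, "PP_Assists": 2, "SH_Goals": 6, "SH_Assists": 4,
               "Faceoffs_Won": 0.25, "Faceoffs_Lost": -0.15,
               "Hits": 0.5, "Blocked_Shots": 0.75}


def score_stat_line(stats, points):
    """Hypothetical helper: sum (stat value * multiplier) over every scored category."""
    return sum(mult * stats.get(category, 0) for category, mult in points.items())


# Elias Lindholm, 2021-2022, taken from the sample output in this notebook
lindholm = {"Goals": 42, "Assists": 40, "+/-": 61, "PIM": 22,
            "PP_Goals": 10, "PP_Assists": 9, "SH_Goals": 1, "SH_Assists": 1,
            "Faceoffs_Won": 842, "Faceoffs_Lost": 750,
            "Hits": 66, "Blocked_Shots": 52}

total = score_stat_line(lindholm, points_dict)
print(total)  # 654.0, the Fantasy_Points value shown for this row in the output
```

Changing the league settings only requires editing the multipliers in the dictionary; the scoring logic stays the same.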


The following block takes all of the data in the repository and turns it into year-by-year player data. For each season from 2010-2011 through 2021-2022, the for-loop reads the rotowire and moneypuck data from the csv files in the repository, merges the datasets together, calculates each player's fantasy points for that season, does some formatting, then adds the result to the "yearly_player_data" list. This list is converted into ML-readable data later on.

In [3]:
#I have data from the 2010-2011 season through the 2021-2022 season.
#By the end of this block, there will be 12 seasons-worth of data in the "yearly_player_data" list
yearly_player_data = []

for i in range(2010, 2022):
    new_data = []
    
    #Imports the rotowire and moneypuck datasets from the selected year into rdf and mdf
    rdf = pd.read_csv(os.path.join(data_path, 'rotowire_data', 'rotowire{}.csv'.format(i)))
    mdf = pd.read_csv(os.path.join(data_path, 'moneypuck_data', 'moneypuck{}.csv'.format(i)))
    
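    #Formats the rotowire data: applies the column labels, then drops the first row
    #(set_axis(..., inplace=True) was removed in pandas 2.0, so the labels are assigned directly)
    rdf.columns = rw_labels
    rdf = rdf.drop(index=rdf.index[0])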
    
    #The Moneypuck data has information about 5-on-5, 5-on-4, 4-on-5, other, and all.
    #For this project I'm just focused on "all" since I suspect it'll give me the best results.
    #.copy() avoids a SettingWithCopyWarning when the "name" column is modified below
    mdf = mdf[mdf["situation"] == "all"].copy()
    
    #Combines the "Name" and "Team" columns (There are some players with the same name on different teams)
    rdf["name"] = rdf["name"] + "-" + rdf["Team"]
    mdf["name"] = mdf["name"] + "-" + mdf["team"]
    
    
    
    #Merges the rotowire and moneypuck dataframes
    new_data = pd.merge(rdf, mdf, on="name")
    
    #Changes the name of a few columns in the new dataframe
    new_data = new_data.rename(columns={"name":"Name","faceoffsWon":"Faceoffs_Won","faceoffsLost":"Faceoffs_Lost"})
    
    #This section calculates each player's total fantasy output for that year
    #(row/col are used as loop variables so the outer year variable i is not shadowed)
    cols = new_data.columns
    fant_points = [0 for _ in range(len(new_data))]
    for row in range(len(new_data)):
        for col in range(len(cols)):
            mult = points_dict.get(cols[col], 0)
            if mult != 0:
                fant_points[row] += mult*int(new_data.iloc[row, col])
    
    #Adds the players' fantasy points to the new_data dataframe
    new_data["Fantasy_Points"] = fant_points
    
    new_data = new_data.drop_duplicates()
    
    #Adds new_data to the "yearly_player_data" list
    yearly_player_data.append(new_data)
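The merge key and the scoring step in the loop above can also be sketched on toy data. This is only an illustration under stated assumptions: the players and numbers are made up, the breakdown is trimmed to three categories, and the scoring is done with a vectorized pandas expression rather than the nested loop. Note that `pd.to_numeric` does not truncate values the way `int()` does, so results could differ if a scored column held non-integer values.

```python
import pandas as pd

# Trimmed breakdown, for illustration only
points = {"Goals": 5, "Assists": 3, "Faceoffs_Won": 0.25}

# Toy stand-ins for the rotowire and moneypuck frames (made-up players and numbers)
rdf = pd.DataFrame({"name": ["A Skater", "A Skater", "B Skater"],
                    "Team": ["AAA", "BBB", "AAA"],
                    "Goals": ["10", "20", "5"],      # values may arrive as strings
                    "Assists": ["15", "5", "30"]})
mdf = pd.DataFrame({"name": ["A Skater", "A Skater", "B Skater"],
                    "team": ["AAA", "BBB", "AAA"],
                    "faceoffsWon": [100.0, 0.0, 40.0]})

# Name + team as a composite key, so two players who share a name stay distinct
rdf["name"] = rdf["name"] + "-" + rdf["Team"]
mdf["name"] = mdf["name"] + "-" + mdf["team"]
merged = pd.merge(rdf, mdf, on="name").rename(
    columns={"name": "Name", "faceoffsWon": "Faceoffs_Won"})

# Vectorized scoring: multiply each scored column by its weight, then sum across the row
scored_cols = [c for c in merged.columns if c in points]
weights = pd.Series(points)[scored_cols]
merged["Fantasy_Points"] = (merged[scored_cols]
                            .apply(pd.to_numeric)
                            .mul(weights, axis=1)
                            .sum(axis=1))

print(merged[["Name", "Fantasy_Points"]])
```

Without the team suffix, the two "A Skater" rows would each match both moneypuck rows and the merge would produce four rows for that name instead of two.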
                
    



In [4]:
pd.options.display.max_columns = None
pd.options.display.max_rows = None
yearly_player_data[11]

Unnamed: 0,Name,Team,Pos,Games,Goals,Assists,Pts,+/-,PIM,SOG,GWG,PP_Goals,PP_Assists,SH_Goals,SH_Assists,Hits,Blocked_Shots,playerId,season,team,position,situation,games_played,icetime,shifts,gameScore,onIce_xGoalsPercentage,offIce_xGoalsPercentage,onIce_corsiPercentage,offIce_corsiPercentage,onIce_fenwickPercentage,offIce_fenwickPercentage,iceTimeRank,I_F_xOnGoal,I_F_xGoals,I_F_xRebounds,I_F_xFreeze,I_F_xPlayStopped,I_F_xPlayContinuedInZone,I_F_xPlayContinuedOutsideZone,I_F_flurryAdjustedxGoals,I_F_scoreVenueAdjustedxGoals,I_F_flurryScoreVenueAdjustedxGoals,I_F_primaryAssists,I_F_secondaryAssists,I_F_shotsOnGoal,I_F_missedShots,I_F_blockedShotAttempts,I_F_shotAttempts,I_F_points,I_F_goals,I_F_rebounds,I_F_reboundGoals,I_F_freeze,I_F_playStopped,I_F_playContinuedInZone,I_F_playContinuedOutsideZone,I_F_savedShotsOnGoal,I_F_savedUnblockedShotAttempts,penalties,I_F_penalityMinutes,I_F_faceOffsWon,I_F_hits,I_F_takeaways,I_F_giveaways,I_F_lowDangerShots,I_F_mediumDangerShots,I_F_highDangerShots,I_F_lowDangerxGoals,I_F_mediumDangerxGoals,I_F_highDangerxGoals,I_F_lowDangerGoals,I_F_mediumDangerGoals,I_F_highDangerGoals,I_F_scoreAdjustedShotsAttempts,I_F_unblockedShotAttempts,I_F_scoreAdjustedUnblockedShotAttempts,I_F_dZoneGiveaways,I_F_xGoalsFromxReboundsOfShots,I_F_xGoalsFromActualReboundsOfShots,I_F_reboundxGoals,I_F_xGoals_with_earned_rebounds,I_F_xGoals_with_earned_rebounds_scoreAdjusted,I_F_xGoals_with_earned_rebounds_scoreFlurryAdjusted,I_F_shifts,I_F_oZoneShiftStarts,I_F_dZoneShiftStarts,I_F_neutralZoneShiftStarts,I_F_flyShiftStarts,I_F_oZoneShiftEnds,I_F_dZoneShiftEnds,I_F_neutralZoneShiftEnds,I_F_flyShiftEnds,Faceoffs_Won,Faceoffs_Lost,timeOnBench,penalityMinutes,penalityMinutesDrawn,penaltiesDrawn,shotsBlockedByPlayer,OnIce_F_xOnGoal,OnIce_F_xGoals,OnIce_F_flurryAdjustedxGoals,OnIce_F_scoreVenueAdjustedxGoals,OnIce_F_flurryScoreVenueAdjustedxGoals,OnIce_F_shotsOnGoal,OnIce_F_missedShots,OnIce_F_blockedShotAttempts,OnIce_F_shotAttempts,OnIce_F_goals,OnIce_F_rebounds,OnIce_F_reboundGoals,OnIce_F_lowDangerShots,OnIce_F_mediumDangerShots,OnIce_F_highDangerShots,OnIce_F_lowDangerxGoals,OnIce_F_mediumDangerxGoals,OnIce_F_highDangerxGoals,OnIce_F_lowDangerGoals,OnIce_F_mediumDangerGoals,OnIce_F_highDangerGoals,OnIce_F_scoreAdjustedShotsAttempts,OnIce_F_unblockedShotAttempts,OnIce_F_scoreAdjustedUnblockedShotAttempts,OnIce_F_xGoalsFromxReboundsOfShots,OnIce_F_xGoalsFromActualReboundsOfShots,OnIce_F_reboundxGoals,OnIce_F_xGoals_with_earned_rebounds,OnIce_F_xGoals_with_earned_rebounds_scoreAdjusted,OnIce_F_xGoals_with_earned_rebounds_scoreFlurryAdjusted,OnIce_A_xOnGoal,OnIce_A_xGoals,OnIce_A_flurryAdjustedxGoals,OnIce_A_scoreVenueAdjustedxGoals,OnIce_A_flurryScoreVenueAdjustedxGoals,OnIce_A_shotsOnGoal,OnIce_A_missedShots,OnIce_A_blockedShotAttempts,OnIce_A_shotAttempts,OnIce_A_goals,OnIce_A_rebounds,OnIce_A_reboundGoals,OnIce_A_lowDangerShots,OnIce_A_mediumDangerShots,OnIce_A_highDangerShots,OnIce_A_lowDangerxGoals,OnIce_A_mediumDangerxGoals,OnIce_A_highDangerxGoals,OnIce_A_lowDangerGoals,OnIce_A_mediumDangerGoals,OnIce_A_highDangerGoals,OnIce_A_scoreAdjustedShotsAttempts,OnIce_A_unblockedShotAttempts,OnIce_A_scoreAdjustedUnblockedShotAttempts,OnIce_A_xGoalsFromxReboundsOfShots,OnIce_A_xGoalsFromActualReboundsOfShots,OnIce_A_reboundxGoals,OnIce_A_xGoals_with_earned_rebounds,OnIce_A_xGoals_with_earned_rebounds_scoreAdjusted,OnIce_A_xGoals_with_earned_rebounds_scoreFlurryAdjusted,OffIce_F_xGoals,OffIce_A_xGoals,OffIce_F_shotAttempts,OffIce_A_shotAttempts,xGoalsForAfterShifts,xGoalsAgainstAfterShifts,corsiForAfterShifts,corsiAgainstAfterShifts,fenwickForAfterShifts,fenwickAgainstAfterShifts,Fantasy_Points
0,Elias Lindholm-CGY,CGY,RW,82,42,40,82,61,22,235,9,10,9,1,1,66,52,8477496,2021,CGY,C,all,82,98129.0,2077.0,99.17,0.61,0.52,0.58,0.54,0.57,0.54,133.0,223.87,27.52,18.09,44.38,7.45,126.55,86.01,26.1,27.7,26.27,24.0,16.0,236.0,74.0,71.0,381.0,82.0,42.0,15.0,4.0,50.0,8.0,98.0,97.0,194.0,268.0,10.0,22.0,842.0,66.0,55.0,41.0,184.0,103.0,23.0,6.67,12.35,8.51,15.0,19.0,8.0,387.5,310.0,313.73,18.0,4.09,3.95,3.83,27.79,28.01,26.91,2077.0,374.0,421.0,298.0,984.0,249.0,226.0,323.0,1279.0,842.0,750.0,199844.0,22.0,28.0,14.0,52.0,1037.47,112.77,107.74,114.18,109.11,1057.0,390.0,436.0,1883.0,149.0,76.0,18.0,1007.0,332.0,108.0,29.51,39.86,43.39,44.0,57.0,48.0,1922.36,1447.0,1470.54,17.98,16.37,16.28,114.46,115.79,111.85,769.75,72.51,69.91,72.19,69.57,776.0,297.0,308.0,1381.0,70.0,52.0,10.0,801.0,203.0,69.0,23.03,24.42,25.07,22.0,22.0,26.0,1364.75,1073.0,1062.9,11.66,11.95,11.95,72.23,71.81,69.95,152.28,138.4,3345.0,2852.0,0.0,0.0,0.0,0.0,0.0,0.0,654.0
1,Devon Toews-COL,COL,D,66,13,44,57,52,20,158,3,2,10,0,0,54,85,8478038,2021,COL,D,all,66,100462.0,1920.0,74.2,0.55,0.54,0.53,0.53,0.53,0.53,109.0,139.78,9.33,8.65,35.29,4.49,82.63,61.61,8.86,9.42,8.95,11.0,33.0,158.0,44.0,65.0,267.0,57.0,13.0,10.0,2.0,42.0,4.0,56.0,77.0,145.0,189.0,10.0,20.0,0.0,54.0,45.0,34.0,175.0,18.0,9.0,3.96,2.32,3.04,6.0,3.0,4.0,273.19,202.0,205.74,21.0,1.86,2.59,1.56,9.62,9.72,9.37,1920.0,254.0,325.0,339.0,1002.0,218.0,214.0,278.0,1210.0,0.0,0.0,139623.0,20.0,10.0,5.0,85.0,934.11,102.61,99.08,103.91,100.32,995.0,312.0,442.0,1749.0,123.0,72.0,18.0,959.0,229.0,119.0,29.13,27.65,45.82,46.0,34.0,43.0,1791.17,1307.0,1331.54,13.86,16.9,17.08,99.4,100.55,98.37,845.3,83.08,77.38,82.21,76.54,883.0,296.0,352.0,1531.0,77.0,73.0,21.0,862.0,235.0,82.0,23.84,28.98,30.26,22.0,25.0,30.0,1501.08,1179.0,1161.11,14.16,17.85,17.85,79.39,78.52,74.93,128.73,111.84,2457.0,2169.0,0.0,0.0,0.0,0.0,0.0,0.0,388.75
2,Cale Makar-COL,COL,D,77,28,58,86,48,26,240,6,9,25,0,0,95,110,8480069,2021,COL,D,all,77,118586.0,2141.0,103.45,0.62,0.46,0.59,0.48,0.59,0.47,109.0,218.64,15.02,14.71,55.4,6.83,136.26,90.77,14.39,15.22,14.57,29.0,29.0,240.0,79.0,175.0,494.0,86.0,28.0,21.0,3.0,57.0,10.0,97.0,106.0,212.0,291.0,13.0,26.0,0.0,95.0,49.0,40.0,283.0,24.0,12.0,6.8,2.68,5.54,17.0,3.0,8.0,500.37,319.0,322.71,28.0,3.3,5.95,1.07,17.26,17.42,16.75,2141.0,441.0,252.0,384.0,1064.0,185.0,302.0,355.0,1299.0,0.0,0.0,161348.0,26.0,30.0,15.0,110.0,1293.92,144.76,137.67,145.99,138.81,1349.0,458.0,579.0,2386.0,163.0,116.0,25.0,1301.0,351.0,155.0,39.33,42.67,62.78,58.0,41.0,64.0,2423.41,1807.0,1830.12,21.86,27.69,27.59,139.04,140.0,135.16,897.11,90.15,85.33,89.73,84.87,935.0,316.0,378.0,1629.0,75.0,72.0,10.0,919.0,238.0,94.0,25.32,28.97,35.87,27.0,23.0,25.0,1604.56,1251.0,1236.7,13.69,16.69,16.69,87.15,86.64,84.24,117.17,139.52,2492.0,2675.0,0.0,0.0,0.0,0.0,0.0,0.0,595.5
3,Justin Faulk-STL,STL,D,76,16,31,47,41,43,167,6,1,6,0,0,149,101,8475753,2021,STL,D,all,76,105779.0,2352.0,50.63,0.5,0.51,0.49,0.48,0.48,0.49,142.0,152.69,9.99,9.42,38.01,4.85,88.46,68.28,9.67,10.13,9.81,8.0,23.0,167.0,52.0,99.0,318.0,47.0,16.0,10.0,1.0,33.0,3.0,63.0,94.0,151.0,203.0,15.0,33.0,0.0,149.0,37.0,40.0,194.0,19.0,6.0,4.89,2.23,2.87,9.0,3.0,4.0,324.82,219.0,223.86,31.0,1.98,2.17,1.49,10.48,10.57,10.38,2352.0,296.0,270.0,316.0,1470.0,264.0,289.0,296.0,1503.0,0.0,0.0,169880.0,33.0,19.0,8.0,101.0,857.01,88.46,85.62,89.68,86.78,894.0,297.0,356.0,1547.0,121.0,62.0,14.0,845.0,275.0,71.0,28.11,33.39,26.96,42.0,44.0,35.0,1576.82,1191.0,1210.69,13.12,13.85,13.26,88.24,89.14,87.63,936.47,89.1,85.72,89.02,85.63,932.0,370.0,317.0,1619.0,79.0,77.0,13.0,950.0,284.0,68.0,30.47,35.36,23.26,29.0,28.0,22.0,1606.82,1302.0,1293.93,14.47,16.77,16.99,86.59,86.44,84.93,150.11,143.4,2450.0,2645.0,0.0,0.0,0.0,0.0,0.0,0.0,390.0
4,Gustav Forsling-FLA,FLA,D,71,10,27,37,41,18,145,1,0,0,0,0,45,86,8478055,2021,FLA,D,all,71,90389.0,2054.0,58.33,0.49,0.58,0.51,0.58,0.51,0.58,191.0,132.27,7.9,7.88,34.29,4.52,78.32,59.1,7.46,8.0,7.55,13.0,14.0,145.0,47.0,104.0,296.0,37.0,10.0,5.0,1.0,46.0,5.0,64.0,62.0,135.0,182.0,9.0,18.0,0.0,45.0,51.0,53.0,171.0,18.0,3.0,4.47,2.14,1.29,6.0,2.0,2.0,302.1,192.0,195.41,40.0,1.64,0.58,0.78,8.76,8.87,8.65,2054.0,171.0,314.0,340.0,1229.0,383.0,253.0,330.0,1088.0,0.0,0.0,168464.0,18.0,20.0,10.0,86.0,827.08,87.93,83.96,88.56,84.6,872.0,258.0,361.0,1491.0,94.0,65.0,14.0,761.0,282.0,87.0,23.41,34.62,29.9,29.0,39.0,26.0,1523.25,1130.0,1147.6,11.82,14.02,13.63,86.12,86.82,84.7,800.83,91.51,83.35,91.48,83.27,819.0,284.0,321.0,1424.0,76.0,87.0,9.0,767.0,234.0,102.0,23.53,29.02,38.96,21.0,25.0,30.0,1406.23,1103.0,1096.49,13.12,19.88,19.69,84.94,84.72,79.62,187.28,133.72,3139.0,2282.0,0.0,0.0,0.0,0.0,0.0,0.0,275.0
5,Alex Goligoski-MIN,MIN,D,72,2,28,30,41,34,80,0,0,4,0,0,56,96,8471274,2021,MIN,D,all,72,81784.0,1605.0,40.1,0.53,0.5,0.53,0.5,0.54,0.49,280.0,72.97,3.56,4.36,20.17,2.5,44.5,32.9,3.35,3.58,3.38,11.0,17.0,80.0,28.0,65.0,173.0,30.0,2.0,7.0,0.0,22.0,3.0,39.0,35.0,78.0,106.0,17.0,34.0,0.0,56.0,6.0,31.0,104.0,2.0,2.0,2.6,0.18,0.79,2.0,0.0,0.0,175.9,108.0,109.0,25.0,0.92,1.53,0.16,4.32,4.35,4.12,1605.0,183.0,180.0,246.0,996.0,212.0,210.0,231.0,952.0,0.0,0.0,181340.0,34.0,16.0,8.0,96.0,749.54,72.86,69.57,73.74,70.43,788.0,256.0,315.0,1359.0,97.0,57.0,17.0,770.0,208.0,66.0,24.29,25.38,23.19,44.0,29.0,24.0,1382.15,1044.0,1058.55,11.54,14.99,14.37,69.96,70.8,68.77,641.52,63.44,61.51,62.86,60.93,652.0,244.0,307.0,1203.0,62.0,38.0,6.0,647.0,188.0,61.0,18.77,23.19,21.49,19.0,22.0,21.0,1186.6,896.0,887.99,9.56,8.22,8.22,64.72,64.1,62.84,151.88,151.52,2793.0,2838.0,0.0,0.0,0.0,0.0,0.0,0.0,255.0
6,Aaron Ekblad-FLA,FLA,D,61,15,42,57,38,26,180,2,3,17,0,3,62,69,8477932,2021,FLA,D,all,61,91212.0,1831.0,73.1,0.58,0.52,0.58,0.54,0.58,0.53,84.0,159.67,10.72,11.4,39.37,5.29,99.37,66.85,9.96,10.68,9.92,13.0,29.0,180.0,53.0,84.0,317.0,57.0,15.0,18.0,1.0,36.0,4.0,82.0,78.0,165.0,218.0,13.0,26.0,0.0,62.0,43.0,73.0,199.0,26.0,8.0,4.88,3.22,2.61,9.0,4.0,2.0,318.84,233.0,233.33,46.0,2.58,3.84,1.95,11.35,11.32,10.85,1831.0,366.0,297.0,332.0,836.0,162.0,199.0,331.0,1139.0,0.0,0.0,131282.0,26.0,16.0,8.0,69.0,939.51,109.86,103.91,110.3,104.33,973.0,312.0,411.0,1696.0,125.0,89.0,23.0,852.0,322.0,111.0,26.35,40.23,43.28,35.0,40.0,50.0,1716.91,1285.0,1294.92,16.16,21.64,21.75,104.26,104.81,101.39,683.29,81.04,75.27,80.84,75.08,742.0,191.0,318.0,1251.0,72.0,73.0,9.0,628.0,209.0,96.0,18.87,26.21,35.97,21.0,30.0,21.0,1237.61,933.0,927.04,11.47,17.51,17.51,75.0,74.85,72.42,124.07,112.5,2269.0,1946.0,0.0,0.0,0.0,0.0,0.0,0.0,392.25
7,Aleksander Barkov-FLA,FLA,C,67,39,49,88,36,18,214,5,12,14,4,1,50,42,8477493,2021,FLA,C,all,67,81619.0,1559.0,97.14,0.61,0.52,0.6,0.53,0.59,0.53,119.0,206.05,28.66,15.54,41.03,6.31,107.86,74.59,27.22,28.81,27.37,26.0,23.0,214.0,60.0,66.0,340.0,88.0,39.0,20.0,8.0,46.0,3.0,90.0,76.0,175.0,235.0,9.0,18.0,713.0,50.0,59.0,51.0,150.0,96.0,28.0,5.32,12.2,11.15,7.0,15.0,17.0,344.24,274.0,276.82,16.0,3.64,5.57,5.8,26.51,26.65,25.75,1559.0,281.0,268.0,284.0,726.0,171.0,146.0,290.0,952.0,713.0,539.0,162975.0,18.0,14.0,7.0,42.0,901.11,108.22,101.08,109.05,101.91,942.0,297.0,376.0,1615.0,126.0,78.0,22.0,815.0,317.0,107.0,24.96,39.6,43.65,32.0,43.0,51.0,1641.95,1239.0,1255.01,15.74,20.96,20.96,102.99,103.8,99.3,617.87,69.34,66.5,68.99,66.14,635.0,216.0,236.0,1087.0,66.0,65.0,11.0,594.0,179.0,78.0,18.74,22.45,28.16,18.0,24.0,24.0,1074.03,851.0,843.79,10.19,13.84,13.84,65.69,65.32,63.51,154.16,144.21,2774.0,2435.0,0.0,0.0,0.0,0.0,0.0,0.0,649.4
8,Mikko Rantanen-COL,COL,RW,75,36,56,92,35,56,254,2,16,19,0,0,58,43,8478420,2021,COL,R,all,75,94355.0,1808.0,105.19,0.64,0.45,0.62,0.47,0.62,0.46,164.0,260.73,30.68,19.51,55.07,8.55,152.6,97.6,29.64,30.86,29.83,37.0,19.0,254.0,110.0,95.0,459.0,92.0,36.0,19.0,6.0,47.0,14.0,110.0,138.0,218.0,328.0,27.0,56.0,164.0,58.0,48.0,49.0,256.0,76.0,32.0,8.87,8.92,12.89,8.0,11.0,17.0,465.06,364.0,368.32,13.0,4.52,3.85,4.15,31.06,31.23,30.3,1808.0,425.0,178.0,348.0,857.0,184.0,233.0,333.0,1058.0,164.0,190.0,178430.0,56.0,64.0,32.0,43.0,1109.68,128.64,122.08,129.76,123.14,1152.0,402.0,485.0,2039.0,159.0,97.0,24.0,1106.0,306.0,142.0,33.89,36.68,58.07,53.0,44.0,62.0,2069.57,1554.0,1573.73,19.5,23.45,23.51,124.63,125.57,121.01,684.0,71.23,67.74,70.57,67.09,729.0,222.0,275.0,1226.0,70.0,53.0,16.0,672.0,202.0,77.0,18.52,24.64,28.07,21.0,20.0,29.0,1207.48,951.0,940.11,9.9,12.37,12.37,68.76,68.19,66.15,125.15,153.48,2655.0,2978.0,0.0,0.0,0.0,0.0,0.0,0.0,562.25
9,Jaccob Slavin-CAR,CAR,D,79,4,38,42,35,10,165,0,1,6,0,2,50,121,8476958,2021,CAR,D,all,79,111532.0,2072.0,63.92,0.56,0.56,0.54,0.57,0.54,0.56,122.0,164.26,8.67,9.96,43.55,5.39,95.96,73.48,8.39,8.65,8.38,14.0,24.0,165.0,72.0,104.0,341.0,42.0,4.0,9.0,1.0,40.0,7.0,88.0,89.0,161.0,233.0,5.0,10.0,0.0,50.0,74.0,43.0,219.0,14.0,4.0,5.04,1.79,1.84,2.0,0.0,2.0,342.42,237.0,238.11,31.0,2.16,2.2,0.56,10.27,10.28,10.04,2072.0,245.0,287.0,334.0,1206.0,278.0,223.0,326.0,1245.0,0.0,0.0,175831.0,10.0,14.0,7.0,121.0,1024.34,111.11,107.0,112.04,107.89,1014.0,409.0,392.0,1815.0,107.0,85.0,22.0,997.0,310.0,116.0,29.76,37.75,43.59,25.0,42.0,40.0,1840.19,1423.0,1442.03,15.36,19.56,19.77,106.69,107.47,105.22,888.31,86.68,83.35,86.27,82.94,867.0,350.0,352.0,1569.0,74.0,85.0,18.0,888.0,247.0,82.0,26.62,31.0,29.06,26.0,23.0,25.0,1551.14,1217.0,1205.54,13.82,19.19,19.12,81.3,80.85,78.87,168.63,130.24,3116.0,2327.0,0.0,0.0,0.0,0.0,0.0,0.0,323.75


The following block takes the yearly data and turns it into ML-readable data. For this project, I am creating different models that use data from the past one year, the past two years, and the past three years, and seeing how much they differ in terms of efficacy.

In [5]:
ml_data_one_year = pd.DataFrame()
ml_data_two_year = pd.DataFrame()
ml_data_three_year = pd.DataFrame()
for i in range(2011, 2022):
    arr = [yearly_player_data[i-2011]]
    points_df = yearly_player_data[i-2010]
    temp = mx.merge_dataframes(arr, points_df)
    ml_data_one_year = pd.concat([ml_data_one_year, temp], ignore_index=True)
    
for i in range(2012, 2022):
    arr = [yearly_player_data[i-2012], yearly_player_data[i-2011]]
    points_df = yearly_player_data[i-2010]
    temp = mx.merge_dataframes(arr, points_df)
    ml_data_two_year = pd.concat([ml_data_two_year, temp], ignore_index=True)
    
for i in range(2013, 2022):
    arr = [yearly_player_data[i-2013], yearly_player_data[i-2012], yearly_player_data[i-2011]]
    points_df = yearly_player_data[i-2010]
    temp = mx.merge_dataframes(arr, points_df)
    ml_data_three_year = pd.concat([ml_data_three_year, temp], ignore_index=True)
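The helper `mx.merge_dataframes` lives in my_module.py and is not shown in this notebook. As a rough idea of what this kind of step involves, here is a hypothetical sketch (not the actual implementation) that pairs each player's earlier seasons with the fantasy points from the following season. The function name `pair_features_with_target` and the `_prevN` column suffixes are assumptions made for this example; the target column name "Predicted_Fantasy_Points" is borrowed from the prediction cells later in the notebook.

```python
import pandas as pd


def pair_features_with_target(feature_years, target_year):
    """Hypothetical stand-in for mx.merge_dataframes.

    Returns one row per player who appears in every feature season and in the target season.
    """
    # The target season's fantasy points become the value to predict
    out = target_year[["Name", "Fantasy_Points"]].rename(
        columns={"Fantasy_Points": "Predicted_Fantasy_Points"})
    # Walk backward from the most recent feature season; suffix columns by how far back they are
    for years_back, df in enumerate(reversed(feature_years), start=1):
        renamed = df.rename(
            columns={c: f"{c}_prev{years_back}" for c in df.columns if c != "Name"})
        out = out.merge(renamed, on="Name")  # inner join drops players missing a season
    return out


# Made-up seasons for three players
y2019 = pd.DataFrame({"Name": ["A-AAA", "B-BBB"],
                      "Goals": [10, 20], "Fantasy_Points": [100.0, 200.0]})
y2020 = pd.DataFrame({"Name": ["A-AAA", "B-BBB", "C-CCC"],
                      "Goals": [12, 18, 5], "Fantasy_Points": [110.0, 190.0, 50.0]})
y2021 = pd.DataFrame({"Name": ["A-AAA", "C-CCC"],
                      "Goals": [15, 7], "Fantasy_Points": [130.0, 70.0]})

two_year = pair_features_with_target([y2019, y2020], y2021)
print(two_year)
```

In this sketch only "A-AAA" survives, because it is the only key present in all three seasons. Since the key in this notebook is name plus team, a join like this would also drop a player who changed teams between seasons; whether the real helper handles that case is not something this notebook shows.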



Section 3: Model Training

Now that we have data that ML models can read, we can train the models. For this project, I'm using multi-layer perceptrons (MLPRegressor) and Random Forests (RandomForestRegressor). I'm making six models in total: an MLP for each of the one-, two-, and three-year datasets, and a Random Forest for each of the one-, two-, and three-year datasets.

ONE YEAR:

In [6]:
arr = mx.separate_fantasy_points(ml_data_one_year)
X = mx.reformat_df(arr[0])
y = arr[1]

X_train_one, X_test_one, y_train_one, y_test_one = train_test_split(X, y)

one_year_regr = MLPRegressor(max_iter=1000)
one_year_regr.fit(X_train_one, y_train_one)

one_year_RF = RandomForestRegressor()
one_year_RF.fit(X_train_one, y_train_one)



RandomForestRegressor()

TWO YEAR:

In [7]:
arr = mx.separate_fantasy_points(ml_data_two_year)
X = mx.reformat_df(arr[0])
y = arr[1]

X_train_two, X_test_two, y_train_two, y_test_two = train_test_split(X, y)

two_year_regr = MLPRegressor(max_iter=1000)
two_year_regr.fit(X_train_two, y_train_two)

two_year_RF = RandomForestRegressor()
two_year_RF.fit(X_train_two, y_train_two)



RandomForestRegressor()

THREE YEAR:

In [8]:
arr = mx.separate_fantasy_points(ml_data_three_year)
X = mx.reformat_df(arr[0])
y = arr[1]

X_train_three, X_test_three, y_train_three, y_test_three = train_test_split(X, y)

three_year_regr = MLPRegressor(max_iter=1000)
three_year_regr.fit(X_train_three, y_train_three)

three_year_RF = RandomForestRegressor()
three_year_RF.fit(X_train_three, y_train_three)



RandomForestRegressor()
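The three training cells above call train_test_split and the model constructors without a random_state, so the split and the fitted models change from run to run. MLPRegressor is also sensitive to feature scaling. The following is a minimal sketch of one way to address both, shown on synthetic data from make_regression rather than the real player data; the seed value, sample sizes, and the use of StandardScaler are assumptions for illustration, and this is not the configuration used to produce the results in this notebook.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the X, y built from the player data
X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)

# A fixed random_state makes the train/test split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler is fit on the training data only, then applied before the MLP
scaled_mlp = make_pipeline(StandardScaler(),
                           MLPRegressor(max_iter=1000, random_state=0))
scaled_mlp.fit(X_train, y_train)

mae = mean_absolute_error(y_test, scaled_mlp.predict(X_test))
print(mae)
```

Wrapping the scaler and the model in one pipeline means the same scaling is applied automatically whenever predict is called on new data.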

Section 4: Analysis

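Now that we have the models trained, we can analyze them. We'll analyze the models in two ways. First, we'll see how accurate the actual points predictions are using the mean absolute error (MAE). Second, we'll look at how well the models rank players relative to one another. The six values printed below are, in order, the one-, two-, and three-year MLPs, followed by the one-, two-, and three-year Random Forests.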

In [9]:
y_pred = one_year_regr.predict(X_test_one)
print(mean_absolute_error(y_test_one, y_pred))

y_pred = two_year_regr.predict(X_test_two)
print(mean_absolute_error(y_test_two, y_pred))

y_pred = three_year_regr.predict(X_test_three)
print(mean_absolute_error(y_test_three, y_pred))

y_pred = one_year_RF.predict(X_test_one)
print(mean_absolute_error(y_test_one, y_pred))

y_pred = two_year_RF.predict(X_test_two)
print(mean_absolute_error(y_test_two, y_pred))

y_pred = three_year_RF.predict(X_test_three)
print(mean_absolute_error(y_test_three, y_pred))

83.40516968497543
83.9488374287591
74.22160201494788
78.47220397111913
78.18138402061855
80.01302877697843


The MAE for all of the models ranges between around 70 and around 95; in the run shown above, the values fall between about 74 and about 84. Given that most players in the league finish with a fantasy point total in the hundreds, we can see that the predicted points values aren't very accurate to the real-life values. However, we are less concerned with the actual points total that a player will have, and more concerned with their rank relative to the rest of the league. To look at this, we will rank the players both by predicted fantasy points for a season and by actual fantasy points for a season, then compare the two rankings. (I'll incorporate this at a later time.)
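One possible way to make the rank comparison concrete is a Spearman rank correlation, computed here as the Pearson correlation of the two rank vectors. This is a sketch of the idea rather than the analysis the notebook will eventually use: `rank_agreement` is a hypothetical helper, the "actual" values are five Fantasy_Points totals from the sample output, and the "predicted" values are invented to show that a model can be off by a constant 80 points and still rank players perfectly.

```python
import pandas as pd


def rank_agreement(actual, predicted):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    actual_rank = pd.Series(actual).rank(ascending=False)
    predicted_rank = pd.Series(predicted).rank(ascending=False)
    return actual_rank.corr(predicted_rank)


# Five Fantasy_Points totals taken from the sample output
actual = [654.0, 595.5, 390.0, 275.0, 255.0]

# Invented predictions: every value is 80 points too low, but the order is preserved
shifted = [value - 80 for value in actual]
# Invented predictions: the order is completely reversed
reversed_order = list(reversed(actual))

mae_shifted = sum(abs(a - p) for a, p in zip(actual, shifted)) / len(actual)
print(mae_shifted)                             # 80.0
print(rank_agreement(actual, shifted))         # close to 1.0 (perfect rank agreement)
print(rank_agreement(actual, reversed_order))  # close to -1.0 (opposite ranking)
```

A value near 1 means the predicted order matches the real order, regardless of how far off the raw point totals are.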

Section 5: Predictions for Next Year

Now that we've taken a look at the accuracy of the model, we'll see what the models think will happen in the 2022-2023 season. 

In [10]:
one_year_pred = yearly_player_data[11].copy()
one_year_pred.drop(columns=["Fantasy_Points"], inplace=True)
one_year_df = mx.get_name_predictions(one_year_regr, one_year_pred)
one_year_df.sort_values(by="Prediction", ascending=False, inplace=True)
#pd.options.display.max_rows = 978
display(one_year_df)


Unnamed: 0,Name,Prediction
48,Auston Matthews-TOR,367.692412
25,Patrice Bergeron-BOS,339.990438
7,Aleksander Barkov-FLA,319.172022
37,Nathan MacKinnon-COL,303.619712
0,Elias Lindholm-CGY,302.65234
83,David Pastrnak-BOS,293.428187
56,Sebastian Aho-CAR,290.855385
18,Connor McDavid-EDM,288.357428
20,Gabriel Landeskog-COL,288.146526
405,John Tavares-TOR,285.435678


In [11]:
two_year_pred = [yearly_player_data[i] for i in [10,11]]
two_year_pred = mx.merge_dataframes(two_year_pred, yearly_player_data[11])
two_year_pred.drop(columns=["Predicted_Fantasy_Points"], inplace=True)
two_year_df = mx.get_name_predictions(two_year_regr, two_year_pred)
two_year_df.sort_values(by="Prediction", ascending=False, inplace=True)
#pd.options.display.max_rows = 978
display(two_year_df)

Unnamed: 0,Name,Prediction
0,Auston Matthews-TOR,411.503742
7,Aleksander Barkov-FLA,379.141166
2,Leon Draisaitl-EDM,374.008336
1,Connor McDavid-EDM,371.924
15,Patrice Bergeron-BOS,368.813778
21,Nathan MacKinnon-COL,368.14901
3,Mikko Rantanen-COL,355.664922
214,Tyler Seguin-DAL,344.002712
26,Elias Lindholm-CGY,339.576997
55,Roope Hintz-DAL,330.012268


In [12]:
three_year_pred = [yearly_player_data[i] for i in [9,10,11]]
three_year_pred = mx.merge_dataframes(three_year_pred, yearly_player_data[11])
three_year_pred.drop(columns=["Predicted_Fantasy_Points"], inplace=True)
three_year_df = mx.get_name_predictions(three_year_regr, three_year_pred)
three_year_df.sort_values(by="Prediction", ascending=False, inplace=True)
#pd.options.display.max_rows = 978
display(three_year_df)

Unnamed: 0,Name,Prediction
1,Auston Matthews-TOR,351.749761
10,Patrice Bergeron-BOS,317.096773
6,Nathan MacKinnon-COL,305.963115
2,Leon Draisaitl-EDM,296.904605
41,Aleksander Barkov-FLA,295.263438
7,Connor McDavid-EDM,295.111459
48,Mikko Rantanen-COL,290.611996
36,Gabriel Landeskog-COL,287.442009
26,Andrei Svechnikov-CAR,280.405776
49,Roope Hintz-DAL,279.864844


In [13]:
one_year_pred = yearly_player_data[11].copy()
one_year_pred.drop(columns=["Fantasy_Points"], inplace=True)
one_year_df = mx.get_name_predictions(one_year_RF, one_year_pred)
one_year_df.sort_values(by="Prediction", ascending=False, inplace=True)
#pd.options.display.max_rows = 978
display(one_year_df)

Unnamed: 0,Name,Prediction
18,Connor McDavid-EDM,424.2775
48,Auston Matthews-TOR,409.964
7,Aleksander Barkov-FLA,405.832
83,David Pastrnak-BOS,405.3735
20,Gabriel Landeskog-COL,405.158
37,Nathan MacKinnon-COL,404.815
477,Mark Scheifele-WPG,400.602
8,Mikko Rantanen-COL,397.0615
53,Sidney Crosby-PIT,394.831
19,Kirill Kaprizov-MIN,391.6575


In [14]:
two_year_pred = [yearly_player_data[i] for i in [10,11]]
two_year_pred = mx.merge_dataframes(two_year_pred, yearly_player_data[11])
two_year_pred.drop(columns=["Predicted_Fantasy_Points"], inplace=True)
two_year_df = mx.get_name_predictions(two_year_RF, two_year_pred)
two_year_df.sort_values(by="Prediction", ascending=False, inplace=True)
#pd.options.display.max_rows = 978
display(two_year_df)

Unnamed: 0,Name,Prediction
15,Patrice Bergeron-BOS,413.9755
7,Aleksander Barkov-FLA,413.898
16,Bryan Rust-PIT,404.6885
3,Mikko Rantanen-COL,403.909
11,Mika Zibanejad-NYR,403.842
0,Auston Matthews-TOR,396.1555
20,David Pastrnak-BOS,395.0655
9,Sidney Crosby-PIT,394.625
21,Nathan MacKinnon-COL,392.851
77,Mats Zuccarello-MIN,390.275


In [15]:
three_year_pred = [yearly_player_data[i] for i in [9,10,11]]
three_year_pred = mx.merge_dataframes(three_year_pred, yearly_player_data[11])
three_year_pred.drop(columns=["Predicted_Fantasy_Points"], inplace=True)
three_year_df = mx.get_name_predictions(three_year_RF, three_year_pred)
three_year_df.sort_values(by="Prediction", ascending=False, inplace=True)
#pd.options.display.max_rows = 978
display(three_year_df)

Unnamed: 0,Name,Prediction
0,David Pastrnak-BOS,395.036
10,Patrice Bergeron-BOS,392.368
14,Brad Marchand-BOS,391.7775
36,Gabriel Landeskog-COL,389.931
1,Auston Matthews-TOR,378.0025
6,Nathan MacKinnon-COL,377.373
41,Aleksander Barkov-FLA,376.863
48,Mikko Rantanen-COL,369.214
7,Connor McDavid-EDM,366.3115
9,Artemi Panarin-NYR,361.434


Section 6: Conclusion and Next Steps

The predictions made by all of these models make sense; all of the predicted top players are still some of the top players in the league this year, and many of the predictions match up with predictions made by ESPN. One possible next step is to aggregate the six models to get an average points prediction, and to list the players that way. Another way to improve the models is to incorporate injury data; some elite players were injured at some point in the past three years, and their predictions are more pessimistic than those of other players. A further improvement could be to scale for the COVID-shortened 2019-2020 and 2020-2021 seasons. There were many logistical issues that contributed to fewer games and lower scoring in those years, and scaling the goal/assist values could be beneficial to the models.
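Two of these next steps are simple enough to sketch. The code below is an illustration under stated assumptions, not part of the current project: `average_predictions` and `prorate_to_full_season` are hypothetical helpers, the player names and numbers are made up, and the proration assumes production scales linearly with games played. The prediction frames are assumed to have the "Name" and "Prediction" columns seen in the outputs above.

```python
import pandas as pd


def average_predictions(prediction_frames):
    """Mean prediction per player across models, sorted from highest to lowest."""
    stacked = pd.concat(prediction_frames, ignore_index=True)
    return (stacked.groupby("Name", as_index=False)["Prediction"].mean()
                   .sort_values(by="Prediction", ascending=False, ignore_index=True))


def prorate_to_full_season(stat_total, games_played, full_season_games=82):
    """Scale a counting stat to a full-season pace (simple linear assumption)."""
    return stat_total * full_season_games / games_played


# Made-up outputs from two models
model_a = pd.DataFrame({"Name": ["Player X-AAA", "Player Y-BBB"],
                        "Prediction": [300.0, 200.0]})
model_b = pd.DataFrame({"Name": ["Player Y-BBB", "Player X-AAA"],
                        "Prediction": [250.0, 400.0]})

ensemble = average_predictions([model_a, model_b])
print(ensemble)

# 28 goals in a 56-game season is a 41-goal pace over 82 games
print(prorate_to_full_season(28, 56))  # 41.0
```

A player who is missing from some of the models (for example, one without three prior seasons) is still included here; the average is simply taken over the models that have a prediction for that player.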

Overall, I'm happy with how the models performed. After the 2022-2023 regular season, I will see how well they were able to predict some of the outliers, and I'll use that new data to make a prediction for the 2023-2024 season.

ACKNOWLEDGEMENTS:

Thank you to Rotowire and Moneypuck for making your NHL data easy for someone like me to use in a project like this, and thank you to Peter Tanner of Moneypuck, not just for creating such a valuable resource, but also for being responsive to the questions I had about the dataset.