This is version 2 of my Fantasy Hockey Analyzer. The purpose of this notebook is to predict the number of fantasy points every hockey player in the league will get based on previous years' performance.

This notebook primarily uses data from moneypuck.com for analysis, and it also uses data from rotowire.com to get +/- for each player.

Section 1: Parameters and Modules

These are the variables that can be adjusted. My model is an ensemble of neural nets and random forests, each trained on one, two, or three years of prior data.

In [None]:
# Set these values to the appropriate amounts

current_year = 2025
common_number = 0
number_of_one_year_neural_nets = common_number
number_of_two_year_neural_nets = common_number
number_of_three_year_neural_nets = common_number
number_of_one_year_random_forests = common_number
number_of_two_year_random_forests = common_number
number_of_three_year_random_forests = common_number

# TODO: Make it so that this deletes all models and predictions
# Sets whether any models should be deleted and written over
create_new_models = True

# This is the breakdown of how many fantasy points a player gets for each category
points_dictionary = {
    'Goals':5, 
    'Assists':3, 
    '+/-':1.5, 
    'PIM':-0.25, 
    'PP_Goals':4, 
    'PP_Assists':2, 
    'SH_Goals':6, #won't count SHG from 5-on-3
    'SH_Assists':4, 
    'Faceoffs_Won':0.25, 
    'Faceoffs_Lost':-0.15, 
    'Hits':0.5, 
    'Blocked_Shots':0.75
    }
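
To make the scoring concrete, here is a minimal sketch of how a stat line could be turned into fantasy points with a dictionary like the one above. The `score_stat_line` helper and the sample stat line are hypothetical, made up for illustration; the notebook's actual calculation lives in `mx.calculate_fantasy_points`, which may work differently.

```python
# Same weights as the points_dictionary defined in the parameters cell
points_dictionary = {
    'Goals': 5, 'Assists': 3, '+/-': 1.5, 'PIM': -0.25,
    'PP_Goals': 4, 'PP_Assists': 2, 'SH_Goals': 6, 'SH_Assists': 4,
    'Faceoffs_Won': 0.25, 'Faceoffs_Lost': -0.15,
    'Hits': 0.5, 'Blocked_Shots': 0.75,
}

def score_stat_line(stat_line, weights):
    """Hypothetical helper: weighted sum of stats; missing stats count as 0."""
    return sum(weights[cat] * stat_line.get(cat, 0) for cat in weights)

# Made-up stat line, not real player data
sample = {'Goals': 2, 'Assists': 1, '+/-': 1, 'Hits': 4}
total = score_stat_line(sample, points_dictionary)
print(total)  # 16.5
```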

The following is a list of modules that I used and the reason why they were used:

-os: to allow the program to read data in the repository

-numpy: basic math operations

-pandas: all dataframe operations/data storage/data cleaning

-sklearn (various submodules): all machine learning operations/analysis

-joblib: loading the saved models

In addition to these modules, I also have a custom module containing helper functions for data cleaning and accuracy evaluation. These functions live in the "my_module.py" file in the repository, available at https://github.com/chrisberry888/FantasyHockeyAnalyzer.

In [2]:
#Import block
import os
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.base import clone
import joblib
import my_module_v2 as mx

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None) 

Section 2: Data Gathering and Cleaning

This section compiles the Moneypuck and Rotowire data into a format that is usable by the ML models.
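
As a rough illustration of the combining step, the sketch below joins two small tables on a shared player name so that each row gains a `+/-` column. The column names, the join key, and the sample values are all assumptions made for this example; `mx.combine_dataframes` may match players differently.

```python
import pandas as pd

# Made-up rows standing in for one season from each source
moneypuck_like = pd.DataFrame({
    'name': ['Player A', 'Player B', 'Player C'],
    'Goals': [30, 12, 5],
})
rotowire_like = pd.DataFrame({
    'name': ['Player A', 'Player B'],
    '+/-': [15, -4],
})

# A left join keeps every skater from the main source; players missing from
# the second source get NaN, filled here with 0 as a simple default
combined = moneypuck_like.merge(rotowire_like, on='name', how='left')
combined['+/-'] = combined['+/-'].fillna(0)
print(combined)
```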

There are some discrepancies between the stats on ESPN and on Moneypuck. These shouldn't alter the fantasy points too much. For example, Sidney Crosby is short-changed one faceoff win but credited with one additional hit on Moneypuck compared to ESPN, so in my model he has 0.10 more points than he does on ESPN. In the future I would like to find out why this happens or get fully accurate data (perhaps from ESPN itself), but for now I am OK with this very small error.
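
One reading that reproduces the 0.10 figure is that the missing faceoff win shows up as a faceoff loss instead. This is an assumption, not something confirmed by either data source; under it, the arithmetic with the weights from Section 1 works out as follows.

```python
# Weights copied from points_dictionary in Section 1
faceoff_won, faceoff_lost, hit = 0.25, -0.15, 0.5

# Assumption: one faceoff win is recorded as a loss, plus one extra hit
difference = -faceoff_won + faceoff_lost + hit
print(round(difference, 2))  # 0.1
```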

In [3]:
yearly_player_data = []

for year in range(2010, current_year):
    moneypuck_data = mx.get_moneypuck_data(year)
    rotowire_data = mx.get_rotowire_data(year)
    combined_data = mx.combine_dataframes(moneypuck_data, rotowire_data)
    this_years_data = mx.calculate_fantasy_points(combined_data, points_dictionary)
    yearly_player_data.append(this_years_data)

    

In [4]:
player_id_table = mx.get_player_id_table(yearly_player_data)

This next cell compiles the yearly data into one-, two-, and three-year chunks to be used by the ML models.

In [5]:
ml_data_one_year = mx.get_ml_data(yearly_player_data, current_year, 1)
ml_data_two_years = mx.get_ml_data(yearly_player_data, current_year, 2)
ml_data_three_years = mx.get_ml_data(yearly_player_data, current_year, 3)
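
To illustrate the idea of one-, two-, and three-year chunks, the sketch below pairs each player's stats from the previous N seasons with the fantasy points they scored in the following season. The `build_windows` function, the column names, and the numbers are hypothetical; `mx.get_ml_data` may lay the data out differently.

```python
import pandas as pd

def build_windows(seasons, n_years):
    """Hypothetical helper. seasons: list of DataFrames indexed by player id,
    oldest first. Returns one row per (player, target season) with features
    taken from the n_years seasons before the target season."""
    rows = []
    for t in range(n_years, len(seasons)):
        target = seasons[t][['fantasy_points']].rename(
            columns={'fantasy_points': 'target'})
        parts = [target]
        for back in range(1, n_years + 1):
            parts.append(seasons[t - back].add_suffix(f'_minus_{back}'))
        # Inner join keeps only players present in every season of the window
        rows.append(pd.concat(parts, axis=1, join='inner'))
    return pd.concat(rows, ignore_index=True)

# Made-up data: three seasons, player 2 is missing from the last one
seasons = [
    pd.DataFrame({'Goals': [10, 20], 'fantasy_points': [100.0, 200.0]}, index=[1, 2]),
    pd.DataFrame({'Goals': [12, 18], 'fantasy_points': [120.0, 180.0]}, index=[1, 2]),
    pd.DataFrame({'Goals': [15], 'fantasy_points': [150.0]}, index=[1]),
]
one_year = build_windows(seasons, 1)
two_year = build_windows(seasons, 2)
print(two_year)
```

Longer windows give each row more features but fewer rows, since a player must appear in every season of the window.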



The data is now ready to be used to train the ML models.

In [6]:

one_year_X, one_year_y = mx.separate_fantasy_points(ml_data_one_year)
two_year_X, two_year_y = mx.separate_fantasy_points(ml_data_two_years)
three_year_X, three_year_y = mx.separate_fantasy_points(ml_data_three_years)
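
A plausible reading of `separate_fantasy_points` is that it splits each table into a feature matrix and a target column. The sketch below shows that split on made-up data and then uses a hold-out set to measure mean absolute error, which is one common way to check a regressor. The `target` column name and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends on two of the three features plus noise
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 3)), columns=['f1', 'f2', 'f3'])
data['target'] = 2 * data['f1'] - data['f2'] + rng.normal(scale=0.1, size=200)

# Separate features from the value being predicted
X = data.drop(columns='target')
y = data['target']

# Hold out 25% of the rows to estimate error on unseen players
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f'hold-out MAE: {mae:.3f}')
```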

In [7]:
one_year_neural_net_args = (
    one_year_X,
    one_year_y,
    MLPRegressor(max_iter=1000),
    2, #number_of_one_year_neural_nets,
    '/1_year/neural_nets'
)

one_year_random_forest_args = (
    one_year_X,
    one_year_y,
    RandomForestRegressor(),
    1, #number_of_one_year_random_forests,
    '/1_year/random_forests'
)

two_year_neural_net_args = (
    two_year_X,
    two_year_y,
    MLPRegressor(max_iter=1000),
    number_of_two_year_neural_nets,
    '/2_year/neural_nets'
)

two_year_random_forest_args = (
    two_year_X,
    two_year_y,
    RandomForestRegressor(),
    number_of_two_year_random_forests,
    '/2_year/random_forests'
)

three_year_neural_net_args = (
    three_year_X,
    three_year_y,
    MLPRegressor(max_iter=1000),
    1, #number_of_three_year_neural_nets,
    '/3_year/neural_nets'
)

three_year_random_forest_args = (
    three_year_X,
    three_year_y,
    RandomForestRegressor(),
    number_of_three_year_random_forests,
    '/3_year/random_forests'
)

In [8]:

if create_new_models:
    mx.create_models(*one_year_neural_net_args)
    mx.create_models(*one_year_random_forest_args)
    mx.create_models(*two_year_neural_net_args)
    mx.create_models(*two_year_random_forest_args)
    mx.create_models(*three_year_neural_net_args)
    mx.create_models(*three_year_random_forest_args)
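
The six calls above share one argument pattern: features, targets, a base estimator, a count, and a subdirectory. As a sketch of what a function with that signature could do, the code below clones the base estimator, fits each copy, and saves it with `joblib`. The `create_models_sketch` name and the `model_<i>.joblib` file naming (inferred from the path loaded later in the notebook) are assumptions; the real `mx.create_models` may differ.

```python
import os
import tempfile

import joblib
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

def create_models_sketch(X, y, base_model, n_models, out_dir):
    """Hypothetical helper: fit n_models fresh copies of base_model and
    save each one to out_dir, returning the list of file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i in range(n_models):
        model = clone(base_model)  # unfitted copy with the same hyperparameters
        model.fit(X, y)
        path = os.path.join(out_dir, f'model_{i}.joblib')
        joblib.dump(model, path)
        paths.append(path)
    return paths

# Synthetic data and a temporary directory, for illustration only
X, y = make_regression(n_samples=100, n_features=4, random_state=0)
with tempfile.TemporaryDirectory() as tmp:
    saved = create_models_sketch(
        X, y, RandomForestRegressor(n_estimators=10), 2,
        os.path.join(tmp, '1_year', 'random_forests'))
    reloaded = joblib.load(saved[0])
    n_predictions = len(reloaded.predict(X))
print(n_predictions)
```

Cloning matters here because each copy is trained from scratch, so models such as random forests and neural nets end up different from one another through their random initialization.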
    


Now we generate the table with final predictions.

In [9]:
current_one_year_X = mx.get_final_year_data(yearly_player_data, 1)
current_two_year_X = mx.get_final_year_data(yearly_player_data, 2)
current_three_year_X = mx.get_final_year_data(yearly_player_data, 3)

In [10]:
path = os.getcwd() + '/models/1_year/neural_nets/model_0.joblib'
model = joblib.load(path)
table = mx.get_prediction_table([model], current_one_year_X, player_id_table)

In [11]:
final_table_inputs = (
    (
        current_one_year_X,
        current_two_year_X,
        current_three_year_X
    ),
    player_id_table
)
mx.generate_predictions(*final_table_inputs)

In [12]:
mx.generate_final_table()
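
Since the model is described as an ensemble, a simple way to combine its members is to average their predictions. The sketch below does this for a neural net and a random forest on synthetic data; unweighted averaging and the `ensemble_predict` helper are assumptions, and `mx.generate_final_table` may combine predictions another way.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

def ensemble_predict(models, X):
    """Hypothetical helper: average the predictions of fitted models, row by row."""
    return np.mean([m.predict(X) for m in models], axis=0)

# Synthetic data, for illustration only
X, y = make_regression(n_samples=100, n_features=4, random_state=0)
models = [
    MLPRegressor(max_iter=1000, random_state=0).fit(X, y),
    RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y),
]
averaged = ensemble_predict(models, X)
print(averaged[:3])
```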

In [13]:
mx.get_final_table().head()

Unnamed: 0,playerId,name,prediction
0,8477492,Nathan MacKinnon_COL_2024_C,464.457689
1,8478402,Connor McDavid_EDM_2024_C,340.885971
2,8476389,Vincent Trocheck_NYR_2024_C,330.951733
3,8477946,Dylan Larkin_DET_2024_C,328.735544
4,8477956,David Pastrnak_BOS_2024_R,316.688152
