This is version 2 of my Fantasy Hockey Analyzer. The purpose of this notebook is to predict the number of fantasy points every hockey player in the league will get based on previous years' performance.

This notebook primarily uses data from moneypuck.com for analysis, and it also uses data from rotowire.com to get +/- for each player.

Section 1: Parameters and Modules

The variables below can be adjusted. My model is an ensemble of neural nets and random forests, with separate groups of models trained on data going back one, two, and three years.

In [1]:
# Set these values to the appropriate amounts

current_year = 2025
common_number = 100
number_of_one_year_neural_nets = common_number
number_of_two_year_neural_nets = common_number
number_of_three_year_neural_nets = common_number
number_of_one_year_random_forests = common_number
number_of_two_year_random_forests = common_number
number_of_three_year_random_forests = common_number

# This is the breakdown of how many fantasy points a player gets for each category
points_dictionary = {
    'Goals':5, 
    'Assists':3, 
    '+/-':1.5, 
    'PIM':-0.25, 
    'PP_Goals':4, 
    'PP_Assists':2, 
    'SH_Goals':6, #won't count SHG from 5-on-3
    'SH_Assists':4, 
    'Faceoffs_Won':0.25, 
    'Faceoffs_Lost':-0.15, 
    'Hits':0.5, 
    'Blocked_Shots':0.75
    }
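
To make the scoring concrete, here is a minimal sketch of how a stat line could be turned into fantasy points with the dictionary above: multiply each stat by its weight and add the results. The function name `fantasy_points` is hypothetical and is not the helper in my custom module; the stat line is Ryan Suter's row from the sample output in Section 2.

```python
# Same weights as points_dictionary in the parameters cell
points_dictionary = {
    'Goals': 5, 'Assists': 3, '+/-': 1.5, 'PIM': -0.25,
    'PP_Goals': 4, 'PP_Assists': 2, 'SH_Goals': 6, 'SH_Assists': 4,
    'Faceoffs_Won': 0.25, 'Faceoffs_Lost': -0.15,
    'Hits': 0.5, 'Blocked_Shots': 0.75,
}

# Ryan Suter's stat line, copied from the Section 2 sample output
ryan_suter = {
    'Goals': 2.0, 'Assists': 13.0, '+/-': 7, 'PIM': 24.0,
    'PP_Goals': 0.0, 'PP_Assists': 0.0, 'SH_Goals': 1.0, 'SH_Assists': 0.0,
    'Faceoffs_Won': 0.0, 'Faceoffs_Lost': 0.0,
    'Hits': 39.0, 'Blocked_Shots': 109.0,
}


def fantasy_points(stats, weights):
    """Hypothetical helper: weighted sum of stats; missing stats count as 0."""
    return sum(weight * stats.get(category, 0.0)
               for category, weight in weights.items())


total = fantasy_points(ryan_suter, points_dictionary)
print(total)  # 160.75, matching the Fantasy_Points column for this row
```

Because power-play and short-handed goals are scored on top of the base 'Goals' weight, a short-handed goal is worth 5 + 6 = 11 points in this scheme.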

These are the modules I used and why each one was needed:

-os: to allow the program to read data in the repository

-numpy: basic math operations

-pandas: all dataframe operations/data storage/data cleaning

-various sklearn: all machine learning operations/analysis

In addition to these modules, I also have a custom module with helper functions for data cleaning and accuracy evaluation. If you are interested in taking a look at these functions, they are in the "my_module.py" file in the repository at https://github.com/chrisberry888/FantasyHockeyAnalyzer.

In [2]:
#Import block
import os
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.base import clone
import my_module_v2 as mx

pd.set_option('display.max_columns', None)
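
As a rough illustration of how the sklearn imports above can fit together into an ensemble of neural nets and random forests, here is a minimal sketch. It is not the notebook's actual training code: the helper names `fit_seeded_copies` and `ensemble_predict`, the hyperparameters, the equal-weight averaging, and the synthetic data from `make_regression` are all assumptions made for the example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor


def fit_seeded_copies(base_model, n_copies, X, y):
    """Hypothetical helper: fit n_copies clones, each with its own random_state."""
    models = []
    for seed in range(n_copies):
        model = clone(base_model).set_params(random_state=seed)
        models.append(model.fit(X, y))
    return models


def ensemble_predict(models, X):
    """Hypothetical helper: equal-weight average of every model's predictions."""
    return np.mean([model.predict(X) for model in models], axis=0)


# Synthetic stand-in data; the notebook would use player stats instead
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_copies = 3  # kept small here; the parameters cell uses common_number = 100
models = fit_seeded_copies(
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=500), n_copies, X_train, y_train)
models += fit_seeded_copies(
    RandomForestRegressor(n_estimators=20), n_copies, X_train, y_train)

predictions = ensemble_predict(models, X_test)
ensemble_mae = mean_absolute_error(y_test, predictions)
print(f"Ensemble MAE on held-out data: {ensemble_mae:.2f}")
```

Varying only the seed between copies means the averaged prediction smooths out the run-to-run randomness of any single network or forest. The small network may emit a convergence warning on this toy data, which is harmless for the illustration.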

Section 2: Data Gathering and Cleaning

This section compiles the Moneypuck and Rotowire data into a format that is usable by the ML models.

There are some discrepancies between the stats on ESPN and on moneypuck. These shouldn't alter the fantasy points too much. For example, Sidney Crosby is short-changed 1 power-play goal and a few power-play assists. I'm not sure how to fix this.

In [16]:
yearly_player_data = []

mp_teams = []
rw_teams = []

for year in range(2024, current_year):

    moneypuck_data = mx.get_moneypuck_data(year)
    rotowire_data = mx.get_rotowire_data(year)
    combined_data = mx.combine_dataframes(moneypuck_data, rotowire_data)
    final_df = mx.calculate_fantasy_points(combined_data, points_dictionary)

    # Collect each season's results for later use
    yearly_player_data.append(final_df)

    # display(final_df.head())

    display(final_df[['name', 'Fantasy_Points', 'other_I_F_goals', 'full_strength_I_F_goals', 'PP_Goals', 'SH_Goals'] + list(points_dictionary.keys())].head())
    

,name,Fantasy_Points,other_I_F_goals,full_strength_I_F_goals,PP_Goals,SH_Goals,Goals,Assists,+/-,PIM,PP_Goals.1,PP_Assists,SH_Goals.1,SH_Assists,Faceoffs_Won,Faceoffs_Lost,Hits,Blocked_Shots
0,Ryan Suter,160.75,0.0,1.0,0.0,1.0,2.0,13.0,7,24.0,0.0,0.0,1.0,0.0,0.0,0.0,39.0,109.0
1,Brent Burns,187.5,0.0,6.0,0.0,0.0,6.0,23.0,7,28.0,0.0,3.0,0.0,0.0,0.0,0.0,11.0,98.0
2,Corey Perry,194.6,1.0,13.0,4.0,0.0,19.0,11.0,12,51.0,4.0,3.0,0.0,0.0,17.0,16.0,36.0,26.0
3,Alex Ovechkin,450.45,12.0,21.0,11.0,0.0,44.0,29.0,15,14.0,11.0,8.0,0.0,0.0,0.0,2.0,110.0,13.0
4,Sidney Crosby,567.1,6.0,16.0,9.0,0.0,33.0,58.0,-20,31.0,9.0,14.0,0.0,0.0,1016.0,766.0,67.0,39.0
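
The loop above combines the two sources so that each player's +/- from Rotowire sits next to the Moneypuck stats. One way this kind of join might look in pandas is sketched below. It is an assumption that players are matched by name; the internals of `mx.combine_dataframes` are not shown here, and the two small frames are hypothetical stand-ins built from values in the output above.

```python
import pandas as pd

# Hypothetical stand-ins for the two sources; values come from the output above
moneypuck_like = pd.DataFrame({
    "name": ["Alex Ovechkin", "Sidney Crosby", "Ryan Suter"],
    "Goals": [44.0, 33.0, 2.0],
})
rotowire_like = pd.DataFrame({
    "name": ["Alex Ovechkin", "Sidney Crosby"],
    "+/-": [15, -20],
})

# Left join keeps every Moneypuck row; indicator=True adds a "_merge" column
# recording whether a matching Rotowire row was found
combined = moneypuck_like.merge(rotowire_like, on="name", how="left", indicator=True)

# Players with no Rotowire match end up with a missing +/- value
unmatched = combined.loc[combined["_merge"] == "left_only", "name"].tolist()
print(unmatched)  # ['Ryan Suter']
```

Listing the unmatched names is a cheap way to spot spelling or formatting differences between sources before they silently turn into missing +/- values.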
