People often look at a player's performance over the past season or several as an indicator of his skill. But how accurately does it actually predict the same player's performance in the new season? How many seasons back should we look? And how much can the player's team affect it? These are the main questions this notebook aims to investigate.

First, we are going to test a simple model that looks only at points (goals + assists) and, therefore, only at skaters. If the model shows any promising results, we can try extending it to other key performance indicators as well.

In [1]:
# Importing standard packages for data exploration and processing.
import numpy as np
import pandas as pd

pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)


data = pd.read_csv('../data/players/skaters_season.csv')
data.head()

,Profile,Player,Position,Season,Year,Team,Number,Games,Goals,Assists,Points,Plus_minus,Plus,Minus,Penalties,Goals_even,Goals_powerplay,Goals_shorthanded,Goals_overtime,Game_winning_goals,Game_winning_shootouts,Shots,Shots_percentage,Shots_game,Faceoffs,Faceoffs_won,Faceoffs_percentage,Icetime_game,Icetime_game_seconds,Shifts_game,Hits,Shots_blocked,Penalties_against
0,https://en.khl.ru/players/16673/,Sergei Abramov,Skater,Regular season,2014/2015,Amur (Khabarovsk),93.0,13,1,0,1,-4,1,5,6,1,0,0,0,0,0,11,9.1,0.8,0,0,,6:57,417,9.3,1.0,2.0,1.0
1,https://en.khl.ru/players/16673/,Sergei Abramov,Skater,Regular season,2013/2014,Amur (Khabarovsk),91.0,12,0,0,0,0,1,1,0,0,0,0,0,0,0,14,0.0,1.2,1,0,0.0,6:15,375,8.0,,,
2,https://en.khl.ru/players/19200/,Dmitry Ambrozheichik,Skater,Regular season,2017/2018,Dinamo (Minsk),63.0,8,0,0,0,-1,1,2,0,0,0,0,0,0,0,6,0.0,0.8,0,0,,6:00,360,8.2,2.0,2.0,0.0
3,https://en.khl.ru/players/19200/,Dmitry Ambrozheichik,Skater,Regular season,2016/2017,Dinamo (Minsk),15.0,20,3,1,4,1,5,4,10,3,0,0,0,0,0,21,14.3,1.1,7,3,42.9,9:43,583,12.7,10.0,7.0,9.0
4,https://en.khl.ru/players/19200/,Dmitry Ambrozheichik,Skater,Regular season,2015/2016,Dinamo (Minsk),24.0,11,1,0,1,1,3,2,0,1,0,0,0,0,0,5,20.0,0.5,0,0,,4:43,283,8.5,1.0,0.0,1.0


We will definitely need each player's total time on ice over the season. After all, two players might be equally skilled, but one of them simply gets much more icetime and thus collects more points. So instead of raw season points, we are going to use a standardised rate: points per fixed interval of icetime. For readability, let us set the interval to 60 minutes (the standard match length), the same as with goalies.

In [2]:
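# Total icetime over the season in hours (games played * average seconds per game / 3600).
data['Icetime'] = data['Games'] * data['Icetime_game_seconds'] / 3600
# Points per 60 minutes of icetime.
data['Points_average'] = data['Points'] / data['Icetime']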
data.head()

,Profile,Player,Position,Season,Year,Team,Number,Games,Goals,Assists,Points,Plus_minus,Plus,Minus,Penalties,Goals_even,Goals_powerplay,Goals_shorthanded,Goals_overtime,Game_winning_goals,Game_winning_shootouts,Shots,Shots_percentage,Shots_game,Faceoffs,Faceoffs_won,Faceoffs_percentage,Icetime_game,Icetime_game_seconds,Shifts_game,Hits,Shots_blocked,Penalties_against,Icetime,Points_average
0,https://en.khl.ru/players/16673/,Sergei Abramov,Skater,Regular season,2014/2015,Amur (Khabarovsk),93.0,13,1,0,1,-4,1,5,6,1,0,0,0,0,0,11,9.1,0.8,0,0,,6:57,417,9.3,1.0,2.0,1.0,1.505833,0.664084
1,https://en.khl.ru/players/16673/,Sergei Abramov,Skater,Regular season,2013/2014,Amur (Khabarovsk),91.0,12,0,0,0,0,1,1,0,0,0,0,0,0,0,14,0.0,1.2,1,0,0.0,6:15,375,8.0,,,,1.25,0.0
2,https://en.khl.ru/players/19200/,Dmitry Ambrozheichik,Skater,Regular season,2017/2018,Dinamo (Minsk),63.0,8,0,0,0,-1,1,2,0,0,0,0,0,0,0,6,0.0,0.8,0,0,,6:00,360,8.2,2.0,2.0,0.0,0.8,0.0
3,https://en.khl.ru/players/19200/,Dmitry Ambrozheichik,Skater,Regular season,2016/2017,Dinamo (Minsk),15.0,20,3,1,4,1,5,4,10,3,0,0,0,0,0,21,14.3,1.1,7,3,42.9,9:43,583,12.7,10.0,7.0,9.0,3.238889,1.234991
4,https://en.khl.ru/players/19200/,Dmitry Ambrozheichik,Skater,Regular season,2015/2016,Dinamo (Minsk),24.0,11,1,0,1,1,3,2,0,1,0,0,0,0,0,5,20.0,0.5,0,0,,4:43,283,8.5,1.0,0.0,1.0,0.864722,1.156441
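As a quick sanity check of the rate calculation, the first row above can be reproduced by hand: 13 games at 417 seconds each is 5,421 seconds, or roughly 1.506 hours, so a single point comes out at about 0.664 points per 60 minutes. The sketch below repeats that arithmetic with a small helper; `points_per_60` is a hypothetical name used only for this illustration and simply mirrors the two lines of In [2].

```python
def points_per_60(points: float, games: int, seconds_per_game: float) -> float:
    """Points per 60 minutes of icetime (hypothetical helper mirroring In [2])."""
    icetime_hours = games * seconds_per_game / 3600  # same as the 'Icetime' column
    return points / icetime_hours


# Row 0: Sergei Abramov, 2014/2015 (1 point, 13 games, 417 s per game)
print(round(points_per_60(1, 13, 417), 6))  # 0.664084
# Row 3: Dmitry Ambrozheichik, 2016/2017 (4 points, 20 games, 583 s per game)
print(round(points_per_60(4, 20, 583), 6))  # 1.234991
```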


Since we are using rates, we need to ensure that every player has played at least some bare minimum during the season. This could be enforced in two ways, based on either games played or icetime recorded. For now, icetime seems like a good choice. Let us set the minimum requirement at 2 hours.

On a related note, let us drop all playoff seasons from the data. Not only do they tend to be fairly short, so most of them would be filtered out by the icetime requirement anyway, but playoff matches also tend to play out somewhat differently from the regular season.

In [3]:
# Keep regular-season rows only, then apply the 2-hour icetime minimum.
data = data[data['Season'] == 'Regular season']
data = data[data['Icetime'] >= 2]
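To make the effect of these two filters concrete, here is a sketch on a few made-up rows. It also shows the games-based alternative mentioned above, using a hypothetical `MIN_GAMES` threshold that the notebook does not actually use; the `'Playoffs'` label is likewise an assumption, since only the `'Regular season'` value appears in the data shown. The point is that the two criteria can keep different players.

```python
import pandas as pd

# Made-up rows containing only the columns the filters look at.
toy = pd.DataFrame({
    'Player': ['A', 'A', 'B', 'C'],
    'Season': ['Regular season', 'Playoffs', 'Regular season', 'Regular season'],
    'Games': [20, 4, 30, 5],
    'Icetime': [5.0, 3.0, 1.5, 2.0],  # hours
})

regular = toy['Season'] == 'Regular season'

# Icetime-based minimum, as in In [3].
filtered = toy[regular & (toy['Icetime'] >= 2)]
print(filtered['Player'].tolist())  # ['A', 'C']

# Games-based alternative with a hypothetical threshold.
MIN_GAMES = 10
by_games = toy[regular & (toy['Games'] >= MIN_GAMES)]
print(by_games['Player'].tolist())  # ['A', 'B']
```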

We are going to first try predicting based on the two most recent seasons the player has participated in. Important note: those seasons are not necessarily the two preceding calendar seasons, as a player may have skipped some seasons or not played enough in them to pass our icetime filter. And since we need values for the current season plus the two previous ones for each player, players with fewer than 3 seasons in the data have to be dropped altogether.

In [5]:
# Keep only players with at least 3 qualifying rows (seasons) left after filtering.
data = data.groupby('Profile').filter(lambda x: len(x) > 2)
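The `groupby(...).filter(...)` call keeps or drops all of a player's rows at once. Note that `len(x)` counts rows, not distinct seasons. Since each row carries a `Team`, a player who changed teams mid-season might appear twice for the same `Year`; we have not checked whether this dataset splits seasons that way, so the variant counting distinct `Year` values is only an assumption-driven alternative, shown here on made-up data.

```python
import pandas as pd

# Made-up rows: p1 has three seasons, p2 only two, and p3 has two seasons
# but three rows (2019/2020 split across two teams, a hypothetical trade).
toy = pd.DataFrame({
    'Profile': ['p1', 'p1', 'p1', 'p2', 'p2', 'p3', 'p3', 'p3'],
    'Year': ['2018/2019', '2019/2020', '2020/2021',
             '2019/2020', '2020/2021',
             '2019/2020', '2019/2020', '2020/2021'],
    'Points_average': [1.0, 1.2, 0.9, 0.5, 0.7, 0.3, 0.6, 0.8],
})

# Same rule as In [5]: more than 2 rows per player.
kept = toy.groupby('Profile').filter(lambda x: len(x) > 2)
print(kept['Profile'].unique().tolist())  # ['p1', 'p3']

# Stricter variant: more than 2 distinct seasons per player.
kept_strict = toy.groupby('Profile').filter(lambda x: x['Year'].nunique() > 2)
print(kept_strict['Profile'].unique().tolist())  # ['p1']
```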

In [7]:
# We can drop all unnecessary columns now.
data.columns

Index(['Profile', 'Player', 'Position', 'Season', 'Year', 'Team', 'Number',
       'Games', 'Goals', 'Assists', 'Points', 'Plus_minus', 'Plus', 'Minus',
       'Penalties', 'Goals_even', 'Goals_powerplay', 'Goals_shorthanded',
       'Goals_overtime', 'Game_winning_goals', 'Game_winning_shootouts',
       'Shots', 'Shots_percentage', 'Shots_game', 'Faceoffs', 'Faceoffs_won',
       'Faceoffs_percentage', 'Icetime_game', 'Icetime_game_seconds',
       'Shifts_game', 'Hits', 'Shots_blocked', 'Penalties_against', 'Icetime',
       'Points_average'],
      dtype='object')

In [8]:
drop_list = ['Position', 'Season', 'Number', 'Games', 'Goals', 'Assists', 'Points', 'Plus_minus', 'Plus',
             'Minus', 'Penalties', 'Goals_even', 'Goals_powerplay', 'Goals_shorthanded', 'Goals_overtime',
             'Game_winning_goals', 'Game_winning_shootouts', 'Shots', 'Shots_percentage', 'Shots_game',
             'Faceoffs', 'Faceoffs_won', 'Faceoffs_percentage', 'Icetime_game', 'Icetime_game_seconds',
             'Shifts_game', 'Hits', 'Shots_blocked', 'Penalties_against', 'Icetime']
data.drop(drop_list, axis=1, inplace=True)
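An equivalent and shorter way to reach the same result is to list the columns to keep rather than the ones to drop, which also pins down the column order explicitly. The sketch below checks on a small made-up frame that both approaches give identical output; `keep_list` is a hypothetical name.

```python
import pandas as pd

# Made-up frame: the five columns kept in the notebook plus a few dropped ones.
toy = pd.DataFrame({
    'Profile': ['p1'], 'Player': ['X'], 'Position': ['Skater'],
    'Season': ['Regular season'], 'Year': ['2020/2021'], 'Team': ['T'],
    'Games': [40], 'Icetime': [10.0], 'Points_average': [1.1],
})

# Select the columns to keep.
keep_list = ['Profile', 'Player', 'Year', 'Team', 'Points_average']
slim = toy[keep_list]

# Drop everything else, as In [8] does.
dropped = toy.drop(columns=['Position', 'Season', 'Games', 'Icetime'])

print(slim.equals(dropped))  # True
```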

In [9]:
data.head()

,Profile,Player,Year,Team,Points_average
6,https://en.khl.ru/players/13714/,Vitaly Anikeyenko,2010/2011,Lokomotiv (Yaroslavl),1.081731
8,https://en.khl.ru/players/13714/,Vitaly Anikeyenko,2009/2010,Lokomotiv (Yaroslavl),0.965262
10,https://en.khl.ru/players/13714/,Vitaly Anikeyenko,2008/2009,Lokomotiv (Yaroslavl),0.96861
20,https://en.khl.ru/players/14763/,Sergei Andronov,2020/2021,CSKA (Moscow),1.811594
21,https://en.khl.ru/players/14763/,Sergei Andronov,2019/2020,CSKA (Moscow),0.450563
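The rows above are listed newest season first, so building the "two latest seasons" inputs described earlier requires sorting each player's seasons chronologically. A minimal sketch, assuming a per-player `groupby(...).shift()` approach (not necessarily how the notebook proceeds) and using hypothetical column names `Prev_1` and `Prev_2`, on made-up rows:

```python
import pandas as pd

# Made-up rows in the same shape as the table above, newest season first for p1.
toy = pd.DataFrame({
    'Profile': ['p1', 'p1', 'p1', 'p2', 'p2', 'p2'],
    'Year': ['2020/2021', '2018/2019', '2015/2016',
             '2012/2013', '2013/2014', '2014/2015'],
    'Points_average': [1.2, 0.9, 0.6, 0.4, 0.5, 0.8],
})

# 'YYYY/YYYY' strings sort chronologically, so this orders each player's seasons oldest first.
toy = toy.sort_values(['Profile', 'Year'])

# shift() moves along rows within each player, i.e. along the seasons he actually has.
grouped = toy.groupby('Profile')['Points_average']
toy['Prev_1'] = grouped.shift(1)  # most recent earlier season
toy['Prev_2'] = grouped.shift(2)  # the one before that

# Only rows with both previous seasons available can be used for prediction.
usable = toy.dropna(subset=['Prev_1', 'Prev_2'])
print(usable)
```

Because `shift()` works on row positions within each player, gaps in a career (such as p1's missing 2016/2017 and 2017/2018 seasons) are skipped automatically, which matches the note that the latest two seasons are not necessarily consecutive ones.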
