In [1]:
# Importing standard packages for data exploration and processing.

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

This notebook is going to focus on processing the players' season statistics only. The other three files will be processed in separate notebooks.

In [2]:
# Unlike in the Stage 1 notebooks, we are going to create new variables rather than perform the operations in-place here.
# The reason is that we might need to review the original data during processing.

raw_players_season = pd.read_csv('../raw_data/raw_players_season.csv')

In [3]:
# Does everything seem to be alright with the data?

raw_players_season

Unnamed: 0,URL,Player name,Season,Tournament / Team,№,GP,G,Assists,PTS,+/-,...,FOA,W,L,SOP,GA,Sv,%Sv,GAA,SO,TOI
0,https://en.khl.ru/players/16673/,Sergei Abramov,Regular season 2014/2015,Amur (Khabarovsk),93.0,13.0,1.0,0.0,1.0,-4.0,...,1.0,,,,,,,,,
1,https://en.khl.ru/players/16673/,Sergei Abramov,Nadezhda Cup 2013/2014,Amur (Khabarovsk),91.0,2.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
2,https://en.khl.ru/players/16673/,Sergei Abramov,Regular season 2013/2014,Amur (Khabarovsk),91.0,12.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
3,https://en.khl.ru/players/16673/,Sergei Abramov,Nadezhda Cup 2012/2013,Amur (Khabarovsk),99.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
4,https://en.khl.ru/players/16462/,Maxim Alyapkin,Regular season 2015/2016,Torpedo (Nizhny Novgorod Region),31.0,2.0,0.0,0.0,,,...,,1.0,1.0,0.0,3.0,10.0,76.9,2.98,0.0,60:25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19669,https://en.khl.ru/players/16217/,Airat Ziazov,Regular season 2009/2010,Neftekhimik (Nizhnekamsk),79.0,1.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,
19670,https://en.khl.ru/players/23656/,Tomislav Zanoski,Regular season 2016/2017,Medvescak (Zagreb),10.0,24.0,2.0,1.0,3.0,-8.0,...,6.0,,,,,,,,,
19671,https://en.khl.ru/players/23656/,Tomislav Zanoski,Regular season 2015/2016,Medvescak (Zagreb),10.0,15.0,3.0,0.0,3.0,1.0,...,2.0,,,,,,,,,
19672,https://en.khl.ru/players/11543/,Alexander Zevakhin,Regular season 2009/2010,Severstal (Cherepovets),15.0,20.0,1.0,1.0,2.0,-5.0,...,,,,,,,,,,


We can already see that there are some issues with missing data and integers stored as floats.

In [4]:
# What would the summary tell us?

raw_players_season.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19674 entries, 0 to 19673
Data columns (total 39 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   URL                19674 non-null  object 
 1   Player name        19674 non-null  object 
 2   Season             19674 non-null  object 
 3   Tournament / Team  19674 non-null  object 
 4   №                  18677 non-null  float64
 5   GP                 19674 non-null  float64
 6   G                  19674 non-null  float64
 7   Assists            19674 non-null  float64
 8   PTS                17753 non-null  float64
 9   +/-                17753 non-null  float64
 10  +                  17753 non-null  float64
 11  -                  17753 non-null  float64
 12  PIM                19674 non-null  float64
 13  ESG                17753 non-null  float64
 14  PPG                17753 non-null  float64
 15  SHG                17753 non-null  float64
 16  OTG                177

We can see that in many columns there is no missing data at all. At the same time, for other columns there is a clear separation into skaters (forwards and defencemen) and goalies.

For example, the non-null counts of the skater-only and goalie-only columns do not quite add up to the 19674 rows in the dataframe: 2 rows seem to be unaccounted for in either group.

Let us find out who is messing up our data. Icetime seems like a good indicator since it must be present for all players who have recorded a match during that season and is stored differently for skaters (average icetime per match) and goalies (total icetime per season).
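
The accounting described above can be sketched on a small toy dataframe before running it on the real data. The rows and the variable names (`toy`, `has_skater_time`, `has_goalie_time`, `counts`) are made up for illustration; only the 'TOI/G' and 'TOI' column names come from the dataset.

```python
import pandas as pd

# Toy stand-in for the raw data: two skaters, one goalie, one row with neither value.
toy = pd.DataFrame({
    'Player name': ['Skater A', 'Goalie B', 'Broken C', 'Skater D'],
    'TOI/G': ['12:30', None, None, '8:15'],
    'TOI': [None, '60:25', None, None],
})

has_skater_time = toy['TOI/G'].notna()
has_goalie_time = toy['TOI'].notna()

# Every row should fall into exactly one of the first two groups.
counts = {
    'skaters': int((has_skater_time & ~has_goalie_time).sum()),
    'goalies': int((has_goalie_time & ~has_skater_time).sum()),
    'neither': int((~has_skater_time & ~has_goalie_time).sum()),
}
print(counts)
```

Any non-zero 'neither' count points at rows that belong to no group and deserve a closer look.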

In [5]:
# We need the rows for which both icetime columns are null.

raw_players_season[raw_players_season['TOI/G'].isnull() & raw_players_season['TOI'].isnull()]

Unnamed: 0,URL,Player name,Season,Tournament / Team,№,GP,G,Assists,PTS,+/-,...,FOA,W,L,SOP,GA,Sv,%Sv,GAA,SO,TOI
2037,https://en.khl.ru/players/29144/,David Boldizar,Regular season 2018/2019,Slovan (Bratislava),61.0,3.0,0.0,0.0,,,...,,,,,,,,,,
2038,https://en.khl.ru/players/29144/,David Boldizar,Regular season 2017/2018,Slovan (Bratislava),23.0,2.0,0.0,0.0,,,...,,,,,,,,,,


So, David Boldizar from Slovan (Bratislava) is the culprit. I wonder what is going on with him. Thankfully, we have added each player's profile link so we can easily check the original data, and it turns out that the data was not stored properly on the website to begin with.

In [6]:
# What do we know about that specific player?

raw_players_season[raw_players_season['URL'] == 'https://en.khl.ru/players/29144/'].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 2037 to 2038
Data columns (total 39 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   URL                2 non-null      object 
 1   Player name        2 non-null      object 
 2   Season             2 non-null      object 
 3   Tournament / Team  2 non-null      object 
 4   №                  2 non-null      float64
 5   GP                 2 non-null      float64
 6   G                  2 non-null      float64
 7   Assists            2 non-null      float64
 8   PTS                0 non-null      float64
 9   +/-                0 non-null      float64
 10  +                  0 non-null      float64
 11  -                  0 non-null      float64
 12  PIM                2 non-null      float64
 13  ESG                0 non-null      float64
 14  PPG                0 non-null      float64
 15  SHG                0 non-null      float64
 16  OTG                0 non

Most of the data is missing, and not because it is supposed to be a zero. After all, icetime cannot be zero. Therefore, we need to drop this player from our data altogether.

In [7]:
# Once again, we are using the profile link as the primary key, since different players may share the same name.

players_season = raw_players_season[raw_players_season['URL'] != 'https://en.khl.ru/players/29144/']

Now we can create a new column indicating whether a player is a skater or a goalie. Let us use the icetime for the separation.

In [8]:
# Total ice time is only tracked for goalies, so skaters are supposed to have it as null.

players_season['Role'] = np.where(players_season['TOI'].isnull(), 'Skater', 'Goalie')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  players_season['Role'] = np.where(players_season['TOI'].isnull(), 'Skater', 'Goalie')
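
The warning above appears because the filtered dataframe was derived from another dataframe and pandas cannot tell whether the new column is meant for the original. One common way to avoid it, shown here as a sketch on toy data rather than as a change to the notebook, is to take an explicit copy right after filtering. The names `toy_raw` and `toy_players` and the URLs are placeholders.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw data (placeholder URLs).
toy_raw = pd.DataFrame({
    'URL': ['u1', 'u2', 'u3'],
    'TOI': [None, '60:25', None],
})

# .copy() makes the filtered result an independent dataframe, so adding a
# column to it cannot be mistaken for a write to a slice of toy_raw.
toy_players = toy_raw[toy_raw['URL'] != 'u3'].copy()
toy_players['Role'] = np.where(toy_players['TOI'].isnull(), 'Skater', 'Goalie')

print(toy_players)
```

The original dataframe stays untouched, which matches the intention of keeping the raw data around for review.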


In [9]:
players_season

Unnamed: 0,URL,Player name,Season,Tournament / Team,№,GP,G,Assists,PTS,+/-,...,W,L,SOP,GA,Sv,%Sv,GAA,SO,TOI,Role
0,https://en.khl.ru/players/16673/,Sergei Abramov,Regular season 2014/2015,Amur (Khabarovsk),93.0,13.0,1.0,0.0,1.0,-4.0,...,,,,,,,,,,Skater
1,https://en.khl.ru/players/16673/,Sergei Abramov,Nadezhda Cup 2013/2014,Amur (Khabarovsk),91.0,2.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,Skater
2,https://en.khl.ru/players/16673/,Sergei Abramov,Regular season 2013/2014,Amur (Khabarovsk),91.0,12.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,Skater
3,https://en.khl.ru/players/16673/,Sergei Abramov,Nadezhda Cup 2012/2013,Amur (Khabarovsk),99.0,0.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,Skater
4,https://en.khl.ru/players/16462/,Maxim Alyapkin,Regular season 2015/2016,Torpedo (Nizhny Novgorod Region),31.0,2.0,0.0,0.0,,,...,1.0,1.0,0.0,3.0,10.0,76.9,2.98,0.0,60:25,Goalie
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19669,https://en.khl.ru/players/16217/,Airat Ziazov,Regular season 2009/2010,Neftekhimik (Nizhnekamsk),79.0,1.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,Skater
19670,https://en.khl.ru/players/23656/,Tomislav Zanoski,Regular season 2016/2017,Medvescak (Zagreb),10.0,24.0,2.0,1.0,3.0,-8.0,...,,,,,,,,,,Skater
19671,https://en.khl.ru/players/23656/,Tomislav Zanoski,Regular season 2015/2016,Medvescak (Zagreb),10.0,15.0,3.0,0.0,3.0,1.0,...,,,,,,,,,,Skater
19672,https://en.khl.ru/players/11543/,Alexander Zevakhin,Regular season 2009/2010,Severstal (Cherepovets),15.0,20.0,1.0,1.0,2.0,-5.0,...,,,,,,,,,,Skater


We would like to fix the floats in columns where we know the values are supposed to be integers. You can't score 3.5 goals, after all. However, we may run into a problem here, since converting a column to a plain integer type requires that it contains no NaN values.
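
A minimal illustration of that restriction, using a made-up series of goals rather than the real column:

```python
import numpy as np
import pandas as pd

goals = pd.Series([1.0, 0.0, np.nan])

try:
    goals.astype('int')
    conversion_failed = False
except ValueError:
    # pandas refuses to cast NaN to a plain integer.
    conversion_failed = True

# Without the missing value, the cast goes through.
goals_clean = goals.dropna().astype('int')

print(conversion_failed, goals_clean.tolist())
```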

At the same time, the off-season tournaments can mess up our data. If a player has only participated in off-season tournaments, the rest of his season statistics would end up with NaN values since the off-season statistics are not included. Therefore, let us drop such players from the data despite going through the hassle of including them while scraping.

In [10]:
# What off-season tournaments do we have?

players_season['Season'].unique()

array(['Regular season 2014/2015', 'Nadezhda Cup 2013/2014',
       'Regular season 2013/2014', 'Nadezhda Cup 2012/2013',
       'Regular season 2015/2016', 'Regular season 2017/2018',
       'Regular season 2016/2017', 'Playoffs 2010/2011',
       'Regular season 2010/2011', 'Playoffs 2009/2010',
       'Regular season 2009/2010', 'Playoffs 2008/2009',
       'Regular season 2008/2009', 'Regular season 2011/2012',
       'Playoffs 2020/2021', 'Regular season 2020/2021',
       'Playoffs 2019/2020', 'Regular season 2019/2020',
       'Playoffs 2018/2019', 'Regular season 2018/2019',
       'Playoffs 2017/2018', 'Playoffs 2016/2017', 'Playoffs 2015/2016',
       'Playoffs 2011/2012', 'Regular season 2012/2013',
       'Playoffs 2014/2015', 'Playoffs 2012/2013', 'Playoffs 2013/2014'],
      dtype=object)

In [11]:
# Apparently, only the Nadezhda Cup, which ran in the 2012/2013 and 2013/2014 seasons.

players_season = players_season[~players_season['Season'].isin(['Nadezhda Cup 2012/2013', 'Nadezhda Cup 2013/2014'])]

# We still have players left with no games recorded in the official matches.

players_season = players_season[players_season['GP'] > 0]

# Separating the 'Season' column into the type of season and the years would allow us to sort it more easily.
# Note that 'Year' ends up holding the type of season, while 'Season' keeps only the years.

players_season['Year'] = players_season['Season'].apply(lambda x: x[:-10])
players_season['Season'] = players_season['Season'].apply(lambda x: x[-9:])

# We should move the 'Role' and 'Year' columns to be after profile link and player name.

columns = players_season.columns

players_season = players_season[list(columns[:2]) + ['Role', 'Year'] + list(columns[2:-2])]

# The current column names are not very informative, are they?

header = ['Profile', 'Player', 'Role', 'Year', 'Season', 'Team', 'Number', 'Games', 'Goals', 'Assists', 'Points', 'Plus_minus',
          'Plus', 'Minus', 'Penalties', 'Goals_even', 'Goals_powerplay', 'Goals_shorthanded', 'Goals_overtime',
          'Game_winning_goals', 'Game_winning_shootouts', 'Shots', 'Shots_percentage', 'Shots_game', 'Faceoffs', 'Faceoffs_won',
          'Faceoffs_percentage', 'Icetime_game', 'Shifts_game', 'Hits', 'Shots_blocked', 'Penalties_against', 'Wins',
          'Losses', 'Shootouts', 'Goals_against', 'Saves', 'Saves_percentage', 'Goals_against_average', 'Shutouts', 'Icetime']

players_season.columns = header

# Cleaning up the dataframe.

players_season = players_season.reset_index(drop=True)
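
The slices `x[:-10]` and `x[-9:]` used above rely on every label ending in a nine-character year range. As an alternative sketch, the same split can be done at the last space with `str.rsplit`. The column names 'Tournament' and 'Years' are chosen here only for illustration and are not the ones used in the notebook.

```python
import pandas as pd

labels = pd.Series(['Regular season 2014/2015', 'Playoffs 2010/2011', 'Nadezhda Cup 2013/2014'])

# Split once from the right, at the last space: everything before it is the
# tournament type, everything after it is the year range.
parts = labels.str.rsplit(' ', n=1, expand=True)
parts.columns = ['Tournament', 'Years']

print(parts)
```

Both approaches give the same result for the labels in this dataset; the split simply does not depend on the length of the year range.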

In [12]:
players_season

Unnamed: 0,Profile,Player,Role,Year,Season,Team,Number,Games,Goals,Assists,...,Penalties_against,Wins,Losses,Shootouts,Goals_against,Saves,Saves_percentage,Goals_against_average,Shutouts,Icetime
0,https://en.khl.ru/players/16673/,Sergei Abramov,Skater,Regular season,2014/2015,Amur (Khabarovsk),93.0,13.0,1.0,0.0,...,1.0,,,,,,,,,
1,https://en.khl.ru/players/16673/,Sergei Abramov,Skater,Regular season,2013/2014,Amur (Khabarovsk),91.0,12.0,0.0,0.0,...,,,,,,,,,,
2,https://en.khl.ru/players/16462/,Maxim Alyapkin,Goalie,Regular season,2015/2016,Torpedo (Nizhny Novgorod Region),31.0,2.0,0.0,0.0,...,,1.0,1.0,0.0,3.0,10.0,76.9,2.98,0.0,60:25
3,https://en.khl.ru/players/16462/,Maxim Alyapkin,Goalie,Regular season,2014/2015,Torpedo (Nizhny Novgorod Region),31.0,1.0,0.0,0.0,...,,0.0,1.0,0.0,2.0,9.0,81.8,3.5,0.0,34:16
4,https://en.khl.ru/players/19200/,Dmitry Ambrozheichik,Skater,Regular season,2017/2018,Dinamo (Minsk),63.0,8.0,0.0,0.0,...,0.0,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18523,https://en.khl.ru/players/16217/,Airat Ziazov,Skater,Regular season,2009/2010,Neftekhimik (Nizhnekamsk),79.0,1.0,0.0,0.0,...,,,,,,,,,,
18524,https://en.khl.ru/players/23656/,Tomislav Zanoski,Skater,Regular season,2016/2017,Medvescak (Zagreb),10.0,24.0,2.0,1.0,...,6.0,,,,,,,,,
18525,https://en.khl.ru/players/23656/,Tomislav Zanoski,Skater,Regular season,2015/2016,Medvescak (Zagreb),10.0,15.0,3.0,0.0,...,2.0,,,,,,,,,
18526,https://en.khl.ru/players/11543/,Alexander Zevakhin,Skater,Regular season,2009/2010,Severstal (Cherepovets),15.0,20.0,1.0,1.0,...,,,,,,,,,,


Can we now change the data from floats to integers? Not really.

Most of our columns still have many NaN values because different statistics are tracked for skaters and goalies. And integers do not like having NaN values in them. This could be worked around, but such an approach is not necessarily the best one.

We could, of course, leave it as it is or replace missing values with zeros. However, analysing skaters and goalies together in the future sounds like a bad analysis design since the two groups are very distinct. Therefore, let us separate the data into two distinct dataframes and store skater statistics and goalie statistics separately. That way, we can also change floats into integers within each dataframe separately.
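
For reference, the workaround mentioned above could look like the sketch below, which uses pandas' nullable integer dtype on a made-up column of wins. It is shown only for comparison; the notebook goes with separate dataframes instead.

```python
import numpy as np
import pandas as pd

# NaN for a skater, actual values for two goalies (toy data).
wins = pd.Series([np.nan, 1.0, 0.0])

# 'Int64' (capital I) is pandas' nullable integer dtype: whole numbers plus <NA>.
wins_nullable = wins.astype('Int64')

print(wins_nullable)
```

The values become integers while the missing entry stays missing, but skater and goalie rows would still share one table full of columns that do not apply to them.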

In [13]:
# Thankfully, we have a convenient column to separate on.

skaters_season = players_season[players_season['Role'] == 'Skater'].reset_index(drop=True)
goalies_season = players_season[players_season['Role'] == 'Goalie'].reset_index(drop=True)

In [14]:
# All columns either null or non-null!

skaters_season.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17053 entries, 0 to 17052
Data columns (total 41 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Profile                 17053 non-null  object 
 1   Player                  17053 non-null  object 
 2   Role                    17053 non-null  object 
 3   Year                    17053 non-null  object 
 4   Season                  17053 non-null  object 
 5   Team                    17053 non-null  object 
 6   Number                  16126 non-null  float64
 7   Games                   17053 non-null  float64
 8   Goals                   17053 non-null  float64
 9   Assists                 17053 non-null  float64
 10  Points                  17053 non-null  float64
 11  Plus_minus              17053 non-null  float64
 12  Plus                    17053 non-null  float64
 13  Minus                   17053 non-null  float64
 14  Penalties               17053 non-null

In [15]:
# And same here as well!

goalies_season.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1475 entries, 0 to 1474
Data columns (total 41 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Profile                 1475 non-null   object 
 1   Player                  1475 non-null   object 
 2   Role                    1475 non-null   object 
 3   Year                    1475 non-null   object 
 4   Season                  1475 non-null   object 
 5   Team                    1475 non-null   object 
 6   Number                  1405 non-null   float64
 7   Games                   1475 non-null   float64
 8   Goals                   1475 non-null   float64
 9   Assists                 1475 non-null   float64
 10  Points                  0 non-null      float64
 11  Plus_minus              0 non-null      float64
 12  Plus                    0 non-null      float64
 13  Minus                   0 non-null      float64
 14  Penalties               1475 non-null   

Finally, we can remove the null columns and change the non-null ones to integers. Actually, in a few cases we need to change the columns to floats instead, for things such as '%SOG' (percentage of shots on goal that scored), which seem to be stored as objects right now.

At the same time, some columns that we want to store as floats have '-' as their value, which cannot be converted into a float. This is because they are obtained by dividing one statistic by another, and the division is not always possible (for example, a faceoff percentage when no faceoffs were taken). We are going to replace those values with NaN.
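
As a sketch of an alternative to replacing '-' by hand, `pd.to_numeric` with `errors='coerce'` turns every unparseable value into NaN in one step. The series below is made up, and this is not the approach used in the cells that follow.

```python
import pandas as pd

faceoff_pct = pd.Series(['42.9', '-', '0.0', '20.0'])

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
as_float = pd.to_numeric(faceoff_pct, errors='coerce')

print(as_float)
```

The trade-off is that coercion would also silently hide any other unexpected string, whereas replacing only '-' makes such surprises fail loudly during the conversion.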

Moreover, the 'TOI/G' and 'TOI' columns are stored in the format 'minutes:seconds' and are thus not convertible to floats. A new column will be added for each of them, holding the time as an integer number of seconds.
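
The conversion itself is simple arithmetic: minutes times 60 plus seconds. A small helper, named `to_seconds` here purely for illustration, makes that explicit by splitting on the colon instead of slicing by position.

```python
def to_seconds(clock):
    """Convert a 'minutes:seconds' string such as '60:25' into whole seconds."""
    minutes, seconds = clock.split(':')
    return int(minutes) * 60 + int(seconds)


print(to_seconds('6:57'))    # 6 * 60 + 57
print(to_seconds('261:14'))  # the minutes part may exceed 59 for goalies
```

Such a helper could be passed to `.apply()` in place of a lambda, once the '-' placeholders have been replaced with '0:00'.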

In [16]:
# We can copy paste parts of the previously created list of column names instead of typing them up manually.

print(header)

['Profile', 'Player', 'Role', 'Year', 'Season', 'Team', 'Number', 'Games', 'Goals', 'Assists', 'Points', 'Plus_minus', 'Plus', 'Minus', 'Penalties', 'Goals_even', 'Goals_powerplay', 'Goals_shorthanded', 'Goals_overtime', 'Game_winning_goals', 'Game_winning_shootouts', 'Shots', 'Shots_percentage', 'Shots_game', 'Faceoffs', 'Faceoffs_won', 'Faceoffs_percentage', 'Icetime_game', 'Shifts_game', 'Hits', 'Shots_blocked', 'Penalties_against', 'Wins', 'Losses', 'Shootouts', 'Goals_against', 'Saves', 'Saves_percentage', 'Goals_against_average', 'Shutouts', 'Icetime']


In [17]:
# Starting with the more numerous skaters.

skaters_season.drop(['Wins', 'Losses', 'Shootouts', 'Goals_against', 'Saves', 'Saves_percentage',
                     'Goals_against_average','Shutouts', 'Icetime'], axis=1, inplace=True)

# A list of columns to be changed into integers.

skaters_int = ['Games', 'Goals', 'Assists', 'Points', 'Plus_minus', 'Plus', 'Minus', 'Penalties', 'Goals_even',
               'Goals_powerplay', 'Goals_shorthanded', 'Goals_overtime', 'Game_winning_goals', 'Game_winning_shootouts',
               'Shots', 'Faceoffs', 'Faceoffs_won', 'Hits', 'Shots_blocked', 'Penalties_against']

skaters_season[skaters_int] = skaters_season[skaters_int].fillna(0).astype('int')

# A list of columns to be changed into floats.
# Remember, we need to fix the '-' symbol and cannot change the 'Icetime_game' column.

skaters_float = ['Shots_percentage', 'Shots_game', 'Faceoffs_percentage', 'Shifts_game']

skaters_season[skaters_float] = skaters_season[skaters_float].replace('-', np.nan).astype('float')

# The player's number is currently stored as float, let us change it into object.

skaters_season['Number'] = skaters_season['Number'].astype('object')

# Finally, let us add the icetime in seconds. For icetime, we are okay with having zero values instead of NaN.
# The 'Icetime_game' column sometimes has '-' as its value, we will need to fix that first.

skaters_season['Icetime_game'] = skaters_season['Icetime_game'].replace('-', '0:00')

skaters_season['Icetime_game_seconds'] = skaters_season['Icetime_game'].apply(lambda x: int(x[:-3]) * 60 + int(x[-2:]))

# Moving the new column to be right after our existing icetime.

header_skaters = skaters_season.columns

skaters_season = skaters_season[[col for col in header_skaters[:28]] + ['Icetime_game_seconds'] +
                                [col for col in header_skaters[28:-1]]]

In [18]:
# Now for the goalies.

goalies_season.drop(['Points', 'Plus_minus', 'Plus', 'Minus', 'Goals_even', 'Goals_powerplay', 'Goals_shorthanded',
                     'Goals_overtime', 'Game_winning_goals', 'Game_winning_shootouts', 'Shots_percentage', 'Shots_game',
                     'Faceoffs', 'Faceoffs_won', 'Faceoffs_percentage', 'Icetime_game','Shifts_game', 'Hits',
                     'Shots_blocked', 'Penalties_against'], axis=1, inplace=True)

# A list of columns to be changed into integers.

goalies_int = ['Games', 'Goals', 'Assists', 'Penalties', 'Shots', 'Wins', 'Losses', 'Shootouts', 'Goals_against',
               'Saves', 'Shutouts']

goalies_season[goalies_int] = goalies_season[goalies_int].fillna(0).astype('int')

# A list of columns to be changed into floats.
# We still need to fix the '-' symbol and cannot change the 'Icetime' column.

goalies_float = ['Saves_percentage', 'Goals_against_average']

goalies_season[goalies_float] = goalies_season[goalies_float].replace('-', np.nan).astype('float')

# Still need to change the player's number into an object.

goalies_season['Number'] = goalies_season['Number'].astype('object')

# At least we will not have to move the icetime in seconds now, since icetime is already at the end of the dataframe.

goalies_season['Icetime'] = goalies_season['Icetime'].replace('-', '0:00')

goalies_season['Icetime_seconds'] = goalies_season['Icetime'].apply(lambda x: int(x[:-3]) * 60 + int(x[-2:]))

In [19]:
skaters_season

Unnamed: 0,Profile,Player,Role,Year,Season,Team,Number,Games,Goals,Assists,...,Shots_game,Faceoffs,Faceoffs_won,Faceoffs_percentage,Icetime_game,Icetime_game_seconds,Shifts_game,Hits,Shots_blocked,Penalties_against
0,https://en.khl.ru/players/16673/,Sergei Abramov,Skater,Regular season,2014/2015,Amur (Khabarovsk),93.0,13,1,0,...,0.8,0,0,,6:57,417,9.3,1,2,1
1,https://en.khl.ru/players/16673/,Sergei Abramov,Skater,Regular season,2013/2014,Amur (Khabarovsk),91.0,12,0,0,...,1.2,1,0,0.0,6:15,375,8.0,0,0,0
2,https://en.khl.ru/players/19200/,Dmitry Ambrozheichik,Skater,Regular season,2017/2018,Dinamo (Minsk),63.0,8,0,0,...,0.8,0,0,,6:00,360,8.2,2,2,0
3,https://en.khl.ru/players/19200/,Dmitry Ambrozheichik,Skater,Regular season,2016/2017,Dinamo (Minsk),15.0,20,3,1,...,1.1,7,3,42.9,9:43,583,12.7,10,7,9
4,https://en.khl.ru/players/19200/,Dmitry Ambrozheichik,Skater,Regular season,2015/2016,Dinamo (Minsk),24.0,11,1,0,...,0.5,0,0,,4:43,283,8.5,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17048,https://en.khl.ru/players/16217/,Airat Ziazov,Skater,Regular season,2009/2010,Neftekhimik (Nizhnekamsk),79.0,1,0,0,...,1.0,0,0,,4:15,255,7.0,0,0,0
17049,https://en.khl.ru/players/23656/,Tomislav Zanoski,Skater,Regular season,2016/2017,Medvescak (Zagreb),10.0,24,2,1,...,1.2,5,1,20.0,11:31,691,14.0,8,6,6
17050,https://en.khl.ru/players/23656/,Tomislav Zanoski,Skater,Regular season,2015/2016,Medvescak (Zagreb),10.0,15,3,0,...,0.7,0,0,,7:36,456,11.2,8,2,2
17051,https://en.khl.ru/players/11543/,Alexander Zevakhin,Skater,Regular season,2009/2010,Severstal (Cherepovets),15.0,20,1,1,...,1.0,3,0,0.0,10:07,607,12.7,0,0,0


In [20]:
goalies_season

Unnamed: 0,Profile,Player,Role,Year,Season,Team,Number,Games,Goals,Assists,...,Wins,Losses,Shootouts,Goals_against,Saves,Saves_percentage,Goals_against_average,Shutouts,Icetime,Icetime_seconds
0,https://en.khl.ru/players/16462/,Maxim Alyapkin,Goalie,Regular season,2015/2016,Torpedo (Nizhny Novgorod Region),31.0,2,0,0,...,1,1,0,3,10,76.9,2.98,0,60:25,3625
1,https://en.khl.ru/players/16462/,Maxim Alyapkin,Goalie,Regular season,2014/2015,Torpedo (Nizhny Novgorod Region),31.0,1,0,0,...,0,1,0,2,9,81.8,3.50,0,34:16,2056
2,https://en.khl.ru/players/16898/,Artyom Artemyev,Goalie,Regular season,2015/2016,Sochi (Sochi),31.0,1,0,0,...,0,0,0,3,9,75.0,3.36,0,53:38,3218
3,https://en.khl.ru/players/16898/,Artyom Artemyev,Goalie,Regular season,2014/2015,Atlant (Moscow Region),31.0,5,0,0,...,1,2,1,9,103,92.0,2.07,0,261:14,15674
4,https://en.khl.ru/players/16898/,Artyom Artemyev,Goalie,Regular season,2013/2014,Severstal (Cherepovets),52.0,2,0,0,...,0,0,0,3,13,81.2,8.89,0,20:15,1215
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1470,https://en.khl.ru/players/19184/,Artyom Zagidulin,Goalie,Regular season,2015/2016,Metallurg (Magnitogorsk),1.0,1,0,0,...,0,1,0,1,13,92.9,1.81,0,33:05,1985
1471,https://en.khl.ru/players/2820/,Sergei Zvyagin,Goalie,Regular season,2009/2010,Vityaz (Moscow Region),40.0,7,0,0,...,1,3,0,20,131,86.8,4.54,0,264:30,15870
1472,https://en.khl.ru/players/2820/,Sergei Zvyagin,Goalie,Playoffs,2008/2009,Lokomotiv (Yaroslavl),40.0,1,0,0,...,0,0,0,0,6,100.0,0.00,0,20:00,1200
1473,https://en.khl.ru/players/2820/,Sergei Zvyagin,Goalie,Regular season,2008/2009,Lokomotiv (Yaroslavl),40.0,5,0,0,...,3,1,0,7,77,91.7,1.81,0,231:36,13896


Everything seems to be in order, good job us! Now, for the best part.

In [21]:
skaters_season.to_csv('../data/skaters_season.csv', encoding='utf8', index=False)
goalies_season.to_csv('../data/goalies_season.csv', encoding='utf8', index=False)
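
One thing worth keeping in mind when loading these files later: CSV does not store dtypes, so they are inferred again on reading. The sketch below, using a made-up dataframe and an in-memory buffer instead of a real file, suggests that integer columns come back as integers, while a column like 'Number' (object with missing values) is read back as float.

```python
import io

import pandas as pd

toy = pd.DataFrame({
    'Player': ['A', 'B'],
    'Number': pd.Series([93.0, None]).astype('object'),
    'Games': [13, 12],
})

# Write to an in-memory buffer instead of a file, then read it back.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored.dtypes)
```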