In [1]:
# Importing standard packages for data exploration and processing.

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

This notebook is going to focus on processing the players' career statistics only. The other three files will be processed in separate notebooks.

In [2]:
# Unlike in the Stage 1 notebooks, we are going to create new variables rather than perform the operations in-place here.
# The reason is that we might need to review the original data during processing.

raw_players_career = pd.read_csv('../raw_data/raw_players_career.csv')

In [3]:
# Does everything seem to be alright with the data?

raw_players_career

Unnamed: 0,URL,Player name,Season,Tournament / Team,№,GP,G,Assists,PTS,+/-,...,FOA,W,L,SOP,GA,Sv,%Sv,GAA,SO,TOI
0,https://en.khl.ru/players/16673/,Sergei Abramov,KHL Summary,Regular season:,,25.0,1.0,0.0,1.0,-4.0,...,1.0,,,,,,,,,
1,https://en.khl.ru/players/16673/,Sergei Abramov,KHL Summary,Nadezhda Cup:,,2.0,0.0,0.0,0.0,0.0,...,0.0,,,,,,,,,
2,https://en.khl.ru/players/16673/,Sergei Abramov,KHL Summary,KHL Total:,,25.0,1.0,0.0,1.0,-4.0,...,1.0,,,,,,,,,
3,https://en.khl.ru/players/16462/,Maxim Alyapkin,KHL Summary,Regular season:,,3.0,0.0,0.0,,,...,,1.0,2.0,0.0,5.0,19.0,79.2,3.17,0.0,94:41
4,https://en.khl.ru/players/16462/,Maxim Alyapkin,KHL Summary,KHL Total:,,3.0,0.0,0.0,,,...,,1.0,2.0,0.0,5.0,19.0,79.2,3.17,0.0,94:41
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8969,https://en.khl.ru/players/16217/,Airat Ziazov,KHL Summary,KHL Total:,,79.0,6.0,10.0,16.0,-4.0,...,6.0,,,,,,,,,
8970,https://en.khl.ru/players/23656/,Tomislav Zanoski,KHL Summary,Regular season:,,39.0,5.0,1.0,6.0,-7.0,...,8.0,,,,,,,,,
8971,https://en.khl.ru/players/23656/,Tomislav Zanoski,KHL Summary,KHL Total:,,39.0,5.0,1.0,6.0,-7.0,...,8.0,,,,,,,,,
8972,https://en.khl.ru/players/11543/,Alexander Zevakhin,KHL Summary,Regular season:,,64.0,4.0,7.0,11.0,-3.0,...,0.0,,,,,,,,,


We can already see that there are some issues with missing data and integers stored as floats.

In [4]:
# What would the summary tell us?

raw_players_career.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8974 entries, 0 to 8973
Data columns (total 39 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   URL                8974 non-null   object 
 1   Player name        8974 non-null   object 
 2   Season             8974 non-null   object 
 3   Tournament / Team  8974 non-null   object 
 4   №                  0 non-null      float64
 5   GP                 8974 non-null   float64
 6   G                  8974 non-null   float64
 7   Assists            8974 non-null   float64
 8   PTS                7859 non-null   float64
 9   +/-                7859 non-null   float64
 10  +                  7405 non-null   float64
 11  -                  7405 non-null   float64
 12  PIM                8974 non-null   float64
 13  ESG                7859 non-null   float64
 14  PPG                7859 non-null   float64
 15  SHG                7859 non-null   float64
 16  OTG                7859 

We can see that many columns have no missing data at all. Other columns, however, split cleanly into skaters (forwards and defencemen) and goalies.

For example, the season statistics appear to have 7859 rows of data for skaters and 1113 rows for goalies, for a total of 8972 rows. However, there are 8974 rows in the dataframe, so 2 rows are unaccounted for in either group.

Let us find out who is messing up our data. Icetime seems like a good indicator, since it must be present for every player who has recorded a match during that season, and it is stored differently for skaters (average icetime per match) and goalies (total icetime per season).
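As a quick illustration of the idea, the sketch below labels each row by which icetime column is filled in. It runs on a small made-up frame: only the column names 'TOI/G' and 'TOI' come from our data, while the players and values are hypothetical, so it shows the technique rather than the real counts.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: only the column names mirror the real dataframe.
toy = pd.DataFrame({
    'Player name': ['Skater A', 'Goalie B', 'Unknown C'],
    'TOI/G': ['12:30', np.nan, np.nan],   # average icetime per match (skaters)
    'TOI': [np.nan, '94:41', np.nan],     # total icetime per season (goalies)
})

has_skater_time = toy['TOI/G'].notna()
has_goalie_time = toy['TOI'].notna()

# Label each row by which icetime column is filled in.
# np.select picks the first matching condition, so 'both' must come first.
toy['Icetime_kind'] = np.select(
    [has_skater_time & has_goalie_time, has_skater_time, has_goalie_time],
    ['both', 'skater', 'goalie'],
    default='neither',
)

icetime_counts = toy['Icetime_kind'].value_counts().to_dict()
print(icetime_counts)
```

Rows labelled 'neither' are the ones we are after, and a 'both' label would point at a different kind of problem in the source data.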

In [5]:
# We need the rows for which both icetime columns are null.

raw_players_career[raw_players_career['TOI/G'].isnull() & raw_players_career['TOI'].isnull()]

Unnamed: 0,URL,Player name,Season,Tournament / Team,№,GP,G,Assists,PTS,+/-,...,FOA,W,L,SOP,GA,Sv,%Sv,GAA,SO,TOI
887,https://en.khl.ru/players/29144/,David Boldizar,KHL Summary,Regular season:,,5.0,0.0,0.0,,,...,,,,,,,,,,
888,https://en.khl.ru/players/29144/,David Boldizar,KHL Summary,KHL Total:,,5.0,0.0,0.0,,,...,,,,,,,,,,


So, David Boldizar from Slovan (Bratislava) is the culprit. I wonder what is going on with him. Thankfully, we have added each player's profile link so we can easily check the original data, and it turns out that the data was not stored properly on the website to begin with.

In [6]:
# What do we know about that specific player?

raw_players_career[raw_players_career['URL'] == 'https://en.khl.ru/players/29144/'].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 887 to 888
Data columns (total 39 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   URL                2 non-null      object 
 1   Player name        2 non-null      object 
 2   Season             2 non-null      object 
 3   Tournament / Team  2 non-null      object 
 4   №                  0 non-null      float64
 5   GP                 2 non-null      float64
 6   G                  2 non-null      float64
 7   Assists            2 non-null      float64
 8   PTS                0 non-null      float64
 9   +/-                0 non-null      float64
 10  +                  0 non-null      float64
 11  -                  0 non-null      float64
 12  PIM                2 non-null      float64
 13  ESG                0 non-null      float64
 14  PPG                0 non-null      float64
 15  SHG                0 non-null      float64
 16  OTG                0 non-n

Most of the data is missing, and not because it is supposed to be a zero. After all, icetime cannot be zero. Therefore, we need to drop this player from our data altogether.
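Since the player is going to be dropped by profile link rather than by name, here is a small sketch of why that matters. The frame below is entirely hypothetical (made-up names and example.com links); it only assumes the 'URL' and 'Player name' columns from our data.

```python
import pandas as pd

# Hypothetical toy data: two different players who happen to share a name.
toy = pd.DataFrame({
    'URL': ['https://example.com/players/1/',
            'https://example.com/players/2/',
            'https://example.com/players/3/'],
    'Player name': ['Ivan Ivanov', 'Ivan Ivanov', 'Petr Petrov'],
})

# Count how many distinct profiles stand behind each name.
profiles_per_name = toy.groupby('Player name')['URL'].nunique()
shared_names = profiles_per_name[profiles_per_name > 1].index.tolist()
print(shared_names)

# Filtering by name removes both Ivanovs; filtering by URL removes only one row.
by_name = toy[toy['Player name'] != 'Ivan Ivanov']
by_url = toy[toy['URL'] != 'https://example.com/players/1/']
print(len(by_name), len(by_url))
```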

In [7]:
# Once again, we are using the profile link as the primary key, since different players may share a name.
# Taking an explicit copy lets us add columns later without triggering pandas' SettingWithCopyWarning.

players_career = raw_players_career[raw_players_career['URL'] != 'https://en.khl.ru/players/29144/'].copy()

Now we can create a new column indicating whether a player is a skater or a goalie. Let us use the icetime for the separation.

In [8]:
# Total ice time is only tracked for goalies, so skaters are supposed to have it as null.

players_career['Role'] = np.where(players_career['TOI'].isnull(), 'Skater', 'Goalie')


In [9]:
players_career

Unnamed: 0,URL,Player name,Season,Tournament / Team,№,GP,G,Assists,PTS,+/-,...,W,L,SOP,GA,Sv,%Sv,GAA,SO,TOI,Role
0,https://en.khl.ru/players/16673/,Sergei Abramov,KHL Summary,Regular season:,,25.0,1.0,0.0,1.0,-4.0,...,,,,,,,,,,Skater
1,https://en.khl.ru/players/16673/,Sergei Abramov,KHL Summary,Nadezhda Cup:,,2.0,0.0,0.0,0.0,0.0,...,,,,,,,,,,Skater
2,https://en.khl.ru/players/16673/,Sergei Abramov,KHL Summary,KHL Total:,,25.0,1.0,0.0,1.0,-4.0,...,,,,,,,,,,Skater
3,https://en.khl.ru/players/16462/,Maxim Alyapkin,KHL Summary,Regular season:,,3.0,0.0,0.0,,,...,1.0,2.0,0.0,5.0,19.0,79.2,3.17,0.0,94:41,Goalie
4,https://en.khl.ru/players/16462/,Maxim Alyapkin,KHL Summary,KHL Total:,,3.0,0.0,0.0,,,...,1.0,2.0,0.0,5.0,19.0,79.2,3.17,0.0,94:41,Goalie
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8969,https://en.khl.ru/players/16217/,Airat Ziazov,KHL Summary,KHL Total:,,79.0,6.0,10.0,16.0,-4.0,...,,,,,,,,,,Skater
8970,https://en.khl.ru/players/23656/,Tomislav Zanoski,KHL Summary,Regular season:,,39.0,5.0,1.0,6.0,-7.0,...,,,,,,,,,,Skater
8971,https://en.khl.ru/players/23656/,Tomislav Zanoski,KHL Summary,KHL Total:,,39.0,5.0,1.0,6.0,-7.0,...,,,,,,,,,,Skater
8972,https://en.khl.ru/players/11543/,Alexander Zevakhin,KHL Summary,Regular season:,,64.0,4.0,7.0,11.0,-3.0,...,,,,,,,,,,Skater


We would like to fix the floats in columns where we know the values are supposed to be integers. You can't score 3.5 goals, after all. However, we have a problem here, since converting a column to a NumPy integer type requires that there are no NaN values in it.

At the same time, the off-season tournaments can mess up our data. If a player has only participated in the off-season tournaments, the rest of his career statistics end up as NaN values, since the off-season statistics are not included. Therefore, let us drop such players from the data, despite having gone through the hassle of including them while scraping.
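To make the idea concrete, here is a minimal sketch of how players who only appear in off-season tournaments could be spotted. It uses a made-up frame with placeholder profile links, and it assumes that 'Nadezhda Cup:' is the only off-season tournament and that such a player has no other rows at all. In the real data such players may also carry a summary row with zero games, which a filter on 'GP' would catch.

```python
import pandas as pd

# Hypothetical toy data mirroring the 'URL' and 'Tournament / Team' columns.
toy = pd.DataFrame({
    'URL': ['p1', 'p1', 'p1', 'p2', 'p3', 'p3'],
    'Tournament / Team': ['Regular season:', 'Nadezhda Cup:', 'KHL Total:',
                          'Nadezhda Cup:', 'Regular season:', 'KHL Total:'],
})

off_season = ['Nadezhda Cup:']  # assumption: the full list of off-season tournaments

# Flag off-season rows, then check per player whether ALL of their rows are flagged.
is_off_season = toy['Tournament / Team'].isin(off_season)
only_off_season = is_off_season.groupby(toy['URL']).transform('all')

off_season_only_players = toy.loc[only_off_season, 'URL'].unique().tolist()
print(off_season_only_players)
```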

In [10]:
# What off-season tournaments do we have?

players_career['Tournament / Team'].unique()

array(['Regular season:', 'Nadezhda Cup:', 'KHL Total:', 'Playoffs:'],
      dtype=object)

In [11]:
# Apparently, only the Nadezhda Cup.

players_career = players_career[players_career['Tournament / Team'] != 'Nadezhda Cup:']

# We still have players left with no games recorded in the official matches.

players_career = players_career[players_career['GP'] > 0]

# The 'Season' and '№' columns do not tell us much the way they are.
# At the same time, 'Tournament / Team' would be better without the colon at the end.

players_career = players_career.drop(['Season', '№'], axis=1)

players_career['Tournament / Team'] = players_career['Tournament / Team'].str.rstrip(':')

# We should move the 'Role' column to be the third column in the dataframe, after the profile link and player name.

columns = players_career.columns

players_career = players_career[list(columns[:2]) + ['Role'] + list(columns[2:-1])]

# The current column names are not very informative, are they?

header = ['Profile', 'Player', 'Role', 'Season', 'Games', 'Goals', 'Assists', 'Points', 'Plus_minus', 'Plus', 'Minus',
         'Penalties', 'Goals_even', 'Goals_powerplay', 'Goals_shorthanded', 'Goals_overtime', 'Game_winning_goals',
         'Game_winning_shootouts', 'Shots', 'Shots_percentage', 'Shots_game', 'Faceoffs', 'Faceoffs_won',
         'Faceoffs_percentage', 'Icetime_game', 'Shifts_game', 'Hits', 'Shots_blocked', 'Penalties_against', 'Wins',
          'Losses', 'Shootouts', 'Goals_against', 'Saves', 'Saves_percentage', 'Goals_against_average', 'Shutouts', 'Icetime']

players_career.columns = header

# Cleaning up the dataframe.

players_career = players_career.reset_index(drop=True)

In [12]:
players_career

Unnamed: 0,Profile,Player,Role,Season,Games,Goals,Assists,Points,Plus_minus,Plus,...,Penalties_against,Wins,Losses,Shootouts,Goals_against,Saves,Saves_percentage,Goals_against_average,Shutouts,Icetime
0,https://en.khl.ru/players/16673/,Sergei Abramov,Skater,Regular season,25.0,1.0,0.0,1.0,-4.0,2.0,...,1.0,,,,,,,,,
1,https://en.khl.ru/players/16673/,Sergei Abramov,Skater,KHL Total,25.0,1.0,0.0,1.0,-4.0,2.0,...,1.0,,,,,,,,,
2,https://en.khl.ru/players/16462/,Maxim Alyapkin,Goalie,Regular season,3.0,0.0,0.0,,,,...,,1.0,2.0,0.0,5.0,19.0,79.2,3.17,0.0,94:41
3,https://en.khl.ru/players/16462/,Maxim Alyapkin,Goalie,KHL Total,3.0,0.0,0.0,,,,...,,1.0,2.0,0.0,5.0,19.0,79.2,3.17,0.0,94:41
4,https://en.khl.ru/players/19200/,Dmitry Ambrozheichik,Skater,Regular season,39.0,4.0,1.0,5.0,1.0,9.0,...,10.0,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8018,https://en.khl.ru/players/16217/,Airat Ziazov,Skater,KHL Total,79.0,6.0,10.0,16.0,-4.0,27.0,...,6.0,,,,,,,,,
8019,https://en.khl.ru/players/23656/,Tomislav Zanoski,Skater,Regular season,39.0,5.0,1.0,6.0,-7.0,15.0,...,8.0,,,,,,,,,
8020,https://en.khl.ru/players/23656/,Tomislav Zanoski,Skater,KHL Total,39.0,5.0,1.0,6.0,-7.0,15.0,...,8.0,,,,,,,,,
8021,https://en.khl.ru/players/11543/,Alexander Zevakhin,Skater,Regular season,64.0,4.0,7.0,11.0,-3.0,18.0,...,0.0,,,,,,,,,


Can we now change the data from floats to integers? Not really.

Most of our columns still have many NaN values because different statistics are tracked for skaters and goalies, and NumPy integer columns cannot hold NaN. This could be worked around, but such an approach is not necessarily the best one.

We could, of course, leave it as it is or replace missing values with zeros. However, analysing skaters and goalies together in the future sounds like a bad analysis design since the two groups are very distinct. Therefore, let us separate the data into two distinct dataframes and store skater statistics and goalie statistics separately. That way, we can also change floats into integers within each dataframe separately.
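For reference, one possible workaround for the NaN problem is pandas' nullable integer dtype, spelled 'Int64' with a capital I. The sketch below uses a made-up 'Goals' column to show both the failure and the workaround. We are not going down this route in the notebook, so treat it as an aside.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: goals for two skaters and a missing value for a goalie.
goals = pd.Series([1.0, 4.0, np.nan], name='Goals')

# A plain NumPy integer dtype cannot hold NaN, so the conversion raises.
try:
    goals.astype('int')
    plain_int_failed = False
except (ValueError, TypeError):
    plain_int_failed = True

# The nullable extension dtype 'Int64' keeps the gap as <NA> instead.
goals_nullable = goals.astype('Int64')
print(plain_int_failed)
print(goals_nullable)
```

This would keep everything in one dataframe, but it would not address the analysis concern above: skaters and goalies would still share a table full of columns that only apply to one of the two groups.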

In [13]:
# Thankfully, we have a convenient column to separate on.

skaters_career = players_career[players_career['Role'] == 'Skater'].reset_index(drop=True)
goalies_career = players_career[players_career['Role'] == 'Goalie'].reset_index(drop=True)

In [14]:
# Every column is now either entirely null or entirely non-null!

skaters_career.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7242 entries, 0 to 7241
Data columns (total 38 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Profile                 7242 non-null   object 
 1   Player                  7242 non-null   object 
 2   Role                    7242 non-null   object 
 3   Season                  7242 non-null   object 
 4   Games                   7242 non-null   float64
 5   Goals                   7242 non-null   float64
 6   Assists                 7242 non-null   float64
 7   Points                  7242 non-null   float64
 8   Plus_minus              7242 non-null   float64
 9   Plus                    7242 non-null   float64
 10  Minus                   7242 non-null   float64
 11  Penalties               7242 non-null   float64
 12  Goals_even              7242 non-null   float64
 13  Goals_powerplay         7242 non-null   float64
 14  Goals_shorthanded       7242 non-null   

In [15]:
# And same here as well!

goalies_career.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 781 entries, 0 to 780
Data columns (total 38 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Profile                 781 non-null    object 
 1   Player                  781 non-null    object 
 2   Role                    781 non-null    object 
 3   Season                  781 non-null    object 
 4   Games                   781 non-null    float64
 5   Goals                   781 non-null    float64
 6   Assists                 781 non-null    float64
 7   Points                  0 non-null      float64
 8   Plus_minus              0 non-null      float64
 9   Plus                    0 non-null      float64
 10  Minus                   0 non-null      float64
 11  Penalties               781 non-null    float64
 12  Goals_even              0 non-null      float64
 13  Goals_powerplay         0 non-null      float64
 14  Goals_shorthanded       0 non-null      fl

Finally, we can remove the null columns and change the non-null ones to integers. In a few cases, however, we need to convert the columns to floats instead, for things such as '%SOG' (the percentage of shots on goal that scored), which are stored as objects right now.

At the same time, some columns that we want to store as floats have '-' as their value, which cannot be converted into a float. This is because they are obtained by dividing one statistic by another, and the division is not always defined (for example, when the denominator is zero). We are going to replace those values with NaN.
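As an aside, the same effect can be sketched with `pd.to_numeric` and `errors='coerce'`, which turns every unparseable string into NaN in one step. The column below is made up for illustration.

```python
import pandas as pd

# Hypothetical toy column: percentages stored as strings, with '-' for undefined values.
shots_percentage = pd.Series(['12.5', '-', '0.0'], name='Shots_percentage')

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
as_float = pd.to_numeric(shots_percentage, errors='coerce')
print(as_float)
```

The trade-off is that coercion silently hides any other unexpected string as well, whereas replacing only '-' and then casting would raise an error if something else slipped into the data.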

Moreover, the 'TOI/G' and 'TOI' columns are stored in the format 'minutes:seconds' and are thus not directly convertible to floats. For each of them we will add a new column holding the icetime as an integer number of seconds.
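The arithmetic is simply minutes times 60 plus seconds. Here is a minimal vectorised sketch of it; `icetime_to_seconds` is a hypothetical helper name, not something defined elsewhere in this notebook, and it assumes every value is already in 'minutes:seconds' form (so any '-' has been replaced beforehand).

```python
import pandas as pd

def icetime_to_seconds(icetime: pd.Series) -> pd.Series:
    """Convert 'minutes:seconds' strings into whole seconds (hypothetical helper)."""
    # Split on the colon into two columns (0 = minutes, 1 = seconds) and cast both to int.
    parts = icetime.str.split(':', expand=True).astype('int')
    return parts[0] * 60 + parts[1]

# '94:41' is a goalie season total and '6:37' a skater per-match average from our data.
toy = pd.Series(['94:41', '6:37', '0:00'])
seconds = icetime_to_seconds(toy)
print(seconds.tolist())
```

Splitting on the colon, rather than slicing by position, means the helper does not care how many digits the minutes part has, which matters for goalie totals such as '6488:11'.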

In [16]:
# We can copy and paste parts of the previously created list of column names instead of typing them out manually.

print(header)

['Profile', 'Player', 'Role', 'Season', 'Games', 'Goals', 'Assists', 'Points', 'Plus_minus', 'Plus', 'Minus', 'Penalties', 'Goals_even', 'Goals_powerplay', 'Goals_shorthanded', 'Goals_overtime', 'Game_winning_goals', 'Game_winning_shootouts', 'Shots', 'Shots_percentage', 'Shots_game', 'Faceoffs', 'Faceoffs_won', 'Faceoffs_percentage', 'Icetime_game', 'Shifts_game', 'Hits', 'Shots_blocked', 'Penalties_against', 'Wins', 'Losses', 'Shootouts', 'Goals_against', 'Saves', 'Saves_percentage', 'Goals_against_average', 'Shutouts', 'Icetime']


In [17]:
# Starting with the more numerous skaters.

skaters_career.drop(['Wins', 'Losses', 'Shootouts', 'Goals_against', 'Saves', 'Saves_percentage',
                     'Goals_against_average','Shutouts', 'Icetime'], axis=1, inplace=True)

# A list of columns to be changed into integers.

skaters_int = ['Games', 'Goals', 'Assists', 'Points', 'Plus_minus', 'Plus', 'Minus', 'Penalties', 'Goals_even',
               'Goals_powerplay', 'Goals_shorthanded', 'Goals_overtime', 'Game_winning_goals', 'Game_winning_shootouts',
               'Shots', 'Faceoffs', 'Faceoffs_won', 'Hits', 'Shots_blocked', 'Penalties_against']

skaters_career[skaters_int] = skaters_career[skaters_int].astype('int')

# A list of columns to be changed into floats.
# Remember, we need to fix the '-' symbol and cannot change the 'Icetime_game' column.

skaters_float = ['Shots_percentage', 'Shots_game', 'Faceoffs_percentage', 'Shifts_game']

skaters_career[skaters_float] = skaters_career[skaters_float].replace('-', np.nan).astype('float')

# Finally, let us add the icetime in seconds. For icetime, we are okay with having zero values instead of NaN.
# The 'Icetime_game' column sometimes has '-' as its value, so we need to fix that first.

skaters_career['Icetime_game'] = skaters_career['Icetime_game'].replace('-', '0:00')

skaters_career['Icetime_game_seconds'] = skaters_career['Icetime_game'].apply(lambda x: int(x[:-3]) * 60 + int(x[-2:]))

# Moving the new column to be right after our existing icetime.

header_skaters = skaters_career.columns

skaters_career = skaters_career[list(header_skaters[:25]) + ['Icetime_game_seconds'] +
                                list(header_skaters[25:-1])]

In [18]:
# Now for the goalies.

goalies_career.drop(['Points', 'Plus_minus', 'Plus', 'Minus', 'Goals_even', 'Goals_powerplay', 'Goals_shorthanded',
                     'Goals_overtime', 'Game_winning_goals', 'Game_winning_shootouts', 'Shots_percentage', 'Shots_game',
                     'Faceoffs', 'Faceoffs_won', 'Faceoffs_percentage', 'Icetime_game','Shifts_game', 'Hits',
                     'Shots_blocked', 'Penalties_against'], axis=1, inplace=True)

# A list of columns to be changed into integers.

goalies_int = ['Games', 'Goals', 'Assists', 'Penalties', 'Shots', 'Wins', 'Losses', 'Shootouts', 'Goals_against',
               'Saves', 'Shutouts']

goalies_career[goalies_int] = goalies_career[goalies_int].astype('int')

# A list of columns to be changed into floats.
# We still need to fix the '-' symbol and cannot change the 'Icetime' column.

goalies_float = ['Saves_percentage', 'Goals_against_average']

goalies_career[goalies_float] = goalies_career[goalies_float].replace('-', np.nan).astype('float')

# At least we will not have to move the icetime in seconds now, since icetime is already at the end of the dataframe.

goalies_career['Icetime'] = goalies_career['Icetime'].replace('-', '0:00')

goalies_career['Icetime_seconds'] = goalies_career['Icetime'].apply(lambda x: int(x[:-3]) * 60 + int(x[-2:]))

In [19]:
skaters_career

Unnamed: 0,Profile,Player,Role,Season,Games,Goals,Assists,Points,Plus_minus,Plus,...,Shots_game,Faceoffs,Faceoffs_won,Faceoffs_percentage,Icetime_game,Icetime_game_seconds,Shifts_game,Hits,Shots_blocked,Penalties_against
0,https://en.khl.ru/players/16673/,Sergei Abramov,Skater,Regular season,25,1,0,1,-4,2,...,1.0,1,0,0.0,6:37,397,8.7,1,2,1
1,https://en.khl.ru/players/16673/,Sergei Abramov,Skater,KHL Total,25,1,0,1,-4,2,...,1.0,1,0,0.0,6:37,397,8.7,1,2,1
2,https://en.khl.ru/players/19200/,Dmitry Ambrozheichik,Skater,Regular season,39,4,1,5,1,9,...,0.8,7,3,42.9,7:33,453,10.6,13,9,10
3,https://en.khl.ru/players/19200/,Dmitry Ambrozheichik,Skater,KHL Total,39,4,1,5,1,9,...,0.8,7,3,42.9,7:33,453,10.6,13,9,10
4,https://en.khl.ru/players/13714/,Vitaly Anikeyenko,Skater,Regular season,144,14,35,49,45,121,...,1.8,3,0,0.0,20:15,1215,22.8,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7237,https://en.khl.ru/players/16217/,Airat Ziazov,Skater,KHL Total,79,6,10,16,-4,27,...,1.3,771,356,46.2,11:06,666,14.9,7,5,6
7238,https://en.khl.ru/players/23656/,Tomislav Zanoski,Skater,Regular season,39,5,1,6,-7,15,...,1.0,5,1,20.0,10:01,601,12.9,16,8,8
7239,https://en.khl.ru/players/23656/,Tomislav Zanoski,Skater,KHL Total,39,5,1,6,-7,15,...,1.0,5,1,20.0,10:01,601,12.9,16,8,8
7240,https://en.khl.ru/players/11543/,Alexander Zevakhin,Skater,Regular season,64,4,7,11,-3,18,...,1.1,49,18,36.7,12:04,724,14.7,0,0,0


In [20]:
goalies_career

Unnamed: 0,Profile,Player,Role,Season,Games,Goals,Assists,Penalties,Shots,Wins,Losses,Shootouts,Goals_against,Saves,Saves_percentage,Goals_against_average,Shutouts,Icetime,Icetime_seconds
0,https://en.khl.ru/players/16462/,Maxim Alyapkin,Goalie,Regular season,3,0,0,0,24,1,2,0,5,19,79.2,3.17,0,94:41,5681
1,https://en.khl.ru/players/16462/,Maxim Alyapkin,Goalie,KHL Total,3,0,0,0,24,1,2,0,5,19,79.2,3.17,0,94:41,5681
2,https://en.khl.ru/players/16898/,Artyom Artemyev,Goalie,Regular season,8,0,0,0,140,1,2,1,15,125,89.3,2.69,0,335:07,20107
3,https://en.khl.ru/players/16898/,Artyom Artemyev,Goalie,KHL Total,8,0,0,0,140,1,2,1,15,125,89.3,2.69,0,335:07,20107
4,https://en.khl.ru/players/14909/,Danila Alistratov,Goalie,Regular season,130,0,2,4,3111,41,54,13,312,2799,90.0,2.89,5,6488:11,389291
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
776,https://en.khl.ru/players/2820/,Sergei Zvyagin,Goalie,Regular season,12,0,0,0,235,4,4,0,27,208,88.5,3.27,0,496:06,29766
777,https://en.khl.ru/players/2820/,Sergei Zvyagin,Goalie,Playoffs,1,0,0,0,6,0,0,0,0,6,100.0,0.00,0,20:00,1200
778,https://en.khl.ru/players/2820/,Sergei Zvyagin,Goalie,KHL Total,13,0,0,0,241,4,4,0,27,214,88.8,3.14,0,516:06,30966
779,https://en.khl.ru/players/14930/,Alexander Zalivin,Goalie,Regular season,6,0,0,0,178,0,3,2,16,162,91.0,3.10,0,309:40,18580


Everything seems to be in order, good job us! Now, for the best part.

In [21]:
skaters_career.to_csv('../data/skaters_career.csv', encoding='utf8', index=False)
goalies_career.to_csv('../data/goalies_career.csv', encoding='utf8', index=False)
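
Before moving on, it can be worth checking that a saved file reads back with the same values and types. The sketch below does this for a made-up one-row frame and writes to an in-memory buffer rather than to '../data/', so it does not touch the real files.

```python
import io
import pandas as pd

# Hypothetical toy frame standing in for the cleaned goalie data.
toy = pd.DataFrame({
    'Player': ['Goalie B'],
    'Games': [3],
    'Saves_percentage': [79.2],
    'Icetime': ['94:41'],
})

# Write to an in-memory buffer instead of a file, then read it back.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# DataFrame.equals compares both the values and the column dtypes.
round_trip_ok = restored.equals(toy)
print(round_trip_ok)
```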