# Player values
In this notebook we explore the available player market-value data, which are stored as a pandas DataFrame in pickle format.

In [1]:
import pandas as pd
values = pd.read_pickle('../data/players_for_ratings_based_predictions.pkl')

In [2]:
values.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9610 entries, 5665 to 160599
Data columns (total 14 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   player_id      9610 non-null   int32         
 1   player_name    9610 non-null   object        
 2   player_role    9607 non-null   category      
 3   birth          9610 non-null   datetime64[ns]
 4   nationalities  9610 non-null   object        
 5   height         9610 non-null   float64       
 6   foot           9610 non-null   object        
 7   value          9610 non-null   float64       
 8   league         9610 non-null   object        
 9   season         9610 non-null   int32         
 10  value_at       9610 non-null   datetime64[ns]
 11  nat1           9610 non-null   object        
 12  nat2           9610 non-null   object        
 13  age_years      9610 non-null   float64       
dtypes: category(1), datetime64[ns](2), float64(3), int32(2), object(6)
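The `info()` summary reports 9607 non-null `player_role` entries out of 9610 rows, so three players have no role recorded; every other column is complete. A minimal sketch for listing missing values per column and pulling out the affected rows. The helper names `missing_summary` and `rows_missing` are hypothetical, introduced only for illustration:

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Number of null entries (NaN/None/NaT) per column, only for columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


def rows_missing(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Rows of `df` whose `column` is null; this also works for categorical columns."""
    return df[df[column].isna()]


# Usage in this notebook (assumes `values` was loaded as in In [1]):
# missing_summary(values)              # per info(), only player_role should appear, with 3
# rows_missing(values, "player_role")
```

Note that `isna()` does not flag the `-` placeholder used in `nat2`, since it is a regular string rather than a null value.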


Here is a short description of the columns of the `values` DataFrame:
- `player_id` is the TransferMarkt id of the player
- `player_name`, `player_role`, `height`, `foot` are self-explanatory
- `birth` is the player's date of birth

In [7]:
values['birth'].describe()



count                    9610
unique                   2616
top       1994-05-27 00:00:00
freq                       22
first     1977-01-02 00:00:00
last      2001-08-16 00:00:00
Name: birth, dtype: object
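The `unique`/`top`/`freq`/`first`/`last` rows above come from older pandas versions, which summarise a datetime column as if it were categorical (and warn that this behaviour is deprecated); pandas 2.x instead reports count, mean, min, quartiles and max. A version-independent sketch of the same statistics, shown here on the `birth` values of the five rows printed at the end of this notebook:

```python
import pandas as pd

# Birth dates of the five players shown in the last cell of this notebook
birth = pd.Series(
    pd.to_datetime(["1992-01-24", "1992-01-08", "1989-09-05", "1993-03-27", "1990-07-12"]),
    name="birth",
)

summary = {
    "count": int(birth.count()),
    "unique": int(birth.nunique()),
    "first": birth.min(),
    "last": birth.max(),
    # mode() returns all values tied for the highest count, sorted; keep the earliest
    "top": birth.mode().iloc[0],
    "freq": int(birth.value_counts().iloc[0]),
}
print(summary)
```

On the full data the same dictionary can be built from `values["birth"]`.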

- `nationalities` is a string containing one or more nationalities of the player, joined by `-` (e.g. `Argentina-Italy`)
- `nat1` and `nat2` hold the first and second nationality taken from `nationalities`; `nat2` is `-` when the player has only one

In [9]:
values.loc[:,['nationalities','nat1','nat2']].head()

| | nationalities | nat1 | nat2 |
|---|---|---|---|
| 5665 | Serbia | Serbia | - |
| 5667 | Portugal | Portugal | - |
| 5671 | Spain | Spain | - |
| 5674 | Argentina-Italy | Argentina | Italy |
| 5676 | Portugal-CapeVerde | Portugal | CapeVerde |
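The split can be reproduced with the pandas string accessor. This is a sketch, not necessarily how the columns were originally built: it assumes nationalities are joined by `-` and that `-` marks a missing second nationality, as in the rows above.

```python
import pandas as pd

# The five rows shown above
sample = pd.DataFrame(
    {"nationalities": ["Serbia", "Portugal", "Spain", "Argentina-Italy", "Portugal-CapeVerde"]},
    index=[5665, 5667, 5671, 5674, 5676],
)

# str.partition always yields three columns: the text before the first "-",
# the separator itself, and the remainder ("" when there is no "-")
parts = sample["nationalities"].str.partition("-")
sample["nat1"] = parts[0]
sample["nat2"] = parts[2].replace("", "-")  # "-" placeholder, as in the notebook
print(sample)
```

Splitting on the first `-` only means a third nationality, if any, would stay attached to `nat2`, and a country name that itself contains a hyphen would be cut in two; both cases are worth checking on the full data.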


- `value` contains the market value of the player recorded at time `value_at`
- `league` and `season` are the league and season in which the player is competing at time `value_at`. `season` should be the second calendar year of the season (e.g. 2018 for the 2017/18 season), but because of a bug in the data recording it is not always consistent, so it should not be used for predictions.
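One hypothetical way to flag inconsistent `season` values is to derive the season's second year from `value_at` and compare. The sketch assumes a season running from July to June, so that dates up to the end of June belong to the season ending that year; this cut-off is an assumption, not something stated in the data, and the helper name `season_end_year` is illustrative.

```python
import pandas as pd


def season_end_year(dates: pd.Series, last_month: int = 6) -> pd.Series:
    """Second calendar year of the season containing each date.

    Assumes (hypothetically) that a season runs from July to June, so dates in
    months 1..last_month belong to the season ending in that same year.
    """
    year = dates.dt.year
    return year.where(dates.dt.month <= last_month, year + 1)


# value_at and season as recorded for the rows shown at the end of this notebook
shown = pd.DataFrame({"value_at": pd.to_datetime(["2018-06-30"]), "season": [2018]})
shown["expected_season"] = season_end_year(shown["value_at"])
shown["season_mismatch"] = shown["season"] != shown["expected_season"]
print(shown)

# On the full data (assumes `values` from In [1]):
# values.loc[season_end_year(values["value_at"]) != values["season"]]
```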

- `age_years` is the age of the player, in years (with a fractional part), at time `value_at`

In [4]:
values.loc[:,['player_id','birth','value','value_at','season','league','age_years']].head()

| | player_id | birth | value | value_at | season | league | age_years |
|---|---|---|---|---|---|---|---|
| 5665 | 94308 | 1992-01-24 | 3.6 | 2018-06-30 | 2018 | SPA1 | 27.430527 |
| 5667 | 139336 | 1992-01-08 | 2.7 | 2018-06-30 | 2018 | SPA1 | 27.474333 |
| 5671 | 87469 | 1989-09-05 | 2.25 | 2018-06-30 | 2018 | SPA1 | 29.815195 |
| 5674 | 266795 | 1993-03-27 | 2.25 | 2018-06-30 | 2018 | SPA1 | 26.258727 |
| 5676 | 153427 | 1990-07-12 | 0.9 | 2018-06-30 | 2018 | SPA1 | 28.966461 |
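As a sanity check, `age_years` can be recomputed from `birth` and `value_at`. The sketch assumes age is measured as elapsed days divided by 365.25 (an assumed convention). Under it, each of the five recorded values above matches, to six decimals, the age on 2019-06-30, one year after the recorded `value_at`; the age at `value_at` itself is 365/365.25 ≈ 0.9993 years smaller. The helper `age_in_years` is hypothetical.

```python
from typing import Union

import pandas as pd

# The five rows printed above
rows = pd.DataFrame(
    {
        "birth": pd.to_datetime(["1992-01-24", "1992-01-08", "1989-09-05",
                                 "1993-03-27", "1990-07-12"]),
        "value_at": pd.to_datetime(["2018-06-30"] * 5),
        "age_years": [27.430527, 27.474333, 29.815195, 26.258727, 28.966461],
    },
    index=[5665, 5667, 5671, 5674, 5676],
)

DAYS_PER_YEAR = 365.25  # assumed convention, chosen because it reproduces the values above


def age_in_years(birth: pd.Series, at: Union[pd.Series, pd.Timestamp]) -> pd.Series:
    """Fractional age: whole days elapsed between birth and `at`, divided by DAYS_PER_YEAR."""
    return (at - birth).dt.days / DAYS_PER_YEAR


rows["age_at_value_at"] = age_in_years(rows["birth"], rows["value_at"])
rows["age_at_2019_06_30"] = age_in_years(rows["birth"], pd.Timestamp("2019-06-30"))
print(rows.round(6))
```

On the full data, `values["age_years"] - age_in_years(values["birth"], values["value_at"])` would show whether this one-year gap is systematic or limited to some rows.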
