## Data Visualisation: Players

Notebook for quick visualisation of the data available in the `Player` and `Player attributes` tables of the dataset.

In [1]:
import pandas as pd
import numpy as np
from rich import print, pretty
pretty.install()

In [2]:
df_player = pd.read_csv("../data/interim/player.csv")
df_player_attributes = pd.read_csv("../data/interim/player_attributes.csv")

***

In [3]:
print(list(df_player.columns))

In [4]:
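# Check whether the id column is a row counter, i.e. increases by exactly 1 per row.
# diff() returns NaN for the first row, so drop it before comparing with 1.

print("Is the 'id' a row counter:", np.sum(df_player["id"].diff().dropna() != 1) == 0)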
print("Is the 'id' column unique:", df_player["id"].is_unique)
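The same check can be sketched on toy data. The helper `is_row_counter` and the two toy series below are made up for illustration and are not part of the notebook; the sketch additionally assumes that a row counter starts at 1.

```python
import pandas as pd


def is_row_counter(ids: pd.Series) -> bool:
    """Return True when ids start at 1 and increase by exactly 1 per row."""
    steps = ids.diff().dropna()  # the first difference is NaN, so drop it
    return bool(ids.iloc[0] == 1 and steps.eq(1).all())


# Hypothetical examples: one gap-free counter, one unique column with a gap.
toy_counter = pd.Series([1, 2, 3, 4])
toy_gappy = pd.Series([1, 2, 4, 5])

print(is_row_counter(toy_counter))
print(is_row_counter(toy_gappy))
```

A column can be unique without being a row counter, which is why the two checks are reported separately.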

The `Player` table has 7 columns:

* `id`: unique numeric index; it is not a row counter (the maximum is 11075 for 11060 rows)
* `player_api_id`: player ID from an external API whose source is not identified
* `player_name`: string with the player's name; sometimes the full name, sometimes the name the player is known by on the pitch
* `player_fifa_api_id`: unique ID in the EA FIFA game franchise API for the year 2016
* `birthday`: date of birth in `%Y-%m-%d %H:%M:%S` format; the hours, minutes and seconds are all set to 00:00:00
* `height`: player height in centimetres (cm)
* `weight`: player weight in pounds (lb)
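As a minimal sketch of how these columns could be prepared for plotting, the snippet below parses `birthday` into a datetime and converts the units. The frame `toy_player` holds two rows copied from the table; the derived columns `weight_kg` and `height_m` are illustrative names, not columns of the dataset.

```python
import pandas as pd

LB_TO_KG = 0.45359237  # definition of the international avoirdupois pound

# Two rows copied from the Player table, for illustration only.
toy_player = pd.DataFrame(
    {
        "player_name": ["Aaron Appindangoye", "Aaron Cresswell"],
        "birthday": ["1992-02-29 00:00:00", "1989-12-15 00:00:00"],
        "height": [182.88, 170.18],  # centimetres
        "weight": [187, 146],  # pounds
    }
)

# Parse the timestamp string; the time part is always midnight.
toy_player["birthday"] = pd.to_datetime(
    toy_player["birthday"], format="%Y-%m-%d %H:%M:%S"
)

# Derived columns (hypothetical names) in metric units.
toy_player["weight_kg"] = (toy_player["weight"] * LB_TO_KG).round(1)
toy_player["height_m"] = (toy_player["height"] / 100).round(2)

print(toy_player)
```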

In [5]:
df_player.head()

| | id | player_api_id | player_name | player_fifa_api_id | birthday | height | weight |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 505942 | Aaron Appindangoye | 218353 | 1992-02-29 00:00:00 | 182.88 | 187 |
| 1 | 2 | 155782 | Aaron Cresswell | 189615 | 1989-12-15 00:00:00 | 170.18 | 146 |
| 2 | 3 | 162549 | Aaron Doran | 186170 | 1991-05-13 00:00:00 | 170.18 | 163 |
| 3 | 4 | 30572 | Aaron Galindo | 140161 | 1982-05-08 00:00:00 | 182.88 | 198 |
| 4 | 5 | 23780 | Aaron Hughes | 17725 | 1979-11-08 00:00:00 | 182.88 | 154 |


In [6]:
df_player.describe()

| | id | player_api_id | player_fifa_api_id | height | weight |
|---|---|---|---|---|---|
| count | 11060.0 | 11060.0 | 11060.0 | 11060.0 | 11060.0 |
| mean | 5537.511392 | 156582.427215 | 165664.910488 | 181.867445 | 168.380289 |
| std | 3197.692647 | 160713.700624 | 58649.92836 | 6.369201 | 14.990217 |
| min | 1.0 | 2625.0 | 2.0 | 157.48 | 117.0 |
| 25% | 2767.75 | 35555.5 | 151889.5 | 177.8 | 159.0 |
| 50% | 5536.5 | 96619.5 | 184671.0 | 182.88 | 168.0 |
| 75% | 8306.25 | 212470.5 | 203883.25 | 185.42 | 179.0 |
| max | 11075.0 | 750584.0 | 234141.0 | 208.28 | 243.0 |


In [7]:
df_player.shape

In [8]:
# Count missing (NaN) values per column
df_player.isna().sum()

In [9]:
print(list(df_player_attributes.columns))

In [10]:
df_player_attributes.head()

| | id | player_fifa_api_id | player_api_id | date | overall_rating | potential | preferred_foot | attacking_work_rate | defensive_work_rate | crossing | ... | vision | penalties | marking | standing_tackle | sliding_tackle | gk_diving | gk_handling | gk_kicking | gk_positioning | gk_reflexes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 218353 | 505942 | 2016-02-18 00:00:00 | 67.0 | 71.0 | right | medium | medium | 49.0 | ... | 54.0 | 48.0 | 65.0 | 69.0 | 69.0 | 6.0 | 11.0 | 10.0 | 8.0 | 8.0 |
| 1 | 2 | 218353 | 505942 | 2015-11-19 00:00:00 | 67.0 | 71.0 | right | medium | medium | 49.0 | ... | 54.0 | 48.0 | 65.0 | 69.0 | 69.0 | 6.0 | 11.0 | 10.0 | 8.0 | 8.0 |
| 2 | 3 | 218353 | 505942 | 2015-09-21 00:00:00 | 62.0 | 66.0 | right | medium | medium | 49.0 | ... | 54.0 | 48.0 | 65.0 | 66.0 | 69.0 | 6.0 | 11.0 | 10.0 | 8.0 | 8.0 |
| 3 | 4 | 218353 | 505942 | 2015-03-20 00:00:00 | 61.0 | 65.0 | right | medium | medium | 48.0 | ... | 53.0 | 47.0 | 62.0 | 63.0 | 66.0 | 5.0 | 10.0 | 9.0 | 7.0 | 7.0 |
| 4 | 5 | 218353 | 505942 | 2007-02-22 00:00:00 | 61.0 | 65.0 | right | medium | medium | 48.0 | ... | 53.0 | 47.0 | 62.0 | 63.0 | 66.0 | 5.0 | 10.0 | 9.0 | 7.0 | 7.0 |
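The first rows of `df_player_attributes` repeat the same `player_api_id` under different dates, which suggests the table holds several dated records per player. Assuming that holds throughout, one way to get a single row per player is to keep the most recent record and join it to the `Player` table. The sketch below uses toy frames; the record for player 155782 is invented for illustration, and `latest` and `merged` are illustrative names.

```python
import pandas as pd

# Toy attribute records: the first three are taken from the rows shown for
# df_player_attributes, the last one (player 155782) is made up.
toy_attributes = pd.DataFrame(
    {
        "player_api_id": [505942, 505942, 505942, 155782],
        "date": [
            "2016-02-18 00:00:00",
            "2015-11-19 00:00:00",
            "2007-02-22 00:00:00",
            "2016-04-21 00:00:00",
        ],
        "overall_rating": [67.0, 67.0, 61.0, 74.0],
    }
)
toy_attributes["date"] = pd.to_datetime(toy_attributes["date"])

# Keep only the most recent record for each player.
latest = (
    toy_attributes.sort_values("date")
    .drop_duplicates("player_api_id", keep="last")
    .reset_index(drop=True)
)

toy_player = pd.DataFrame(
    {
        "player_api_id": [505942, 155782],
        "player_name": ["Aaron Appindangoye", "Aaron Cresswell"],
    }
)

# validate raises if either side has duplicate keys after the reduction.
merged = toy_player.merge(
    latest, on="player_api_id", how="left", validate="one_to_one"
)
print(merged)
```

`drop_duplicates` keeps whole rows, whereas `groupby(...).last()` takes the last non-null value of each column separately and could mix values from different dates.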


In [11]:
df_player_attributes.shape

In [12]:
# Count missing (NaN) values per column
df_player_attributes.isna().sum()
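Raw NaN counts are easier to compare across columns when expressed as a share of the rows. A minimal sketch on a made-up frame (`toy` and its values are hypothetical, only the column names come from the table):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values.
toy = pd.DataFrame(
    {
        "overall_rating": [67.0, np.nan, 61.0, 61.0],
        "preferred_foot": ["right", "right", None, None],
    }
)

# isna().mean() is the fraction of missing rows per column.
nan_share = toy.isna().mean().mul(100).round(1).sort_values(ascending=False)
print(nan_share)
```

The same expression can be applied to `df_player_attributes` in place of `toy`.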