## **1. SOCCER PLAYERS** <a id="1"></a>

<a><img style="float: right;" src="https://www.linkpicture.com/q/nigel-msipa-t5ny_JdGxJc-unsplash.jpg" width="300" /></a>
 



- Dataset source: https://www.kaggle.com/datasets/antoinekrajnc/soccer-players-statistics

### 1.2 Notebook Preparation <a id="1.2"></a>

This part of the notebook imports the required libraries and sets the display options.

In [98]:
# Import libraries

import pandas as pd
import numpy as np 
from scipy import stats

import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import plotly.graph_objects as go
import plotly.express as px

from sklearn.preprocessing import StandardScaler

from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage
from scipy.cluster.hierarchy import dendrogram
from sklearn.metrics import silhouette_samples, silhouette_score

In [99]:
# Set notebook options

pd.set_option('display.precision', 2)
pd.options.display.max_columns = 30

import warnings
warnings.filterwarnings("ignore")

## **2. Data Preparation** <a id="2"></a>

The section below provides an initial exploration of the data.

In [100]:
# Import the data as a DataFrame and check first 5 rows

df = pd.read_csv('soccer.csv', index_col=0)

df.head(5)

Name,Nationality,National_Position,National_Kit,Club,Club_Position,Club_Kit,Club_Joining,Contract_Expiry,Rating,Height,Weight,Preffered_Foot,Birth_Date,Age,Preffered_Position,...,Agility,Jumping,Heading,Shot_Power,Finishing,Long_Shots,Curve,Freekick_Accuracy,Penalties,Volleys,GK_Positioning,GK_Diving,GK_Kicking,GK_Handling,GK_Reflexes
Cristiano Ronaldo,Portugal,LS,7.0,Real Madrid,LW,7.0,07/01/2009,2021.0,94,185 cm,80 kg,Right,02/05/1985,32,LW/ST,...,90,95,85,92,93,90,81,76,85,88,14,7,15,11,11
Lionel Messi,Argentina,RW,10.0,FC Barcelona,RW,10.0,07/01/2004,2018.0,93,170 cm,72 kg,Left,06/24/1987,29,RW,...,90,68,71,85,95,88,89,90,74,85,14,6,15,11,8
Neymar,Brazil,LW,10.0,FC Barcelona,LW,11.0,07/01/2013,2021.0,92,174 cm,68 kg,Right,02/05/1992,25,LW,...,96,61,62,78,89,77,79,84,81,83,15,9,15,9,11
Luis Suárez,Uruguay,LS,9.0,FC Barcelona,ST,9.0,07/11/2014,2021.0,92,182 cm,85 kg,Right,01/24/1987,30,ST,...,86,69,77,87,94,86,86,84,85,88,33,27,31,25,37
Manuel Neuer,Germany,GK,1.0,FC Bayern,GK,1.0,07/01/2011,2021.0,92,193 cm,92 kg,Right,03/27/1986,31,GK,...,52,78,25,25,13,16,14,11,47,11,91,89,95,90,89


In [101]:
# Check data types and if any records are missing

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 17588 entries, Cristiano Ronaldo to Barry Richardson
Data columns (total 52 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Nationality         17588 non-null  object 
 1   National_Position   1075 non-null   object 
 2   National_Kit        1075 non-null   float64
 3   Club                17588 non-null  object 
 4   Club_Position       17587 non-null  object 
 5   Club_Kit            17587 non-null  float64
 6   Club_Joining        17587 non-null  object 
 7   Contract_Expiry     17587 non-null  float64
 8   Rating              17588 non-null  int64  
 9   Height              17588 non-null  object 
 10  Weight              17588 non-null  object 
 11  Preffered_Foot      17588 non-null  object 
 12  Birth_Date          17588 non-null  object 
 13  Age                 17588 non-null  int64  
 14  Preffered_Position  17588 non-null  object 
 15  Work_Rate           17588 non-n

- The dataset has missing records (for example, `National_Position` is populated for only 1075 of 17588 rows). However, we can ignore the missing values for now because we are only interested in the Age, Weight, and Height columns, which have no missing entries.
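
To make that check explicit, here is a minimal sketch that counts missing values only in the three columns of interest. It assumes a small hand-made DataFrame (`sample` and the player labels are hypothetical stand-ins, not the real `soccer.csv`), so the values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: a few rows with the same column names
sample = pd.DataFrame(
    {
        "National_Position": ["LS", np.nan, np.nan],
        "Age": [32, 29, 25],
        "Height": ["185 cm", "170 cm", "174 cm"],
        "Weight": ["80 kg", "72 kg", "68 kg"],
    },
    index=pd.Index(["Player A", "Player B", "Player C"], name="Name"),
)

# Restrict the missing-value count to the columns used in this analysis
columns_of_interest = ["Age", "Height", "Weight"]
missing_counts = sample[columns_of_interest].isnull().sum()

print(missing_counts)
# True when none of the three columns has a gap
print((missing_counts == 0).all())
```

Restricting the count this way keeps the gaps in unrelated columns (such as `National_Position`) from obscuring whether the columns we actually use are complete.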

In [138]:
# We can count the missing values in each column of our dataset.

df.isnull().sum()

Nationality               0
National_Position     16513
National_Kit          16513
Club                      0
Club_Position             1
Club_Kit                  1
Club_Joining              1
Contract_Expiry           1
Rating                    0
Height                    0
Weight                    0
Preffered_Foot            0
Birth_Date                0
Age                       0
Preffered_Position        0
Work_Rate                 0
Weak_foot                 0
Skill_Moves               0
Ball_Control              0
Dribbling                 0
Marking                   0
Sliding_Tackle            0
Standing_Tackle           0
Aggression                0
Reactions                 0
Attacking_Position        0
Interceptions             0
Vision                    0
Composure                 0
Crossing                  0
Short_Pass                0
Long_Pass                 0
Acceleration              0
Speed                     0
Stamina                   0
Strength            

In [139]:
# Let us remove 'kg' from the values of weights of the soccer players 

df_Weight = df["Weight"].str.replace("kg", "")

In [140]:
# Let us convert the Weight column from string to integer

df_Weight = df_Weight.astype(int)

In [141]:
# Let us remove 'cm' from the values of heights of the soccer players

df_Height = df["Height"].str.replace("cm", "")

In [142]:
# Let us convert the Height column from string to integer

df_Height = df_Height.astype(int)
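
The two strip-and-convert steps above follow the same pattern, so they could be folded into one small helper. The sketch below is one possible way to do that, not the notebook's own method: `strip_unit` is a hypothetical name, and the sample Series are made up for illustration.

```python
import pandas as pd


def strip_unit(values: pd.Series, unit: str) -> pd.Series:
    """Remove a trailing unit such as 'kg' or 'cm' and return integers."""
    # regex=False treats the unit as a literal string, not a pattern
    cleaned = values.str.replace(unit, "", regex=False).str.strip()
    # errors="raise" surfaces any value that is not a plain number
    return pd.to_numeric(cleaned, errors="raise").astype(int)


# Hypothetical sample values in the same "<number> <unit>" format as the dataset
weights = pd.Series(["80 kg", "72 kg", "68 kg"], name="Weight")
heights = pd.Series(["185 cm", "170 cm", "174 cm"], name="Height")

weight_kg = strip_unit(weights, "kg")
height_cm = strip_unit(heights, "cm")

print(weight_kg.tolist())
print(height_cm.tolist())
```

Using `pd.to_numeric` with `errors="raise"` means an unexpected entry (for example, a height recorded in a different unit) fails loudly instead of being silently converted.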

In [143]:
# Let us confirm the missing-value counts per column again; df itself was not modified above, so they are unchanged.

df.isnull().sum()

Nationality               0
National_Position     16513
National_Kit          16513
Club                      0
Club_Position             1
Club_Kit                  1
Club_Joining              1
Contract_Expiry           1
Rating                    0
Height                    0
Weight                    0
Preffered_Foot            0
Birth_Date                0
Age                       0
Preffered_Position        0
Work_Rate                 0
Weak_foot                 0
Skill_Moves               0
Ball_Control              0
Dribbling                 0
Marking                   0
Sliding_Tackle            0
Standing_Tackle           0
Aggression                0
Reactions                 0
Attacking_Position        0
Interceptions             0
Vision                    0
Composure                 0
Crossing                  0
Short_Pass                0
Long_Pass                 0
Acceleration              0
Speed                     0
Stamina                   0
Strength            

- Since we are only interested in the average age, average weight, and average height, we can ignore the missing values for now.

- Let us extract Age, Weight and Height information from the dataset

In [144]:
# Here, we extract the ages of soccer players from our dataset

df_Age = df['Age']

df_Age.head()

Name
Cristiano Ronaldo    32
Lionel Messi         29
Neymar               25
Luis Suárez          30
Manuel Neuer         31
Name: Age, dtype: int64

In [145]:
# Let us find the average age of soccer players

df_Age_mean = df_Age.mean()

df_Age_mean

25.460313850352513

In [146]:
df_Weight

Name
Cristiano Ronaldo    80
Lionel Messi         72
Neymar               68
Luis Suárez          85
Manuel Neuer         92
                     ..
Adam Dunbar          82
Dylan McGoey         80
Tommy Ouldridge      61
Mark Foden           80
Barry Richardson     77
Name: Weight, Length: 17588, dtype: int32

In [147]:
# Let us find the average weight of soccer players

df_Weight.mean()

75.25335455992722

In [148]:
df_Height

Name
Cristiano Ronaldo    185
Lionel Messi         170
Neymar               174
Luis Suárez          182
Manuel Neuer         193
                    ... 
Adam Dunbar          183
Dylan McGoey         185
Tommy Ouldridge      173
Mark Foden           180
Barry Richardson     185
Name: Height, Length: 17588, dtype: int32

In [149]:
# Let us find the average height of soccer players in 'cm'

df_Height.mean()

181.10546963838982
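
The three averages computed above could also be obtained in a single call once the columns are numeric. The following is a minimal sketch under that assumption; the `players` DataFrame and its labels are hypothetical, so the printed means do not correspond to the full dataset.

```python
import pandas as pd

# Hypothetical, already-cleaned numeric values for three players
players = pd.DataFrame(
    {"Age": [32, 29, 25], "Height": [185, 170, 174], "Weight": [80, 72, 68]},
    index=pd.Index(["Player A", "Player B", "Player C"], name="Name"),
)

# Column-wise means, rounded to one decimal place for reporting
averages = players[["Age", "Height", "Weight"]].mean().round(1)

print(averages)
```

Keeping the cleaned values in one DataFrame also makes it easy to extend the summary later, for example with `players.describe()`.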

## **5. Conclusion** <a id="5"></a>

- Based on our data records, the average age, height, and weight of soccer players are approximately 25.5 years, 181.1 cm, and 75.3 kg, respectively.