# DATA CLEANING CHALLENGE 

*Executed By Adeyemi Oluwaseyi*

[Click here to check my Linkedin profile](https://www.linkedin.com/in/oluwaseyi-adeyemi-33b1ab197/)

[Twitter link](https://twitter.com/AmLegendseyi)

## Introduction 

The Data Cleaning Challenge gives data analysts at every level of expertise (beginner, intermediate, or expert) an opportunity to build a portfolio-worthy project that can be shared with recruiters.

The challenge also provides an avenue for Data Analysts to meet with fellow learners and build a great network.   

#### What to look out for when cleaning your data

 - Incorrect data types
 - Null entries
 - Missing values
 - Duplicate entries
 - Errors in spellings and values
 - Wrong calculations across rows and columns
 - Irrelevant data 
 - Outliers 
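As a rough illustration of the first few checks in the list above, here is a minimal sketch on a small made-up DataFrame. The `toy` data and the `quick_audit` helper are assumptions introduced for this example only; they are not part of the challenge dataset.

```python
import pandas as pd

# Hypothetical toy data, not taken from the FIFA dataset
toy = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cy"],
    "height": ["170cm", "180cm", "180cm", None],
    "age": [25, 31, 31, 240],
})

def quick_audit(df):
    """Return a few basic data-quality indicators for a DataFrame."""
    return {
        # data type of every column, as plain strings
        "dtypes": df.dtypes.astype(str).to_dict(),
        # number of null / missing entries per column
        "null_counts": df.isnull().sum().to_dict(),
        # number of rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
    }

report = quick_audit(toy)
print(report)
```

Spelling errors, wrong calculations and outliers (such as the age of 240 in the toy data) usually need column-specific checks rather than a generic helper like this one.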

First, we import the libraries required to perform the operations above.

 - ## Importing Libraries

In [2]:
#Importing Libraries 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [3]:
# read the data set into dataframe using the pandas library
pd.set_option('display.max_columns', None)
data = pd.read_csv('fifa21 raw data v2.csv', skip_blank_lines=True, low_memory=False,)


Let's view our DataFrame and observe the data.

 - ## Observing The Dataset

Some of the columns are abbreviated; their meanings can be found [here](dataset_dictionary.txt).

In [4]:
# previewing the first 10 rows of the dataset
data.head(10)

Unnamed: 0,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,↓OVA,POT,Club,Contract,Positions,Height,Weight,Preferred Foot,BOV,Best Position,Joined,Loan Date End,Value,Wage,Release Clause,Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,158023,L. Messi,Lionel Messi,https://cdn.sofifa.com/players/158/023/21_60.png,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,93,93,\n\n\n\nFC Barcelona,2004 ~ 2021,"RW, ST, CF",170cm,72kg,Left,93,RW,"Jul 1, 2004",,€103.5M,€560K,€138.4M,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4 ★,4★,Medium,Low,5 ★,85,92,91,95,38,65,771
1,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,https://cdn.sofifa.com/players/020/801/21_60.png,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,92,92,\n\n\n\nJuventus,2018 ~ 2022,"ST, LW",187cm,83kg,Right,92,ST,"Jul 10, 2018",,€63M,€220K,€75.9M,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4 ★,5★,High,Low,5 ★,89,93,81,89,35,77,562
2,200389,J. Oblak,Jan Oblak,https://cdn.sofifa.com/players/200/389/21_60.png,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,91,93,\n\n\n\nAtlético Madrid,2014 ~ 2023,GK,188cm,87kg,Right,91,GK,"Jul 16, 2014",,€120M,€125K,€159.4M,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3 ★,1★,Medium,Medium,3 ★,87,92,78,90,52,90,150
3,192985,K. De Bruyne,Kevin De Bruyne,https://cdn.sofifa.com/players/192/985/21_60.png,http://sofifa.com/player/192985/kevin-de-bruyn...,Belgium,29,91,91,\n\n\n\nManchester City,2015 ~ 2023,"CAM, CM",181cm,70kg,Right,91,CAM,"Aug 30, 2015",,€129M,€370K,€161M,407,94,82,55,94,82,441,88,85,83,93,92,398,77,76,78,91,76,408,91,63,89,74,91,408,76,66,88,94,84,91,186,68,65,53,56,15,13,5,10,13,2304,485,5 ★,4★,High,High,4 ★,76,86,93,88,64,78,207
4,190871,Neymar Jr,Neymar da Silva Santos Jr.,https://cdn.sofifa.com/players/190/871/21_60.png,http://sofifa.com/player/190871/neymar-da-silv...,Brazil,28,91,91,\n\n\n\nParis Saint-Germain,2017 ~ 2022,"LW, CAM",175cm,68kg,Right,91,LW,"Aug 3, 2017",,€132M,€270K,€166.5M,408,85,87,62,87,87,448,95,88,89,81,95,453,94,89,96,91,83,357,80,62,81,50,84,356,51,36,87,90,92,93,94,35,30,29,59,9,9,15,15,11,2175,451,5 ★,5★,High,Medium,5 ★,91,85,86,94,36,59,595
5,188545,R. Lewandowski,Robert Lewandowski,https://cdn.sofifa.com/players/188/545/21_60.png,http://sofifa.com/player/188545/robert-lewando...,Poland,31,91,91,\n\n\n\nFC Bayern München,2014 ~ 2023,ST,184cm,80kg,Right,91,ST,"Jul 1, 2014",,€111M,€240K,€132M,423,71,94,85,84,89,407,85,79,85,70,88,407,77,78,77,93,82,420,89,84,76,86,85,391,81,49,94,79,88,88,96,35,42,19,51,15,6,12,8,10,2195,457,4 ★,4★,High,Medium,4 ★,78,91,78,85,43,82,248
6,209331,M. Salah,Mohamed Salah,https://cdn.sofifa.com/players/209/331/21_60.png,http://sofifa.com/player/209331/mohamed-salah/...,Egypt,28,90,90,\n\n\n\nLiverpool,2017 ~ 2023,RW,175cm,71kg,Left,90,RW,"Jul 1, 2017",,€120.5M,€250K,€144.3M,392,79,91,59,84,79,406,90,83,69,75,89,460,94,92,91,92,91,393,80,69,85,75,84,376,63,55,91,84,83,90,122,38,43,41,62,14,14,9,11,14,2211,470,3 ★,4★,High,Medium,3 ★,93,86,81,90,45,75,246
7,212831,Alisson,Alisson Ramses Becker,https://cdn.sofifa.com/players/212/831/21_60.png,http://sofifa.com/player/212831/alisson-ramses...,Brazil,27,90,91,\n\n\n\nLiverpool,2018 ~ 2024,GK,191cm,91kg,Right,90,GK,"Jul 19, 2018",,€102M,€160K,€120.3M,114,17,13,19,45,20,138,27,19,18,44,30,268,56,47,40,88,37,240,64,52,32,78,14,140,27,11,13,66,23,65,50,15,19,16,439,86,88,85,91,89,1389,490,3 ★,1★,Medium,Medium,3 ★,86,88,85,89,51,91,120
8,231747,K. Mbappé,Kylian Mbappé,https://cdn.sofifa.com/players/231/747/21_60.png,http://sofifa.com/player/231747/kylian-mbappe/...,France,21,90,95,\n\n\n\nParis Saint-Germain,2018 ~ 2022,"ST, LW, RW",178cm,73kg,Right,91,ST,"Jul 1, 2018",,€185.5M,€160K,€203.1M,408,78,91,73,83,83,394,92,79,63,70,90,458,96,96,92,92,82,404,86,77,86,76,79,341,62,38,91,80,70,84,100,34,34,32,42,13,5,7,11,6,2147,466,4 ★,5★,High,Low,3 ★,96,86,78,91,39,76,1.6K
9,192448,M. ter Stegen,Marc-André ter Stegen,https://cdn.sofifa.com/players/192/448/21_60.png,http://sofifa.com/player/192448/marc-andre-ter...,Germany,28,90,93,\n\n\n\nFC Barcelona,2014 ~ 2022,GK,187cm,85kg,Right,90,GK,"Jul 1, 2014",,€110M,€260K,€147.7M,118,18,14,11,61,14,144,21,18,12,63,30,254,38,50,37,86,43,268,66,79,35,78,10,171,43,22,11,70,25,70,48,25,13,10,439,88,85,88,88,90,1442,484,4 ★,1★,Medium,Medium,3 ★,88,85,88,90,45,88,130


In [5]:
# to see the numbers of rows and columns we have in our dataset
data.shape

(18979, 77)

We have 18,979 rows and 77 columns in the dataset.

Let's see the names of the columns we have.

In [6]:
# preview the names of the columns in the dataset
data.columns

Index(['ID', 'Name', 'LongName', 'photoUrl', 'playerUrl', 'Nationality', 'Age',
       '↓OVA', 'POT', 'Club', 'Contract', 'Positions', 'Height', 'Weight',
       'Preferred Foot', 'BOV', 'Best Position', 'Joined', 'Loan Date End',
       'Value', 'Wage', 'Release Clause', 'Attacking', 'Crossing', 'Finishing',
       'Heading Accuracy', 'Short Passing', 'Volleys', 'Skill', 'Dribbling',
       'Curve', 'FK Accuracy', 'Long Passing', 'Ball Control', 'Movement',
       'Acceleration', 'Sprint Speed', 'Agility', 'Reactions', 'Balance',
       'Power', 'Shot Power', 'Jumping', 'Stamina', 'Strength', 'Long Shots',
       'Mentality', 'Aggression', 'Interceptions', 'Positioning', 'Vision',
       'Penalties', 'Composure', 'Defending', 'Marking', 'Standing Tackle',
       'Sliding Tackle', 'Goalkeeping', 'GK Diving', 'GK Handling',
       'GK Kicking', 'GK Positioning', 'GK Reflexes', 'Total Stats',
       'Base Stats', 'W/F', 'SM', 'A/W', 'D/W', 'IR', 'PAC', 'SHO', 'PAS',
       'DRI', 'DEF', 

##### Observation
Some column names do not follow a standard naming convention: their words are separated with spaces instead of underscores (`_`).

In [7]:
# renaming the columns and replacing white spaces with an underscore
data.columns = [col.replace(" ", "_") for col in data.columns]

Let's see if our change was applied.

In [8]:
# previewing the column once again to see if there is any further adjustment needed
data.columns

Index(['ID', 'Name', 'LongName', 'photoUrl', 'playerUrl', 'Nationality', 'Age',
       '↓OVA', 'POT', 'Club', 'Contract', 'Positions', 'Height', 'Weight',
       'Preferred_Foot', 'BOV', 'Best_Position', 'Joined', 'Loan_Date_End',
       'Value', 'Wage', 'Release_Clause', 'Attacking', 'Crossing', 'Finishing',
       'Heading_Accuracy', 'Short_Passing', 'Volleys', 'Skill', 'Dribbling',
       'Curve', 'FK_Accuracy', 'Long_Passing', 'Ball_Control', 'Movement',
       'Acceleration', 'Sprint_Speed', 'Agility', 'Reactions', 'Balance',
       'Power', 'Shot_Power', 'Jumping', 'Stamina', 'Strength', 'Long_Shots',
       'Mentality', 'Aggression', 'Interceptions', 'Positioning', 'Vision',
       'Penalties', 'Composure', 'Defending', 'Marking', 'Standing_Tackle',
       'Sliding_Tackle', 'Goalkeeping', 'GK_Diving', 'GK_Handling',
       'GK_Kicking', 'GK_Positioning', 'GK_Reflexes', 'Total_Stats',
       'Base_Stats', 'W/F', 'SM', 'A/W', 'D/W', 'IR', 'PAC', 'SHO', 'PAS',
       'DRI', 'DEF', 
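The renaming step above uses a list comprehension. As an alternative sketch, the same result can be reached with the vectorised string methods on the column index. The `demo` frame below is a hypothetical stand-in with a few headers written in the dataset's style.

```python
import pandas as pd

# Hypothetical frame with a few headers that mimic the dataset's style
demo = pd.DataFrame(columns=["Preferred Foot", "Best Position", "W/F", "ID"])

# Strip stray outer whitespace, then swap inner spaces for underscores
demo.columns = demo.columns.str.strip().str.replace(" ", "_", regex=False)

print(list(demo.columns))
```

Passing `regex=False` makes it explicit that the space is treated as a literal character and not as a pattern.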

Let's look at some other features of the data.

The `info()` method reports the number of columns, the column labels, the column data types, the memory usage, the range index, and the number of non-null values in each column.

In [9]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18979 entries, 0 to 18978
Data columns (total 77 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   ID                18979 non-null  int64 
 1   Name              18979 non-null  object
 2   LongName          18979 non-null  object
 3   photoUrl          18979 non-null  object
 4   playerUrl         18979 non-null  object
 5   Nationality       18979 non-null  object
 6   Age               18979 non-null  int64 
 7   ↓OVA              18979 non-null  int64 
 8   POT               18979 non-null  int64 
 9   Club              18979 non-null  object
 10  Contract          18979 non-null  object
 11  Positions         18979 non-null  object
 12  Height            18979 non-null  object
 13  Weight            18979 non-null  object
 14  Preferred_Foot    18979 non-null  object
 15  BOV               18979 non-null  int64 
 16  Best_Position     18979 non-null  object
 17  Joined      


Some column names are not well represented; let's fix that.

#### Observation
* The `↓OVA` column name contains a stray arrow character
* The Height and Weight columns are of the object type
* Some columns, such as Loan_Date_End and Hits, contain NaN (null) values
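To confirm which columns hold null values without reading the full `info()` output, one option is to count the nulls per column and keep only the non-zero counts. The sketch below runs on a small made-up frame; the `toy` values are assumptions chosen to mirror the columns named above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample that mirrors the columns mentioned in the observation
toy = pd.DataFrame({
    "Loan_Date_End": [np.nan, "Jun 30, 2021", np.nan],
    "Hits": ["771", np.nan, "150"],
    "Age": [33, 35, 27],
})

# Count nulls per column, then keep only the columns that have any
null_counts = toy.isnull().sum()
cols_with_nulls = null_counts[null_counts > 0].sort_values(ascending=False)

print(cols_with_nulls)
```

On the real DataFrame the same two lines would be applied to `data` instead of `toy`.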

 - ## Cleaning The Data

In [10]:
# the OVA column name contains a stray '↓' character
# renaming that particular column is the simplest fix
data.rename(columns = {'↓OVA': 'OVA'}, inplace = True)

In [11]:
# Previewing the columns again to check if the change is applied
data.columns

Index(['ID', 'Name', 'LongName', 'photoUrl', 'playerUrl', 'Nationality', 'Age',
       'OVA', 'POT', 'Club', 'Contract', 'Positions', 'Height', 'Weight',
       'Preferred_Foot', 'BOV', 'Best_Position', 'Joined', 'Loan_Date_End',
       'Value', 'Wage', 'Release_Clause', 'Attacking', 'Crossing', 'Finishing',
       'Heading_Accuracy', 'Short_Passing', 'Volleys', 'Skill', 'Dribbling',
       'Curve', 'FK_Accuracy', 'Long_Passing', 'Ball_Control', 'Movement',
       'Acceleration', 'Sprint_Speed', 'Agility', 'Reactions', 'Balance',
       'Power', 'Shot_Power', 'Jumping', 'Stamina', 'Strength', 'Long_Shots',
       'Mentality', 'Aggression', 'Interceptions', 'Positioning', 'Vision',
       'Penalties', 'Composure', 'Defending', 'Marking', 'Standing_Tackle',
       'Sliding_Tackle', 'Goalkeeping', 'GK_Diving', 'GK_Handling',
       'GK_Kicking', 'GK_Positioning', 'GK_Reflexes', 'Total_Stats',
       'Base_Stats', 'W/F', 'SM', 'A/W', 'D/W', 'IR', 'PAC', 'SHO', 'PAS',
       'DRI', 'DEF', '

The column has now been renamed successfully.

Going back to the `data.info()` output, we also observed that some of the columns contain NaN (empty) values.

In [12]:
# Preview the first five rows of the dataset for further observation
data.head(5)

Unnamed: 0,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract,Positions,Height,Weight,Preferred_Foot,BOV,Best_Position,Joined,Loan_Date_End,Value,Wage,Release_Clause,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,158023,L. Messi,Lionel Messi,https://cdn.sofifa.com/players/158/023/21_60.png,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,93,93,\n\n\n\nFC Barcelona,2004 ~ 2021,"RW, ST, CF",170cm,72kg,Left,93,RW,"Jul 1, 2004",,€103.5M,€560K,€138.4M,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4 ★,4★,Medium,Low,5 ★,85,92,91,95,38,65,771
1,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,https://cdn.sofifa.com/players/020/801/21_60.png,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,92,92,\n\n\n\nJuventus,2018 ~ 2022,"ST, LW",187cm,83kg,Right,92,ST,"Jul 10, 2018",,€63M,€220K,€75.9M,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4 ★,5★,High,Low,5 ★,89,93,81,89,35,77,562
2,200389,J. Oblak,Jan Oblak,https://cdn.sofifa.com/players/200/389/21_60.png,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,91,93,\n\n\n\nAtlético Madrid,2014 ~ 2023,GK,188cm,87kg,Right,91,GK,"Jul 16, 2014",,€120M,€125K,€159.4M,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3 ★,1★,Medium,Medium,3 ★,87,92,78,90,52,90,150
3,192985,K. De Bruyne,Kevin De Bruyne,https://cdn.sofifa.com/players/192/985/21_60.png,http://sofifa.com/player/192985/kevin-de-bruyn...,Belgium,29,91,91,\n\n\n\nManchester City,2015 ~ 2023,"CAM, CM",181cm,70kg,Right,91,CAM,"Aug 30, 2015",,€129M,€370K,€161M,407,94,82,55,94,82,441,88,85,83,93,92,398,77,76,78,91,76,408,91,63,89,74,91,408,76,66,88,94,84,91,186,68,65,53,56,15,13,5,10,13,2304,485,5 ★,4★,High,High,4 ★,76,86,93,88,64,78,207
4,190871,Neymar Jr,Neymar da Silva Santos Jr.,https://cdn.sofifa.com/players/190/871/21_60.png,http://sofifa.com/player/190871/neymar-da-silv...,Brazil,28,91,91,\n\n\n\nParis Saint-Germain,2017 ~ 2022,"LW, CAM",175cm,68kg,Right,91,LW,"Aug 3, 2017",,€132M,€270K,€166.5M,408,85,87,62,87,87,448,95,88,89,81,95,453,94,89,96,91,83,357,80,62,81,50,84,356,51,36,87,90,92,93,94,35,30,29,59,9,9,15,15,11,2175,451,5 ★,5★,High,Medium,5 ★,91,85,86,94,36,59,595


#### Another Observation
It was observed that the Club column has some unwanted strings in it

In [13]:
# Let's preview the Club column alone
data.Club.head()

0           \n\n\n\nFC Barcelona
1               \n\n\n\nJuventus
2        \n\n\n\nAtlético Madrid
3        \n\n\n\nManchester City
4    \n\n\n\nParis Saint-Germain
Name: Club, dtype: object

Let's remove the unwanted string from the column.

In [14]:
# Previewing how the column will look if we drop the first four characters
print(data['Club'].str[4:])

0               FC Barcelona
1                   Juventus
2            Atlético Madrid
3            Manchester City
4        Paris Saint-Germain
                ...         
18974             Wuhan Zall
18975        Oldham Athletic
18976             Derry City
18977       Dalian YiFang FC
18978       Dalian YiFang FC
Name: Club, Length: 18979, dtype: object


In [15]:
# Using the unique() method to get the unique values in the first rows of the column
# Unique values help us spot abnormalities in that particular column
data['Club'].head().unique()

array(['\n\n\n\nFC Barcelona', '\n\n\n\nJuventus',
       '\n\n\n\nAtlético Madrid', '\n\n\n\nManchester City',
       '\n\n\n\nParis Saint-Germain'], dtype=object)

#### Observation
From the output, the prefix `\n\n\n\n` is common to most of the values in that column.
We can replace this unwanted prefix with an empty string.

In [16]:
# replacing the unwanted string character with an empty string
data['Club'] = data['Club'].str.replace('\n\n\n\n', '')

In [17]:
# lets inspect the dataset to see if the change is applied
data.head()

Unnamed: 0,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract,Positions,Height,Weight,Preferred_Foot,BOV,Best_Position,Joined,Loan_Date_End,Value,Wage,Release_Clause,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,158023,L. Messi,Lionel Messi,https://cdn.sofifa.com/players/158/023/21_60.png,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,93,93,FC Barcelona,2004 ~ 2021,"RW, ST, CF",170cm,72kg,Left,93,RW,"Jul 1, 2004",,€103.5M,€560K,€138.4M,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4 ★,4★,Medium,Low,5 ★,85,92,91,95,38,65,771
1,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,https://cdn.sofifa.com/players/020/801/21_60.png,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,92,92,Juventus,2018 ~ 2022,"ST, LW",187cm,83kg,Right,92,ST,"Jul 10, 2018",,€63M,€220K,€75.9M,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4 ★,5★,High,Low,5 ★,89,93,81,89,35,77,562
2,200389,J. Oblak,Jan Oblak,https://cdn.sofifa.com/players/200/389/21_60.png,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,91,93,Atlético Madrid,2014 ~ 2023,GK,188cm,87kg,Right,91,GK,"Jul 16, 2014",,€120M,€125K,€159.4M,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3 ★,1★,Medium,Medium,3 ★,87,92,78,90,52,90,150
3,192985,K. De Bruyne,Kevin De Bruyne,https://cdn.sofifa.com/players/192/985/21_60.png,http://sofifa.com/player/192985/kevin-de-bruyn...,Belgium,29,91,91,Manchester City,2015 ~ 2023,"CAM, CM",181cm,70kg,Right,91,CAM,"Aug 30, 2015",,€129M,€370K,€161M,407,94,82,55,94,82,441,88,85,83,93,92,398,77,76,78,91,76,408,91,63,89,74,91,408,76,66,88,94,84,91,186,68,65,53,56,15,13,5,10,13,2304,485,5 ★,4★,High,High,4 ★,76,86,93,88,64,78,207
4,190871,Neymar Jr,Neymar da Silva Santos Jr.,https://cdn.sofifa.com/players/190/871/21_60.png,http://sofifa.com/player/190871/neymar-da-silv...,Brazil,28,91,91,Paris Saint-Germain,2017 ~ 2022,"LW, CAM",175cm,68kg,Right,91,LW,"Aug 3, 2017",,€132M,€270K,€166.5M,408,85,87,62,87,87,448,95,88,89,81,95,453,94,89,96,91,83,357,80,62,81,50,84,356,51,36,87,90,92,93,94,35,30,29,59,9,9,15,15,11,2175,451,5 ★,5★,High,Medium,5 ★,91,85,86,94,36,59,595
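The replacement above targets the exact `\n\n\n\n` prefix. A slightly more general sketch, assuming the unwanted characters are always leading or trailing whitespace, uses `str.strip()`, which also leaves values without the prefix untouched. The `clubs` series below is a made-up sample.

```python
import pandas as pd

# Hypothetical sample: two prefixed values and one already-clean value
clubs = pd.Series(["\n\n\n\nFC Barcelona", "\n\n\n\nJuventus", "No Club"])

# strip() removes leading and trailing whitespace, including newlines
cleaned = clubs.str.strip()

print(cleaned.tolist())
```

Unlike slicing with `str[4:]`, this does not cut characters from a value that never had the prefix.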


#### Further cleaning
For the Height and Weight columns, we can rename each column to carry its unit, remove the unit from the values, and then convert the column to an integer type.

In [18]:
# renaming the Height column by adding its unit (cm) to the name

data.rename(columns = {'Height': 'Height_cm'}, inplace = True)

In [19]:
# renaming the Weight column by adding its unit (kg) to the name

data.rename(columns = {'Weight': 'Weight_kg'}, inplace = True)

In [20]:
# Inspecting the dataset to see if the change is applied
data.head(4)

Unnamed: 0,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Joined,Loan_Date_End,Value,Wage,Release_Clause,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,158023,L. Messi,Lionel Messi,https://cdn.sofifa.com/players/158/023/21_60.png,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,93,93,FC Barcelona,2004 ~ 2021,"RW, ST, CF",170cm,72kg,Left,93,RW,"Jul 1, 2004",,€103.5M,€560K,€138.4M,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4 ★,4★,Medium,Low,5 ★,85,92,91,95,38,65,771
1,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,https://cdn.sofifa.com/players/020/801/21_60.png,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,92,92,Juventus,2018 ~ 2022,"ST, LW",187cm,83kg,Right,92,ST,"Jul 10, 2018",,€63M,€220K,€75.9M,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4 ★,5★,High,Low,5 ★,89,93,81,89,35,77,562
2,200389,J. Oblak,Jan Oblak,https://cdn.sofifa.com/players/200/389/21_60.png,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,91,93,Atlético Madrid,2014 ~ 2023,GK,188cm,87kg,Right,91,GK,"Jul 16, 2014",,€120M,€125K,€159.4M,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3 ★,1★,Medium,Medium,3 ★,87,92,78,90,52,90,150
3,192985,K. De Bruyne,Kevin De Bruyne,https://cdn.sofifa.com/players/192/985/21_60.png,http://sofifa.com/player/192985/kevin-de-bruyn...,Belgium,29,91,91,Manchester City,2015 ~ 2023,"CAM, CM",181cm,70kg,Right,91,CAM,"Aug 30, 2015",,€129M,€370K,€161M,407,94,82,55,94,82,441,88,85,83,93,92,398,77,76,78,91,76,408,91,63,89,74,91,408,76,66,88,94,84,91,186,68,65,53,56,15,13,5,10,13,2304,485,5 ★,4★,High,High,4 ★,76,86,93,88,64,78,207


Since the Height column has been renamed, we can now remove the unit from the values.
Before that, let's check whether the column contains any abnormal values.

### Cleaning the Height column

In [21]:
# using the unique() method to see the unique values contained in the column
data['Height_cm'].unique()

array(['170cm', '187cm', '188cm', '181cm', '175cm', '184cm', '191cm',
       '178cm', '193cm', '185cm', '199cm', '173cm', '168cm', '176cm',
       '177cm', '183cm', '180cm', '189cm', '179cm', '195cm', '172cm',
       '182cm', '186cm', '192cm', '165cm', '194cm', '167cm', '196cm',
       '163cm', '190cm', '174cm', '169cm', '171cm', '197cm', '200cm',
       '166cm', '6\'2"', '164cm', '198cm', '6\'3"', '6\'5"', '5\'11"',
       '6\'4"', '6\'1"', '6\'0"', '5\'10"', '5\'9"', '5\'6"', '5\'7"',
       '5\'4"', '201cm', '158cm', '162cm', '161cm', '160cm', '203cm',
       '157cm', '156cm', '202cm', '159cm', '206cm', '155cm'], dtype=object)

From our observation, not all the values in the height column are expressed in centimetres (cm).
We will remove the `cm` suffix from the values that have it,
and convert the abnormal values expressed in feet and inches to centimetres.

In [22]:
# replace the values that are represented in ft, inches and converting it to centimetres

In [23]:
# to define a function that will do the conversion, I will use the last character of each value as a reference
data['Height_cm'].str[-1].unique()

array(['m', '"'], dtype=object)

In [24]:
# Convert a height string to whole centimetres.
# The last character tells us which unit the value uses:
#   '"' -> feet and inches such as 6'2" (1 ft = 30.48 cm, 1 in = 2.54 cm)
#   'm' -> centimetres such as 170cm (only the 'cm' suffix is dropped)
def convert_height(x):
    if x[-1] == '"':
        # split 6'2" into feet and inches instead of relying on character positions
        feet, inches = x.replace('"', '').split("'")
        return round(int(feet) * 30.48 + int(inches) * 2.54)
    elif x[-1] == "m":
        return int(x[:-2])


In [25]:
# Applying the function to the Height column
data['Height_cm'] = data['Height_cm'].apply(convert_height)

In [26]:
#checking if the change is applied
data['Height_cm'].unique()

array([170, 187, 188, 181, 175, 184, 191, 178, 193, 185, 199, 173, 168,
       176, 177, 183, 180, 189, 179, 195, 172, 182, 186, 192, 165, 194,
       167, 196, 163, 190, 174, 169, 171, 197, 200, 166, 164, 198, 201,
       158, 162, 161, 160, 203, 157, 156, 202, 159, 206, 155])

In [27]:
data['Height_cm'].isnull().sum()

0
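The height conversion above trusts the last character of each value. As a more defensive sketch, the two formats can be matched with regular expressions so that an unexpected value raises an error instead of silently becoming a null. The name `height_to_cm` and both patterns are assumptions made for this illustration, based on the formats visible in the `unique()` output.

```python
import re

# Patterns for the two formats seen in the column: 6'2" and 170cm
FEET_INCHES = re.compile(r"^(\d+)'(\d+)\"$")
CENTIMETRES = re.compile(r"^(\d+)cm$")

def height_to_cm(value):
    """Convert '170cm' or 6'2" style strings to whole centimetres."""
    text = value.strip()
    match = CENTIMETRES.match(text)
    if match:
        return int(match.group(1))
    match = FEET_INCHES.match(text)
    if match:
        feet, inches = int(match.group(1)), int(match.group(2))
        # 1 ft = 30.48 cm and 1 in = 2.54 cm
        return round(feet * 30.48 + inches * 2.54)
    # Fail loudly on anything that matches neither format
    raise ValueError(f"Unrecognised height format: {value!r}")

print(height_to_cm("6'2\""), height_to_cm("170cm"))
```

For example, 6'2" works out to 6 × 30.48 + 2 × 2.54 = 187.96 cm, which rounds to 188.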

### Cleaning the weight column
Similarly, for the Weight column, we will convert the values given in lbs to kg.

In [28]:
# previewing the unique values of the weight column; the units are both kg and lbs
data['Weight_kg'].unique()

array(['72kg', '83kg', '87kg', '70kg', '68kg', '80kg', '71kg', '91kg',
       '73kg', '85kg', '92kg', '69kg', '84kg', '96kg', '81kg', '82kg',
       '75kg', '86kg', '89kg', '74kg', '76kg', '64kg', '78kg', '90kg',
       '66kg', '60kg', '94kg', '79kg', '67kg', '65kg', '59kg', '61kg',
       '93kg', '88kg', '97kg', '77kg', '62kg', '63kg', '95kg', '100kg',
       '58kg', '183lbs', '179lbs', '172lbs', '196lbs', '176lbs', '185lbs',
       '170lbs', '203lbs', '168lbs', '161lbs', '146lbs', '130lbs',
       '190lbs', '174lbs', '148lbs', '165lbs', '159lbs', '192lbs',
       '181lbs', '139lbs', '154lbs', '157lbs', '163lbs', '98kg', '103kg',
       '99kg', '102kg', '56kg', '101kg', '57kg', '55kg', '104kg', '107kg',
       '110kg', '53kg', '50kg', '54kg', '52kg'], dtype=object)

In [29]:
# to define a function that will do the conversion, I will use the last character of each value as a reference
data['Weight_kg'].str[-1].unique()

array(['g', 's'], dtype=object)

In [30]:
# Convert a weight string to whole kilograms.
# The last character tells us which unit the value uses:
#   's' -> pounds such as 183lbs (1 kg is about 2.205 lbs)
#   'g' -> kilograms such as 72kg (only the 'kg' suffix is dropped)

def convert_weight(x):
    if x[-1] == 's':
        # drop the 'lbs' suffix; int() then handles any number of digits
        kg = int(x.replace("lbs", "")) / 2.205
        return round(kg)
    elif x[-1] == "g":
        return int(x[:-2])

In [31]:
# Applying the function to the weight column 
data['Weight_kg'] = data['Weight_kg'].apply(convert_weight)

In [32]:
# checking to see if the change is applied
data['Weight_kg'].unique()

array([ 72,  83,  87,  70,  68,  80,  71,  91,  73,  85,  92,  69,  84,
        96,  81,  82,  75,  86,  89,  74,  76,  64,  78,  90,  66,  60,
        94,  79,  67,  65,  59,  61,  93,  88,  97,  77,  62,  63,  95,
       100,  58,  98, 103,  99, 102,  56, 101,  57,  55, 104, 107, 110,
        53,  50,  54,  52])
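The weight conversion above is applied row by row. A vectorised sketch of the same idea, assuming every value is a whole number followed by either `kg` or `lbs`, can use `str.extract` to split the number from the unit. The `weights` series is a made-up sample.

```python
import pandas as pd

# Hypothetical sample with both units
weights = pd.Series(["72kg", "183lbs", "100kg", "130lbs"])

# Split each value into its numeric part and its unit
parts = weights.str.extract(r"^(?P<number>\d+)(?P<unit>kg|lbs)$")
number = parts["number"].astype(int)

LBS_PER_KG = 2.205  # same factor as the function above

# Keep kg values as they are; divide lbs values by the conversion factor
weight_kg = (
    number.where(parts["unit"] == "kg", number / LBS_PER_KG)
    .round()
    .astype(int)
)

print(weight_kg.tolist())
```

A value that matches neither unit would come out of `str.extract` as null and make `astype(int)` fail, which surfaces unexpected formats early.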

In [33]:
#Previewing the data set
data.head(5)

Unnamed: 0,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Joined,Loan_Date_End,Value,Wage,Release_Clause,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,158023,L. Messi,Lionel Messi,https://cdn.sofifa.com/players/158/023/21_60.png,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,93,93,FC Barcelona,2004 ~ 2021,"RW, ST, CF",170,72,Left,93,RW,"Jul 1, 2004",,€103.5M,€560K,€138.4M,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4 ★,4★,Medium,Low,5 ★,85,92,91,95,38,65,771
1,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,https://cdn.sofifa.com/players/020/801/21_60.png,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,92,92,Juventus,2018 ~ 2022,"ST, LW",187,83,Right,92,ST,"Jul 10, 2018",,€63M,€220K,€75.9M,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4 ★,5★,High,Low,5 ★,89,93,81,89,35,77,562
2,200389,J. Oblak,Jan Oblak,https://cdn.sofifa.com/players/200/389/21_60.png,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,91,93,Atlético Madrid,2014 ~ 2023,GK,188,87,Right,91,GK,"Jul 16, 2014",,€120M,€125K,€159.4M,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3 ★,1★,Medium,Medium,3 ★,87,92,78,90,52,90,150
3,192985,K. De Bruyne,Kevin De Bruyne,https://cdn.sofifa.com/players/192/985/21_60.png,http://sofifa.com/player/192985/kevin-de-bruyn...,Belgium,29,91,91,Manchester City,2015 ~ 2023,"CAM, CM",181,70,Right,91,CAM,"Aug 30, 2015",,€129M,€370K,€161M,407,94,82,55,94,82,441,88,85,83,93,92,398,77,76,78,91,76,408,91,63,89,74,91,408,76,66,88,94,84,91,186,68,65,53,56,15,13,5,10,13,2304,485,5 ★,4★,High,High,4 ★,76,86,93,88,64,78,207
4,190871,Neymar Jr,Neymar da Silva Santos Jr.,https://cdn.sofifa.com/players/190/871/21_60.png,http://sofifa.com/player/190871/neymar-da-silv...,Brazil,28,91,91,Paris Saint-Germain,2017 ~ 2022,"LW, CAM",175,68,Right,91,LW,"Aug 3, 2017",,€132M,€270K,€166.5M,408,85,87,62,87,87,448,95,88,89,81,95,453,94,89,96,91,83,357,80,62,81,50,84,356,51,36,87,90,92,93,94,35,30,29,59,9,9,15,15,11,2175,451,5 ★,5★,High,Medium,5 ★,91,85,86,94,36,59,595


## Moving on with cleaning other columns

According to the `info()` output of the dataset, some other columns also need cleaning.

### Cleaning The Contract Column

In [34]:
# Checking the unique values in the contract column
data['Contract'].unique()

array(['2004 ~ 2021', '2018 ~ 2022', '2014 ~ 2023', '2015 ~ 2023',
       '2017 ~ 2022', '2017 ~ 2023', '2018 ~ 2024', '2014 ~ 2022',
       '2018 ~ 2023', '2016 ~ 2023', '2013 ~ 2023', '2011 ~ 2023',
       '2009 ~ 2022', '2005 ~ 2021', '2011 ~ 2021', '2015 ~ 2022',
       '2017 ~ 2024', '2010 ~ 2024', '2012 ~ 2021', '2019 ~ 2024',
       '2015 ~ 2024', '2017 ~ 2025', '2020 ~ 2025', '2019 ~ 2023',
       '2008 ~ 2023', '2015 ~ 2021', '2020 ~ 2022', '2012 ~ 2022',
       '2016 ~ 2025', '2013 ~ 2022', '2011 ~ 2022', '2012 ~ 2024',
       '2016 ~ 2021', '2012 ~ 2023', '2008 ~ 2022', '2019 ~ 2022',
       '2017 ~ 2021', '2013 ~ 2024', '2020 ~ 2024', '2010 ~ 2022',
       '2020 ~ 2021', '2011 ~ 2024', '2020 ~ 2023', '2014 ~ 2024',
       '2013 ~ 2026', '2016 ~ 2022', '2010 ~ 2021', '2013 ~ 2021',
       '2019 ~ 2025', '2018 ~ 2025', '2016 ~ 2024', '2018 ~ 2021',
       '2009 ~ 2024', '2007 ~ 2022', 'Jun 30, 2021 On Loan',
       '2009 ~ 2021', '2019 ~ 2021', '2019 ~ 2026', 'Free', '2012 ~ 

It is observed that some players have contracts given as a year range that ends in the year of expiration.
Some players are on loan, with the date on which the loan expires.
Some players are on free contracts.

It is therefore ideal to convert the column into a categorical column. I will use the last character of each value as a reference for my Python function.

In [35]:
# Rows that hold a year range end in one of the digits 0-9
str(list(range(10)))

'[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'

In [36]:
# see the unique last value of the contract column
data['Contract'].str[-1].unique()

array(['1', '2', '3', '4', '5', '6', 'n', 'e', '8', '0', '7'],
      dtype=object)

In [37]:
#Using a function to label each row according to the last character of its contract value

def convert_Cont(x):
    # A trailing digit means a year range such as '2004 ~ 2021'
    if x[-1].isdigit():
        return 'Active'
    # A trailing 'n' comes from values ending in 'On Loan'
    elif x[-1] == 'n':
        return 'Loan'
    # A trailing 'e' comes from 'Free'
    elif x[-1] == 'e':
        return 'Free'


In [38]:
#Applying the function
data['Contract'] = data['Contract'].apply(convert_Cont)
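Relying on the last character works for the values seen here, but it would silently mislabel any unexpected value that happens to end in a digit, 'n' or 'e'. As a minimal sketch of a stricter alternative, the whole value can be matched instead. The helper name `contract_status` and the 'Unknown' fallback label are assumptions for illustration, not part of the notebook.

```python
import pandas as pd

def contract_status(contracts: pd.Series) -> pd.Series:
    """Label each contract string as Active, Loan, Free or Unknown (hypothetical helper)."""
    # Start from a fallback label so unexpected values stay visible
    status = pd.Series("Unknown", index=contracts.index, dtype="object")
    # '2004 ~ 2021' style year ranges
    status[contracts.str.fullmatch(r"\d{4} ~ \d{4}", na=False)] = "Active"
    # 'Jun 30, 2021 On Loan' style entries
    status[contracts.str.endswith("On Loan", na=False)] = "Loan"
    # Players without a contract
    status[contracts.eq("Free")] = "Free"
    return status.astype("category")

sample = pd.Series(["2004 ~ 2021", "Jun 30, 2021 On Loan", "Free", "unexpected"])
labels = contract_status(sample)
print(labels.tolist())
```

Any row that ends up as 'Unknown' can then be inspected by hand before the column is finalised.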

In [39]:
#converting the column into a categorical type 
data['Contract'] = data['Contract'].astype('category')

In [40]:
#renaming the column from contract to contract status
data.rename(columns = {'Contract': 'Contract_Status'}, inplace = True)

In [41]:
data[data['Contract_Status'] == 'Free']

Unnamed: 0,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract_Status,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Joined,Loan_Date_End,Value,Wage,Release_Clause,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
289,230347,Welington Dano,Welington Kauê Dano Nascimento,https://cdn.sofifa.com/players/230/347/21_60.png,http://sofifa.com/player/230347/welington-kaue...,Brazil,20,81,81,No Club,Free,"LB, LM",178,69,Left,81,LB,"Jan 1, 2019",,€0,€0,€0,327,82,51,69,78,47,361,77,83,52,71,78,402,78,80,83,80,81,336,55,71,89,68,53,354,69,77,72,75,61,82,228,70,77,81,60,16,15,15,7,7,2068,436,4 ★,4★,Medium,Medium,1 ★,79,54,76,78,75,74,172
292,230225,Juiano Mestres,Juan Everton Mestres de Mesquita,https://cdn.sofifa.com/players/230/225/21_60.png,http://sofifa.com/player/230225/juan-everton-m...,Brazil,24,81,81,No Club,Free,"CB, CDM",181,82,Right,81,CB,"Jan 1, 2019",,€0,€0,€0,309,40,56,83,67,63,312,60,72,58,58,64,344,68,74,61,78,63,381,74,91,76,79,61,338,86,82,58,50,62,70,246,77,85,84,73,18,12,14,11,18,2003,413,3 ★,2★,High,Medium,1 ★,71,61,57,62,82,80,75
369,245299,J. Frendado,Jaime Nicolás Frendado,https://cdn.sofifa.com/players/245/299/21_60.png,http://sofifa.com/player/245299/jaime-nicolas-...,Uruguay,36,80,80,No Club,Free,"CB, CDM",181,82,Right,80,CB,"Aug 10, 2018",,€0,€0,€0,306,40,56,80,67,63,312,60,72,58,58,64,341,67,74,61,76,63,363,74,91,60,77,61,338,86,82,58,50,62,70,247,83,82,82,80,14,15,15,17,19,1987,408,3 ★,2★,High,Medium,1 ★,71,61,57,62,82,75,11
374,245294,J. Serendero,Jorge Ezequiel Serendero,https://cdn.sofifa.com/players/245/294/21_60.png,http://sofifa.com/player/245294/jorge-ezequiel...,Uruguay,32,80,80,No Club,Free,GK,190,85,Right,80,GK,"Aug 10, 2018",,€0,€0,€0,55,10,8,11,18,8,73,9,9,18,16,21,236,41,42,34,78,41,240,58,67,29,79,7,169,34,23,27,71,14,67,42,12,17,13,393,78,81,77,80,77,1208,435,2 ★,1★,Medium,Medium,1 ★,78,81,77,77,42,80,18
375,245308,M. Nérez,Mauro Evidio Nérez,https://cdn.sofifa.com/players/245/308/21_60.png,http://sofifa.com/player/245308/mauro-evidio-n...,Uruguay,32,80,80,No Club,Free,"LB, LM",178,69,Left,80,LB,"Aug 10, 2018",,€0,€0,€0,322,77,51,69,78,47,361,77,83,52,71,78,402,78,80,83,80,81,336,55,71,89,68,53,354,69,77,72,75,61,82,230,77,77,76,60,16,15,15,7,7,2065,436,4 ★,4★,Medium,Medium,1 ★,79,54,75,78,76,74,17
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17220,247059,A. Al Hidhani,Ali Al Hidhani,https://cdn.sofifa.com/players/247/059/21_60.png,http://sofifa.com/player/247059/ali-al-hidhani...,United Arab Emirates,22,56,64,No Club,Free,RB,172,67,Right,56,RB,"Jul 1, 2018",,€0,€0,€0,205,51,34,44,49,27,215,53,40,34,38,50,345,72,73,67,53,80,237,35,65,58,50,29,231,60,51,45,40,35,43,157,52,53,52,53,9,11,10,14,9,1443,312,3 ★,2★,Medium,Medium,1 ★,73,33,45,55,51,55,
17343,251734,A. Sanghu,Anudaan Sanghu,https://cdn.sofifa.com/players/251/734/21_60.png,http://sofifa.com/player/251734/anudaan-sanghu...,India,20,56,56,No Club,Free,"RM, CM",168,62,Right,56,RM,"Jul 12, 2019",,€0,€0,€0,236,49,38,43,51,55,265,57,61,48,44,55,356,81,77,81,38,79,251,48,58,74,36,35,235,50,34,49,50,52,50,83,18,32,33,56,16,16,8,6,10,1482,308,3 ★,3★,Medium,Low,1 ★,79,42,50,59,29,49,
17659,251738,M. Chada,Madanapal Chada,https://cdn.sofifa.com/players/251/738/21_60.png,http://sofifa.com/player/251738/madanapal-chad...,India,27,55,55,No Club,Free,ST,178,77,Right,57,RM,"Jul 12, 2019",,€0,€0,€0,244,46,51,45,51,51,243,61,38,35,49,60,312,71,64,66,52,59,295,57,75,72,43,48,248,68,30,59,46,45,57,98,15,43,40,48,8,6,8,15,11,1488,316,1 ★,2★,High,Medium,1 ★,67,52,47,61,32,57,
17661,251741,E. Suresh,Eelamynthan Suresh,https://cdn.sofifa.com/players/251/741/21_60.png,http://sofifa.com/player/251741/eelamynthan-su...,India,28,55,55,No Club,Free,"RM, CAM, RB",175,73,Left,55,RM,"Jul 12, 2019",,€0,€0,€0,244,52,45,48,50,49,259,51,53,55,48,52,339,71,75,72,48,73,327,53,87,71,65,51,279,77,48,51,52,51,48,123,18,54,51,55,13,13,7,9,13,1626,338,3 ★,2★,High,Medium,1 ★,73,49,51,54,41,70,


### Cleaning The Loan_Date_End Column

Before cleaning the column, let's analyse it carefully

In [42]:
#Getting the unique values in the column
data['Loan_Date_End'].unique()

array([nan, 'Jun 30, 2021', 'Dec 31, 2020', 'Jan 30, 2021',
       'Jun 30, 2022', 'May 31, 2021', 'Jul 5, 2021', 'Dec 31, 2021',
       'Jul 1, 2021', 'Jan 1, 2021', 'Aug 31, 2021', 'Jan 31, 2021',
       'Dec 30, 2021', 'Jun 23, 2021', 'Jan 3, 2021', 'Nov 27, 2021',
       'Jan 17, 2021', 'Jun 30, 2023', 'Jul 31, 2021', 'Nov 22, 2020',
       'May 31, 2022', 'Dec 30, 2020', 'Jan 4, 2021', 'Nov 30, 2020',
       'Aug 1, 2021'], dtype=object)

It is observed that the column contains NaN (missing) values

Let's see how many NaN values are in the column

In [43]:
#Getting the number of NaN values in the Loan_Date_End column
data['Loan_Date_End'].isnull().sum()

17966

There are 17966 missing values in Loan_Date_End, out of 18979 rows (about 95%). Since the missing values far outnumber the available ones, it is ideal to drop the entire column. Not all players are on loan, and the Contract_Status column already records which ones are
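The same check can be run on every column at once. The sketch below computes the fraction of missing values per column on a tiny made-up frame; the helper name `missing_share` and the 0.5 threshold are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, largest first (hypothetical helper)."""
    return df.isna().mean().sort_values(ascending=False)

# Small stand-in for the real dataframe
sample = pd.DataFrame({
    "Loan_Date_End": [np.nan, "Jun 30, 2021", np.nan, np.nan],
    "Contract_Status": ["Active", "Loan", "Active", "Free"],
})

share = missing_share(sample)
# Columns where more than half of the rows are missing are candidates for dropping
mostly_missing = share[share > 0.5].index.tolist()
print(share)
print(mostly_missing)
```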

In [44]:
# Removing the Loan_Date_End from the data
data = data.drop('Loan_Date_End', axis=1)

In [45]:
#Checking if our change is applied
data.head(4)

Unnamed: 0,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract_Status,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Joined,Value,Wage,Release_Clause,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,158023,L. Messi,Lionel Messi,https://cdn.sofifa.com/players/158/023/21_60.png,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,93,93,FC Barcelona,Active,"RW, ST, CF",170,72,Left,93,RW,"Jul 1, 2004",€103.5M,€560K,€138.4M,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4 ★,4★,Medium,Low,5 ★,85,92,91,95,38,65,771
1,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,https://cdn.sofifa.com/players/020/801/21_60.png,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,92,92,Juventus,Active,"ST, LW",187,83,Right,92,ST,"Jul 10, 2018",€63M,€220K,€75.9M,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4 ★,5★,High,Low,5 ★,89,93,81,89,35,77,562
2,200389,J. Oblak,Jan Oblak,https://cdn.sofifa.com/players/200/389/21_60.png,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,91,93,Atlético Madrid,Active,GK,188,87,Right,91,GK,"Jul 16, 2014",€120M,€125K,€159.4M,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3 ★,1★,Medium,Medium,3 ★,87,92,78,90,52,90,150
3,192985,K. De Bruyne,Kevin De Bruyne,https://cdn.sofifa.com/players/192/985/21_60.png,http://sofifa.com/player/192985/kevin-de-bruyn...,Belgium,29,91,91,Manchester City,Active,"CAM, CM",181,70,Right,91,CAM,"Aug 30, 2015",€129M,€370K,€161M,407,94,82,55,94,82,441,88,85,83,93,92,398,77,76,78,91,76,408,91,63,89,74,91,408,76,66,88,94,84,91,186,68,65,53,56,15,13,5,10,13,2304,485,5 ★,4★,High,High,4 ★,76,86,93,88,64,78,207


### Cleaning The Preferred_Foot Column

In [46]:
#Checking for Uniqueness
data['Preferred_Foot'].unique()

array(['Left', 'Right'], dtype=object)

Since this column contains only two unique values, we can convert it to a categorical column

In [47]:
data['Preferred_Foot'] = data['Preferred_Foot'].astype('category')
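One practical benefit of the categorical type is memory: each row stores a small integer code instead of a full Python string. A quick sketch on synthetic data (the series below is made up, not taken from the dataset) shows the difference.

```python
import pandas as pd

# Synthetic column with the same two values as Preferred_Foot
feet = pd.Series(["Left", "Right"] * 5000, dtype="object")
as_category = feet.astype("category")

# deep=True counts the memory held by the string objects themselves
object_bytes = feet.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(object_bytes, category_bytes)
print(as_category.cat.categories.tolist())
```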

### Cleaning The Joined Column

In [48]:
# Checking for the Uniqueness
data['Joined'].unique()

array(['Jul 1, 2004', 'Jul 10, 2018', 'Jul 16, 2014', ..., 'Sep 22, 2018',
       'Feb 28, 2015', 'Mar 6, 2018'], dtype=object)

This column contains the date the player joined their club. So, it can be converted to a datetime column

In [49]:
#Using the pandas library to convert the Joined column into a datetime

data['Joined'] = pd.to_datetime(data['Joined'])
print(data['Joined'].info())

<class 'pandas.core.series.Series'>
RangeIndex: 18979 entries, 0 to 18978
Series name: Joined
Non-Null Count  Dtype         
--------------  -----         
18979 non-null  datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 148.4 KB
None
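When the date layout is known, it can be passed explicitly so that pandas does not have to infer it. The sketch below assumes every value follows the 'Jul 1, 2004' pattern seen in the output above, and uses `errors="coerce"` so that anything else becomes NaT and can be reviewed; the sample values are made up for illustration.

```python
import pandas as pd

raw = pd.Series(["Jul 1, 2004", "Jul 10, 2018", "not a date"])

# %b = abbreviated month name, %d = day of month, %Y = four-digit year
parsed = pd.to_datetime(raw, format="%b %d, %Y", errors="coerce")

# Values that did not match the format end up as NaT
unparsed = raw[parsed.isna()]
print(parsed)
print(unparsed.tolist())
```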


In [50]:
#Checking if the change is applied
data.head(3)

Unnamed: 0,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract_Status,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Joined,Value,Wage,Release_Clause,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,158023,L. Messi,Lionel Messi,https://cdn.sofifa.com/players/158/023/21_60.png,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,93,93,FC Barcelona,Active,"RW, ST, CF",170,72,Left,93,RW,2004-07-01,€103.5M,€560K,€138.4M,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4 ★,4★,Medium,Low,5 ★,85,92,91,95,38,65,771
1,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,https://cdn.sofifa.com/players/020/801/21_60.png,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,92,92,Juventus,Active,"ST, LW",187,83,Right,92,ST,2018-07-10,€63M,€220K,€75.9M,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4 ★,5★,High,Low,5 ★,89,93,81,89,35,77,562
2,200389,J. Oblak,Jan Oblak,https://cdn.sofifa.com/players/200/389/21_60.png,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,91,93,Atlético Madrid,Active,GK,188,87,Right,91,GK,2014-07-16,€120M,€125K,€159.4M,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3 ★,1★,Medium,Medium,3 ★,87,92,78,90,52,90,150


### Cleaning the Best Position Column

Let us first analyse the type of values in the Best Position column

In [51]:
# We preview the unique values in that column
data['Best_Position'].unique()

array(['RW', 'ST', 'GK', 'CAM', 'LW', 'CB', 'CDM', 'CF', 'CM', 'RB', 'LB',
       'LM', 'RM', 'LWB', 'RWB'], dtype=object)

We can observe that it contains the position that best fits each player, drawn from a small fixed set of values. We can then convert the column into a categorical column

In [52]:
#converting the Best_Position column into a categorical type
data['Best_Position'] = data['Best_Position'].astype('category')

# Checking if the change is applied
print(data['Best_Position'].info())

<class 'pandas.core.series.Series'>
RangeIndex: 18979 entries, 0 to 18978
Series name: Best_Position
Non-Null Count  Dtype   
--------------  -----   
18979 non-null  category
dtypes: category(1)
memory usage: 19.3 KB
None


### Setting the Joined Column in ascending order

Rearranging the Joined column in ascending order and setting it as the index of the dataframe

This enables us to see the players in the order in which they joined, from earliest to latest

In [53]:
#Arrange the Joined column in ascending order
data.sort_values(by='Joined', inplace=True)

#Setting the Joined column as Index
data.set_index('Joined', inplace=True) 
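A sorted datetime index also allows rows to be selected by date strings. The sketch below rebuilds a few rows by hand (the names and dates are the five earliest players shown in the preview that follows) and selects by year and by date range; the variable names are assumptions for illustration.

```python
import pandas as pd

players = pd.DataFrame({
    "Name": ["D. Lewington", "H. Sogahata", "I. Akinfeev", "M. McNulty", "Kim Kwang Suk"],
    "Joined": pd.to_datetime(
        ["2002-08-01", "1998-01-01", "2002-01-01", "2003-01-01", "2002-01-01"]
    ),
})

# Same two steps as in the cell above, without inplace=True
by_date = players.sort_values("Joined").set_index("Joined")

# Partial string indexing: every player who joined in 2002
joined_2002 = by_date.loc["2002"]
# Slicing: every player who joined up to the end of 2002
up_to_2002 = by_date.loc[:"2002-12-31"]
print(joined_2002)
print(up_to_2002)
```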

In [54]:
 
#Previewing the dataframe after sorting and re-indexing
data.head()

Joined,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract_Status,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Value,Wage,Release_Clause,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
1998-01-01,140181,H. Sogahata,Hitoshi Sogahata,https://cdn.sofifa.com/players/140/181/21_60.png,http://sofifa.com/player/140181/hitoshi-sogaha...,Japan,40,65,65,Kashima Antlers,Active,GK,187,80,Right,65,GK,€80K,€1K,€63K,87,13,12,18,26,18,84,14,10,11,33,16,143,17,19,28,47,32,203,48,52,37,51,15,100,21,12,10,45,12,20,42,17,12,13,331,64,66,64,72,65,990,349,3 ★,1★,Medium,Medium,1 ★,64,66,64,65,18,72,1
2002-01-01,184900,Kim Kwang Suk,Kwang Suk Kim,https://cdn.sofifa.com/players/184/900/21_60.png,http://sofifa.com/player/184900/kwang-suk-kim/...,Korea Republic,37,70,70,Pohang Steelers,Active,CB,183,73,Right,70,CB,€275K,€3K,€438K,260,58,38,70,55,39,232,51,46,30,49,56,325,69,61,54,73,68,316,48,89,63,75,41,285,71,73,55,43,43,56,205,70,69,66,53,9,8,16,13,7,1676,355,3 ★,2★,Low,High,1 ★,65,42,51,55,70,72,3
2002-01-01,148119,I. Akinfeev,Igor Akinfeev,https://cdn.sofifa.com/players/148/119/21_60.png,http://sofifa.com/player/148119/igor-akinfeev/...,Russia,34,80,80,PFC CSKA Moscow,Active,GK,186,78,Right,80,GK,€3.6M,€40K,€10.4M,83,19,13,18,23,10,91,15,19,13,22,22,270,51,50,53,71,45,241,59,62,35,72,13,121,24,17,11,59,10,67,44,18,13,13,393,77,72,78,82,84,1243,443,3 ★,1★,Medium,Medium,3 ★,77,72,78,84,50,82,18
2002-08-01,138830,D. Lewington,Dean Lewington,https://cdn.sofifa.com/players/138/830/21_60.png,http://sofifa.com/player/138830/dean-lewington...,England,36,65,65,Milton Keynes Dons,Active,"LB, LWB",183,86,Left,65,LB,€160K,€2K,€175K,229,66,22,58,65,18,300,54,59,64,62,61,273,48,41,67,62,55,322,51,66,80,78,47,267,68,63,58,56,22,71,193,63,67,63,63,7,12,16,14,14,1647,339,3 ★,2★,Medium,Medium,1 ★,44,34,63,58,64,76,5
2003-01-01,142333,M. McNulty,Mark McNulty,https://cdn.sofifa.com/players/142/333/21_60.png,http://sofifa.com/player/142333/mark-mcnulty/2...,Republic of Ireland,39,60,60,Cork City,Active,GK,190,70,Right,60,GK,€25K,€500,€31K,90,16,12,12,36,14,89,12,19,12,23,23,215,34,34,48,57,42,179,41,43,31,49,15,122,37,16,12,35,22,50,34,10,10,14,292,60,61,54,60,57,1021,326,2 ★,1★,Medium,Medium,1 ★,60,61,54,57,34,60,2


### Cleaning the Value Column

In [55]:
#Let's see what the values in the Value column look like

data['Value'].unique()

array(['€80K', '€275K', '€3.6M', '€160K', '€25K', '€400K', '€600K',
       '€35K', '€180K', '€70K', '€550K', '€575K', '€230K', '€1M',
       '€103.5M', '€3.5M', '€1.2M', '€850K', '€0', '€240K', '€1.5M',
       '€110K', '€21M', '€1.6M', '€33.5M', '€500K', '€825K', '€375K',
       '€3.7M', '€475K', '€25M', '€925K', '€90K', '€800K', '€130K',
       '€15.5M', '€1.3M', '€625K', '€325K', '€2.4M', '€1.4M', '€4.9M',
       '€7M', '€700K', '€6.5M', '€425K', '€2.9M', '€750K', '€2.6M',
       '€190K', '€18.5M', '€875K', '€60K', '€32.5M', '€65.5M', '€56M',
       '€1.1M', '€2.1M', '€12.5M', '€210K', '€675K', '€10M', '€150K',
       '€7.5M', '€4.6M', '€450K', '€8.5M', '€83.5M', '€4.1M', '€38.5M',
       '€22M', '€975K', '€120K', '€950K', '€2.5M', '€775K', '€900K',
       '€36.5M', '€2.7M', '€5.5M', '€14.5M', '€2M', '€2.3M', '€32M',
       '€220K', '€12M', '€4.8M', '€109M', '€53M', '€14M', '€4M', '€22.5M',
       '€3.8M', '€17.5M', '€6M', '€1.9M', '€350K', '€54M', '€20K',
       '€1.8M', '€300K', '€

We observe that the values have the Euro sign as a prefix and M or K as a suffix

We are going to remove the Euro sign from every value in the column and expand M to millions and K to thousands, then we can convert the column to an int type

Also, we are going to rename the column so that it indicates the currency used to estimate the player's value

In [56]:
# Defining a function that strips the unwanted characters and converts the values into integers
def clean_value(x):
    # Remove the euro sign, e.g. '€3.6M' -> '3.6M'
    x = x.replace('€', '')
    # A K suffix means thousands
    if 'K' in x:
        return round(float(x.replace('K', '')) * 1000)
    # An M suffix means millions
    elif 'M' in x:
        return round(float(x.replace('M', '')) * 1000000)
    # A B suffix means billions
    elif 'B' in x:
        return round(float(x.replace('B', '')) * 1000000000)
    # No suffix (e.g. '0' or '500'): the value is already in euros
    else:
        return round(float(x))

In [57]:
# Applying the function to the Value Column
data['Value'] = data['Value'].apply(clean_value)
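For larger datasets the same conversion can be done without a row-by-row `apply`. The following is only a sketch of that idea, assuming every value looks like a Euro sign, a number and an optional K, M or B suffix; the name `euros_to_int` and the `MULTIPLIERS` table are hypothetical and not part of the notebook.

```python
import pandas as pd

# Hypothetical lookup from suffix to multiplier; '' covers values such as '€0' or '€500'
MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

def euros_to_int(values: pd.Series) -> pd.Series:
    """Convert strings like '€3.6M' to whole euros; unmatched values become <NA>."""
    parts = values.str.extract(r"^€(?P<number>\d+(?:\.\d+)?)(?P<suffix>[KMB]?)$")
    number = parts["number"].astype(float)
    factor = parts["suffix"].map(MULTIPLIERS)
    # Rounding removes floating point noise before the cast to a nullable integer
    return (number * factor).round().astype("Int64")

sample = pd.Series(["€80K", "€3.6M", "€0", "€500", "€103.5M"])
converted = euros_to_int(sample)
print(converted.tolist())
```

Values that do not match the expected pattern come back as missing instead of raising an error, so they can be counted with `converted.isna().sum()`.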

In [58]:
# checking if the changes are applied
data['Value'].unique()

array([    80000,    275000,   3600000,    160000,     25000,    400000,
          600000,     35000,    180000,     70000,    550000,    575000,
          230000,   1000000, 103500000,   3500000,   1200000,    850000,
               0,    240000,   1500000,    110000,  21000000,   1600000,
        33500000,    500000,    825000,    375000,   3700000,    475000,
        25000000,    925000,     90000,    800000,    130000,  15500000,
         1300000,    625000,    325000,   2400000,   1400000,   4900000,
         7000000,    700000,   6500000,    425000,   2900000,    750000,
         2600000,    190000,  18500000,    875000,     60000,  32500000,
        65500000,  56000000,   1100000,   2100000,  12500000,    210000,
          675000,  10000000,    150000,   7500000,   4600000,    450000,
         8500000,  83500000,   4100000,  38500000,  22000000,    975000,
          120000,    950000,   2500000,    775000,    900000,  36500000,
         2700000,   5500000,  14500000,   2000000, 

 - ##### We can now rename the Value column so it can represent the currency used

In [59]:
# renaming the column from Value to Value_in_Euro
data.rename(columns = {'Value': 'Value_in_Euro'}, inplace = True)

### Cleaning Wage Column 

We are going to remove the Euro sign from every value in the column and expand the K suffix to thousands, then we can convert the column to an int type

Also, we are going to rename the column so that it indicates the currency in which the player's wage is expressed

In [60]:
# Checking the unique values in the wage column 
data['Wage'].unique()

array(['€1K', '€3K', '€40K', '€2K', '€500', '€4K', '€6K', '€5K', '€650',
       '€560K', '€36K', '€7K', '€0', '€8K', '€700', '€95K', '€12K',
       '€19K', '€300K', '€30K', '€170K', '€800', '€9K', '€85K', '€37K',
       '€10K', '€51K', '€850', '€15K', '€38K', '€17K', '€16K', '€220K',
       '€130K', '€240K', '€29K', '€11K', '€24K', '€76K', '€23K', '€25K',
       '€49K', '€350K', '€950', '€41K', '€72K', '€105K', '€60K', '€900',
       '€71K', '€34K', '€28K', '€120K', '€56K', '€21K', '€44K', '€33K',
       '€135K', '€550', '€14K', '€90K', '€600', '€13K', '€42K', '€32K',
       '€59K', '€26K', '€140K', '€150K', '€31K', '€27K', '€750', '€79K',
       '€18K', '€35K', '€69K', '€20K', '€82K', '€64K', '€46K', '€22K',
       '€48K', '€65K', '€160K', '€53K', '€50K', '€39K', '€100K', '€260K',
       '€58K', '€84K', '€125K', '€70K', '€55K', '€115K', '€78K', '€110K',
       '€175K', '€86K', '€210K', '€54K', '€67K', '€230K', '€310K', '€43K',
       '€200K', '€57K', '€73K', '€63K', '€66K', '€68K', '€

We will remove the Euro sign, expand K to thousands, and convert the column to an integer type

We can reuse the function written for the Value column

In [61]:
# we will apply the clean_value function to the Wage column

data['Wage'] = data['Wage'].apply(clean_value)

In [62]:
# Checking if our changes are applied
data['Wage'].unique()

array([  1000,   3000,  40000,   2000,    500,   4000,   6000,   5000,
          650, 560000,  36000,   7000,      0,   8000,    700,  95000,
        12000,  19000, 300000,  30000, 170000,    800,   9000,  85000,
        37000,  10000,  51000,    850,  15000,  38000,  17000,  16000,
       220000, 130000, 240000,  29000,  11000,  24000,  76000,  23000,
        25000,  49000, 350000,    950,  41000,  72000, 105000,  60000,
          900,  71000,  34000,  28000, 120000,  56000,  21000,  44000,
        33000, 135000,    550,  14000,  90000,    600,  13000,  42000,
        32000,  59000,  26000, 140000, 150000,  31000,  27000,    750,
        79000,  18000,  35000,  69000,  20000,  82000,  64000,  46000,
        22000,  48000,  65000, 160000,  53000,  50000,  39000, 100000,
       260000,  58000,  84000, 125000,  70000,  55000, 115000,  78000,
       110000, 175000,  86000, 210000,  54000,  67000, 230000, 310000,
        43000, 200000,  57000,  73000,  63000,  66000,  68000,  47000,
      

In [63]:
# renaming the column from Wage to Wage_in_Euro
data.rename(columns = {'Wage': 'Wage_in_Euro'}, inplace = True)

In [64]:
# confirming the data type of the Wage_in_Euro column
print(data['Wage_in_Euro'].info())

<class 'pandas.core.series.Series'>
DatetimeIndex: 18979 entries, 1998-01-01 to 2020-10-08
Series name: Wage_in_Euro
Non-Null Count  Dtype
--------------  -----
18979 non-null  int64
dtypes: int64(1)
memory usage: 296.5 KB
None


In [65]:
data.head(3)

Joined,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract_Status,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Value_in_Euro,Wage_in_Euro,Release_Clause,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
1998-01-01,140181,H. Sogahata,Hitoshi Sogahata,https://cdn.sofifa.com/players/140/181/21_60.png,http://sofifa.com/player/140181/hitoshi-sogaha...,Japan,40,65,65,Kashima Antlers,Active,GK,187,80,Right,65,GK,80000,1000,€63K,87,13,12,18,26,18,84,14,10,11,33,16,143,17,19,28,47,32,203,48,52,37,51,15,100,21,12,10,45,12,20,42,17,12,13,331,64,66,64,72,65,990,349,3 ★,1★,Medium,Medium,1 ★,64,66,64,65,18,72,1
2002-01-01,184900,Kim Kwang Suk,Kwang Suk Kim,https://cdn.sofifa.com/players/184/900/21_60.png,http://sofifa.com/player/184900/kwang-suk-kim/...,Korea Republic,37,70,70,Pohang Steelers,Active,CB,183,73,Right,70,CB,275000,3000,€438K,260,58,38,70,55,39,232,51,46,30,49,56,325,69,61,54,73,68,316,48,89,63,75,41,285,71,73,55,43,43,56,205,70,69,66,53,9,8,16,13,7,1676,355,3 ★,2★,Low,High,1 ★,65,42,51,55,70,72,3
2002-01-01,148119,I. Akinfeev,Igor Akinfeev,https://cdn.sofifa.com/players/148/119/21_60.png,http://sofifa.com/player/148119/igor-akinfeev/...,Russia,34,80,80,PFC CSKA Moscow,Active,GK,186,78,Right,80,GK,3600000,40000,€10.4M,83,19,13,18,23,10,91,15,19,13,22,22,270,51,50,53,71,45,241,59,62,35,72,13,121,24,17,11,59,10,67,44,18,13,13,393,77,72,78,82,84,1243,443,3 ★,1★,Medium,Medium,3 ★,77,72,78,84,50,82,18


### Cleaning Release_Clause Column

We are going to remove the Euro sign from every value in the column and expand M to millions and K to thousands, then we can convert the column to an int type

Also, we are going to rename the column so that it indicates the currency in which the player's release clause is expressed

In [66]:
# Checking the unique values in the Release_Clause column
data['Release_Clause'].unique()

array(['€63K', '€438K', '€10.4M', ..., '€69.1M', '€61.2M', '€47.7M'],
      dtype=object)

We will remove the Euro sign, expand K to thousands and M to millions, and convert the column to an integer type

We can reuse the function written for the Value column
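Since Value, Wage and Release_Clause all share one format, the clean-and-rename steps could also be written as a single loop. The sketch below runs on a tiny made-up frame with the original column names, and `parse_euro` is a hypothetical stand-in for `clean_value` so that the example is self-contained.

```python
import pandas as pd

def parse_euro(text: str) -> int:
    """Hypothetical stand-in for clean_value: '€3.6M' -> 3600000."""
    factors = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    number = text.replace("€", "")
    # A trailing K, M or B scales the number; otherwise it is already in euros
    if number and number[-1] in factors:
        return round(float(number[:-1]) * factors[number[-1]])
    return round(float(number))

sample = pd.DataFrame({
    "Value": ["€80K", "€3.6M"],
    "Wage": ["€1K", "€500"],
    "Release_Clause": ["€63K", "€10.4M"],
})

money_columns = ["Value", "Wage", "Release_Clause"]
for column in money_columns:
    sample[column] = sample[column].apply(parse_euro)

# Add the currency to every cleaned column name in one rename call
sample = sample.rename(columns={c: f"{c}_in_Euro" for c in money_columns})
print(sample)
```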

In [67]:
# we will apply the clean Value function on the Release_Clause column

data['Release_Clause'] = data['Release_Clause'].apply(clean_value)

In [68]:
data['Release_Clause'].unique()

array([   63000,   438000, 10400000, ..., 69100000, 61200000, 47700000])

In [69]:
# renaming the column from Release_Clause to Release_Clause_in_Euro
data.rename(columns = {'Release_Clause': 'Release_Clause_in_Euro'}, inplace = True)

In [70]:
data.head()

Joined,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract_Status,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Value_in_Euro,Wage_in_Euro,Release_Clause_in_Euro,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
1998-01-01,140181,H. Sogahata,Hitoshi Sogahata,https://cdn.sofifa.com/players/140/181/21_60.png,http://sofifa.com/player/140181/hitoshi-sogaha...,Japan,40,65,65,Kashima Antlers,Active,GK,187,80,Right,65,GK,80000,1000,63000,87,13,12,18,26,18,84,14,10,11,33,16,143,17,19,28,47,32,203,48,52,37,51,15,100,21,12,10,45,12,20,42,17,12,13,331,64,66,64,72,65,990,349,3 ★,1★,Medium,Medium,1 ★,64,66,64,65,18,72,1
2002-01-01,184900,Kim Kwang Suk,Kwang Suk Kim,https://cdn.sofifa.com/players/184/900/21_60.png,http://sofifa.com/player/184900/kwang-suk-kim/...,Korea Republic,37,70,70,Pohang Steelers,Active,CB,183,73,Right,70,CB,275000,3000,438000,260,58,38,70,55,39,232,51,46,30,49,56,325,69,61,54,73,68,316,48,89,63,75,41,285,71,73,55,43,43,56,205,70,69,66,53,9,8,16,13,7,1676,355,3 ★,2★,Low,High,1 ★,65,42,51,55,70,72,3
2002-01-01,148119,I. Akinfeev,Igor Akinfeev,https://cdn.sofifa.com/players/148/119/21_60.png,http://sofifa.com/player/148119/igor-akinfeev/...,Russia,34,80,80,PFC CSKA Moscow,Active,GK,186,78,Right,80,GK,3600000,40000,10400000,83,19,13,18,23,10,91,15,19,13,22,22,270,51,50,53,71,45,241,59,62,35,72,13,121,24,17,11,59,10,67,44,18,13,13,393,77,72,78,82,84,1243,443,3 ★,1★,Medium,Medium,3 ★,77,72,78,84,50,82,18
2002-08-01,138830,D. Lewington,Dean Lewington,https://cdn.sofifa.com/players/138/830/21_60.png,http://sofifa.com/player/138830/dean-lewington...,England,36,65,65,Milton Keynes Dons,Active,"LB, LWB",183,86,Left,65,LB,160000,2000,175000,229,66,22,58,65,18,300,54,59,64,62,61,273,48,41,67,62,55,322,51,66,80,78,47,267,68,63,58,56,22,71,193,63,67,63,63,7,12,16,14,14,1647,339,3 ★,2★,Medium,Medium,1 ★,44,34,63,58,64,76,5
2003-01-01,142333,M. McNulty,Mark McNulty,https://cdn.sofifa.com/players/142/333/21_60.png,http://sofifa.com/player/142333/mark-mcnulty/2...,Republic of Ireland,39,60,60,Cork City,Active,GK,190,70,Right,60,GK,25000,500,31000,90,16,12,12,36,14,89,12,19,12,23,23,215,34,34,48,57,42,179,41,43,31,49,15,122,37,16,12,35,22,50,34,10,10,14,292,60,61,54,60,57,1021,326,2 ★,1★,Medium,Medium,1 ★,60,61,54,57,34,60,2


### Cleaning The W/F column

In [71]:
# Previewing the unique values in the W/F column
data['W/F'].unique()

array(['3 ★', '2 ★', '4 ★', '1 ★', '5 ★'], dtype=object)

From observation, the W/F column is a rating of the player's weaker foot, on a scale of 1 to 5 stars.

It can also be treated as a categorical variable.

First, though, we have to remove the unwanted star character '★' from each value.

In [72]:
# I prefer a function here, since other columns may need the same treatment
# The function also converts the cleaned value to an integer

def remove_star(x):
    # Replace the star with an empty string
    x = x.replace('★', '')
    # int() ignores the leftover whitespace, so '3 ' becomes 3
    return int(x)

In [73]:
# Apply the function to W/F column 
data['W/F'] = data['W/F'].apply(remove_star)
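
As an aside, the same clean-up can also be written with pandas' vectorised string methods instead of a Python-level function. The sketch below is only an illustration on a toy Series named `wf` (a hypothetical stand-in for `data['W/F']`, with values copied from the `unique()` output above); it is not the approach used in this notebook.

```python
import pandas as pd

# Toy stand-in for data['W/F'] (hypothetical name, values taken from the unique() output)
wf = pd.Series(['3 ★', '2 ★', '4 ★', '1 ★', '5 ★'])

# Remove the star literally (regex=False), trim the leftover space,
# then cast the whole column to integers in one step
wf_clean = wf.str.replace('★', '', regex=False).str.strip().astype(int)

print(wf_clean.tolist())
```

Both versions should give the same integers here; the `apply` version is easier to reuse across columns, while the vectorised version avoids calling a Python function once per row.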


We may also want to rename the column from W/F to WF_Rate so it is easier to understand.

In [74]:
data.rename(columns = {'W/F': 'WF_Rate'}, inplace = True)

In [75]:
# Checking if the change is applied
data.head(3)

Joined,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract_Status,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Value_in_Euro,Wage_in_Euro,Release_Clause_in_Euro,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,WF_Rate,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
1998-01-01,140181,H. Sogahata,Hitoshi Sogahata,https://cdn.sofifa.com/players/140/181/21_60.png,http://sofifa.com/player/140181/hitoshi-sogaha...,Japan,40,65,65,Kashima Antlers,Active,GK,187,80,Right,65,GK,80000,1000,63000,87,13,12,18,26,18,84,14,10,11,33,16,143,17,19,28,47,32,203,48,52,37,51,15,100,21,12,10,45,12,20,42,17,12,13,331,64,66,64,72,65,990,349,3,1★,Medium,Medium,1 ★,64,66,64,65,18,72,1
2002-01-01,184900,Kim Kwang Suk,Kwang Suk Kim,https://cdn.sofifa.com/players/184/900/21_60.png,http://sofifa.com/player/184900/kwang-suk-kim/...,Korea Republic,37,70,70,Pohang Steelers,Active,CB,183,73,Right,70,CB,275000,3000,438000,260,58,38,70,55,39,232,51,46,30,49,56,325,69,61,54,73,68,316,48,89,63,75,41,285,71,73,55,43,43,56,205,70,69,66,53,9,8,16,13,7,1676,355,3,2★,Low,High,1 ★,65,42,51,55,70,72,3
2002-01-01,148119,I. Akinfeev,Igor Akinfeev,https://cdn.sofifa.com/players/148/119/21_60.png,http://sofifa.com/player/148119/igor-akinfeev/...,Russia,34,80,80,PFC CSKA Moscow,Active,GK,186,78,Right,80,GK,3600000,40000,10400000,83,19,13,18,23,10,91,15,19,13,22,22,270,51,50,53,71,45,241,59,62,35,72,13,121,24,17,11,59,10,67,44,18,13,13,393,77,72,78,82,84,1243,443,3,1★,Medium,Medium,3 ★,77,72,78,84,50,82,18


### Cleaning The SM column

In [76]:
# Previewing the unique values in the SM column
data['SM'].unique()

array(['1★', '2★', '3★', '4★', '5★'], dtype=object)

From observation, the SM column is a rating of the player's skill moves ability, on a scale of 1 to 5 stars.

It can also be treated as a categorical variable.

Again, we first have to remove the unwanted star character '★' from each value.



In [77]:
# Apply the function to the SM column
data['SM'] = data['SM'].apply(remove_star)

We may also want to rename the column to SM_Rate so it is easier to understand.

In [78]:
data.rename(columns = {'SM': 'SM_Rate'}, inplace = True)

In [79]:
# Checking if the change is applied
data.head(3)

Joined,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract_Status,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Value_in_Euro,Wage_in_Euro,Release_Clause_in_Euro,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,WF_Rate,SM_Rate,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
1998-01-01,140181,H. Sogahata,Hitoshi Sogahata,https://cdn.sofifa.com/players/140/181/21_60.png,http://sofifa.com/player/140181/hitoshi-sogaha...,Japan,40,65,65,Kashima Antlers,Active,GK,187,80,Right,65,GK,80000,1000,63000,87,13,12,18,26,18,84,14,10,11,33,16,143,17,19,28,47,32,203,48,52,37,51,15,100,21,12,10,45,12,20,42,17,12,13,331,64,66,64,72,65,990,349,3,1,Medium,Medium,1 ★,64,66,64,65,18,72,1
2002-01-01,184900,Kim Kwang Suk,Kwang Suk Kim,https://cdn.sofifa.com/players/184/900/21_60.png,http://sofifa.com/player/184900/kwang-suk-kim/...,Korea Republic,37,70,70,Pohang Steelers,Active,CB,183,73,Right,70,CB,275000,3000,438000,260,58,38,70,55,39,232,51,46,30,49,56,325,69,61,54,73,68,316,48,89,63,75,41,285,71,73,55,43,43,56,205,70,69,66,53,9,8,16,13,7,1676,355,3,2,Low,High,1 ★,65,42,51,55,70,72,3
2002-01-01,148119,I. Akinfeev,Igor Akinfeev,https://cdn.sofifa.com/players/148/119/21_60.png,http://sofifa.com/player/148119/igor-akinfeev/...,Russia,34,80,80,PFC CSKA Moscow,Active,GK,186,78,Right,80,GK,3600000,40000,10400000,83,19,13,18,23,10,91,15,19,13,22,22,270,51,50,53,71,45,241,59,62,35,72,13,121,24,17,11,59,10,67,44,18,13,13,393,77,72,78,82,84,1243,443,3,1,Medium,Medium,3 ★,77,72,78,84,50,82,18


In [80]:
# Checking the unique values in the A/W column
data['A/W'].unique()

array(['Medium', 'Low', 'High'], dtype=object)

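We can convert this column to a categorical type, since it only takes three values.

We can also rename it so it is easier to understand. The D/W column gets the same treatment.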

In [81]:
# Renaming column from A/W to AW_Rate, attacking work rate

data.rename(columns= {'A/W':'AW_Rate'}, inplace=True)

# Converting the column to a categorical type
data['AW_Rate'] = data['AW_Rate'].astype('category')

In [82]:
# Renaming column from D/W to DW_Rate, defensive work rate

data.rename(columns= {'D/W':'DW_Rate'}, inplace=True)

# Converting the column to a categorical type
data['DW_Rate'] = data['DW_Rate'].astype('category')
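
A plain `astype('category')` stores the three levels without any ranking between them. If the Low < Medium < High ordering matters later (for sorting or comparisons), one option is an ordered categorical. The sketch below is an assumption about how that could look, shown on a toy Series named `work_rate` rather than on the real column; `rate_dtype` is likewise a name introduced only for this example.

```python
import pandas as pd

# Toy stand-in for a work-rate column such as AW_Rate (hypothetical data)
work_rate = pd.Series(['Medium', 'Low', 'High', 'Medium'])

# Declare the levels explicitly, from lowest to highest
rate_dtype = pd.CategoricalDtype(categories=['Low', 'Medium', 'High'], ordered=True)
work_rate = work_rate.astype(rate_dtype)

# With an ordered dtype, sorting and comparisons follow Low < Medium < High
print(work_rate.sort_values().tolist())
print((work_rate >= 'Medium').tolist())
```

Whether the extra ordering is worth it depends on the analysis; for simple grouping or counting, the unordered categorical used in this notebook is enough.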

In [83]:
data.head(3)

Joined,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract_Status,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Value_in_Euro,Wage_in_Euro,Release_Clause_in_Euro,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,WF_Rate,SM_Rate,AW_Rate,DW_Rate,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
1998-01-01,140181,H. Sogahata,Hitoshi Sogahata,https://cdn.sofifa.com/players/140/181/21_60.png,http://sofifa.com/player/140181/hitoshi-sogaha...,Japan,40,65,65,Kashima Antlers,Active,GK,187,80,Right,65,GK,80000,1000,63000,87,13,12,18,26,18,84,14,10,11,33,16,143,17,19,28,47,32,203,48,52,37,51,15,100,21,12,10,45,12,20,42,17,12,13,331,64,66,64,72,65,990,349,3,1,Medium,Medium,1 ★,64,66,64,65,18,72,1
2002-01-01,184900,Kim Kwang Suk,Kwang Suk Kim,https://cdn.sofifa.com/players/184/900/21_60.png,http://sofifa.com/player/184900/kwang-suk-kim/...,Korea Republic,37,70,70,Pohang Steelers,Active,CB,183,73,Right,70,CB,275000,3000,438000,260,58,38,70,55,39,232,51,46,30,49,56,325,69,61,54,73,68,316,48,89,63,75,41,285,71,73,55,43,43,56,205,70,69,66,53,9,8,16,13,7,1676,355,3,2,Low,High,1 ★,65,42,51,55,70,72,3
2002-01-01,148119,I. Akinfeev,Igor Akinfeev,https://cdn.sofifa.com/players/148/119/21_60.png,http://sofifa.com/player/148119/igor-akinfeev/...,Russia,34,80,80,PFC CSKA Moscow,Active,GK,186,78,Right,80,GK,3600000,40000,10400000,83,19,13,18,23,10,91,15,19,13,22,22,270,51,50,53,71,45,241,59,62,35,72,13,121,24,17,11,59,10,67,44,18,13,13,393,77,72,78,82,84,1243,443,3,1,Medium,Medium,3 ★,77,72,78,84,50,82,18


### Cleaning the IR column

In [84]:
# Checking the unique values in the IR column
data['IR'].unique()

array(['1 ★', '3 ★', '2 ★', '5 ★', '4 ★'], dtype=object)

From observation, the IR column is a rating of the player's international reputation, on a scale of 1 to 5 stars.

It can also be treated as a categorical variable.

Again, we first have to remove the unwanted star character '★' from each value.


In [85]:
# Apply the function to IR column 
data['IR'] = data['IR'].apply(remove_star)

We may also want to rename the column to IR_Rate so it is easier to understand.

In [86]:
# Renaming column from IR to IR_Rate

data.rename(columns= {'IR':'IR_Rate'}, inplace=True)
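
With all three star columns cleaned, a quick range check can confirm that nothing unexpected slipped through. The sketch below is a hypothetical check on a toy frame called `ratings` (built from the first three rows shown in this notebook), assuming that star ratings should always fall between 1 and 5.

```python
import pandas as pd

# Toy frame with the three cleaned rating columns (values from the preview rows)
ratings = pd.DataFrame({
    'WF_Rate': [3, 3, 3],
    'SM_Rate': [1, 2, 1],
    'IR_Rate': [1, 1, 3],
})

# For each column, check that every value lies in the 1-5 star range (inclusive)
in_range = ratings.apply(lambda col: col.between(1, 5).all())

print(in_range)
```

On the real data, the same check could be run on `data[['WF_Rate', 'SM_Rate', 'IR_Rate']]`.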

In [87]:
# Checking if the change is applied
data.head(3)

Joined,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract_Status,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Value_in_Euro,Wage_in_Euro,Release_Clause_in_Euro,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,WF_Rate,SM_Rate,AW_Rate,DW_Rate,IR_Rate,PAC,SHO,PAS,DRI,DEF,PHY,Hits
1998-01-01,140181,H. Sogahata,Hitoshi Sogahata,https://cdn.sofifa.com/players/140/181/21_60.png,http://sofifa.com/player/140181/hitoshi-sogaha...,Japan,40,65,65,Kashima Antlers,Active,GK,187,80,Right,65,GK,80000,1000,63000,87,13,12,18,26,18,84,14,10,11,33,16,143,17,19,28,47,32,203,48,52,37,51,15,100,21,12,10,45,12,20,42,17,12,13,331,64,66,64,72,65,990,349,3,1,Medium,Medium,1,64,66,64,65,18,72,1
2002-01-01,184900,Kim Kwang Suk,Kwang Suk Kim,https://cdn.sofifa.com/players/184/900/21_60.png,http://sofifa.com/player/184900/kwang-suk-kim/...,Korea Republic,37,70,70,Pohang Steelers,Active,CB,183,73,Right,70,CB,275000,3000,438000,260,58,38,70,55,39,232,51,46,30,49,56,325,69,61,54,73,68,316,48,89,63,75,41,285,71,73,55,43,43,56,205,70,69,66,53,9,8,16,13,7,1676,355,3,2,Low,High,1,65,42,51,55,70,72,3
2002-01-01,148119,I. Akinfeev,Igor Akinfeev,https://cdn.sofifa.com/players/148/119/21_60.png,http://sofifa.com/player/148119/igor-akinfeev/...,Russia,34,80,80,PFC CSKA Moscow,Active,GK,186,78,Right,80,GK,3600000,40000,10400000,83,19,13,18,23,10,91,15,19,13,22,22,270,51,50,53,71,45,241,59,62,35,72,13,121,24,17,11,59,10,67,44,18,13,13,393,77,72,78,82,84,1243,443,3,1,Medium,Medium,3,77,72,78,84,50,82,18


### Cleaning the Hits column

In [88]:
# Checking the unique values in the Hits column
data['Hits'].unique()

array(['1', '3', '18', '5', '2', '10', '14', '771', '35', '7', '125',
       '130', '6', '212', '4', '133', '19', '71', '11', '22', '107', '15',
       '32', '30', '9', '175', '126', '119', '42', '12', '16', '31', '20',
       '33', '216', '64', '43', nan, '28', '8', '256', '41', '75', '94',
       '332', '116', '17', '47', '138', '24', '90', '40', '13', '123',
       '23', '134', '46', '25', '194', '313', '154', '217', '29', '58',
       '52', '167', '45', '118', '36', '96', '87', '26', '65', '62', '21',
       '421', '182', '84', '86', '53', '99', '197', '145', '165', '156',
       '109', '196', '34', '27', '170', '39', '124', '141', '8.4K', '137',
       '214', '200', '89', '121', '142', '565', '169', '128', '69', '68',
       '77', '211', '85', '229', '282', '406', '38', '177', '193', '76',
       '367', '324', '50', '55', '402', '173', '220', '222', '248',
       '3.2K', '37', '88', '48', '150', '161', '102', '114', '103', '67',
       '388', '377', '239', '279', '61', '262', '93'

In [89]:
# Checking the number of null values available in the column
data['Hits'].isnull().sum()

2595

 - #### Observation 

 - The Hits column contains NaN values (2595 of them)
 - Some values carry a 'K' suffix, which stands for thousand (for example '8.4K')
 #### How to resolve it
  - Fill the NaN values with zero
  - Apply a function that removes the 'K' and multiplies the number by 1000
  - Convert all the values into integers

In [90]:
# Filling null values with "0" so that they won't break our function

data['Hits'] = data['Hits'].fillna('0')

In [91]:
# Confirming that there are no null values left
data['Hits'].isnull().sum()

0

Now that we are sure there are no null values left, we can define a function that strips the 'K' from a value, converts it to a float and multiplies it by 1000. Values without a 'K' are simply converted to a number and rounded.

In [92]:
# Defining a function that removes 'K' and converts the value to an integer number of hits
def clean_hits(x):
    # Values such as '8.4K' are expressed in thousands
    if 'K' in x:
        # Drop the 'K', then scale the number by 1000
        x = float(x.replace('K', '')) * 1000
    else:
        # Plain values only need converting to a number
        x = float(x)
    # round() with no second argument returns an int
    return round(x)

We can now apply our function to the column

In [93]:
# Applying the function
data['Hits'] = data['Hits'].apply(clean_hits)
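
For comparison, the fill, strip and scale steps can also be expressed with vectorised pandas operations. The sketch below is an illustration under assumptions, not the method used in this notebook: `hits` is a toy Series with a few values copied from the `unique()` output above, and missing entries are treated as zero hits, as in the cells above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for data['Hits'] before cleaning (hypothetical sample)
hits = pd.Series(['1', '771', '8.4K', np.nan, '3.2K'])

# Remember which rows are expressed in thousands; NaN counts as "no K"
has_k = hits.str.contains('K', na=False)

# Strip the suffix and parse everything as a number (NaN stays NaN)
numbers = pd.to_numeric(hits.str.replace('K', '', regex=False))

# Scale the 'K' rows by 1000, treat missing values as 0, and cast to int
hits_clean = (numbers * np.where(has_k, 1000, 1)).fillna(0).round().astype(int)

print(hits_clean.tolist())
```

Filling missing values with zero is a modelling choice: it makes the column numeric, but it also makes "unknown" indistinguishable from "no hits", which is worth keeping in mind when summarising the column.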

In [94]:
# Confirming that the change is applied

# The code below returns the players whose Hits value is greater than 1000
data[data['Hits'] > 1000]

Joined,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract_Status,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Value_in_Euro,Wage_in_Euro,Release_Clause_in_Euro,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,WF_Rate,SM_Rate,AW_Rate,DW_Rate,IR_Rate,PAC,SHO,PAS,DRI,DEF,PHY,Hits
2013-07-01,231485,A. Tuanzebe,Axel Tuanzebe,https://cdn.sofifa.com/players/231/485/21_60.png,http://sofifa.com/player/231485/axel-tuanzebe/...,England,22,73,81,Manchester United,Active,CB,185,75,Right,75,CB,6500000,55000,11600000,265,52,26,71,75,41,292,68,45,37,70,72,365,75,77,73,73,67,302,53,73,67,75,34,308,73,71,50,64,50,77,222,75,74,73,43,5,8,5,13,12,1797,392,3,3,Medium,Medium,1,76,36,64,70,73,73,8400
2014-07-01,231677,M. Rashford,Marcus Rashford,https://cdn.sofifa.com/players/231/677/21_60.png,http://sofifa.com/player/231677/marcus-rashfor...,England,22,85,91,Manchester United,Active,"LM, ST",186,70,Right,86,RM,86500000,150000,111300000,382,77,83,68,81,73,399,87,82,76,69,85,429,89,93,86,86,75,406,90,72,87,76,81,360,73,42,82,82,81,82,120,47,40,33,53,11,6,15,7,14,2149,461,4,5,High,Medium,2,91,83,78,86,45,78,3200
2016-10-24,238071,D. Sterling,Dujon Sterling,https://cdn.sofifa.com/players/238/071/21_60.png,http://sofifa.com/player/238071/dujon-sterling...,England,20,67,80,Chelsea,Active,RB,180,72,Right,67,RB,2300000,16000,3300000,258,64,50,57,58,29,256,67,48,34,46,61,344,80,76,69,61,58,296,53,64,64,73,42,284,64,66,52,59,43,54,194,64,64,66,39,11,8,8,7,5,1671,380,3,3,Medium,Medium,1,78,48,56,65,64,69,2500
2017-01-13,229558,D. Upamecano,Dayot Upamecano,https://cdn.sofifa.com/players/229/558/21_60.png,http://sofifa.com/player/229558/dayot-upamecan...,France,21,79,90,RB Leipzig,Active,CB,186,90,Right,81,CB,37000000,36000,38000000,264,48,39,81,75,21,266,68,32,28,73,65,350,69,84,62,70,65,327,54,88,61,90,34,300,81,77,45,58,39,69,236,75,84,77,45,6,7,8,15,9,1788,404,3,2,Medium,Medium,1,77,40,61,66,79,81,1100
2017-08-31,233049,J. Sancho,Jadon Sancho,https://cdn.sofifa.com/players/233/049/21_60.png,http://sofifa.com/player/233049/jadon-sancho/2...,England,20,87,93,Borussia Dortmund,Active,"RM, LM, CAM",180,76,Right,89,CAM,124000000,82000,132100000,373,83,81,38,88,83,380,92,81,48,68,91,435,86,81,91,87,90,328,70,51,77,67,63,313,44,39,83,87,60,84,105,32,41,32,52,7,11,10,11,13,1986,430,3,5,High,Medium,3,83,74,81,91,37,64,1100
2018-02-02,255611,D. Maldini,Daniel Maldini,https://cdn.sofifa.com/players/255/611/21_60.png,http://sofifa.com/player/255611/daniel-maldini...,Italy,18,59,79,Milan,Active,CAM,186,76,Right,61,CAM,625000,2000,878000,262,52,48,48,66,48,300,60,60,60,58,62,323,67,69,65,56,66,264,60,54,47,53,50,242,46,34,54,56,52,55,114,30,44,40,65,12,12,14,13,14,1570,327,4,3,Medium,Medium,1,68,51,59,61,38,50,6000
2018-07-01,231747,K. Mbappé,Kylian Mbappé,https://cdn.sofifa.com/players/231/747/21_60.png,http://sofifa.com/player/231747/kylian-mbappe/...,France,21,90,95,Paris Saint-Germain,Active,"ST, LW, RW",178,73,Right,91,ST,185500000,160000,203100000,408,78,91,73,83,83,394,92,79,63,70,90,458,96,96,92,92,82,404,86,77,86,76,79,341,62,38,91,80,70,84,100,34,34,32,42,13,5,7,11,6,2147,466,4,5,High,Low,3,96,86,78,91,39,76,1600
2018-07-01,251852,K. Adeyemi,Karim Adeyemi,https://cdn.sofifa.com/players/251/852/21_60.png,http://sofifa.com/player/251852/karim-adeyemi/...,Germany,18,69,87,FC Red Bull Salzburg,Active,"ST, LW",177,68,Left,72,LW,3600000,6000,4700000,325,61,69,62,67,66,324,75,66,59,51,73,403,89,88,87,63,76,333,67,83,58,56,69,248,36,17,63,66,66,70,62,18,19,25,50,12,12,7,12,7,1745,371,3,4,Medium,Medium,1,88,68,63,75,23,54,1100
2018-07-01,253004,Ansu Fati,Anssumane Fati,https://cdn.sofifa.com/players/253/004/21_60.png,http://sofifa.com/player/253004/anssumane-fati...,Spain,17,76,90,FC Barcelona,Active,LW,178,66,Right,76,LW,17000000,24000,40100000,333,69,75,58,72,59,334,79,64,45,69,77,417,89,87,89,70,82,322,67,73,64,48,70,276,48,19,68,67,74,73,83,23,32,28,40,6,9,8,10,7,1805,388,4,4,High,Medium,1,88,71,68,79,29,53,2800
2018-07-01,251470,C. De Ketelaere,Charles De Ketelaere,https://cdn.sofifa.com/players/251/470/21_60.png,http://sofifa.com/player/251470/charles-de-ket...,Belgium,19,70,82,Club Brugge KV,Active,"CF, LW, CAM",192,74,Left,73,LM,3800000,9000,5000000,338,73,63,68,75,59,357,74,71,68,68,76,338,68,77,70,68,55,327,64,61,73,68,61,280,47,55,64,68,46,72,160,55,57,48,49,11,8,14,8,8,1849,401,4,3,Medium,Medium,1,73,62,72,73,56,65,2300


# EDA

## Visualization

In [104]:
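# Show every row of the correlation matrix
pd.set_option('display.max_rows', None)
# Pairwise correlation of the numeric columns
# (on pandas 2.0 and later this call needs numeric_only=True, because data still holds text columns)
data.corr()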

Unnamed: 0,ID,Age,OVA,POT,Height_cm,Weight_kg,BOV,Value_in_Euro,Wage_in_Euro,Release_Clause_in_Euro,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,WF_Rate,SM_Rate,IR_Rate,PAC,SHO,PAS,DRI,DEF,PHY,Hits
ID,1.0,-0.753413,-0.486968,0.023736,-0.108373,-0.20982,-0.443686,-0.131001,-0.256117,-0.16186,-0.180955,-0.156939,-0.117892,-0.140575,-0.173973,-0.189012,-0.184415,-0.07919,-0.196558,-0.222498,-0.22078,-0.136157,-0.038604,0.093328,0.091788,-0.059803,-0.472715,0.019268,-0.310576,-0.318014,-0.206838,-0.101355,-0.297844,-0.194421,-0.252161,-0.240876,-0.170786,-0.127357,-0.25434,-0.170147,-0.41284,-0.109787,-0.156884,-0.09307,-0.075177,-0.115777,-0.111569,-0.113757,-0.112879,-0.123415,-0.110397,-0.291523,-0.434793,-0.106433,-0.123692,-0.3803,0.079109,-0.308248,-0.399298,-0.291203,-0.216042,-0.399517,0.002505
Age,-0.753413,1.0,0.46614,-0.269473,0.090232,0.241969,0.401796,0.040994,0.157751,0.074079,0.146765,0.124639,0.081765,0.146402,0.14281,0.138232,0.132555,0.024563,0.139144,0.181559,0.183336,0.091793,-0.0221,-0.142046,-0.135785,-0.012064,0.454699,-0.076881,0.299444,0.266381,0.204923,0.11809,0.348577,0.155406,0.226098,0.244119,0.180499,0.095557,0.196022,0.136748,0.357505,0.118043,0.162004,0.101411,0.085888,0.121684,0.117085,0.117987,0.119175,0.129497,0.117392,0.249822,0.390236,0.071559,0.060805,0.27472,-0.129725,0.265495,0.350054,0.214831,0.223683,0.445806,-0.079903
OVA,-0.486968,0.46614,1.0,0.632166,0.033368,0.147973,0.987149,0.552893,0.589601,0.599142,0.446337,0.4066,0.323638,0.325182,0.500041,0.37252,0.458098,0.376388,0.417868,0.382736,0.484577,0.447363,0.354798,0.203445,0.212523,0.276601,0.866954,0.12679,0.57278,0.557946,0.283194,0.379696,0.356192,0.405656,0.499758,0.397913,0.315541,0.352186,0.507402,0.327826,0.703051,0.272015,0.317043,0.256152,0.229146,0.006304,0.005138,0.004731,0.001663,0.013473,0.006018,0.620825,0.845894,0.222609,0.381024,0.442388,0.266494,0.479176,0.698816,0.654745,0.363017,0.572159,0.224161
POT,0.023736,-0.269473,0.632166,1.0,-0.009826,-0.024619,0.669677,0.5282,0.485241,0.548897,0.284542,0.255631,0.216708,0.179505,0.351508,0.229958,0.314697,0.307785,0.276144,0.213187,0.314743,0.331928,0.311086,0.241466,0.242221,0.235622,0.509078,0.155335,0.286291,0.334879,0.111208,0.207132,0.064985,0.241704,0.288993,0.187498,0.158974,0.228352,0.338202,0.197405,0.423777,0.162243,0.173577,0.158544,0.145731,-0.039022,-0.038078,-0.038337,-0.042994,-0.036007,-0.037498,0.382014,0.520473,0.163596,0.298001,0.306951,0.305882,0.275101,0.428741,0.479501,0.188153,0.205002,0.321966
Height_cm,-0.108373,0.090232,0.033368,-0.009826,1.0,0.772076,0.022455,0.004179,0.023596,0.003975,-0.364712,-0.488562,-0.372443,0.01302,-0.352683,-0.343208,-0.452129,-0.4805,-0.43853,-0.40178,-0.318005,-0.41122,-0.618818,-0.543916,-0.461061,-0.614316,-0.000388,-0.765199,-0.143947,-0.15688,-0.001952,-0.284809,0.530854,-0.379277,-0.316762,-0.044524,-0.058434,-0.438813,-0.361554,-0.319803,-0.156926,-0.075788,-0.071567,-0.072905,-0.078414,0.36862,0.364673,0.36427,0.361142,0.364441,0.365756,-0.364731,-0.103992,-0.166367,-0.417859,0.038795,-0.376886,-0.073124,-0.178023,-0.277766,0.080168,0.419781,0.001072
Weight_kg,-0.20982,0.241969,0.147973,-0.024619,0.772076,1.0,0.128572,0.034038,0.064426,0.039528,-0.275435,-0.397172,-0.285255,0.045014,-0.273213,-0.254636,-0.359982,-0.399634,-0.346926,-0.309091,-0.244799,-0.32946,-0.521003,-0.482639,-0.407377,-0.531029,0.106702,-0.65282,-0.017577,-0.044117,0.068452,-0.218596,0.613206,-0.275575,-0.223308,0.026364,-0.036676,-0.347259,-0.272038,-0.232592,-0.058945,-0.061083,-0.047097,-0.06171,-0.070374,0.342958,0.337883,0.33904,0.335349,0.341689,0.33966,-0.251634,0.015089,-0.125822,-0.344596,0.090828,-0.326396,0.019785,-0.080484,-0.178149,0.091895,0.509861,-0.015411
BOV,-0.443686,0.401796,0.987149,0.669677,0.022455,0.128572,1.0,0.563253,0.592303,0.608117,0.487301,0.423047,0.363071,0.364678,0.54618,0.4077,0.497027,0.418627,0.444667,0.408111,0.522753,0.493528,0.390551,0.241146,0.251391,0.306608,0.868576,0.156286,0.603853,0.578588,0.289512,0.413252,0.357247,0.441063,0.535622,0.421078,0.332573,0.38661,0.533718,0.362157,0.72589,0.294215,0.338623,0.279104,0.24989,-0.047807,-0.048567,-0.048833,-0.051871,-0.039552,-0.047385,0.653336,0.841199,0.236708,0.415148,0.43341,0.274963,0.470838,0.687184,0.648649,0.367787,0.565163,0.237636
Value_in_Euro,-0.131001,0.040994,0.552893,0.5282,0.004179,0.034038,0.563253,1.0,0.81487,0.96644,0.259654,0.226968,0.211382,0.155289,0.289522,0.235258,0.27007,0.237724,0.253495,0.213673,0.270468,0.264349,0.243221,0.162393,0.167224,0.178332,0.489513,0.119546,0.286883,0.302929,0.119615,0.199411,0.120346,0.230634,0.269909,0.171277,0.134581,0.222466,0.321933,0.190222,0.388461,0.118303,0.138977,0.113409,0.096561,-0.011616,-0.012163,-0.01143,-0.012592,-0.011071,-0.010186,0.342279,0.462458,0.144755,0.260043,0.533569,0.212277,0.284386,0.394543,0.393347,0.153987,0.232937,0.376162
Wage_in_Euro,-0.256117,0.157751,0.589601,0.485241,0.023596,0.064426,0.592303,0.81487,1.0,0.823969,0.290757,0.252138,0.224584,0.199727,0.31658,0.259902,0.298696,0.252362,0.279906,0.247819,0.304007,0.288414,0.230242,0.138154,0.141812,0.169419,0.52508,0.106395,0.323084,0.342198,0.146255,0.202361,0.154353,0.257941,0.311487,0.216928,0.167959,0.2404,0.341914,0.227911,0.433297,0.147061,0.170266,0.140845,0.122572,-0.020743,-0.020605,-0.020749,-0.021334,-0.019237,-0.020539,0.376914,0.486999,0.151791,0.268597,0.608113,0.166306,0.300732,0.422128,0.401712,0.188922,0.267258,0.300267
Release_Clause_in_Euro,-0.16186,0.074079,0.599142,0.548897,0.003975,0.039528,0.608117,0.96644,0.823969,1.0,0.278302,0.242838,0.225241,0.169578,0.309703,0.251491,0.289642,0.251021,0.271887,0.232967,0.293027,0.281264,0.253365,0.164813,0.169275,0.186003,0.531362,0.120197,0.311435,0.329562,0.133992,0.211791,0.13323,0.249505,0.291749,0.189175,0.146338,0.235614,0.345304,0.208417,0.42002,0.12745,0.150252,0.12177,0.103935,-0.007261,-0.00799,-0.007041,-0.009004,-0.005768,-0.006137,0.368501,0.498764,0.153222,0.271855,0.5391,0.2167,0.309493,0.427631,0.421543,0.168644,0.256989,0.365468
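A wide correlation matrix like the one above is hard to scan by eye. In the rows shown, for example, OVA and BOV correlate at 0.987149, and Value_in_Euro and Release_Clause_in_Euro at 0.96644, which suggests these columns carry largely redundant information. The sketch below is one possible way to list such pairs automatically. The `strong_pairs` helper, the 0.9 threshold, and the small `sample` frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned dataframe; the values are made up.
sample = pd.DataFrame({
    "OVA": [65, 70, 80, 65, 60, 75],
    "BOV": [65, 70, 80, 65, 60, 76],
    "Height_cm": [187, 183, 186, 183, 190, 175],
    "Weight_kg": [80, 73, 78, 86, 70, 68],
})


def strong_pairs(frame, threshold=0.9):
    """Return (col_a, col_b, r) for pairs whose |Pearson r| exceeds threshold."""
    # Keep numeric columns only, then compute the correlation matrix.
    corr = frame.select_dtypes("number").corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        # Only look at columns after `a` so each pair is reported once.
        for b in cols[i + 1:]:
            r = float(corr.loc[a, b])
            if abs(r) > threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


print(strong_pairs(sample))
```

On the real dataframe the call would be `strong_pairs(data)`. Whether a highly correlated column should then be dropped is a judgment call that depends on the analysis.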


In [None]:
sns.boxplot(x=data['Age'])
plt.show()
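The boxplot shows outliers visually. By default seaborn draws points beyond 1.5 times the interquartile range (IQR) from the box as individual markers. The same rule can be applied numerically to list the flagged values. This is a minimal sketch, not the notebook's own method: the `iqr_outliers` helper and the sample ages are assumptions made for illustration.

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Keep only the entries outside the [lower, upper] fence.
    return series[(series < lower) | (series > upper)]


# Made-up ages for illustration only; not taken from the dataset.
ages = pd.Series([22, 23, 24, 25, 26, 27, 28, 29, 30, 53], name="Age")
print(iqr_outliers(ages))
```

On the real data the call would be `iqr_outliers(data['Age'])`. A flagged value is not necessarily an error; a veteran player can legitimately be much older than the typical squad member, so each case deserves a look before anything is removed.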

In [99]:
data.Skill.unique()

array([ 84, 232,  91, 300,  89, 231, 386, 206, 101, 301, 104, 326, 263,
       256, 165, 470, 367, 246, 258, 233, 274,  92, 244, 314,  88, 290,
       276,  96, 323, 381, 297,  87, 309,  67, 281, 207, 313, 311, 107,
       271, 266,  77, 304, 408, 118, 353, 152, 273, 310, 336, 347, 260,
       344, 324, 322, 288, 319, 270, 298, 312, 368, 242, 295,  72, 254,
       221, 331, 350, 245, 340, 334, 346, 316, 151, 110, 197, 343, 205,
       341, 373,  65,  93, 325, 375, 249, 317, 415, 333, 291, 365,  83,
       315, 262, 354,  90,  76, 349, 264, 337, 283, 332, 302, 269, 366,
       404, 307, 394, 181, 117, 320, 359, 257, 237, 296, 306,  80, 376,
       321, 195, 299, 255, 236, 401, 338, 235, 213, 318, 358, 363, 103,
       228, 282,  63, 268, 124, 395, 141, 425, 303,  78, 292, 360, 225,
       285, 345, 293, 294,  94, 252, 335, 327, 409, 208, 279, 277, 289,
       189, 222, 188, 112, 265, 387, 419, 126, 377, 403, 371, 143,  75,
       220, 243, 364,  98, 355, 229, 223, 384, 261,  82, 250, 16

In [103]:
data.head()

Joined,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,OVA,POT,Club,Contract_Status,Positions,Height_cm,Weight_kg,Preferred_Foot,BOV,Best_Position,Value_in_Euro,Wage_in_Euro,Release_Clause_in_Euro,Attacking,Crossing,Finishing,Heading_Accuracy,Short_Passing,Volleys,Skill,Dribbling,Curve,FK_Accuracy,Long_Passing,Ball_Control,Movement,Acceleration,Sprint_Speed,Agility,Reactions,Balance,Power,Shot_Power,Jumping,Stamina,Strength,Long_Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing_Tackle,Sliding_Tackle,Goalkeeping,GK_Diving,GK_Handling,GK_Kicking,GK_Positioning,GK_Reflexes,Total_Stats,Base_Stats,WF_Rate,SM_Rate,AW_Rate,DW_Rate,IR_Rate,PAC,SHO,PAS,DRI,DEF,PHY,Hits
1998-01-01,140181,H. Sogahata,Hitoshi Sogahata,https://cdn.sofifa.com/players/140/181/21_60.png,http://sofifa.com/player/140181/hitoshi-sogaha...,Japan,40,65,65,Kashima Antlers,Active,GK,187,80,Right,65,GK,80000,1000,63000,87,13,12,18,26,18,84,14,10,11,33,16,143,17,19,28,47,32,203,48,52,37,51,15,100,21,12,10,45,12,20,42,17,12,13,331,64,66,64,72,65,990,349,3,1,Medium,Medium,1,64,66,64,65,18,72,1
2002-01-01,184900,Kim Kwang Suk,Kwang Suk Kim,https://cdn.sofifa.com/players/184/900/21_60.png,http://sofifa.com/player/184900/kwang-suk-kim/...,Korea Republic,37,70,70,Pohang Steelers,Active,CB,183,73,Right,70,CB,275000,3000,438000,260,58,38,70,55,39,232,51,46,30,49,56,325,69,61,54,73,68,316,48,89,63,75,41,285,71,73,55,43,43,56,205,70,69,66,53,9,8,16,13,7,1676,355,3,2,Low,High,1,65,42,51,55,70,72,3
2002-01-01,148119,I. Akinfeev,Igor Akinfeev,https://cdn.sofifa.com/players/148/119/21_60.png,http://sofifa.com/player/148119/igor-akinfeev/...,Russia,34,80,80,PFC CSKA Moscow,Active,GK,186,78,Right,80,GK,3600000,40000,10400000,83,19,13,18,23,10,91,15,19,13,22,22,270,51,50,53,71,45,241,59,62,35,72,13,121,24,17,11,59,10,67,44,18,13,13,393,77,72,78,82,84,1243,443,3,1,Medium,Medium,3,77,72,78,84,50,82,18
2002-08-01,138830,D. Lewington,Dean Lewington,https://cdn.sofifa.com/players/138/830/21_60.png,http://sofifa.com/player/138830/dean-lewington...,England,36,65,65,Milton Keynes Dons,Active,"LB, LWB",183,86,Left,65,LB,160000,2000,175000,229,66,22,58,65,18,300,54,59,64,62,61,273,48,41,67,62,55,322,51,66,80,78,47,267,68,63,58,56,22,71,193,63,67,63,63,7,12,16,14,14,1647,339,3,2,Medium,Medium,1,44,34,63,58,64,76,5
2003-01-01,142333,M. McNulty,Mark McNulty,https://cdn.sofifa.com/players/142/333/21_60.png,http://sofifa.com/player/142333/mark-mcnulty/2...,Republic of Ireland,39,60,60,Cork City,Active,GK,190,70,Right,60,GK,25000,500,31000,90,16,12,12,36,14,89,12,19,12,23,23,215,34,34,48,57,42,179,41,43,31,49,15,122,37,16,12,35,22,50,34,10,10,14,292,60,61,54,60,57,1021,326,2,1,Medium,Medium,1,60,61,54,57,34,60,2


In [None]:
data['Age'].mean()

In [None]:
data[data['Age']>50]

In [None]:
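# pairplot needs at least one column; the subset below is an illustrative choice
sns.pairplot(data[['Age', 'OVA', 'POT', 'Value_in_Euro', 'Wage_in_Euro']])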