## #DataCleaningchallenge 

by Omojughare Erowo-Oghene

* [LinkedIn](https://www.linkedin.com/in/david-omojughare/)
* [Twitter](https://www.twitter.com/leoknaw_/)

## Introduction

#DataCleaningchallenge proposed by Promise Nonso provides an opportunity for every Data Analyst at all levels of expertise to build a portfolio-worthy project that can be shared with recruiters.

The challenge also provides an avenue for Data Analysts to meet with fellow learners and build a great network.

In [201]:
# Importing relevant tools and initializing datasets
import warnings
import pandas as pd
warnings.filterwarnings('ignore')
fifa = pd.read_csv("fifa21 raw data v2.csv")

In [202]:
pd.set_option("display.max_columns", 79)

# Variable description
Here is a brief documentation for each column name in the given dataset:

* photoUrl: The URL of the player's photo.
* LongName: The full name of the player.
* playerUrl: The URL of the player's page on sofifa.com.
* Nationality: The nationality of the player.
* Positions: The positions the player can play.
* Name: The short name of the player.
* Age: The age of the player.
* OVA: The overall rating of the player in FIFA 21.
* POT: The potential rating of the player in FIFA 21.
* Team & Contract: The team the player is playing for in FIFA 21, along with their contract details.
* ID: The unique identifier for the player.
* Height: The height of the player in feet and inches.
* Weight: The weight of the player in pounds.
* foot: The preferred foot of the player.
* BOV: The best overall rating the player has achieved in their career.
* BP: The best position the player has played in their career.
* Growth: The difference between the potential rating and overall rating of the player.
* Joined: The date the player joined their current team in FIFA 21.
* Loan Date End: The date the player's loan contract ends.
* Value: The market value of the player in FIFA 21.
* Wage: The weekly wage of the player in FIFA 21.
* Release Clause: The release clause value of the player in FIFA 21.
* Attacking: The attacking attributes of the player.
* Crossing: The crossing attribute of the player.
* Finishing: The finishing attribute of the player.
* Heading Accuracy: The heading accuracy attribute of the player.
* Short Passing: The short passing attribute of the player.
* Volleys: The volleys attribute of the player.
* Skill: The skill attributes of the player.
* Dribbling: The dribbling attribute of the player.
* Curve: The curve attribute of the player.
* FK Accuracy: The free kick accuracy attribute of the player.
* Long Passing: The long passing attribute of the player.
* Ball Control: The ball control attribute of the player.
* Movement: The movement attributes of the player.
* Acceleration: The acceleration attribute of the player.
* Sprint Speed: The sprint speed attribute of the player.
* Agility: The agility attribute of the player.
* Reactions: The reactions attribute of the player.
* Balance: The balance attribute of the player.
* Power: The power attributes of the player.
* Shot Power: The shot power attribute of the player.
* Jumping: The jumping attribute of the player.
* Stamina: The stamina attribute of the player.
* Strength: The strength attribute of the player.
* Long Shots: The long shots attribute of the player.
* Mentality: The mentality attributes of the player.
* Aggression: The aggression attribute of the player.
* Interceptions: The interceptions attribute of the player.
* Positioning: The positioning attribute of the player.
* Vision: The vision attribute of the player.
* Penalties: The penalties attribute of the player.
* Composure: The composure attribute of the player.
* Defending: The defending attributes of the player.
* Marking: The marking attribute of the player.
* Standing Tackle: The standing tackle attribute of the player.
* Sliding Tackle: The sliding tackle attribute of the player.
* Goalkeeping: The goalkeeping attributes of the player.
* GK Diving: The goalkeeper diving attribute of the player.
* GK Handling: The goalkeeper handling attribute of the player.
* GK Kicking: The goalkeeper kicking attribute of the player.
* GK Positioning: The goalkeeper positioning attribute of the player.
* GK Reflexes: This refers to the goalkeeper's ability to react and make saves quickly.
* Total Stats: This refers to the overall rating of the player based on their performance in all areas of the game.
* Base Stats: This refers to the player's rating in the six main areas of the game: Pace, Shooting, Passing, Dribbling, Defending, and Physicality.
* W/F: This refers to the player's weaker foot ability.
* SM: This refers to the player's skill moves ability. 
* A/W: This refers to the player's attacking work rate. It measures how frequently the player participates in attacking actions, such as making runs or positioning themselves in the opponent's half.
* D/W: This refers to the player's defensive work rate. It measures how frequently the player participates in defensive actions, such as tracking back or making tackles.
* IR: This refers to the player's injury resistance. It measures the player's ability to avoid injuries and how quickly they recover from them.
* PAC: This refers to the player's pace or speed attribute. It measures how quickly the player can move with and without the ball.
* SHO: This refers to the player's shooting ability. It measures the player's accuracy and power when shooting the ball.
* PAS: This refers to the player's passing ability. It measures the player's accuracy and range when passing the ball.
* DRI: This refers to the player's dribbling ability. It measures the player's agility, balance, and ball control when dribbling the ball.
* DEF: This refers to the player's defensive ability. It measures the player's ability to tackle, intercept, and defend against opposing players. 
* PHY: This refers to the player's physicality or strength. It measures the player's ability to win physical battles and maintain possession of the ball. 
* Hits: This refers to the number of times the player's profile has been viewed on the website.


## Getting a glimpse of the data

In [203]:
fifa.shape # 18979 rows and 77 columns

(18979, 77)

In [204]:
fifa.info() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18979 entries, 0 to 18978
Data columns (total 77 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   ID                18979 non-null  int64 
 1   Name              18979 non-null  object
 2   LongName          18979 non-null  object
 3   photoUrl          18979 non-null  object
 4   playerUrl         18979 non-null  object
 5   Nationality       18979 non-null  object
 6   Age               18979 non-null  int64 
 7   ↓OVA              18979 non-null  int64 
 8   POT               18979 non-null  int64 
 9   Club              18979 non-null  object
 10  Contract          18979 non-null  object
 11  Positions         18979 non-null  object
 12  Height            18979 non-null  object
 13  Weight            18979 non-null  object
 14  Preferred Foot    18979 non-null  object
 15  BOV               18979 non-null  int64 
 16  Best Position     18979 non-null  object
 17  Joined      

The only columns with null values is the `Loan end date` column and the `Hits` column. This is evident because theya re the only 2 columns that dont have 18979 non-null values

In [205]:
fifa.head()

Unnamed: 0,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,↓OVA,POT,Club,Contract,Positions,Height,Weight,Preferred Foot,BOV,Best Position,Joined,Loan Date End,Value,Wage,Release Clause,Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,158023,L. Messi,Lionel Messi,https://cdn.sofifa.com/players/158/023/21_60.png,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,93,93,\n\n\n\nFC Barcelona,2004 ~ 2021,"RW, ST, CF",170cm,72kg,Left,93,RW,"Jul 1, 2004",,€103.5M,€560K,€138.4M,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,4 ★,4★,Medium,Low,5 ★,85,92,91,95,38,65,771
1,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,https://cdn.sofifa.com/players/020/801/21_60.png,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,92,92,\n\n\n\nJuventus,2018 ~ 2022,"ST, LW",187cm,83kg,Right,92,ST,"Jul 10, 2018",,€63M,€220K,€75.9M,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,4 ★,5★,High,Low,5 ★,89,93,81,89,35,77,562
2,200389,J. Oblak,Jan Oblak,https://cdn.sofifa.com/players/200/389/21_60.png,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,91,93,\n\n\n\nAtlético Madrid,2014 ~ 2023,GK,188cm,87kg,Right,91,GK,"Jul 16, 2014",,€120M,€125K,€159.4M,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,3 ★,1★,Medium,Medium,3 ★,87,92,78,90,52,90,150
3,192985,K. De Bruyne,Kevin De Bruyne,https://cdn.sofifa.com/players/192/985/21_60.png,http://sofifa.com/player/192985/kevin-de-bruyn...,Belgium,29,91,91,\n\n\n\nManchester City,2015 ~ 2023,"CAM, CM",181cm,70kg,Right,91,CAM,"Aug 30, 2015",,€129M,€370K,€161M,407,94,82,55,94,82,441,88,85,83,93,92,398,77,76,78,91,76,408,91,63,89,74,91,408,76,66,88,94,84,91,186,68,65,53,56,15,13,5,10,13,2304,485,5 ★,4★,High,High,4 ★,76,86,93,88,64,78,207
4,190871,Neymar Jr,Neymar da Silva Santos Jr.,https://cdn.sofifa.com/players/190/871/21_60.png,http://sofifa.com/player/190871/neymar-da-silv...,Brazil,28,91,91,\n\n\n\nParis Saint-Germain,2017 ~ 2022,"LW, CAM",175cm,68kg,Right,91,LW,"Aug 3, 2017",,€132M,€270K,€166.5M,408,85,87,62,87,87,448,95,88,89,81,95,453,94,89,96,91,83,357,80,62,81,50,84,356,51,36,87,90,92,93,94,35,30,29,59,9,9,15,15,11,2175,451,5 ★,5★,High,Medium,5 ★,91,85,86,94,36,59,595


## Checking for duplicates

In [206]:
fifa[fifa.duplicated()]

Unnamed: 0,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,↓OVA,POT,Club,Contract,Positions,Height,Weight,Preferred Foot,BOV,Best Position,Joined,Loan Date End,Value,Wage,Release Clause,Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits


Seems there are no duplicate rows which means every player here is unique

## Cleaning the `IR`, `SM`, and `W/F` columns

In [207]:
fifa["IR"] = fifa["IR"].apply(lambda x: x.replace(x[1:], "")).astype(int)
fifa["SM"] = fifa["SM"].apply(lambda x: x[0]).astype(int)
fifa['W/F'] = fifa["W/F"].apply(lambda x: x[0]).astype(int)

## Cleaning the `Club` column

In [208]:
fifa["Club"] = fifa["Club"].apply(lambda x: x.strip())

In [209]:
fifa["Club"]

0               FC Barcelona
1                   Juventus
2            Atlético Madrid
3            Manchester City
4        Paris Saint-Germain
                ...         
18974             Wuhan Zall
18975        Oldham Athletic
18976             Derry City
18977       Dalian YiFang FC
18978       Dalian YiFang FC
Name: Club, Length: 18979, dtype: object

The `\n` characters that were there before are now gone

## Cleaning the `wage`, `value` and `Release clause` columns

There are "K" and "M" characters which represent 1 thousand and 1 million respectively. We would like these values to be in integers. So we define the following function below

In [210]:
def convert_numbers(number):
    
    if number[-1:] == 'K':  # Check if the last digit is K
        return float(number[:-1]) * 1000  # Remove the last digit with [:-1], and convert to int and multiply by 1000
    elif number[-1:] == 'M':  # Check if the last digit is M
        return float(number[:-1]) * 1000000  # Remove the last digit with [:-1], and convert to int and multiply by 1000000
    else:  # just in case data doesnt have an M or K
      
        return float(number)

In [211]:
fifa["Value"] = fifa["Value"].apply(lambda x: x.replace(x[0], ""))
fifa["Value"] = fifa["Value"].apply(convert_numbers)

In [212]:
fifa["Wage"] = fifa["Wage"].apply(lambda x: x.replace(x[0], ""))

In [213]:
fifa["Wage"].apply(convert_numbers)

0        560000.0
1        220000.0
2        125000.0
3        370000.0
4        270000.0
           ...   
18974      1000.0
18975       500.0
18976       500.0
18977      2000.0
18978      1000.0
Name: Wage, Length: 18979, dtype: float64

In [214]:
fifa["Release Clause"] = fifa["Release Clause"].apply(lambda x: x.replace(x[0], ""))
fifa["Release Clause"] = fifa["Release Clause"].apply(convert_numbers)

## Cleaning `height` and  `weight` columns

We realised that height is stored in cm for some players and feet and inches for others. So we are converting everything to meters 

In [215]:
def convert_height(x):
    if x[-1] == "\"": # if the last character is "
        x = x.replace("\"", "")
        foot = int(x[0]) * 30.48 # converts foot to cm
        inch = int(x[2:]) * 2.54 # converts inches to cm
        return (foot+inch)/100 # returns it in meters
    elif x[-1] == "m":
        return int(x[:-2])/100 # convert height in cm to m

In [216]:
fifa["Height"] = round(fifa["Height"].apply(convert_height),2)

In [217]:
fifa["Height"]

0        1.70
1        1.87
2        1.88
3        1.81
4        1.75
         ... 
18974    1.78
18975    1.75
18976    1.79
18977    1.75
18978    1.88
Name: Height, Length: 18979, dtype: float64

We also realised that weight is stored in kilograms for some players and pounds(lbs) for others 

In [218]:
def convert_weight(x):
    if x[-1] == 's':
        x = x.replace("lbs","")
        kg = int(x[:3]) / 2.205
        return round(kg)
    elif x[-1] == "g":
        return int(x[:-2])

In [219]:
fifa["Weight"] = round(fifa["Weight"].apply(convert_weight),2)

In [220]:
fifa["Weight"]

0        72
1        83
2        87
3        70
4        68
         ..
18974    66
18975    65
18976    74
18977    69
18978    75
Name: Weight, Length: 18979, dtype: int64

## Categorical attributes and Total Stats

We realised that attributes such as Attacking, Defending, Skill, Power, Defending, Mentality, Movement and Goalkeeping are the sum of other stats under them. We also thought that it would be a good idea to make it /100 in order to make it consistent. So that will mean,
* Attacking = Crossing + Heading Accuracy + Finishing + Short Passing + Short Passing + Volleys
* Skill = Dribbling + FK Accuracy  + Curve + Long Passing + Ball Control
* Power = Shot Power + Jumping + Stamina + Strength + Long Shots
* Defending = Standing Tackle + Sliding Tackle + Marking
* Mentality = Aggression + Interceptions + Positioning + Vision + Penalties + Composure
* Movement = Acceleration + Sprint Speed + Agility + Reactions + Balance
* Goalkeeping = GK Diving + GK Handling + GK Kicking + GK Positioning + GK Reflexes
* Total Stats = Attacking + Skill + Power + Defending + Mentality + Movement + Goalkeeping



In [221]:
fifa["Attacking"] = fifa["Crossing"] + fifa["Heading Accuracy"] + fifa["Finishing"] + fifa["Short Passing"] + fifa["Volleys"]
fifa["Skill"] = fifa["Dribbling"] + fifa["FK Accuracy"] + fifa["Curve"] + fifa["Long Passing"] + fifa["Ball Control"]
fifa["Power"] = fifa["Shot Power"] + fifa["Jumping"] + fifa["Stamina"] + fifa["Strength"] + fifa["Long Shots"]
fifa["Defending"] = fifa["Standing Tackle"] + fifa["Sliding Tackle"] + fifa["Marking"]
fifa["Mentality"] = fifa["Aggression"] + fifa["Interceptions"] + fifa["Positioning"] + fifa["Vision"] + fifa["Penalties"] + fifa["Composure"]
fifa["Goalkeeping"] = fifa["GK Diving"] + fifa["GK Handling"] + fifa["GK Kicking"] + fifa["GK Positioning"] + fifa["GK Reflexes"]
fifa["Movement"] = fifa["Acceleration"] + fifa["Sprint Speed"] + fifa["Agility"] + fifa["Reactions"] + fifa["Balance"]

In [222]:
fifa["Attacking"] = round(fifa["Attacking"]/5)
fifa["Skill"] = round(fifa["Skill"]/5)
fifa["Movement"] = round(fifa["Movement"]/5)
fifa["Power"] = round(fifa["Power"]/5)
fifa["Mentality"] = round(fifa["Mentality"]/6)
fifa["Defending"] = round(fifa["Defending"]/3)
fifa["Goalkeeping"] = round(fifa["Goalkeeping"]/5)

In [223]:
fifa["Base Stats"] = fifa["PAC"] + fifa["PAS"] + fifa["PHY"] + fifa["SHO"] + fifa["DEF"] + fifa["DRI"]

In [224]:
# Here's what the data looks like now
fifa.head()

Unnamed: 0,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,↓OVA,POT,Club,Contract,Positions,Height,Weight,Preferred Foot,BOV,Best Position,Joined,Loan Date End,Value,Wage,Release Clause,Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,158023,L. Messi,Lionel Messi,https://cdn.sofifa.com/players/158/023/21_60.png,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,93,93,FC Barcelona,2004 ~ 2021,"RW, ST, CF",1.7,72,Left,93,RW,"Jul 1, 2004",,103500000.0,560K,138400000.0,86.0,85,95,70,91,88,94.0,96,93,94,91,96,90.0,91,80,91,94,95,78.0,86,68,72,69,94,74.0,44,40,93,95,75,96,30.0,32,35,24,11.0,6,11,15,14,8,2231,466,4,4,Medium,Low,5,85,92,91,95,38,65,771
1,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,https://cdn.sofifa.com/players/020/801/21_60.png,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,92,92,Juventus,2018 ~ 2022,"ST, LW",1.87,83,Right,92,ST,"Jul 10, 2018",,63000000.0,220K,75900000.0,87.0,84,95,90,82,86,83.0,88,81,76,77,92,86.0,87,91,87,95,71,89.0,94,95,84,78,93,75.0,63,29,95,82,84,95,28.0,28,32,24,12.0,7,11,15,14,11,2221,464,4,5,High,Low,5,89,93,81,89,35,77,562
2,200389,J. Oblak,Jan Oblak,https://cdn.sofifa.com/players/200/389/21_60.png,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,91,93,Atlético Madrid,2014 ~ 2023,GK,1.88,87,Right,91,GK,"Jul 16, 2014",,120000000.0,125K,159400000.0,19.0,13,11,15,43,13,22.0,12,13,14,40,30,61.0,43,60,67,88,49,54.0,59,78,41,78,12,35.0,34,19,11,65,11,68,19.0,27,12,18,87.0,87,92,78,90,90,1413,489,3,1,Medium,Medium,3,87,92,78,90,52,90,150
3,192985,K. De Bruyne,Kevin De Bruyne,https://cdn.sofifa.com/players/192/985/21_60.png,http://sofifa.com/player/192985/kevin-de-bruyn...,Belgium,29,91,91,Manchester City,2015 ~ 2023,"CAM, CM",1.81,70,Right,91,CAM,"Aug 30, 2015",,129000000.0,370K,161000000.0,81.0,94,82,55,94,82,88.0,88,85,83,93,92,80.0,77,76,78,91,76,82.0,91,63,89,74,91,83.0,76,66,88,94,84,91,62.0,68,65,53,11.0,15,13,5,10,13,2304,485,5,4,High,High,4,76,86,93,88,64,78,207
4,190871,Neymar Jr,Neymar da Silva Santos Jr.,https://cdn.sofifa.com/players/190/871/21_60.png,http://sofifa.com/player/190871/neymar-da-silv...,Brazil,28,91,91,Paris Saint-Germain,2017 ~ 2022,"LW, CAM",1.75,68,Right,91,LW,"Aug 3, 2017",,132000000.0,270K,166500000.0,82.0,85,87,62,87,87,90.0,95,88,89,81,95,91.0,94,89,96,91,83,71.0,80,62,81,50,84,75.0,51,36,87,90,92,93,31.0,35,30,29,12.0,9,9,15,15,11,2175,451,5,5,High,Medium,5,91,85,86,94,36,59,595


## Categorizing the players contract Status
Instead of using their dates on the `Loan Date End` Column, we decided to just categorize them into 3 namely, 
* Under Contract: For players that are currently at their parent clubs and not on loan
* Loanee: For players away from their parent club on loan
* Free Agents: For players not under any contract at any club

In [225]:
fifa.loc[fifa["Loan Date End"].notna(), "Loan Date End"] = "Loanee"
fifa.loc[fifa["Contract"] == "Free", "Loan Date End"] = "Free"
fifa["Loan Date End"].fillna("Under Contract", inplace=True)

In [226]:
fifa.head()

Unnamed: 0,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,↓OVA,POT,Club,Contract,Positions,Height,Weight,Preferred Foot,BOV,Best Position,Joined,Loan Date End,Value,Wage,Release Clause,Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,158023,L. Messi,Lionel Messi,https://cdn.sofifa.com/players/158/023/21_60.png,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,93,93,FC Barcelona,2004 ~ 2021,"RW, ST, CF",1.7,72,Left,93,RW,"Jul 1, 2004",Under Contract,103500000.0,560K,138400000.0,86.0,85,95,70,91,88,94.0,96,93,94,91,96,90.0,91,80,91,94,95,78.0,86,68,72,69,94,74.0,44,40,93,95,75,96,30.0,32,35,24,11.0,6,11,15,14,8,2231,466,4,4,Medium,Low,5,85,92,91,95,38,65,771
1,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,https://cdn.sofifa.com/players/020/801/21_60.png,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,92,92,Juventus,2018 ~ 2022,"ST, LW",1.87,83,Right,92,ST,"Jul 10, 2018",Under Contract,63000000.0,220K,75900000.0,87.0,84,95,90,82,86,83.0,88,81,76,77,92,86.0,87,91,87,95,71,89.0,94,95,84,78,93,75.0,63,29,95,82,84,95,28.0,28,32,24,12.0,7,11,15,14,11,2221,464,4,5,High,Low,5,89,93,81,89,35,77,562
2,200389,J. Oblak,Jan Oblak,https://cdn.sofifa.com/players/200/389/21_60.png,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,91,93,Atlético Madrid,2014 ~ 2023,GK,1.88,87,Right,91,GK,"Jul 16, 2014",Under Contract,120000000.0,125K,159400000.0,19.0,13,11,15,43,13,22.0,12,13,14,40,30,61.0,43,60,67,88,49,54.0,59,78,41,78,12,35.0,34,19,11,65,11,68,19.0,27,12,18,87.0,87,92,78,90,90,1413,489,3,1,Medium,Medium,3,87,92,78,90,52,90,150
3,192985,K. De Bruyne,Kevin De Bruyne,https://cdn.sofifa.com/players/192/985/21_60.png,http://sofifa.com/player/192985/kevin-de-bruyn...,Belgium,29,91,91,Manchester City,2015 ~ 2023,"CAM, CM",1.81,70,Right,91,CAM,"Aug 30, 2015",Under Contract,129000000.0,370K,161000000.0,81.0,94,82,55,94,82,88.0,88,85,83,93,92,80.0,77,76,78,91,76,82.0,91,63,89,74,91,83.0,76,66,88,94,84,91,62.0,68,65,53,11.0,15,13,5,10,13,2304,485,5,4,High,High,4,76,86,93,88,64,78,207
4,190871,Neymar Jr,Neymar da Silva Santos Jr.,https://cdn.sofifa.com/players/190/871/21_60.png,http://sofifa.com/player/190871/neymar-da-silv...,Brazil,28,91,91,Paris Saint-Germain,2017 ~ 2022,"LW, CAM",1.75,68,Right,91,LW,"Aug 3, 2017",Under Contract,132000000.0,270K,166500000.0,82.0,85,87,62,87,87,90.0,95,88,89,81,95,91.0,94,89,96,91,83,71.0,80,62,81,50,84,75.0,51,36,87,90,92,93,31.0,35,30,29,12.0,9,9,15,15,11,2175,451,5,5,High,Medium,5,91,85,86,94,36,59,595


## Cleaning the `Hits` column

In [227]:
fifa["Hits"] = fifa["Hits"].fillna(0)

In [228]:
fifa["Hits"] = fifa["Hits"].astype(str)
fifa["Hits"] = fifa["Hits"].apply(convert_numbers)

In [229]:
fifa["Hits"]

0        771.0
1        562.0
2        150.0
3        207.0
4        595.0
         ...  
18974      0.0
18975      0.0
18976      0.0
18977      0.0
18978      0.0
Name: Hits, Length: 18979, dtype: float64

## Changing the format of the `Joined` column

In [230]:
fifa["Joined"] = pd.to_datetime(fifa["Joined"])
fifa["Joined"]

0       2004-07-01
1       2018-07-10
2       2014-07-16
3       2015-08-30
4       2017-08-03
           ...    
18974   2018-07-13
18975   2020-08-01
18976   2019-03-08
18977   2020-09-22
18978   2019-07-29
Name: Joined, Length: 18979, dtype: datetime64[ns]

What's left is just to change the names of the columns

In [231]:
fifa.rename(columns = {'Height(cm)': 'Height(m)', 'Wage':'Wage (€)', 'Value':'Value (€)', 'Release Clause': 'Release Clause (€)', 
                        'Weight':'Weight (kg)', 'W/F': 'Weaker Foot', 'SM': 'Skill Moves', 
                       'A/W': 'Attacking Work Rate', 'D/W': 'Defensive Work Rate', 
                       'IR': 'Injury Resistance', '↓OVA':'Overall Rating', 'BOV': 'Best Overall', 'POT': 'Potential Rating', 
                       'LongName': 'Full Name', 'Loan Date End': 'Contract Status'}, inplace = True)

In [232]:
fifa.head()

Unnamed: 0,ID,Name,Full Name,photoUrl,playerUrl,Nationality,Age,Overall Rating,Potential Rating,Club,Contract,Positions,Height,Weight (kg),Preferred Foot,Best Overall,Best Position,Joined,Contract Status,Value (€),Wage (€),Release Clause (€),Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,Weaker Foot,Skill Moves,Attacking Work Rate,Defensive Work Rate,Injury Resistance,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,158023,L. Messi,Lionel Messi,https://cdn.sofifa.com/players/158/023/21_60.png,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,93,93,FC Barcelona,2004 ~ 2021,"RW, ST, CF",1.7,72,Left,93,RW,2004-07-01,Under Contract,103500000.0,560K,138400000.0,86.0,85,95,70,91,88,94.0,96,93,94,91,96,90.0,91,80,91,94,95,78.0,86,68,72,69,94,74.0,44,40,93,95,75,96,30.0,32,35,24,11.0,6,11,15,14,8,2231,466,4,4,Medium,Low,5,85,92,91,95,38,65,771.0
1,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,https://cdn.sofifa.com/players/020/801/21_60.png,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,92,92,Juventus,2018 ~ 2022,"ST, LW",1.87,83,Right,92,ST,2018-07-10,Under Contract,63000000.0,220K,75900000.0,87.0,84,95,90,82,86,83.0,88,81,76,77,92,86.0,87,91,87,95,71,89.0,94,95,84,78,93,75.0,63,29,95,82,84,95,28.0,28,32,24,12.0,7,11,15,14,11,2221,464,4,5,High,Low,5,89,93,81,89,35,77,562.0
2,200389,J. Oblak,Jan Oblak,https://cdn.sofifa.com/players/200/389/21_60.png,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,91,93,Atlético Madrid,2014 ~ 2023,GK,1.88,87,Right,91,GK,2014-07-16,Under Contract,120000000.0,125K,159400000.0,19.0,13,11,15,43,13,22.0,12,13,14,40,30,61.0,43,60,67,88,49,54.0,59,78,41,78,12,35.0,34,19,11,65,11,68,19.0,27,12,18,87.0,87,92,78,90,90,1413,489,3,1,Medium,Medium,3,87,92,78,90,52,90,150.0
3,192985,K. De Bruyne,Kevin De Bruyne,https://cdn.sofifa.com/players/192/985/21_60.png,http://sofifa.com/player/192985/kevin-de-bruyn...,Belgium,29,91,91,Manchester City,2015 ~ 2023,"CAM, CM",1.81,70,Right,91,CAM,2015-08-30,Under Contract,129000000.0,370K,161000000.0,81.0,94,82,55,94,82,88.0,88,85,83,93,92,80.0,77,76,78,91,76,82.0,91,63,89,74,91,83.0,76,66,88,94,84,91,62.0,68,65,53,11.0,15,13,5,10,13,2304,485,5,4,High,High,4,76,86,93,88,64,78,207.0
4,190871,Neymar Jr,Neymar da Silva Santos Jr.,https://cdn.sofifa.com/players/190/871/21_60.png,http://sofifa.com/player/190871/neymar-da-silv...,Brazil,28,91,91,Paris Saint-Germain,2017 ~ 2022,"LW, CAM",1.75,68,Right,91,LW,2017-08-03,Under Contract,132000000.0,270K,166500000.0,82.0,85,87,62,87,87,90.0,95,88,89,81,95,91.0,94,89,96,91,83,71.0,80,62,81,50,84,75.0,51,36,87,90,92,93,31.0,35,30,29,12.0,9,9,15,15,11,2175,451,5,5,High,Medium,5,91,85,86,94,36,59,595.0


Now to drop irrelevant columns

In [233]:
fifa.drop(["Name", "photoUrl"], axis=1, inplace=True)

...and we're done with the cleaning

In [234]:
fifa

Unnamed: 0,ID,Full Name,playerUrl,Nationality,Age,Overall Rating,Potential Rating,Club,Contract,Positions,Height,Weight (kg),Preferred Foot,Best Overall,Best Position,Joined,Contract Status,Value (€),Wage (€),Release Clause (€),Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,Weaker Foot,Skill Moves,Attacking Work Rate,Defensive Work Rate,Injury Resistance,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,158023,Lionel Messi,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,93,93,FC Barcelona,2004 ~ 2021,"RW, ST, CF",1.70,72,Left,93,RW,2004-07-01,Under Contract,103500000.0,560K,138400000.0,86.0,85,95,70,91,88,94.0,96,93,94,91,96,90.0,91,80,91,94,95,78.0,86,68,72,69,94,74.0,44,40,93,95,75,96,30.0,32,35,24,11.0,6,11,15,14,8,2231,466,4,4,Medium,Low,5,85,92,91,95,38,65,771.0
1,20801,C. Ronaldo dos Santos Aveiro,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,92,92,Juventus,2018 ~ 2022,"ST, LW",1.87,83,Right,92,ST,2018-07-10,Under Contract,63000000.0,220K,75900000.0,87.0,84,95,90,82,86,83.0,88,81,76,77,92,86.0,87,91,87,95,71,89.0,94,95,84,78,93,75.0,63,29,95,82,84,95,28.0,28,32,24,12.0,7,11,15,14,11,2221,464,4,5,High,Low,5,89,93,81,89,35,77,562.0
2,200389,Jan Oblak,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,91,93,Atlético Madrid,2014 ~ 2023,GK,1.88,87,Right,91,GK,2014-07-16,Under Contract,120000000.0,125K,159400000.0,19.0,13,11,15,43,13,22.0,12,13,14,40,30,61.0,43,60,67,88,49,54.0,59,78,41,78,12,35.0,34,19,11,65,11,68,19.0,27,12,18,87.0,87,92,78,90,90,1413,489,3,1,Medium,Medium,3,87,92,78,90,52,90,150.0
3,192985,Kevin De Bruyne,http://sofifa.com/player/192985/kevin-de-bruyn...,Belgium,29,91,91,Manchester City,2015 ~ 2023,"CAM, CM",1.81,70,Right,91,CAM,2015-08-30,Under Contract,129000000.0,370K,161000000.0,81.0,94,82,55,94,82,88.0,88,85,83,93,92,80.0,77,76,78,91,76,82.0,91,63,89,74,91,83.0,76,66,88,94,84,91,62.0,68,65,53,11.0,15,13,5,10,13,2304,485,5,4,High,High,4,76,86,93,88,64,78,207.0
4,190871,Neymar da Silva Santos Jr.,http://sofifa.com/player/190871/neymar-da-silv...,Brazil,28,91,91,Paris Saint-Germain,2017 ~ 2022,"LW, CAM",1.75,68,Right,91,LW,2017-08-03,Under Contract,132000000.0,270K,166500000.0,82.0,85,87,62,87,87,90.0,95,88,89,81,95,91.0,94,89,96,91,83,71.0,80,62,81,50,84,75.0,51,36,87,90,92,93,31.0,35,30,29,12.0,9,9,15,15,11,2175,451,5,5,High,Medium,5,91,85,86,94,36,59,595.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18974,247223,Ao Xia,http://sofifa.com/player/247223/ao-xia/210006/,China PR,21,47,55,Wuhan Zall,2018 ~ 2022,CB,1.78,66,Right,49,CB,2018-07-13,Under Contract,100000.0,1K,70000.0,29.0,23,26,43,26,27,28.0,27,23,21,29,42,59.0,68,60,69,46,51,44.0,36,57,54,50,24,39.0,48,50,28,28,38,44,49.0,45,52,50,9.0,7,8,5,14,11,1186,255,2,2,Medium,Medium,1,64,28,26,38,48,51,0.0
18975,258760,Ben Hough,http://sofifa.com/player/258760/ben-hough/210006/,England,17,47,67,Oldham Athletic,2020 ~ 2021,CM,1.75,65,Right,51,CAM,2020-08-01,Under Contract,130000.0,500,165000.0,42.0,38,42,40,56,35,44.0,46,40,35,50,48,61.0,63,64,61,51,66,45.0,48,58,43,47,30,38.0,40,23,47,47,36,38,39.0,32,44,40,9.0,12,10,9,6,8,1315,281,2,2,Medium,Medium,1,64,40,48,49,35,45,0.0
18976,252757,Ronan McKinley,http://sofifa.com/player/252757/ronan-mckinley...,England,18,47,65,Derry City,2019 ~ 2020,CM,1.79,74,Right,49,CAM,2019-03-08,Under Contract,120000.0,500,131000.0,40.0,30,34,43,54,39,41.0,43,39,31,47,47,58.0,59,66,51,47,67,48.0,45,52,50,54,41,46.0,56,42,47,43,42,43,40.0,33,43,45,10.0,13,12,6,6,11,1338,285,2,2,Medium,Medium,1,63,39,44,46,40,53,0.0
18977,243790,Zhen'ao Wang,http://sofifa.com/player/243790/zhenao-wang/21...,China PR,20,47,57,Dalian YiFang FC,2020 ~ 2022,RW,1.75,69,Right,48,ST,2020-09-22,Under Contract,100000.0,2K,88000.0,43.0,45,52,34,42,42,39.0,51,35,31,31,46,51.0,62,55,50,33,54,47.0,56,45,46,48,40,39.0,31,25,42,46,46,45,33.0,26,32,42,11.0,14,12,9,8,12,1243,271,3,2,Medium,Medium,1,58,49,41,49,30,44,0.0


In [236]:
fifa.to_csv("Fifa 21_cleaned.csv", index=False)