
# DATA CLEANING & TRANSFORMATION

# Dataset: FIFA 21 players dataset
*   https://www.kaggle.com/datasets/yagunnersya/fifa-21-messy-raw-dataset-for-cleaning-exploring/data



# Guiding Questions:

1.   Do the height and weight columns have the appropriate data types?
2.   Can you separate the joined column into year, month and day columns?
3.   Can you clean and transform the value, wage and release clause columns into columns of integers?
4.   How can you remove the newline characters from the Hits column?
5.   Should you separate the Team & Contract column into separate team and contract columns?



## DATA LOADING


### Import Relevant Dataset and Libraries

In [None]:
#import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display, HTML

In [None]:
#Function to create scrollable table within a small window
def create_scrollable_table(df, table_id, title):
  html = f'<h3>{title}</h3>'
  html += f'<div id="{table_id}" style="height: 200px; overflow:auto;">'
  html += df.to_html()
  html += '</div>'
  return html

In [None]:
df = pd.read_csv("/content/fifa21_raw_data.csv") #loading dataset


### * STRUCTURE of DATA

In [None]:
df.shape

(18979, 77)

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18979 entries, 0 to 18978
Data columns (total 77 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   photoUrl          18979 non-null  object
 1   LongName          18979 non-null  object
 2   playerUrl         18979 non-null  object
 3   Nationality       18979 non-null  object
 4   Positions         18979 non-null  object
 5   Name              18979 non-null  object
 6   Age               18979 non-null  int64 
 7   ↓OVA              18979 non-null  int64 
 8   POT               18979 non-null  int64 
 9   Team & Contract   18979 non-null  object
 10  ID                18979 non-null  int64 
 11  Height            18979 non-null  object
 12  Weight            18979 non-null  object
 13  foot              18979 non-null  object
 14  BOV               18979 non-null  int64 
 15  BP                18979 non-null  object
 16  Growth            18979 non-null  int64 
 17  Joined      

### * Categorical Features

In [None]:
categorical_features = df.select_dtypes("object")
# T: transposes
# cat_summary_stats = categorical_features.describe().T
html_categorical = create_scrollable_table(categorical_features.head(), "categorical_features", "Data Frame for Categorical Features.")

display(HTML(html_categorical))

Unnamed: 0,photoUrl,LongName,playerUrl,Nationality,Positions,Name,Team & Contract,Height,Weight,foot,BP,Joined,Loan Date End,Value,Wage,Release Clause,W/F,SM,A/W,D/W,IR,Hits
0,https://cdn.sofifa.com/players/158/023/21_60.png,Lionel Messi,http://sofifa.com/player/158023/lionel-messi/210005/,Argentina,RW ST CF,L. Messi,\n\n\n\nFC Barcelona\n2004 ~ 2021\n\n,"5'7""",159lbs,Left,RW,"Jul 1, 2004",,€67.5M,€560K,€138.4M,4 ★,4★,Medium,Low,5 ★,\n372
1,https://cdn.sofifa.com/players/020/801/21_60.png,C. Ronaldo dos Santos Aveiro,http://sofifa.com/player/20801/c-ronaldo-dos-santos-aveiro/210005/,Portugal,ST LW,Cristiano Ronaldo,\n\n\n\nJuventus\n2018 ~ 2022\n\n,"6'2""",183lbs,Right,ST,"Jul 10, 2018",,€46M,€220K,€75.9M,4 ★,5★,High,Low,5 ★,\n344
2,https://cdn.sofifa.com/players/200/389/21_60.png,Jan Oblak,http://sofifa.com/player/200389/jan-oblak/210005/,Slovenia,GK,J. Oblak,\n\n\n\nAtlético Madrid\n2014 ~ 2023\n\n,"6'2""",192lbs,Right,GK,"Jul 16, 2014",,€75M,€125K,€159.4M,3 ★,1★,Medium,Medium,3 ★,\n86
3,https://cdn.sofifa.com/players/192/985/21_60.png,Kevin De Bruyne,http://sofifa.com/player/192985/kevin-de-bruyne/210005/,Belgium,CAM CM,K. De Bruyne,\n\n\n\nManchester City\n2015 ~ 2023\n\n,"5'11""",154lbs,Right,CAM,"Aug 30, 2015",,€87M,€370K,€161M,5 ★,4★,High,High,4 ★,\n163
4,https://cdn.sofifa.com/players/190/871/21_60.png,Neymar da Silva Santos Jr.,http://sofifa.com/player/190871/neymar-da-silva-santos-jr/210005/,Brazil,LW CAM,Neymar Jr,\n\n\n\nParis Saint-Germain\n2017 ~ 2022\n\n,"5'9""",150lbs,Right,LW,"Aug 3, 2017",,€90M,€270K,€166.5M,5 ★,5★,High,Medium,5 ★,\n273


In [None]:
categorical_features.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18979 entries, 0 to 18978
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   photoUrl         18979 non-null  object
 1   LongName         18979 non-null  object
 2   playerUrl        18979 non-null  object
 3   Nationality      18979 non-null  object
 4   Positions        18979 non-null  object
 5   Name             18979 non-null  object
 6   Team & Contract  18979 non-null  object
 7   Height           18979 non-null  object
 8   Weight           18979 non-null  object
 9   foot             18979 non-null  object
 10  BP               18979 non-null  object
 11  Joined           18979 non-null  object
 12  Loan Date End    1013 non-null   object
 13  Value            18979 non-null  object
 14  Wage             18979 non-null  object
 15  Release Clause   18979 non-null  object
 16  W/F              18979 non-null  object
 17  SM               18979 non-null

In [None]:
categorical_features.isnull().sum()

photoUrl               0
LongName               0
playerUrl              0
Nationality            0
Positions              0
Name                   0
Team & Contract        0
Height                 0
Weight                 0
foot                   0
BP                     0
Joined                 0
Loan Date End      17966
Value                  0
Wage                   0
Release Clause         0
W/F                    0
SM                     0
A/W                    0
D/W                    0
IR                     0
Hits                   0
dtype: int64

### * Numerical Features

In [None]:
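# pass both dtypes through include=; a second positional argument would be treated as exclude=
numerical_features = df.select_dtypes(include=["int64", "float64"])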

html_numerical= create_scrollable_table(numerical_features.head(10), "numerical_features", "Data Frame for numerical features.")

display(HTML(html_numerical))

Unnamed: 0,Age,↓OVA,POT,ID,BOV,Growth,Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,PAC,SHO,PAS,DRI,DEF,PHY
0,33,93,93,158023,93,0,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,85,92,91,95,38,65
1,35,92,92,20801,92,0,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,89,93,81,89,35,77
2,27,91,93,200389,91,2,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,87,92,78,90,52,90
3,29,91,91,192985,91,0,407,94,82,55,94,82,441,88,85,83,93,92,398,77,76,78,91,76,408,91,63,89,74,91,408,76,66,88,94,84,91,186,68,65,53,56,15,13,5,10,13,2304,485,76,86,93,88,64,78
4,28,91,91,190871,91,0,408,85,87,62,87,87,448,95,88,89,81,95,453,94,89,96,91,83,357,80,62,81,50,84,356,51,36,87,90,92,93,94,35,30,29,59,9,9,15,15,11,2175,451,91,85,86,94,36,59
5,31,91,91,188545,91,0,423,71,94,85,84,89,407,85,79,85,70,88,407,77,78,77,93,82,420,89,84,76,86,85,391,81,49,94,79,88,88,96,35,42,19,51,15,6,12,8,10,2195,457,78,91,78,85,43,82
6,21,90,95,231747,91,5,408,78,91,73,83,83,394,92,79,63,70,90,458,96,96,92,92,82,404,86,77,86,76,79,341,62,38,91,80,70,84,100,34,34,32,42,13,5,7,11,6,2147,466,96,86,78,91,39,76
7,27,90,91,212831,90,1,114,17,13,19,45,20,138,27,19,18,44,30,268,56,47,40,88,37,240,64,52,32,78,14,140,27,11,13,66,23,65,50,15,19,16,439,86,88,85,91,89,1389,490,86,88,85,89,51,91
8,28,90,90,209331,90,0,392,79,91,59,84,79,406,90,83,69,75,89,460,94,92,91,92,91,393,80,69,85,75,84,376,63,55,91,84,83,90,122,38,43,41,62,14,14,9,11,14,2211,470,93,86,81,90,45,75
9,28,90,90,208722,90,0,410,76,90,84,85,75,391,91,76,64,71,89,460,95,93,93,93,86,406,84,86,88,70,78,358,75,35,92,85,71,84,122,42,42,38,56,10,10,15,7,14,2203,469,94,85,80,90,44,76


In [None]:
numerical_features.isnull().sum()

Age                 0
↓OVA                0
POT                 0
ID                  0
BOV                 0
Growth              0
Attacking           0
Crossing            0
Finishing           0
Heading Accuracy    0
Short Passing       0
Volleys             0
Skill               0
Dribbling           0
Curve               0
FK Accuracy         0
Long Passing        0
Ball Control        0
Movement            0
Acceleration        0
Sprint Speed        0
Agility             0
Reactions           0
Balance             0
Power               0
Shot Power          0
Jumping             0
Stamina             0
Strength            0
Long Shots          0
Mentality           0
Aggression          0
Interceptions       0
Positioning         0
Vision              0
Penalties           0
Composure           0
Defending           0
Marking             0
Standing Tackle     0
Sliding Tackle      0
Goalkeeping         0
GK Diving           0
GK Handling         0
GK Kicking          0
GK Positio

In [None]:
numerical_features.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18979 entries, 0 to 18978
Data columns (total 55 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Age               18979 non-null  int64
 1   ↓OVA              18979 non-null  int64
 2   POT               18979 non-null  int64
 3   ID                18979 non-null  int64
 4   BOV               18979 non-null  int64
 5   Growth            18979 non-null  int64
 6   Attacking         18979 non-null  int64
 7   Crossing          18979 non-null  int64
 8   Finishing         18979 non-null  int64
 9   Heading Accuracy  18979 non-null  int64
 10  Short Passing     18979 non-null  int64
 11  Volleys           18979 non-null  int64
 12  Skill             18979 non-null  int64
 13  Dribbling         18979 non-null  int64
 14  Curve             18979 non-null  int64
 15  FK Accuracy       18979 non-null  int64
 16  Long Passing      18979 non-null  int64
 17  Ball Control      18979 non-nul

## DATA CLEANING

### DUPLICATED VALUE

In [None]:
df.duplicated().sum()

1

In [None]:
# Check the duplicate rows
df.loc[df.duplicated()]

Unnamed: 0,photoUrl,LongName,playerUrl,Nationality,Positions,Name,Age,↓OVA,POT,Team & Contract,...,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits
944,https://cdn.sofifa.com/players/251/698/21_60.png,Kevin Berlaso,http://sofifa.com/player/251698/kevin-berlaso/...,Ecuador,RB,K. Berlaso,32,77,77,\n Ecuador\nFree\n\n,...,High,Medium,2 ★,78,56,69,77,72,68,\n12
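
The check above only reports the duplicate. One possible follow-up, sketched here on a small made-up frame rather than on the FIFA data, is to drop the repeated row and rebuild the index. Rebuilding matters because the later loops in this notebook read rows by label (`df[col][x]`) and would fail on a gap in the index. The frame `demo` and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the players table (not real rows from the dataset)
demo = pd.DataFrame({
    "Name": ["A. Player", "B. Player", "A. Player"],
    "Age": [25, 31, 25],
})

# Number of rows that exactly repeat an earlier row
n_duplicates = demo.duplicated().sum()
print(n_duplicates)

# Keep the first occurrence and rebuild a gap-free 0..n-1 index
deduped = demo.drop_duplicates().reset_index(drop=True)
print(deduped)
```

Whether the duplicate should be removed at all is a judgment call; the rest of this notebook keeps all 18979 rows.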


### * DROP unrelated COLUMNS

In [None]:
columns_to_drop = ['photoUrl', 'playerUrl', 'Loan Date End', 'Release Clause']
df = df.drop(columns = columns_to_drop)


### Columns to clean:
1.   Team & Contract
2.   Height
3.   Weight
4.   Joined (Year and Month)
5.   Value
6.   Wage
7.   Positions
8.   Hits


### 1. Team & Contract separate into two columns


In [None]:
original_value = df["Team & Contract"].head(10)
original_value

0           \n\n\n\nFC Barcelona\n2004 ~ 2021\n\n
1               \n\n\n\nJuventus\n2018 ~ 2022\n\n
2        \n\n\n\nAtlético Madrid\n2014 ~ 2023\n\n
3        \n\n\n\nManchester City\n2015 ~ 2023\n\n
4    \n\n\n\nParis Saint-Germain\n2017 ~ 2022\n\n
5      \n\n\n\nFC Bayern München\n2014 ~ 2023\n\n
6    \n\n\n\nParis Saint-Germain\n2018 ~ 2022\n\n
7              \n\n\n\nLiverpool\n2018 ~ 2024\n\n
8              \n\n\n\nLiverpool\n2017 ~ 2023\n\n
9              \n\n\n\nLiverpool\n2016 ~ 2023\n\n
Name: Team & Contract, dtype: object



*   Replace the '\n' characters in each row with spaces



In [None]:
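# assign the result back instead of calling replace(inplace=True) on a column selection
df['Team & Contract'] = df['Team & Contract'].str.replace('\n', ' ', regex=False) #replace each '\n' with a space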



*   Split each string into a team name and a contract duration




In [None]:
Team = []
Contract_Duration = []

for x in range(len(df['Team & Contract'])):
  team_contract = df['Team & Contract'][x] #combined string for the current row
  parts = team_contract.split('2' , 1) #split at the first occurrence of '2', assumed to be the start of the contract year
  if len(parts) == 2: # parts[0] is the team, parts[1] is the rest of the contract
    team = parts[0]
    duration = '2' + parts[1] #split() drops the separator, so prepend the '2' again
  else:
    team = team_contract
    duration = '0'

  #append to lists
  Team.append(team)
  Contract_Duration.append(duration)

df['Team'] = Team
df['Contract Duration'] = Contract_Duration

df = df.drop(columns = ['Team & Contract'])

In [None]:
team_value = df["Team"].head(10)
team_value

0               FC Barcelona 
1                   Juventus 
2            Atlético Madrid 
3            Manchester City 
4        Paris Saint-Germain 
5          FC Bayern München 
6        Paris Saint-Germain 
7                  Liverpool 
8                  Liverpool 
9                  Liverpool 
Name: Team, dtype: object

In [None]:
contract_duration = df["Contract Duration"].head(10)
contract_duration

0    2004 ~ 2021  
1    2018 ~ 2022  
2    2014 ~ 2023  
3    2015 ~ 2023  
4    2017 ~ 2022  
5    2014 ~ 2023  
6    2018 ~ 2022  
7    2018 ~ 2024  
8    2017 ~ 2023  
9    2016 ~ 2023  
Name: Contract Duration, dtype: object
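
Splitting at the first `'2'` assumes that no club name contains that digit and that every contract text starts with a year in the 2000s. As a hedged alternative (not the approach used in this notebook), the raw strings could be cut at the newline that separates the two fields, before the newlines are replaced with spaces. The sample strings below are copied from the outputs shown earlier; the variable names are illustrative.

```python
import pandas as pd

# Raw "Team & Contract" strings as they appear before any cleaning
raw = pd.Series([
    "\n\n\n\nFC Barcelona\n2004 ~ 2021\n\n",
    "\n\n\n\nJuventus\n2018 ~ 2022\n\n",
    "\n Ecuador\nFree\n\n",
])

# Trim the surrounding whitespace, then cut at the first remaining newline.
# str.partition always returns three columns: before, separator, after.
parts = raw.str.strip().str.partition("\n")

team_contract = pd.DataFrame({
    "Team": parts[0].str.strip(),
    "Contract Duration": parts[2].str.strip(),
})
print(team_contract)
```

This also removes the trailing spaces visible in the `Team` and `Contract Duration` outputs above.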

### 2. Convert the height and weight columns to numerical forms.
*   Height: feet and inches --> centimetres
*   Weight: pounds --> kilograms




In [None]:
original_height_value = df["Height"].head(10)
original_height_value

0     5'7"
1     6'2"
2     6'2"
3    5'11"
4     5'9"
5     6'0"
6    5'10"
7     6'3"
8     5'9"
9     5'9"
Name: Height, dtype: object

In [None]:
original_weight_value = df["Weight"].head(10)
original_weight_value

0    159lbs
1    183lbs
2    192lbs
3    154lbs
4    150lbs
5    176lbs
6    161lbs
7    201lbs
8    157lbs
9    152lbs
Name: Weight, dtype: object

In [None]:
#Function to convert lbs to kg
def lbs_to_kg(weight):
    # Extract the numeric part from the string and convert it to integer
    numeric_part = int(''.join(filter(str.isdigit, weight)))

    # Convert pounds to kilograms
    kilograms = numeric_part * 0.453592
    return int(kilograms)

#Function to convert feet and inches to cm
def feet_to_cm(height):
  feet, inches = height.split("'")
  feet = int(feet)
  inches = int(inches.replace('"', '')) #drop the trailing inch mark

  total_inches = feet * 12 + inches
  centimeters = total_inches * 2.54
  return int(centimeters)

df['Height'] = df['Height'].apply(feet_to_cm)
df['Weight'] = df['Weight'].apply(lbs_to_kg)



In [None]:
height_value = df["Height"].head(10)
height_value

0    170
1    187
2    187
3    180
4    175
5    182
6    177
7    190
8    175
9    175
Name: Height, dtype: int64

In [None]:
weight_value = df["Weight"].head(10)
weight_value

0    72
1    83
2    87
3    69
4    68
5    79
6    73
7    91
8    71
9    68
Name: Weight, dtype: int64
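
The same conversions can be written without `apply`. The sketch below assumes every height looks like `5'7"` and every weight like `159lbs`, as in the samples above. It rounds to the nearest integer, whereas the functions above truncate with `int()`, so some results differ by one (for example `6'2"` becomes 188 here and 187 above).

```python
import pandas as pd

heights = pd.Series(["5'7\"", "6'2\"", "5'11\""])
weights = pd.Series(["159lbs", "183lbs", "154lbs"])

# Capture feet and inches as two integer columns (0 = feet, 1 = inches)
ft_in = heights.str.extract(r"(\d+)'(\d+)").astype(int)
height_cm = ((ft_in[0] * 12 + ft_in[1]) * 2.54).round().astype(int)

# Capture the leading digits of the weight and convert pounds to kilograms
pounds = weights.str.extract(r"(\d+)")[0].astype(int)
weight_kg = (pounds * 0.453592).round().astype(int)

print(height_cm.tolist())
print(weight_kg.tolist())
```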

### 3. Joined --> (Year, Month)

In [None]:
joined_value = df["Joined"].head(10)
joined_value

0     Jul 1, 2004
1    Jul 10, 2018
2    Jul 16, 2014
3    Aug 30, 2015
4     Aug 3, 2017
5     Jul 1, 2014
6     Jul 1, 2018
7    Jul 19, 2018
8     Jul 1, 2017
9     Jul 1, 2016
Name: Joined, dtype: object

In [None]:
df[["Joined_Month", "Joined_Year"]] = df["Joined"].str.split(',',  expand=True)
df.drop(columns=["Joined"], inplace=True)

In [None]:
df["Joined_Year"].head(10)

0     2004
1     2018
2     2014
3     2015
4     2017
5     2014
6     2018
7     2018
8     2017
9     2016
Name: Joined_Year, dtype: object

In [None]:
df["Joined_Month"].head(10)

0     Jul 1
1    Jul 10
2    Jul 16
3    Aug 30
4     Aug 3
5     Jul 1
6     Jul 1
7    Jul 19
8     Jul 1
9     Jul 1
Name: Joined_Month, dtype: object
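
Splitting at the comma leaves the month and day together in `Joined_Month` and keeps `Joined_Year` as text. To get separate numeric year, month and day columns, as the guiding questions ask, one option is to parse the strings as dates first. This is a minimal sketch that assumes every value follows the `Jul 1, 2004` pattern and that month abbreviations are parsed in an English locale; the `Joined_Day` column name is an assumption.

```python
import pandas as pd

joined = pd.Series(["Jul 1, 2004", "Jul 10, 2018", "Aug 30, 2015"])

# Parse abbreviated month, day and four-digit year into datetime values
dates = pd.to_datetime(joined, format="%b %d, %Y")

joined_parts = pd.DataFrame({
    "Joined_Year": dates.dt.year,
    "Joined_Month": dates.dt.month,
    "Joined_Day": dates.dt.day,
})
print(joined_parts)
```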

### 4. Convert Value into integers



In [None]:
temp_value = []

for x in range(len(df["Value"])):
  value = df["Value"][x]
  value = value.replace("€", "") #drop the currency symbol
  if 'M' in value:
    value = value.replace("M", "")
    value = round(float(value) * 1000000) #millions; round() guards against float truncation
  elif 'K' in value:
    value = value.replace("K", "")
    value = round(float(value) * 1000) #thousands
  elif 'F' in value:
    value = value.replace('F','')
    value = int(value) / 10
  temp_value.append(int(value))

df["Value"] = temp_value

In [None]:
value_columns = df["Value"]
print(value_columns)

0        67500000
1        46000000
2        75000000
3        87000000
4        90000000
           ...   
18974       35000
18975       60000
18976       40000
18977       60000
18978       60000
Name: Value, Length: 18979, dtype: int64


### 5. Convert Wage into integers

In [None]:
temp_wage = []

for x in range(len(df["Wage"])):
  wage = df["Wage"][x]
  wage = wage.replace("€", "") #drop the currency symbol
  if 'M' in wage:
    wage = wage.replace("M", "")
    wage = round(float(wage) * 1000000) #millions; round() guards against float truncation
  elif 'K' in wage:
    wage = wage.replace("K", "")
    wage = round(float(wage) * 1000) #thousands
  elif 'F' in wage:
    wage = wage.replace('F','')
    wage = int(wage) / 10

  temp_wage.append(int(wage))
print(temp_wage)

df["Wage"] = temp_wage

[560000, 220000, 125000, 370000, 270000, 240000, 160000, 160000, 250000, 250000, 210000, 260000, 310000, 250000, 125000, 350000, 300000, 300000, 190000, 145000, 190000, 195000, 270000, 220000, 140000, 350000, 310000, 100000, 82000, 110000, 230000, 155000, 200000, 195000, 155000, 190000, 165000, 290000, 110000, 125000, 240000, 170000, 105000, 160000, 260000, 115000, 125000, 94000, 160000, 230000, 220000, 135000, 190000, 150000, 130000, 220000, 140000, 93000, 55000, 58000, 220000, 100000, 105000, 80000, 130000, 145000, 150000, 34000, 190000, 94000, 100000, 190000, 135000, 120000, 140000, 115000, 100000, 210000, 115000, 135000, 99000, 120000, 92000, 105000, 165000, 120000, 155000, 59000, 150000, 170000, 94000, 65000, 110000, 130000, 56000, 150000, 220000, 115000, 93000, 56000, 130000, 98000, 47000, 96000, 18000, 70000, 125000, 94000, 145000, 75000, 84000, 155000, 27000, 20000, 120000, 86000, 105000, 77000, 74000, 140000, 92000, 110000, 91000, 130000, 25000, 210000, 105000, 105000, 25000, 

In [None]:
wage_columns = df["Wage"]
print(wage_columns)

0        560000
1        220000
2        125000
3        370000
4        270000
          ...  
18974      1000
18975       500
18976      1000
18977       500
18978       500
Name: Wage, Length: 18979, dtype: int64
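
The Value and Wage loops repeat the same logic, and the Release Clause column (dropped earlier in this notebook) uses the same `€…M` / `€…K` format. A shared helper is one way to cover all three. The function name `parse_euro` is hypothetical, and this sketch handles only the `M` and `K` suffixes and plain numbers, not the `F` branch used in the loops above.

```python
import pandas as pd

def parse_euro(text):
    """Convert strings such as '€67.5M' or '€560K' to an integer number of euros."""
    multipliers = {"M": 1_000_000, "K": 1_000}
    number = text.replace("€", "").strip()
    # Look up the last character; plain numbers fall back to a factor of 1
    factor = multipliers.get(number[-1:], 1)
    if factor != 1:
        number = number[:-1]
    # round() avoids losing a unit to floating-point truncation
    return int(round(float(number) * factor))

sample = pd.Series(["€67.5M", "€560K", "€138.4M", "€0"])
parsed = sample.map(parse_euro)
print(parsed.tolist())
```

With such a helper, each column would need a single line, for example `df["Wage"] = df["Wage"].map(parse_euro)`.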


### 6. Team Position

In [None]:
#count the distinct position strings before normalising their order
print(df["Positions"].nunique())

640


In [None]:
temp_position = []
for x in range(len(df["Positions"])):
  value = sorted(df["Positions"][x].split(" "))
  yx = ' '.join(value) #join the words into a single string with spaces between them
  temp_position.append(yx)

df["Positions"] = temp_position

In [None]:
#count the distinct position strings after sorting the positions within each row
print(df["Positions"].nunique())

238
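
The drop from 640 to 238 comes from treating the same set of positions listed in a different order as one value. A compact way to express that normalisation, sketched on a few sample strings (the last one is made up to show the effect), is:

```python
import pandas as pd

positions = pd.Series(["RW ST CF", "ST LW", "GK", "CF ST RW"])

# Split on whitespace, sort each list alphabetically, and join it back together
normalised = positions.str.split().apply(sorted).str.join(" ")

print(normalised.tolist())
print(positions.nunique(), normalised.nunique())
```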


### 7. Hits


In [None]:
hits_value = []

for value in df["Hits"]:
    if isinstance(value, str):  #only strings need cleaning
        value = value.replace("\n", "").strip()  # remove the newline characters
        if value.endswith("K"):
            value = int(round(float(value[:-1]) * 1000))  # e.g. "1.6K" -> 1600
        else:
            value = int(float(value))  # plain counts such as "372"
    hits_value.append(value)

df["Hits"] = hits_value


## Data Structure after Cleaning

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18979 entries, 0 to 18978
Data columns (total 75 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   LongName           18979 non-null  object
 1   Nationality        18979 non-null  object
 2   Positions          18979 non-null  object
 3   Name               18979 non-null  object
 4   Age                18979 non-null  int64 
 5   ↓OVA               18979 non-null  int64 
 6   POT                18979 non-null  int64 
 7   ID                 18979 non-null  int64 
 8   Height             18979 non-null  int64 
 9   Weight             18979 non-null  int64 
 10  foot               18979 non-null  object
 11  BOV                18979 non-null  int64 
 12  BP                 18979 non-null  object
 13  Growth             18979 non-null  int64 
 14  Value              18979 non-null  int64 
 15  Wage               18979 non-null  int64 
 16  Attacking          18979 non-null  int64

In [None]:
df.to_csv("fifa_21_cleaned.csv")

In [None]:
final_categorical_features = df.select_dtypes("object")
# T: transposes
# cat_summary_stats = categorical_features.describe().T
final_html_categorical = create_scrollable_table(final_categorical_features.head(), "categorical_features", "Categorical Features After Cleaning")

display(HTML(final_html_categorical))

Unnamed: 0,LongName,Nationality,Positions,Name,foot,BP,W/F,SM,A/W,D/W,IR,Team,Contract Duration,Joined_Month,Joined_Year
0,Lionel Messi,Argentina,CF RW ST,L. Messi,Left,RW,4 ★,4★,Medium,Low,5 ★,FC Barcelona,2004 ~ 2021,Jul 1,2004
1,C. Ronaldo dos Santos Aveiro,Portugal,LW ST,Cristiano Ronaldo,Right,ST,4 ★,5★,High,Low,5 ★,Juventus,2018 ~ 2022,Jul 10,2018
2,Jan Oblak,Slovenia,GK,J. Oblak,Right,GK,3 ★,1★,Medium,Medium,3 ★,Atlético Madrid,2014 ~ 2023,Jul 16,2014
3,Kevin De Bruyne,Belgium,CAM CM,K. De Bruyne,Right,CAM,5 ★,4★,High,High,4 ★,Manchester City,2015 ~ 2023,Aug 30,2015
4,Neymar da Silva Santos Jr.,Brazil,CAM LW,Neymar Jr,Right,LW,5 ★,5★,High,Medium,5 ★,Paris Saint-Germain,2017 ~ 2022,Aug 3,2017


In [None]:
final_numerical_features = df.select_dtypes(include=["int64", "float64"])

final_html_numerical = create_scrollable_table(final_numerical_features.head(), "numerical_features", "Numerical Features After Cleaning")

display(HTML(final_html_numerical))

Unnamed: 0,Age,↓OVA,POT,ID,Height,Weight,BOV,Growth,Value,Wage,Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,PAC,SHO,PAS,DRI,DEF,PHY,Hits
0,33,93,93,158023,170,72,93,0,67500000,560000,429,85,95,70,91,88,470,96,93,94,91,96,451,91,80,91,94,95,389,86,68,72,69,94,347,44,40,93,95,75,96,91,32,35,24,54,6,11,15,14,8,2231,466,85,92,91,95,38,65,372
1,35,92,92,20801,187,83,92,0,46000000,220000,437,84,95,90,82,86,414,88,81,76,77,92,431,87,91,87,95,71,444,94,95,84,78,93,353,63,29,95,82,84,95,84,28,32,24,58,7,11,15,14,11,2221,464,89,93,81,89,35,77,344
2,27,91,93,200389,187,87,91,2,75000000,125000,95,13,11,15,43,13,109,12,13,14,40,30,307,43,60,67,88,49,268,59,78,41,78,12,140,34,19,11,65,11,68,57,27,12,18,437,87,92,78,90,90,1413,489,87,92,78,90,52,90,86
3,29,91,91,192985,180,69,91,0,87000000,370000,407,94,82,55,94,82,441,88,85,83,93,92,398,77,76,78,91,76,408,91,63,89,74,91,408,76,66,88,94,84,91,186,68,65,53,56,15,13,5,10,13,2304,485,76,86,93,88,64,78,163
4,28,91,91,190871,175,68,91,0,90000000,270000,408,85,87,62,87,87,448,95,88,89,81,95,453,94,89,96,91,83,357,80,62,81,50,84,356,51,36,87,90,92,93,94,35,30,29,59,9,9,15,15,11,2175,451,91,85,86,94,36,59,273
