# Data Preprocessing

This notebook describes our data and the preprocessing steps applied to it. We follow the Medallion Architecture for data refinement: every data source passes through three layers (Bronze, Silver, and Gold), each with its own purpose in the pipeline.

## Overview of the Layers

### 1. **Bronze Layer (Raw Data)**
- **Nature**: This is the landing area for our raw data, ingested directly from the source without any alterations.
- **Purpose**: To store an immutable, 1:1 replica of the source data. It is the foundation on which the other layers are built.

### 2. **Silver Layer (Cleaned Data)**
- **Nature**: Data in this layer has been cleaned and enriched, and is stored in a format suitable for analysis. Inconsistencies, missing values, and anomalies from the Bronze layer are addressed here.
- **Purpose**: To provide a reliable, single version of the truth that is suitable for analysis but has no use-case-specific business logic applied. This is the primary layer for data scientists and analysts to query.

### 3. **Gold Layer (Business-Ready Data)**
- **Nature**: This layer holds data that has been aggregated, enriched, and optimized for specific business use cases. It is derived from the Silver layer.
- **Purpose**: To provide business-ready datasets for driving insights, reports, visualizations, and machine learning models. This layer is tailored to end-users and specific analytical objectives.
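
The three layers above can be sketched as three small functions, each consuming the output of the previous one. This is a minimal illustration on a toy DataFrame, not the pipeline used in this notebook: the function names (`to_bronze`, `to_silver`, `to_gold`), the columns, and the cleaning rules are assumptions chosen for the example.

```python
import pandas as pd


def to_bronze(source: pd.DataFrame) -> pd.DataFrame:
    """Bronze: keep an unmodified copy of the source rows."""
    return source.copy()


def to_silver(bronze: pd.DataFrame) -> pd.DataFrame:
    """Silver: clean the data without applying use-case-specific logic."""
    silver = bronze.copy()
    # Fill missing ratings with the column mean
    silver["rating"] = silver["rating"].fillna(silver["rating"].mean())
    # Standardize the casing of the position codes
    silver["position"] = silver["position"].str.upper()
    return silver


def to_gold(silver: pd.DataFrame) -> pd.DataFrame:
    """Gold: aggregate for one specific question (mean rating per position)."""
    return silver.groupby("position", as_index=False)["rating"].mean()


# Toy source data (hypothetical values, for illustration only)
source = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "position": ["st", "ST", "gk", "GK"],
    "rating": [80.0, None, 70.0, 90.0],
})

bronze = to_bronze(source)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Keeping each layer as a separate function makes it easy to re-run Silver or Gold without touching the Bronze copy.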


In [2]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import seaborn as sns
from scipy.stats import normaltest


### Player skill dataset

In [46]:
# Bronze Layer

raw_player_ratings = pd.read_csv('../data/FUT_player_data.csv')
pd.set_option('display.max_columns', None)
raw_player_ratings.head()


Unnamed: 0,id,futbin_id,name,height,weight,age,club,league,nation,rarity,position,foot,attackWorkRate,defenseWorkRate,cardColor,overallRating,pace,shooting,passing,dribbling,defending,physicality,pace_acceleration,pace_sprintSpeed,shooting_positioning,shooting_finishing,shooting_shotPower,shooting_longShots,shooting_volleys,shooting_penalties,passing_vision,passing_crossing,passing_freeKickAccuracy,passing_shortPassing,passing_longPassing,passing_curve,dribbling_agility,dribbling_balance,dribbling_reactions,dribbling_ballControl,dribbling_dribbling,dribbling_composure,defending_interceptions,defending_headingAccuracy,defending_standingTackle,defending_slidingTackle,defending_defenseAwareness,phsyicality_jumping,physicality_stamina,physicality_strength,physicality_aggression,goalkeeper_diving,goalkeeper_handling,goalkeeper_kicking,goalkeeper_positioning,goalkeeper_reflexes,goalkeeper_speed
0,18949,54231,Kylian Mbappé,182,73,24,73,16,18,16,ST,Right,High,Low,gold,99,99,98,92,99,45,87,99.0,99.0,99.0,99.0,99.0,94.0,95.0,93.0,95.0,90.0,80.0,97.0,82.0,92.0,99.0,91.0,99.0,99.0,99.0,99.0,48.0,87.0,43.0,40.0,33.0,88.0,99.0,87.0,73.0,,,,,,99
1,18981,54251,Karim Benzema,185,81,35,607,350,18,164,CF,Right,High,Med,gold,99,97,99,93,98,50,97,97.0,97.0,99.0,99.0,99.0,97.0,99.0,97.0,99.0,84.0,83.0,98.0,86.0,93.0,91.0,90.0,99.0,99.0,99.0,99.0,55.0,99.0,33.0,25.0,56.0,99.0,99.0,99.0,90.0,,,,,,97
2,18982,54249,Zinedine Zidane,185,77,51,112658,2118,18,171,CAM,Right,Med,Med,gold,99,92,96,99,97,87,90,93.0,92.0,96.0,95.0,95.0,99.0,97.0,94.0,99.0,99.0,99.0,99.0,99.0,99.0,88.0,90.0,99.0,99.0,99.0,99.0,95.0,99.0,88.0,72.0,83.0,87.0,94.0,92.0,83.0,,,,,,92
3,18730,54005,Pelé,173,70,82,112658,2118,54,153,LW,Right,High,Med,gold,99,96,97,94,99,61,78,96.0,96.0,98.0,99.0,95.0,95.0,96.0,94.0,98.0,91.0,90.0,97.0,89.0,90.0,97.0,96.0,99.0,99.0,99.0,99.0,68.0,96.0,54.0,50.0,56.0,90.0,91.0,78.0,61.0,,,,,,96
4,19001,54277,Robert Lewandowski,185,81,35,241,53,37,164,ST,Right,High,Med,gold,99,97,99,92,97,53,99,98.0,97.0,99.0,99.0,99.0,99.0,99.0,99.0,94.0,83.0,99.0,98.0,82.0,92.0,87.0,92.0,99.0,99.0,97.0,99.0,60.0,99.0,52.0,24.0,42.0,99.0,97.0,99.0,99.0,,,,,,97


In [None]:
# List of numerical columns

numerical_cols = [
    'height', 'weight', 'age', 'overallRating', 'pace', 'shooting',
    'passing', 'dribbling', 'defending', 'physicality',
    'pace_acceleration', 'pace_sprintSpeed', 'shooting_positioning',
    'shooting_finishing', 'shooting_shotPower', 'shooting_longShots',
    'shooting_volleys', 'shooting_penalties', 'passing_vision',
    'passing_crossing', 'passing_freeKickAccuracy', 'passing_shortPassing',
    'passing_longPassing', 'passing_curve', 'dribbling_agility',
    'dribbling_balance', 'dribbling_reactions', 'dribbling_ballControl',
    'dribbling_dribbling', 'dribbling_composure', 'defending_interceptions',
    'defending_headingAccuracy', 'defending_standingTackle',
    'defending_slidingTackle', 'defending_defenseAwareness',
    'phsyicality_jumping', 'physicality_stamina', 'physicality_strength',
    'physicality_aggression', 'goalkeeper_diving', 'goalkeeper_handling',
    'goalkeeper_kicking', 'goalkeeper_positioning', 'goalkeeper_reflexes',
    'goalkeeper_speed'
]


# List of categorical columns
categorical_cols = [
    'position', 'foot', 'attackWorkRate', 'defenseWorkRate', 'cardColor'
]
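
Because the column lists above are typed by hand, a quick sanity check before the Silver step can catch a misspelled or non-numeric column early. The sketch below is an optional helper, not part of the original notebook: `check_columns` is a hypothetical name, and the toy DataFrame stands in for `raw_player_ratings`.

```python
import pandas as pd


def check_columns(df, numerical, categorical):
    """Return (missing, non_numeric).

    missing:     listed names that are not columns of df
    non_numeric: names listed as numerical whose dtype is not numeric
    """
    listed = list(numerical) + list(categorical)
    missing = [c for c in listed if c not in df.columns]
    non_numeric = [
        c for c in numerical
        if c in df.columns and not pd.api.types.is_numeric_dtype(df[c])
    ]
    return missing, non_numeric


# Toy frame: 'weight' is stored as text, 'age' and 'position' are absent
toy = pd.DataFrame({
    "height": [182, 185],
    "pace": [99.0, 97.0],
    "weight": ["73", "81"],
    "foot": ["Right", "Left"],
})

missing, non_numeric = check_columns(
    toy,
    numerical=["height", "pace", "weight", "age"],
    categorical=["foot", "position"],
)
print("missing:", missing)
print("non-numeric:", non_numeric)
```

On the real data this would be called as `check_columns(raw_player_ratings, numerical_cols, categorical_cols)`; both returned lists should be empty.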

In [38]:
# Silver Layer

silver_player_ratings = raw_player_ratings.copy()

# Substitute the foreign keys (club, league, nation ids) with the actual names

# Read the lookup tables, indexed by id
club_ids = pd.read_csv('../data/club_ids.csv', index_col='id')
league_ids = pd.read_csv('../data/league_ids.csv', index_col='id')
nation_ids = pd.read_csv('../data/nation_ids.csv', index_col='id')

# Convert each lookup table to a dict: id -> value in the first column
club_ids = club_ids.iloc[:, 0].to_dict()
league_ids = league_ids.iloc[:, 0].to_dict()
nation_ids = nation_ids.iloc[:, 0].to_dict()

# Master dictionary: column name -> {id: value}
master_dict = {'club': club_ids, 'league': league_ids, 'nation': nation_ids}

# Replace the ids column by column; each inner dict applies only to its own column
silver_player_ratings = silver_player_ratings.replace(master_dict)

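# Fill missing numerical values with the column mean.
# Note: goalkeeper_* attributes are missing for outfield players (and outfield
# sub-attributes for goalkeepers), so those cells also receive the column mean.
# Assign the result back instead of using inplace=True on a column selection,
# which is not guaranteed to modify the DataFrame in recent pandas versions.
for col in numerical_cols:
    silver_player_ratings[col] = silver_player_ratings[col].fillna(silver_player_ratings[col].mean())

# Fill missing categorical values with the column mode (most frequent value)
for col in categorical_cols:
    silver_player_ratings[col] = silver_player_ratings[col].fillna(silver_player_ratings[col].mode()[0])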

# Drop Icon players
silver_player_ratings = silver_player_ratings[silver_player_ratings['league'] != 'Icons']


silver_player_ratings


Unnamed: 0,id,futbin_id,name,height,weight,age,club,league,nation,rarity,position,foot,attackWorkRate,defenseWorkRate,cardColor,overallRating,pace,shooting,passing,dribbling,defending,physicality,pace_acceleration,pace_sprintSpeed,shooting_positioning,shooting_finishing,shooting_shotPower,shooting_longShots,shooting_volleys,shooting_penalties,passing_vision,passing_crossing,passing_freeKickAccuracy,passing_shortPassing,passing_longPassing,passing_curve,dribbling_agility,dribbling_balance,dribbling_reactions,dribbling_ballControl,dribbling_dribbling,dribbling_composure,defending_interceptions,defending_headingAccuracy,defending_standingTackle,defending_slidingTackle,defending_defenseAwareness,phsyicality_jumping,physicality_stamina,physicality_strength,physicality_aggression,goalkeeper_diving,goalkeeper_handling,goalkeeper_kicking,goalkeeper_positioning,goalkeeper_reflexes,goalkeeper_speed
0,18949,54231,Kylian Mbappé,182,73,24,Paris SG,Ligue 1,France,16,ST,Right,High,Low,gold,99,99,98,92,99,45,87,99.000000,99.000000,99.000000,99.000000,99.000000,94.000000,95.000000,93.000000,95.000000,90.000000,80.000000,97.000000,82.000000,92.000000,99.00000,91.000000,99.00000,99.000000,99.000000,99.0,48.000000,87.000000,43.000000,40.000000,33.000000,88.000000,99.000000,87.000000,73.000000,67.048519,64.859551,63.755363,65.376404,68.046476,99
1,18981,54251,Karim Benzema,185,81,35,Al Ittihad,MBS Pro League (SAU 1),France,164,CF,Right,High,Med,gold,99,97,99,93,98,50,97,97.000000,97.000000,99.000000,99.000000,99.000000,97.000000,99.000000,97.000000,99.000000,84.000000,83.000000,98.000000,86.000000,93.000000,91.00000,90.000000,99.00000,99.000000,99.000000,99.0,55.000000,99.000000,33.000000,25.000000,56.000000,99.000000,99.000000,99.000000,90.000000,67.048519,64.859551,63.755363,65.376404,68.046476,97
4,19001,54277,Robert Lewandowski,185,81,35,FC Barcelona,LaLiga Santander,Poland,164,ST,Right,High,Med,gold,99,97,99,92,97,53,99,98.000000,97.000000,99.000000,99.000000,99.000000,99.000000,99.000000,99.000000,94.000000,83.000000,99.000000,98.000000,82.000000,92.000000,87.00000,92.000000,99.00000,99.000000,97.000000,99.0,60.000000,99.000000,52.000000,24.000000,42.000000,99.000000,97.000000,99.000000,99.000000,67.048519,64.859551,63.755363,65.376404,68.046476,97
5,19003,54275,Erling Haaland,195,94,23,Manchester City,Premier League,Norway,164,ST,Left,High,Med,gold,99,99,99,88,94,63,99,99.000000,99.000000,99.000000,99.000000,99.000000,99.000000,99.000000,90.000000,96.000000,62.000000,80.000000,96.000000,90.000000,99.000000,91.00000,92.000000,99.00000,97.000000,92.000000,99.0,56.000000,99.000000,69.000000,38.000000,57.000000,91.000000,99.000000,99.000000,99.000000,67.048519,64.859551,63.755363,65.376404,68.046476,99
6,19004,54274,Gianluigi Donnarumma,196,90,24,Paris SG,Ligue 1,Italy,164,GK,Right,Med,High,gold,99,0,0,0,0,0,0,68.066656,68.147454,55.533568,51.065535,61.969581,51.730281,47.307503,51.847422,58.530099,53.992422,47.120931,63.440068,57.733643,52.584587,66.94786,66.941669,65.43169,63.142011,60.557904,65.0,49.955171,55.798484,51.261874,48.711869,49.854147,67.265183,67.075675,67.807237,59.355534,99.000000,95.000000,92.000000,98.000000,99.000000,77
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19030,14789,47568,Jiahui Zhang,182,0,20,Hebei CFFC,Chinese FA Super L. (CHN 1),China PR,0,CM,Right,Med,Med,bronze,46,54,37,46,46,43,51,48.000000,58.000000,52.000000,30.000000,51.000000,33.000000,36.000000,42.000000,44.000000,38.000000,35.000000,53.000000,50.000000,33.000000,61.00000,62.000000,41.00000,45.000000,42.000000,43.0,40.000000,49.000000,45.000000,46.000000,40.000000,51.000000,43.000000,54.000000,51.000000,67.048519,64.859551,63.755363,65.376404,68.046476,53
19031,14790,47577,Ziye Zhao,180,0,19,Hebei CFFC,Chinese FA Super L. (CHN 1),China PR,0,RW,Right,Med,Med,bronze,46,63,48,40,47,24,44,67.000000,60.000000,40.000000,49.000000,64.000000,31.000000,43.000000,54.000000,42.000000,48.000000,35.000000,38.000000,33.000000,39.000000,54.00000,66.000000,39.00000,45.000000,47.000000,40.0,21.000000,36.000000,21.000000,25.000000,23.000000,41.000000,42.000000,52.000000,26.000000,67.048519,64.859551,63.755363,65.376404,68.046476,63
19032,15077,47296,Antonio D'Silva,182,0,23,Odisha FC,Indian Super League (IND 1),India,0,GK,Right,Med,Med,bronze,46,19,13,17,28,11,32,17.000000,21.000000,6.000000,8.000000,35.000000,10.000000,10.000000,14.000000,23.000000,14.000000,13.000000,21.000000,21.000000,14.000000,36.00000,48.000000,35.00000,14.000000,8.000000,38.0,13.000000,11.000000,13.000000,12.000000,7.000000,53.000000,18.000000,33.000000,26.000000,51.000000,45.000000,46.000000,41.000000,50.000000,19
19033,13807,47579,Junjie Wu,188,0,20,Guangzhou R&F,Chinese FA Super L. (CHN 1),China PR,0,LB,Left,Med,Med,bronze,46,55,25,29,34,48,57,54.000000,56.000000,31.000000,19.000000,33.000000,25.000000,25.000000,31.000000,31.000000,30.000000,26.000000,30.000000,25.000000,24.000000,45.00000,48.000000,42.00000,32.000000,30.000000,35.0,46.000000,42.000000,51.000000,45.000000,50.000000,56.000000,53.000000,64.000000,46.000000,67.048519,64.859551,63.755363,65.376404,68.046476,55
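
The two Silver steps applied to the player ratings (decoding foreign keys with a nested dictionary, then imputing missing values) can be reproduced on a toy frame. This is a sketch under stated assumptions: the ids, names, and the `club_name` column are made up for the example, and the real lookup CSVs may be laid out differently.

```python
import numpy as np
import pandas as pd

# Toy player table with foreign keys and a few missing values
players = pd.DataFrame({
    "name": ["P1", "P2", "P3"],
    "club": [73, 241, 73],
    "nation": [18, 37, 18],
    "pace": [90.0, np.nan, 70.0],
    "foot": ["Right", None, "Right"],
})

# Hypothetical lookup table: an id index and one name column
club_lookup = pd.DataFrame(
    {"id": [73, 241], "club_name": ["Paris SG", "FC Barcelona"]}
).set_index("id")
club_map = club_lookup.iloc[:, 0].to_dict()
nation_map = {18: "France", 37: "Poland"}

# Nested dict: the outer key selects the column, the inner dict maps id -> name,
# so an id is only replaced inside its own column
decoded = players.replace({"club": club_map, "nation": nation_map})

# Mean imputation for a numerical column, mode imputation for a categorical one
decoded["pace"] = decoded["pace"].fillna(decoded["pace"].mean())
decoded["foot"] = decoded["foot"].fillna(decoded["foot"].mode()[0])
print(decoded)
```

Mean imputation keeps every row but pulls missing cells toward the average, which is why the `goalkeeper_*` columns of outfield players in the output above all share the same values.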


### Transfer Fees

In [41]:
# Bronze Layer

top_5_leagues = [
    'combined_premier-league.csv', 
    'combined_serie-a.csv', 
    'combined_laliga.csv', 
    'combined_1-bundesliga.csv', 
    'combined_ligue-1.csv'
    ]

# Read one DataFrame per league file
dfs = [pd.read_csv(f'../data/{league}') for league in top_5_leagues]

bronze_transfer_fees = pd.concat(dfs, ignore_index=True)
print(bronze_transfer_fees.columns)

# Keep only transfers from the 2016 season onwards
bronze_transfer_fees = bronze_transfer_fees[bronze_transfer_fees['season'] >= 2016]
bronze_transfer_fees

Index(['club', 'name', 'age', 'nationality', 'position', 'short_pos',
       'market_value', 'dealing_club', 'dealing_country', 'fee', 'movement',
       'window', 'league', 'season', 'is_loan', 'loan_status', 'Year'],
      dtype='object')


Unnamed: 0,club,name,age,nationality,position,short_pos,market_value,dealing_club,dealing_country,fee,movement,window,league,season,is_loan,loan_status,Year
13630,AFC Bournemouth,Jordon Ibe,20.0,England,Right Winger,RW,7000000.0,Liverpool,England,18000000.0,in,summer,Premier League,2016,False,,2016
13631,AFC Bournemouth,Lewis Cook,19.0,England,Central Midfield,CM,4000000.0,Leeds,England,7000000.0,in,summer,Premier League,2016,False,,2016
13632,AFC Bournemouth,Lys Mousset,20.0,France,Centre-Forward,CF,400000.0,AC Le Havre,France,6500000.0,in,summer,Premier League,2016,False,,2016
13633,AFC Bournemouth,Brad Smith,22.0,Australia,Left-Back,LB,100000.0,Liverpool,England,3600000.0,in,summer,Premier League,2016,False,,2016
13634,AFC Bournemouth,Jack Wilshere,24.0,England,Central Midfield,CM,23000000.0,Arsenal,England,2350000.0,in,summer,Premier League,2016,True,loan with fee,2016
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
77091,Stade Rennais FC,Jérémy Gélin,23.0,France,Centre-Back,CB,4800000.0,Royal Antwerp,Belgium,0.0,out,summer,Ligue 1,2020,True,free loan,2020
77092,Stade Rennais FC,Riffi Mandanda,27.0,DR Congo,Goalkeeper,GK,325000.0,Without Club,,,out,summer,Ligue 1,2020,False,,2020
77093,Stade Rennais FC,Joris Gnagnon,23.0,France,Centre-Back,CB,4800000.0,Sevilla FC,Spain,0.0,out,summer,Ligue 1,2020,True,end of loan,2020
77094,Stade Rennais FC,M'Baye Niang,26.0,Senegal,Centre-Forward,CF,10000000.0,Ahli,Saudi Arabia,1500000.0,out,winter,Ligue 1,2020,True,loan with fee,2020
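
The output above starts at index 13630 rather than 0 because boolean filtering keeps the index labels assigned by `pd.concat(..., ignore_index=True)`. The sketch below shows the same behaviour on in-memory CSV text; the league names and rows are placeholders, not real transfer data.

```python
import io

import pandas as pd

# Two tiny CSV "files" held in memory (hypothetical content)
csv_sources = {
    "league_a": "name,fee,season\nX,1000,2015\nY,2000,2016\n",
    "league_b": "name,fee,season\nZ,3000,2017\n",
}

# Read each source and stack them under one continuous 0..n-1 index
frames = [pd.read_csv(io.StringIO(text)) for text in csv_sources.values()]
combined = pd.concat(frames, ignore_index=True)

# Filtering keeps the original index labels of the surviving rows
recent = combined[combined["season"] >= 2016]
print(recent.index.tolist())

# Optional: renumber the rows from 0 after filtering
recent_reindexed = recent.reset_index(drop=True)
print(recent_reindexed)
```

Whether to reset the index is a design choice: keeping the original labels makes it possible to trace a row back to the concatenated frame.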


In [42]:
# Silver Layer
silver_transfer_fees = bronze_transfer_fees.copy()

# Fill NaN values column by column. Passing a dict to DataFrame.fillna avoids
# inplace=True on a column selection, which is not guaranteed to modify the
# DataFrame in recent pandas versions.
silver_transfer_fees = silver_transfer_fees.fillna({
    'market_value': 0,
    'dealing_club': "Unknown",
    'dealing_country': "Unknown",
    'fee': 0,
    'movement': "Unknown",
    'window': "Unknown",
    'loan_status': "Not Applicable",
})

# Standardize string values.
# Note: str.title() also lowercases the tail of acronyms ('AFC' -> 'Afc').
title_cols = ['club', 'name', 'position', 'nationality',
              'dealing_club', 'dealing_country', 'league']
for col in title_cols:
    silver_transfer_fees[col] = silver_transfer_fees[col].str.title()
silver_transfer_fees['short_pos'] = silver_transfer_fees['short_pos'].str.upper()

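# Derive age group. pd.cut bins are right-inclusive by default, so the intervals
# are (0, 20], (20, 25], (25, 30], (30, 100]: a 20-year-old is labelled '<20'.
silver_transfer_fees['age_group'] = pd.cut(silver_transfer_fees['age'], bins=[0, 20, 25, 30, 100], labels=['<20', '20-25', '25-30', '30+'])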


# Filter out free transfers (this also drops rows whose missing fee was filled with 0)
silver_transfer_fees = silver_transfer_fees[silver_transfer_fees['fee'] > 0.0]


silver_transfer_fees

Unnamed: 0,club,name,age,nationality,position,short_pos,market_value,dealing_club,dealing_country,fee,movement,window,league,season,is_loan,loan_status,Year,age_group
13630,Afc Bournemouth,Jordon Ibe,20.0,England,Right Winger,RW,7000000.0,Liverpool,England,18000000.0,in,summer,Premier League,2016,False,Not Applicable,2016,<20
13631,Afc Bournemouth,Lewis Cook,19.0,England,Central Midfield,CM,4000000.0,Leeds,England,7000000.0,in,summer,Premier League,2016,False,Not Applicable,2016,<20
13632,Afc Bournemouth,Lys Mousset,20.0,France,Centre-Forward,CF,400000.0,Ac Le Havre,France,6500000.0,in,summer,Premier League,2016,False,Not Applicable,2016,<20
13633,Afc Bournemouth,Brad Smith,22.0,Australia,Left-Back,LB,100000.0,Liverpool,England,3600000.0,in,summer,Premier League,2016,False,Not Applicable,2016,20-25
13634,Afc Bournemouth,Jack Wilshere,24.0,England,Central Midfield,CM,23000000.0,Arsenal,England,2350000.0,in,summer,Premier League,2016,True,loan with fee,2016,20-25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
77079,Stade Rennais Fc,Souleyman Doumbia,23.0,Cote D'Ivoire,Left-Back,LB,1600000.0,Sco Angers,France,3000000.0,out,summer,Ligue 1,2020,False,Not Applicable,2020,20-25
77080,Stade Rennais Fc,Lucas Da Cunha,19.0,France,Left Winger,LW,1800000.0,Ogc Nice,France,1000000.0,out,summer,Ligue 1,2020,False,Not Applicable,2020,<20
77081,Stade Rennais Fc,Denis Will Poha,23.0,France,Central Midfield,CM,1200000.0,Vit. Guimarães,Portugal,300000.0,out,summer,Ligue 1,2020,False,Not Applicable,2020,20-25
77094,Stade Rennais Fc,M'Baye Niang,26.0,Senegal,Centre-Forward,CF,10000000.0,Ahli,Saudi Arabia,1500000.0,out,winter,Ligue 1,2020,True,loan with fee,2020,25-30
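
In the output above, 20-year-old players fall into the `<20` group. That follows from the default `right=True` of `pd.cut`, which makes each bin include its right edge. The sketch below compares the default with `right=False` on a few sample ages; switching to `right=False` is shown only as an option, not as what the notebook does.

```python
import pandas as pd

ages = pd.Series([19, 20, 21, 25, 30, 31])
bins = [0, 20, 25, 30, 100]
labels = ["<20", "20-25", "25-30", "30+"]

# Default (right=True): intervals (0, 20], (20, 25], (25, 30], (30, 100]
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: intervals [0, 20), [20, 25), [25, 30), [30, 100)
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

comparison = pd.DataFrame({
    "age": ages,
    "right_closed": right_closed.astype(str),
    "left_closed": left_closed.astype(str),
})
print(comparison)
```

With `right=False` the labels read more literally (20 falls in `20-25`, 30 in `30+`), at the cost of changing the group assignments shown in this notebook.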


## Unified Silver

In [43]:
silver_transfer_fees_short = silver_transfer_fees[['name','fee']]
silver_transfer_fees_short

Unnamed: 0,name,fee
13630,Jordon Ibe,18000000.0
13631,Lewis Cook,7000000.0
13632,Lys Mousset,6500000.0
13633,Brad Smith,3600000.0
13634,Jack Wilshere,2350000.0
...,...,...
77079,Souleyman Doumbia,3000000.0
77080,Lucas Da Cunha,1000000.0
77081,Denis Will Poha,300000.0
77094,M'Baye Niang,1500000.0


In [44]:
# Left join on player name: transfers without a matching rating keep NaN in the rating columns
merged_df = pd.merge(silver_transfer_fees_short, silver_player_ratings, on='name', how='left')
merged_df

Unnamed: 0,name,fee,id,futbin_id,height,weight,age,club,league,nation,rarity,position,foot,attackWorkRate,defenseWorkRate,cardColor,overallRating,pace,shooting,passing,dribbling,defending,physicality,pace_acceleration,pace_sprintSpeed,shooting_positioning,shooting_finishing,shooting_shotPower,shooting_longShots,shooting_volleys,shooting_penalties,passing_vision,passing_crossing,passing_freeKickAccuracy,passing_shortPassing,passing_longPassing,passing_curve,dribbling_agility,dribbling_balance,dribbling_reactions,dribbling_ballControl,dribbling_dribbling,dribbling_composure,defending_interceptions,defending_headingAccuracy,defending_standingTackle,defending_slidingTackle,defending_defenseAwareness,phsyicality_jumping,physicality_stamina,physicality_strength,physicality_aggression,goalkeeper_diving,goalkeeper_handling,goalkeeper_kicking,goalkeeper_positioning,goalkeeper_reflexes,goalkeeper_speed
0,Jordon Ibe,18000000.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,Lewis Cook,7000000.0,8565.0,31399.0,175.0,0.0,26.0,AFC Bournemouth,Premier League,England,1.0,CM,Right,Med,Med,silver,74.0,60.0,63.0,74.0,77.0,66.0,66.0,66.0,55.0,65.0,61.0,66.0,65.0,57.0,61.0,74.0,70.0,65.0,77.0,77.0,71.0,75.0,83.0,73.0,78.0,77.0,73.0,69.0,50.0,72.0,68.0,64.0,68.0,68.0,61.0,77.0,67.048519,64.859551,63.755363,65.376404,68.046476,60.0
2,Lys Mousset,6500000.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,Brad Smith,3600000.0,5150.0,29959.0,177.0,0.0,29.0,D.C. United,Major League Soccer,Australia,0.0,LWB,Left,High,High,silver,67.0,82.0,49.0,60.0,65.0,61.0,66.0,80.0,83.0,63.0,50.0,52.0,49.0,23.0,40.0,59.0,63.0,27.0,63.0,60.0,62.0,78.0,74.0,64.0,64.0,63.0,61.0,60.0,50.0,62.0,61.0,64.0,60.0,73.0,63.0,68.0,67.048519,64.859551,63.755363,65.376404,68.046476,81.0
4,Jack Wilshere,2350000.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7743,Lucas Da Cunha,1000000.0,1275.0,36177.0,176.0,0.0,22.0,OGC Nice,Ligue 1,France,0.0,RM,Left,High,Med,silver,72.0,76.0,68.0,68.0,74.0,37.0,55.0,77.0,75.0,70.0,66.0,73.0,72.0,60.0,62.0,70.0,69.0,57.0,70.0,61.0,68.0,79.0,76.0,65.0,75.0,74.0,66.0,33.0,56.0,40.0,33.0,31.0,54.0,62.0,54.0,50.0,67.048519,64.859551,63.755363,65.376404,68.046476,76.0
7744,Denis Will Poha,300000.0,11285.0,32228.0,173.0,0.0,26.0,FC Sion,Raiffeisen Super L. (SUI 1),France,0.0,CM,Right,Med,Med,silver,65.0,73.0,61.0,65.0,67.0,63.0,72.0,77.0,69.0,46.0,57.0,68.0,68.0,49.0,61.0,63.0,60.0,66.0,69.0,67.0,64.0,73.0,73.0,61.0,67.0,67.0,64.0,64.0,55.0,65.0,63.0,64.0,76.0,76.0,70.0,70.0,67.048519,64.859551,63.755363,65.376404,68.046476,73.0
7745,M'Baye Niang,1500000.0,998.0,29427.0,188.0,0.0,28.0,AJ Auxerre,Ligue 1,Senegal,1.0,ST,Right,High,Low,silver,73.0,74.0,75.0,66.0,71.0,31.0,68.0,70.0,78.0,72.0,72.0,87.0,73.0,71.0,75.0,70.0,65.0,66.0,68.0,58.0,70.0,70.0,65.0,68.0,72.0,72.0,72.0,30.0,68.0,24.0,20.0,30.0,74.0,49.0,84.0,52.0,67.048519,64.859551,63.755363,65.376404,68.046476,74.0
7746,Georginio Rutter,500000.0,17433.0,51288.0,182.0,83.0,21.0,Leeds United,Premier League,France,0.0,ST,Left,Med,Med,gold,75.0,77.0,74.0,59.0,76.0,24.0,64.0,76.0,78.0,76.0,77.0,74.0,69.0,68.0,68.0,62.0,52.0,45.0,66.0,51.0,56.0,77.0,79.0,72.0,76.0,77.0,69.0,12.0,73.0,26.0,17.0,17.0,76.0,66.0,74.0,35.0,67.048519,64.859551,63.755363,65.376404,68.046476,77.0
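
Joining on `name` alone leaves unmatched transfers as all-NaN rows (as in rows 0, 2, and 4 above) and can duplicate a transfer when several rating rows share a name. One way to quantify both effects is the `indicator=True` option of `pd.merge`. The sketch below uses placeholder players and values; it is a diagnostic aid, not a step from the original notebook.

```python
import pandas as pd

# Placeholder transfer and rating tables (hypothetical values)
fees = pd.DataFrame({
    "name": ["Player A", "Player B", "Player C"],
    "fee": [7_000_000.0, 18_000_000.0, 3_600_000.0],
})
ratings = pd.DataFrame({
    "name": ["Player A", "Player C", "Player C"],  # 'Player C' appears twice
    "overallRating": [74, 67, 60],
})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
merged = pd.merge(fees, ratings, on="name", how="left", indicator=True)

# Transfers with no rating row
unmatched = merged.loc[merged["_merge"] == "left_only", "name"].tolist()

# Share of merged rows that found a rating
match_rate = (merged["_merge"] == "both").mean()

print(merged)
print("unmatched:", unmatched, "match rate:", match_rate)
```

Comparing `len(merged)` with `len(fees)` reveals how many rows were added by duplicate names on the ratings side.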


### Gold Layer

In [None]:
# Gold Layer

gold_player_rating = silver_player_ratings.copy()


# Standardize numerical columns
scaler = StandardScaler()
gold_player_rating[numerical_cols] = scaler.fit_transform(gold_player_rating[numerical_cols])

# One-hot encode categorical columns
player_ratings_scaled = pd.get_dummies(gold_player_rating, columns=categorical_cols)
player_ratings_scaled
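
To make the two Gold transformations concrete, the sketch below applies them to a toy frame using pandas only. The z-score is computed by hand with the population standard deviation (`ddof=0`), which is the convention `StandardScaler` uses; the column names and values are assumptions for the example.

```python
import pandas as pd

toy = pd.DataFrame({
    "pace": [60.0, 80.0, 100.0],
    "foot": ["Left", "Right", "Right"],
})
numerical = ["pace"]
categorical = ["foot"]

gold = toy.copy()

# Standardize: subtract the column mean and divide by the population std (ddof=0)
gold[numerical] = (gold[numerical] - gold[numerical].mean()) / gold[numerical].std(ddof=0)

# One-hot encode: each category becomes its own indicator column
encoded = pd.get_dummies(gold, columns=categorical)
print(encoded)
```

After standardization each numerical column has mean 0 and unit variance, so features on different scales (for example `height` and `overallRating`) contribute comparably to distance-based methods such as the `KMeans` imported at the top of the notebook.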