# Predicting Football Player Value based on their FIFA 23 ratings

## Introduction
FIFA 23 is a football simulation video game published by Electronic Arts. It is the 30th and final installment in the FIFA series developed by EA Sports, and it was released worldwide on 30 September 2022 for PC, Nintendo Switch, PlayStation 4, PlayStation 5, Xbox One, Xbox Series X/S and Google Stadia.

In this notebook, we predict a player's value from their FIFA 23 ratings.

## Dataset 
The data is collected from [Kaggle](https://www.kaggle.com/datasets/sanjeetsinghnaik/fifa-23-players-dataset): *Fifa 23 Players Dataset*

There are 18,539 records in the dataset, each described by 89 attributes. It contains information about every player in the game, including personal details, physical attributes, ratings, and value. The goal of this notebook is to predict a player's value from their in-game ratings.

### Reading and loading the data into a Dataframe object
The first step is to mount my Google Drive folder to the notebook and then read the .csv file stored there.
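
The mounting step is not shown in the cells below, so here is a minimal sketch of how it might look. It assumes the notebook runs in Google Colab and that the file sits at the top level of the Drive; the `/content/drive/MyDrive` location and the `data_dir` and `csv_path` names are assumptions for illustration. Outside Colab, the sketch falls back to the current working directory.

```python
from pathlib import Path

try:
    # Only available inside Google Colab; prompts for authorization when run there.
    from google.colab import drive

    drive.mount("/content/drive")
    data_dir = Path("/content/drive/MyDrive")  # assumed location of the file
except ImportError:
    # Not running in Colab: assume the CSV is in the working directory.
    data_dir = Path(".")

csv_path = data_dir / "Fifa 23 Players Data.csv"
# The path can then be passed to pandas, e.g. pd.read_csv(csv_path)
```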

In [1]:
import pandas as pd

df = pd.read_csv("Fifa 23 Players Data.csv")

# To ensure the read is valid, I will print the first 5 rows
df.head()

Unnamed: 0,Known As,Full Name,Overall,Potential,Value(in Euro),Positions Played,Best Position,Nationality,Image Link,Age,...,LM Rating,CM Rating,RM Rating,LWB Rating,CDM Rating,RWB Rating,LB Rating,CB Rating,RB Rating,GK Rating
0,L. Messi,Lionel Messi,91,91,54000000,RW,CAM,Argentina,https://cdn.sofifa.net/players/158/023/23_60.png,35,...,91,88,91,67,66,67,62,53,62,22
1,K. Benzema,Karim Benzema,91,91,64000000,"CF,ST",CF,France,https://cdn.sofifa.net/players/165/153/23_60.png,34,...,89,84,89,67,67,67,63,58,63,21
2,R. Lewandowski,Robert Lewandowski,91,91,84000000,ST,ST,Poland,https://cdn.sofifa.net/players/188/545/23_60.png,33,...,86,83,86,67,69,67,64,63,64,22
3,K. De Bruyne,Kevin De Bruyne,91,91,107500000,"CM,CAM",CM,Belgium,https://cdn.sofifa.net/players/192/985/23_60.png,31,...,91,91,91,82,82,82,78,72,78,24
4,K. Mbappé,Kylian Mbappé,91,95,190500000,"ST,LW",ST,France,https://cdn.sofifa.net/players/231/747/23_60.png,23,...,92,84,92,70,66,70,66,57,66,21


In [2]:
df.shape

(18539, 89)

### Cleaning the Dataframe
In this step, we will ensure that all data in the dataset are valid, meaning there are no null values and no duplicate entries.

In [3]:
df = df.dropna()

In [4]:
df = df.drop_duplicates()

# After removing the duplicate entries, we are left with 18420 players.
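
Before dropping anything, it can help to count how many null values and duplicate rows are actually present. The sketch below illustrates this on a small made-up DataFrame (`toy` is hypothetical, not part of the dataset); the same calls should apply to `df`.

```python
import pandas as pd

# Hypothetical toy data: one duplicated row and one missing rating.
toy = pd.DataFrame({
    "Full Name": ["A", "B", "B", "C"],
    "Overall": [80, 75, 75, None],
})

null_counts = toy.isna().sum()         # missing values per column
n_duplicates = toy.duplicated().sum()  # rows identical to an earlier row

# Same cleaning steps as above, plus a fresh 0..n-1 index.
cleaned = toy.dropna().drop_duplicates().reset_index(drop=True)

print(null_counts)
print("duplicate rows:", n_duplicates)
print("rows after cleaning:", len(cleaned))
```

Resetting the index is optional; without it, the cleaned DataFrame keeps the original row labels, which leaves gaps where rows were removed.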

In [5]:
df.dtypes

Known As          object
Full Name         object
Overall            int64
Potential          int64
Value(in Euro)     int64
                   ...  
RWB Rating         int64
LB Rating          int64
CB Rating          int64
RB Rating          int64
GK Rating          int64
Length: 89, dtype: object
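
Because the `dtypes` output is truncated for wide DataFrames, one way to see which columns are numeric and which are not is `select_dtypes`. This is a minimal sketch on a hypothetical toy DataFrame that reuses a few column names from the dataset; the values are made up.

```python
import pandas as pd

# Hypothetical rows with a mix of text and numeric columns.
toy = pd.DataFrame({
    "Full Name": ["Player One", "Player Two"],
    "Overall": [80, 75],
    "Preferred Foot": ["Left", "Right"],
    "Age": [25, 30],
})

numeric_cols = toy.select_dtypes(include="number").columns.tolist()
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("non-numeric:", non_numeric_cols)
```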

### Data Processing
In this step, we will remove irrelevant columns that can't be used to determine a player's value, such as the nickname, image URL, other positions played, release clause value, and club position.

In [6]:
print(df.columns)

Index(['Known As', 'Full Name', 'Overall', 'Potential', 'Value(in Euro)',
       'Positions Played', 'Best Position', 'Nationality', 'Image Link', 'Age',
       'Height(in cm)', 'Weight(in kg)', 'TotalStats', 'BaseStats',
       'Club Name', 'Wage(in Euro)', 'Release Clause', 'Club Position',
       'Contract Until', 'Club Jersey Number', 'Joined On', 'On Loan',
       'Preferred Foot', 'Weak Foot Rating', 'Skill Moves',
       'International Reputation', 'National Team Name',
       'National Team Image Link', 'National Team Position',
       'National Team Jersey Number', 'Attacking Work Rate',
       'Defensive Work Rate', 'Pace Total', 'Shooting Total', 'Passing Total',
       'Dribbling Total', 'Defending Total', 'Physicality Total', 'Crossing',
       'Finishing', 'Heading Accuracy', 'Short Passing', 'Volleys',
       'Dribbling', 'Curve', 'Freekick Accuracy', 'LongPassing', 'BallControl',
       'Acceleration', 'Sprint Speed', 'Agility', 'Reactions', 'Balance',
       'Shot Powe

In [7]:
# Dropping irrelevant columns
df = df.drop(columns=['Known As', 'Positions Played', 'Image Link', 'Club Name', 'Release Clause', 'Club Position',
       'Contract Until', 'Club Jersey Number', 'Joined On', 'On Loan', 'International Reputation',
       'National Team Name', 'National Team Image Link', 'National Team Position',
       'National Team Jersey Number', 'ST Rating', 'LW Rating', 'LF Rating', 'CF Rating',
       'RF Rating', 'RW Rating', 'CAM Rating', 'LM Rating', 'CM Rating', 'RM Rating', 'LWB Rating',
       'CDM Rating', 'RWB Rating', 'LB Rating', 'CB Rating', 'RB Rating',
       'GK Rating'])
df

Unnamed: 0,Full Name,Overall,Potential,Value(in Euro),Best Position,Nationality,Age,Height(in cm),Weight(in kg),TotalStats,...,Penalties,Composure,Marking,Standing Tackle,Sliding Tackle,Goalkeeper Diving,Goalkeeper Handling,GoalkeeperKicking,Goalkeeper Positioning,Goalkeeper Reflexes
0,Lionel Messi,91,91,54000000,CAM,Argentina,35,169,67,2190,...,75,96,20,35,24,6,11,15,14,8
1,Karim Benzema,91,91,64000000,CF,France,34,185,81,2147,...,84,90,43,24,18,13,11,5,5,7
2,Robert Lewandowski,91,91,84000000,ST,Poland,33,185,81,2205,...,90,88,35,42,19,15,6,12,8,10
3,Kevin De Bruyne,91,91,107500000,CM,Belgium,31,181,70,2303,...,83,89,68,65,53,15,13,5,10,13
4,Kylian Mbappé,91,95,190500000,ST,France,23,182,73,2177,...,80,88,26,34,32,13,5,7,11,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18534,Darren Collins,47,56,110000,CAM,Republic of Ireland,21,174,68,1287,...,40,47,39,29,27,6,9,5,13,8
18535,Dejiang Yang,47,57,90000,CDM,China PR,17,175,60,1289,...,33,45,46,50,52,6,12,11,8,6
18536,Liam Mullan,47,67,130000,RM,Northern Ireland,18,170,65,1333,...,43,59,39,37,48,11,12,8,7,12
18537,Daithí McCallion,47,61,100000,CB,Republic of Ireland,17,178,65,1113,...,37,41,50,54,54,8,14,13,7,8


In [8]:
print(df.columns)

Index(['Full Name', 'Overall', 'Potential', 'Value(in Euro)', 'Best Position',
       'Nationality', 'Age', 'Height(in cm)', 'Weight(in kg)', 'TotalStats',
       'BaseStats', 'Wage(in Euro)', 'Preferred Foot', 'Weak Foot Rating',
       'Skill Moves', 'Attacking Work Rate', 'Defensive Work Rate',
       'Pace Total', 'Shooting Total', 'Passing Total', 'Dribbling Total',
       'Defending Total', 'Physicality Total', 'Crossing', 'Finishing',
       'Heading Accuracy', 'Short Passing', 'Volleys', 'Dribbling', 'Curve',
       'Freekick Accuracy', 'LongPassing', 'BallControl', 'Acceleration',
       'Sprint Speed', 'Agility', 'Reactions', 'Balance', 'Shot Power',
       'Jumping', 'Stamina', 'Strength', 'Long Shots', 'Aggression',
       'Interceptions', 'Positioning', 'Vision', 'Penalties', 'Composure',
       'Marking', 'Standing Tackle', 'Sliding Tackle', 'Goalkeeper Diving',
       'Goalkeeper Handling', ' GoalkeeperKicking', 'Goalkeeper Positioning',
       'Goalkeeper Reflexes'],
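
The column listing above shows `' GoalkeeperKicking'` with a leading space, which is easy to miss when selecting that column by name. A minimal sketch of one way to normalize this, using a hypothetical toy DataFrame with made-up values:

```python
import pandas as pd

# Hypothetical frame reproducing the stray leading space in one column name.
toy = pd.DataFrame({
    "Goalkeeper Handling": [11, 11],
    " GoalkeeperKicking": [15, 5],
})

# Strip leading and trailing whitespace from every column name.
toy.columns = toy.columns.str.strip()

print(toy.columns.tolist())
```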
  

In [9]:
df.nunique()

Full Name                 18337
Overall                      45
Potential                    48
Value(in Euro)              257
Best Position                15
Nationality                 160
Age                          28
Height(in cm)                49
Weight(in kg)                54
TotalStats                 1411
BaseStats                   248
Wage(in Euro)               133
Preferred Foot                2
Weak Foot Rating              5
Skill Moves                   5
Attacking Work Rate           3
Defensive Work Rate           3
Pace Total                   70
Shooting Total               75
Passing Total                68
Dribbling Total              67
Defending Total              76
Physicality Total            62
Crossing                     88
Finishing                    92
Heading Accuracy             88
Short Passing                84
Volleys                      88
Dribbling                    92
Curve                        88
Freekick Accuracy            89
LongPass