# Soccer Player Salary Prediction 2023
#### Python, Machine Learning
Aaron Xie
___

# Table of Contents

___

<a id="problem"></a>
# 1. The Problem

One of the biggest sporting events of 2022 was Lionel Messi and the Argentina national football team winning the World Cup. The win solidifies Messi's standing as one of the greatest soccer players of all time, and it raises the main questions behind this project:
* Should Messi have the highest salary because of his honors and experience?
* Messi is already 35; can his age and other factors drag down his salary?
* When football managers sign a player, how can they determine the player's salary?

### 1.1. Goals
* Conduct EDA on this data set.
* Use machine learning algorithms to predict a player's salary.
* Find the most accurate machine learning algorithm in this case.
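
The third goal, finding the most accurate algorithm, can be sketched as a cross-validated comparison. This is only an illustration under stated assumptions: the data is synthetic (generated with `make_regression`, not the FIFA 23 data set), and the two candidate models and the RMSE metric are hypothetical choices, not necessarily the ones used later in this project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the player data (assumption: not the real data set).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hypothetical candidates; the project may compare different algorithms.
candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Score each candidate with 5-fold cross-validated RMSE (lower is better).
rmse_by_model = {}
for name, model in candidates.items():
    scores = cross_val_score(
        model, X, y, cv=5, scoring="neg_root_mean_squared_error"
    )
    rmse_by_model[name] = -scores.mean()

# The model with the smallest average error is the preferred one.
best_model = min(rmse_by_model, key=rmse_by_model.get)
print(rmse_by_model)
print("best:", best_model)
```

Scoring every candidate on the same folds keeps the comparison fair, since each model sees identical training and validation rows.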

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
from sklearn.model_selection import train_test_split

<a id="preparation"></a>
# 2. Data Preparation

### Import the Data Set
Because real-world salary data for soccer players is hard or impossible to obtain, this project uses a fictional data set from the game FIFA 23. The data set was published on [Kaggle](https://www.kaggle.com/datasets/cashncarry/fifa-23-complete-player-dataset) by ALEX.

In [3]:
df = pd.read_csv(r"C:\Users\zong0\OneDrive\Documents\Data_Science\My_Projects\GitHub\Soccer-Player-Salary-Prediction\players_fifa23.csv")
# Hold out 20% of the rows as a test set.
train_df, test_df = train_test_split(df, test_size=0.2)
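
The split above draws different rows on every run because no seed is set. A minimal sketch of a reproducible variant follows, assuming a fixed `random_state` is acceptable for this project. The helper `load_and_split`, the seed value 42, and the tiny demo file are all hypothetical and stand in for the real `players_fifa23.csv`.

```python
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split


def load_and_split(csv_path, test_size=0.2, seed=42):
    """Read a CSV and return a reproducible (train, test) split."""
    df = pd.read_csv(csv_path)
    # A fixed random_state puts the same rows in each split on every run.
    return train_test_split(df, test_size=test_size, random_state=seed)


# Demonstration on a tiny made-up file, not the real data set.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_players.csv"
    demo_rows = pd.DataFrame(
        {"Age": range(20, 30), "WageEUR": range(1000, 11000, 1000)}
    )
    demo_rows.to_csv(demo_path, index=False)

    demo_train, demo_test = load_and_split(demo_path)
    demo_train_again, _ = load_and_split(demo_path)

print(demo_train.shape, demo_test.shape)  # (8, 2) (2, 2)
print(demo_train.index.equals(demo_train_again.index))  # True
```

Passing a path relative to the repository instead of an absolute one would also let the notebook run on another machine.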

### Describe the Data

In [5]:
train_df.shape

(14831, 90)

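The training set has 14,831 rows and 90 columns, which leaves 89 candidate features besides the prediction target. The next cell lists the column names.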

In [6]:
print(train_df.columns.values)

['ID' 'Name' 'FullName' 'Age' 'Height' 'Weight' 'PhotoUrl' 'Nationality'
 'Overall' 'Potential' 'Growth' 'TotalStats' 'BaseStats' 'Positions'
 'BestPosition' 'Club' 'ValueEUR' 'WageEUR' 'ReleaseClause' 'ClubPosition'
 'ContractUntil' 'ClubNumber' 'ClubJoined' 'OnLoad' 'NationalTeam'
 'NationalPosition' 'NationalNumber' 'PreferredFoot' 'IntReputation'
 'WeakFoot' 'SkillMoves' 'AttackingWorkRate' 'DefensiveWorkRate'
 'PaceTotal' 'ShootingTotal' 'PassingTotal' 'DribblingTotal'
 'DefendingTotal' 'PhysicalityTotal' 'Crossing' 'Finishing'
 'HeadingAccuracy' 'ShortPassing' 'Volleys' 'Dribbling' 'Curve'
 'FKAccuracy' 'LongPassing' 'BallControl' 'Acceleration' 'SprintSpeed'
 'Agility' 'Reactions' 'Balance' 'ShotPower' 'Jumping' 'Stamina'
 'Strength' 'LongShots' 'Aggression' 'Interceptions' 'Positioning'
 'Vision' 'Penalties' 'Composure' 'Marking' 'StandingTackle'
 'SlidingTackle' 'GKDiving' 'GKHandling' 'GKKicking' 'GKPositioning'
 'GKReflexes' 'STRating' 'LWRating' 'LFRating' 'CFRating' 'RFRat
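
With this many columns, a flat listing is hard to scan. One option is to group the names by data type with `select_dtypes`. The sketch below runs on a small made-up frame; `demo_df` and its values are hypothetical, and only the column names are borrowed from the listing above.

```python
import pandas as pd

# Tiny made-up frame that reuses a few column names from the data set.
demo_df = pd.DataFrame(
    {
        "Name": ["A. Player", "B. Player"],
        "Age": [28, 22],
        "WageEUR": [2000, 7000],
        "PreferredFoot": ["Right", "Left"],
    }
)

# Numeric columns can feed most models directly; the rest need encoding
# or should be dropped before training.
numeric_cols = demo_df.select_dtypes(include="number").columns.tolist()
other_cols = demo_df.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("other:", other_cols)
```

The same two calls should work unchanged on `train_df`.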

In [15]:
pd.set_option('display.max_columns', None) # show every column (the default limit is 20)
train_df.head()

Unnamed: 0,ID,Name,FullName,Age,Height,Weight,PhotoUrl,Nationality,Overall,Potential,Growth,TotalStats,BaseStats,Positions,BestPosition,Club,ValueEUR,WageEUR,ReleaseClause,ClubPosition,ContractUntil,ClubNumber,ClubJoined,OnLoad,NationalTeam,NationalPosition,NationalNumber,PreferredFoot,IntReputation,WeakFoot,SkillMoves,AttackingWorkRate,DefensiveWorkRate,PaceTotal,ShootingTotal,PassingTotal,DribblingTotal,DefendingTotal,PhysicalityTotal,Crossing,Finishing,HeadingAccuracy,ShortPassing,Volleys,Dribbling,Curve,FKAccuracy,LongPassing,BallControl,Acceleration,SprintSpeed,Agility,Reactions,Balance,ShotPower,Jumping,Stamina,Strength,LongShots,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Marking,StandingTackle,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,STRating,LWRating,LFRating,CFRating,RFRating,RWRating,CAMRating,LMRating,CMRating,RMRating,LWBRating,CDMRating,RWBRating,LBRating,CBRating,RBRating,GKRating
14098,205315,E. Murray,Euan Murray,28,190,79,https://cdn.sofifa.net/players/205/315/23_60.png,Scotland,61,61,0,1458,311,CB,CB,Hartlepool United,275000,2000,481000,SUB,2024.0,5.0,2022,False,Not in team,,,Right,1,2,2,Medium,Medium,53,35,46,46,59,72,38,33,61,53,27,42,28,29,48,47,52,54,44,50,73,49,74,69,76,24,66,58,31,48,56,53,59,60,60,12,8,11,10,8,47,43,44,44,44,43,46,48,51,48,56,58,56,57,61,57,16
16862,267472,O. Smyth,Oisin Smyth,22,181,72,https://cdn.sofifa.net/players/267/472/23_60.png,Northern Ireland,57,66,9,1565,337,"CAM,CM",CAM,Oxford United,375000,2000,731000,SUB,2025.0,25.0,2022,False,Not in team,,,Right,1,3,3,Medium,Medium,66,46,57,60,47,61,49,46,42,61,43,59,42,49,60,57,63,68,72,52,73,46,55,60,65,48,52,45,46,61,50,56,48,50,42,13,13,14,13,8,54,56,55,55,55,56,59,59,58,59,55,56,55,54,53,54,18
10077,266687,Tiago Ribeiro,Tiago Miguel Hora Ribeiro,20,185,73,https://cdn.sofifa.net/players/266/687/23_60.png,Portugal,65,81,16,1826,380,CDM,CM,Valencia CF,1800000,7000,0,RES,2023.0,41.0,2020,True,Not in team,,,Left,1,2,2,Medium,Medium,63,63,69,61,62,62,68,53,56,69,54,59,76,77,67,65,64,62,63,59,58,77,56,56,65,74,63,63,53,66,64,63,61,65,63,9,7,15,6,13,63,62,62,62,62,62,65,64,66,64,65,66,65,65,65,65,17
11798,231145,J. Pritchard,Joe Pritchard,25,174,66,https://cdn.sofifa.net/players/231/145/23_60.png,England,64,65,1,1775,379,"RM,RWB,LWB",RM,Accrington Stanley,700000,2000,1300000,SUB,2023.0,10.0,2019,False,Not in team,,,Right,1,3,3,High,High,73,61,62,66,57,60,62,60,52,63,53,65,62,60,57,63,75,72,78,59,81,63,54,64,58,66,59,57,57,64,54,64,58,59,56,8,6,9,10,11,63,64,63,63,63,64,65,65,63,65,64,62,64,63,60,63,16
13859,266663,J. Romero,Julián Romero,18,178,73,https://cdn.sofifa.net/players/266/663/23_60.png,Argentina,62,74,12,1653,350,ST,CAM,Club Atlético Independiente,925000,1000,1600000,RES,2022.0,44.0,2021,False,Not in team,,,Right,1,3,3,Medium,Medium,76,61,56,66,34,57,49,61,52,59,59,63,57,50,53,62,82,71,89,63,81,64,49,54,56,58,65,33,65,61,49,65,30,34,32,12,9,7,13,11,64,63,63,63,63,63,65,64,59,64,52,50,52,50,46,50,19
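
`pd.set_option` changes the display limit for the rest of the session. If the full-width view is only needed for one cell, `pd.option_context` restores the previous setting automatically. A minimal sketch on a made-up 30-column frame (`wide_df` is hypothetical):

```python
import pandas as pd

# One-row frame with 30 columns, wide enough to be truncated by default.
wide_df = pd.DataFrame([range(30)], columns=[f"col_{i}" for i in range(30)])

default_limit = pd.get_option("display.max_columns")

# Inside the block the column limit is lifted; it is restored on exit.
with pd.option_context("display.max_columns", None):
    inside_limit = pd.get_option("display.max_columns")
    print(wide_df.head())

restored_limit = pd.get_option("display.max_columns")
print(restored_limit == default_limit)  # True
```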


In [16]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14831 entries, 14098 to 14864
Data columns (total 90 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   ID                 14831 non-null  int64  
 1   Name               14831 non-null  object 
 2   FullName           14831 non-null  object 
 3   Age                14831 non-null  int64  
 4   Height             14831 non-null  int64  
 5   Weight             14831 non-null  int64  
 6   PhotoUrl           14831 non-null  object 
 7   Nationality        14831 non-null  object 
 8   Overall            14831 non-null  int64  
 9   Potential          14831 non-null  int64  
 10  Growth             14831 non-null  int64  
 11  TotalStats         14831 non-null  int64  
 12  BaseStats          14831 non-null  int64  
 13  Positions          14831 non-null  object 
 14  BestPosition       14831 non-null  object 
 15  Club               14831 non-null  object 
 16  ValueEUR          