# Predicting the Best Soccer Players

#### Joshua Chen

## Table of Contents

## Introduction

Ever since I was little, I've followed sports: collecting cards, reading up on the news, and watching many games, as most kids do. One thing that is constantly discussed in sports is the data, even more so in recent years as technology and models have advanced. The prime example is the famous Moneyball A's. These days, nearly every sports article mentions statistics and data. Take the 2017 Super Bowl between the Patriots and the Falcons, where the Patriots came back from a 0.02% chance of winning (according to ESPN's win-probability graph). The models, the statistics, and the unlikelihood of the comeback were talked about for months, especially in the highly data-driven NFL, where every play can be broken down and analyzed thoroughly. The same can be said for basketball, baseball, and tennis. The one sport where this has not held is soccer.

Soccer has been the "problem child" of sports data science, as the game was long considered too complicated and too fluid to be analyzed. Many managers and coaches relied on instinct and feel for the game, and many still do. But slowly, this has been changing. I'm a huge Liverpool fan, and earlier this year I read an article about how Liverpool has gone from mediocre over the past few years to completely dominant with the help of its analytics department (https://www.nytimes.com/2019/05/22/magazine/soccer-data-liverpool.html). Many of Liverpool's world-class bargain signings came from analysis of data and statistics that most people never see.

This is the inspiration and the idea behind this tutorial. Can a model be created to predict which players will become world-class? In this tutorial, I'll take data from FIFA's assessment of players over the past 5 years and build a model to try to predict players' current level of play. I'll compare my model with FIFA's most recent assessment as well as the players' current in-game form. I hope this tutorial shows fans that data can be used to help assess players, and perhaps gets more data-driven people who aren't soccer fans to look into cracking one of the hardest sports to analyze through data.

### Set-up

To start, we will use several libraries to help us retrieve, visualize, and analyze the data. To name a few, Pandas and NumPy will help process the data, Matplotlib and Seaborn will be used to visualize it, and scikit-learn will be used to create and test our model.

In [9]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

## Getting the Data:

The first step is to retrieve our data and process it into a form that our model can use. As stated before, this tutorial uses the FIFA rating data, which can be found <a href="https://www.kaggle.com/stefanoleone992/fifa-20-complete-player-dataset">here</a>.

I decided to use this data because it is a very comprehensive list of players and one of the most easily obtainable. It provides comparable metrics across the last few years, as FIFA hasn't changed the metrics it collects on players. Another reason is that soccer data isn't readily available. Most of it is either collected by individuals who want to experiment with it and share it as graphs on Twitter, or is expensive and aimed at professionals (<a href="https://www.optasports.com/">Opta</a> being one of the only distributors). Thus, I settled for the best I could get, which is this FIFA data.

### Data Wrangling

The following data files are stored in the GitHub repository, which can be found here.

In [147]:
fifa15 = pd.read_csv("players_15.csv")
fifa16 = pd.read_csv("players_16.csv")
fifa17 = pd.read_csv("players_17.csv")
fifa18 = pd.read_csv("players_18.csv")
fifa19 = pd.read_csv("players_19.csv")

fifa15.head()

Unnamed: 0,sofifa_id,player_url,short_name,long_name,age,dob,height_cm,weight_kg,nationality,club,...,lwb,ldm,cdm,rdm,rwb,lb,lcb,cb,rcb,rb
0,158023,https://sofifa.com/player/158023/lionel-messi/...,L. Messi,Lionel Andrés Messi Cuccittini,27,1987-06-24,169,67,Argentina,FC Barcelona,...,62+3,62+3,62+3,62+3,62+3,54+3,45+3,45+3,45+3,54+3
1,20801,https://sofifa.com/player/20801/c-ronaldo-dos-...,Cristiano Ronaldo,Cristiano Ronaldo dos Santos Aveiro,29,1985-02-05,185,80,Portugal,Real Madrid,...,63+3,63+3,63+3,63+3,63+3,57+3,52+3,52+3,52+3,57+3
2,9014,https://sofifa.com/player/9014/arjen-robben/15...,A. Robben,Arjen Robben,30,1984-01-23,180,80,Netherlands,FC Bayern München,...,64+3,64+3,64+3,64+3,64+3,55+3,46+3,46+3,46+3,55+3
3,41236,https://sofifa.com/player/41236/zlatan-ibrahim...,Z. Ibrahimović,Zlatan Ibrahimović,32,1981-10-03,195,95,Sweden,Paris Saint-Germain,...,61+3,65+3,65+3,65+3,61+3,56+3,55+3,55+3,55+3,56+3
4,167495,https://sofifa.com/player/167495/manuel-neuer/...,M. Neuer,Manuel Neuer,28,1986-03-27,193,92,Germany,FC Bayern München,...,,,,,,,,,,


Above is a sample of the data from 2015. The files for every season share the same layout, so there is not much to worry about there. There is a lot of information, most of which we don't need. The data that matters is each player's name, age, club, position, overall rating, and rating for each skill.
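The claim that the season files share a layout is easy to check before filtering. The following is a minimal sketch and not part of the original notebook: `common_columns` is a hypothetical helper, and the two toy frames (with made-up values) stand in for the real season files.

```python
import pandas as pd

def common_columns(frames):
    """Return the column names present in every DataFrame, in the first frame's order."""
    shared = set(frames[0].columns)
    for df in frames[1:]:
        shared &= set(df.columns)
    return [col for col in frames[0].columns if col in shared]

# Toy stand-ins for two seasons (hypothetical values, not real data).
season_a = pd.DataFrame({"short_name": ["Player A"], "age": [27], "overall": [93]})
season_b = pd.DataFrame({"short_name": ["Player A"], "age": [28], "overall": [94], "extra": [1]})

shared = common_columns([season_a, season_b])
print(shared)
```

With the real data, passing the five season frames to the same helper would list the columns that are safe to keep for every year.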

The following code keeps only those columns and then cleans up two things. First, when a player has several positions listed, only the first one is kept. Second, it folds in-season rating changes into a single number. For example, if a player started with an 80 in passing but improved over the course of that season, FIFA would update the rating by adding +1, so the data records the passing rating as 80+1. That isn't convenient for our purposes, so the code turns 80+1 into simply 81.

In [144]:
import re

# Columns kept for every season: identity, position, overall rating, and per-skill ratings.
COLUMNS = [
    "short_name", "age", "club", "overall", "player_positions",
    "attacking_crossing", "attacking_finishing", "attacking_heading_accuracy",
    "attacking_short_passing", "attacking_volleys",
    "skill_dribbling", "skill_curve", "skill_fk_accuracy", "skill_long_passing",
    "skill_ball_control",
    "movement_acceleration", "movement_sprint_speed", "movement_agility",
    "movement_reactions", "movement_balance",
    "power_shot_power", "power_jumping", "power_stamina", "power_strength",
    "power_long_shots",
    "mentality_aggression", "mentality_interceptions", "mentality_positioning",
    "mentality_vision", "mentality_penalties", "mentality_composure",
    "defending_marking", "defending_standing_tackle", "defending_sliding_tackle",
    "goalkeeping_diving", "goalkeeping_handling", "goalkeeping_kicking",
    "goalkeeping_positioning", "goalkeeping_reflexes",
]

def clean(df):
    """Keep COLUMNS, reduce positions to the first one listed, and resolve "80+1"-style ratings."""
    df = df.filter(COLUMNS)
    for i, row in df.iterrows():
        positions = row["player_positions"]
        if isinstance(positions, str):
            # "CF, ST" -> "CF"
            df.at[i, "player_positions"] = positions.split(",")[0]
        # Every column after player_positions holds a rating (including the last one).
        for col in df.columns[5:]:
            value = row[col]
            if isinstance(value, str) and ("+" in value or "-" in value):
                # Sum the signed parts instead of calling eval(): "80+1" -> "81".
                df.at[i, col] = str(sum(int(p) for p in re.findall(r"[+-]?\d+", value)))
    return df

fifa15, fifa16, fifa17, fifa18, fifa19 = (
    clean(df) for df in (fifa15, fifa16, fifa17, fifa18, fifa19)
)

fifa15.head()


Unnamed: 0,short_name,age,club,overall,player_positions,attacking_crossing,attacking_finishing,attacking_heading_accuracy,attacking_short_passing,attacking_volleys,...,mentality_penalties,mentality_composure,defending_marking,defending_standing_tackle,defending_sliding_tackle,goalkeeping_diving,goalkeeping_handling,goalkeeping_kicking,goalkeeping_positioning,goalkeeping_reflexes
0,L. Messi,27,FC Barcelona,93,CF,84,91,71,89,80,...,76,,25,21,20,6,11,15,14,8
1,Cristiano Ronaldo,29,Real Madrid,92,LW,83,98,86,82,89,...,85,,22,31,23,7,11,15,14,11
2,A. Robben,30,FC Bayern München,90,RM,80,87,50,88,88,...,81,,29,28,27,10,8,11,5,15
3,Z. Ibrahimović,32,Paris Saint-Germain,90,ST,76,91,76,82,95,...,91,,25,33,27,13,15,10,9,12
4,M. Neuer,28,FC Bayern München,90,GK,25,25,25,42,25,...,37,,25,25,25,87,88,92,96,86
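The row-by-row loop above can be slow on tables with thousands of players. As a hedged alternative, here is a minimal sketch that applies the same "80+1 becomes 81" rule one column at a time with `Series.map`. The helper name `resolve_rating` and the toy frame are assumptions made for illustration, and unlike the loop this version stores the results as numbers rather than strings.

```python
import re
import pandas as pd

def resolve_rating(value):
    """Turn '80+1' into 81 and '80-2' into 78; leave non-string values unchanged."""
    if isinstance(value, str):
        parts = re.findall(r"[+-]?\d+", value)
        # A string with no digits at all is treated as missing.
        return sum(int(p) for p in parts) if parts else float("nan")
    return value

# Toy frame with made-up ratings in the same style as the FIFA files.
toy = pd.DataFrame({
    "short_name": ["Player A", "Player B"],
    "attacking_crossing": ["80+1", "75"],
    "skill_curve": ["70-2", 66],
})

rating_cols = ["attacking_crossing", "skill_curve"]
for col in rating_cols:
    toy[col] = toy[col].map(resolve_rating)

print(toy)
```

Parsing the signed parts with a regular expression also avoids calling `eval` on text read from a file.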


In [146]:
fifa20 = pd.read_csv("players_20.csv")

import re

# Keep the same columns as the earlier seasons (fifa15 was filtered above).
fifa20 = fifa20.filter(list(fifa15.columns))

for i, row in fifa20.iterrows():
    positions = row["player_positions"]
    if isinstance(positions, str):
        # Keep only the first listed position: "RW, CF, ST" -> "RW"
        fifa20.at[i, "player_positions"] = positions.split(",")[0]
    # Every column after player_positions holds a rating (including the last one).
    for col in fifa20.columns[5:]:
        value = row[col]
        if isinstance(value, str) and ("+" in value or "-" in value):
            # Sum the signed parts instead of calling eval(): "80+1" -> "81".
            fifa20.at[i, col] = str(sum(int(p) for p in re.findall(r"[+-]?\d+", value)))

fifa20.head()

Unnamed: 0,short_name,age,club,overall,player_positions,attacking_crossing,attacking_finishing,attacking_heading_accuracy,attacking_short_passing,attacking_volleys,...,mentality_penalties,mentality_composure,defending_marking,defending_standing_tackle,defending_sliding_tackle,goalkeeping_diving,goalkeeping_handling,goalkeeping_kicking,goalkeeping_positioning,goalkeeping_reflexes
0,L. Messi,32,FC Barcelona,94,RW,88,95,70,92,88,...,75,96,33,37,26,6,11,15,14,8
1,Cristiano Ronaldo,34,Juventus,93,ST,84,94,89,83,87,...,85,95,28,32,24,7,11,15,14,11
2,Neymar Jr,27,Paris Saint-Germain,92,LW,87,87,62,87,87,...,90,94,27,26,29,9,9,15,15,11
3,J. Oblak,26,Atlético Madrid,91,GK,13,11,15,43,13,...,11,68,27,12,18,87,92,78,90,89
4,E. Hazard,28,Real Madrid,91,LW,81,84,61,89,83,...,88,91,34,27,22,11,12,6,8,8


### Data Processing and Tidying

#### Splitting By Roles

#### Standardizing Data

## Exploratory Data Analysis

### Attackers Analysis

#### Best Attackers

#### Best Attackers Under 21

### Midfielders Analysis

#### Best Midfielders

#### Best Midfielders under 21

### Defenders Analysis

#### Best Defenders

#### Best Defenders under 21

### Goalkeepers Analysis

#### Best Goalkeepers

#### Best Goalkeepers under 21

## Building the Data Model