In this notebook I explore the best players of the 2021/2022 season, using percentiles to measure where each player ranks among their peers in several categories. I use last season's data because this season's stats are still changing, and the main purpose here is to produce a reproducible notebook that demonstrates how percentiles can be used to rank players.

The main categories we will look into are:
* passing 
* defending
* dribbling 
* shooting

# Imports

In [1]:
import pandas as pd
import os
os.chdir("../../")

In [2]:
from src.fbref.fbref_class import FBref
from src.fbref.analysis.functions import get_player_percentile_df
pd.set_option('display.max_columns', None)

# Fetch player data

In [18]:
# config
fb = FBref()
competition_id = '9'
season_name = '2021-2022'

player_standard_df = fb.get_big5_player_stats('standard', season_name)
player_passing_df = fb.get_big5_player_stats('passing', season_name)
player_defense_df = fb.get_big5_player_stats('defense', season_name)
player_shooting_df = fb.get_big5_player_stats('shooting', season_name)
player_possession_df = fb.get_big5_player_stats('possession', season_name)
player_misc_df = fb.get_big5_player_stats('miscellaneous', season_name)

# Passing

Here we outline the stats that matter most for a passer:
* passes_attempted_per_90
* pass_completion_perc
* expected_assists_per_90
* key_passes_per_90
* progressive_passes_per_90


In [62]:
passing_stat_list = [
    'passes_attempted_per_90',
    'pass_completion_perc',
    'expected_assists_per_90',
    'key_passes_per_90',
    'progressive_passes_per_90',
]

player_passing_df_filtered = player_passing_df.query("no_of_nineties > 5")
passing_percentile_df = get_player_percentile_df(passing_stat_list, player_passing_df_filtered)

In [63]:
passing_percentile_df.sort_values(by='stat_score', ascending=False).query("no_of_nineties > 5").head(10)

Unnamed: 0,player_name,position,age,no_of_nineties,passes_attempted_per_90_percentile,pass_completion_perc_percentile,expected_assists_per_90_percentile,key_passes_per_90_percentile,progressive_passes_per_90_percentile,stat_score
1432,Toni Kroos,MF,31.0,23.4,99,98,85,95,99,95.2
1375,Joshua Kimmich,MF,26.0,27.5,99,79,99,99,99,95.0
2662,Corentin Tolisso,MF,26.0,7.8,94,88,97,94,99,94.4
1172,Ander Herrera,MF,31.0,12.1,99,98,92,84,99,94.4
1774,Lionel Messi,"FW,MF",34.0,23.9,93,81,99,98,99,94.0
1094,Bruno Guimarães,MF,23.0,18.2,96,87,95,93,99,94.0
1816,Luka Modrić,MF,35.0,22.6,96,91,94,87,99,93.4
1261,Reece James,DF,21.0,20.7,98,86,94,96,91,93.0
2774,Marco Verratti,MF,28.0,21.5,99,97,88,82,99,93.0
2907,Oleksandr Zinchenko,DF,24.0,11.6,99,93,93,80,99,92.8


It is no surprise to see Toni Kroos at the top: he is known as one of the best passers in the game, so it is good to see the eye test backed up by the numbers. Because the passing rating includes passes attempted and pass completion, it could favour players who play it safe and those at bigger teams, who see more of the ball. However, the other stats, such as expected assists and key passes, measure the quality of passes, which counterbalances that effect. Players who attempt high-risk passes will also be disadvantaged, but if those passes are not leading to high expected assists or progressive passes, then maybe they are not worth the risk.
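
The `stat_score` column in the table above is consistent with a plain average of the five percentile columns (for Kroos, (99 + 98 + 85 + 95 + 99) / 5 = 95.2). The implementation of `get_player_percentile_df` is not shown in this notebook, so the following is only a minimal sketch of how such a score could be computed with pandas. The function name `percentile_scores`, the toy data, and the cap at 99 (the tables never show 100) are assumptions made for illustration.

```python
import pandas as pd


def percentile_scores(df: pd.DataFrame, stat_cols: list[str]) -> pd.DataFrame:
    """Hypothetical stand-in for get_player_percentile_df (not the author's code)."""
    out = df.copy()
    pct_cols = []
    for col in stat_cols:
        pct_col = f"{col}_percentile"
        # rank(pct=True) gives each value's rank as a fraction in (0, 1];
        # scale to 0-100 and cap at 99 (assumed convention, see lead-in).
        out[pct_col] = (out[col].rank(pct=True) * 100).round().clip(upper=99)
        pct_cols.append(pct_col)
    # Composite score: unweighted mean of the percentile columns.
    out["stat_score"] = out[pct_cols].mean(axis=1)
    return out


# Toy data, invented purely for illustration.
toy = pd.DataFrame(
    {
        "player_name": ["A", "B", "C", "D"],
        "key_passes_per_90": [0.5, 1.0, 2.0, 3.0],
        "pass_completion_perc": [90.0, 70.0, 80.0, 85.0],
    }
)
scored = percentile_scores(toy, ["key_passes_per_90", "pass_completion_perc"])
print(scored.sort_values("stat_score", ascending=False))
```

Because every stat is weighted equally, a player who is elite in one column but average in the others can rank below a player who is merely good across the board.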

# Defense

Here we define the stats that are important for a defender:
* Tackles won
* Blocks
* Interceptions
* Clearances
* Aerial duel win percentage

NB: these stats are used on the FBref website.

In [64]:
defense_stat_list = [
    'tackles_won_per_90',
    'blocks_per_90',
    'interceptions_per_90',
    'clearances_per_90',
    'aerial_duels_win_perc'
]

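# Blocks and clearances come from the defense table, so join them onto the misc table.
# NOTE: joining on player_name alone can duplicate rows if a name appears more than
# once in either table; add further key columns to the join if they are available.
player_defense_detailed = player_misc_df.merge(
    player_defense_df[['player_name', 'blocks_per_90', 'clearances_per_90']],
    on='player_name'
)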

player_defense_detailed_filtered = player_defense_detailed.query("no_of_nineties > 5")
defense_percentile_df = get_player_percentile_df(defense_stat_list, player_defense_detailed_filtered)

In [65]:
defense_percentile_df.sort_values(by='stat_score', ascending=False).query("no_of_nineties > 5").head(10)

Unnamed: 0,player_name,position,age,no_of_nineties,tackles_won_per_90_percentile,blocks_per_90_percentile,interceptions_per_90_percentile,clearances_per_90_percentile,aerial_duels_win_perc_percentile,stat_score
1563,Boubakar Kouyaté,DF,24.0,29.6,93,97,99,99,87,95.0
2417,Guilherme Ramos,DF,23.0,10.1,88,97,99,98,83,93.0
1887,Konstantinos Mavropanos,DF,23.0,30.1,80,87,97,99,93,91.2
93,Ethan Ampadu,"MF,DF",20.0,25.4,93,96,94,83,85,90.2
1099,Andrei Girotto,"DF,MF",29.0,35.0,88,96,94,92,79,89.8
2755,Danilo Soares,DF,29.0,28.5,99,98,97,73,77,88.8
2854,James Tarkowski,DF,28.0,34.5,80,98,77,98,90,88.6
2521,Cristian Romero,DF,23.0,20.5,97,98,86,88,73,88.4
2587,Mohammed Salisu,DF,22.0,33.0,85,84,97,96,80,88.4
2658,Nico Schlotterbeck,DF,21.0,30.9,84,76,91,97,93,88.2


Here we can see that defenders at the top teams are not coming out on top. Either those defenders are not as good as we think, or the way we rate defenders could be improved; it is definitely the latter. The stats we have prioritised lean towards more aggressive defenders. Defensive stats should also be possession adjusted, because raw counts disadvantage defenders on teams that have a lot of possession and therefore fewer chances to make defensive actions. We will need to alter how we evaluate defensive stats, but the players in the output are at least ones with high defensive numbers who are likely to be aggressive players.
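
One simple way to possession-adjust a defensive stat is to rescale it to what it would be if the opponent had exactly 50% of the ball. The sketch below shows that linear version only; other adjustment schemes exist. The function `possession_adjust`, the column `team_possession_perc`, and the toy numbers are all hypothetical, since team possession is not part of the player tables fetched in this notebook.

```python
import pandas as pd


def possession_adjust(actions_per_90: pd.Series, team_possession_perc: pd.Series) -> pd.Series:
    """Scale defensive actions to an opponent possession share of 50%."""
    # A team can only defend while the opponent has the ball.
    opponent_possession = 100.0 - team_possession_perc
    return actions_per_90 * 50.0 / opponent_possession


# Toy data, invented purely for illustration.
toy = pd.DataFrame(
    {
        "player_name": ["High-possession CB", "Low-possession CB"],
        "interceptions_per_90": [1.0, 1.5],
        "team_possession_perc": [65.0, 40.0],  # hypothetical column
    }
)
toy["interceptions_per_90_padj"] = possession_adjust(
    toy["interceptions_per_90"], toy["team_possession_perc"]
)
print(toy)
```

In this toy example the defender on the high-possession team has fewer raw interceptions but moves ahead once the adjustment is applied, which is the effect the paragraph above is asking for.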

# Dribbling

Here are the stats to be used to measure dribbling:
* dribble_success_perc 
* attempted_dribbles_per_90

In [66]:
dribbling_stat_list = [
    'dribble_success_perc',
    'attempted_dribbles_per_90',
]

player_possession_df_filtered = player_possession_df.query("no_of_nineties > 5")
dribbling_percentile_df = get_player_percentile_df(dribbling_stat_list, player_possession_df_filtered)

dribbling_percentile_df.sort_values(by='stat_score', ascending=False).query("no_of_nineties > 5").head(10)

Unnamed: 0,player_name,position,age,no_of_nineties,dribble_success_perc_percentile,attempted_dribbles_per_90_percentile,stat_score
2690,Adama Traoré,"FW,MF",25.0,11.9,90,99,94.5
1143,Eden Hazard,"FW,MF",30.0,8.0,89,93,91.0
1033,Nicolás González,MF,19.0,12.4,88,90,89.0
744,Jeremy Doku,"FW,MF",19.0,5.2,77,99,88.0
1288,Curtis Jones,MF,20.0,9.5,86,90,88.0
1379,Sofian Kiyine,"MF,FW",23.0,13.8,77,98,87.5
1411,Kouadio Koné,MF,20.0,24.3,81,93,87.0
2757,Jesus Vazquez,DF,18.0,9.0,82,92,87.0
2342,Roli Pereira de Sa,MF,24.0,6.4,93,80,86.5
2147,Paul Pogba,"MF,FW",28.0,15.0,81,91,86.0


The types of players here are what you'd expect, especially with Adama Traoré right at the top. It is surprising to see Hazard up there, given that his form has been up and down since moving to Madrid, but he has always been a good dribbler.

# Shooting

Here are the stats to be used for measuring a player's shooting ability:
* shots_per_90
* shots_on_target_per_90
* non_penalty_xg_per_shot
* non_penalty_xg_per_90

In [67]:
shooting_stat_list = [
    'shots_per_90',
    'shots_on_target_per_90',
    'non_penalty_xg_per_shot',
    'non_penalty_xg_per_90',
]

player_shooting_df_filtered = player_shooting_df.query("no_of_nineties > 5")

shooting_percentile_df = get_player_percentile_df(shooting_stat_list, player_shooting_df_filtered)

shooting_percentile_df.sort_values(by='stat_score', ascending=False).query("no_of_nineties > 5").head(10)

Unnamed: 0,player_name,position,age,no_of_nineties,shots_per_90_percentile,shots_on_target_per_90_percentile,non_penalty_xg_per_shot_percentile,non_penalty_xg_per_90_percentile,stat_score
2424,Patrik Schick,FW,25.0,23.1,99,99,96,99,98.25
1510,Robert Lewandowski,FW,32.0,32.7,99,99,96,99,98.25
1095,Sehrou Guirassy,FW,25.0,11.3,99,98,96,99,98.0
49,Lucas Alario,FW,28.0,7.3,95,99,99,99,98.0
711,Bamba Dieng,FW,21.0,10.8,97,99,97,99,98.0
668,Moussa Dembélé,FW,25.0,24.5,98,98,96,99,97.75
1114,Erling Haaland,FW,21.0,21.2,98,98,94,99,97.25
1320,Tino Kadewere,FW,25.0,5.6,97,96,96,99,97.0
2890,Duván Zapata,FW,30.0,19.1,99,99,90,99,96.75
1719,Kylian Mbappé,FW,22.0,33.6,99,99,90,99,96.75


The stats we have used are a mixture of player style and results: we take into account shot volume (shots and shots on target per 90) as well as shot quality (non-penalty xG per 90 and non-penalty xG per shot). We are currently getting the usual suspects such as Lewandowski, Haaland and Mbappé. Players I am less familiar with include Guirassy and Alario.

# Conclusion

In this notebook we have successfully grabbed player stats and ranked players among their peers to generate rating scores for various aspects of a player's game, e.g. dribbling ability. The method for measuring each area can still be improved, especially for defensive measures, where we overemphasise the act of doing and the numbers need to be possession adjusted. Also note that the stat scores within each category are quite close together, so we could potentially produce a further percentile of the stat score itself.
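
As a rough sketch of that last idea, the composite score can itself be ranked so that tightly bunched values are spread back over the full 0 to 100 range. The player labels and scores below are invented for illustration, and `stat_score_percentile` is a hypothetical column name.

```python
import pandas as pd

# Toy composite scores, invented and deliberately close together.
scores = pd.DataFrame(
    {
        "player_name": ["A", "B", "C", "D", "E"],
        "stat_score": [95.2, 95.0, 94.4, 93.0, 92.8],
    }
)

# Rank the composite score itself and express the rank as a percentile.
scores["stat_score_percentile"] = scores["stat_score"].rank(pct=True) * 100
print(scores)
```

The trade-off is that a percentile of a percentile keeps only the ordering: a gap of 0.2 points and a gap of 1.4 points both become one rank step.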

Next steps include:
* Refine stats used for category scores (especially for defense)
* Create an overall score for players where we have an attacking and defensive rating which can be averaged out
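
A minimal sketch of the overall score, assuming each category has already been reduced to one `stat_score` per player. The two toy frames and their values are invented, and the equal weighting of attack and defense is an assumption rather than a recommendation.

```python
import pandas as pd

# Toy category scores, invented purely for illustration.
attack = pd.DataFrame({"player_name": ["A", "B", "C"], "stat_score": [90.0, 60.0, 40.0]})
defense = pd.DataFrame({"player_name": ["A", "B", "D"], "stat_score": [30.0, 70.0, 95.0]})

# Keep only players rated in both categories; validate raises an error
# if player_name is not unique on either side of the join.
overall = attack.merge(
    defense,
    on="player_name",
    how="inner",
    suffixes=("_attack", "_defense"),
    validate="one_to_one",
)

# Equal-weight average of the two ratings (assumed weighting).
overall["overall_score"] = overall[["stat_score_attack", "stat_score_defense"]].mean(axis=1)
print(overall)
```

An inner join drops players who lack a rating in either category; switching to `how="outer"` would keep them, but their overall score would then rest on a single rating.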