# Fantasy Football - Predicting Player Points

In this notebook, we use the goal predicting model created in the previous notebook as well as other metrics such as individual player form and clean sheet probability to predict player FPL points.

In [1]:
import pandas as pd
import warnings
from functools import reduce
import itertools
import numpy as np
import sklearn.preprocessing as preprocessing
import sklearn.model_selection as model_selection
from sklearn import linear_model
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn import linear_model
from joblib import dump, load

In [2]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
warnings.filterwarnings("ignore")

## 1) Assign Player Points

Importing player FBREF peformance data and add opponent goals and xg.

In [3]:
player_data = pd.read_csv('data/fbref_player_data.csv')

In [4]:
player_data.head()

Unnamed: 0,player,number,nation,position_x,age,mins,total_pases_cmp,total_pases_att,total_pases_cmp%,total_pases_dist,pases_prg_dist,short_pases_cmp,short_pases_att,short_pases_cmp%,medium_pases_cmp,medium_pases_att,medium_pases_cmp%,long_pases_cmp,long_pases_att,long_pases_cmp%,ast_pases,xAG,xA,KP,01-Mar,ppa,crspa,prog,live_pases,dead_pases,fk_pases,tb_pases,switch_pases,cross_pases,throwin_pases,corner_pases,inswing_corner,outswing_corner,straight_corner,cmp_pass,offside_pass,blocked_pass,total_tkl,total_tkl_ballwon,def3rd_tackles,mid3rd_tackles,att3rd_tackles,vsdribbles_tackles_cmp,vsdribbles_tackles_att,vsdribbles_tackles_cmp%,vsdribbles_tackles_past,total_tackles_blocks,sh_tackles_blocks,pass_tackles_blocks,int_tackles,tkl+int_tackles,clrearance_tackles,err_leading_toshot,total_touches,touches_in_own_box,touches_in_own_1/3,touches_in_middle_1/3,touches_in_atk_1/3,touches_in_opp_box,total_liveball_touches,dribbles_completed,dribbles_attempted,dribbles_cmp%,failed_to_control_ball,disposessed_not_including_dribbles,total_received_passes,total_prog_received_passes,yellowcrd,redcrd,2ndyellow,fouls_conceeded,fouls_drawn,offsides,croses,tklw,pk_won,pk_conceeded,OG,recoveries,aerials_won,aerials_lost,aerials_cmp%,shots_against_gk,goals_against_gk,saves,save%,post_shot_xg,long_passes_cmp_gk,long_passes_att_gk,long_passes_cmp%_gk,passes_notgkick_att,passes_notgkick_throws,passes_notgkick_over40yrds,average_pass_length_notgkick,goalkick_att,goalkick_over40yrds,goalkick_avelen,attemped_croses_opp,attemped_croses_opp_stopped,attemped_croses_opp_stopped%,defensive_actions_not_in_box,ave_distance_from_goal_defensive_actions,home,game_week,date,position_y,Gls,Ast,PK,PKatt,Sh,SoT,xG,npxG,SCA,SCAGCA,Team,positon,season,DEF,FWD,GK,MID,pos
0,Alexandre Lacazette,9,fr FRA,"FW,LW,LM",26-075,90,20.0,24.0,83.3,263.0,42.0,14.0,16.0,87.5,5.0,6.0,83.3,0.0,0.0,,0,0.4,0.2,3.0,0.0,2.0,1.0,2.0,19.0,4.0,0.0,0.0,0.0,2,0.0,0.0,0.0,0.0,0.0,20.0,1.0,1.0,2.0,1,0.0,1.0,1.0,1.0,3.0,33.3,2.0,1.0,0.0,1.0,0,2.0,1.0,0.0,35.0,1.0,2.0,9.0,24.0,7.0,35.0,1.0,1.0,100.0,2.0,2.0,25.0,7.0,0,0,0,2,2,2,2,1,0.0,0.0,0,2.0,1.0,2.0,33.3,,,,,,,,,,,,,,,,,,,,,1,1,11/08/2017,"FW,LW",1,0,0,0,3,2,0.3,0.3,7,2,Arsenal,FWD,2017-18,0,31,0,4,FWD
1,Alexandre Lacazette,9,fr FRA,FW,26-083,77,29.0,39.0,74.4,366.0,99.0,20.0,23.0,87.0,7.0,10.0,70.0,1.0,2.0,50.0,0,0.0,0.1,1.0,1.0,1.0,0.0,3.0,37.0,2.0,0.0,1.0,0.0,1,0.0,0.0,0.0,0.0,0.0,29.0,0.0,2.0,0.0,0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0,0.0,0.0,0.0,46.0,0.0,1.0,20.0,26.0,6.0,46.0,0.0,0.0,,4.0,0.0,30.0,11.0,0,0,0,0,0,1,1,0,0.0,0.0,0,4.0,0.0,2.0,0.0,,,,,,,,,,,,,,,,,,,,,0,2,19/08/2017,"FW,RM",0,0,0,0,0,0,0.0,0.0,2,0,Arsenal,FWD,2017-18,0,31,0,4,FWD
2,Alexandre Lacazette,9,fr FRA,LW,26-091,29,10.0,15.0,66.7,160.0,30.0,5.0,6.0,83.3,4.0,6.0,66.7,1.0,3.0,33.3,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,15.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,10.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0,0.0,0.0,0.0,17.0,0.0,2.0,7.0,8.0,2.0,17.0,2.0,2.0,100.0,1.0,1.0,19.0,3.0,0,0,0,0,1,1,0,0,0.0,0.0,0,1.0,0.0,2.0,0.0,,,,,,,,,,,,,,,,,,,,,0,3,27/08/2017,LW,0,0,0,0,1,0,0.1,0.1,1,0,Arsenal,MID,2017-18,0,31,0,4,FWD
3,Alexandre Lacazette,9,fr FRA,FW,26-104,74,16.0,17.0,94.1,215.0,40.0,10.0,11.0,90.9,6.0,6.0,100.0,0.0,0.0,,0,0.0,0.1,1.0,1.0,1.0,0.0,1.0,16.0,1.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,16.0,0.0,0.0,1.0,1,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1,2.0,1.0,0.0,25.0,0.0,4.0,7.0,14.0,3.0,25.0,1.0,1.0,100.0,1.0,2.0,16.0,2.0,0,0,0,0,0,1,0,1,0.0,0.0,0,3.0,2.0,3.0,40.0,,,,,,,,,,,,,,,,,,,,,1,4,09/09/2017,FW,1,0,0,0,2,2,0.4,0.4,3,0,Arsenal,FWD,2017-18,0,31,0,4,FWD
4,Alexandre Lacazette,9,fr FRA,FW,26-112,65,8.0,15.0,53.3,69.0,10.0,6.0,12.0,50.0,0.0,1.0,0.0,0.0,0.0,,0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,15.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,8.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,2.0,0.0,2.0,1.0,0.0,1.0,0,0.0,0.0,0.0,22.0,0.0,0.0,7.0,15.0,3.0,22.0,0.0,1.0,0.0,2.0,1.0,17.0,3.0,0,0,0,2,0,0,0,0,0.0,0.0,0,1.0,2.0,1.0,66.7,,,,,,,,,,,,,,,,,,,,,0,5,17/09/2017,FW,0,0,0,0,2,1,0.7,0.7,1,0,Arsenal,FWD,2017-18,0,31,0,4,FWD


FPL attributes an assist to a player for winning a penalty (if the player does not take the penalty kick and the penalty is scored). Conversely, fbref, does not consider winning a penalty as an assist. Consequently, it becomes necessary for us to align the assist values with the FPL criteria. Since we are unable to determine whether the penalty kick won was subsequently scored or missed (given the possibility of multiple penalties won in a single game), a penalty won will be accounted for as an assist regardless if it was converted or not

In [5]:
player_data.loc[(player_data.pk_won > 0) & (player_data.pk_won != player_data.PKatt), 'Ast'] += player_data.pk_won
player_data.loc[(player_data.pk_won > 0) & (player_data.pk_won != player_data.PKatt), 'xAG'] += player_data.pk_won

Due to minor discrepancies between player data and our current data sets, it is necessary to make adjustments to certain team names.

In [6]:
player_data.Team.unique()

array(['Arsenal', 'Everton', 'Liverpool', 'Chelsea', 'Leicester City',
       'Watford', 'Crystal Palace', 'Huddersfield Town',
       'West Bromwich Albion', 'Bournemouth', 'Burnley', 'Stoke City',
       'Southampton', 'Swansea City', 'Brighton', 'Manchester City',
       'Newcastle United', 'Tottenham', 'Manchester United', 'West Ham',
       'Fulham', 'Cardiff City', 'Wolverhampton Wanderers',
       'Norwich City', 'Sheffield United', 'Aston Villa', 'Brentford',
       'Leeds United'], dtype=object)

In [7]:
player_data = (player_data
               .drop_duplicates(subset=['player', 'date'])
               .assign(date=pd.to_datetime(player_data['date'], format='%d/%m/%Y'),
                       Team=player_data['Team']
                           .str.replace('Huddersfield Town', 'Huddersfield')
                           .str.replace('Manchester United', 'Manchester Utd')
                           .str.replace('West Bromwich Albion', 'West Brom')
                           .str.replace('Sheffield United', 'Sheffield Utd')
                           .str.replace('Newcastle United', 'Newcastle Utd')
                           .str.replace('Wolverhampton Wanderers', 'Wolves')))

In [8]:
player_data.Team.unique()

array(['Arsenal', 'Everton', 'Liverpool', 'Chelsea', 'Leicester City',
       'Watford', 'Crystal Palace', 'Huddersfield', 'West Brom',
       'Bournemouth', 'Burnley', 'Stoke City', 'Southampton',
       'Swansea City', 'Brighton', 'Manchester City', 'Newcastle Utd',
       'Tottenham', 'Manchester Utd', 'West Ham', 'Fulham',
       'Cardiff City', 'Wolves', 'Norwich City', 'Sheffield Utd',
       'Aston Villa', 'Brentford', 'Leeds United'], dtype=object)

Adding opponent goals to player data.

In [9]:
team_data = pd.read_csv('data/wrangled_data_final.csv').astype({'date': 'datetime64[ns]'})

In [10]:
team_cols = ['date', 'game_id', 'opponent_goals', 'opponent_xg', 'team']

In [11]:
player_data = (pd.merge(player_data, team_data[team_cols], right_on=['date', 'team'], left_on=['date', 'Team'])
               .drop_duplicates(subset=['player', 'date'])
               .astype({'opponent_goals': 'int', 'date': 'datetime64[ns]'}))

In [12]:
player_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19537 entries, 0 to 19536
Columns: 133 entries, player to team
dtypes: datetime64[ns](1), float64(91), int32(1), int64(30), object(10)
memory usage: 19.7+ MB


In [13]:
player_data.head()

Unnamed: 0,player,number,nation,position_x,age,mins,total_pases_cmp,total_pases_att,total_pases_cmp%,total_pases_dist,pases_prg_dist,short_pases_cmp,short_pases_att,short_pases_cmp%,medium_pases_cmp,medium_pases_att,medium_pases_cmp%,long_pases_cmp,long_pases_att,long_pases_cmp%,ast_pases,xAG,xA,KP,01-Mar,ppa,crspa,prog,live_pases,dead_pases,fk_pases,tb_pases,switch_pases,cross_pases,throwin_pases,corner_pases,inswing_corner,outswing_corner,straight_corner,cmp_pass,offside_pass,blocked_pass,total_tkl,total_tkl_ballwon,def3rd_tackles,mid3rd_tackles,att3rd_tackles,vsdribbles_tackles_cmp,vsdribbles_tackles_att,vsdribbles_tackles_cmp%,vsdribbles_tackles_past,total_tackles_blocks,sh_tackles_blocks,pass_tackles_blocks,int_tackles,tkl+int_tackles,clrearance_tackles,err_leading_toshot,total_touches,touches_in_own_box,touches_in_own_1/3,touches_in_middle_1/3,touches_in_atk_1/3,touches_in_opp_box,total_liveball_touches,dribbles_completed,dribbles_attempted,dribbles_cmp%,failed_to_control_ball,disposessed_not_including_dribbles,total_received_passes,total_prog_received_passes,yellowcrd,redcrd,2ndyellow,fouls_conceeded,fouls_drawn,offsides,croses,tklw,pk_won,pk_conceeded,OG,recoveries,aerials_won,aerials_lost,aerials_cmp%,shots_against_gk,goals_against_gk,saves,save%,post_shot_xg,long_passes_cmp_gk,long_passes_att_gk,long_passes_cmp%_gk,passes_notgkick_att,passes_notgkick_throws,passes_notgkick_over40yrds,average_pass_length_notgkick,goalkick_att,goalkick_over40yrds,goalkick_avelen,attemped_croses_opp,attemped_croses_opp_stopped,attemped_croses_opp_stopped%,defensive_actions_not_in_box,ave_distance_from_goal_defensive_actions,home,game_week,date,position_y,Gls,Ast,PK,PKatt,Sh,SoT,xG,npxG,SCA,SCAGCA,Team,positon,season,DEF,FWD,GK,MID,pos,game_id,opponent_goals,opponent_xg,team
0,Alexandre Lacazette,9,fr FRA,FW,26-120,82,15.0,21.0,71.4,180.0,14.0,9.0,13.0,69.2,5.0,6.0,83.3,0.0,0.0,,0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,20.0,1.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,15.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0,0.0,2.0,0.0,33.0,2.0,2.0,8.0,23.0,7.0,32.0,5.0,5.0,100.0,3.0,0.0,23.0,7.0,0,0,0,0,2,0,0,0,0.0,0.0,0,2.0,2.0,1.0,66.7,,,,,,,,,,,,,,,,,,,,,1,6,2017-09-25,FW,2,0,1,1,3,1,1.8,1.0,3,0,Arsenal,FWD,2017-18,0,31,0,4,FWD,60,0,0.9,Arsenal
1,Mesut Özil,11,de GER,AM,28-345,8,11.0,11.0,100.0,147.0,40.0,7.0,7.0,100.0,4.0,4.0,100.0,0.0,0.0,,0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,11.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,11.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,1,1.0,0.0,0.0,13.0,1.0,1.0,6.0,6.0,0.0,13.0,0.0,0.0,,0.0,0.0,11.0,2.0,0,0,0,0,0,0,0,0,0.0,0.0,0,0.0,0.0,1.0,0.0,,,,,,,,,,,,,,,,,,,,,1,6,2017-09-25,LW,0,0,0,0,0,0,0.0,0.0,1,0,Arsenal,MID,2017-18,0,0,0,29,MID,60,0,0.9,Arsenal
2,Granit Xhaka,29,ch SUI,CM,24-363,90,65.0,76.0,85.5,1112.0,242.0,23.0,27.0,85.2,33.0,36.0,91.7,6.0,10.0,60.0,0,0.0,0.1,1.0,3.0,0.0,0.0,7.0,67.0,9.0,4.0,0.0,1.0,7,0.0,5.0,1.0,1.0,1.0,65.0,0.0,0.0,1.0,1,1.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0,1.0,1.0,0.0,85.0,2.0,8.0,47.0,31.0,2.0,85.0,0.0,1.0,0.0,2.0,1.0,62.0,1.0,0,0,0,2,1,0,7,1,0.0,0.0,0,7.0,1.0,2.0,33.3,,,,,,,,,,,,,,,,,,,,,1,6,2017-09-25,DM,0,0,0,0,3,0,0.2,0.2,1,0,Arsenal,MID,2017-18,0,0,0,42,MID,60,0,0.9,Arsenal
3,Mohamed Elneny,35,eg EGY,CM,25-076,90,72.0,78.0,92.3,1229.0,276.0,38.0,40.0,95.0,27.0,29.0,93.1,7.0,7.0,100.0,0,0.1,0.3,2.0,8.0,3.0,0.0,7.0,77.0,1.0,1.0,0.0,1.0,1,0.0,0.0,0.0,0.0,0.0,72.0,0.0,1.0,0.0,0,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0,0.0,0.0,1.0,82.0,0.0,3.0,49.0,31.0,3.0,82.0,0.0,0.0,,0.0,2.0,73.0,3.0,0,0,0,0,3,0,1,0,0.0,0.0,0,8.0,1.0,1.0,50.0,,,,,,,,,,,,,,,,,,,,,1,6,2017-09-25,DM,0,0,0,0,1,0,0.0,0.0,6,0,Arsenal,MID,2017-18,0,0,0,14,MID,60,0,0.9,Arsenal
4,Aaron Ramsey,8,wls WAL,AM,26-273,89,47.0,62.0,75.8,618.0,63.0,28.0,35.0,80.0,14.0,20.0,70.0,1.0,2.0,50.0,0,1.0,0.0,0.0,2.0,0.0,0.0,0.0,61.0,1.0,0.0,1.0,1.0,1,1.0,0.0,0.0,0.0,0.0,47.0,0.0,1.0,1.0,0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0,1.0,0.0,0.0,67.0,0.0,3.0,27.0,38.0,4.0,67.0,1.0,1.0,100.0,3.0,0.0,61.0,13.0,0,0,0,1,3,0,1,0,1.0,0.0,0,5.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,1,6,2017-09-25,RW,0,1,0,0,2,1,0.2,0.2,1,1,Arsenal,MID,2017-18,0,0,0,27,MID,60,0,0.9,Arsenal


Assigning points based on FPL criteria. We create both an 'actual points' columns based on actual peformance data and an 'xg_points' column based on XG peformance data.

#### Goalkeeper:

    - Playing up to 60 minutes: +1
    - Playing 60 minutes or more: +2
    - For each goal scored: +6
    - For each assist for a goal: +3
    - For a clean sheet: +4
    - For every 3 shots saved: +1
    - For each penalty save: +5
    - For each penalty miss: -2
    - Bonus points for the best players in a match*: +1-3
    - For every 2 goals conceded: -1
    - For each yellow card: -1
    - For each red card: -3
    - For each own goal: -2
    
#### Defender:

    - Playing up to 60 minutes: +1
    - Playing 60 minutes or more: +2
    - For each goal scored: +6
    - For each assist for a goal: +3
    - For a clean sheet: +4
    - Bonus points for the best players in a match*: +1-3
    - For every 2 goals conceded: -1
    - For each yellow card: -1
    - For each red card: -3
    - For each own goal: -2
    
#### Midfiilders:

    - Playing up to 60 minutes: +1
    - Playing 60 minutes or more: +2
    - For each goal scored: +5
    - For each assist for a goal: +3
    - For a clean sheet: +1
    - Bonus points for the best players in a match*: +1-3
    - For each yellow card: -1
    - For each red card: -3
    - For each own goal: -2
    
#### Forwards:

    - Playing up to 60 minutes: +1
    - Playing 60 minutes or more: +2
    - For each goal scored: +4
    - For each assist for a goal: +3
    - Bonus points for the best players in a match*: +1-3
    - For each yellow card: -1
    - For each red card: -3
    - For each own goal: -2
    
    
*Bonus points are computed by aggregating various performance statistics from a game, including metrics like passing completion percentage and key tackles, referred to as "bps". In each match, the three players who achieve the highest bps scores are awarded 1, 2, and 3 bonus points, respectively.

In [14]:
player_data['actual_points'] = 0
player_data['xg_points'] = 0

Goalkeepers actual points

In [15]:
gk_condition = (player_data.pos == 'GK')

In [16]:
player_data.loc[gk_condition & (player_data.mins <= 60), 'actual_points'] += 1
player_data.loc[gk_condition & (player_data.mins > 60), 'actual_points'] += 2
player_data.loc[gk_condition & (player_data.Gls > 0), 'actual_points'] += 6 * player_data.Gls
player_data.loc[gk_condition & (player_data.Ast > 0), 'actual_points'] += 3 * player_data.Ast
player_data.loc[gk_condition & (player_data.goals_against_gk == 0) & (player_data.redcrd == 0) & (player_data.mins > 60), 'actual_points'] += 4
player_data.loc[gk_condition & (player_data.saves >= 3), 'actual_points'] += player_data.saves // 3
player_data.loc[gk_condition & (player_data.goals_against_gk >= 2), 'actual_points'] -= player_data.goals_against_gk // 2
player_data.loc[gk_condition & (player_data.yellowcrd == 1) & (player_data.redcrd == 0), 'actual_points'] -= 1
player_data.loc[gk_condition & (player_data.redcrd == 1), 'actual_points'] -= 3
player_data.loc[gk_condition & (player_data.OG > 0), 'actual_points'] -= 2 * player_data.OG

Goalkeepers xg points

In [17]:
player_data.loc[gk_condition & (player_data.mins <= 60), 'xg_points'] += 1
player_data.loc[gk_condition & (player_data.mins > 60), 'xg_points'] += 2
player_data.loc[gk_condition & (player_data.xG > 0), 'xg_points'] += 6 * player_data.xG
player_data.loc[gk_condition & (player_data.xAG > 0), 'xg_points'] += 3 * player_data.xAG
player_data.loc[gk_condition & (player_data.opponent_xg < 1) & (player_data.redcrd == 0) & (player_data.mins > 60), 'xg_points'] += 4
player_data.loc[gk_condition & (player_data.saves >= 3), 'xg_points'] += player_data.saves // 3
player_data.loc[gk_condition & (player_data.opponent_xg >= 2), 'xg_points'] -= player_data.opponent_xg // 2
player_data.loc[gk_condition & (player_data.yellowcrd == 1) & (player_data.redcrd == 0), 'xg_points'] -= 1
player_data.loc[gk_condition & (player_data.redcrd == 1), 'xg_points'] -= 3
player_data.loc[gk_condition & (player_data.OG > 0), 'xg_points'] -= 2 * player_data.OG

Defenders actual points

In [18]:
def_condition = (player_data.pos == 'DEF')

In [19]:
player_data.loc[def_condition & (player_data.mins <= 60), 'actual_points'] += 1
player_data.loc[def_condition & (player_data.mins > 60), 'actual_points'] += 2
player_data.loc[def_condition & (player_data.Gls > 0), 'actual_points'] += 6 * player_data.Gls
player_data.loc[def_condition & (player_data.Ast > 0), 'actual_points'] += 3 * player_data.Ast
player_data.loc[def_condition & (player_data.opponent_goals == 0) & (player_data.redcrd == 0) & (player_data.mins > 60), 'actual_points'] += 4
player_data.loc[def_condition & (player_data.PKatt > 0) & (player_data.PK != player_data.PKatt), 'actual_points'] -= 2 * (player_data.PKatt - player_data.PK)
player_data.loc[def_condition & (player_data.opponent_goals >= 2), 'actual_points'] -= player_data.opponent_goals // 2
player_data.loc[def_condition & (player_data.yellowcrd == 1) & (player_data.redcrd == 0), 'actual_points'] -= 1
player_data.loc[def_condition & (player_data.redcrd == 1), 'actual_points'] -= 3
player_data.loc[def_condition & (player_data.OG > 0), 'actual_points'] -= 2 * player_data.OG

Defenders xg points

In [20]:
player_data.loc[def_condition & (player_data.mins > 0) & (player_data.mins <= 60), 'xg_points'] = player_data.xg_points + 1
player_data.loc[def_condition & (player_data.mins > 60), 'xg_points'] = player_data.xg_points + 2
player_data.loc[def_condition & (player_data.xG > 0), 'xg_points'] = player_data.xg_points + (6 * player_data.xG)
player_data.loc[def_condition & (player_data.xAG > 0), 'xg_points'] = player_data.xg_points + (3 * player_data.xAG)
player_data.loc[def_condition & (player_data.opponent_xg < 1) & (player_data.redcrd == 0) & (player_data.mins > 60), 'xg_points'] = player_data.xg_points + 4
player_data.loc[def_condition & (player_data.PKatt > 0) & (player_data.PK != player_data.PKatt), 'xg_points'] = player_data.xg_points - (2 * (player_data.PKatt - player_data.PK))
player_data.loc[def_condition & (player_data.opponent_xg >= 2), 'xg_points'] = player_data.xg_points - (player_data.xg_points // 2)
player_data.loc[def_condition & (player_data.yellowcrd == 1) & (player_data.redcrd == 0), 'xg_points'] = player_data.xg_points - 1
player_data.loc[def_condition & (player_data.redcrd == 1), 'xg_points'] = player_data.xg_points - 3
player_data.loc[def_condition & (player_data.OG > 0), 'xg_points'] = player_data.xg_points - (player_data.OG * 2)

Midfielders actual points

In [21]:
mid_condition = (player_data.pos == 'MID')

In [22]:
player_data.loc[mid_condition & (player_data.mins <= 60), 'actual_points'] += 1
player_data.loc[mid_condition & (player_data.mins > 60), 'actual_points'] += 2
player_data.loc[mid_condition & (player_data.Gls > 0), 'actual_points'] += 5 * player_data.Gls
player_data.loc[mid_condition & (player_data.Ast > 0), 'actual_points'] += 3 * player_data.Ast
player_data.loc[mid_condition & (player_data.opponent_goals == 0) & (player_data.redcrd == 0) & (player_data.mins > 60), 'actual_points'] += 1
player_data.loc[mid_condition & (player_data.PKatt > 0) & (player_data.PK != player_data.PKatt), 'actual_points'] -= 2 * (player_data.PKatt - player_data.PK)
player_data.loc[mid_condition & (player_data.yellowcrd == 1) & (player_data.redcrd == 0), 'actual_points'] -= 1
player_data.loc[mid_condition & (player_data.redcrd == 1), 'actual_points'] -= 3
player_data.loc[mid_condition & (player_data.OG > 0), 'actual_points'] -= 2 * player_data.OG

Midfielders xg points

In [23]:
player_data.loc[mid_condition & (player_data.mins > 0) & (player_data.mins <= 60), 'xg_points'] += 1
player_data.loc[mid_condition & (player_data.mins > 60), 'xg_points'] += 2
player_data.loc[mid_condition & (player_data.xG > 0), 'xg_points'] += 5 * player_data.xG
player_data.loc[mid_condition & (player_data.xAG > 0), 'xg_points'] += 3 * player_data.xAG
player_data.loc[mid_condition & (player_data.opponent_xg < 1) & (player_data.redcrd == 0) & (player_data.mins > 60), 'xg_points'] += 1
player_data.loc[mid_condition & (player_data.PKatt > 0) & (player_data.PK != player_data.PKatt), 'xg_points'] -= 2 * (player_data.PKatt - player_data.PK)
player_data.loc[mid_condition & (player_data.yellowcrd == 1), 'xg_points'] -= 1
player_data.loc[mid_condition & (player_data.redcrd == 1) & (player_data.yellowcrd > 0), 'xg_points'] -= 3
player_data.loc[mid_condition & (player_data.OG > 0), 'xg_points'] -= player_data.OG * 2

Forwards actual points

In [24]:
fwd_condition = (player_data.pos == 'FWD')

In [25]:
player_data.loc[fwd_condition & (player_data.mins > 0) & (player_data.mins <= 60), 'actual_points'] += 1
player_data.loc[fwd_condition & (player_data.mins > 60), 'actual_points'] += 2
player_data.loc[fwd_condition & (player_data.Gls > 0), 'actual_points'] += 4 * player_data['Gls']
player_data.loc[fwd_condition & (player_data.Ast > 0), 'actual_points'] += 3 * player_data['Ast']
player_data.loc[fwd_condition & (player_data.PKatt > 0) & (player_data.PK != player_data.PKatt), 'actual_points'] -= 2 * (player_data.PKatt - player_data.PK)
player_data.loc[fwd_condition & (player_data.yellowcrd == 1) & (player_data.redcrd == 0), 'actual_points'] -= 1
player_data.loc[fwd_condition & (player_data.redcrd == 1), 'actual_points'] -= 3
player_data.loc[fwd_condition & (player_data.OG > 0), 'actual_points'] -= player_data.OG * 2

Forwards xg points

In [26]:
player_data.loc[fwd_condition & (player_data.mins > 0) & (player_data.mins <= 60), 'xg_points'] += 1
player_data.loc[fwd_condition & (player_data.mins > 60), 'xg_points'] += 2
player_data.loc[fwd_condition & (player_data.xG > 0), 'xg_points'] += 4 * player_data.xG
player_data.loc[fwd_condition & (player_data.xAG > 0), 'xg_points'] += 3 * player_data.xAG
player_data.loc[fwd_condition & (player_data.PKatt > 0) & (player_data.PK != player_data.PKatt), 'xg_points'] -= 2 * (player_data.PKatt - player_data.PK)
player_data.loc[fwd_condition & (player_data.yellowcrd == 1) & (player_data.redcrd == 0), 'xg_points'] -= 1
player_data.loc[fwd_condition & (player_data.redcrd == 1), 'xg_points'] -= 3
player_data.loc[fwd_condition & (player_data.OG > 0), 'xg_points'] -= player_data.OG * 2

Actual bonus points

Creating metrics used in bps calculation that we dont currently have.

In [27]:
player_data['clearances_blocks_interceptions_total'] = player_data.clrearance_tackles + player_data.total_tackles_blocks + player_data.int_tackles
player_data['successful_tackle_net'] = player_data.total_tkl_ballwon - (player_data.total_tkl - player_data.total_tkl_ballwon)
player_data['penalties_missed'] = player_data.PKatt - player_data.PK
player_data['tackled_total'] =  player_data.disposessed_not_including_dribbles + player_data.failed_to_control_ball
player_data['shot_off_target'] = player_data.Sh - player_data.SoT

In [28]:
player_data['BPS'] = 0  

In [29]:
player_data.loc[(player_data.mins > 0) & (player_data.mins <= 60), 'BPS'] += 3
player_data.loc[(player_data.mins > 60), 'BPS'] += 6

In [30]:
player_data.loc[((player_data.pos == 'GK') | (player_data.pos == 'DEF')) & (player_data.Gls > 0), 'BPS'] += player_data.Gls * 12
player_data.loc[((player_data.pos == 'GK') | (player_data.pos == 'DEF')) & (player_data.opponent_goals == 0), 'BPS'] += 12
player_data.loc[(player_data.pos == 'GK') & (player_data.saves > 0), 'BPS'] += player_data.saves * 2
player_data.loc[(player_data.pos == 'MID')  & (player_data.Gls > 0), 'BPS'] += player_data.Gls * 18  
player_data.loc[(player_data.pos == 'FWD')  & (player_data.Gls > 0), 'BPS'] += player_data.Gls * 24

In [31]:
player_data.loc[(player_data.yellowcrd > 0), 'BPS'] -= 3
player_data.loc[(player_data.redcrd > 0), 'BPS'] -= - 9

In [32]:
player_data.loc[(player_data.total_pases_att >= 30) & (player_data['total_pases_cmp%'] >= 70) & (player_data['total_pases_cmp%'] < 80) , 'BPS'] += 2
player_data.loc[(player_data.total_pases_att >= 30) & (player_data['total_pases_cmp%'] >= 80) & (player_data['total_pases_cmp%'] < 90) , 'BPS'] += 4
player_data.loc[(player_data.total_pases_att >= 30) & (player_data['total_pases_cmp%'] >= 90) , 'BPS'] += 6

In [33]:
player_data.loc[(player_data.Ast > 0), 'BPS'] += player_data.Ast * 9
player_data.loc[(player_data.cross_pases > 0), 'BPS'] += player_data.cross_pases
player_data.loc[(player_data.SCAGCA > 0), 'BPS'] += player_data.SCAGCA * 3
player_data.loc[(player_data.dribbles_completed > 0), 'BPS'] += player_data.dribbles_completed
player_data.loc[(player_data.KP > 0), 'BPS'] += player_data.KP

In [34]:
player_data.loc[(player_data.offsides > 0), 'BPS'] -= player_data.offsides
player_data.loc[(player_data.shot_off_target > 0), 'BPS'] -= player_data.shot_off_target
player_data.loc[(player_data.penalties_missed > 0), 'BPS'] -= player_data.penalties_missed * 6

In [35]:
player_data.loc[(player_data.clearances_blocks_interceptions_total > 0), 'BPS'] += player_data.clearances_blocks_interceptions_total //2
player_data.loc[(player_data.recoveries > 0), 'BPS'] += player_data.recoveries //3
player_data.loc[(player_data.successful_tackle_net > 0), 'BPS'] += player_data.successful_tackle_net * 2

In [36]:
player_data.loc[(player_data.pk_conceeded > 0), 'BPS'] -= player_data.pk_conceeded * 3
player_data.loc[(player_data.OG > 0), 'BPS'] -= player_data.OG * 6
player_data.loc[(player_data.err_leading_toshot > 0), 'BPS'] -= player_data.err_leading_toshot
player_data.loc[(player_data.tackled_total > 0), 'BPS'] -= player_data.tackled_total
player_data.loc[(player_data.fouls_conceeded > 0), 'BPS'] -= player_data.fouls_conceeded

Assigning bonus points to best peforming players in match.

In [37]:
player_data = (player_data
               .assign(BPS_rank=player_data.groupby(['game_id'])['BPS'].rank(ascending=False),
                       bonus_points=0)
               .assign(bonus_points=lambda x: x.bonus_points.where(x.BPS_rank != 3, 1))
               .assign(bonus_points=lambda x: x.bonus_points.where(x.BPS_rank != 2, 2))
               .assign(bonus_points=lambda x: x.bonus_points.where(x.BPS_rank != 1, 3))
               .assign(total_points=lambda x: x.actual_points + x.bonus_points))

XG bonus points

In [38]:
player_data['BPS_xg'] = 0  

In [39]:
player_data.loc[(player_data.mins > 0) & (player_data.mins <= 60), 'BPS_xg'] += 3
player_data.loc[(player_data.mins > 60), 'BPS_xg'] += 6

In [40]:
player_data.loc[((player_data.pos == 'GK') | (player_data.pos == 'DEF')) & (player_data.xG > 0), 'BPS_xg'] += player_data.xG * 12
player_data.loc[((player_data.pos == 'GK') | (player_data.pos == 'DEF')) & (player_data.opponent_xg < 1), 'BPS_xg'] += player_data.BPS_xg + 12
player_data.loc[(player_data.pos == 'GK') & (player_data.saves > 0), 'BPS_xg'] += player_data.saves * 2
player_data.loc[(player_data.pos == 'MID')  & (player_data.xG > 0), 'BPS_xg'] += player_data.xG * 18
player_data.loc[(player_data.pos == 'FWD')  & (player_data.xG > 0), 'BPS_xg'] += player_data.xG * 24

In [41]:
player_data.loc[(player_data.yellowcrd > 0), 'BPS_xg'] -= 3
player_data.loc[(player_data.redcrd > 0), 'BPS_xg'] -= 9

In [42]:
player_data.loc[(player_data.total_pases_att >= 30) & (player_data['total_pases_cmp%'] >= 70) & (player_data['total_pases_cmp%'] < 80) , 'BPS_xg'] += 2
player_data.loc[(player_data.total_pases_att >= 30) & (player_data['total_pases_cmp%'] >= 80) & (player_data['total_pases_cmp%'] < 90) , 'BPS_xg'] += 4
player_data.loc[(player_data.total_pases_att >= 30) & (player_data['total_pases_cmp%'] >= 90) , 'BPS_xg'] += 6

In [43]:
player_data.loc[(player_data.xAG > 0), 'BPS_xg'] += player_data.xAG * 9
player_data.loc[(player_data.cross_pases > 0), 'BPS_xg'] += player_data.cross_pases
player_data.loc[(player_data.SCAGCA > 0), 'BPS_xg'] += player_data.SCAGCA * 3
player_data.loc[(player_data.dribbles_completed > 0), 'BPS_xg'] += player_data.dribbles_completed
player_data.loc[(player_data.KP > 0), 'BPS_xg'] += player_data.KP

In [44]:
player_data.loc[(player_data.offsides > 0), 'BPS_xg'] -= player_data.offsides
player_data.loc[(player_data.shot_off_target > 0), 'BPS_xg'] -= player_data.shot_off_target
player_data.loc[(player_data.penalties_missed > 0), 'BPS_xg'] -= player_data.penalties_missed * 6

In [45]:
player_data.loc[(player_data.clearances_blocks_interceptions_total > 0), 'BPS_xg'] += player_data.clearances_blocks_interceptions_total // 2
player_data.loc[(player_data.recoveries > 0), 'BPS_xg'] += player_data.recoveries // 3
player_data.loc[(player_data.successful_tackle_net > 0), 'BPS_xg'] += player_data.successful_tackle_net * 2

In [46]:
player_data.loc[(player_data.pk_conceeded > 0), 'BPS_xg'] -= player_data.pk_conceeded * 3
player_data.loc[(player_data.err_leading_toshot > 0), 'BPS_xg'] -= player_data.err_leading_toshot
player_data.loc[(player_data.OG > 0), 'BPS_xg'] -= player_data.OG * 6
player_data.loc[(player_data.tackled_total > 0), 'BPS_xg'] -= player_data.tackled_total
player_data.loc[(player_data.fouls_conceeded > 0), 'BPS_xg'] -= player_data.fouls_conceeded

In [47]:
player_data = (player_data
               .assign(BPS_xg_rank=player_data.groupby(['game_id'])['BPS_xg'].rank(ascending=False),
                       bonus_points_xg=0)
               .assign(bonus_points_xg=lambda x: x.bonus_points_xg.where(x.BPS_xg_rank != 3, 1))
               .assign(bonus_points_xg=lambda x: x.bonus_points_xg.where(x.BPS_xg_rank != 2, 2))
               .assign(bonus_points_xg=lambda x: x.bonus_points_xg.where(x.BPS_xg_rank != 1, 3))
               .assign(total_points_xg=lambda x: x.xg_points + x.bonus_points_xg)
               .astype({'date': 'datetime64[ns]'}))

In [48]:
player_data.head()

Unnamed: 0,player,number,nation,position_x,age,mins,total_pases_cmp,total_pases_att,total_pases_cmp%,total_pases_dist,pases_prg_dist,short_pases_cmp,short_pases_att,short_pases_cmp%,medium_pases_cmp,medium_pases_att,medium_pases_cmp%,long_pases_cmp,long_pases_att,long_pases_cmp%,ast_pases,xAG,xA,KP,01-Mar,ppa,crspa,prog,live_pases,dead_pases,fk_pases,tb_pases,switch_pases,cross_pases,throwin_pases,corner_pases,inswing_corner,outswing_corner,straight_corner,cmp_pass,offside_pass,blocked_pass,total_tkl,total_tkl_ballwon,def3rd_tackles,mid3rd_tackles,att3rd_tackles,vsdribbles_tackles_cmp,vsdribbles_tackles_att,vsdribbles_tackles_cmp%,vsdribbles_tackles_past,total_tackles_blocks,sh_tackles_blocks,pass_tackles_blocks,int_tackles,tkl+int_tackles,clrearance_tackles,err_leading_toshot,total_touches,touches_in_own_box,touches_in_own_1/3,touches_in_middle_1/3,touches_in_atk_1/3,touches_in_opp_box,total_liveball_touches,dribbles_completed,dribbles_attempted,dribbles_cmp%,failed_to_control_ball,disposessed_not_including_dribbles,total_received_passes,total_prog_received_passes,yellowcrd,redcrd,2ndyellow,fouls_conceeded,fouls_drawn,offsides,croses,tklw,pk_won,pk_conceeded,OG,recoveries,aerials_won,aerials_lost,aerials_cmp%,shots_against_gk,goals_against_gk,saves,save%,post_shot_xg,long_passes_cmp_gk,long_passes_att_gk,long_passes_cmp%_gk,passes_notgkick_att,passes_notgkick_throws,passes_notgkick_over40yrds,average_pass_length_notgkick,goalkick_att,goalkick_over40yrds,goalkick_avelen,attemped_croses_opp,attemped_croses_opp_stopped,attemped_croses_opp_stopped%,defensive_actions_not_in_box,ave_distance_from_goal_defensive_actions,home,game_week,date,position_y,Gls,Ast,PK,PKatt,Sh,SoT,xG,npxG,SCA,SCAGCA,Team,positon,season,DEF,FWD,GK,MID,pos,game_id,opponent_goals,opponent_xg,team,actual_points,xg_points,clearances_blocks_interceptions_total,successful_tackle_net,penalties_missed,tackled_total,shot_off_target,BPS,BPS_rank,bonus_points,total_points,BPS_xg,BPS_xg_rank,bonus_points_xg,total_points_xg
0,Alexandre Lacazette,9,fr FRA,FW,26-120,82,15.0,21.0,71.4,180.0,14.0,9.0,13.0,69.2,5.0,6.0,83.3,0.0,0.0,,0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,20.0,1.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,15.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0,0.0,2.0,0.0,33.0,2.0,2.0,8.0,23.0,7.0,32.0,5.0,5.0,100.0,3.0,0.0,23.0,7.0,0,0,0,0,2,0,0,0,0.0,0.0,0,2.0,2.0,1.0,66.7,,,,,,,,,,,,,,,,,,,,,1,6,2017-09-25,FW,2,0,1,1,3,1,1.8,1.0,3,0,Arsenal,FWD,2017-18,0,31,0,4,FWD,60,0,0.9,Arsenal,10,9.2,2.0,0.0,0,3.0,2,56,1.0,3,13,51.2,1.0,3,12.2
1,Mesut Özil,11,de GER,AM,28-345,8,11.0,11.0,100.0,147.0,40.0,7.0,7.0,100.0,4.0,4.0,100.0,0.0,0.0,,0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,11.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,11.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,1,1.0,0.0,0.0,13.0,1.0,1.0,6.0,6.0,0.0,13.0,0.0,0.0,,0.0,0.0,11.0,2.0,0,0,0,0,0,0,0,0,0.0,0.0,0,0.0,0.0,1.0,0.0,,,,,,,,,,,,,,,,,,,,,1,6,2017-09-25,LW,0,0,0,0,0,0,0.0,0.0,1,0,Arsenal,MID,2017-18,0,0,0,29,MID,60,0,0.9,Arsenal,1,1.0,1.0,0.0,0,0.0,0,3,25.0,0,1,3.0,25.0,0,1.0
2,Granit Xhaka,29,ch SUI,CM,24-363,90,65.0,76.0,85.5,1112.0,242.0,23.0,27.0,85.2,33.0,36.0,91.7,6.0,10.0,60.0,0,0.0,0.1,1.0,3.0,0.0,0.0,7.0,67.0,9.0,4.0,0.0,1.0,7,0.0,5.0,1.0,1.0,1.0,65.0,0.0,0.0,1.0,1,1.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0,1.0,1.0,0.0,85.0,2.0,8.0,47.0,31.0,2.0,85.0,0.0,1.0,0.0,2.0,1.0,62.0,1.0,0,0,0,2,1,0,7,1,0.0,0.0,0,7.0,1.0,2.0,33.3,,,,,,,,,,,,,,,,,,,,,1,6,2017-09-25,DM,0,0,0,0,3,0,0.2,0.2,1,0,Arsenal,MID,2017-18,0,0,0,42,MID,60,0,0.9,Arsenal,3,4.0,1.0,1.0,0,3.0,3,14,12.0,0,3,17.6,10.0,0,4.0
3,Mohamed Elneny,35,eg EGY,CM,25-076,90,72.0,78.0,92.3,1229.0,276.0,38.0,40.0,95.0,27.0,29.0,93.1,7.0,7.0,100.0,0,0.1,0.3,2.0,8.0,3.0,0.0,7.0,77.0,1.0,1.0,0.0,1.0,1,0.0,0.0,0.0,0.0,0.0,72.0,0.0,1.0,0.0,0,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0,0.0,0.0,1.0,82.0,0.0,3.0,49.0,31.0,3.0,82.0,0.0,0.0,,0.0,2.0,73.0,3.0,0,0,0,0,3,0,1,0,0.0,0.0,0,8.0,1.0,1.0,50.0,,,,,,,,,,,,,,,,,,,,,1,6,2017-09-25,DM,0,0,0,0,1,0,0.0,0.0,6,0,Arsenal,MID,2017-18,0,0,0,14,MID,60,0,0.9,Arsenal,3,3.3,0.0,0.0,0,2.0,1,13,13.0,0,3,13.9,15.0,0,3.3
4,Aaron Ramsey,8,wls WAL,AM,26-273,89,47.0,62.0,75.8,618.0,63.0,28.0,35.0,80.0,14.0,20.0,70.0,1.0,2.0,50.0,0,1.0,0.0,0.0,2.0,0.0,0.0,0.0,61.0,1.0,0.0,1.0,1.0,1,1.0,0.0,0.0,0.0,0.0,47.0,0.0,1.0,1.0,0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0,1.0,0.0,0.0,67.0,0.0,3.0,27.0,38.0,4.0,67.0,1.0,1.0,100.0,3.0,0.0,61.0,13.0,0,0,0,1,3,0,1,0,1.0,0.0,0,5.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,1,6,2017-09-25,RW,0,1,0,0,2,1,0.2,0.2,1,1,Arsenal,MID,2017-18,0,0,0,27,MID,60,0,0.9,Arsenal,6,7.0,0.0,-1.0,0,3.0,1,18,8.0,0,6,21.6,8.0,0,7.0


Removing all columns we no longer need.

In [49]:
player_data = player_data[['player', 'number', 'mins', 'game_week', 'season', 'date', 'game_id',
                           'pos', 'Team', 'bonus_points', 'total_points', 'total_points_xg', 'Gls', 'xG', 'Ast',
                           'xAG', 'opponent_goals']].drop_duplicates(subset=['player', 'date'])

In [50]:
player_data.head()

Unnamed: 0,player,number,mins,game_week,season,date,game_id,pos,Team,bonus_points,total_points,total_points_xg,Gls,xG,Ast,xAG,opponent_goals
0,Alexandre Lacazette,9,82,6,2017-18,2017-09-25,60,FWD,Arsenal,3,13,12.2,2,1.8,0,0.0,0
1,Mesut Özil,11,8,6,2017-18,2017-09-25,60,MID,Arsenal,0,1,1.0,0,0.0,0,0.0,0
2,Granit Xhaka,29,90,6,2017-18,2017-09-25,60,MID,Arsenal,0,3,4.0,0,0.2,0,0.0,0
3,Mohamed Elneny,35,90,6,2017-18,2017-09-25,60,MID,Arsenal,0,3,3.3,0,0.0,0,0.1,0
4,Aaron Ramsey,8,89,6,2017-18,2017-09-25,60,MID,Arsenal,0,6,7.0,0,0.2,1,1.0,0


In [51]:
player_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19537 entries, 0 to 19536
Data columns (total 17 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   player           19537 non-null  object        
 1   number           19537 non-null  int64         
 2   mins             19537 non-null  int64         
 3   game_week        19537 non-null  int64         
 4   season           19537 non-null  object        
 5   date             19537 non-null  datetime64[ns]
 6   game_id          19537 non-null  int64         
 7   pos              19537 non-null  object        
 8   Team             19537 non-null  object        
 9   bonus_points     19537 non-null  int64         
 10  total_points     19537 non-null  int64         
 11  total_points_xg  19537 non-null  float64       
 12  Gls              19537 non-null  int64         
 13  xG               19537 non-null  float64       
 14  Ast              19537 non-null  int64

## 2) Getting Player Form

Currently, our point calculations only include players who have earned at least 1 point in each game. However, we need to address cases where players have participated in certain game weeks but not in others. For instance, Mohamed Elneny, who played in the first three game weeks of the 2017-18 season but did not participate again until game week 24. If we were to calculate Elneny's 3-game form using our current dataset, it would use game weeks 7, 8, and 24 for game week 25, instead of the correct game weeks 22, 23, and 24.

In [52]:
player_data[player_data.player == 'Mohamed Elneny'].head()

Unnamed: 0,player,number,mins,game_week,season,date,game_id,pos,Team,bonus_points,total_points,total_points_xg,Gls,xG,Ast,xAG,opponent_goals
3,Mohamed Elneny,35,90,6,2017-18,2017-09-25,60,MID,Arsenal,0,3,3.3,0,0.0,0,0.1,0
17,Mohamed Elneny,35,8,7,2017-18,2017-10-01,70,MID,Arsenal,0,1,1.0,0,0.0,0,0.0,0
32,Mohamed Elneny,35,90,8,2017-18,2017-10-14,77,MID,Arsenal,0,2,2.0,0,0.0,0,0.0,2
247,Mohamed Elneny,35,90,24,2017-18,2018-01-20,234,MID,Arsenal,0,2,3.0,0,0.0,0,0.0,1
261,Mohamed Elneny,35,59,25,2017-18,2018-01-30,242,MID,Arsenal,0,0,0.0,0,0.0,0,0.0,3


We need to add players to games where they didnt play at all and got 0 points in order to get the actual form of a player not just the form of the last game they played. 

In [53]:
players = player_data.groupby(['season', 'player', 'Team', 'pos'], as_index=False).agg({'player': 'first'})

In [54]:
players.head()

Unnamed: 0,season,Team,pos,player
0,2017-18,West Ham,DEF,Aaron Cresswell
1,2017-18,Burnley,MID,Aaron Lennon
2,2017-18,Everton,MID,Aaron Lennon
3,2017-18,Huddersfield,MID,Aaron Mooy
4,2017-18,Arsenal,MID,Aaron Ramsey


Getting 38 seperate rows for each player in each season representing the 38 games each team plays in a Premeir League season.

In [55]:
ids_df = pd.DataFrame({'id': range(1, 39)})

In [56]:
players['key'] = 1
ids_df['key'] = 1

In [57]:
ids_df.head()

Unnamed: 0,id,key
0,1,1
1,2,1
2,3,1
3,4,1
4,5,1


Adding blank game weeks to current player data.

In [58]:
players_with_ids = (pd.merge(players, ids_df, on='key')
                    .drop('key', axis=1)
                    .sort_values(['season', 'player', 'Team', 'pos'])
                    .reset_index(drop=True))

In [59]:
players_with_ids.head()

Unnamed: 0,season,Team,pos,player,id
0,2017-18,West Ham,DEF,Aaron Cresswell,1
1,2017-18,West Ham,DEF,Aaron Cresswell,2
2,2017-18,West Ham,DEF,Aaron Cresswell,3
3,2017-18,West Ham,DEF,Aaron Cresswell,4
4,2017-18,West Ham,DEF,Aaron Cresswell,5


In [60]:
player_data_with_missing_rows = pd.merge(player_data, players_with_ids, 
                                         left_on=['player', 'season', 'game_week', 'Team', 'pos'],
                                         right_on=['player', 'season', 'id', 'Team', 'pos'],
                                         how='right')

Now we have rows for every game week even if player didnt get any fpl points.

In [61]:
player_data_with_missing_rows[player_data_with_missing_rows.player == 'Mohamed Elneny'].head(10)

Unnamed: 0,player,number,mins,game_week,season,date,game_id,pos,Team,bonus_points,total_points,total_points_xg,Gls,xG,Ast,xAG,opponent_goals,id
13490,Mohamed Elneny,,,,2017-18,NaT,,MID,Arsenal,,,,,,,,,1
13491,Mohamed Elneny,,,,2017-18,NaT,,MID,Arsenal,,,,,,,,,2
13492,Mohamed Elneny,,,,2017-18,NaT,,MID,Arsenal,,,,,,,,,3
13493,Mohamed Elneny,,,,2017-18,NaT,,MID,Arsenal,,,,,,,,,4
13494,Mohamed Elneny,,,,2017-18,NaT,,MID,Arsenal,,,,,,,,,5
13495,Mohamed Elneny,35.0,90.0,6.0,2017-18,2017-09-25,60.0,MID,Arsenal,0.0,3.0,3.3,0.0,0.0,0.0,0.1,0.0,6
13496,Mohamed Elneny,35.0,8.0,7.0,2017-18,2017-10-01,70.0,MID,Arsenal,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,7
13497,Mohamed Elneny,35.0,90.0,8.0,2017-18,2017-10-14,77.0,MID,Arsenal,0.0,2.0,2.0,0.0,0.0,0.0,0.0,2.0,8
13498,Mohamed Elneny,,,,2017-18,NaT,,MID,Arsenal,,,,,,,,,9
13499,Mohamed Elneny,,,,2017-18,NaT,,MID,Arsenal,,,,,,,,,10


Filling NaN values in rows where player doesnt play.

In [62]:
player_data_with_missing_rows['game_week'] = player_data_with_missing_rows.game_week.fillna(player_data_with_missing_rows.id)
player_data_with_missing_rows = player_data_with_missing_rows.fillna(0).drop('id', axis=1)

In [63]:
player_data_with_missing_rows[player_data_with_missing_rows.player == 'Mohamed Elneny'].head(10)

Unnamed: 0,player,number,mins,game_week,season,date,game_id,pos,Team,bonus_points,total_points,total_points_xg,Gls,xG,Ast,xAG,opponent_goals
13490,Mohamed Elneny,0.0,0.0,1.0,2017-18,0,0.0,MID,Arsenal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
13491,Mohamed Elneny,0.0,0.0,2.0,2017-18,0,0.0,MID,Arsenal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
13492,Mohamed Elneny,0.0,0.0,3.0,2017-18,0,0.0,MID,Arsenal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
13493,Mohamed Elneny,0.0,0.0,4.0,2017-18,0,0.0,MID,Arsenal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
13494,Mohamed Elneny,0.0,0.0,5.0,2017-18,0,0.0,MID,Arsenal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
13495,Mohamed Elneny,35.0,90.0,6.0,2017-18,2017-09-25 00:00:00,60.0,MID,Arsenal,0.0,3.0,3.3,0.0,0.0,0.0,0.1,0.0
13496,Mohamed Elneny,35.0,8.0,7.0,2017-18,2017-10-01 00:00:00,70.0,MID,Arsenal,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0
13497,Mohamed Elneny,35.0,90.0,8.0,2017-18,2017-10-14 00:00:00,77.0,MID,Arsenal,0.0,2.0,2.0,0.0,0.0,0.0,0.0,2.0
13498,Mohamed Elneny,0.0,0.0,9.0,2017-18,0,0.0,MID,Arsenal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
13499,Mohamed Elneny,0.0,0.0,10.0,2017-18,0,0.0,MID,Arsenal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


With the inclusion of games where players didn't participate, we can now calculate performance form. To gauge player form, we will analyze the percentage of team goals and assists they have contributed to over the past five games. This metric will provide insight into the player's recent performance and their impact on the team's offensive output.

In [64]:
player_data_with_missing_rows['team_actual_goals'] = player_data_with_missing_rows.groupby(['season', 'game_week', 'Team'])['Gls'].transform('sum')
player_data_with_missing_rows['team_xg_goals'] = player_data_with_missing_rows.groupby(['season', 'game_week', 'Team'])['xG'].transform('sum')

In [65]:
player_data_with_missing_rows['percentage_of_teams_actual_goals'] = (player_data_with_missing_rows.Gls / player_data_with_missing_rows.team_actual_goals).fillna(0).round(2)
player_data_with_missing_rows['percentage_of_teams_xg_goals'] = (player_data_with_missing_rows.xG / player_data_with_missing_rows.team_xg_goals).fillna(0).round(2)

In [66]:
player_data_with_missing_rows['team_actual_assists'] = player_data_with_missing_rows.groupby(['season', 'game_week', 'Team'])['Ast'].transform('sum')
player_data_with_missing_rows['team_xg_assists'] = player_data_with_missing_rows.groupby(['season', 'game_week', 'Team'])['xAG'].transform('sum')

In [67]:
player_data_with_missing_rows['percentage_of_teams_actual_assists'] = (player_data_with_missing_rows.Ast / player_data_with_missing_rows.team_actual_assists).fillna(0).round(2)
player_data_with_missing_rows['percentage_of_teams_xg_assists'] = (player_data_with_missing_rows.xAG / player_data_with_missing_rows.team_xg_assists).fillna(0).round(2)

In [68]:
player_data_with_missing_rows[920:925][['player', 'game_week', 'Team', 'Gls', 'team_actual_goals', 'percentage_of_teams_actual_goals']]

Unnamed: 0,player,game_week,Team,Gls,team_actual_goals,percentage_of_teams_actual_goals
920,Alexandre Lacazette,9.0,Arsenal,1.0,5.0,0.2
921,Alexandre Lacazette,10.0,Arsenal,0.0,2.0,0.0
922,Alexandre Lacazette,11.0,Arsenal,1.0,1.0,1.0
923,Alexandre Lacazette,12.0,Arsenal,0.0,2.0,0.0
924,Alexandre Lacazette,13.0,Arsenal,0.0,1.0,0.0


Adding fpl_week_day as we need to conform to the fpl game week scedule not the fbref game week scedule as discussed in previous notebooks.

In [69]:
fpl_data = (team_data
            .loc[:, ['season', 'fpl_game_week', 'game_week', 'team', 'date']]
            .astype({'date': 'datetime64[ns]'})
            .assign(season=team_data.season
                    .str.replace('2017-2018', '2017-18')
                    .str.replace('2018-2019', '2018-19')))

In [70]:
fpl_data.head()

Unnamed: 0,season,fpl_game_week,game_week,team,date
0,2017-18,6,6,Watford,2017-09-23
1,2017-18,6,6,Swansea City,2017-09-23
2,2017-18,6,6,Burnley,2017-09-23
3,2017-18,6,6,Huddersfield,2017-09-23
4,2017-18,6,6,Crystal Palace,2017-09-23


In [71]:
player_data_with_fpl_week = pd.merge(fpl_data, player_data_with_missing_rows,
                                     left_on=['team', 'season', 'game_week'], right_on=['Team', 'season', 'game_week'])

In [72]:
player_data_with_fpl_week.head()

Unnamed: 0,season,fpl_game_week,game_week,team,date_x,player,number,mins,date_y,game_id,pos,Team,bonus_points,total_points,total_points_xg,Gls,xG,Ast,xAG,opponent_goals,team_actual_goals,team_xg_goals,percentage_of_teams_actual_goals,percentage_of_teams_xg_goals,team_actual_assists,team_xg_assists,percentage_of_teams_actual_assists,percentage_of_teams_xg_assists
0,2017-18,6,6,Watford,2017-09-23,Abdoulaye Doucouré,16.0,90.0,2017-09-23 00:00:00,51.0,MID,Watford,0.0,1.0,1.3,0.0,0.0,0.0,0.1,1.0,2.0,1.7,0.0,0.0,1.0,0.5,0.0,0.2
1,2017-18,6,6,Watford,2017-09-23,Adrian Mariappa,6.0,90.0,2017-09-23 00:00:00,51.0,DEF,Watford,0.0,2.0,2.0,0.0,0.0,0.0,0.0,1.0,2.0,1.7,0.0,0.0,1.0,0.5,0.0,0.0
2,2017-18,6,6,Watford,2017-09-23,Andre Gray,18.0,85.0,2017-09-23 00:00:00,51.0,FWD,Watford,0.0,6.0,7.6,1.0,0.9,0.0,0.0,1.0,2.0,1.7,0.5,0.53,1.0,0.5,0.0,0.0
3,2017-18,6,6,Watford,2017-09-23,André Carrillo,28.0,73.0,2017-09-23 00:00:00,51.0,MID,Watford,0.0,2.0,2.0,0.0,0.0,0.0,0.0,1.0,2.0,1.7,0.0,0.0,1.0,0.5,0.0,0.0
4,2017-18,6,6,Watford,2017-09-23,Ben Watson,0.0,0.0,0,0.0,MID,Watford,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.7,0.0,0.0,1.0,0.5,0.0,0.0


In order to assess player performance form from the last five games, it is important to consider FPL injury data, which we dont have access to (past seasons). Determining player availability for the next game is a crucial aspect of player selection. Without injury data, our best option is to calculate the average number of minutes played by the player in the previous two games. This average serves as an indication of the player's likelihood to start in the next game. It's important to note that having access to injury data in real-time would greatly enhance our ability to predict player starts. However, given the lack of injury data, this method is the best we can do.

In [73]:
player_data_with_fpl_week.sort_values(by=['date_x'], inplace=True)

Getting number of minutes played in last two games.

In [74]:
player_data_with_fpl_week['mins_played_last'] = (player_data_with_fpl_week
                                                 .groupby('player', sort=True)['mins']
                                                 .transform(lambda x: x.rolling(3, min_periods=3)
                                                            .sum()
                                                            .sub(player_data_with_fpl_week.mins)
                                                            .div(2)))

Getting form for past 5 games function.

In [75]:
def get_form(df, col_name, metric):
    return df.assign(**{col_name: lambda x: x.groupby('player', sort=True)[metric]
                        .transform(lambda x: x.rolling(6, min_periods=6)
                                   .sum()
                                   .sub(df[metric])
                                   .div(5))})

In [76]:
df = (get_form(player_data_with_fpl_week, 'percentage_of_teams_actual_goals_last_5', 'percentage_of_teams_actual_goals')
      .pipe(get_form, 'percentage_of_teams_xg_goals_last_5', 'percentage_of_teams_xg_goals')
      .pipe(get_form, 'percentage_of_teams_actual_assists_last_5', 'percentage_of_teams_actual_assists')
      .pipe(get_form, 'percentage_of_teams_xg_assists_last_5', 'percentage_of_teams_xg_assists'))

In [77]:
final_player_form = df[['player', 'number', 'pos', 'fpl_game_week', 'season', 'Team', 'mins', 'game_id',
                        'date_x', 'total_points', 'total_points_xg', 'mins_played_last', 
                        'percentage_of_teams_actual_goals_last_5', 'percentage_of_teams_xg_goals_last_5', 
                        'percentage_of_teams_actual_assists_last_5', 'percentage_of_teams_xg_assists_last_5', 'opponent_goals']]

In [78]:
final_player_form.tail()

Unnamed: 0,player,number,pos,fpl_game_week,season,Team,mins,game_id,date_x,total_points,total_points_xg,mins_played_last,percentage_of_teams_actual_goals_last_5,percentage_of_teams_xg_goals_last_5,percentage_of_teams_actual_assists_last_5,percentage_of_teams_xg_assists_last_5,opponent_goals
35881,David Luiz,30.0,DEF,38,2018-19,Chelsea,90.0,754.0,2019-05-12,6.0,6.0,89.0,0.066,0.034,0.0,0.084,0.0
35880,César Azpilicueta,28.0,DEF,38,2018-19,Chelsea,90.0,754.0,2019-05-12,6.0,6.0,90.0,0.0,0.018,0.1,0.018,0.0
35879,Cesc Fàbregas,0.0,MID,38,2018-19,Chelsea,0.0,0.0,2019-05-12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
35892,Olivier Giroud,18.0,FWD,38,2018-19,Chelsea,7.0,754.0,2019-05-12,1.0,1.0,6.0,0.0,0.056,0.0,0.0,0.0
36227,Łukasz Fabiański,1.0,GK,38,2018-19,West Ham,90.0,760.0,2019-05-12,4.0,3.0,90.0,0.0,0.0,0.0,0.0,1.0


In [105]:
final_player_form.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38412 entries, 0 to 38411
Data columns (total 23 columns):
 #   Column                                     Non-Null Count  Dtype         
---  ------                                     --------------  -----         
 0   player                                     38412 non-null  object        
 1   number                                     38412 non-null  float64       
 2   pos                                        38412 non-null  object        
 3   fpl_game_week                              38412 non-null  int64         
 4   season                                     38412 non-null  object        
 5   Team                                       38412 non-null  object        
 6   mins                                       38412 non-null  float64       
 7   game_id                                    38412 non-null  float64       
 8   date_x                                     38412 non-null  datetime64[ns]
 9   total_points     

In [106]:
final_player_form.to_csv('player_form.csv')

## 3) Predicting Player Points

Having gathered all the necessary data points, we are now ready to predict weekly individual player point. To begin, we need to import the files containing the predicted goals scored, goals conceded, and clean sheet probability created in previous notebooks.

In [80]:
cols = ['season', 'fpl_game_week', 'team', 'pred_goals_scored', 'pred_goals_conceded', 'clean_sheet', 'cleen_sheet_prob', 'team_goals', 'opponent_goals']
predicted = pd.read_csv('data/predict_goals_cs.csv', usecols=cols)

In [81]:
predicted.head()

Unnamed: 0,season,fpl_game_week,team,team_goals,opponent_goals,pred_goals_scored,pred_goals_conceded,clean_sheet,cleen_sheet_prob
0,2017-18,6,Watford,2,1,1.213222,1.545529,0,0.237033
1,2017-18,6,Swansea City,1,2,1.545529,1.213222,0,0.324177
2,2017-18,6,Burnley,0,0,1.53414,0.882562,1,0.377977
3,2017-18,6,Huddersfield,0,0,0.882562,1.53414,1,0.207109
4,2017-18,6,Crystal Palace,0,5,0.389978,2.447532,0,0.01


Adding predicted metrics to player data.

In [82]:
final_player_form = final_player_form.merge(predicted[['team', 'fpl_game_week', 'season', 'pred_goals_scored',
                                                           'pred_goals_conceded', 'cleen_sheet_prob']],
                                            left_on=['Team', 'fpl_game_week', 'season'],
                                            right_on=['team', 'fpl_game_week', 'season'],
                                            how='left')

In [83]:
final_player_form.head()

Unnamed: 0,player,number,pos,fpl_game_week,season,Team,mins,game_id,date_x,total_points,total_points_xg,mins_played_last,percentage_of_teams_actual_goals_last_5,percentage_of_teams_xg_goals_last_5,percentage_of_teams_actual_assists_last_5,percentage_of_teams_xg_assists_last_5,opponent_goals,team,pred_goals_scored,pred_goals_conceded,cleen_sheet_prob
0,Abdoulaye Doucouré,16.0,MID,6,2017-18,Watford,90.0,51.0,2017-09-23,1.0,1.3,,,,,,1.0,Watford,1.213222,1.545529,0.237033
1,Aaron Cresswell,3.0,DEF,6,2017-18,West Ham,90.0,56.0,2017-09-23,1.0,2.0,,,,,,3.0,West Ham,0.937314,1.989058,0.136851
2,Érik Lamela,0.0,MID,6,2017-18,Tottenham,0.0,0.0,2017-09-23,0.0,0.0,,,,,,0.0,Tottenham,1.989058,0.937314,0.412658
3,Victor Wanyama,0.0,MID,6,2017-18,Tottenham,0.0,0.0,2017-09-23,0.0,0.0,,,,,,0.0,Tottenham,1.989058,0.937314,0.412658
4,Toby Alderweireld,4.0,DEF,6,2017-18,Tottenham,90.0,56.0,2017-09-23,1.0,6.0,,,,,,2.0,Tottenham,1.989058,0.937314,0.412658


Predicting player points based on form, teams predicted goals scored, conceded, clean sheet probability and the FPL point criteria.

In [84]:
is_gk_def = (final_player_form.pos.isin(['GK', 'DEF']))
is_mid = (final_player_form.pos == 'MID')
is_fwd = (final_player_form.pos == 'FWD')

Actual

In [85]:
final_player_form['predicted_points_actual'] = 0

In [86]:
final_player_form.loc[is_gk_def, 'predicted_points_actual'] = (
    final_player_form.cleen_sheet_prob * 4
    + final_player_form.percentage_of_teams_actual_assists_last_5 * final_player_form.pred_goals_scored * 4
    + final_player_form.percentage_of_teams_actual_goals_last_5 * final_player_form.pred_goals_scored * 6
)

In [87]:
final_player_form.loc[is_mid, 'predicted_points_actual'] = (
    final_player_form.cleen_sheet_prob * 1
    + final_player_form.percentage_of_teams_actual_assists_last_5 * final_player_form.pred_goals_scored * 3
    + final_player_form.percentage_of_teams_actual_goals_last_5 * final_player_form.pred_goals_scored * 5
)

In [88]:
final_player_form.loc[is_fwd, 'predicted_points_actual'] = (
    final_player_form.percentage_of_teams_actual_assists_last_5 * final_player_form.pred_goals_scored * 3
    + final_player_form.percentage_of_teams_actual_goals_last_5 * final_player_form.pred_goals_scored * 4
)

To determine a player's likely playing time in the upcoming match, we establish the following guidelines:

- If a player has played, on average, more than 60 minutes in their previous two games, it is assumed that they will start and play at least 60 minutes in the next match.

- If a player has played less than an average of 60 minutes but more than 0 minutes in their previous two games, it is assumed that they will play in the upcoming match, but for less than 60 minutes.

- If a player has not played any minutes in their last two games, it is assumed that they will not feature at all in the next match.

In [89]:
final_player_form.loc[final_player_form.mins_played_last == 0, 'predicted_points_actual'] = 0
final_player_form.loc[final_player_form.mins_played_last > 60, 'predicted_points_actual'] += 2
final_player_form.loc[(final_player_form.mins_played_last > 0) & (final_player_form.mins_played_last <= 60), 'predicted_points_actual'] += 1

XG

In [90]:
final_player_form['predicted_points_xg'] = 0

In [91]:
final_player_form.loc[is_gk_def, 'predicted_points_xg'] = (
    final_player_form.cleen_sheet_prob * 4
    + final_player_form.percentage_of_teams_xg_assists_last_5 * final_player_form.pred_goals_scored * 3
    + final_player_form.percentage_of_teams_xg_goals_last_5 * final_player_form.pred_goals_scored * 6
)

In [92]:
final_player_form.loc[is_mid, 'predicted_points_xg'] = (
    final_player_form.cleen_sheet_prob * 1
    + final_player_form.percentage_of_teams_xg_assists_last_5 * final_player_form.pred_goals_scored * 3
    + final_player_form.percentage_of_teams_xg_goals_last_5 * final_player_form.pred_goals_scored * 5
)

In [93]:
final_player_form.loc[is_fwd, 'predicted_points_xg'] = (
    final_player_form.percentage_of_teams_xg_assists_last_5 * final_player_form.pred_goals_scored * 3
    + final_player_form.percentage_of_teams_xg_goals_last_5 * final_player_form.pred_goals_scored * 4
)

In [94]:
final_player_form.loc[final_player_form.mins_played_last == 0, 'predicted_points_xg'] = 0
final_player_form.loc[final_player_form.mins_played_last > 60, 'predicted_points_xg'] += 2
final_player_form.loc[(final_player_form.mins_played_last > 0) & (final_player_form.mins_played_last <= 60), 'predicted_points_xg'] += 1

In [95]:
final_player_form.head()

Unnamed: 0,player,number,pos,fpl_game_week,season,Team,mins,game_id,date_x,total_points,total_points_xg,mins_played_last,percentage_of_teams_actual_goals_last_5,percentage_of_teams_xg_goals_last_5,percentage_of_teams_actual_assists_last_5,percentage_of_teams_xg_assists_last_5,opponent_goals,team,pred_goals_scored,pred_goals_conceded,cleen_sheet_prob,predicted_points_actual,predicted_points_xg
0,Abdoulaye Doucouré,16.0,MID,6,2017-18,Watford,90.0,51.0,2017-09-23,1.0,1.3,,,,,,1.0,Watford,1.213222,1.545529,0.237033,,
1,Aaron Cresswell,3.0,DEF,6,2017-18,West Ham,90.0,56.0,2017-09-23,1.0,2.0,,,,,,3.0,West Ham,0.937314,1.989058,0.136851,,
2,Érik Lamela,0.0,MID,6,2017-18,Tottenham,0.0,0.0,2017-09-23,0.0,0.0,,,,,,0.0,Tottenham,1.989058,0.937314,0.412658,,
3,Victor Wanyama,0.0,MID,6,2017-18,Tottenham,0.0,0.0,2017-09-23,0.0,0.0,,,,,,0.0,Tottenham,1.989058,0.937314,0.412658,,
4,Toby Alderweireld,4.0,DEF,6,2017-18,Tottenham,90.0,56.0,2017-09-23,1.0,6.0,,,,,,2.0,Tottenham,1.989058,0.937314,0.412658,,


Top 10 projected overall scorers over 2018-19 season.

In [96]:
(final_player_form
 .query('season == "2018-19"')
 .groupby('player')
 .agg({'predicted_points_actual': 'sum', 'predicted_points_xg': 'sum'})
 .sort_values(by='predicted_points_xg', ascending=False)
 .head(10))

Unnamed: 0_level_0,predicted_points_actual,predicted_points_xg
player,Unnamed: 1_level_1,Unnamed: 2_level_1
Mohamed Salah,220.314816,246.539679
Raheem Sterling,220.759857,241.660845
Paul Pogba,176.261734,202.922922
Sadio Mané,175.956169,199.313762
Eden Hazard,201.926929,198.999148
Pierre-Emerick Aubameyang,168.222766,176.493474
Andrew Robertson,190.171489,176.000874
Aymeric Laporte,186.581485,174.659464
Sergio Agüero,191.264085,172.472494
Virgil van Dijk,162.519951,171.028666


Correlation between total and predicted points.

In [97]:
final_player_form['total_points'].corr(final_player_form.predicted_points_actual)

0.4435483042550687

In line with team form, predicted player points based on form using expected goals (xG) data yields more accurate predictions compared to using actual data. Therefore, we will rely on this metric to make our final player point predictions.

In [98]:
final_player_form['total_points'].corr(final_player_form.predicted_points_xg)

0.4585384603006409

In [99]:
final_player_form['total_points_xg'].corr(final_player_form.predicted_points_actual)

0.5303382027954703

In [100]:
final_player_form['total_points_xg'].corr(final_player_form.predicted_points_xg)

0.5417257987426848

In [101]:
predict_player_points = (final_player_form
                         .loc[:, ['player', 'number', 'pos', 'season', 'fpl_game_week', 'Team', 'mins',
                                  'game_id', 'date_x', 'total_points', 'total_points_xg', 'predicted_points_actual',
                                  'predicted_points_xg', 'opponent_goals']]
                         .rename(columns={'Team': 'team'}))

In [102]:
predict_player_points.head()

Unnamed: 0,player,number,pos,season,fpl_game_week,team,mins,game_id,date_x,total_points,total_points_xg,predicted_points_actual,predicted_points_xg,opponent_goals
0,Abdoulaye Doucouré,16.0,MID,2017-18,6,Watford,90.0,51.0,2017-09-23,1.0,1.3,,,1.0
1,Aaron Cresswell,3.0,DEF,2017-18,6,West Ham,90.0,56.0,2017-09-23,1.0,2.0,,,3.0
2,Érik Lamela,0.0,MID,2017-18,6,Tottenham,0.0,0.0,2017-09-23,0.0,0.0,,,0.0
3,Victor Wanyama,0.0,MID,2017-18,6,Tottenham,0.0,0.0,2017-09-23,0.0,0.0,,,0.0
4,Toby Alderweireld,4.0,DEF,2017-18,6,Tottenham,90.0,56.0,2017-09-23,1.0,6.0,,,2.0


Looping over game weeks, predicting player points for all future games based on their current form.

In [103]:
game_weeks = [i for i in range(1,39)]

In [104]:
for week in game_weeks:
    player_data = (final_player_form
                   .query(f'fpl_game_week <= {week}')
                   .loc[:, ['team', 'player', 'number', 'pos', 'season', 'fpl_game_week', 'mins_played_last', 
                            'percentage_of_teams_actual_goals_last_5','percentage_of_teams_xg_goals_last_5',
                            'percentage_of_teams_actual_assists_last_5', 'percentage_of_teams_xg_assists_last_5']]
                   .drop_duplicates(subset=['player', 'team'], keep='last'))

    predicted_metrics = pd.read_csv(fr'C:\Users\adamc\OneDrive\github\fantasy_football\fantasy_football_predicting_clean_sheet\predicting_weekly_team_clean_sheet\2018-19\game_week_clean_sheet_{week}.csv')

    predict_player_points = pd.merge(player_data, predicted_metrics, on=['team'])
    predict_player_points['predicted_points_actual'] = 0
    predict_player_points['predicted_points_xg'] = 0

    is_gk_def = (predict_player_points.pos.isin(['GK', 'DEF']))
    is_mid = (predict_player_points.pos == 'MID')
    is_fwd = (predict_player_points.pos == 'FWD')

    predict_player_points.loc[is_gk_def, 'predicted_points_actual'] = (
        predict_player_points.cleen_sheet_prob * 4
        + predict_player_points.percentage_of_teams_actual_assists_last_5 * predict_player_points.scored * 4
        + predict_player_points.percentage_of_teams_actual_goals_last_5 * predict_player_points.scored * 6
    )
    predict_player_points.loc[is_mid, 'predicted_points_actual'] = (
        predict_player_points.cleen_sheet_prob * 1
        + predict_player_points.percentage_of_teams_actual_assists_last_5 * predict_player_points.scored * 3
        + predict_player_points.percentage_of_teams_actual_goals_last_5 * predict_player_points.scored * 5
    )
    predict_player_points.loc[is_fwd, 'predicted_points_actual'] = (
        predict_player_points.percentage_of_teams_actual_assists_last_5 * predict_player_points.scored * 3
        + predict_player_points.percentage_of_teams_actual_goals_last_5 * predict_player_points.scored * 4
    )

    predict_player_points.loc[predict_player_points.mins_played_last == 0, 'predicted_points_actual'] = 0
    predict_player_points.loc[predict_player_points.mins_played_last > 60, 'predicted_points_actual'] += 2
    predict_player_points.loc[(predict_player_points.mins_played_last > 0) & (predict_player_points.mins_played_last <= 60), 'predicted_points_actual'] += 1

    predict_player_points.loc[is_gk_def, 'predicted_points_xg'] = (
        predict_player_points.cleen_sheet_prob * 4
        + predict_player_points.percentage_of_teams_xg_assists_last_5 * predict_player_points.scored * 3
        + predict_player_points.percentage_of_teams_xg_goals_last_5 * predict_player_points.scored * 6
    )
    predict_player_points.loc[is_mid, 'predicted_points_xg'] = (
        predict_player_points.cleen_sheet_prob * 1
        + predict_player_points.percentage_of_teams_xg_assists_last_5 * predict_player_points.scored * 3
        + predict_player_points.percentage_of_teams_xg_goals_last_5 * predict_player_points.scored * 5
    )
    predict_player_points.loc[is_fwd, 'predicted_points_xg'] = (
        predict_player_points.percentage_of_teams_xg_assists_last_5 * predict_player_points.scored * 3
        + predict_player_points.percentage_of_teams_xg_goals_last_5 * predict_player_points.scored * 4
    )

    predict_player_points.loc[predict_player_points.mins_played_last == 0, 'predicted_points_xg'] = 0
    predict_player_points.loc[predict_player_points.mins_played_last > 60, 'predicted_points_xg'] += 2
    predict_player_points.loc[(predict_player_points.mins_played_last > 0) & (predict_player_points.mins_played_last <= 60), 'predicted_points_xg'] += 1

    predict_player_points.dropna().to_csv(f'predicting_weekly_player_points/2018-19/game_week_player_points_{week}.csv',index=False)

With the predicted points for each player in every match, we have reached the final stage of the process. To complete the selectopm process, we need to consider player costs and adhere to FPL team selection criteria. Merely selecting players based on their predicted points is not feasible since we are limited by a budget of 100m and the best-performing players often have the highest values. In the next notebook, we will explore player value and how to devise a strategy to construct a squad that not only meets the FPL requirements but also maximizes the total points within the given budget and selection constraints.