# Fantasy Football - Selecting Squads

In this notebook, we incorporate player costs and FPL selection criteria to optimise squad selections.

In [1]:
import pandas as pd
import warnings
from functools import reduce
import itertools
import numpy as np
import sklearn.preprocessing as preprocessing
import sklearn.model_selection as model_selection
from sklearn import linear_model
from sklearn.model_selection import train_test_split, cross_val_score
from joblib import dump, load

In [2]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
warnings.filterwarnings("ignore")

## 1) Player Value

Selecting an FPL squad requires a delicate balance between the top-scoring players and the best-value players. Spending too much of your budget on several high-cost players (£10m+) will impede the point-scoring ability of the rest of your team, as you will be forced to fill the remaining spots with very cheap, low-scoring players in order to stay within the budget. However, choosing players purely on value for money, while disregarding the highly priced players who also score the most, will result in lower total points, as you will miss out on the handful of players who consistently outperform all others, even if they are slightly overpriced.

In this notebook, we will simulate the process of selecting our starting squad for the 2018-19 season. This initial team will remain relatively unchanged until we activate our first wildcard between game weeks 6 and 12. Consequently, our team selection will be based on the projected points for the first 10 game weeks of the 2018-19 season.

Firstly, we can examine value per pound by creating a column that divides each player's predicted points by their cost. This tells us how many points each player is expected to return for every unit of budget spent on them.
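
As a quick worked example of this ratio, the sketch below uses Trent Alexander-Arnold's figures from the tables later in this notebook. The `points_per_million` name is an illustrative addition and is not used elsewhere in the notebook; it only shows how the same ratio reads once the value is converted from units of £0.1m to £1m.

```python
# Figures taken from the tables in this notebook; value is stored in units of £0.1m.
predicted_points = 52.214529   # Trent Alexander-Arnold, game weeks 1-10
value = 50                     # i.e. £5.0m

# The ratio used in this notebook: predicted points per unit of value (£0.1m).
points_per_pound = predicted_points / value

# Illustrative only: the same ratio expressed per £1m.
points_per_million = predicted_points / (value / 10)

print(round(points_per_pound, 6))    # 1.044291
print(round(points_per_million, 2))  # 10.44
```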

In [3]:
game_week_1 = pd.read_csv(r'data/game_week_player_points_1.csv')[['team', 'player', 'number', 'pos', 'predicted_points_xg', 'fpl_game_week_y']]

In [4]:
game_week_1.tail()

Unnamed: 0,team,player,number,pos,predicted_points_xg,fpl_game_week_y
12498,Southampton,Wesley Hoedt,6.0,DEF,4.487624,34
12499,Southampton,Wesley Hoedt,6.0,DEF,5.97615,35
12500,Southampton,Wesley Hoedt,6.0,DEF,4.16188,36
12501,Southampton,Wesley Hoedt,6.0,DEF,3.644146,37
12502,Southampton,Wesley Hoedt,6.0,DEF,4.449314,38


We filter out game weeks after game week 10 and group by player to get each player's total predicted points for game weeks 1 to 10.

In [5]:
point_predictions = (game_week_1.
                     query('fpl_game_week_y <11')
                     .groupby('player', as_index=False)
                     .agg({'player': 'first', 'team': 'first', 'number': 'first', 'pos': 'first', 'predicted_points_xg': 'sum'}))
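
The same filter-then-aggregate step can also be written with pandas named aggregation, which avoids listing the grouping column inside `agg`. This is a minimal sketch on made-up data (the players, teams, and numbers below are hypothetical), not the notebook's actual dataset.

```python
import pandas as pd

# Hypothetical per-game-week predictions (made-up numbers).
game_weeks = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B", "B"],
    "team": ["X", "X", "X", "Y", "Y", "Y"],
    "pos": ["DEF", "DEF", "DEF", "FWD", "FWD", "FWD"],
    "predicted_points_xg": [4.0, 5.0, 3.0, 2.0, 6.0, 1.0],
    "fpl_game_week_y": [9, 10, 11, 9, 10, 11],
})

totals = (
    game_weeks
    .query("fpl_game_week_y <= 10")      # keep game weeks 1-10 only
    .groupby("player", as_index=False)   # 'player' stays as a regular column
    .agg(
        team=("team", "first"),
        pos=("pos", "first"),
        predicted_points_xg=("predicted_points_xg", "sum"),
    )
)
print(totals)
```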

For the initial team selection of the season, we rely on data from the end of the previous season. However, teams that have recently been promoted do not have historical data available. Therefore, we assign estimated predicted goals scored and conceded to these newly promoted teams. Consequently, we do not consider players from newly promoted teams in our selection process, as their performance is solely based on estimated data.

As the table below shows, a Cardiff City player is predicted to be the fourth-highest-scoring player over the first 10 game weeks, based on form leading into the season. However, since Cardiff City were promoted this season, we exclude this player from consideration. Players we ignore, either because their team was promoted or because they have moved to another team but still appear in our selection pool (we are using last season's data), are assigned a 'number' value of 0.0.

In [6]:
point_predictions.sort_values('predicted_points_xg', ascending=False).head(10)

Unnamed: 0,player,team,number,pos,predicted_points_xg
329,Xherdan Shaqiri,Liverpool,23.0,MID,68.458391
234,Mohamed Salah,Liverpool,11.0,MID,61.024615
282,Sadio Mané,Liverpool,10.0,MID,57.438911
249,Oumar Niasse,Cardiff City,0.0,FWD,52.958133
304,Séamus Coleman,Everton,23.0,DEF,52.593695
312,Trent Alexander-Arnold,Liverpool,66.0,DEF,52.214529
253,Patrick van Aanholt,Crystal Palace,3.0,DEF,50.186124
116,Gabriel Jesus,Manchester City,33.0,FWD,49.729179
126,Harry Kane,Tottenham,10.0,FWD,49.23973
173,Joshua King,Bournemouth,17.0,FWD,47.557564


Next, we get player values for the first game week from the FPL data, using squad numbers to merge them with our player data. Note that values are expressed in units of £100,000: for example, 55 represents £5.5 million.
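
Because `pd.merge` defaults to an inner join, players whose team and squad number have no match in the FPL data (such as those assigned number 0.0) drop out of the merged table. The sketch below, on hypothetical data, uses a left join with `indicator=True` to list which players would be dropped; the player names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical predictions; number 0.0 marks a player we want to exclude.
predictions = pd.DataFrame({
    "player": ["Player A", "Player B"],
    "team": ["Liverpool", "Cardiff City"],
    "number": [11.0, 0.0],
})
# Hypothetical FPL values keyed by team and squad number.
values = pd.DataFrame({
    "team": ["Liverpool", "Cardiff City"],
    "squad_number": [11.0, 9.0],
    "value": [130, 50],
})

# A left join keeps every prediction row; '_merge' records whether a match was found.
checked = pd.merge(
    predictions, values, how="left",
    left_on=["number", "team"], right_on=["squad_number", "team"],
    indicator=True,
)
unmatched = checked.loc[checked["_merge"] == "left_only", "player"].tolist()
print(unmatched)  # ['Player B']
```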

In [7]:
fpl_data = (pd.read_csv(r"data/fpl_values.csv")[['team', 'date', 'squad_number', 'value', 'GW']]
            .astype({'squad_number': 'float'})
            .drop_duplicates(subset=['team', 'squad_number'], keep='first'))

In [8]:
fpl_data['team'] = (fpl_data['team'].
                    str.replace('Man City', 'Manchester City')
                    .str.replace('Leicester', 'Leicester City')
                    .str.replace('Spurs', 'Tottenham')
                    .str.replace('Cardiff', 'Cardiff City')
                    .str.replace('Newcastle', 'Newcastle Utd')
                    .str.replace('Man Utd', 'Manchester Utd'))
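
The chained `str.replace` calls above match substrings, so running the cell twice, or meeting a name that is already in its long form, would produce values such as 'Cardiff City City'. An exact-match mapping is one possible alternative; the sketch below is illustrative and assumes the short names are exactly those used in the cell above.

```python
import pandas as pd

# Short FPL names mapped to the names used in our prediction data.
team_names = {
    "Man City": "Manchester City",
    "Leicester": "Leicester City",
    "Spurs": "Tottenham",
    "Cardiff": "Cardiff City",
    "Newcastle": "Newcastle Utd",
    "Man Utd": "Manchester Utd",
}

teams = pd.Series(["Man City", "Cardiff", "West Ham", "Cardiff City"])

# Series.replace with a dict only swaps values that match a key exactly.
harmonised = teams.replace(team_names)
print(harmonised.tolist())
# ['Manchester City', 'Cardiff City', 'West Ham', 'Cardiff City']

# For contrast: substring replacement changes an already-correct name.
doubled = pd.Series(["Cardiff City"]).str.replace("Cardiff", "Cardiff City", regex=False)
print(doubled.tolist())  # ['Cardiff City City']
```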

In [9]:
fpl_data.head()

Unnamed: 0,team,date,squad_number,value,GW
0,West Ham,12/08/2018,3.0,55,1
38,West Ham,12/08/2018,13.0,45,1
76,West Ham,12/08/2018,20.0,70,1
114,West Ham,12/08/2018,9.0,55,1
152,West Ham,12/08/2018,21.0,45,1


In [10]:
predict_player_points_value = (pd.merge(point_predictions.astype({'number': 'float'}), fpl_data,
                                        left_on=['number', 'team'], right_on=['squad_number', 'team']))

In [11]:
predict_player_points_value.sort_values('predicted_points_xg', ascending=False).head(10)

Unnamed: 0,player,team,number,pos,predicted_points_xg,date,squad_number,value,GW
201,Xherdan Shaqiri,Liverpool,23.0,MID,68.458391,12/08/2018,23.0,75,1
143,Mohamed Salah,Liverpool,11.0,MID,61.024615,12/08/2018,11.0,130,1
171,Sadio Mané,Liverpool,10.0,MID,57.438911,12/08/2018,10.0,95,1
185,Séamus Coleman,Everton,23.0,DEF,52.593695,11/08/2018,23.0,55,1
188,Trent Alexander-Arnold,Liverpool,66.0,DEF,52.214529,12/08/2018,66.0,50,1
153,Patrick van Aanholt,Crystal Palace,3.0,DEF,50.186124,11/08/2018,3.0,55,1
72,Gabriel Jesus,Manchester City,33.0,FWD,49.729179,12/08/2018,33.0,105,1
78,Harry Kane,Tottenham,10.0,FWD,49.23973,11/08/2018,10.0,125,1
106,Joshua King,Bournemouth,17.0,FWD,47.557564,11/08/2018,17.0,65,1
198,Wilfried Zaha,Crystal Palace,11.0,FWD,47.054694,11/08/2018,11.0,70,1


Computing points per pound by dividing each player's predicted points by their value.

In [12]:
predict_player_points_value['points_per_pound'] = predict_player_points_value.predicted_points_xg / predict_player_points_value.value

The ten players predicted to offer the best value:

In [13]:
predict_player_points_value.sort_values('points_per_pound', ascending=False).head(10)

Unnamed: 0,player,team,number,pos,predicted_points_xg,date,squad_number,value,GW,points_per_pound
188,Trent Alexander-Arnold,Liverpool,66.0,DEF,52.214529,12/08/2018,66.0,50,1,1.044291
185,Séamus Coleman,Everton,23.0,DEF,52.593695,11/08/2018,23.0,55,1,0.956249
91,James Tomkins,Crystal Palace,5.0,DEF,42.204348,11/08/2018,5.0,45,1,0.937874
201,Xherdan Shaqiri,Liverpool,23.0,MID,68.458391,12/08/2018,23.0,75,1,0.912779
153,Patrick van Aanholt,Crystal Palace,3.0,DEF,50.186124,11/08/2018,3.0,55,1,0.912475
140,Michael Keane,Everton,4.0,DEF,42.72311,11/08/2018,4.0,50,1,0.854462
7,Ainsley Maitland-Niles,Arsenal,15.0,DEF,37.773845,12/08/2018,15.0,45,1,0.839419
51,Cédric Soares,Southampton,2.0,DEF,37.577493,12/08/2018,2.0,45,1,0.835055
41,Cheikhou Kouyaté,Crystal Palace,8.0,MID,41.065864,11/08/2018,8.0,50,1,0.821317
196,Wesley Hoedt,Southampton,6.0,DEF,36.710241,12/08/2018,6.0,45,1,0.815783


To obtain two final scores, 'points' and 'value', each between 0 and 1 for every player available for selection, we rescale the total predicted points and the points per pound using min-max scaling, which requires the minimum and maximum of each column.

Min-max scaling maps the values onto a common range while preserving their relative differences.
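
Min-max scaling can be sketched as a small helper. The function name `min_max_scale` and the sample numbers are illustrative assumptions. Note that the cells below round the minimum and maximum to three decimal places before scaling, which is why the top scores come out marginally above 1 (for example 1.000006); using the unrounded values, as here, keeps every score within [0, 1].

```python
import pandas as pd

def min_max_scale(series: pd.Series) -> pd.Series:
    """Rescale a numeric series to the range [0, 1] (assumes max > min)."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo)

# Hypothetical predicted point totals.
points = pd.Series([0.0, 34.47, 52.59, 68.46])
scaled = min_max_scale(points)
print(scaled.round(3).tolist())
```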

In [14]:
min_predicted_points = predict_player_points_value.predicted_points_xg.min().round(3)
min_predicted_points

0.0

In [15]:
max_predicted_points = predict_player_points_value.query('predicted_points_xg>0').predicted_points_xg.max().round(3)
max_predicted_points

68.458

In [16]:
predict_player_points_value['points_score'] = ((predict_player_points_value['predicted_points_xg'] - min_predicted_points) / (max_predicted_points-min_predicted_points))

In [17]:
predict_player_points_value.sort_values('points_score', ascending=False).head(10)

Unnamed: 0,player,team,number,pos,predicted_points_xg,date,squad_number,value,GW,points_per_pound,points_score
201,Xherdan Shaqiri,Liverpool,23.0,MID,68.458391,12/08/2018,23.0,75,1,0.912779,1.000006
143,Mohamed Salah,Liverpool,11.0,MID,61.024615,12/08/2018,11.0,130,1,0.46942,0.891417
171,Sadio Mané,Liverpool,10.0,MID,57.438911,12/08/2018,10.0,95,1,0.60462,0.839039
185,Séamus Coleman,Everton,23.0,DEF,52.593695,11/08/2018,23.0,55,1,0.956249,0.768262
188,Trent Alexander-Arnold,Liverpool,66.0,DEF,52.214529,12/08/2018,66.0,50,1,1.044291,0.762724
153,Patrick van Aanholt,Crystal Palace,3.0,DEF,50.186124,11/08/2018,3.0,55,1,0.912475,0.733094
72,Gabriel Jesus,Manchester City,33.0,FWD,49.729179,12/08/2018,33.0,105,1,0.473611,0.726419
78,Harry Kane,Tottenham,10.0,FWD,49.23973,11/08/2018,10.0,125,1,0.393918,0.719269
106,Joshua King,Bournemouth,17.0,FWD,47.557564,11/08/2018,17.0,65,1,0.731655,0.694697
198,Wilfried Zaha,Crystal Palace,11.0,FWD,47.054694,11/08/2018,11.0,70,1,0.67221,0.687351


In [18]:
min_points_per = predict_player_points_value.points_per_pound.min().round(3)
min_points_per

0.0

In [19]:
max_points_per = predict_player_points_value.query('points_per_pound>0').points_per_pound.max().round(3)
max_points_per

1.044

In [20]:
predict_player_points_value['value_score'] = ((predict_player_points_value['points_per_pound'] - min_points_per) / (max_points_per-min_points_per))

In [21]:
predict_player_points_value.drop(columns=['number', 'date', 'GW', 'points_per_pound'], inplace=True)

In [22]:
predict_player_points_value.sort_values('value_score', ascending=False).head(10)

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score
188,Trent Alexander-Arnold,Liverpool,DEF,52.214529,66.0,50,0.762724,1.000278
185,Séamus Coleman,Everton,DEF,52.593695,23.0,55,0.768262,0.915947
91,James Tomkins,Crystal Palace,DEF,42.204348,5.0,45,0.6165,0.898347
201,Xherdan Shaqiri,Liverpool,MID,68.458391,23.0,75,1.000006,0.874309
153,Patrick van Aanholt,Crystal Palace,DEF,50.186124,3.0,55,0.733094,0.874018
140,Michael Keane,Everton,DEF,42.72311,4.0,50,0.624078,0.81845
7,Ainsley Maitland-Niles,Arsenal,DEF,37.773845,15.0,45,0.551781,0.804041
51,Cédric Soares,Southampton,DEF,37.577493,2.0,45,0.548913,0.799862
41,Cheikhou Kouyaté,Crystal Palace,MID,41.065864,8.0,50,0.599869,0.786702
196,Wesley Hoedt,Southampton,DEF,36.710241,6.0,45,0.536245,0.781401


## 2) Positions and Teams

When selecting FPL squads, there are certain criteria that must be adhered to. These criteria are as follows:

- The squad must consist of a total of 15 players, with 11 starters and 4 substitutes.
- Among the 15 players, there must be 2 goalkeepers, 5 defenders, 5 midfielders, and 3 forwards.
- The starting 11 must include exactly 1 goalkeeper and at least 3 defenders, 2 midfielders, and 1 forward.
- There can be a maximum of 3 players from any single team in the squad.
- The total cost of the squad must not exceed £100 million.
- One player needs to be assigned as the captain, who will receive double points each week.
- One player should be assigned as the vice-captain, who will receive double points if the captain does not play.
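
The squad-level rules above can be expressed as a simple check. This is a minimal sketch, assuming a DataFrame with `pos`, `team`, and `value` columns like the ones used in this notebook; the function name `squad_rule_violations` and the example squad are hypothetical.

```python
import pandas as pd

REQUIRED_POSITIONS = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}
MAX_PER_TEAM = 3
BUDGET = 1000  # £100.0m expressed in units of £0.1m

def squad_rule_violations(squad: pd.DataFrame) -> list:
    """Return a list of broken squad rules (an empty list means the squad is valid)."""
    problems = []
    if len(squad) != 15:
        problems.append(f"squad has {len(squad)} players, expected 15")
    counts = squad["pos"].value_counts()
    for pos, required in REQUIRED_POSITIONS.items():
        if counts.get(pos, 0) != required:
            problems.append(f"{pos}: {counts.get(pos, 0)} selected, expected {required}")
    per_team = squad["team"].value_counts()
    for team in per_team[per_team > MAX_PER_TEAM].index:
        problems.append(f"more than {MAX_PER_TEAM} players from {team}")
    if squad["value"].sum() > BUDGET:
        problems.append("squad exceeds the budget")
    return problems

# Hypothetical squad: 3 players from each of 5 teams, all priced at 60 (£6.0m).
example = pd.DataFrame({
    "pos": ["GK"] * 2 + ["DEF"] * 5 + ["MID"] * 5 + ["FWD"] * 3,
    "team": [f"Team {i % 5}" for i in range(15)],
    "value": [60] * 15,
})
print(squad_rule_violations(example))  # []

# Putting every player in the same team breaks the 3-per-team rule.
too_many = example.assign(team="Team 0")
print(squad_rule_violations(too_many))  # ['more than 3 players from Team 0']
```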

Different positions tend to cost different amounts. Forwards are generally the most expensive, followed by midfielders, then defenders and goalkeepers.

In [23]:
predict_player_points_value[predict_player_points_value.pos == 'GK'].value.mean()

47.5

In [24]:
predict_player_points_value[predict_player_points_value.pos == 'DEF'].value.mean()

50.138888888888886

In [25]:
predict_player_points_value[predict_player_points_value.pos == 'MID'].value.mean()

61.845238095238095

In [26]:
predict_player_points_value[predict_player_points_value.pos == 'FWD'].value.mean()

71.52777777777777

However, the average predicted point totals do not follow the same pattern: defenders have the highest average, followed by midfielders, forwards, and goalkeepers.

In [27]:
predict_player_points_value[predict_player_points_value.pos == 'GK'].predicted_points_xg.mean()

21.209913171655277

In [28]:
predict_player_points_value[predict_player_points_value.pos == 'DEF'].predicted_points_xg.mean()

27.424820632466137

In [29]:
predict_player_points_value[predict_player_points_value.pos == 'MID'].predicted_points_xg.mean()

26.738885013681813

In [30]:
predict_player_points_value[predict_player_points_value.pos == 'FWD'].predicted_points_xg.mean()

24.111668601201043
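
The eight per-position averages above could also be produced in a single `groupby` call. The sketch below uses made-up data with the same column names; it is an illustration, not a reproduction of the figures above.

```python
import pandas as pd

# Hypothetical player pool (made-up values and predictions).
players = pd.DataFrame({
    "pos": ["GK", "GK", "DEF", "DEF", "MID", "FWD"],
    "value": [45, 50, 50, 55, 75, 105],
    "predicted_points_xg": [20.0, 30.0, 40.0, 50.0, 45.0, 35.0],
})

# Mean cost and mean predicted points per position, cheapest position first.
by_position = (
    players
    .groupby("pos")[["value", "predicted_points_xg"]]
    .mean()
    .sort_values("value")
)
print(by_position)
```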

Our squad selection will be based on the following principles:

- We will prioritize selecting the top predicted scorers for each position as our premium picks. These players are expected to contribute significantly to our team's overall score.
- For the substitute goalkeeper position, we will choose the most affordable option available, as this player will almost never be needed.
- When selecting substitute players for the defender, midfielder, and forward positions, we will choose the best-performing players priced at or below £5.0m, £5.5m, and £6.0m respectively. This approach allows us to maximize the value of our squad while maintaining depth in these positions.
- The player with the highest total predicted points for each game will be designated as our captain.

After making these selections, we will have filled 8 of the 15 spots in our squad. The remaining 7 players will be chosen by comparing the remaining selection pool in terms of cost, predicted points, and value.
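
The premium and substitute picks described above follow one pattern: filter by position, optionally cap the price, and take the highest predicted scorer. A minimal sketch of that pattern follows; the helper name `top_pick` and the player pool are hypothetical.

```python
import pandas as pd

def top_pick(pool: pd.DataFrame, pos: str, max_value=None) -> pd.DataFrame:
    """Highest predicted scorer for a position, optionally capped at max_value (units of £0.1m)."""
    candidates = pool[pool["pos"] == pos]
    if max_value is not None:
        candidates = candidates[candidates["value"] <= max_value]
    return candidates.sort_values("predicted_points_xg", ascending=False).head(1)

# Hypothetical defenders.
pool = pd.DataFrame({
    "player": ["Def A", "Def B", "Def C"],
    "pos": ["DEF", "DEF", "DEF"],
    "value": [65, 50, 45],
    "predicted_points_xg": [55.0, 48.0, 40.0],
})

premium = top_pick(pool, "DEF")                 # no price cap
budget_sub = top_pick(pool, "DEF", max_value=50)  # capped at £5.0m
print(premium["player"].iloc[0], budget_sub["player"].iloc[0])  # Def A Def B
```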

## 3) Selecting Squad

Selecting squads for the beginning of the season is inherently less accurate compared to selecting teams later in the season. This is primarily due to the limited availability of data, which restricts our ability to make accurate predictions. In this case, we rely on the last five games from the previous season as a reference point.

However, there are several reasons why this method may not yield accurate results. Firstly, towards the end of a season, games can often be less meaningful for teams that are not competing for titles, relegation, or European spots. As a result, players may be rested or not give their maximum effort, leading to unpredictable performances.

Additionally, teams undergo significant changes over the summer, including managerial changes, transfers in and out, and other factors that can alter team dynamics and performance. These changes make it challenging to accurately predict player performance based solely on the previous season's data.

To address these issues and obtain a more accurate team, we use our first wildcard relatively quickly. The wildcard allows us to make unlimited transfers within a single game week, enabling us to incorporate updated data and update our team accordingly.

In [31]:
starting_gk = predict_player_points_value[predict_player_points_value.pos == 'GK'].sort_values(by='predicted_points_xg', ascending=False)[0:1]

In [32]:
starting_gk

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score
81,Hugo Lloris,Tottenham,GK,34.470065,1.0,55,0.503521,0.600315


In [33]:
premium_def = predict_player_points_value[predict_player_points_value.pos == 'DEF'].sort_values(by='predicted_points_xg', ascending=False)[0:1]

In [34]:
premium_def

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score
185,Séamus Coleman,Everton,DEF,52.593695,23.0,55,0.768262,0.915947


In [35]:
premium_mid = predict_player_points_value[predict_player_points_value.pos == 'MID'].sort_values(by='predicted_points_xg', ascending=False)[0:1]

In [36]:
premium_mid

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score
201,Xherdan Shaqiri,Liverpool,MID,68.458391,23.0,75,1.000006,0.874309


In [37]:
premium_fwd = predict_player_points_value[predict_player_points_value.pos == 'FWD'].sort_values(by='predicted_points_xg', ascending=False)[0:1]

In [38]:
premium_fwd

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score
72,Gabriel Jesus,Manchester City,FWD,49.729179,33.0,105,0.726419,0.453651


In [39]:
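# Note: this picks the goalkeeper with the lowest predicted points, which is not necessarily the cheapest;
# sorting by 'value' instead would follow the "most affordable" principle directly.
sub_gk = predict_player_points_value[predict_player_points_value.pos == 'GK'].sort_values(by='predicted_points_xg', ascending=True)[0:1]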

In [40]:
sub_gk

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score
30,Ben Hamer,Huddersfield,GK,0.0,12.0,40,0.0,0.0


In [41]:
sub_def = predict_player_points_value[(predict_player_points_value.pos == 'DEF') & (predict_player_points_value.value <= 50)].sort_values(by='predicted_points_xg', ascending=False)[0:1]

In [42]:
sub_def

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score
188,Trent Alexander-Arnold,Liverpool,DEF,52.214529,66.0,50,0.762724,1.000278


In [43]:
sub_mid = predict_player_points_value[(predict_player_points_value.pos == 'MID') & (predict_player_points_value.value <= 55)].sort_values(by='predicted_points_xg', ascending=False)[0:1]

In [44]:
sub_mid

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score
168,Ruben Loftus-Cheek,Chelsea,MID,42.989804,12.0,55,0.627973,0.74869


In [45]:
sub_fwd = predict_player_points_value[(predict_player_points_value.pos == 'FWD') & (predict_player_points_value.value <= 60)].sort_values(by='predicted_points_xg', ascending=False)[0:1]

In [46]:
sub_fwd

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score
35,Callum Wilson,Bournemouth,FWD,40.601025,13.0,60,0.593079,0.648165


In [47]:
squad = pd.concat([starting_gk, premium_def, premium_mid, premium_fwd, sub_gk, sub_def, sub_mid, sub_fwd])

In [48]:
squad

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score
81,Hugo Lloris,Tottenham,GK,34.470065,1.0,55,0.503521,0.600315
185,Séamus Coleman,Everton,DEF,52.593695,23.0,55,0.768262,0.915947
201,Xherdan Shaqiri,Liverpool,MID,68.458391,23.0,75,1.000006,0.874309
72,Gabriel Jesus,Manchester City,FWD,49.729179,33.0,105,0.726419,0.453651
30,Ben Hamer,Huddersfield,GK,0.0,12.0,40,0.0,0.0
188,Trent Alexander-Arnold,Liverpool,DEF,52.214529,66.0,50,0.762724,1.000278
168,Ruben Loftus-Cheek,Chelsea,MID,42.989804,12.0,55,0.627973,0.74869
35,Callum Wilson,Bournemouth,FWD,40.601025,13.0,60,0.593079,0.648165


In [49]:
squad.value.sum()

495

In [50]:
budget = 1000 - squad.value.sum()

In [51]:
budget/ 7

72.14285714285714

In [52]:
budget

505

We have £50.5 million left to spend on our 7 remaining players, an average of about £7.2 million each.

Adding the points and value scores gives a combined score between 0 and 2.

In [53]:
predict_player_points_value['combined_score'] = predict_player_points_value['points_score'] + predict_player_points_value['value_score']

In [54]:
predict_player_points_value.sort_values('combined_score', ascending=False).head(20)

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score,combined_score
201,Xherdan Shaqiri,Liverpool,MID,68.458391,23.0,75,1.000006,0.874309,1.874315
188,Trent Alexander-Arnold,Liverpool,DEF,52.214529,66.0,50,0.762724,1.000278,1.763002
185,Séamus Coleman,Everton,DEF,52.593695,23.0,55,0.768262,0.915947,1.68421
153,Patrick van Aanholt,Crystal Palace,DEF,50.186124,3.0,55,0.733094,0.874018,1.607112
91,James Tomkins,Crystal Palace,DEF,42.204348,5.0,45,0.6165,0.898347,1.514847
140,Michael Keane,Everton,DEF,42.72311,4.0,50,0.624078,0.81845,1.442528
117,Leighton Baines,Everton,DEF,44.326025,3.0,55,0.647492,0.771961,1.419454
171,Sadio Mané,Liverpool,MID,57.438911,10.0,95,0.839039,0.579138,1.418177
106,Joshua King,Bournemouth,FWD,47.557564,17.0,65,0.694697,0.700819,1.395516
41,Cheikhou Kouyaté,Crystal Palace,MID,41.065864,8.0,50,0.599869,0.786702,1.386572


As we have a good amount of budget left over, we can add a few more premium picks to our team.

In [55]:
predict_player_points_value.sort_values('points_score', ascending=False).head(25)

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score,combined_score
201,Xherdan Shaqiri,Liverpool,MID,68.458391,23.0,75,1.000006,0.874309,1.874315
143,Mohamed Salah,Liverpool,MID,61.024615,11.0,130,0.891417,0.449636,1.341053
171,Sadio Mané,Liverpool,MID,57.438911,10.0,95,0.839039,0.579138,1.418177
185,Séamus Coleman,Everton,DEF,52.593695,23.0,55,0.768262,0.915947,1.68421
188,Trent Alexander-Arnold,Liverpool,DEF,52.214529,66.0,50,0.762724,1.000278,1.763002
153,Patrick van Aanholt,Crystal Palace,DEF,50.186124,3.0,55,0.733094,0.874018,1.607112
72,Gabriel Jesus,Manchester City,FWD,49.729179,33.0,105,0.726419,0.453651,1.180069
78,Harry Kane,Tottenham,FWD,49.23973,10.0,125,0.719269,0.377316,1.096585
106,Joshua King,Bournemouth,FWD,47.557564,17.0,65,0.694697,0.700819,1.395516
198,Wilfried Zaha,Crystal Palace,FWD,47.054694,11.0,70,0.687351,0.643879,1.331231


In [56]:
player_1 = predict_player_points_value[predict_player_points_value.player == 'Mohamed Salah']

In [57]:
player_1

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score,combined_score
143,Mohamed Salah,Liverpool,MID,61.024615,11.0,130,0.891417,0.449636,1.341053


In [58]:
player_2 = predict_player_points_value[predict_player_points_value.player == 'Christian Eriksen']

In [59]:
player_2

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score,combined_score
46,Christian Eriksen,Tottenham,MID,45.747039,23.0,95,0.66825,0.461253,1.129502


In [60]:
player_3 = predict_player_points_value[predict_player_points_value.player == 'Riyad Mahrez']

In [61]:
player_3

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score,combined_score
163,Riyad Mahrez,Manchester City,MID,44.886818,26.0,90,0.655684,0.477723,1.133407


In [62]:
squad = pd.concat([squad, player_1, player_2, player_3])

In [63]:
squad

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score,combined_score
81,Hugo Lloris,Tottenham,GK,34.470065,1.0,55,0.503521,0.600315,
185,Séamus Coleman,Everton,DEF,52.593695,23.0,55,0.768262,0.915947,
201,Xherdan Shaqiri,Liverpool,MID,68.458391,23.0,75,1.000006,0.874309,
72,Gabriel Jesus,Manchester City,FWD,49.729179,33.0,105,0.726419,0.453651,
30,Ben Hamer,Huddersfield,GK,0.0,12.0,40,0.0,0.0,
188,Trent Alexander-Arnold,Liverpool,DEF,52.214529,66.0,50,0.762724,1.000278,
168,Ruben Loftus-Cheek,Chelsea,MID,42.989804,12.0,55,0.627973,0.74869,
35,Callum Wilson,Bournemouth,FWD,40.601025,13.0,60,0.593079,0.648165,
143,Mohamed Salah,Liverpool,MID,61.024615,11.0,130,0.891417,0.449636,1.341053
46,Christian Eriksen,Tottenham,MID,45.747039,23.0,95,0.66825,0.461253,1.129502


In [64]:
budget = 1000 - squad.value.sum()

In [65]:
budget

190

In [66]:
budget / 4

47.5

To complete our squad, we need one more forward (FWD) and three more defenders (DEF). Since Trent Alexander-Arnold, currently our substitute defender, is predicted to be the second-highest-scoring defender, he clearly belongs in our starting lineup. Consequently, we need an affordable alternative to serve as our substitute defender. We will also start Callum Wilson as one of our forwards, so we need a cheap substitute FWD too.

In [67]:
predict_player_points_value[(predict_player_points_value.pos == 'FWD')].sort_values(by='value', ascending=True).head()

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score,combined_score
105,Joselu,Newcastle Utd,FWD,11.208263,21.0,50,0.163725,0.214718,0.378442
11,Alexander Sørloth,Crystal Palace,FWD,0.0,9.0,50,0.0,0.0,0.0
123,Lys Mousset,Bournemouth,FWD,13.625995,9.0,50,0.199042,0.261034,0.460076
116,Laurent Depoitre,Huddersfield,FWD,17.478112,20.0,55,0.255311,0.304391,0.559702
57,Danny Ings,Southampton,FWD,16.408032,9.0,55,0.23968,0.285755,0.525435


In [68]:
fwd = predict_player_points_value[predict_player_points_value.player == 'Lys Mousset']

In [69]:
predict_player_points_value[(predict_player_points_value.pos == 'DEF')].sort_values(by='predicted_points_xg', ascending=False).head(15)

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score,combined_score
185,Séamus Coleman,Everton,DEF,52.593695,23.0,55,0.768262,0.915947,1.68421
188,Trent Alexander-Arnold,Liverpool,DEF,52.214529,66.0,50,0.762724,1.000278,1.763002
153,Patrick van Aanholt,Crystal Palace,DEF,50.186124,3.0,55,0.733094,0.874018,1.607112
117,Leighton Baines,Everton,DEF,44.326025,3.0,55,0.647492,0.771961,1.419454
52,César Azpilicueta,Chelsea,DEF,42.921317,28.0,65,0.626973,0.632498,1.259471
140,Michael Keane,Everton,DEF,42.72311,4.0,50,0.624078,0.81845,1.442528
91,James Tomkins,Crystal Palace,DEF,42.204348,5.0,45,0.6165,0.898347,1.514847
25,Aymeric Laporte,Manchester City,DEF,40.335371,14.0,55,0.589199,0.702462,1.291661
15,Andrew Robertson,Liverpool,DEF,39.883206,26.0,60,0.582594,0.636705,1.219299
122,Luke Shaw,Manchester Utd,DEF,39.63684,23.0,50,0.578995,0.759326,1.338321


In [70]:
def_1 = predict_player_points_value[predict_player_points_value.player == 'Michael Keane']

In [71]:
def_1

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score,combined_score
140,Michael Keane,Everton,DEF,42.72311,4.0,50,0.624078,0.81845,1.442528


In [72]:
def_2 = predict_player_points_value[predict_player_points_value.player == 'Ainsley Maitland-Niles']

In [73]:
def_2

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score,combined_score
7,Ainsley Maitland-Niles,Arsenal,DEF,37.773845,15.0,45,0.551781,0.804041,1.355822


In [74]:
def_3 = predict_player_points_value[predict_player_points_value.player == 'James Tomkins']

In [75]:
def_3

Unnamed: 0,player,team,pos,predicted_points_xg,squad_number,value,points_score,value_score,combined_score
91,James Tomkins,Crystal Palace,DEF,42.204348,5.0,45,0.6165,0.898347,1.514847


In [76]:
squad = pd.concat([squad, fwd, def_1, def_2, def_3])

In [77]:
squad[['player', 'team', 'pos', 'value', 'predicted_points_xg']]

Unnamed: 0,player,team,pos,value,predicted_points_xg
81,Hugo Lloris,Tottenham,GK,55,34.470065
185,Séamus Coleman,Everton,DEF,55,52.593695
201,Xherdan Shaqiri,Liverpool,MID,75,68.458391
72,Gabriel Jesus,Manchester City,FWD,105,49.729179
30,Ben Hamer,Huddersfield,GK,40,0.0
188,Trent Alexander-Arnold,Liverpool,DEF,50,52.214529
168,Ruben Loftus-Cheek,Chelsea,MID,55,42.989804
35,Callum Wilson,Bournemouth,FWD,60,40.601025
143,Mohamed Salah,Liverpool,MID,130,61.024615
46,Christian Eriksen,Tottenham,MID,95,45.747039


In [78]:
budget = 1000 - squad.value.sum()

In [79]:
budget

0

Our final squad adheres to all FPL selection requirements, and we have decided to play a 4-4-2 formation. Here is our squad composition:

#### Starting 11:

- Goalkeeper: Hugo Lloris
- Defenders: Séamus Coleman, Trent Alexander-Arnold, Michael Keane, James Tomkins
- Midfielders: Xherdan Shaqiri, Mohamed Salah, Christian Eriksen, Riyad Mahrez
- Forwards: Gabriel Jesus, Callum Wilson

#### Substitutes:

- Substitute Goalkeeper: Ben Hamer
- Substitute Forward: Lys Mousset
- Substitute Defender: Ainsley Maitland-Niles
- Substitute Midfielder: Ruben Loftus-Cheek
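
As a sanity check on the lineup above, the sketch below verifies the starting-11 rules (exactly 1 goalkeeper, at least 3 defenders, 2 midfielders, and 1 forward), derives the formation, and confirms the total cost. Player positions and values are copied from the tables in this notebook; the variable names are illustrative.

```python
from collections import Counter

# (player, position, value in units of £0.1m), as listed in this notebook.
starting_11 = [
    ("Hugo Lloris", "GK", 55),
    ("Séamus Coleman", "DEF", 55), ("Trent Alexander-Arnold", "DEF", 50),
    ("Michael Keane", "DEF", 50), ("James Tomkins", "DEF", 45),
    ("Xherdan Shaqiri", "MID", 75), ("Mohamed Salah", "MID", 130),
    ("Christian Eriksen", "MID", 95), ("Riyad Mahrez", "MID", 90),
    ("Gabriel Jesus", "FWD", 105), ("Callum Wilson", "FWD", 60),
]
# Values of the four substitutes: goalkeeper, forward, defender, midfielder.
bench_cost = 40 + 50 + 45 + 55

counts = Counter(pos for _, pos, _ in starting_11)
valid_lineup = (
    len(starting_11) == 11
    and counts["GK"] == 1
    and counts["DEF"] >= 3
    and counts["MID"] >= 2
    and counts["FWD"] >= 1
)
formation = f"{counts['DEF']}-{counts['MID']}-{counts['FWD']}"
total_cost = sum(value for _, _, value in starting_11) + bench_cost

print(valid_lineup, formation, total_cost)  # True 4-4-2 1000
```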

Other aspects of FPL, such as substitutes and bonus tokens, also require careful consideration, but they will be explored in a future notebook. 

## Results

Simulating past FPL seasons accurately poses several challenges, primarily due to the unavailability of injury data. In FPL, when a player is injured and unlikely to participate in future games, they are flagged with a red or yellow header, indicating the severity of the injury and a potential return date. This information is crucial for making substitutions and utilising bonus cards effectively, as it helps in replacing injured or suspended players in the team. However, the absence of detailed game-to-game injury data makes it difficult to accurately simulate past seasons. Relying solely on the "minutes played" data point gives no insight into the reason behind a player's absence from a match: it could be a tactical decision made by the manager, or it could be due to an injury or suspension.

Despite these challenges, I have simulated three past Premier League seasons using the methodology outlined in this project (team and player names were also replaced with random IDs to further prevent bias). The simulated results surpassed my actual point totals in all three seasons, which suggests the approach is effective. I plan to apply the same methodology to the upcoming FPL season (2023/24) and will share the results at the conclusion of the season.

- Season 2018-19 - Actual final points = 1,891, Simulated final points = 2,032
- Season 2019-20 - Actual final points = 1,919, Simulated final points = 2,243
- Season 2020-21 - Actual final points = 1,700, Simulated final points = 2,098
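
To put these totals side by side, the short sketch below computes the gap between simulated and actual points for each season, using only the figures listed above.

```python
# (actual final points, simulated final points) per season, from the list above.
results = {
    "2018-19": (1891, 2032),
    "2019-20": (1919, 2243),
    "2020-21": (1700, 2098),
}

# Points gained by the simulated squad over the actual squad.
gains = {season: simulated - actual for season, (actual, simulated) in results.items()}

for season, gain in gains.items():
    actual = results[season][0]
    print(f"{season}: +{gain} points ({gain / actual:.1%} above actual)")
```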