# FPL Ideal Team

In Fantasy Premier League (FPL), you select a fantasy football squad of 15 players, consisting of:
- 2 Goalkeepers
- 5 Defenders
- 5 Midfielders
- 3 Forwards

Each player has a price, and the total value of your initial squad must not exceed £100 million. You can select no more than 3 players from a single Premier League team.
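
As a quick illustration of these rules, here is a minimal sketch of a squad validity check. The function name, the constants, and the `(name, club, position, cost)` tuple layout are assumptions made for this example; they are not part of the notebook's model.

```python
from collections import Counter

# Squad rules as stated above (names are illustrative)
REQUIRED_POSITIONS = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}
BUDGET = 100.0        # in £ million
MAX_PER_CLUB = 3

def is_valid_squad(squad):
    """Return True if a list of (name, club, position, cost) tuples obeys the rules."""
    position_counts = Counter(position for _, _, position, _ in squad)
    club_counts = Counter(club for _, club, _, _ in squad)
    total_cost = sum(cost for _, _, _, cost in squad)
    return (
        dict(position_counts) == REQUIRED_POSITIONS
        and max(club_counts.values(), default=0) <= MAX_PER_CLUB
        and total_cost <= BUDGET + 1e-9  # small tolerance for float sums
    )

# Placeholder squad: 15 players spread over 5 clubs, £6.0m each
slots = ["GK"] * 2 + ["DEF"] * 5 + ["MID"] * 5 + ["FWD"] * 3
example_squad = [(f"player_{i}", f"club_{i % 5}", pos, 6.0) for i, pos in enumerate(slots)]
print(is_valid_squad(example_squad))  # True
```

The same three checks (position quotas, club limit, budget) reappear later as constraints in the optimization model.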

## FPL Ideal Team Model

This notebook builds a model that returns three ideal FPL teams:
- A team based on each player's **total_points** scored last season.
- A team based on each player's **PP90min** from last season.
- A team based on each player's **ROI** from last season.

*What is PP90min?*
- **PP90min** (points per 90 minutes) tells us how many points a player scores, on average, for every 90 minutes played: total points divided by (minutes played / 90). Because it normalizes for playing time, it lets us compare players who played different numbers of minutes and shows how productive a player is while on the pitch, which is what we care about when selecting players for our team. This makes PP90min a useful statistic to base our player selections on.

*What is ROI?*
- **ROI** (Return on Investment) is how many points a player returns for every £1 million invested in them: total points divided by cost. A high ROI means the investment's gains (points scored) compare favourably to its cost. As a performance measure, ROI evaluates the efficiency of an investment.
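
To make both definitions concrete, here is a small worked example using Mohamed Salah's 2021-22 figures from the data used in this notebook (259 points, 2726 minutes, cost £13.1m). The helper function names are introduced only for this illustration.

```python
def pp90min(total_points, minutes):
    """Points scored per 90 minutes played."""
    return total_points / (minutes / 90)

def roi(total_points, cost):
    """Points returned per £1 million of cost."""
    return total_points / cost

# Mohamed Salah, 2021-22: 259 points, 2726 minutes, cost £13.1m
print(round(pp90min(259, 2726), 2))  # 8.55
print(round(roi(259, 13.1), 2))      # 19.77
```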

## Index
* [Data](#data)
* [Statistics](#stats)
    * [Total Points](#total_points)
    * [PP90min](#pp90min)
    * [ROI](#ROI)
* [Ideal Team](#ideal_team)
    * [Ideal Team - Total Points](#ideal_team_total_points)
    * [Ideal Team - PP90min](#ideal_team_PP90min)
    * [Ideal Team - ROI](#ideal_team_ROI)
* [Model Limitations](#model_limitations)

In [184]:
#Import relevant libraries and packages
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
import os
import sys
from pathlib import Path
from pulp import LpProblem, LpVariable, LpMaximize, lpSum

## Data <a class="anchor" id="data"></a>

In [185]:
#Path to data directory
path = Path('/Users/amirgrunhaus/Documents/Data Science Projects/FPL Prediction Tool/fpl_model/Data')

#Import dataset
data = pd.read_csv(path/'training_data.csv', 
                       index_col=0, 
                       dtype={'season':str,
                              'squad':str,
                              'comp':str})

#Select data from last season
data_2122 = data[data['season'] == '2122']

#Player data from this season
players_2223 = pd.read_csv(path/'2022-23/cleaned_players.csv')

#Player data from last season
players_2122 = pd.read_csv(path/'2021-22/cleaned_players.csv')

The original dataset has one row per player per gameweek, going back to the 2016-17 season; however, I only use the data from the 2021-22 PL season.

I create a dataframe with each player, team, position in 2022-23 FPL, cost in 2021-22 FPL, cost in 2022-23 FPL, total minutes played in the 2021-22 PL season, total_points scored in 2021-22 FPL, PP90min in 2021-22 FPL, and their ROI (Return on Investment) in 2021-22 FPL.

*Note: I drop players from relegated teams, as they will not be available in this season's FPL. I also drop some players who were transferred to other teams and are not expected to play as many minutes as they did last season.*
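
The two filtering rules and the cost conversion used in the next cell can be sketched on plain values as follows. This is a minimal illustration, assuming raw costs are stored as integers in tenths of £1m (as the conversion code in this notebook implies); the function and constant names are hypothetical.

```python
GAMES_PER_SEASON = 38
MIN_MINUTES = (GAMES_PER_SEASON / 2) * 90   # 1710 minutes, i.e. half a season
RELEGATED = ("Norwich", "Burnley", "Watford")

def to_fpl_price(now_cost):
    """Convert a raw cost in tenths of £1m (e.g. 130) to an FPL price (13.0)."""
    return now_cost / 10

def is_eligible(team, minutes):
    """Keep players from non-relegated teams who played at least half a season."""
    return team not in RELEGATED and minutes >= MIN_MINUTES

print(to_fpl_price(130))                # 13.0
print(is_eligible("Liverpool", 2726))   # True
print(is_eligible("Burnley", 3000))     # False
```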

In [186]:
#Adjusting data for Heung-Min Son (name values swapped for 21-22 and 22-23)
players_2223['first_name'] = players_2223['first_name'].replace(['Son'],['Heung-Min'])
players_2223['second_name'] = players_2223['second_name'].replace(['Heung-min'],['Son'])

#Adjust 2022-23 players dataframe - get players' full names and keep relevant columns (cost and position)
players_2223['player'] = players_2223['first_name'] + ' ' + players_2223['second_name']
players_2223 = players_2223.set_index('player')
players_2223 = players_2223.drop(['first_name', 'second_name'], axis=1)
players_2223 = players_2223[['now_cost', 'element_type']]

#Adjust 2021-22 dataframe - group data by player and team, summing only the relevant columns (total_points and minutes)
data_player = data_2122.groupby(['player', 'team'], as_index=False)[['total_points', 'minutes']].sum()

#Merge 2021-22 and 2022-23 players dataframes and rename relevant columns
data_with_cost = data_player.merge(players_2223, on = 'player')
data_with_cost = data_with_cost.rename({'element_type': 'position', 'now_cost': 'cost_2223'}, axis=1)

#Convert costs from tenths of £1m (e.g. 130) to actual FPL prices (e.g. 13.0)
#A vectorized division is used because assigning to `row` inside iterrows() is not guaranteed to update the dataframe
data_with_cost['cost_2223'] = data_with_cost['cost_2223'].astype(float) / 10
data_with_cost['total_points'] = data_with_cost['total_points'].astype(int)

#Adjust 2021-22 players dataframe - get players' full names and keep cost column
players_2122['player'] = players_2122['first_name'] + ' ' + players_2122['second_name']
players_2122 = players_2122.set_index('player')
players_2122 = players_2122.drop(['first_name', 'second_name'], axis=1)
players_2122 = players_2122[['now_cost']]

#Convert costs from tenths of £1m (e.g. 45) to actual FPL prices (e.g. 4.5)
players_2122 = players_2122.astype(float) / 10

#Merge 2021-22 and 2022-23 players dataframe with 2021-22 players dataframe, rename and reorder columns
data_with_cost = data_with_cost.merge(players_2122, on = 'player')
data_with_cost = data_with_cost.rename({'now_cost': 'cost_2122'}, axis=1)
data_with_cost = data_with_cost[['player', 'team', 'position', 'cost_2122', 'cost_2223','minutes','total_points']]
data_with_cost['cost_2122'] = data_with_cost['cost_2122'].astype(float)
data_with_cost['minutes'] = data_with_cost['minutes'].astype(int)
#Keep players who played at least half of the season's minutes (19 games x 90 minutes = 1710)
data_with_cost = data_with_cost[data_with_cost['minutes'] >= (38/2)*90]

#Dropping relegated teams and some players
data_with_cost = data_with_cost[data_with_cost['player'] != 'Armando Broja']
data_with_cost = data_with_cost[data_with_cost['player'] != 'Conor Gallagher']
data_with_cost = data_with_cost[data_with_cost['team'] != 'Norwich']
data_with_cost = data_with_cost[data_with_cost['team'] != 'Burnley']
data_with_cost = data_with_cost[data_with_cost['team'] != 'Watford']

#Add PP90min to dataframe, where PP90min is total points / (minutes / 90)
data_with_cost['PP90min'] = data_with_cost['total_points']/(data_with_cost['minutes']/90)

#Add ROI (Return On Investment) to dataframe, where ROI is total points / cost
data_with_cost['ROI'] = data_with_cost['total_points']/data_with_cost['cost_2122']

#Only keep players with ROI values greater than 0
data_with_cost = data_with_cost[data_with_cost['ROI'] > 0]
data_with_cost = data_with_cost.reset_index()
data_with_cost = data_with_cost.drop('index', axis=1)

## Statistics <a class="anchor" id="stats"></a>

Let's take a deeper look at the statistics we are going to use to select our ideal teams.

### Total Points <a class="anchor" id="total_points"></a>

Players sorted by **total_points**:

In [187]:
#Sort table values by total_points
total_points_top_players = data_with_cost.sort_values('total_points', ascending=False)
total_points_top_players = total_points_top_players.reset_index()
total_points_top_players = total_points_top_players.drop('index', axis=1)
total_points_top_players 

| | player | team | position | cost_2122 | cost_2223 | minutes | total_points | PP90min | ROI |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Mohamed Salah | Liverpool | MID | 13.1 | 13.0 | 2726 | 259 | 8.550990 | 19.770992 |
| 1 | Heung-Min Son | Tottenham Hotspur | MID | 11.2 | 12.0 | 2919 | 246 | 7.584789 | 21.964286 |
| 2 | Trent Alexander-Arnold | Liverpool | DEF | 8.4 | 7.5 | 2763 | 206 | 6.710098 | 24.523810 |
| 3 | Jarrod Bowen | West Ham United | MID | 6.9 | 8.5 | 2897 | 204 | 6.337591 | 29.565217 |
| 4 | Kevin De Bruyne | Manchester City | MID | 12.1 | 12.1 | 2106 | 190 | 8.119658 | 15.702479 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 147 | Luke Thomas | Leicester City | DEF | 4.3 | 4.5 | 1911 | 50 | 2.354788 | 11.627907 |
| 148 | Ben Godfrey | Everton | DEF | 4.7 | 4.4 | 2029 | 45 | 1.996057 | 9.574468 |
| 149 | Jamaal Lascelles | Newcastle United | DEF | 4.4 | 4.4 | 1965 | 45 | 2.061069 | 10.227273 |
| 150 | Aaron Wan-Bissaka | Manchester United | DEF | 5.1 | 4.4 | 1793 | 41 | 2.058003 | 8.039216 |


Average **total_points** per team:

In [188]:
#Average total_points per team (select the column before averaging so non-numeric columns are ignored)
total_points_team_averages = total_points_top_players.groupby('team')['total_points'].mean().sort_values(ascending=False).to_frame()
total_points_team_averages

| team | total_points |
|---|---|
| Liverpool | 170.75 |
| Manchester City | 131.4 |
| Chelsea | 122.625 |
| Tottenham Hotspur | 121.5 |
| West Ham United | 115.2 |
| Arsenal | 112.857143 |
| Wolverhampton Wanderers | 106.571429 |
| Aston Villa | 104.25 |
| Brentford | 103.888889 |
| Crystal Palace | 99.333333 |


Average **total_points** per position:

In [189]:
#Average total_points per position (select the column before averaging so non-numeric columns are ignored)
total_points_position_averages = total_points_top_players.groupby('position')['total_points'].mean().sort_values(ascending=False).to_frame()
total_points_position_averages

| position | total_points |
|---|---|
| FWD | 123.357143 |
| GK | 122.866667 |
| MID | 109.111111 |
| DEF | 93.65 |


### PP90min - Points per 90min <a class="anchor" id="pp90min"></a>

Players sorted by **PP90min**:

In [190]:
#Sort table values by PP90min
PP90min_top_players = data_with_cost.sort_values('PP90min', ascending=False)
PP90min_top_players = PP90min_top_players.reset_index()
PP90min_top_players = PP90min_top_players.drop('index', axis=1)
PP90min_top_players

| | player | team | position | cost_2122 | cost_2223 | minutes | total_points | PP90min | ROI |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Mohamed Salah | Liverpool | MID | 13.1 | 13.0 | 2726 | 259 | 8.550990 | 19.770992 |
| 1 | Kevin De Bruyne | Manchester City | MID | 12.1 | 12.1 | 2106 | 190 | 8.119658 | 15.702479 |
| 2 | Heung-Min Son | Tottenham Hotspur | MID | 11.2 | 12.0 | 2919 | 246 | 7.584789 | 21.964286 |
| 3 | Raheem Sterling | Manchester City | MID | 10.5 | 10.0 | 2087 | 159 | 6.856732 | 15.142857 |
| 4 | Reece James | Chelsea | DEF | 6.5 | 6.0 | 1773 | 134 | 6.802030 | 20.615385 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 147 | Jamaal Lascelles | Newcastle United | DEF | 4.4 | 4.4 | 1965 | 45 | 2.061069 | 10.227273 |
| 148 | Aaron Wan-Bissaka | Manchester United | DEF | 5.1 | 4.4 | 1793 | 41 | 2.058003 | 8.039216 |
| 149 | Ben Godfrey | Everton | DEF | 4.7 | 4.4 | 2029 | 45 | 1.996057 | 9.574468 |
| 150 | Mohammed Salisu | Southampton | DEF | 4.5 | 4.4 | 2881 | 58 | 1.811871 | 12.888889 |


Average **PP90min** per team:

In [191]:
#Average PP90min per team (select the column before averaging so non-numeric columns are ignored)
PP90min_team_averages = PP90min_top_players.groupby('team')['PP90min'].mean().sort_values(ascending=False).to_frame()
PP90min_team_averages

| team | PP90min |
|---|---|
| Liverpool | 5.649015 |
| Manchester City | 5.403538 |
| Chelsea | 5.089669 |
| Arsenal | 4.267138 |
| Tottenham Hotspur | 4.216991 |
| Leicester City | 3.946338 |
| West Ham United | 3.885066 |
| Aston Villa | 3.844629 |
| Crystal Palace | 3.611259 |
| Brentford | 3.608484 |


Average **PP90min** per position:

In [192]:
#Average PP90min per position (select the column before averaging so non-numeric columns are ignored)
PP90min_position_averages = PP90min_top_players.groupby('position')['PP90min'].mean().sort_values(ascending=False).to_frame()
PP90min_position_averages

| position | PP90min |
|---|---|
| FWD | 4.762793 |
| MID | 4.199218 |
| GK | 3.78322 |
| DEF | 3.483106 |


### ROI - Return on Investment <a class="anchor" id="ROI"></a>

Players sorted by **ROI**:

In [193]:
#Sort table values by ROI
ROI_top_players = data_with_cost.sort_values('ROI', ascending=False)
ROI_top_players = ROI_top_players.reset_index()
ROI_top_players = ROI_top_players.drop('index', axis=1)
ROI_top_players

| | player | team | position | cost_2122 | cost_2223 | minutes | total_points | PP90min | ROI |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Joel Matip | Liverpool | DEF | 5.3 | 6.0 | 2700 | 163 | 5.433333 | 30.754717 |
| 1 | Jarrod Bowen | West Ham United | MID | 6.9 | 8.5 | 2897 | 204 | 6.337591 | 29.565217 |
| 2 | Conor Coady | Wolverhampton Wanderers | DEF | 4.7 | 4.9 | 3271 | 137 | 3.769489 | 29.148936 |
| 3 | Alisson Ramses Becker | Liverpool | GK | 6.1 | 5.5 | 3150 | 173 | 4.942857 | 28.360656 |
| 4 | José Malheiro de Sá | Wolverhampton Wanderers | GK | 5.2 | 5.0 | 3240 | 145 | 4.027778 | 27.884615 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 147 | Jamaal Lascelles | Newcastle United | DEF | 4.4 | 4.4 | 1965 | 45 | 2.061069 | 10.227273 |
| 148 | Jadon Sancho | Manchester United | MID | 8.9 | 7.4 | 1894 | 91 | 4.324182 | 10.224719 |
| 149 | Ben Godfrey | Everton | DEF | 4.7 | 4.4 | 2029 | 45 | 1.996057 | 9.574468 |
| 150 | Aaron Wan-Bissaka | Manchester United | DEF | 5.1 | 4.4 | 1793 | 41 | 2.058003 | 8.039216 |


Average **ROI** per team:

In [194]:
#Average ROI per team (select the column before averaging so non-numeric columns are ignored)
ROI_team_averages = ROI_top_players.groupby('team')['ROI'].mean().sort_values(ascending=False).to_frame()
ROI_team_averages

| team | ROI |
|---|---|
| Liverpool | 24.321537 |
| Brentford | 21.233736 |
| Tottenham Hotspur | 21.212107 |
| Wolverhampton Wanderers | 21.177581 |
| Arsenal | 20.55124 |
| West Ham United | 20.239105 |
| Crystal Palace | 19.916084 |
| Chelsea | 19.458348 |
| Brighton and Hove Albion | 19.091665 |
| Aston Villa | 18.681509 |


Average **ROI** per position:

In [195]:
#Average ROI per position (select the column before averaging so non-numeric columns are ignored)
ROI_position_averages = ROI_top_players.groupby('position')['ROI'].mean().sort_values(ascending=False).to_frame()
ROI_position_averages

| position | ROI |
|---|---|
| GK | 24.033653 |
| DEF | 18.873236 |
| MID | 17.923447 |
| FWD | 15.603448 |


## Ideal Team <a class="anchor" id="ideal_team"></a>

### Ideal Team - Total Points <a class="anchor" id="ideal_team_total_points"></a>

The algorithm below returns an ideal team according to each player's **total_points** scored last season. The team fulfills the position requirements (2 goalkeepers, 5 defenders, 5 midfielders, and 3 forwards), the team requirements (no more than 3 players per team), and the budget requirements (squad value must not exceed £100 million).
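
Written out as an integer program, the selection problem looks roughly as follows. The symbols are introduced here for illustration only: $x_i \in \{0, 1\}$ indicates whether player $i$ is selected, $p_i$ is the player's score on the chosen metric (here **total_points**), $c_i$ is the 2022-23 cost in £ million, $P_r$ is the set of players at position $r$ with quota $n_r$, and $T_k$ is the set of players at club $k$.

$$
\begin{aligned}
\max_{x} \quad & \sum_{i} p_i \, x_i \\
\text{subject to} \quad & \sum_{i} c_i \, x_i \le 100, \\
& \sum_{i \in P_r} x_i \le n_r \quad \text{for } r \in \{\mathrm{GK}, \mathrm{DEF}, \mathrm{MID}, \mathrm{FWD}\}, \quad (n_{\mathrm{GK}}, n_{\mathrm{DEF}}, n_{\mathrm{MID}}, n_{\mathrm{FWD}}) = (2, 5, 5, 3), \\
& \sum_{i \in T_k} x_i \le 3 \quad \text{for every club } k, \\
& x_i \in \{0, 1\} \quad \text{for every player } i.
\end{aligned}
$$

The position quotas are written as upper bounds because that is how the code states them. Since every remaining player has a positive score, the maximization will tend to fill each position as far as the budget and club limits allow.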

In [202]:
positions = total_points_top_players.position.unique()
clubs = total_points_top_players.team.unique()
budget = 100
available_roles = {
    'GK': 2,
    'DEF': 5,
    'MID': 5,
    'FWD': 3
}

#Player attributes as lists, aligned with the dataframe's 0..n-1 index
names = total_points_top_players.player.tolist()
teams = total_points_top_players.team.tolist()
roles = total_points_top_players.position.tolist()
costs = total_points_top_players.cost_2223.tolist()
points = total_points_top_players.total_points.tolist()
n_players = len(total_points_top_players)

#One binary decision variable per player (1 = selected, 0 = not selected)
players = [LpVariable("player_" + str(i), cat="Binary") for i in range(n_players)]
prob = LpProblem("Fantasy_Ideal_Team_total_points", LpMaximize)

#Objective: maximize total points
prob += lpSum(players[i] * points[i] for i in range(n_players))
#Budget constraint
prob += lpSum(players[i] * costs[i] for i in range(n_players)) <= budget
#Position constraint
for pos in positions:
    prob += lpSum(players[i] for i in range(n_players) if roles[i] == pos) <= available_roles[pos]
#Max 3 per team constraint
for club in clubs:
    prob += lpSum(players[i] for i in range(n_players) if teams[i] == club) <= 3
prob.solve()

#Collect the selected players (decision variable equal to 1)
df_list = []
for variable in prob.variables():
    if variable.varValue is not None and variable.varValue > 0.5:
        i = int(variable.name.split("_")[1])
        df_list.append((names[i], teams[i], roles[i], points[i], costs[i]))

# Dataframe with name, club, position, points, cost
ideal_team_total_points = pd.DataFrame(df_list, columns = ['player', 'team', 'position', 'total_points', 'cost_2223'])

Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /Users/amirgrunhaus/opt/miniconda3/lib/python3.9/site-packages/pulp/apis/../solverdir/cbc/osx/64/cbc /var/folders/mh/djnbz20x6yx5_43kjtkrmxmh0000gn/T/f0870d068a14454da1477d3ede0d68ae-pulp.mps max timeMode elapsed branch printingOptions all solution /var/folders/mh/djnbz20x6yx5_43kjtkrmxmh0000gn/T/f0870d068a14454da1477d3ede0d68ae-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 27 COLUMNS
At line 940 RHS
At line 963 BOUNDS
At line 1116 ENDATA
Problem MODEL has 22 rows, 152 columns and 456 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Continuous objective value is 2404.65 - 0.00 seconds
Cgl0004I processed model has 22 rows, 150 columns (150 integer (148 of which binary)) and 450 elements
Cutoff increment increased from 1e-05 to 0.9999
Cbc0038I Initial state - 1 integers unsatisfied sum - 0.243243
[... solver output truncated ...]
ZeroHalf was tried 1 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

Result - Optimal solution found

Objective value:                2395.00000000
Enumerated nodes:               8
Total iterations:               54
Time (CPU seconds):             0.02
Time (Wallclock seconds):       0.03

Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.02   (Wallclock seconds):       0.04



In [203]:
ideal_team_total_points.position = pd.Categorical(ideal_team_total_points.position, categories=['GK', 'DEF', 'MID', 'FWD'])
ideal_team_total_points = ideal_team_total_points.sort_values('position')
ideal_team_total_points = ideal_team_total_points.reset_index().drop('index', axis=1)
ideal_team_total_points

| | player | team | position | total_points | cost_2223 |
|---|---|---|---|---|---|
| 0 | José Malheiro de Sá | Wolverhampton Wanderers | GK | 145 | 5.0 |
| 1 | Alisson Ramses Becker | Liverpool | GK | 173 | 5.5 |
| 2 | Aymeric Laporte | Manchester City | DEF | 159 | 5.9 |
| 3 | Trent Alexander-Arnold | Liverpool | DEF | 206 | 7.5 |
| 4 | Conor Coady | Wolverhampton Wanderers | DEF | 137 | 4.9 |
| 5 | Marc Guéhi | Crystal Palace | DEF | 123 | 4.5 |
| 6 | Virgil van Dijk | Liverpool | DEF | 183 | 6.5 |
| 7 | Heung-Min Son | Tottenham Hotspur | MID | 246 | 12.0 |
| 8 | James Ward-Prowse | Southampton | MID | 152 | 6.5 |
| 9 | Leandro Trossard | Brighton and Hove Albion | MID | 141 | 6.5 |


![total_points_ideal_team_fantasy.png](attachment:total_points_ideal_team_fantasy.png)

### Ideal Team - PP90min <a class="anchor" id="ideal_team_PP90min"></a>

The algorithm below returns an ideal team according to each player's **PP90min** from last season. The team fulfills the position requirements (2 goalkeepers, 5 defenders, 5 midfielders, and 3 forwards), the team requirements (no more than 3 players per team), and the budget requirements (squad value must not exceed £100 million).

In [204]:
positions = PP90min_top_players.position.unique()
clubs = PP90min_top_players.team.unique()
budget = 100
available_roles = {
    'GK': 2,
    'DEF': 5,
    'MID': 5,
    'FWD': 3
}

#Player attributes as lists, aligned with the dataframe's 0..n-1 index
names = PP90min_top_players.player.tolist()
teams = PP90min_top_players.team.tolist()
roles = PP90min_top_players.position.tolist()
costs = PP90min_top_players.cost_2223.tolist()
pp90min_values = PP90min_top_players.PP90min.tolist()
n_players = len(PP90min_top_players)

#One binary decision variable per player (1 = selected, 0 = not selected)
players = [LpVariable("player_" + str(i), cat="Binary") for i in range(n_players)]
prob = LpProblem("Fantasy_Ideal_Team_PP90min", LpMaximize)

#Objective: maximize the sum of PP90min
prob += lpSum(players[i] * pp90min_values[i] for i in range(n_players))
#Budget constraint
prob += lpSum(players[i] * costs[i] for i in range(n_players)) <= budget
#Position constraint
for pos in positions:
    prob += lpSum(players[i] for i in range(n_players) if roles[i] == pos) <= available_roles[pos]
#Max 3 per team constraint
for club in clubs:
    prob += lpSum(players[i] for i in range(n_players) if teams[i] == club) <= 3
prob.solve()

#Collect the selected players (decision variable equal to 1)
df_list = []
for variable in prob.variables():
    if variable.varValue is not None and variable.varValue > 0.5:
        i = int(variable.name.split("_")[1])
        df_list.append((names[i], teams[i], roles[i], pp90min_values[i], costs[i]))

# Dataframe with name, club, position, PP90min, cost
ideal_team_PP90min = pd.DataFrame(df_list, columns = ['player', 'team', 'position', 'PP90min', 'cost_2223'])

Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /Users/amirgrunhaus/opt/miniconda3/lib/python3.9/site-packages/pulp/apis/../solverdir/cbc/osx/64/cbc /var/folders/mh/djnbz20x6yx5_43kjtkrmxmh0000gn/T/2f0f95a07f514f269f1adc7ba12e27fc-pulp.mps max timeMode elapsed branch printingOptions all solution /var/folders/mh/djnbz20x6yx5_43kjtkrmxmh0000gn/T/2f0f95a07f514f269f1adc7ba12e27fc-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 27 COLUMNS
At line 940 RHS
At line 963 BOUNDS
At line 1116 ENDATA
Problem MODEL has 22 rows, 152 columns and 456 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Continuous objective value is 87.36 - 0.00 seconds
Cgl0004I processed model has 22 rows, 150 columns (150 integer (148 of which binary)) and 450 elements
Cbc0038I Initial state - 2 integers unsatisfied sum - 0.153846
Cbc0038I Solution found of -87.1846
Cbc0038I Cleaned solution of -87.1846




In [205]:
ideal_team_PP90min.position = pd.Categorical(ideal_team_PP90min.position, categories=['GK', 'DEF', 'MID', 'FWD'])
ideal_team_PP90min = ideal_team_PP90min.sort_values('position')
ideal_team_PP90min = ideal_team_PP90min.reset_index().drop('index', axis=1)
ideal_team_PP90min

| | player | team | position | PP90min | cost_2223 |
|---|---|---|---|---|---|
| 0 | Alisson Ramses Becker | Liverpool | GK | 4.942857 | 5.5 |
| 1 | David Raya Martin | Brentford | GK | 4.086957 | 4.5 |
| 2 | Kieran Tierney | Arsenal | DEF | 4.979123 | 4.9 |
| 3 | Sergio Reguilón | Tottenham Hotspur | DEF | 4.890282 | 4.4 |
| 4 | Reece James | Chelsea | DEF | 6.80203 | 6.0 |
| 5 | Trent Alexander-Arnold | Liverpool | DEF | 6.710098 | 7.5 |
| 6 | Andrew Robertson | Liverpool | DEF | 6.510012 | 7.0 |
| 7 | Emile Smith Rowe | Arsenal | MID | 5.893138 | 5.9 |
| 8 | Harvey Barnes | Leicester City | MID | 5.839525 | 6.9 |
| 9 | Saïd Benrahma | West Ham United | MID | 5.744681 | 6.0 |


![PP90min_ideal_team_fantasy.png](attachment:PP90min_ideal_team_fantasy.png)

### Ideal Team - ROI <a class="anchor" id="ideal_team_ROI"></a>

The algorithm below returns an ideal team according to each player's **ROI** from last season. The team fulfills the position requirements (2 goalkeepers, 5 defenders, 5 midfielders, and 3 forwards), the team requirements (no more than 3 players per team), and the budget requirements (squad value must not exceed £100 million).

In [206]:
positions = ROI_top_players.position.unique()
clubs = ROI_top_players.team.unique()
budget = 100
available_roles = {
    'GK': 2,
    'DEF': 5,
    'MID': 5,
    'FWD': 3
}

#Player attributes as lists, aligned with the dataframe's 0..n-1 index
names = ROI_top_players.player.tolist()
teams = ROI_top_players.team.tolist()
roles = ROI_top_players.position.tolist()
costs = ROI_top_players.cost_2223.tolist()
roi_values = ROI_top_players.ROI.tolist()
n_players = len(ROI_top_players)

#One binary decision variable per player (1 = selected, 0 = not selected)
players = [LpVariable("player_" + str(i), cat="Binary") for i in range(n_players)]
prob = LpProblem("Fantasy_Ideal_Team_ROI", LpMaximize)

#Objective: maximize the sum of ROI
prob += lpSum(players[i] * roi_values[i] for i in range(n_players))
#Budget constraint
prob += lpSum(players[i] * costs[i] for i in range(n_players)) <= budget
#Position constraint
for pos in positions:
    prob += lpSum(players[i] for i in range(n_players) if roles[i] == pos) <= available_roles[pos]
#Max 3 per team constraint
for club in clubs:
    prob += lpSum(players[i] for i in range(n_players) if teams[i] == club) <= 3
prob.solve()

#Collect the selected players (decision variable equal to 1), storing their ROI rather than their PP90min
df_list = []
for variable in prob.variables():
    if variable.varValue is not None and variable.varValue > 0.5:
        i = int(variable.name.split("_")[1])
        df_list.append((names[i], teams[i], roles[i], roi_values[i], costs[i]))

# Dataframe with name, club, position, ROI, cost
ideal_team_ROI = pd.DataFrame(df_list, columns = ['player', 'team', 'position', 'ROI', 'cost_2223'])

Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /Users/amirgrunhaus/opt/miniconda3/lib/python3.9/site-packages/pulp/apis/../solverdir/cbc/osx/64/cbc /var/folders/mh/djnbz20x6yx5_43kjtkrmxmh0000gn/T/633c397b1b8e423dac2287acf8e32255-pulp.mps max timeMode elapsed branch printingOptions all solution /var/folders/mh/djnbz20x6yx5_43kjtkrmxmh0000gn/T/633c397b1b8e423dac2287acf8e32255-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 27 COLUMNS
At line 940 RHS
At line 963 BOUNDS
At line 1116 ENDATA
Problem MODEL has 22 rows, 152 columns and 456 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Continuous objective value is 385.534 - 0.00 seconds
Cgl0004I processed model has 22 rows, 152 columns (152 integer (152 of which binary)) and 456 elements
Cbc0038I Initial state - 0 integers unsatisfied sum - 0
Cbc0038I Solution found of -385.534
Cbc0038I Before mini branch and bound, 152 



In [207]:
ideal_team_ROI.position = pd.Categorical(ideal_team_ROI.position, categories=['GK', 'DEF', 'MID', 'FWD'])
ideal_team_ROI = ideal_team_ROI.sort_values('position')
ideal_team_ROI = ideal_team_ROI.reset_index().drop('index', axis=1)
ideal_team_ROI

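| | player | team | position | ROI | cost_2223 |
|---|---|---|---|---|---|
| 0 | Alisson Ramses Becker | Liverpool | GK | 4.942857 | 5.5 |
| 1 | José Malheiro de Sá | Wolverhampton Wanderers | GK | 4.027778 | 5.0 |
| 2 | Joel Matip | Liverpool | DEF | 5.433333 | 6.0 |
| 3 | Conor Coady | Wolverhampton Wanderers | DEF | 3.769489 | 4.9 |
| 4 | Marc Guéhi | Crystal Palace | DEF | 3.435754 | 4.5 |
| 5 | Virgil van Dijk | Liverpool | DEF | 5.382353 | 6.5 |
| 6 | Pontus Jansson | Brentford | DEF | 3.370474 | 4.5 |
| 7 | Jarrod Bowen | West Ham United | MID | 6.337591 | 8.5 |
| 8 | Bukayo Saka | Arsenal | MID | 5.398139 | 8.0 |
| 9 | James Maddison | Leicester City | MID | 6.371681 | 8.0 |

Note: the values shown in the ROI column of this output match the players' PP90min figures (for example, Alisson's ROI in the ROI table above is 28.360656, not 4.942857), so they should not be read as ROI values.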


![ROI_ideal_team.png](attachment:ROI_ideal_team.png)

## Model Limitations <a class="anchor" id="model_limitations"></a>

While this model does help us optimize the FPL squad selection process, it also has some limitations.

1. **Past performance is not indicative of future results**: The model bases player selections solely on last season's performance, which may not reflect this season's expected performance.

2. **Summer transfers**: The model will not select players who were recently transferred into the Premier League, because they have no data from last season.

3. **Star players**: The ideal team algorithm strictly follows the optimization objective and the budget constraint, which can leave our teams without star players like Mohamed Salah or Kevin De Bruyne (whom we might want in our teams considering their historical performance, even if their cost is high).
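
One possible way to address the star-player limitation is to force chosen players into the team and optimize the remaining slots around them. The sketch below illustrates the idea with an exhaustive search over a tiny, made-up pool; it deliberately uses brute force from the standard library rather than the solver used above, and the `best_team` function, the `must_include` parameter, and all numbers are hypothetical. In the PuLP models above, the analogous step would presumably be an extra constraint fixing the chosen player's variable to 1.

```python
from itertools import combinations

def best_team(pool, team_size, budget, must_include=()):
    """Exhaustively pick the highest-scoring affordable team.

    pool maps a player name to (points, cost). Brute force is only
    practical for tiny pools; it stands in for the solver here.
    """
    best, best_points = None, -1
    for team in combinations(sorted(pool), team_size):
        if not set(must_include) <= set(team):
            continue  # skip teams missing a forced player
        cost = sum(pool[name][1] for name in team)
        points = sum(pool[name][0] for name in team)
        if cost <= budget and points > best_points:
            best, best_points = team, points
    return best, best_points

# Toy pool with made-up numbers (not real FPL data)
pool = {
    "Star": (250, 13.0),
    "A": (150, 6.0),
    "B": (140, 6.0),
    "C": (130, 5.5),
    "D": (90, 4.5),
    "E": (60, 4.0),
}
print(best_team(pool, 3, 21.5))                         # (('A', 'B', 'C'), 420)
print(best_team(pool, 3, 21.5, must_include=["Star"]))  # (('D', 'E', 'Star'), 400)
```

In this toy case the unconstrained optimum leaves the star out, and forcing the star in costs 20 points of last season's total. Whether that trade is worthwhile depends on how much weight we place on the player's longer track record.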