# FPL Gameweek Player Predictions

Managers in Fantasy Premier League (FPL) earn points from their players for a number of actions. These include goals, assists, clean sheets and saves. They can also earn additional bonus points if they are among the top-performing players in the Bonus Points System (BPS) in any given match.

You can look at a detailed breakdown of the scoring system [here.](https://fantasy.premierleague.com/help/rules)

## FPL Points Prediction Model

In this notebook I created a model that predicts how many points a player will score for a specific gameweek during the 22-23 PL Season. I set up a Random Forest Model using the Scikit-Learn Python library. 

## Index
* [Data](#data)
* [Lags](#lags)
* [Model](#model)
    * [Training Data](#training_data)
    * [Random Forest](#random_forest)
    * [Accuracy](#accuracy)
* [Predictions](#predictions)
    * [Predictions - Gameweek 17](#predictions_gw17)
    * [Predictions - Gameweek 19](#predictions_gw19)
    * [Predictions - Gameweek 20](#predictions_gw20)
    * [Predictions - Gameweek 22](#predictions_gw22)
* [Ideal Teams](#ideal_team)
    * [Ideal Team - Gameweek 17 (No Budget Constraint)](#ideal_team_gw17_no_budget)
    * [Ideal Team - Gameweek 17 (Budget Constraint)](#ideal_team_gw17_budget)
    * [Ideal Team - Gameweek 19 (No Budget Constraint)](#ideal_team_gw19_no_budget)
    * [Ideal Team - Gameweek 19 (Budget Constraint)](#ideal_team_gw19_budget)
    * [Ideal Team - Gameweek 20 (No Budget Constraint)](#ideal_team_gw20_no_budget)
    * [Ideal Team - Gameweek 20 (Budget Constraint)](#ideal_team_gw20_budget)
    * [Ideal Team - Gameweek 22 (No Budget Constraint)](#ideal_team_gw22_no_budget)
    * [Ideal Team - Gameweek 22 (Budget Constraint)](#ideal_team_gw22_budget)
    

In [2]:
#Import relevant libraries and packages
import pandas as pd
import numpy as np
import os
import sys
import plotly.express as px
from pathlib import Path
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
from pulp import *

## Data <a class="anchor" id="data"></a>

In [3]:
#Paths
path = Path('Data')
path_22_23 = Path('Data/2022-23')

#Import datasets
data = pd.read_csv(path/'training_data_updated.csv', 
                       index_col=0, 
                       dtype={'season':str,
                              'squad':str,
                              'comp':str})
season_gws = pd.read_csv(path/'remaining_season.csv', index_col=0)
players_raw = pd.read_csv(path_22_23/'players_raw.csv')
player_stats_2223 = pd.read_csv(path_22_23/'gws/merged_gw.csv')
team_standard_stats_2223 = pd.read_csv(path_22_23/'team_standard_stats_2223.csv')
player_standard_stats_2223 = pd.read_csv(path_22_23/'player_standard_stats_2223.csv')
teams = pd.read_csv(path_22_23/'teams.csv')
cleaned_players = pd.read_csv(path_22_23/'cleaned_players.csv')

#Reset index and drop duplicates (just in case)
data = data.reset_index()
data = data.drop_duplicates()

The data has one row per player, per gameweek, for each player and gameweek since the 2020-2021 season. Each row contains information and statistics for each player and gameweek. The dataframe's columns are:

In [4]:
#Data info
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 61482 entries, 0 to 61481
Data columns (total 31 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   player           61482 non-null  object 
 1   position         61482 non-null  int64  
 2   gw               61482 non-null  int64  
 3   team             61482 non-null  object 
 4   opponent_team    61482 non-null  object 
 5   was_home         61482 non-null  bool   
 6   season           61482 non-null  object 
 7   minutes          61482 non-null  int64  
 8   total_points     61482 non-null  int64  
 9   assists          61482 non-null  int64  
 10  bonus            61482 non-null  int64  
 11  bps              61482 non-null  int64  
 12  clean_sheets     61482 non-null  int64  
 13  creativity       61482 non-null  float64
 14  goals_conceded   61482 non-null  int64  
 15  goals_scored     61482 non-null  int64  
 16  ict_index        61482 non-null  float64
 17  influence   

In [5]:
def position_assignment_int_to_string(data):
    if data['position'] == 1:
        return 'GK'
    if data['position'] == 2:
        return 'DEF'
    if data['position'] == 3:
        return 'MID'
    if data['position'] == 4:
        return 'FWD'

data['position'] = data.apply(position_assignment_int_to_string, axis = 1)

We add a column to the dataframe called 'fixture difficulty rating' (FDR). The FDR creates a value that offers a perceived fixture difficulty for each team when facing another team. These values are then simplified into ratings from 1 to 5, with 5 being the highest difficulty value.

FPL develops FDR based on a complex algorithm that analyses the performance statistics for each team across their home and away matches. It then combines this data with each team's home and away form over the past six fixtures. 

In FPL, FDR can change from week to week. In other words, Team X might have an FDR of 3 this gameweek, and an FDR of 4 next gameweek. For the sake of simplicity and to maintain consistency across the time-series data, I will assign a constant FDR for each team based on historical PL standings and historical FDRs (dating back to the 2017-18 season). These are my FDR assignments and the logic behind them:

**FDR = 5 &rarr; Manchester City and Liverpool**
- Since the 2017-18 season, Manchester City and Liverpool are the highest-achieving and most consistent teams. They are arguably the most difficult teams to play against and have ended all seasons in the top-4.

**FDR = 4 &rarr; Arsenal, Chelsea, Manchester United, Tottenham Hotspur**
- These four teams, along with Man City and Liverpool are considered the PL's "big six", and thus the most difficult teams to play against. I didn't assign these teams an FDR of 5 because they haven't been as consistent as Man City and Liverpool and haven't made as many points as them since the 2017-18 season.

**FDR = 3 &rarr; Brighton, Crystal Palace, Everton, Leicester City, Newcaste United, West Ham United, Wolves**
- These teams are considered "mid-table teams". Although consistency and regularity across these teams varies, and some of them are arguably more difficult to play against than others, it makes sense to group them under the same FDR rating due to their historic standings (and similar consistency) since the 2017-18 season. 

**FDR = 2 &rarr; Aston Villa, Brentford, Burnley, Leeds, Norwich, Southampton, Watford, Hull City, Middlesbrough, Bournemouth, Sunderland, Swansea, West Brom, Stoke City, Huddersfield, Fulham, Cardiff City, Sheffield United, Nottingham Forest**
- All of these teams (with the exception of Southampton) have been relegated at least once in the past 5 seasons, and during their time in the Premier League have struggled to make it out of the relegation zone or past the 10th standing. The reason I grouped Southampton with the rest of the teams here is because it is the only team that despite not having been relegated, hasn't finished a season above the 11th position (since the 2017-18 season). 

**FDR = 1 &rarr; NONE**
- I didn't assign a score of 1 to any of the teams because FPL rarely gives an FDR of 1 to any fixture.

In [6]:
#Function to add fdr (fixture difficulty rating) to dataframe
def fdr_assignment(data):
    if data['opponent_team'] == 'Arsenal':
        return 4
    if data['opponent_team'] == 'Aston Villa':
        return 2
    if data['opponent_team'] == 'Brentford':
        return 2
    if data['opponent_team'] == 'Brighton':
        return 3
    if data['opponent_team'] == 'Burnley':
        return 2
    if data['opponent_team'] == 'Chelsea':
        return 4
    if data['opponent_team'] == 'Crystal Palace':
        return 3
    if data['opponent_team'] == 'Everton':
        return 3
    if data['opponent_team'] == 'Leeds':
        return 2
    if data['opponent_team'] == 'Leicester City':
        return 3
    if data['opponent_team'] == 'Liverpool':
        return 5
    if data['opponent_team'] == 'Manchester City':
        return 5
    if data['opponent_team'] == 'Manchester Utd':
        return 4
    if data['opponent_team'] == 'Newcastle Utd':
        return 3
    if data['opponent_team'] == 'Norwich':
        return 2
    if data['opponent_team'] == 'Southampton':
        return 2
    if data['opponent_team'] == 'Tottenham':
        return 4
    if data['opponent_team'] == 'Watford':
        return 2
    if data['opponent_team'] == 'West Ham':
        return 3
    if data['opponent_team'] == 'Wolves':
        return 3
    if data['opponent_team'] == 'Hull City':
        return 2
    if data['opponent_team'] == 'Middlesbrough':
        return 2
    if data['opponent_team'] == 'Bournemouth':
        return 2
    if data['opponent_team'] == 'Sunderland':
        return 2
    if data['opponent_team'] == 'Swansea City':
        return 2
    if data['opponent_team'] == 'West Brom':
        return 2
    if data['opponent_team'] == 'Stoke City':
        return 2
    if data['opponent_team'] == 'Huddersfield Town':
        return 2
    if data['opponent_team'] == 'Fulham':
        return 2
    if data['opponent_team'] == 'Cardiff City':
        return 2
    if data['opponent_team'] == 'Sheffield Utd':
        return 2
    if data['opponent_team'] == 'Nottingham Forest':
        return 2
    
data['fdr'] = data.apply(fdr_assignment, axis = 1)

We add a column to the dataframe called 'team_won' that specifies whether the player's team won that game or not (1 if team won, 0 if otherwise).

In [7]:
def team_won(data):
    if data['was_home'] == True and data['team_h_score'] > data['team_a_score']:
        return 1
    if data['was_home'] == False and data['team_h_score'] < data['team_a_score']:
        return 1
    if data['was_home'] == True and data['team_h_score'] < data['team_a_score']:
        return 0
    if data['was_home'] == False and data['team_h_score'] > data['team_a_score']:
        return 0
    else:
        return 0
        
data['team_won'] = data.apply(team_won, axis = 1)

We add a column to the dataframe called 'team_mv' that assigns each team a market value. The market value info was scraped from [transfermarkt](https://www.transfermarkt.com/premier-league/startseite/wettbewerb/GB1).

*Note: for teams that are not currently in the PL, we assign them the market value in their last season they were in the PL. We also assign the latest available market value to each team for consistency (regardelss of season).

In [8]:
def team_market_value(data):
    if data['team'] == 'Arsenal':
        return 671.5
    if data['team'] == 'Aston Villa':
        return 499.6
    if data['team'] == 'Brentford':
        return 292.65
    if data['team'] == 'Brighton':
        return 264.7
    if data['team'] == 'Burnley':
        return 138.05
    if data['team'] == 'Chelsea':
        return 823.7
    if data['team'] == 'Crystal Palace':
        return 268.80
    if data['team'] == 'Everton':
        return 415.95
    if data['team'] == 'Leeds':
        return 275.30
    if data['team'] == 'Leicester City':
        return 508.30
    if data['team'] == 'Liverpool':
        return 870
    if data['team'] == 'Manchester City':
        return 1010
    if data['team'] == 'Manchester Utd':
        return 708.8
    if data['team'] == 'Newcastle Utd':
        return 333.6
    if data['team'] == 'Norwich':
        return 163.2
    if data['team'] == 'Southampton':
        return 271.45
    if data['team'] == 'Tottenham':
        return 727.3
    if data['team'] == 'Watford':
        return 156.2
    if data['team'] == 'West Ham':
        return 384
    if data['team'] == 'Wolves':
        return 385.95
    if data['team'] == 'Hull City':
        return 135.85
    if data['team'] == 'Middlesbrough':
        return 128.8
    if data['team'] == 'Bournemouth':
        return 160.4
    if data['team'] == 'Sunderland':
        return 132
    if data['team'] == 'Swansea City':
        return 165.49
    if data['team'] == 'West Brom':
        return 141.15
    if data['team'] == 'Stoke City':
        return 192.45
    if data['team'] == 'Huddersfield Town':
        return 137.45
    if data['team'] == 'Fulham':
        return 202.5
    if data['team'] == 'Cardiff City':
        return 113
    if data['team'] == 'Sheffield Utd':
        return 148.85
    if data['team'] == 'Nottingham Forest':
        return 189.8
            
data['team_mv'] = data.apply(team_market_value, axis = 1)

We do the same as above (adding a market value column), but this time for the opponent_team. We call the column 'opponent_team_mv'.

In [9]:
def opponent_team_market_value(data):
    if data['opponent_team'] == 'Arsenal':
        return 671.5
    if data['opponent_team'] == 'Aston Villa':
        return 499.6
    if data['opponent_team'] == 'Brentford':
        return 292.65
    if data['opponent_team'] == 'Brighton':
        return 264.7
    if data['opponent_team'] == 'Burnley':
        return 138.05
    if data['opponent_team'] == 'Chelsea':
        return 823.7
    if data['opponent_team'] == 'Crystal Palace':
        return 268.80
    if data['opponent_team'] == 'Everton':
        return 415.95
    if data['opponent_team'] == 'Leeds':
        return 275.30
    if data['opponent_team'] == 'Leicester City':
        return 508.30
    if data['opponent_team'] == 'Liverpool':
        return 870
    if data['opponent_team'] == 'Manchester City':
        return 1010
    if data['opponent_team'] == 'Manchester Utd':
        return 708.8
    if data['opponent_team'] == 'Newcastle Utd':
        return 333.6
    if data['opponent_team'] == 'Norwich':
        return 163.2
    if data['opponent_team'] == 'Southampton':
        return 271.45
    if data['opponent_team'] == 'Tottenham':
        return 727.3
    if data['opponent_team'] == 'Watford':
        return 156.2
    if data['opponent_team'] == 'West Ham':
        return 384
    if data['opponent_team'] == 'Wolves':
        return 385.95
    if data['opponent_team'] == 'Hull City':
        return 135.85
    if data['opponent_team'] == 'Middlesbrough':
        return 128.8
    if data['opponent_team'] == 'Bournemouth':
        return 160.4
    if data['opponent_team'] == 'Sunderland':
        return 132
    if data['opponent_team'] == 'Swansea City':
        return 165.49
    if data['opponent_team'] == 'West Brom':
        return 141.15
    if data['opponent_team'] == 'Stoke City':
        return 192.45
    if data['opponent_team'] == 'Huddersfield Town':
        return 137.45
    if data['opponent_team'] == 'Fulham':
        return 202.5
    if data['opponent_team'] == 'Cardiff City':
        return 113
    if data['opponent_team'] == 'Sheffield Utd':
        return 148.85
    if data['opponent_team'] == 'Nottingham Forest':
        return 189.8
        
data['opponent_team_mv'] = data.apply(opponent_team_market_value, axis = 1)

## Lags <a class="anchor" id="lags"></a>

Since we are dealing with time series data, we create two functions to keep track of lags (a fixed amount of passing time) on both player and team levels. In this case, lags are a certain amount of gameweeks. The functions return lagged statistics for the x amount of lags that we specify and adds them to the original dataframe as new columns, which will be helpful later for modeling.

Let's look at the two functions:

- **player_lag_stats** &rarr; this function returns the lagged statistic we specify for each player and each specified lag. Let's say we want Kevin de Bruyne's lagged goals_scored (statistic) for the last 1, 2, and 3 gameweeks (lags). Let's assume De Bruyne scored  1, 2, and 0 goals in the past three gameweeks (respectively). This is what our lags would look like:
    - *goals_scored_last_1 = 0*
    - *goals_scored_last_2 = 0 + 2 = 2* 
    - *goals_scored_last_3 = 0 + 2 + 1 = 3* 
     
    
- **team_lag_stats** &rarr; this function does the same as the function above, but on a team level - it returns the lagged statistic for the team as a whole, not just the player. It also returns their *conceded* lagged statistic, and their opponent's lagged and conceded lagged statistic. For example, if we want a team's goals_scored (statistic) in the last 1 gameweek (lag), the function would return how many goals the team scored and conceded in the last gameweek, and how many goals their opponent scored and conceded in the last gameweek. 

We need lagged statistics because we want to predict a player's expected points based on historical data.

In [10]:
#Lagged stats for players
def player_lag_stats(df, stats, lags):    
    player_lag = []
    updated_df = df.copy()
    stats.insert(0, 'minutes')
    for stat in stats:
        for lag in lags:
            stat_name = stat + '_last_' + str(lag)
            minute_game = 'minutes_last_' + str(lag)
            if lag == 'all':
                updated_df[stat_name] = updated_df.groupby(['player'])[stat].apply(lambda x: x.cumsum() - x)
            else: 
                updated_df[stat_name] = updated_df.groupby(['player'])[stat].apply(lambda x: x.rolling(min_periods=1, 
                                                                                            window=lag+1).sum() - x)
                
    return updated_df, player_lag

In [11]:
#Lagged stats for teams
def team_lag_stats(df, stats, lags):
    team_lag = []
    updated_new = df.copy()
    for stat in stats:
        stat_name_team = stat + '_team'
        stat_conceded_team = stat_name_team + '_conceded'
        stat_team = (df.groupby(['team', 'season', 'gw',
                                   'opponent_team'])
                        [stat].sum().rename(stat_name_team).reset_index())
        stat_team = stat_team.merge(stat_team,
                           left_on=['team', 'season', 'gw',
                                    'opponent_team'],
                           right_on=['opponent_team', 'season', 'gw',
                                     'team'],
                           how='left',
                           suffixes = ('', '_conceded'))
        stat_team.drop(['team_conceded', 'opponent_team_conceded'], axis=1, inplace=True)
        updated_new = updated_new.merge(stat_team, 
                          on=['team', 'season', 'gw', 'opponent_team'], 
                          how='left')
        updated_new = updated_new.merge(stat_team,
                 left_on=['team', 'season', 'gw', 'opponent_team'],
                 right_on=['opponent_team', 'season', 'gw', 'team'],
                 how='left',
                 suffixes = ('', '_opponent'))
        updated_new.drop(['team_opponent', 'opponent_team_opponent'], axis=1, inplace=True)
        
    team_lag = team_lag + [team_lag + '_opponent' for team_lag in team_lag]  

    return updated_new, team_lag

## Model <a class="anchor" id="model"></a>

### Training Data <a class="anchor" id="training_data"></a>

Now that we have our data and our lag functions, we can proceed to create the training data for our model.

Our model will use the following features to make its predictions:

- **total_points**: total points scored.
- **minutes**: minutes played.
- **assists**: assists made.
- **clean_sheets**: if player made clean sheet.
- **goals_conceded**: goals conceded.
- **goals_scored**: goals scored.
- **red_cards**: if player got red card.
- **influence (scraped from FPL)**: evaluates the degree to which a player has made an impact on a single match or throughout the season. It takes into account events and actions that could directly or indirectly effect the outcome of the fixture. At the top level these are decisive actions like goals and assists. But the Influence score also processes significant defensive actions to analyse the effectiveness of defenders and goalkeepers.
- **threat (scraped from FPL)**: this is a value that examines a player's threat on goal. It gauges the individuals most likely to score goals. While attempts are the key action, the Index looks at pitch location, giving greater weight to actions that are regarded as the best chances to score.
- **fdr**: fixture difficulty rating.
- **team_mv**: the team's market value.
- **opponent_team_mv**: the opponent team's market value.
- **xg**: player expected goals (per90min).
- **xa**: player expected assists (per90min).
- **team_xg**: team expected goals (per90min).
- **team_xa**: team expected assists (per90min).



**Note: Each statistic is per player, per gameweek. The specific lags we're using, and whether we are using player and team levels, is specified below.*

In [12]:
#Drop duplicates
training_data = data.drop_duplicates()

#Total points
training_data, players_lag = player_lag_stats(training_data, ['total_points'], [1])

#Assists
training_data, players_lag = player_lag_stats(training_data, ['assists'], [1])

#Clean sheets
training_data, players_lag = player_lag_stats(training_data, ['clean_sheets'], [1])

#Goals conceded
training_data, players_lag = player_lag_stats(training_data, ['goals_conceded'], [1])

#Goals scored
training_data, players_lag = player_lag_stats(training_data, ['goals_scored'], [1])

#Red Cards
training_data, players_lag = player_lag_stats(training_data, ['red_cards'], [1])

#Columns to drop
drop_columns = ['gw', 'player', 'minutes',
                'assists', 'bonus', 'bps', 'clean_sheets','goals_conceded', 
                'goals_scored', 'penalties_saved', 'red_cards', 'saves',
                'yellow_cards', 'season', 'team_a_score', 'team_h_score', 
                'team_won', 'ict_index', 'creativity',
                'npxg', 'team_npxg', 'team', 'opponent_team',
                'was_home', 'position']

training_data = training_data.drop(drop_columns,axis = 1)

#Fill NaN values with 0
training_data = training_data.fillna(0)

#Round all numbers to two decimal points for simplicity
training_data = training_data.round(2)

In [13]:
training_data

Unnamed: 0,total_points,influence,threat,xg,xa,team_xg,team_xa,fdr,team_mv,opponent_team_mv,minutes_last_1,total_points_last_1,assists_last_1,clean_sheets_last_1,goals_conceded_last_1,goals_scored_last_1,red_cards_last_1
0,6,10.2,4.0,0.0,0.0,1.41,0.96,2,671.50,202.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,7,38.6,48.0,0.4,0.2,1.41,0.96,2,671.50,202.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,7,14.0,0.0,0.0,0.0,1.41,0.96,2,671.50,202.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0,0.0,0.0,0.0,0.0,1.41,0.96,2,671.50,202.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0,0.0,0.0,0.0,0.0,1.41,0.96,2,671.50,202.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61477,0,0.0,0.0,0.0,0.0,1.37,0.84,2,292.65,275.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0
61478,0,0.0,0.0,0.0,0.0,1.37,0.84,2,292.65,275.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0
61479,0,0.0,0.0,0.0,0.0,1.37,0.84,2,292.65,275.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0
61480,0,0.0,0.0,0.0,0.0,1.37,0.84,2,292.65,275.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Random Forest <a class="anchor" id="random_forest"></a>

We use the scikit learn Python library to develop a random forest.

In [14]:
#Features to make our predictions
x = training_data.drop('total_points', axis=1)

#What we want to predict
y = training_data['total_points'] 

#Split up data into train and test sets, fit model, and make predictions
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.10, random_state = 42)

#Random forest
from sklearn.ensemble import RandomForestClassifier
rnd_clf = RandomForestClassifier(n_estimators=400, random_state=42)
rnd_clf.fit(x_train, y_train)

#Predictions
y_pred = rnd_clf.predict(x_test)

### Measuring Accuracy <a class="anchor" id="accuracy"></a>

We measure the accuracy of the model.

In [15]:
#Calculating accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

0.7578468043584322

## Predictions <a class="anchor" id="predictions"></a>

### Predictions - Gameweek 17 <a class="anchor" id="predictions_gw17"></a>

In [16]:
player_stats_gw17 = player_stats_2223[player_stats_2223['GW'] == 17]
relevant_columns = ['name', 'total_points']
player_stats_gw17 = player_stats_gw17[relevant_columns]
player_stats_gw17 = player_stats_gw17.rename(columns={'name': 'player'})
gw17_predictions = pd.read_csv(path/'gw17_predictions.csv')
gw17_predictions_vs_actual = gw17_predictions.merge(player_stats_gw17, on = 'player')
gw17_predictions_vs_actual[['player', 'predicted_total_points', 'total_points']]

Unnamed: 0,player,predicted_total_points,total_points
0,Raphaël Varane,0,6
1,Harry Maguire,0,1
2,Luke Shaw,0,8
3,Marcus Rashford,0,14
4,Donny van de Beek,0,1
...,...,...,...
628,Willian Borges da Silva,0,6
629,Carlos Vinícius Alves Morais,0,1
630,Stefan Parkes,0,0
631,Martial Godo,0,0


### Predictions - Gameweek 19 <a class="anchor" id="predictions_gw19"></a>

In [17]:
player_stats_gw19 = player_stats_2223[player_stats_2223['GW'] == 19]
relevant_columns = ['name', 'total_points']
player_stats_gw19 = player_stats_gw19[relevant_columns]
player_stats_gw19 = player_stats_gw19.rename(columns={'name': 'player'})
gw19_predictions = pd.read_csv(path/'gw19_predictions.csv')
gw19_predictions_vs_actual = gw19_predictions.merge(player_stats_gw19, on = 'player')
gw19_predictions_vs_actual[['player', 'predicted_total_points', 'total_points']]

Unnamed: 0,player,predicted_total_points,total_points
0,Liam Cooper,2,0
1,Luke Ayling,1,1
2,Mateusz Klich,1,1
3,Adam Forshaw,1,0
4,Patrick Bamford,0,0
...,...,...,...
709,Brandon Austin,0,0
710,Alfie Devine,0,0
711,Troy Parrott,0,0
712,Richarlison de Andrade,0,0


### Predictions - Gameweek 20 <a class="anchor" id="predictions_gw19"></a>

In [18]:
player_stats_gw20 = player_stats_2223[player_stats_2223['GW'] == 20]
relevant_columns = ['name', 'total_points']
player_stats_gw20 = player_stats_gw20[relevant_columns]
player_stats_gw20 = player_stats_gw20.rename(columns={'name': 'player'})
gw20_predictions = pd.read_csv(path/'gw20_predictions.csv')
gw20_predictions_vs_actual = gw20_predictions.merge(player_stats_gw20, on = 'player')
gw20_predictions_vs_actual[['player', 'predicted_total_points', 'total_points']]

Unnamed: 0,player,predicted_total_points,total_points
0,Raúl Jiménez,0,1
1,Hwang Hee-chan,1,3
2,Max Kilman,1,9
3,Nathan Collins,2,6
4,Joseph Hodge,1,0
...,...,...,...
703,Archie Gray,0,0
704,Joel Robles,0,0
705,Wilfried Gnonto,8,2
706,Mateo Joseph Fernández,0,0


### Predictions - Gameweek 22 <a class="anchor" id="predictions_gw22"></a>

Now that we trained our model, we are ready to make predictions for the upcoming PL gameweek.

First, we import data with upcoming gameweek information, merge with relevant player statistics from last gameweek , and then add the rest of the statistics we use for the model with a value of zero (we do this in order to get lagged statistics later):

**Note: remember to change the gameweek weekly to update the data.*

We create a dataframe with the upcoming gw's data:

# CHANGE GAMEWEEK HERE

In [19]:
#Last gw (most recently played)
gameweek = 21

#Player stats for most recent gameweek
player_stats = player_stats_2223[player_stats_2223['GW'] == gameweek]
relevant_columns = ['name','creativity', 'ict_index','influence', 'threat']
player_stats = player_stats[relevant_columns]
player_stats = player_stats.rename(columns={'name': 'player'})

#Add relevant statistics with value = 0
player_stats[['minutes', 'total_points', 'assists', 'bonus', 'bps',
       'clean_sheets', 'goals_conceded', 'goals_scored',
       'penalties_saved', 'red_cards', 'saves', 'yellow_cards', 'team_a_score', 'team_h_score']] = 0

#Merge dataframes and make some adjustments
def fuzzy_merge(df_1, df_2, key1, key2, threshold=90, limit=2):
    """
    :param df_1: the left table to join
    :param df_2: the right table to join
    :param key1: key column of the left table
    :param key2: key column of the right table
    :param threshold: how close the matches should be to return a match, based on Levenshtein distance
    :param limit: the amount of matches that will get returned, these are sorted high to low
    :return: dataframe with boths keys and matches
    """
    s = df_2[key2].tolist()
    
    m = df_1[key1].apply(lambda x: process.extract(x, s, limit=limit))    
    df_1['matches'] = m
    
    m2 = df_1['matches'].apply(lambda x: ', '.join([i[0] for i in x if i[1] >= threshold]))
    df_1['matches'] = m2
    
    return df_1

#Player Raw Data Merged with player's team
players_raw = players_raw[['first_name', 'second_name', 'team_code']]
teams = teams[['code', 'name']]
players_raw = players_raw.merge(teams, left_on = 'team_code', right_on= 'code')
players_raw['player'] = players_raw['first_name'] + ' ' + players_raw['second_name']
players_raw = players_raw.rename(columns={'name': 'team'})
players_raw = players_raw[['player', 'team']]
players_raw['team'] = players_raw['team'].replace({'Man Utd': 'Manchester Utd', 
                                          'Newcastle United': 'Newcastle Utd',
                                          'West Ham United': 'West Ham',
                                          'Tottenham Hotspur': 'Tottenham',
                                          'Brighton and Hove Albion': 'Brighton',
                                          'Wolverhampton Wanderers': 'Wolves',
                                          'Leicester': 'Leicester City',
                                          'Man City': 'Manchester City',
                                          'Newcastle': 'Newcastle Utd',
                                          "Nott'm Forest": 'Nottingham Forest',
                                          'Spurs': 'Tottenham'})

#Gameweek Data
season_gws['opponent_team'] = season_gws['opponent_team'].replace({'Manchester United': 'Manchester Utd', 
                                          'Newcastle United': 'Newcastle Utd',
                                          'West Ham United': 'West Ham',
                                          'Tottenham Hotspur': 'Tottenham',
                                          'Brighton and Hove Albion': 'Brighton',
                                          'Wolverhampton Wanderers': 'Wolves'})

season_gws['team'] = season_gws['team'].replace({'Manchester United': 'Manchester Utd', 
                                          'Newcastle United': 'Newcastle Utd',
                                          'West Ham United': 'West Ham',
                                          'Tottenham Hotspur': 'Tottenham',
                                          'Brighton and Hove Albion': 'Brighton',
                                          'Wolverhampton Wanderers': 'Wolves'})

season_gws = season_gws[['gw','team', 'opponent_team', 'was_home', 'season']]
season_gws = season_gws.drop_duplicates()
season_gws = season_gws.reset_index().drop('index', axis=1)

#CHANGE GAMEWEEK TO NEXT GAMEWWEK HERE:
season_gws = season_gws[season_gws['gw'] == 22]

#Merge gameweek info with player names
season_player_merge = season_gws.merge(players_raw, on='team')
season_player_merge = season_player_merge[['player', 'gw', 'team', 'opponent_team', 'was_home', 'season']]
season_player_merge

#Add player's position
cleaned_players['player'] = cleaned_players['first_name'] + ' ' + cleaned_players['second_name']
cleaned_players = cleaned_players[['player', 'element_type']]
cleaned_players = cleaned_players.rename(columns={'element_type': 'position'})
season_player_merge = season_player_merge.merge(cleaned_players, on='player')

#Ordered and clean df with player gw data
season_player_merge = season_player_merge[['player', 'position', 'gw', 'team', 'opponent_team', 'was_home', 'season']]
season_player_merge = season_player_merge.drop_duplicates()

season_gw = fuzzy_merge(season_player_merge, player_stats, 'player', 'player', threshold=91)
season_gw_stats = season_gw.merge(player_stats, left_on = 'matches', right_on = 'player')
season_gw_stats = season_gw_stats.drop(['player_x', 'matches'], axis=1)
season_gw_stats = season_gw_stats.rename(columns={'player_y': 'player'})
season_gw_stats = season_gw_stats[['player', 'position', 'gw', 'team', 'opponent_team', 'was_home',
       'season', 'minutes', 'total_points', 'assists', 'bonus', 'bps',
       'clean_sheets', 'creativity', 'goals_conceded', 'goals_scored',
       'ict_index', 'influence', 'penalties_saved', 'red_cards', 'saves',
       'threat', 'yellow_cards', 'team_a_score', 'team_h_score']]

#Add player's xg, xa, npxg
season_gw_stats = fuzzy_merge(season_gw_stats, player_standard_stats_2223, 'player', 'player', threshold=91)
season_gw_stats['matches'].replace('', np.nan, inplace=True)
season_gw_no_match = season_gw_stats[season_gw_stats['matches'].isna()]
season_gw_no_match[['xg', 'xa', 'npxg']] = 0
season_gw_no_match = season_gw_no_match.drop('matches', axis=1)
season_gw_stats = season_gw_stats.dropna(subset=['matches'])
season_gw_stats = season_gw_stats.merge(player_standard_stats_2223, left_on = 'matches', right_on = 'player')
season_gw_stats = season_gw_stats.drop(['player_x', 'matches'], axis=1)
season_gw_stats = season_gw_stats.rename(columns={'player_y': 'player'})
season_gw_stats = season_gw_stats[['player', 'position', 'gw', 'team', 'opponent_team', 'was_home',
       'season', 'minutes', 'total_points', 'assists', 'bonus', 'bps',
       'clean_sheets', 'creativity', 'goals_conceded', 'goals_scored',
       'ict_index', 'influence', 'penalties_saved', 'red_cards', 'saves',
       'threat', 'yellow_cards', 'team_a_score', 'team_h_score', 'xg', 'xa', 'npxg']]
season_gw_stats = pd.concat([season_gw_stats, season_gw_no_match])

#Add team's xg, xa, and npxg
next_gw = season_gw_stats 
next_gw = next_gw.merge(team_standard_stats_2223, left_on = 'team', right_on ='team')
next_gw = next_gw[['player', 'position', 'gw', 'team', 'opponent_team', 'was_home',
       'season', 'minutes', 'total_points', 'assists', 'bonus', 'bps',
       'clean_sheets', 'creativity', 'goals_conceded', 'goals_scored',
       'ict_index', 'influence', 'penalties_saved', 'red_cards', 'saves',
       'threat', 'yellow_cards', 'team_a_score', 'team_h_score', 'xg', 'xa', 'npxg',
       'team_xg', 'team_xa', 'team_npxg']]

#FDR, team_won, team_mv, and opponent_team_mv assignment
next_gw['fdr'] = next_gw.apply(fdr_assignment, axis = 1)
next_gw['team_won'] = next_gw.apply(team_won, axis = 1)
next_gw['team_mv'] = next_gw.apply(team_market_value, axis = 1)
next_gw['opponent_team_mv'] = next_gw.apply(opponent_team_market_value, axis = 1)

#Re-order columns
next_gw = next_gw[['player', 'position', 'gw', 'team', 'opponent_team', 'was_home',
       'season', 'minutes', 'total_points', 'assists', 'bonus', 'bps',
       'clean_sheets', 'creativity', 'goals_conceded', 'goals_scored',
       'ict_index', 'influence', 'penalties_saved', 'red_cards', 'saves',
       'threat', 'yellow_cards', 'team_a_score', 'team_h_score', 'team_won',
       'team_mv','opponent_team_mv','xg', 'xa', 'npxg','team_xg', 'team_xa', 'team_npxg']]

#Convert 'season' column into string
next_gw['season'] = next_gw['season'].apply(str)
    
next_gw

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  season_gw_no_match[['xg', 'xa', 'npxg']] = 0
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  season_gw_no_match[['xg', 'xa', 'npxg']] = 0
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  season_gw_no_match[['xg', 'xa', 'npxg']] = 0


Unnamed: 0,player,position,gw,team,opponent_team,was_home,season,minutes,total_points,assists,...,team_h_score,team_won,team_mv,opponent_team_mv,xg,xa,npxg,team_xg,team_xa,team_npxg
0,Hugo Lloris,GK,22,Tottenham,Manchester City,True,2223,0,0,0,...,0,0,727.3,1010.00,0.00,0.00,0.00,1.45,1.08,1.34
1,Fraser Forster,GK,22,Tottenham,Manchester City,True,2223,0,0,0,...,0,0,727.3,1010.00,0.00,0.00,0.00,1.45,1.08,1.34
2,Harry Kane,FWD,22,Tottenham,Manchester City,True,2223,0,0,0,...,0,0,727.3,1010.00,0.57,0.16,0.45,1.45,1.08,1.34
3,Son Heung-min,MID,22,Tottenham,Manchester City,True,2223,0,0,0,...,0,0,727.3,1010.00,0.28,0.18,0.28,1.45,1.08,1.34
4,Matt Doherty,DEF,22,Tottenham,Manchester City,True,2223,0,0,0,...,0,0,727.3,1010.00,0.10,0.04,0.10,1.45,1.08,1.34
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
690,Matthew Smith,MID,22,Arsenal,Everton,False,2223,0,0,0,...,0,0,671.5,415.95,0.00,0.00,0.00,1.96,1.38,1.94
691,Lino Sousa,DEF,22,Arsenal,Everton,False,2223,0,0,0,...,0,0,671.5,415.95,0.00,0.00,0.00,1.96,1.38,1.94
692,Karl Hein,GK,22,Arsenal,Everton,False,2223,0,0,0,...,0,0,671.5,415.95,0.00,0.00,0.00,1.96,1.38,1.94
693,Amario Cozier-Duberry,MID,22,Arsenal,Everton,False,2223,0,0,0,...,0,0,671.5,415.95,0.00,0.00,0.00,1.96,1.38,1.94


Now, we take the dataframe we just created, and concatenate it to the original data (with all previous gws' info), in order to get the relevant lagged statistics - the ones we used in our original model. Then, we only keep next gameweek's rows and drop the rest to make our predictions:

In [25]:
#Adjusting original data to concatenate with upcoming gameweek dataframe
data_adjusted = data[['player', 'position', 'gw', 'team', 'opponent_team', 'was_home', 'total_points',
                         'creativity','ict_index','influence','threat','xg', 'xa', 'npxg', 'team_xg',
                         'team_xa', 'team_npxg', 'fdr','season', 'minutes', 'assists', 
                         'bonus', 'bps', 'clean_sheets', 'goals_conceded', 'goals_scored', 'penalties_saved', 
                         'red_cards', 'saves', 'yellow_cards', 'team_a_score', 'team_h_score', 'team_won', 
                         'team_mv', 'opponent_team_mv']]

#We concatenate adjusted original data with next gameweek's dataframe
data_adjusted = pd.concat([data_adjusted, next_gw])
data_adjusted = data_adjusted.drop_duplicates().reset_index()
data_adjusted = data_adjusted.drop('index', axis=1)

#Total points
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['total_points'], [1])

#Assists
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['assists'], [1])

#Clean sheets
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['clean_sheets'], [1])

#Goals conceded
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['goals_conceded'], [1])

#Goals scored
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['goals_scored'], [1])

#Red Cards
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['red_cards'], [1])

#Only keep data for upcoming gw
data_adjusted = data_adjusted.loc[(data_adjusted['gw'] == gameweek + 1)]
data_adjusted = data_adjusted.loc[(data_adjusted['season'] == '2223')]

#Columns to drop
drop_columns = ['gw', 'player', 'minutes',
                'assists', 'bonus', 'bps', 'clean_sheets','goals_conceded', 
                'goals_scored', 'penalties_saved', 'red_cards', 'saves',
                'yellow_cards', 'season', 'team_a_score', 'team_h_score', 
                'team_won', 'ict_index', 'creativity',
                'npxg', 'team_npxg', 'team', 'opponent_team',
                'was_home', 'position']

next_gw_stats = data_adjusted.drop(drop_columns,axis = 1)

#Fill NaN values with 0
next_gw_stats = next_gw_stats.fillna(0)

#Round all numbers to two decimal points for simplicity
next_gw_stats = next_gw_stats.round(2)

Now we make our predictions and add them to the upcoming gameweek dataframe:

In [26]:
#Features to make our predictions
gw22 = next_gw_stats.drop('total_points', axis=1)

#Make predictions
predictions_next_gw = rnd_clf.predict(gw22)

In [27]:
#We add predictions to the original dataframe
data_adjusted['predicted_total_points'] = predictions_next_gw
predictions_gw22 = data_adjusted[['player', 'gw', 'position', 'team', 'opponent_team', 'season', 'predicted_total_points']]
predictions_gw22 = predictions_gw22.reset_index()
predictions_gw22 = predictions_gw22.drop('index', axis=1)

#We make all negative values equal to 0
predictions_gw22['predicted_total_points'] = predictions_gw22['predicted_total_points'].round(0).astype(int)
predictions_gw22['predicted_total_points'] = predictions_gw22['predicted_total_points'].where(predictions_gw22['predicted_total_points'] > 0, other=0)

#Player data from this season
players_2223 = pd.read_csv(path/'2022-23/cleaned_players.csv')

#Adjust 2222-23 players dataframe - get players' full names and keep cost column
players_2223['player'] = players_2223['first_name'] + ' ' + players_2223['second_name']
players_2223 = players_2223.set_index('player')
players_2223 = players_2223.drop(['first_name', 'second_name'], axis=1)
players_2223 = players_2223[['now_cost']]
players_2223 = players_2223.rename({'now_cost': 'cost'}, axis=1)
players_2223 = players_2223.astype(str)

#Adjust cost values to represent actual FPL costs
for index, row in players_2223.iterrows():
    if (len(row['cost'])) == 3:
        row['cost'] = (row['cost'][:2] + '.' + row['cost'][2:])
    if (len(row['cost'])) == 2:
        row['cost'] = (row['cost'][:1] + '.' + row['cost'][1:])
        
players_2223['cost'] = players_2223['cost'].astype(float)
players_2223 = players_2223.reset_index()

#Merge predictions with players' costs
predictions_gw22 = predictions_gw22.merge(players_2223, on='player')
predictions_gw22

predictions_gw22.to_csv(path/'gw22_predictions.csv', index=False)

In [28]:
predictions_gw22

Unnamed: 0,player,gw,position,team,opponent_team,season,predicted_total_points,cost
0,Hugo Lloris,22,GK,Tottenham,Manchester City,2223,2,5.5
1,Fraser Forster,22,GK,Tottenham,Manchester City,2223,0,3.9
2,Harry Kane,22,FWD,Tottenham,Manchester City,2223,6,11.7
3,Son Heung-min,22,MID,Tottenham,Manchester City,2223,2,11.6
4,Matt Doherty,22,DEF,Tottenham,Manchester City,2223,0,4.6
...,...,...,...,...,...,...,...,...
664,Matthew Smith,22,MID,Arsenal,Everton,2223,0,4.5
665,Lino Sousa,22,DEF,Arsenal,Everton,2223,0,4.0
666,Karl Hein,22,GK,Arsenal,Everton,2223,0,4.0
667,Amario Cozier-Duberry,22,MID,Arsenal,Everton,2223,0,4.5


Let's take a look at our highest-expected scorers:

In [29]:
#Highest-expected scorers for Gameweek 17
highest_expected_scorers_gw22 = predictions_gw22.sort_values('predicted_total_points', ascending=False)
highest_expected_scorers_gw22.head(25)

Unnamed: 0,player,gw,position,team,opponent_team,season,predicted_total_points,cost
358,Erling Haaland,22,FWD,Manchester City,Tottenham,2223,17,12.2
419,Jarrod Bowen,22,MID,West Ham,Newcastle Utd,2223,16,8.0
642,Eddie Nketiah,22,FWD,Arsenal,Everton,2223,13,6.6
556,Jaidon Anthony,22,MID,Bournemouth,Brighton,2223,9,5.2
644,Bukayo Saka,22,MID,Arsenal,Everton,2223,9,8.2
225,Kaoru Mitoma,22,MID,Brighton,Bournemouth,2223,9,5.1
106,Marcus Rashford,22,MID,Manchester Utd,Crystal Palace,2223,8,7.1
229,Pervis Estupiñán,22,DEF,Brighton,Bournemouth,2223,8,4.5
75,Kieran Trippier,22,DEF,Newcastle Utd,West Ham,2223,8,6.0
37,Sam Surridge,22,FWD,Nottingham Forest,Leeds,2223,8,4.7


## Ideal Teams <a class="anchor" id="ideal_team"></a>

### Ideal Team - Gameweek 17 (No Budget Constraint) <a class="anchor" id="ideal_team_gw17_no_budget"></a>

In [30]:
ideal_team_gw17_no_budget = pd.read_csv(path/'gw17_ideal_team_no_budget.csv')
ideal_team_gw17_no_budget = ideal_team_gw17_no_budget.merge(player_stats_gw17, on='player')
ideal_team_gw17_no_budget

Unnamed: 0,player,gw,position,team,opponent_team,season,predicted_total_points,cost,total_points
0,Gavin Bazunu,17,GK,Southampton,Brighton,2223,9,4.5,1
1,Danny Ward,17,GK,Leicester City,Newcastle Utd,2223,3,4.1,1
2,Andrew Robertson,17,DEF,Liverpool,Aston Villa,2223,9,6.8,8
3,Daniel Amartey,17,DEF,Leicester City,Newcastle Utd,2223,7,4.3,1
4,Mohammed Salisu,17,DEF,Southampton,Brighton,2223,6,4.4,1
5,William Saliba,17,DEF,Arsenal,West Ham,2223,6,5.3,2
6,Luke Thomas,17,DEF,Leicester City,Newcastle Utd,2223,6,4.2,1
7,Rodrigo Bentancur,17,MID,Tottenham,Brentford,2223,14,5.4,0
8,Rodrigo Moreno,17,MID,Leeds,Manchester City,2223,13,6.3,2
9,Martin Ødegaard,17,MID,Arsenal,West Ham,2223,12,6.4,11


### Ideal Team - Gameweek 17 (Budget Constraint) <a class="anchor" id="ideal_team_gw17_budget"></a>

In [31]:
ideal_team_gw17_budget = pd.read_csv(path/'gw17_ideal_team_budget.csv')
ideal_team_gw17_budget = ideal_team_gw17_budget.merge(player_stats_gw17, on='player')
ideal_team_gw17_budget

Unnamed: 0,player,team,opponent_team,position,predicted_total_points,cost,total_points
0,Gavin Bazunu,Southampton,Brighton,GK,9,4.5,1
1,Danny Ward,Leicester City,Newcastle Utd,GK,3,4.1,1
2,Mohammed Salisu,Southampton,Brighton,DEF,6,4.4,1
3,William Saliba,Arsenal,West Ham,DEF,6,5.3,2
4,Daniel Amartey,Leicester City,Newcastle Utd,DEF,7,4.3,1
5,Manuel Akanji,Manchester City,Leeds,DEF,6,5.0,2
6,Andrew Robertson,Liverpool,Aston Villa,DEF,9,6.8,8
7,Martin Ødegaard,Arsenal,West Ham,MID,12,6.4,11
8,Joe Willock,Newcastle Utd,Leicester City,MID,11,4.9,3
9,Harvey Barnes,Leicester City,Newcastle Utd,MID,8,6.9,2


### Ideal Team - Gameweek 19 (No Budget Constraint) <a class="anchor" id="ideal_team_gw19_no_budget"></a>

In [32]:
ideal_team_gw19_no_budget = pd.read_csv(path/'gw19_ideal_team_no_budget.csv')
ideal_team_gw19_no_budget = ideal_team_gw19_no_budget.merge(player_stats_gw19, on='player')
ideal_team_gw19_no_budget

Unnamed: 0,player,gw,position,team,opponent_team,season,predicted_total_points,cost,total_points
0,David De Gea Quintana,19,GK,Manchester Utd,Bournemouth,2223,6,4.9,7
1,Nick Pope,19,GK,Newcastle Utd,Arsenal,2223,6,5.3,10
2,Ethan Pinnock,19,DEF,Brentford,Liverpool,2223,7,4.4,2
3,Serge Aurier,19,DEF,Nottingham Forest,Southampton,2223,7,4.5,8
4,Sven Botman,19,DEF,Newcastle Utd,Arsenal,2223,6,4.4,6
5,Raphaël Varane,19,DEF,Manchester Utd,Bournemouth,2223,6,4.8,0
6,Dan Burn,19,DEF,Newcastle Utd,Arsenal,2223,6,4.5,6
7,Martin Ødegaard,19,MID,Arsenal,Newcastle Utd,2223,12,6.6,2
8,Gabriel Martinelli Silva,19,MID,Arsenal,Newcastle Utd,2223,10,6.8,3
9,Kiernan Dewsbury-Hall,19,MID,Leicester City,Fulham,2223,9,4.9,0


### Ideal Team - Gameweek 19 (Budget Constraint) <a class="anchor" id="ideal_team_gw19_budget"></a>

In [33]:
ideal_team_gw19_budget = pd.read_csv(path/'gw19_ideal_team_budget.csv')
ideal_team_gw19_budget = ideal_team_gw19_budget.merge(player_stats_gw19, on='player')
ideal_team_gw19_budget

Unnamed: 0,player,team,opponent_team,position,predicted_total_points,cost,total_points
0,David De Gea Quintana,Manchester Utd,Bournemouth,GK,6,4.9,7
1,Nick Pope,Newcastle Utd,Arsenal,GK,6,5.3,10
2,Ethan Pinnock,Brentford,Liverpool,DEF,7,4.4,2
3,Serge Aurier,Nottingham Forest,Southampton,DEF,7,4.5,8
4,Dan Burn,Newcastle Utd,Arsenal,DEF,6,4.5,6
5,Sven Botman,Newcastle Utd,Arsenal,DEF,6,4.4,6
6,Clément Lenglet,Tottenham,Crystal Palace,DEF,6,4.8,6
7,Demarai Gray,Everton,Brighton,MID,9,5.3,7
8,Raheem Sterling,Chelsea,Manchester City,MID,9,9.7,1
9,Raheem Sterling,Chelsea,Manchester City,MID,9,9.7,0


### Ideal Team - Gameweek 20 (No Budget Constraint) <a class="anchor" id="ideal_team_gw20_no_budget"></a>

In [34]:
ideal_team_gw20_no_budget = pd.read_csv(path/'gw20_ideal_team_no_budget.csv')
ideal_team_gw20_no_budget = ideal_team_gw20_no_budget.merge(player_stats_gw20, on='player')
ideal_team_gw20_no_budget

Unnamed: 0,player,gw,position,team,opponent_team,season,predicted_total_points,cost,total_points
0,Nick Pope,20,GK,Newcastle Utd,Fulham,2223,3,5.4,5
1,David Raya Martin,20,GK,Brentford,Bournemouth,2223,2,4.6,5
2,Luke Shaw,20,DEF,Manchester Utd,Manchester City,2223,9,5.1,2
3,Luke Shaw,20,DEF,Manchester Utd,Manchester City,2223,9,5.1,2
4,Matt Doherty,20,DEF,Tottenham,Arsenal,2223,8,4.6,1
5,Matt Doherty,20,DEF,Tottenham,Arsenal,2223,8,4.6,0
6,Kieran Trippier,20,DEF,Newcastle Utd,Fulham,2223,7,6.0,9
7,Ethan Pinnock,20,DEF,Brentford,Bournemouth,2223,7,4.4,6
8,Dan Burn,20,DEF,Newcastle Utd,Fulham,2223,6,4.5,9
9,Kaoru Mitoma,20,MID,Brighton,Liverpool,2223,11,4.9,6


### Ideal Team - Gameweek 20 (Budget Constraint) <a class="anchor" id="ideal_team_gw20_budget"></a>

In [35]:
ideal_team_gw20_budget = pd.read_csv(path/'gw20_ideal_team_budget.csv')
ideal_team_gw20_budget = ideal_team_gw20_budget.merge(player_stats_gw20, on='player')
ideal_team_gw20_budget

Unnamed: 0,player,team,opponent_team,position,predicted_total_points,cost,total_points
0,Nick Pope,Newcastle Utd,Fulham,GK,3,5.4,5
1,Lukasz Fabianski,West Ham,Wolves,GK,2,5.0,3
2,Kieran Trippier,Newcastle Utd,Fulham,DEF,7,6.0,9
3,Dan Burn,Newcastle Utd,Fulham,DEF,6,4.5,9
4,Luke Shaw,Manchester Utd,Manchester City,DEF,9,5.1,2
5,Luke Shaw,Manchester Utd,Manchester City,DEF,9,5.1,2
6,Ethan Pinnock,Brentford,Bournemouth,DEF,7,4.4,6
7,Matt Doherty,Tottenham,Arsenal,DEF,8,4.6,1
8,Matt Doherty,Tottenham,Arsenal,DEF,8,4.6,0
9,Daniel Castelo Podence,Wolves,West Ham,MID,8,5.3,8


### Ideal Team - Gameweek 22 (No Budget Constraint) <a class="anchor" id="ideal_team_gw22_no_budget"></a>

The following algorithm returns an ideal team, according to **predicted_total_points**. The team satisfies the position requirements (2 goalkeepers, 5 defenders, 5 midfielders, and 3 forwards) and the team constraint (no more than 3 players per team), however, it does **NOT** satisfy the budget constraint.

If we were to pick players based on our model's **predicted_total_points** (without any budget constraint), we should pick the following players:

*Why would we want a team that doesn't satisfy the budget constraint?*
- Although we could not put together this squad for regular FPL, we could still select it for FPL draft, which doesn't have any budget constraint.

In [39]:
def get_ideal_team(gk = 2, df = 5, md = 5, fwd = 3, team_max = 3):
    ideal_team = []
    positions = {'GK': gk, 'DEF': df, 'MID': md, 'FWD': fwd}
    teams = {'Arsenal': team_max, 'Leeds': team_max, 'Manchester City': team_max, 
             'Tottenham': team_max, 'Liverpool': team_max, 'Southampton': team_max, 
             'Chelsea': team_max, 'Brentford': team_max, 'Nottingham Forest': team_max, 
             'Wolves': team_max, 'Aston Villa': team_max, 
             'Crystal Palace': team_max, 'West Ham': team_max, 'Leicester City': team_max, 
             'Newcastle Utd': team_max, 'Bournemouth': team_max, 'Everton': team_max, 
             'Brighton': team_max, 
             'Manchester Utd': team_max, 'Fulham': team_max}
    t = highest_expected_scorers_gw22.iterrows()
    for i, row1 in t:
        if (positions[row1['position']] > 0):
            ideal_team.append(row1['player'])
            positions[row1['position']] = positions[row1['position']] - 1
            teams[row1['team']] = teams[row1['team']] - 1
    return ideal_team

ideal_team_gw22_no_budget = pd.DataFrame(get_ideal_team()) 

In [40]:
ideal_team_gw22_no_budget = ideal_team_gw22_no_budget.rename({0: 'player'}, axis=1)
ideal_team_gw22_no_budget = ideal_team_gw22_no_budget.merge(highest_expected_scorers_gw22, on='player')
ideal_team_gw22_no_budget.position = pd.Categorical(ideal_team_gw22_no_budget.position, categories=['GK', 'DEF', 'MID', 'FWD'])
ideal_team_gw22_no_budget = ideal_team_gw22_no_budget.sort_values('position')
ideal_team_gw22_no_budget = ideal_team_gw22_no_budget.reset_index().drop('index', axis=1)
ideal_team_gw22_no_budget.to_csv(path/'gw22_ideal_team_no_budget.csv', index=False)
ideal_team_gw22_no_budget

Unnamed: 0,player,gw,position,team,opponent_team,season,predicted_total_points,cost
0,Aaron Ramsdale,22,GK,Arsenal,Everton,2223,6,4.9
1,Nick Pope,22,GK,Newcastle Utd,West Ham,2223,6,5.5
2,Pervis Estupiñán,22,DEF,Brighton,Bournemouth,2223,8,4.5
3,Kieran Trippier,22,DEF,Newcastle Utd,West Ham,2223,8,6.0
4,John Stones,22,DEF,Manchester City,Tottenham,2223,8,5.4
5,Lisandro Martínez,22,DEF,Manchester Utd,Crystal Palace,2223,8,4.5
6,Takehiro Tomiyasu,22,DEF,Arsenal,Everton,2223,6,4.2
7,Jarrod Bowen,22,MID,West Ham,Newcastle Utd,2223,16,8.0
8,Jaidon Anthony,22,MID,Bournemouth,Brighton,2223,9,5.2
9,Bukayo Saka,22,MID,Arsenal,Everton,2223,9,8.2


In [41]:
#Ideal team expected total points
predicted_points = str(sum(ideal_team_gw22_no_budget['predicted_total_points']))
print("Ideal team expected total points: " + predicted_points)

Ideal team expected total points: 139


### Ideal Team - Gameweek 22 (Budget Constraint) <a class="anchor" id="ideal_team_gw22_budget"></a>

The following algorithm returns an ideal team, according to **predicted_total_points**. The team satisfies the position requirements (2 goalkeepers, 5 defenders, 5 midfielders, and 3 forwards), the team constraint (no more than 3 players per team), **AND** the budget constraint (squad cost must not exceed £100 million).

If we were to pick players based on our model's **predicted_total_points** (satisfying the budget constraint), we should pick the following players:

In [42]:
positions = highest_expected_scorers_gw22.position.unique()
clubs = highest_expected_scorers_gw22.team.unique()
budget = 100
available_roles = {
    'GK': 2,
    'DEF': 5,
    'MID': 5,
    'FWD': 3    
}

names = [highest_expected_scorers_gw22.player[i] for i in highest_expected_scorers_gw22.index]
teams = [highest_expected_scorers_gw22.team[i] for i in highest_expected_scorers_gw22.index]
roles = [highest_expected_scorers_gw22.position[i] for i in highest_expected_scorers_gw22.index]
costs = [highest_expected_scorers_gw22.cost[i] for i in highest_expected_scorers_gw22.index]
predicted_points = [highest_expected_scorers_gw22.predicted_total_points[i] for i in highest_expected_scorers_gw22.index]
players = [LpVariable("player_" + str(i), cat="Binary") for i in highest_expected_scorers_gw22.index]
prob = LpProblem("Fantasy Ideal Team (total_points)", LpMaximize)

#Maximize predicted_total_points
prob += lpSum(players[i] * predicted_points[i] for i in range(len(highest_expected_scorers_gw22)))
#Budget constraint
prob += lpSum(players[i] * highest_expected_scorers_gw22.cost[highest_expected_scorers_gw22.index[i]] for i in range(len(highest_expected_scorers_gw22))) <= budget

for pos in positions:
    prob += lpSum(players[i] for i in range(len(highest_expected_scorers_gw22)) if roles[i] == pos) <= available_roles[pos]
#Max 3 per team constraint
for club in clubs:
    prob += lpSum(players[i] for i in range(len(highest_expected_scorers_gw22)) if teams[i] == club) <= 3
prob.solve()
df_list = []
for variable in prob.variables():
    if variable.varValue != 0:
        name = highest_expected_scorers_gw22.player[int(variable.name.split("_")[1])]
        club = highest_expected_scorers_gw22.team[int(variable.name.split("_")[1])]
        role = highest_expected_scorers_gw22.position[int(variable.name.split("_")[1])]
        predicted_points = highest_expected_scorers_gw22.predicted_total_points[int(variable.name.split("_")[1])]
        cost = highest_expected_scorers_gw22.cost[int(variable.name.split("_")[1])]
        opponent_team = highest_expected_scorers_gw22.opponent_team[int(variable.name.split("_")[1])]
        df_list.append((name, club, opponent_team, role, predicted_points, cost))
    

# Dataframe with name, club, position, points, cost
ideal_team_gw22_budget = pd.DataFrame(df_list, columns = ['player', 'team', 'opponent_team', 'position', 'predicted_total_points', 'cost'])



Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /Users/amirgrunhaus/opt/miniconda3/lib/python3.9/site-packages/pulp/apis/../solverdir/cbc/osx/64/cbc /var/folders/mh/djnbz20x6yx5_43kjtkrmxmh0000gn/T/bd772a8ae39244b9af67f05ea103eaa4-pulp.mps max timeMode elapsed branch printingOptions all solution /var/folders/mh/djnbz20x6yx5_43kjtkrmxmh0000gn/T/bd772a8ae39244b9af67f05ea103eaa4-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 30 COLUMNS
At line 3634 RHS
At line 3660 BOUNDS
At line 4330 ENDATA
Problem MODEL has 25 rows, 669 columns and 2007 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Continuous objective value is 139 - 0.00 seconds
Cgl0004I processed model has 25 rows, 241 columns (241 integer (227 of which binary)) and 723 elements
Cutoff increment increased from 1e-05 to 0.9999
Cbc0038I Initial state - 0 integers unsatisfied sum - 0
Cbc0038I Solution found of -139


In [43]:
ideal_team_gw22_budget.position = pd.Categorical(ideal_team_gw22_budget.position, categories=['GK', 'DEF', 'MID', 'FWD'])
ideal_team_gw22_budget = ideal_team_gw22_budget.sort_values('position')
ideal_team_gw22_budget = ideal_team_gw22_budget.reset_index().drop('index', axis=1)
ideal_team_gw22_budget.to_csv(path/'gw22_ideal_team_budget.csv', index=False)
ideal_team_gw22_budget

Unnamed: 0,player,team,opponent_team,position,predicted_total_points,cost
0,Ederson Santana de Moraes,Manchester City,Tottenham,GK,6,5.4
1,Nick Pope,Newcastle Utd,West Ham,GK,6,5.5
2,Lisandro Martínez,Manchester Utd,Crystal Palace,DEF,8,4.5
3,Pervis Estupiñán,Brighton,Bournemouth,DEF,8,4.5
4,John Stones,Manchester City,Tottenham,DEF,8,5.4
5,Kieran Trippier,Newcastle Utd,West Ham,DEF,8,6.0
6,Dan Burn,Newcastle Utd,West Ham,DEF,6,4.6
7,Marcus Rashford,Manchester Utd,Crystal Palace,MID,8,7.1
8,Kaoru Mitoma,Brighton,Bournemouth,MID,9,5.1
9,Jarrod Bowen,West Ham,Newcastle Utd,MID,16,8.0


In [44]:
#Ideal team expected total points
predicted_points = str(sum(ideal_team_gw22_budget['predicted_total_points']))
print("Ideal team expected total points: " + predicted_points)

Ideal team expected total points: 139
