# FPL Gameweek Player Predictions

Managers in Fantasy Premier League (FPL) earn points from their players for a number of actions, including goals, assists, clean sheets and saves. Players can also earn bonus points if they are among the top performers in the Bonus Points System (BPS) in a given match.

You can look at a detailed breakdown of the scoring system [here.](https://fantasy.premierleague.com/help/rules)
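As a rough illustration of how these actions add up, here is a sketch of a points calculator for a subset of the rules. The point values are my reading of the 2022-23 rules and should be checked against the rules page linked above. Penalty saves and misses, own goals and goals-conceded deductions are left out. The function name `fpl_points` and the `'GK'`/`'DEF'`/`'MID'`/`'FWD'` position labels are hypothetical (the dataset stores `position` as an integer).

```python
# Sketch of FPL scoring for a subset of actions (2022-23 values as I understand them;
# penalty saves/misses, own goals and goals-conceded deductions are omitted)
GOAL_POINTS = {'GK': 6, 'DEF': 6, 'MID': 5, 'FWD': 4}
CLEAN_SHEET_POINTS = {'GK': 4, 'DEF': 4, 'MID': 1, 'FWD': 0}


def fpl_points(position, minutes, goals=0, assists=0, clean_sheet=False,
               saves=0, bonus=0, yellow_cards=0, red_cards=0):
    """Hypothetical helper: points for one player in one match."""
    if minutes == 0:
        return 0
    points = 2 if minutes >= 60 else 1          # appearance points
    points += GOAL_POINTS[position] * goals
    points += 3 * assists
    if clean_sheet and minutes >= 60:           # clean sheets need 60+ minutes
        points += CLEAN_SHEET_POINTS[position]
    if position == 'GK':
        points += saves // 3                    # 1 point per 3 saves
    points += bonus                             # 1 to 3 bonus points from the BPS
    points -= yellow_cards + 3 * red_cards      # -1 per yellow, -3 per red
    return points


# A midfielder who plays 90 minutes, scores, assists, keeps a clean sheet and gets 3 bonus
print(fpl_points('MID', 90, goals=1, assists=1, clean_sheet=True, bonus=3))  # 14
```

Note that a player can finish a match with a negative score (for example, a defender who plays 90 minutes and is sent off).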

## FPL Points Prediction Model

In this notebook I create a model that predicts how many points a player will score in a specific gameweek, based on data dating back to the 2017-18 PL season. The model is a multiple linear regression built with the scikit-learn Python library. I describe the model's variables and other details later in the notebook.
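A multiple linear regression predicts the target as an intercept plus a weighted sum of the features, $\hat{y} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$, with the weights chosen to minimise the squared error. scikit-learn's `LinearRegression` fits this by ordinary least squares. The sketch below does the same fit with NumPy's `lstsq` on made-up, noise-free data so the mechanics are visible; the feature meanings in the comments are purely illustrative.

```python
import numpy as np

# Made-up data: 3 features (illustratively, e.g. FDR, home flag, recent points)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
true_intercept = 2.0
y = true_intercept + X @ true_coef          # noise-free, so the fit recovers it exactly

# Ordinary least squares: prepend a column of ones for the intercept
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

intercept, coef = beta[0], beta[1:]
print(intercept, coef)   # approximately 2.0 and [1.5, -2.0, 0.5]
```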

## Index
* [Data](#data)
* [Lags](#lags)
* [Training Data](#training-data)
* [Model](#model)
* [Predictions](#predictions)

In [350]:
#Import relevant libraries and packages
import pandas as pd
import numpy as np
import os
import sys
from pathlib import Path

## Data <a class="anchor" id="data"></a>

In [351]:
#Paths
path = Path('fpl_model/Data')
path_22_23 = Path('fpl_model/Data/2022-23')

#Import datasets
data = pd.read_csv(path/'training_data_updated.csv', 
                       index_col=0, 
                       dtype={'season':str,
                              'squad':str,
                              'comp':str})
season_gws = pd.read_csv(path/'remaining_season.csv', index_col=0)
player_stats = pd.read_csv(path_22_23/'gws/merged_gw.csv')

data = data.reset_index()
data = data.drop_duplicates()

The data has one row per player per gameweek, covering every gameweek since the 2016-17 season. Each row contains information and statistics for that player in that gameweek (I explain some of them later). The dataframe's columns are:

In [352]:
#Data info
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 140045 entries, 0 to 140044
Data columns (total 23 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   player           140045 non-null  object 
 1   position         140045 non-null  int64  
 2   gw               140045 non-null  int64  
 3   team             140045 non-null  object 
 4   opponent_team    140045 non-null  object 
 5   was_home         140045 non-null  bool   
 6   season           140045 non-null  object 
 7   minutes          140045 non-null  int64  
 8   total_points     140045 non-null  int64  
 9   assists          140045 non-null  int64  
 10  bonus            140045 non-null  int64  
 11  bps              140045 non-null  int64  
 12  clean_sheets     140045 non-null  int64  
 13  creativity       140045 non-null  float64
 14  goals_conceded   140045 non-null  int64  
 15  goals_scored     140045 non-null  int64  
 16  ict_index        140045 non-null  floa

I add a fixture difficulty rating (FDR) column to the data. FDR is a perceived measure of how difficult a fixture is for a team facing a particular opponent, expressed as a rating from 1 to 5, with 5 being the most difficult.

FPL calculates FDR with an algorithm that analyses each team's performance statistics across its home and away matches, then combines this data with each team's home and away form over the past six fixtures. As a result, FDR can change from week to week: Team X might have an FDR of 3 this gameweek and an FDR of 4 next gameweek.

For simplicity, and to keep the rating consistent across the time-series data, I assign each team a constant FDR based on historical PL standings and historical FDRs (dating back to the 2017-18 season). These are my FDR assignments and the reasoning behind them:

**FDR = 5 &rarr; Manchester City and Liverpool**
- Since the 2017-18 season, Manchester City and Liverpool have been the highest-achieving and most consistent teams. They are arguably the most difficult teams to play against and have finished every one of these seasons in the top four.

**FDR = 4 &rarr; Arsenal, Chelsea, Manchester United, Tottenham Hotspur**
- These four teams, along with Manchester City and Liverpool, make up the PL's "big six" and are considered the most difficult teams to play against. I didn't assign them an FDR of 5 because they haven't been as consistent as Manchester City and Liverpool and haven't earned as many points since the 2017-18 season.

**FDR = 3 &rarr; Brighton, Crystal Palace, Everton, Leicester City, Newcastle United, West Ham United, Wolves**
- These teams are considered "mid-table teams". Although their consistency varies, and some of them are arguably more difficult to play against than others, it makes sense to give them the same FDR because of their similar historical standings since the 2017-18 season.

**FDR = 2 &rarr; Aston Villa, Brentford, Burnley, Leeds, Norwich, Southampton, Watford, Hull City, Middlesbrough, Bournemouth, Sunderland, Swansea, West Brom, Stoke City, Huddersfield, Fulham, Cardiff City, Sheffield United, Nottingham Forest**
- All of these teams except Southampton have been relegated at least once in the past 5 seasons, and during their time in the Premier League they have struggled to stay out of the relegation zone or finish above 10th. I grouped Southampton with them because, despite not having been relegated, it hasn't finished a season above 11th since the 2017-18 season.

**FDR = 1 &rarr; NONE**
- I didn't assign a score of 1 to any of the teams because FPL rarely gives an FDR of 1 to any fixture.
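Because every team in a tier gets the same rating, the assignments can be stored as tiers and inverted into a team-to-FDR lookup. A minimal sketch with an abbreviated list of teams (the full lists are above). The names `FDR_TIERS` and `FDR_BY_TEAM` are hypothetical, the keys must match the spelling used in the `opponent_team` column, and 'Luton Town' stands in for any team missing from the table.

```python
# FDR tiers (abbreviated; see the full lists above)
FDR_TIERS = {
    5: ['Manchester City', 'Liverpool'],
    4: ['Arsenal', 'Chelsea', 'Manchester United', 'Tottenham Hotspur'],
    3: ['Brighton and Hove Albion', 'Crystal Palace', 'Everton'],
    2: ['Aston Villa', 'Brentford', 'Southampton'],
}

# Invert tiers into a team -> rating lookup
FDR_BY_TEAM = {team: rating for rating, teams in FDR_TIERS.items() for team in teams}

opponents = ['Liverpool', 'Everton', 'Southampton', 'Luton Town']
ratings = [FDR_BY_TEAM.get(team) for team in opponents]
print(ratings)  # [5, 3, 2, None]

# Teams missing from the lookup (e.g. newly promoted sides) come back as None,
# so it is worth listing them before building features
missing = sorted({team for team in opponents if team not in FDR_BY_TEAM})
print(missing)  # ['Luton Town']
```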

In [353]:
#Fixture difficulty rating (fdr) for each opponent, following the assignments above
FDR_RATINGS = {
    'Manchester City': 5,
    'Liverpool': 5,
    'Arsenal': 4,
    'Chelsea': 4,
    'Manchester United': 4,
    'Tottenham Hotspur': 4,
    'Brighton and Hove Albion': 3,
    'Crystal Palace': 3,
    'Everton': 3,
    'Leicester City': 3,
    'Newcastle United': 3,
    'West Ham United': 3,
    'Wolverhampton Wanderers': 3,
    'Aston Villa': 2,
    'Brentford': 2,
    'Burnley': 2,
    'Leeds': 2,
    'Norwich': 2,
    'Southampton': 2,
    'Watford': 2,
    'Hull City': 2,
    'Middlesbrough': 2,
    'Bournemouth': 2,
    'Sunderland': 2,
    'Swansea City': 2,
    'West Bromwich Albion': 2,
    'Stoke City': 2,
    'Huddersfield Town': 2,
    'Fulham': 2,
    'Cardiff City': 2,
    'Sheffield United': 2,
    'Nottingham Forest': 2,
}

#Function to add fdr (fixture difficulty rating) to a row; returns None for unlisted opponents
def fdr_assignment(row):
    return FDR_RATINGS.get(row['opponent_team'])

data['fdr'] = data.apply(fdr_assignment, axis = 1)

## Lags <a class="anchor" id="lags"></a>

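Because this is time-series data, I define two functions that build lagged features: for each row, the total of a statistic over the previous N gameweeks, or over all previous gameweeks when the lag is `'all'`, always excluding the current gameweek. The player version also converts these totals into per-90-minute rates, and the team version computes per-game averages of points scored and conceded for both the team and its opponent.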
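The trick used in both functions below is that `rolling(window=lag + 1).sum() - x` gives the sum of the previous `lag` values (the window includes the current row, which is then subtracted), and `cumsum() - x` gives the sum of all previous values. A toy example for one player, assuming the rows are already in chronological order (with the real data that means sorted by `season` and `gw` within each player):

```python
import pandas as pd

# One player's points over five gameweeks, in chronological order
df = pd.DataFrame({'player': ['A'] * 5, 'total_points': [2, 6, 1, 9, 3]})
lag = 2
grouped = df.groupby('player')['total_points']

# Sum of the previous `lag` gameweeks (current gameweek excluded)
df['total_points_last_2'] = grouped.transform(
    lambda x: x.rolling(min_periods=1, window=lag + 1).sum() - x)
# Sum of all previous gameweeks (current gameweek excluded)
df['total_points_last_all'] = grouped.transform(lambda x: x.cumsum() - x)

print(df)
# total_points_last_2:   0.0, 2.0, 8.0, 7.0, 10.0
# total_points_last_all: 0, 2, 8, 9, 18
```

Using `transform` (rather than `apply`) guarantees the result keeps the original row index, so it can be assigned straight back as a column.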

In [354]:
#Lagged stats for players
def player_lag_stats(df, stats, lags):
    """For each stat and lag, add the player's total over the previous `lag` gameweeks
    (all previous gameweeks for lag='all') and a per-90-minutes version."""
    player_lag = []
    updated_df = df.copy()
    #Minutes lags are needed for the per-90 stats; build a new list so the caller's list is not mutated
    stats = ['minutes'] + [s for s in stats if s != 'minutes']
    for stat in stats:
        for lag in lags:
            stat_name = stat + '_last_' + str(lag)
            minute_game = 'minutes_last_' + str(lag)
            grouped = updated_df.groupby('player')[stat]
            if lag == 'all':
                #Sum of all previous gameweeks (current gameweek excluded)
                updated_df[stat_name] = grouped.transform(lambda x: x.cumsum() - x)
            else:
                #Sum of the previous `lag` gameweeks (current gameweek excluded)
                updated_df[stat_name] = grouped.transform(
                    lambda x: x.rolling(min_periods=1, window=lag + 1).sum() - x)
            if stat != 'minutes':
                pg_stat_name = stat + '_pg_last_' + str(lag)
                player_lag.append(pg_stat_name)
                updated_df[pg_stat_name] = 90 * updated_df[stat_name] / updated_df[minute_game]
                #Dividing by 0 minutes gives inf (or NaN for 0/0); store both as NaN
                updated_df[pg_stat_name] = updated_df[pg_stat_name].replace([np.inf, -np.inf], np.nan)
            else:
                player_lag.append(minute_game)

    return updated_df, player_lag
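The per-game columns in `player_lag_stats` scale the lagged totals to a rate per 90 minutes. A player with no minutes in the window produces a division by zero: `0/0` gives `NaN`, and any non-zero total over zero minutes would give `inf`, which is why infinities are replaced with `NaN` (these are later filled with 0). A small check on three hypothetical rows:

```python
import numpy as np
import pandas as pd

# Lagged totals for three hypothetical rows (the last one is a guard case)
lagged = pd.DataFrame({'goals_scored_last_3': [2, 0, 1],
                       'minutes_last_3': [180, 0, 0]})

per90 = 90 * lagged['goals_scored_last_3'] / lagged['minutes_last_3']
per90 = per90.replace([np.inf, -np.inf], np.nan)   # 1/0 -> inf -> NaN
print(per90.tolist())  # [1.0, nan, nan]
```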

In [355]:
#Lagged stats for teams
def team_lag_stats(df, stats, lags):
    team_lag = []
    updated_new = df.copy()
    keys = ['team', 'season', 'gw', 'opponent_team']
    swapped_keys = ['opponent_team', 'season', 'gw', 'team']
    for stat in stats:
        stat_name_team = stat + '_team'
        stat_conceded_team = stat_name_team + '_conceded'
        #Team total of the stat in each fixture
        stat_team = (df.groupby(keys)[stat].sum()
                       .rename(stat_name_team).reset_index())
        #Self-merge with team/opponent swapped: the opponent's total becomes the "conceded" value
        stat_team = stat_team.merge(stat_team,
                                    left_on=keys,
                                    right_on=swapped_keys,
                                    how='left',
                                    suffixes=('', '_conceded'))
        stat_team = stat_team.drop(['team_conceded', 'opponent_team_conceded'], axis=1)
        for lag in lags:
            stat_name = stat + '_team_last_' + str(lag)
            stat_conceded_name = stat + '_team_conceded_last_' + str(lag)
            pg_stat_name = stat + '_team_pg_last_' + str(lag)
            pg_stat_conceded_name = stat + '_team_conceded_pg_last_' + str(lag)
            team_lag.append(pg_stat_name)
            by_team = stat_team.groupby('team')
            if lag == 'all':
                #Totals over all previous fixtures (current fixture excluded)
                stat_team[stat_name] = by_team[stat_name_team].transform(lambda x: x.cumsum() - x)
                stat_team[stat_conceded_name] = by_team[stat_conceded_team].transform(lambda x: x.cumsum() - x)
                n_games = by_team.cumcount()
            else:
                #Totals over the previous `lag` fixtures (current fixture excluded)
                stat_team[stat_name] = by_team[stat_name_team].transform(
                    lambda x: x.rolling(min_periods=1, window=lag + 1).sum() - x)
                stat_team[stat_conceded_name] = by_team[stat_conceded_team].transform(
                    lambda x: x.rolling(min_periods=1, window=lag + 1).sum() - x)
                n_games = by_team[stat_name_team].transform(
                    lambda x: x.rolling(min_periods=1, window=lag + 1).count() - 1)
            #Per-game averages; the same game count is used for scored and conceded
            #(0/0 on a team's first fixture gives NaN, filled with 0 later)
            stat_team[pg_stat_name] = stat_team[stat_name] / n_games
            stat_team[pg_stat_conceded_name] = stat_team[stat_conceded_name] / n_games
        #Attach the team's own features, then the opponent's (suffix '_opponent')
        updated_new = updated_new.merge(stat_team, on=keys, how='left')
        updated_new = updated_new.merge(stat_team,
                                        left_on=keys,
                                        right_on=swapped_keys,
                                        how='left',
                                        suffixes=('', '_opponent'))
        updated_new = updated_new.drop(['team_opponent', 'opponent_team_opponent'], axis=1)

    team_lag = team_lag + [name + '_opponent' for name in team_lag]

    return updated_new, team_lag
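The core of `team_lag_stats` is a self-merge: each team's total for a fixture is joined to the row where team and opponent are swapped, so the opponent's total becomes the team's "conceded" value. A toy version with a single fixture (the point totals are made up):

```python
import pandas as pd

# Team totals for one fixture, one row per side
stat_team = pd.DataFrame({
    'team': ['Arsenal', 'Leicester City'],
    'season': ['1718', '1718'],
    'gw': [1, 1],
    'opponent_team': ['Leicester City', 'Arsenal'],
    'total_points_team': [60, 45],
})

# Match (team, opponent) with (opponent, team) in the same season and gameweek
merged = stat_team.merge(stat_team,
                         left_on=['team', 'season', 'gw', 'opponent_team'],
                         right_on=['opponent_team', 'season', 'gw', 'team'],
                         how='left',
                         suffixes=('', '_conceded'))
# The swapped key columns from the right side are redundant
merged = merged.drop(['team_conceded', 'opponent_team_conceded'], axis=1)
print(merged[['team', 'total_points_team', 'total_points_team_conceded']])
```

Arsenal's row picks up Leicester's 45 as conceded, and Leicester's row picks up Arsenal's 60.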

## Training Data <a class="anchor" id="training-data"></a>

In [356]:
#Create training data by adding lag features, dropping irrelevant columns, and adjusting data types

#We drop the '1617' season because it has insufficient data, and drop any duplicate rows (just in case)
training_data = data[data['season'] != '1617']
training_data = training_data.drop_duplicates()

#Total points
training_data, teams_lag = team_lag_stats(training_data, ['total_points'], ['all', 1, 2, 3, 4, 5, 10])
training_data, players_lag = player_lag_stats(training_data, ['total_points'], ['all', 1, 2, 3, 4, 5, 10])

#Minutes
training_data, players_lag = player_lag_stats(training_data, ['minutes'], ['all', 1, 2, 3, 4, 5, 10])

#Assists
training_data, players_lag = player_lag_stats(training_data, ['assists'], ['all', 1, 2, 3, 4, 5, 10])

#Bonus
training_data, players_lag = player_lag_stats(training_data, ['bonus'], ['all', 1, 2, 3, 4, 5])

#Clean sheets
training_data, players_lag = player_lag_stats(training_data, ['clean_sheets'], ['all', 1, 2, 3, 4, 5, 10])

#Goals conceded
training_data, players_lag = player_lag_stats(training_data, ['goals_conceded'], ['all', 1, 2, 3, 4, 5, 10])

#Goals scored
training_data, players_lag = player_lag_stats(training_data, ['goals_scored'], ['all', 1, 2, 3, 4, 5, 10])

#Penalties Saved
training_data, players_lag = player_lag_stats(training_data, ['penalties_saved'], ['all', 1, 2, 3, 4, 5])

#Red Cards
training_data, players_lag = player_lag_stats(training_data, ['red_cards'], ['all', 1, 2, 3, 4, 5])

#Saves
training_data, players_lag = player_lag_stats(training_data, ['saves'], ['all', 1, 2, 3, 4, 5, 10])

#Yellow Cards
training_data, players_lag = player_lag_stats(training_data, ['yellow_cards'], ['all', 1, 2, 3, 4, 5, 10])

drop_columns = ['gw', 'player', 'minutes', 'position', 'team', 'opponent_team',
                'assists', 'bonus', 'bps', 'clean_sheets','goals_conceded', 
                'goals_scored', 'penalties_saved', 'red_cards', 'saves',
                'yellow_cards', 'season']

training_data = training_data.drop(drop_columns,axis = 1)
training_data = training_data.fillna(0)
training_data['was_home'] = training_data["was_home"].astype(int)
training_data = training_data.round(2)
training_data

| | was_home | total_points | creativity | ict_index | influence | threat | fdr | total_points_team | total_points_team_conceded | total_points_team_last_all | ... | yellow_cards_last_2 | yellow_cards_pg_last_2 | yellow_cards_last_3 | yellow_cards_pg_last_3 | yellow_cards_last_4 | yellow_cards_pg_last_4 | yellow_cards_last_5 | yellow_cards_pg_last_5 | yellow_cards_last_10 | yellow_cards_pg_last_10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.6 | 1.9 | 0.4 | 18.0 | 4 | 11 | 83.0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.00 |
| 1 | 1 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 2 | 53 | 24.0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.00 |
| 2 | 0 | 6 | 46.9 | 8.7 | 40.2 | 0.0 | 3 | 66 | 15.0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.00 |
| 3 | 1 | 6 | 11.2 | 6.7 | 29.6 | 26.0 | 3 | 52 | 40.0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.00 |
| 4 | 1 | 9 | 25.2 | 10.9 | 48.6 | 35.0 | 5 | 43 | 44.0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 116361 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 3 | 69 | 19.0 | 10933 | ... | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.00 |
| 116362 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 3 | 69 | 19.0 | 10933 | ... | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.00 |
| 116363 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 3 | 69 | 19.0 | 10933 | ... | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.00 |
| 116364 | 0 | 1 | 0.5 | 0.1 | 0.6 | 0.0 | 3 | 69 | 19.0 | 10933 | ... | 1.0 | 0.5 | 1.0 | 0.33 | 1.0 | 0.25 | 1.0 | 0.2 | 1.0 | 0.18 |


## Model <a class="anchor" id="model"></a>

In [357]:
#Linear Prediction Model
#X and Y variables
x = training_data.drop('total_points', axis=1)
y = training_data['total_points'] 

#Split up data into train and test sets, fit model, and make predictions
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 42)
linear_reg = LinearRegression()
linear_reg.fit(x_train,y_train)
y_prediction = linear_reg.predict(x_test)
y_prediction

array([ 0.51841468, -0.37384089, -0.21849392, ...,  1.55466463,
        0.59621953,  1.50255786])

In [358]:
#Calculating r2_score, mse, and rmse
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
score = r2_score(y_test,y_prediction)
print('r2 score is ',score)
print('MSE is ',mean_squared_error(y_test,y_prediction))
print('RMSE is ',np.sqrt(mean_squared_error(y_test,y_prediction)))

r2 score is  0.7396231936393034
MSE is  1.6170059914758566
RMSE is  1.2716155045751276
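For reference, the three metrics are $\text{MSE} = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$, $\text{RMSE} = \sqrt{\text{MSE}}$ (in points, so the RMSE of about 1.27 above means predictions are typically off by roughly one point), and $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. A hand computation on made-up numbers:

```python
import numpy as np

# Made-up actual and predicted points for four player-gameweeks
y_true = np.array([3, 0, 6, 2], dtype=float)
y_pred = np.array([2.5, 0.5, 5, 2], dtype=float)

residuals = y_true - y_pred
mse = np.mean(residuals ** 2)                      # 0.375
rmse = np.sqrt(mse)                                # about 0.612
ss_res = np.sum(residuals ** 2)                    # 1.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # 18.75
r2 = 1 - ss_res / ss_tot                           # 0.92
print(mse, rmse, r2)
```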


## Predictions <a class="anchor" id="predictions"></a>

In [359]:
#Make predictions for all data points
all_data_predictions = linear_reg.predict(x)

#Add predictions to data
data = data[data['season'] != '1617'].copy()
data["predicted_total_points"] = all_data_predictions

#Predictions Dataframe
predictions_vs_actual = data[['player', 'gw', 'team', 'opponent_team', 'season', 'total_points', 'predicted_total_points']]
predictions_vs_actual = predictions_vs_actual.reset_index(drop=True)
predictions_vs_actual['predicted_total_points'] = predictions_vs_actual['predicted_total_points'].round(0).astype(int)
predictions_vs_actual

| | player | gw | team | opponent_team | season | total_points | predicted_total_points |
|---|---|---|---|---|---|---|---|
| 0 | Aaron Cresswell | 1 | West Ham United | Manchester United | 1718 | 0 | 0 |
| 1 | Aaron Lennon | 1 | Everton | Stoke City | 1718 | 0 | 1 |
| 2 | Aaron Mooy | 1 | Huddersfield Town | Crystal Palace | 1718 | 6 | 7 |
| 3 | Aaron Ramsey | 1 | Arsenal | Leicester City | 1718 | 6 | 5 |
| 4 | Abdoulaye Doucouré | 1 | Watford | Liverpool | 1718 | 9 | 8 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 116361 | Josh Wilson-Esbrand | 1 | Manchester City | West Ham United | 2223 | 0 | 0 |
| 116362 | Liam Delap | 1 | Manchester City | West Ham United | 2223 | 0 | 0 |
| 116363 | Stefan Ortega Moreno | 1 | Manchester City | West Ham United | 2223 | 0 | 0 |
| 116364 | Kalvin Phillips | 1 | Manchester City | West Ham United | 2223 | 1 | 1 |


In [360]:
#Mohamed Salah 21-22 predictions
predictions_vs_actual[(predictions_vs_actual['player'] == 'Mohamed Salah') & (predictions_vs_actual['season'] == '2122')]

| | player | gw | team | opponent_team | season | total_points | predicted_total_points |
|---|---|---|---|---|---|---|---|
| 91511 | Mohamed Salah | 1 | Liverpool | Norwich | 2122 | 17 | 14 |
| 92070 | Mohamed Salah | 2 | Liverpool | Burnley | 2122 | 3 | 5 |
| 92645 | Mohamed Salah | 3 | Liverpool | Chelsea | 2122 | 10 | 9 |
| 93235 | Mohamed Salah | 4 | Liverpool | Leeds | 2122 | 8 | 11 |
| 93838 | Mohamed Salah | 5 | Liverpool | Crystal Palace | 2122 | 12 | 12 |
| 94449 | Mohamed Salah | 6 | Liverpool | Brentford | 2122 | 7 | 8 |
| 95062 | Mohamed Salah | 7 | Liverpool | Manchester City | 2122 | 13 | 13 |
| 95678 | Mohamed Salah | 8 | Liverpool | Watford | 2122 | 13 | 14 |
| 96295 | Mohamed Salah | 9 | Liverpool | Manchester United | 2122 | 24 | 22 |
| 96916 | Mohamed Salah | 10 | Liverpool | Brighton and Hove Albion | 2122 | 5 | 7 |


# CHANGE GAMEWEEK HERE

In [363]:
#Dataframe with upcoming gameweek information
gameweek = 2
next_gw = season_gws[season_gws['gw'] == gameweek]
next_gw = next_gw[['player', 'position', 'gw', 'team', 'opponent_team', 'was_home', 'season']]

#Dataframe with player's creativity, ict_index, influence, and threat, from last gameweek
player_stats_last_gw = player_stats[player_stats['GW'] == gameweek - 1]
relevant_columns = ['name','creativity', 'ict_index','influence', 'threat']
player_stats_last_gw = player_stats_last_gw[relevant_columns]
player_stats_last_gw = player_stats_last_gw.rename(columns={'name': 'player'})
next_gw = next_gw.merge(player_stats_last_gw, on = 'player')

#Adding relevant columns with value = 0
next_gw[['minutes', 'total_points', 'assists', 'bonus', 'bps',
       'clean_sheets', 'goals_conceded', 'goals_scored',
       'penalties_saved', 'red_cards', 'saves', 'yellow_cards']] = 0
next_gw['fdr'] = next_gw.apply(fdr_assignment, axis = 1)

#Ordering columns
next_gw = next_gw[['player', 'position', 'gw', 'team', 'opponent_team', 'was_home', 'total_points',
                         'creativity','ict_index','influence','threat','fdr','season', 'minutes', 'assists', 
                         'bonus', 'bps', 'clean_sheets', 'goals_conceded', 'goals_scored', 'penalties_saved', 
                         'red_cards', 'saves', 'yellow_cards']]
next_gw['season'] = next_gw['season'].apply(str)


#Adjusting original data to concatenate with upcoming gameweek dataframe
data_adjusted = data[['player', 'position', 'gw', 'team', 'opponent_team', 'was_home', 'total_points',
                         'creativity','ict_index','influence','threat','fdr','season', 'minutes', 'assists', 
                         'bonus', 'bps', 'clean_sheets', 'goals_conceded', 'goals_scored', 'penalties_saved', 
                         'red_cards', 'saves', 'yellow_cards']]

data_adjusted = pd.concat([data_adjusted, next_gw])
data_adjusted = data_adjusted.drop_duplicates().reset_index()
data_adjusted = data_adjusted.drop('index', axis=1)

#Total points
data_adjusted, teams_lag = team_lag_stats(data_adjusted, ['total_points'], ['all', 1, 2, 3, 4, 5, 10])
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['total_points'], ['all', 1, 2, 3, 4, 5, 10])

#Minutes
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['minutes'], ['all', 1, 2, 3, 4, 5, 10])

#Assists
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['assists'], ['all', 1, 2, 3, 4, 5, 10])

#Bonus
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['bonus'], ['all', 1, 2, 3, 4, 5])

#Clean sheets
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['clean_sheets'], ['all', 1, 2, 3, 4, 5, 10])

#Goals conceded
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['goals_conceded'], ['all', 1, 2, 3, 4, 5, 10])

#Goals scored
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['goals_scored'], ['all', 1, 2, 3, 4, 5, 10])

#Penalties Saved
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['penalties_saved'], ['all', 1, 2, 3, 4, 5])

#Red Cards
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['red_cards'], ['all', 1, 2, 3, 4, 5])

#Saves
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['saves'], ['all', 1, 2, 3, 4, 5, 10])

#Yellow Cards
data_adjusted, players_lag = player_lag_stats(data_adjusted, ['yellow_cards'], ['all', 1, 2, 3, 4, 5, 10])

data_adjusted = data_adjusted.fillna(0)

next_gw_stats = data_adjusted.loc[(data_adjusted['gw'] == gameweek)]
next_gw_stats = next_gw_stats.loc[(next_gw_stats['season'] == '2223')]

drop_columns = ['gw', 'minutes', 'player', 'position', 'team', 'opponent_team',
                'assists', 'bonus', 'bps', 'clean_sheets','goals_conceded', 
                'goals_scored', 'penalties_saved', 'red_cards', 'saves',
                'yellow_cards', 'season']

next_gw_stats = next_gw_stats.drop(drop_columns,axis = 1)
next_gw_stats['was_home'] = next_gw_stats["was_home"].astype(int)
next_gw_stats = next_gw_stats.round(2)
next_gw_stats 

| | was_home | total_points | creativity | ict_index | influence | threat | fdr | total_points_team | total_points_team_conceded | total_points_team_last_all | ... | yellow_cards_last_2 | yellow_cards_pg_last_2 | yellow_cards_last_3 | yellow_cards_pg_last_3 | yellow_cards_last_4 | yellow_cards_pg_last_4 | yellow_cards_last_5 | yellow_cards_pg_last_5 | yellow_cards_last_10 | yellow_cards_pg_last_10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 116366 | 1 | 0 | 4.7 | 1.6 | 3.2 | 8.0 | 3 | 0 | 0.0 | 10474 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 |
| 116367 | 1 | 0 | 29.8 | 5.1 | 8.2 | 13.0 | 3 | 0 | 0.0 | 10474 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 |
| 116368 | 1 | 0 | 1.8 | 3.2 | 30.4 | 0.0 | 3 | 0 | 0.0 | 10474 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 |
| 116369 | 1 | 0 | 16.8 | 2.3 | 6.4 | 0.0 | 3 | 0 | 0.0 | 10474 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.15 |
| 116370 | 1 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 3 | 0 | 0.0 | 10474 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 116898 | 0 | 0 | 11.8 | 4.1 | 12.2 | 17.0 | 2 | 0 | 0.0 | 2877 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 |
| 116899 | 0 | 0 | 2.0 | 2.2 | 3.6 | 16.0 | 2 | 0 | 0.0 | 2877 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 |
| 116900 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 2 | 0 | 0.0 | 2877 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 |
| 116901 | 0 | 0 | 12.0 | 2.4 | 11.0 | 1.0 | 2 | 0 | 0.0 | 2877 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.00 |
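The cell above predicts an unplayed gameweek by appending placeholder rows for it (with its match stats set to 0) and recomputing the lag features over the combined data. Because the lags exclude the current row, a placeholder's own zeros never enter its features; only its earlier gameweeks do. A toy illustration with one hypothetical player:

```python
import pandas as pd

history = pd.DataFrame({'player': ['A'] * 3, 'gw': [1, 2, 3],
                        'total_points': [2, 6, 1]})
# Placeholder for the upcoming gameweek; its match stats are unknown, so set to 0
upcoming = pd.DataFrame({'player': ['A'], 'gw': [4], 'total_points': [0]})

combined = pd.concat([history, upcoming], ignore_index=True)
# Sum of the previous 2 gameweeks, current gameweek excluded
combined['total_points_last_2'] = combined.groupby('player')['total_points'].transform(
    lambda x: x.rolling(min_periods=1, window=3).sum() - x)

# The gameweek-4 row only sees gameweeks 2 and 3: 6 + 1 = 7
print(combined.loc[combined['gw'] == 4, 'total_points_last_2'].item())  # 7.0
```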


In [364]:
x_2 = next_gw_stats.drop('total_points', axis=1)
predictions_next_gw = linear_reg.predict(x_2)

In [365]:
#Attach predictions to the upcoming gameweek rows (assumes next_gw_stats holds the same rows, in the same order, as next_gw)
next_gw['predicted_total_points'] = predictions_next_gw
predictions = next_gw[['player', 'gw', 'team', 'opponent_team', 'season', 'predicted_total_points']]
predictions = predictions.reset_index(drop=True)
predictions['predicted_total_points'] = predictions['predicted_total_points'].round(0).astype(int)
#Negative predictions are set to 0
predictions['predicted_total_points'] = predictions['predicted_total_points'].where(predictions['predicted_total_points'] > 0, other=0)
predictions

| | player | gw | team | opponent_team | season | predicted_total_points |
|---|---|---|---|---|---|---|
| 0 | James Milner | 2 | Liverpool | Crystal Palace | 2223 | 0 |
| 1 | Jordan Henderson | 2 | Liverpool | Crystal Palace | 2223 | 0 |
| 2 | Joel Matip | 2 | Liverpool | Crystal Palace | 2223 | 4 |
| 3 | Thiago Alcántara do Nascimento | 2 | Liverpool | Crystal Palace | 2223 | 0 |
| 4 | Alex Oxlade-Chamberlain | 2 | Liverpool | Crystal Palace | 2223 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 532 | Marc Roca Junqué | 2 | Leeds | Southampton | 2223 | 1 |
| 533 | Brenden Aaronson | 2 | Leeds | Southampton | 2223 | 0 |
| 534 | Darko Gyabi | 2 | Leeds | Southampton | 2223 | 0 |
| 535 | Tyler Adams | 2 | Leeds | Southampton | 2223 | 0 |
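The cell that built these predictions rounds them to whole points and replaces negative values with 0 via `.where`. Two details are worth knowing: pandas rounds halves to the nearest even number (so 0.5 becomes 0 and 2.5 becomes 2), and flooring at zero discards the negative scores FPL can actually award (for example after a red card). `Series.clip(lower=0)` is an equivalent, shorter way to write the floor:

```python
import pandas as pd

raw = pd.Series([-1.6, 0.5, 1.5, 2.5, 7.6])

# Round half to even, then convert to integer points
rounded = raw.round(0).astype(int)
print(rounded.tolist())  # [-2, 0, 2, 2, 8]

# Floor at zero; same result as rounded.where(rounded > 0, other=0)
floored = rounded.clip(lower=0)
print(floored.tolist())  # [0, 0, 2, 2, 8]
```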


In [367]:
predictions.sort_values('predicted_total_points', ascending=False).head(25)

| | player | gw | team | opponent_team | season | predicted_total_points |
|---|---|---|---|---|---|---|
| 390 | Aleksandar Mitrović | 2 | Fulham | Wolverhampton Wanderers | 2223 | 12 |
| 142 | Pascal Groß | 2 | Brighton and Hove Albion | Newcastle United | 2223 | 11 |
| 323 | Dejan Kulusevski | 2 | Tottenham Hotspur | Chelsea | 2223 | 10 |
| 22 | Darwin Núñez Ribeiro | 2 | Liverpool | Crystal Palace | 2223 | 9 |
| 421 | Fabian Schär | 2 | Newcastle United | Brighton and Hove Albion | 2223 | 9 |
| 255 | James Ward-Prowse | 2 | Southampton | Leeds | 2223 | 7 |
| 449 | Lloyd Kelly | 2 | Bournemouth | Manchester City | 2223 | 6 |
| 80 | Dean Henderson | 2 | Nottingham Forest | West Ham United | 2223 | 6 |
| 466 | Timothy Castagne | 2 | Leicester City | Arsenal | 2223 | 6 |
| 125 | Daniel Castelo Podence | 2 | Wolverhampton Wanderers | Fulham | 2223 | 6 |
