# Fantasy Football - Predicting Clean Sheets

In this notebook, we use the goal-prediction model developed in the previous notebook to estimate the probability that a team keeps a clean sheet (a clean sheet is a game in which a team concedes 0 goals).

In [1]:
import pandas as pd
import warnings
from functools import reduce
import itertools
import numpy as np
import sklearn.preprocessing as preprocessing
import sklearn.model_selection as model_selection
from sklearn import linear_model
from sklearn.model_selection import train_test_split, cross_val_score
from joblib import dump, load

In [2]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
warnings.filterwarnings("ignore")

Player points are derived from the predicted goals scored and conceded. We start by estimating the probability that a team keeps a clean sheet. For players who did not take part in a game, the team's clean sheet value does not really apply, but we can leave it in place because those players are assigned zero points regardless.

In [3]:
predict_goals = pd.read_csv('data/predicting_team_goals.csv')

In [4]:
predict_goals.head()

Unnamed: 0,season,fpl_game_week,team,opponent,team_goals,opponent_goals,pred_goals_scored,pred_goals_conceded
0,2017-2018,6,Watford,Swansea City,2.0,1.0,1.213222,1.545529
1,2017-2018,6,Swansea City,Watford,1.0,2.0,1.545529,1.213222
2,2017-2018,6,Burnley,Huddersfield,0.0,0.0,1.53414,0.882562
3,2017-2018,6,Huddersfield,Burnley,0.0,0.0,0.882562,1.53414
4,2017-2018,6,Crystal Palace,Manchester City,0.0,5.0,0.389978,2.447532


Adding clean sheet column.

In [5]:
predict_goals['clean_sheet'] = (predict_goals.opponent_goals == 0).astype(int)

In [6]:
predict_goals.head()

Unnamed: 0,season,fpl_game_week,team,opponent,team_goals,opponent_goals,pred_goals_scored,pred_goals_conceded,clean_sheet
0,2017-2018,6,Watford,Swansea City,2.0,1.0,1.213222,1.545529,0
1,2017-2018,6,Swansea City,Watford,1.0,2.0,1.545529,1.213222,0
2,2017-2018,6,Burnley,Huddersfield,0.0,0.0,1.53414,0.882562,1
3,2017-2018,6,Huddersfield,Burnley,0.0,0.0,0.882562,1.53414,1
4,2017-2018,6,Crystal Palace,Manchester City,0.0,5.0,0.389978,2.447532,0
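
As a rough sanity check on the clean sheet probabilities, one option is to compare them with a simple Poisson baseline. The sketch below assumes goals conceded follow a Poisson distribution with mean `pred_goals_conceded`, in which case the chance of conceding zero goals is `exp(-mean)`. This is an illustrative assumption and not the method used in this notebook; the `poisson_cs_prob` column name is hypothetical, and the input values are copied from the five rows shown above.

```python
import numpy as np
import pandas as pd

# Predicted goals conceded for the five rows displayed above.
sample = pd.DataFrame({
    "team": ["Watford", "Swansea City", "Burnley", "Huddersfield", "Crystal Palace"],
    "pred_goals_conceded": [1.545529, 1.213222, 0.882562, 1.53414, 2.447532],
})

# Assumption (not the notebook's model): goals conceded ~ Poisson(lambda),
# so P(0 goals conceded) = exp(-lambda).
sample["poisson_cs_prob"] = np.exp(-sample["pred_goals_conceded"])

print(sample.round(3))
```

A baseline like this always stays between 0 and 1 and falls as the predicted goals conceded rise, which makes it a convenient reference point for a regression output that has no such guarantee.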


Fitting a model to predict clean sheet probability.

In [7]:
x = predict_goals[['pred_goals_scored', 'pred_goals_conceded']]

In [8]:
x.columns

Index(['pred_goals_scored', 'pred_goals_conceded'], dtype='object')

In [9]:
# Standardise both features (zero mean, unit variance) before fitting.
ss = preprocessing.StandardScaler()
x = pd.DataFrame(ss.fit_transform(x), columns=x.columns)

In [10]:
y = predict_goals.clean_sheet

In [11]:
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    x, y, train_size=0.75, test_size=0.25, random_state=1)

In [12]:
reg_bay = linear_model.BayesianRidge()
reg_bay.fit(x_train, y_train)

In [13]:
print("Bayesian Ridge Regressor results:")
# scikit-learn's error scorers are negated (higher is better),
# so MAE and RMSE are printed as negative values.
cv_mae = cross_val_score(reg_bay, x_train, y_train, cv=10, scoring='neg_mean_absolute_error')
cv_rmse = cross_val_score(reg_bay, x_train, y_train, cv=10, scoring='neg_root_mean_squared_error')
cv_r2 = cross_val_score(reg_bay, x_train, y_train, cv=10, scoring='r2')
print('MAE =', round(cv_mae.mean(), 4))
print('RMSE =', round(cv_rmse.mean(), 4))
print('R2 =', round(cv_r2.mean(), 4))

Bayesian Ridge Regressor results:
MAE = -0.3667
RMSE = -0.4271
R2 = 0.0957
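
The negative MAE and RMSE above come from scikit-learn's scoring convention, where error metrics are negated so that a larger score is always better. The sketch below shows one way to turn them back into ordinary positive errors. It uses synthetic data, so the numbers it prints are unrelated to the results above; the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Synthetic binary target, purely for illustration.
y = (X[:, 0] - X[:, 1] + rng.normal(size=200) > 0).astype(int)

neg_mae = cross_val_score(BayesianRidge(), X, y, cv=5,
                          scoring="neg_mean_absolute_error")
neg_rmse = cross_val_score(BayesianRidge(), X, y, cv=5,
                           scoring="neg_root_mean_squared_error")

# Flip the sign to report errors in the usual "lower is better" form.
mae = -neg_mae.mean()
rmse = -neg_rmse.mean()
print(f"MAE = {mae:.4f}, RMSE = {rmse:.4f}")
```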


In [14]:
predict_goals['cleen_sheet_prob'] = reg_bay.predict(x)

Exporting model for later use.

In [15]:
dump(reg_bay, 'predict_clean_sheet_model.joblib') 

['predict_clean_sheet_model.joblib']
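
Because the model was trained on standardised features, whoever loads the exported model later needs to apply the same scaling. One possible way to keep the two together is to bundle the scaler and the regressor in a scikit-learn `Pipeline` and persist that single object. The sketch below is an alternative under that assumption, not what this notebook does; the data is synthetic and the file name is hypothetical.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Stand-in for predicted goals scored / conceded.
X = rng.uniform(0.3, 2.5, size=(100, 2))
# Synthetic clean sheet flags, for illustration only.
y = (rng.uniform(size=100) < np.exp(-X[:, 1])).astype(int)

# Scaler and model are fitted, saved and loaded as one object.
pipeline = make_pipeline(StandardScaler(), BayesianRidge())
pipeline.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "clean_sheet_pipeline.joblib")  # hypothetical name
    dump(pipeline, path)
    restored = load(path)

# The restored pipeline reproduces the original predictions on raw inputs.
same = np.allclose(pipeline.predict(X), restored.predict(X))
print(same)
```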

Flooring predictions at 0.01 (a linear regression can output values at or below zero) and renaming season values to match the main DataFrame.

In [16]:
predict_goals.loc[predict_goals.cleen_sheet_prob<.01, 'cleen_sheet_prob'] = 0.01
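
The same floor can also be written with `Series.clip`, which additionally allows an upper bound. The sketch below uses made-up values; the upper bound of 1.0 is an assumption added for illustration, since the notebook itself only applies the lower bound.

```python
import pandas as pd

# Made-up regression outputs, including values outside the [0.01, 1] range.
raw = pd.Series([0.237, -0.04, 0.005, 1.08], name="cleen_sheet_prob")

# lower=0.01 mirrors the floor used above; upper=1.0 is an added assumption.
clipped = raw.clip(lower=0.01, upper=1.0)

print(clipped.tolist())  # [0.237, 0.01, 0.01, 1.0]
```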

In [17]:
predict_goals['season'] = predict_goals.season.str.replace('2017-2018', '2017-18').str.replace('2018-2019', '2018-19')

In [18]:
predict_goals.head()

Unnamed: 0,season,fpl_game_week,team,opponent,team_goals,opponent_goals,pred_goals_scored,pred_goals_conceded,clean_sheet,cleen_sheet_prob
0,2017-18,6,Watford,Swansea City,2.0,1.0,1.213222,1.545529,0,0.237033
1,2017-18,6,Swansea City,Watford,1.0,2.0,1.545529,1.213222,0,0.324177
2,2017-18,6,Burnley,Huddersfield,0.0,0.0,1.53414,0.882562,1,0.377977
3,2017-18,6,Huddersfield,Burnley,0.0,0.0,0.882562,1.53414,1,0.207109
4,2017-18,6,Crystal Palace,Manchester City,0.0,5.0,0.389978,2.447532,0,0.01


Export for later use.

In [28]:
predict_goals.to_csv('predict_goals_cs.csv', index=False)

In [19]:
game_weeks = list(range(1, 39))

In [20]:
opponent_goals = predict_goals.query('season == "2018-19"')[['fpl_game_week', 'team', 'opponent_goals']]

In [21]:
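def get_data():
    # `week` is read from the global loop variable set in the final cell.
    df = pd.read_csv(f"data/predicting_weekly_team_goals/2018-19/game_week_{week}.csv")
    # Attach the actual goals conceded for the same team and game week.
    return pd.merge(df, opponent_goals, left_on=['team', 'fpl_gw'], right_on=['team', 'fpl_game_week'])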
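
The merge above joins on key columns with different names (`fpl_gw` on the left, `fpl_game_week` on the right). The sketch below shows how such a merge behaves on toy data. It assumes the weekly file holds one row per team while the actual results hold one row per match, so a team with two fixtures in a game week matches twice; that assumption, and all values, are illustrative.

```python
import pandas as pd

# Toy weekly predictions: one row per team (team B has a double game week).
weekly = pd.DataFrame({
    "team": ["A", "B"],
    "fpl_gw": [32, 32],
    "count": [1, 2],
})

# Toy actual results: one row per match, so team B appears twice.
actual = pd.DataFrame({
    "team": ["A", "B", "B"],
    "fpl_game_week": [32, 32, 32],
    "opponent_goals": [0.0, 1.0, 2.0],
})

merged = pd.merge(weekly, actual,
                  left_on=["team", "fpl_gw"],
                  right_on=["team", "fpl_game_week"])

# Both differently named key columns are kept; the shared "team" key appears once.
print(merged.columns.tolist())
print(len(merged))
```

Under these assumptions the double game week team ends up with two merged rows, which would explain why duplicates are dropped before the weekly file is exported.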

In [22]:
def set_cs(df):
    # If a team plays 2 games in a game week, divide the weekly totals by 2
    # to get per-game values before predicting a single-game clean sheet probability.
    double = df['count'] == 2
    df.loc[double, 'scored'] = df.scored / 2
    df.loc[double, 'conceded'] = df.conceded / 2
    return df

In [23]:
def predict(df):
    # Rename the weekly columns to the feature names the model was trained on.
    x = df[['scored', 'conceded']].rename(
        columns={'scored': 'pred_goals_scored', 'conceded': 'pred_goals_conceded'})
    # Note: the scaler is refit on this week's rows rather than reusing the training scaler.
    ss = preprocessing.StandardScaler()
    x = pd.DataFrame(ss.fit_transform(x), columns=x.columns)

    df['cleen_sheet_prob'] = reg_bay.predict(x)
    df.loc[df.cleen_sheet_prob < .01, 'cleen_sheet_prob'] = 0.01
    # Double game weeks: scale the per-game values back up to weekly totals.
    double = df['count'] == 2
    df.loc[double, 'cleen_sheet_prob'] = df.cleen_sheet_prob * 2
    df.loc[double, 'scored'] = df.scored * 2
    df.loc[double, 'conceded'] = df.conceded * 2
    return df
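
The halve, predict, then double handling of double game weeks can be traced on a toy example. In the sketch below, `exp(-conceded)` is a hypothetical stand-in for the fitted model, used only so the example runs on its own; the team names and numbers are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "B"],
    "count": [1, 2],        # fixtures played in the game week
    "scored": [1.2, 3.0],   # predicted goals, summed over the week's fixtures
    "conceded": [0.9, 2.4],
})
double = df["count"] == 2
goal_cols = ["scored", "conceded"]

# 1) Convert weekly totals to per-game values for double game weeks.
df.loc[double, goal_cols] = df.loc[double, goal_cols] / 2

# 2) Per-game clean sheet estimate (hypothetical stand-in for the model).
df["cleen_sheet_prob"] = np.exp(-df["conceded"])

# 3) Scale everything back up to weekly totals.
all_cols = goal_cols + ["cleen_sheet_prob"]
df.loc[double, all_cols] = df.loc[double, all_cols] * 2

print(df)
```

After step 3 the value for a double game week is best read as the expected number of clean sheets across the two fixtures rather than a probability, so it can exceed 1 when the per-game estimate is above 0.5.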

In [24]:
def export(df):
    cols = ['count', 'fpl_game_week', 'team', 'scored','conceded', 'cleen_sheet_prob', 'date']
    df[cols].drop_duplicates().to_csv(f'predicting_weekly_team_clean_sheet/2018-19/game_week_clean_sheet_{week}.csv', index=False)

In [25]:
def main(week):
    return (get_data()
            .pipe(set_cs)
            .pipe(predict)
            .pipe(export))

In [26]:
for week in game_weeks:
    main(week)

Now that we have successfully predicted goals, conceded goals, and clean sheet probabilities, we can proceed to the next notebook, where we will focus on predicting individual player points.