# Fantasy Football - Predicting Team Goals

In this notebook, we use the dataset from the previous notebook to create a BayesianRidge linear regression model. This model helps us predict the number of goals a team will score and concede in future matches.

In [1]:
import pandas as pd
import warnings
from functools import reduce
import itertools
import numpy as np
import sklearn.preprocessing as preprocessing
import sklearn.model_selection as model_selection
from sklearn import linear_model
from sklearn.model_selection import train_test_split, cross_val_score

warnings.filterwarnings("ignore")
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Importing data.

In [2]:
team_fixture_data_with_team_elos = pd.read_csv('data/wrangled_data_final.csv')

In [3]:
team_fixture_data_with_team_elos.head()

Assessing the accuracy of various models by comparing their performance with 3-, 5-, and 10-game form features, built from either expected goals (xG) or actual goals.

In [4]:
kind = ['xg_3', 'xg_5', 'xg_10', 'actual_3', 'actual_5', 'actual_10']

for k in kind:
    x = team_fixture_data_with_team_elos[['fpl_game_week', 'team_elo', 'opponent_elo', f'team_goals_scored_{k}game_form',
                   f'opponent_goals_conceded_{k}game_form', 'elodiff', 'home']]

    # standardise features so coefficients are on a comparable scale
    ss = preprocessing.StandardScaler()
    x = pd.DataFrame(ss.fit_transform(x), columns=x.columns)

    y = team_fixture_data_with_team_elos.team_goals

    x_train, x_test, y_train, y_test = model_selection.train_test_split(
        x, y, train_size=0.75, test_size=0.25, random_state=1)

    reg_bay = linear_model.BayesianRidge()
    reg_bay.fit(x_train, y_train)

    # scikit-learn returns negated error metrics, so flip the sign before reporting
    cv_mae = cross_val_score(reg_bay, x_train, y_train, cv=10, scoring='neg_mean_absolute_error')
    cv_rmse = cross_val_score(reg_bay, x_train, y_train, cv=10, scoring='neg_root_mean_squared_error')
    cv_r2 = cross_val_score(reg_bay, x_train, y_train, cv=10, scoring='r2')
    print(f'{k}, MAE =', round(-cv_mae.mean(), 4))
    print(f'{k}, RMSE =', round(-cv_rmse.mean(), 4))
    print(f'{k}, R2 =', round(cv_r2.mean(), 4))
    print('----------------------')
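
The cell above scales the full data set before cross-validating. As a minimal sketch of an alternative, the scaler can be placed inside a pipeline so that each fold fits it on its own training rows only. The data below is synthetic and the names `X_demo`, `y_demo`, and `pipe` are illustrative assumptions, not part of this notebook's data set.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the fixture features (assumption: not the real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 1.3 + 0.5 * X_demo[:, 0] - 0.3 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

# With the scaler inside the pipeline, every CV fold fits it on training rows only.
pipe = make_pipeline(StandardScaler(), BayesianRidge())

# scikit-learn maximises scores, so error metrics are returned negated.
neg_mae = cross_val_score(pipe, X_demo, y_demo, cv=10, scoring='neg_mean_absolute_error')
mae = -neg_mae.mean()
print(f'MAE = {mae:.4f}')
```

Whether this changes the scores noticeably on the real fixtures is not something this sketch can show. It only illustrates the mechanics.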

After comparing different models and features, BayesianRidge linear regression using 5-game xG form with the features outlined below produces the best results.

In [5]:
x = team_fixture_data_with_team_elos[['fpl_game_week', 'team_elo', 'opponent_elo', 'team_goals_scored_xg_5game_form',
               'opponent_goals_conceded_xg_5game_form', 'elodiff', 'home']]

# standardise features; this scaler is fitted on the full data set
ss = preprocessing.StandardScaler()
x = pd.DataFrame(ss.fit_transform(x), columns=x.columns)

y = team_fixture_data_with_team_elos.team_goals

x_train, x_test, y_train, y_test = model_selection.train_test_split(
    x, y, train_size=0.75, test_size=0.25, random_state=1)

reg_bay = linear_model.BayesianRidge()
reg_bay.fit(x_train, y_train)

Importance of each feature used

In [6]:
# pair each coefficient with its feature name; features are standardised, so magnitudes are comparable
for name, coef in zip(x.columns, reg_bay.coef_):
    print(f'Feature: {name}, Score: {coef:.5f}')
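
As a rough sketch of how such coefficients could be ranked, the example below fits `BayesianRidge` on synthetic, already standardised columns and sorts the coefficients by absolute size. The frame `demo`, the column `form`, and the target values are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import BayesianRidge

# Synthetic features drawn from a standard normal (assumption: stand-in data).
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(size=(150, 3)), columns=['elodiff', 'home', 'form'])
target = 0.8 * demo['elodiff'] + 0.1 * demo['home'] + rng.normal(scale=0.3, size=150)

model = BayesianRidge().fit(demo, target)

# Label each coefficient, then order by absolute magnitude (largest first).
coefs = pd.Series(model.coef_, index=demo.columns)
ranked = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(ranked)
```

Ranking by magnitude is only meaningful when the features share a scale, which is why the notebook standardises them first.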

Applying the fitted model to the full data set to produce a predicted goals scored column.

In [7]:
team_fixture_data_with_team_elos['pred_goals_scored'] = reg_bay.predict(x)

In [8]:
team_fixture_data_with_team_elos[['fpl_game_week', 'team', 'opponent', 'team_goals', 'pred_goals_scored']].head()

On average, our model exhibits a deviation of 0.8984 goals from the actual goals scored. Considering that we employ team goals as a metric for predicting player points, it is not necessary for this metric to be absolutely precise. Rather, it serves as a reliable indicator of team performance and the efficacy of their offensive and defensive units in any given gameweek.

In any match, the number of goals one team scores equals the number of goals the opposing team concedes. Therefore, once we can predict the goals scored by each team, we can also predict the goals conceded.

In [9]:
match_team_1 = team_fixture_data_with_team_elos.query('home==1')[['game_id', 'pred_goals_scored', 'home', 'team']]
match_team_2 = team_fixture_data_with_team_elos.query('home==0')[['game_id', 'pred_goals_scored', 'home', 'team']]

In [10]:
combine = pd.merge(match_team_1,match_team_2, on='game_id')

In [11]:
combine.head()

In [12]:
team_fixture_data_with_team_elos = pd.merge(team_fixture_data_with_team_elos, combine[['game_id', 'pred_goals_scored_x', 'pred_goals_scored_y']], on='game_id')

In [13]:
team_fixture_data_with_team_elos[['fpl_game_week', 'team', 'opponent', 'team_goals', 'pred_goals_scored', 'pred_goals_scored_x', 'pred_goals_scored_y']].head()

Renaming columns and flooring the predicted goals at a small positive value, so that no prediction is negative.

In [14]:
team_fixture_data_with_team_elos = (team_fixture_data_with_team_elos
                                    .assign(pred_goals_conceded=team_fixture_data_with_team_elos['pred_goals_scored_x'])
                                    .assign(pred_goals_conceded=lambda x: x['pred_goals_conceded'].where(x['pred_goals_scored_x'] != x['pred_goals_scored'], x['pred_goals_scored_y']))
                                    .assign(pred_goals_conceded=lambda x: x['pred_goals_conceded'].clip(lower=0.119887576083739))
                                    .assign(pred_goals_scored=lambda x: x['pred_goals_scored'].clip(lower=0.119887576083739))
                                    .drop(columns=['pred_goals_scored_x', 'pred_goals_scored_y']))

In [15]:
team_fixture_data_with_team_elos[['fpl_game_week', 'team', 'opponent', 'team_goals', 'pred_goals_scored', 'pred_goals_conceded']].head()
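
The scored/conceded pairing built in the cells above can also be sketched as a single self-merge: look up each row's opponent in the same game and take that opponent's predicted goals scored as this team's predicted goals conceded. The toy frame `fixtures` and its values are invented for illustration. Only the column names mirror the notebook.

```python
import pandas as pd

# Toy fixtures: two games, one row per team per game (assumption: made-up values).
fixtures = pd.DataFrame({
    'game_id': [1, 1, 2, 2],
    'team': ['A', 'B', 'C', 'D'],
    'opponent': ['B', 'A', 'D', 'C'],
    'pred_goals_scored': [1.8, 0.9, 1.2, 1.4],
})

# The opponent's predicted goals scored are this team's predicted goals conceded.
opp = (fixtures[['game_id', 'team', 'pred_goals_scored']]
       .rename(columns={'team': 'opponent', 'pred_goals_scored': 'pred_goals_conceded'}))

# validate='one_to_one' raises if a (game_id, opponent) pair is ever duplicated.
both = fixtures.merge(opp, on=['game_id', 'opponent'], how='left', validate='one_to_one')
print(both)
```

This avoids comparing floating-point predictions to decide which side of the merge belongs to the opponent.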

We have successfully generated predictions for goals scored and conceded for every game using the available data. However, as we are required to forecast FPL teams well in advance of each game, rather than just before each game week, we need to predict goals scored and conceded for each specific game based on past game data from all preceding weeks. For example, in GW10, we will predict the goals scored and conceded from GW10 until the end of the season using game data from GW1 to GW9. 
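
The idea can be sketched on toy data: for a given game week, take each team's most recent known form and attach it to every remaining fixture. The frame `season`, the column `form`, and the helper `split_at` are hypothetical names used only for this illustration. The `<=` and `>=` comparisons mirror the notebook's own queries.

```python
import pandas as pd

# Toy season: two teams meeting in three game weeks (assumption: made-up values).
season = pd.DataFrame({
    'fpl_game_week': [1, 1, 2, 2, 3, 3],
    'team': ['A', 'B', 'A', 'B', 'A', 'B'],
    'opponent': ['B', 'A', 'B', 'A', 'B', 'A'],
    'form': [1.0, 0.5, 1.2, 0.7, 1.1, 0.9],
})

def split_at(week):
    """Attach each team's latest form as of `week` to all fixtures from `week` onwards."""
    known = (season[season['fpl_game_week'] <= week]
             .drop_duplicates(subset='team', keep='last')[['team', 'form']])
    upcoming = season[season['fpl_game_week'] >= week][['fpl_game_week', 'team', 'opponent']]
    return upcoming.merge(known, on='team', how='left')

print(split_at(2))
```

Every fixture from week 2 onwards carries the week 2 form, so later weeks are predicted without looking at their own form values.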

In [16]:
final_team_data = team_fixture_data_with_team_elos[['season', 'fpl_game_week', 'date', 'team', 'opponent', 'game_id', 'team_elo', 'opponent_elo', 'elodiff', 'home', 'team_goals_scored_xg_5game_form', 'team_goals_conceded_xg_5game_form', 'opponent_goals_conceded_xg_5game_form', 'team_goals', 'opponent_goals']]

In [17]:
final_team_data.head()

In [18]:
game_weeks = list(range(1, 39))

In [19]:
# season folder used in the export path
path = '2018-19'

# loop over game weeks
for week in game_weeks:
    # get current form and the future games we need to predict goals for based on that form
    current_form = final_team_data.query(f'fpl_game_week <= {week} & season == "2018-2019"')[['team', 'team_elo', 'team_goals_scored_xg_5game_form', 'team_goals_conceded_xg_5game_form']].drop_duplicates(subset='team', keep='last')
    future_games = final_team_data.query(f'fpl_game_week >= {week} & season == "2018-2019"')[['fpl_game_week', 'date', 'team', 'opponent', 'home']]
    
    # convert future and current form into data set that matches model format
    predict_future_goals = (current_form
                            .merge(future_games, on='team', how='outer')
                            .merge(current_form, left_on='opponent', right_on='team', how='outer')
                            .set_axis(['team', 'team_elo', 'team_goals_scored_xg_5game_form', 'team_goals_conceded_xg_5game_form', 
                                       'fpl_game_week', 'date', 'opponent', 'home', 'opponent_2', 'opponent_elo', 
                                       'opponent_goals_scored_xg_5game_form', 'opponent_goals_conceded_xg_5game_form'], axis='columns')
                            .drop(columns=['team_goals_conceded_xg_5game_form', 'opponent_2', 'opponent_goals_scored_xg_5game_form'])
                            .sort_values(by='fpl_game_week')
                            .reset_index(drop=True)
                            .assign(elodiff=lambda x: x['team_elo'] - x['opponent_elo'])
                            .dropna())
    
    # add predicted goals scored for all future matches based on current form
    x = predict_future_goals[['fpl_game_week', 'team_elo', 'opponent_elo', 'team_goals_scored_xg_5game_form',
                   'opponent_goals_conceded_xg_5game_form', 'elodiff', 'home']]
    
    # reuse the scaler fitted in In [5] so features are scaled exactly as they were for training
    x = pd.DataFrame(ss.transform(x), columns=x.columns)

    predict_future_goals['pred_goals_scored'] = reg_bay.predict(x)
    
    # add goals conceded and adjust for when teams have multiple matches in a single game week
    add_goals_conceded = (predict_future_goals[['team', 'pred_goals_scored', 'fpl_game_week', 'date']]
                          .merge(predict_future_goals[['opponent', 'pred_goals_scored', 'fpl_game_week']], 
                                 left_on=['fpl_game_week', 'team'], 
                                 right_on=['fpl_game_week', 'opponent'])
                          .loc[:, ['date', 'fpl_game_week', 'team', 'pred_goals_scored_x', 'pred_goals_scored_y']]
                          .rename(columns={'pred_goals_scored_x': 'pred_goals_scored', 'pred_goals_scored_y': 'pred_goals_conceded'})
                          .groupby(['team', 'fpl_game_week'])
                          .agg(count=('team', 'count'), 
                               team=('team', 'first'), 
                               fpl_gw=('fpl_game_week', 'first'), 
                               date=('date', 'first'), 
                               scored=('pred_goals_scored', 'mean'), 
                               conceded=('pred_goals_conceded', 'mean'))
                          .assign(scored=lambda x: x['scored'].where(x['count'] != 4, x['scored'] * 2))
                          .assign(conceded=lambda x: x['conceded'].where(x['count'] != 4, x['conceded'] * 2))
                          .assign(count=lambda x: x['count'].where(x['count'] != 4, 2)))
    
    # export predictions
    add_goals_conceded.to_csv(f'predicting_weekly_team_goals/{path}/game_week_{week}.csv', index=False)
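
The `count != 4` adjustment in the loop can be illustrated on toy data. When a team plays twice in one game week, the self-merge pairs each of its two rows with each of its two opponents' rows, giving four rows. The mean over those four rows is the per-match average, so doubling it recovers the total for the week. The frame `preds` and its values are invented for this sketch.

```python
import pandas as pd

# Toy predictions: team A plays both B and C in game week 1 (assumption: made-up values).
preds = pd.DataFrame({
    'fpl_game_week': [1, 1, 1, 1],
    'team': ['A', 'B', 'A', 'C'],
    'opponent': ['B', 'A', 'C', 'A'],
    'pred_goals_scored': [1.5, 0.8, 2.0, 0.6],
})

# Pair each team's rows with the rows of every side that faced it in the same week.
pairs = preds[['fpl_game_week', 'team', 'pred_goals_scored']].merge(
    preds[['fpl_game_week', 'opponent', 'pred_goals_scored']],
    left_on=['fpl_game_week', 'team'],
    right_on=['fpl_game_week', 'opponent'],
    suffixes=('_for', '_against'),
)

summary = (pairs.groupby(['team', 'fpl_game_week'])
           .agg(count=('pred_goals_scored_for', 'count'),
                scored=('pred_goals_scored_for', 'mean'),
                conceded=('pred_goals_scored_against', 'mean')))

# Four paired rows means two matches: double the per-match mean to get the weekly total.
double = summary['count'] == 4
summary.loc[double, ['scored', 'conceded']] *= 2
print(summary)
```

Here team A ends up with 3.5 predicted goals scored for the week (1.5 + 2.0), while B and C keep their single-match values.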

Having obtained the predicted goals scored and conceded for every game played by a team, we are now ready to progress to the next notebook, where we will focus on predicting specific FPL player points.