# Offensive Contributions

## Introduction

Football, compared to many other sports, is low-scoring. As a result, traditional measures of offensive play, such as goals and assists, may not fully represent an individual's contribution. Here, we measure the offensive contribution of Premier League players to their team's scoring. The top goal scorers in the league will likely still have high measures of offensive contribution, but this analysis aims to highlight the influence of other players in the team.

## Analysis

We shall use multiple linear regression to establish the relationship between goals as the dependent variable and shots and passes as the independent variables. The regression is fitted on team-level data; applying the resulting equation at player level then lets us estimate a player's contribution to his team's scoring. We use only Premier League data to maintain consistency, as the relationship may vary between leagues.
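
Written out, the team-level model described above takes the following form. Here $i$ indexes one observation (a club in a given season, judging by the merge keys used in the code), $G_i$, $S_i$ and $P_i$ are that observation's goals, shots and passes, and $\varepsilon_i$ is the error term. The symbols $\beta_0$, $\beta_1$ and $\beta_2$ are notation introduced here for the intercept and the two slopes; they are not names used elsewhere in this notebook.

$$ G_i = \beta_0 + \beta_1 S_i + \beta_2 P_i + \varepsilon_i $$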

NOTE: The analysis will be expanded once data on dribbles completed has been collated; this will be added to the model as another independent variable. More detailed variables should also be tested and dropped if their coefficients are not statistically different from zero.

In [1]:
import pandas as pd
import numpy as np
import statsmodels.api as sm

#EXTRACT DATA
player_goals = pd.read_excel('Offensive_Contributions_Data.xlsx', sheet_name = 'Player Goals')
player_shots = pd.read_excel('Offensive_Contributions_Data.xlsx', sheet_name = 'Player Shots')
player_passes = pd.read_excel('Offensive_Contributions_Data.xlsx', sheet_name = 'Player Passes')
team_goals = pd.read_excel('Offensive_Contributions_Data.xlsx', sheet_name = 'Team Goals')
team_shots = pd.read_excel('Offensive_Contributions_Data.xlsx', sheet_name = 'Team Shots')
team_passes = pd.read_excel('Offensive_Contributions_Data.xlsx', sheet_name = 'Team Passes')

#MERGE DATA
#Player data
player_join = pd.merge(player_goals, player_shots, how = 'outer', on = ['Player', 'Club', 'Nationality'])
player_summary = pd.merge(player_join, player_passes, how = 'outer', on = ['Player', 'Club', 'Nationality']) 
player_summary.columns = ['Goals_rank', 'Player', 'Club', 'Nationality', 'Goals', 'Shots_rank', 'Shots', 'Pass_rank', 'Passes']

#Team data
team_join = pd.merge(team_goals, team_shots, how = 'outer', on = ['Club', 'Season'])
team_summary = pd.merge(team_join, team_passes, how = 'outer', on = ['Club', 'Season']) 
team_summary.columns = ['Goals_rank', 'Club', 'Goals', 'Season', 'Shots_rank', 'Shots', 'Pass_rank', 'Passes']

#CLEAN DATA
player_summary = player_summary.fillna(0)

In [2]:
#REGRESSION ANALYSIS
y = team_summary['Goals']              #Dependent variable
X = team_summary[['Shots', 'Passes']]  #Independent variables
X = sm.add_constant(X)                 #Add a constant
model = sm.OLS(y, X).fit()
model.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | Goals | R-squared: | 0.795 |
| Model: | OLS | Adj. R-squared: | 0.79 |
| Method: | Least Squares | F-statistic: | 149.5 |
| Date: | Fri, 12 Oct 2018 | Prob (F-statistic): | 3.04e-27 |
| Time: | 12:39:26 | Log-Likelihood: | -275.67 |
| No. Observations: | 80 | AIC: | 557.3 |
| Df Residuals: | 77 | BIC: | 564.5 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -32.3282 | 4.899 | -6.599 | 0.000 | -42.083 | -22.574 |
| Shots | 0.1203 | 0.016 | 7.696 | 0.000 | 0.089 | 0.151 |
| Passes | 0.0015 | 0.000 | 3.588 | 0.001 | 0.001 | 0.002 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 0.27 | Durbin-Watson: | 1.97 |
| Prob(Omnibus): | 0.874 | Jarque-Bera (JB): | 0.103 |
| Skew: | 0.087 | Prob(JB): | 0.95 |
| Kurtosis: | 3.02 | Cond. No. | 97200.0 |


## Interpretation

The table above summarises the regression results on team-level data, using the number of shots (S) and passes (P) to predict the number of goals (G). The resulting regression equation is:

$$ G = (0.1203 * S) + (0.0015 * P) - 32.3282 $$

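This model has a relatively high R-squared of 79.5%, meaning it explains 79.5% of the variance in our dependent variable (goals), and all the coefficients are statistically different from zero at the 5% level. Adding more independent variables cannot lower R-squared, but it will only improve the model's accuracy if the new variables carry real predictive information; the adjusted R-squared and the significance of the new coefficients should be checked to confirm this.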
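
As a quick check on the figures quoted from the regression summary, the sketch below recomputes the adjusted R-squared from the printed R-squared and tests whether each coefficient's 95% confidence interval excludes zero (which is equivalent to significance at the 5% level). The helper names `adjusted_r_squared` and `excludes_zero` are illustrative and not part of the notebook, and the inputs are the rounded values as printed, so results are approximate.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a model that includes an intercept."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


def excludes_zero(ci_lower, ci_upper):
    """True when a confidence interval lies entirely on one side of zero."""
    return ci_lower > 0 or ci_upper < 0


# Values copied from the regression summary (rounded as printed).
adj_r2 = adjusted_r_squared(r_squared=0.795, n_obs=80, n_predictors=2)
print(round(adj_r2, 3))  # close to the reported Adj. R-squared of 0.79

# 95% confidence intervals from the [0.025, 0.975] columns.
conf_intervals = {
    'const': (-42.083, -22.574),
    'Shots': (0.089, 0.151),
    'Passes': (0.001, 0.002),
}
significant = {
    name: excludes_zero(lower, upper)
    for name, (lower, upper) in conf_intervals.items()
}
print(significant)
```

The same two checks are a reasonable way to judge any variable added to the model later: a rise in R-squared with no rise in adjusted R-squared, or an interval that straddles zero, suggests the variable is not adding much.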

To obtain the equation at the player level we use a similar equation but divide the constant by ten to reflect the ten outfield players on the pitch that could contribute to scoring at any one time. As a result, we will substitute individual player values into the following equation to obtain their offensive contributions (OC):

$$ OC = (0.1203 * S) + (0.0015 * P) - 3.23282 $$
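
The player-level equation can be sketched as a small function. The coefficients below are the rounded values printed in the regression summary, the function and constant names are illustrative, and the shot and pass counts are hypothetical figures rather than real player data. The second part illustrates why the constant is divided by ten: when the ten players' shots and passes add up to the team totals, their offensive contributions add up to the team-level prediction.

```python
# Coefficients as printed in the regression summary (rounded to 4 d.p.).
SHOTS_COEF = 0.1203
PASSES_COEF = 0.0015
TEAM_CONST = -32.3282
OUTFIELD_PLAYERS = 10


def offensive_contribution(shots, passes):
    """Player-level OC: team slopes, with the constant shared by ten players."""
    return SHOTS_COEF * shots + PASSES_COEF * passes + TEAM_CONST / OUTFIELD_PLAYERS


def predicted_team_goals(shots, passes):
    """Team-level regression equation."""
    return SHOTS_COEF * shots + PASSES_COEF * passes + TEAM_CONST


# Hypothetical player with 100 shots and 1500 passes.
example_oc = offensive_contribution(shots=100, passes=1500)
print(round(example_oc, 5))  # 11.04718

# Ten hypothetical players as (shots, passes) pairs.
squad = [(10 * k, 200 * k) for k in range(1, 11)]
sum_of_player_ocs = sum(offensive_contribution(s, p) for s, p in squad)
team_prediction = predicted_team_goals(
    shots=sum(s for s, _ in squad),
    passes=sum(p for _, p in squad),
)
print(round(sum_of_player_ocs, 4), round(team_prediction, 4))
```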

## Player-Level Analysis

In the code below, we take individual player statistics and use our formula to estimate their offensive contributions. We then check which of the top 10 goal scorers also appear in the top 10 for offensive contribution. We use individual player data from the 2017/18 Premier League season.

In [3]:
#OFFENSIVE CONTRIBUTION CALCULATION
player_summary['Offensive_contribution'] = (model.params.Shots*player_summary.Shots) + (model.params.Passes*player_summary.Passes) + (model.params.const/10)

#TOP 10 GOALSCORERS/OFFENSIVE CONTRIBUTIONS PREM 17/18
top10_goals = player_summary[['Player', 'Club', 'Nationality', 'Goals', 'Offensive_contribution']].nlargest(10, 'Goals')
top10_goals.index = top10_goals.index + 1
top10_contr = player_summary[['Player', 'Club', 'Nationality', 'Offensive_contribution', 'Goals']].nlargest(10, 'Offensive_contribution')
top10_merge = pd.merge(top10_contr, top10_goals, how = 'left', on = ['Player', 'Club', 'Nationality'])
top10_merge.index = top10_merge.index + 1

#UPPER CASE PLAYER DETAILS IF IN BOTH TABLES
def upper_if_in_both(row, column):
    #Goals_y is only populated (non-NaN) when the left merge found the player in top10_goals
    if pd.notna(row.Goals_y):
        return row[column].upper()
    return row[column]

for column in ['Player', 'Club', 'Nationality']:
    top10_merge[column] = top10_merge.apply(lambda row: upper_if_in_both(row, column), axis=1)

#RENAME COLUMNS
top10_merge.rename(columns={'Goals_x': 'Goals', 'Offensive_contribution_x': 'Offensive_contribution'}, inplace=True)

#OUTPUT
display(top10_goals[['Player', 'Club', 'Nationality', 'Goals']])
display(top10_merge[['Player', 'Club', 'Nationality', 'Offensive_contribution', 'Goals']])

| | Player | Club | Nationality | Goals |
|---|---|---|---|---|
| 1 | Mohamed Salah | Liverpool | Egypt | 32.0 |
| 2 | Harry Kane | Tottenham Hotspur | England | 30.0 |
| 3 | Sergio Agüero | Manchester City | Argentina | 21.0 |
| 4 | Jamie Vardy | Leicester City | England | 20.0 |
| 5 | Raheem Sterling | Manchester City | England | 18.0 |
| 6 | Romelu Lukaku | Manchester United | Belgium | 16.0 |
| 7 | Roberto Firmino | Liverpool | Brazil | 15.0 |
| 8 | Alexandre Lacazette | Arsenal | France | 14.0 |
| 9 | Gabriel Jesus | Manchester City | Brazil | 13.0 |
| 10 | Eden Hazard | Chelsea | Belgium | 12.0 |


| | Player | Club | Nationality | Offensive_contribution | Goals |
|---|---|---|---|---|---|
| 1 | HARRY KANE | TOTTENHAM HOTSPUR | ENGLAND | 19.891505 | 30.0 |
| 2 | MOHAMED SALAH | LIVERPOOL | EGYPT | 15.510505 | 32.0 |
| 3 | Kevin De Bruyne | Manchester City | Belgium | 12.107386 | 8.0 |
| 4 | Christian Eriksen | Tottenham Hotspur | Denmark | 11.817612 | 10.0 |
| 5 | Alexis Sánchez | Manchester United | Chile | 9.533291 | 9.0 |
| 6 | Richarlison | - | Brazil | 9.401709 | 5.0 |
| 7 | Granit Xhaka | Arsenal | Switzerland | 9.370861 | 1.0 |
| 8 | SERGIO AGÜERO | MANCHESTER CITY | ARGENTINA | 9.018073 | 21.0 |
| 9 | RAHEEM STERLING | MANCHESTER CITY | ENGLAND | 8.986588 | 18.0 |
| 10 | ROBERTO FIRMINO | LIVERPOOL | BRAZIL | 8.62409 | 15.0 |


Comments:
- Interestingly, half of the top 10 goal scorers are also in the top 10 for offensive contribution, even though a player's own goals are not an input to the offensive contribution formula.
- Creative players such as De Bruyne, Eriksen and Sánchez, who were not among the top 10 goal scorers, all make the top 10 for offensive contribution.
- Granit Xhaka is perhaps a surprising inclusion. His league-high 3,116 passes put him 7th for offensive contribution.
- The offensive contribution of the top goal scorers is lower than the number of goals they scored. This reflects the contributions of other players to their goals.
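
The overlap between the two tables can be double-checked without pandas. The sketch below simply copies the player names from the two output tables above and compares them; the variable names are illustrative and not taken from the notebook.

```python
# Names copied from the two output tables above, in table order.
top10_goal_scorers = [
    'Mohamed Salah', 'Harry Kane', 'Sergio Agüero', 'Jamie Vardy',
    'Raheem Sterling', 'Romelu Lukaku', 'Roberto Firmino',
    'Alexandre Lacazette', 'Gabriel Jesus', 'Eden Hazard',
]
top10_contributors = [
    'Harry Kane', 'Mohamed Salah', 'Kevin De Bruyne', 'Christian Eriksen',
    'Alexis Sánchez', 'Richarlison', 'Granit Xhaka', 'Sergio Agüero',
    'Raheem Sterling', 'Roberto Firmino',
]

scorer_set = set(top10_goal_scorers)

# Preserve the offensive-contribution ranking order in both result lists.
in_both = [p for p in top10_contributors if p in scorer_set]
contributors_only = [p for p in top10_contributors if p not in scorer_set]

print(len(in_both), in_both)
print(len(contributors_only), contributors_only)
```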

It will be interesting to include more independent variables in the regression model and see how this influences the results.