# Penalty kicks 

This notebook analyzes penalty kick data as a simultaneous-move game of complete information. The data come from English Premier League games in the 2016-17 season.

**Data source:** https://www.kaggle.com/mauryashubham/english-premier-league-penalty-dataset-201617

In [1]:
import pandas as pd 
import numpy as np 
import nashpy 

Global setting: make pandas print floats with two decimals.

In [2]:
pd.options.display.float_format = '{:,.2f}'.format

In [3]:
def print_payoffs(U, A): 
    '''print_payoffs: Nicely formatted payoff table for a 2-player game 
        INPUTS: 
            U: (list of 2 matrices, each dim=na1*na2) Payoffs [U1, U2] 
            A: (list of 2 lists of str) Action names [A1, A2], 
               where len(A1)=na1 and len(A2)=na2
        
        OUTPUT:
            tab: pandas dataframe, na1*na2 with payoff tuples 
    '''
    assert len(U) == 2, 'only implemented for 2-player games'
    assert len(A) == 2, 'only implemented for 2-player games'

    U1 = U[0]
    U2 = U[1]
    A1 = A[0]
    A2 = A[1]

    na1,na2 = U1.shape
    assert len(A1) == na1
    assert len(A2) == na2

    # "matrix" of tuples 
    X = [[(U1[r,c],U2[r,c]) for c in range(na2)] for r in range(na1)]

    # dataframe version 
    tab = pd.DataFrame(X, columns=A2, index=A1)
    
    return tab 

In [4]:
dat = pd.read_csv('penalty_data.csv', encoding='latin')

In [5]:
dat['Date'] = pd.to_datetime(dat.Date)

In [6]:
print('Penalty kicks from games: ')
dat.Date.dt.year.value_counts()

Penalty kicks from games: 


2016    63
2017    43
Name: Date, dtype: int64

In [7]:
dat.head(3)

Unnamed: 0,No.,Match Week,Date,Player,Team,Match,Time of Penalty Awarded,Scored,Final Results,Foot,Kick_Direction,Keeper_Direction,Saved
0,1,1,2016-08-13,Riyad Mahrez,Leicester,Hull vs Leicester,47' minute,Scored,42737,L,C,R,
1,2,1,2016-08-13,Sergio Agüero,Man City,Man City vs Sunderland,4' minute,Scored,42737,R,L,L,
2,3,1,2016-08-14,Theo Walcott,Arsenal,Arsenal vs Liverpool,30' minute,Missed,42828,R,L,L,1.0


There are missing observations in the data. We can only analyze kicks for which the kick direction, the keeper direction, and the outcome are all observed.

In [8]:
I = (dat.Kick_Direction.notnull()) & (dat.Keeper_Direction.notnull()) & (dat.Scored.notnull())
print(f'Deleting {(~I).sum()} rows => N = {I.sum()} penalty kicks in final data.')
dat = dat[I].copy()

Deleting 3 rows => N = 103 penalty kicks in final data.


**Action distribution:** How frequently does each player choose each action? 

In [9]:
dat.Kick_Direction.value_counts(normalize=True)

L   0.46
R   0.38
C   0.17
Name: Kick_Direction, dtype: float64

In [10]:
dat.Keeper_Direction.value_counts(normalize=True)

R   0.50
L   0.44
C   0.06
Name: Keeper_Direction, dtype: float64

**Joint distribution:** Cross-tabulating the keeper's and the kicker's directions gives the share of kicks in each combination: 

In [11]:
pd.crosstab(dat.Keeper_Direction, dat.Kick_Direction, normalize=True)

Kick_Direction      C    L    R
Keeper_Direction
C                0.01 0.02 0.03
L                0.07 0.19 0.17
R                0.09 0.24 0.17
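
In a simultaneous-move game neither player observes the other's choice before acting, so the two directions should be roughly independent. A rough way to eyeball this, as a sketch only, is to compare each joint share with the product of the two marginal shares. The numbers below are the rounded shares printed above, and the variable names are illustrative rather than part of the notebook.

```python
# Rounded shares copied from the outputs above, so all results are approximate.
kick = {'C': 0.17, 'L': 0.46, 'R': 0.38}      # kicker's marginal shares
keeper = {'C': 0.06, 'L': 0.44, 'R': 0.50}    # keeper's marginal shares

# Observed joint shares, indexed as joint[keeper_direction][kick_direction].
joint = {
    'C': {'C': 0.01, 'L': 0.02, 'R': 0.03},
    'L': {'C': 0.07, 'L': 0.19, 'R': 0.17},
    'R': {'C': 0.09, 'L': 0.24, 'R': 0.17},
}

# Under independence, each joint share is the product of the marginals.
expected = {g: {k: keeper[g] * kick[k] for k in kick} for g in keeper}

# Cell-by-cell gap between observed and independence-implied shares.
gap = {g: {k: joint[g][k] - expected[g][k] for k in kick} for g in keeper}

for g in keeper:
    for k in kick:
        print(f'keeper={g}, kick={k}: observed={joint[g][k]:.2f}, '
              f'independent={expected[g][k]:.2f}, gap={gap[g][k]:+.2f}')

max_gap = max(abs(gap[g][k]) for g in keeper for k in kick)
```

With these rounded inputs no cell deviates from the independence benchmark by more than about two percentage points. This is only an informal comparison; a proper test would work from the raw counts.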


How do these choices translate into outcomes? 

In [12]:
dat['goal'] = dat.Scored == 'Scored' # penalty kick resulted in a score

# Reduced matrix, discarding `C`

In [13]:
I = (dat.Kick_Direction != "C") & (dat.Keeper_Direction != "C")

In [14]:
shares = dat.loc[I].groupby(['Kick_Direction', 'Keeper_Direction']).goal.mean().unstack().round(2)
shares

Keeper_Direction    L    R
Kick_Direction
L                0.65 0.88
R                0.83 0.56


In [38]:
# extracting the names of the actions 
A1 = shares.index.values
A2 = shares.columns.values

print(A1)
print(A2)

['L' 'R']
['L' 'R']


In [15]:
U1 = shares.values

***... continue work from here...***
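
One possible continuation, offered as a sketch rather than the intended solution: treat the reduced table as a zero-sum game (an assumption; the keeper's payoff is taken to be minus the scoring probability) and solve for the mixed-strategy equilibrium by hand. No cell of the table is a pure-strategy equilibrium, so each player must mix in a way that leaves the opponent indifferent between `L` and `R`. The names `a`, `b`, `c`, `d`, `p`, `q` and `v` are illustrative.

```python
# Scoring probabilities from the reduced table above (rounded to two decimals).
# Rows: kicker plays L, R. Columns: keeper plays L, R.
a, b = 0.65, 0.88   # kicker L against keeper L, keeper R
c, d = 0.83, 0.56   # kicker R against keeper L, keeper R

# ASSUMPTION: zero-sum game, i.e. the keeper's payoff is minus the
# scoring probability.
denom = a - b - c + d

# Kicker's probability of L that makes the keeper indifferent:
#   p*a + (1-p)*c = p*b + (1-p)*d
p = (d - c) / denom

# Keeper's probability of L that makes the kicker indifferent:
#   q*a + (1-q)*b = q*c + (1-q)*d
q = (d - b) / denom

# Equilibrium scoring probability (the value of the game).
v = q * a + (1 - q) * b

print(f'kicker plays L with prob. {p:.2f}')
print(f'keeper plays L with prob. {q:.2f}')
print(f'equilibrium scoring prob. {v:.4f}')
```

With the rounded shares this gives p = 0.54, q = 0.64 and an equilibrium scoring probability of about 0.73. Since the notebook imports `nashpy`, the same game could presumably also be passed to `nashpy.Game(U1, -U1)` and solved with its `support_enumeration()` method as a cross-check.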

# Full payoff matrix

Display the mean of the key outcome (whether a goal was scored) for each combination of kick direction and keeper direction. 

In [29]:
tab = dat.groupby(['Kick_Direction', 'Keeper_Direction']).goal.mean().unstack().round(2)
tab

Keeper_Direction    C    L    R
Kick_Direction
C                 0.0  1.0 0.89
L                 1.0 0.65 0.88
R                 1.0 0.83 0.56
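
Some of these cell means rest on very few kicks, which matters when reading values such as 0.0 or 1.0. As a rough illustration, the sketch below backs out approximate cell counts from the rounded joint shares shown earlier and the sample size N = 103. Because the shares are rounded, the counts are approximate, and the names `share` and `counts` are illustrative.

```python
N = 103  # penalty kicks in the final data, from the filtering step above

# Rounded joint shares from the cross-tabulation, keyed as (kick, keeper).
share = {
    ('C', 'C'): 0.01, ('L', 'C'): 0.02, ('R', 'C'): 0.03,
    ('C', 'L'): 0.07, ('L', 'L'): 0.19, ('R', 'L'): 0.17,
    ('C', 'R'): 0.09, ('L', 'R'): 0.24, ('R', 'R'): 0.17,
}

# Approximate number of kicks behind each cell mean.
counts = {cell: round(s * N) for cell, s in share.items()}

for (kick_dir, keeper_dir), n in sorted(counts.items()):
    print(f'kick={kick_dir}, keeper={keeper_dir}: about {n} kicks')

total = sum(counts.values())
print(f'total: {total}')
```

Under this approximation the cell where both players choose `C` contains a single kick, and the keeper stays in the centre in only about six kicks overall. In the notebook itself the exact counts can be obtained directly, for example by aggregating `goal` with both `'mean'` and `'count'` in the same `groupby`.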


In [34]:
# names of the actions (in the same sorted order as in the table)
A1 = tab.index.values
A2 = tab.columns.values
print(A1)
print(A2)

['C' 'L' 'R']
['C' 'L' 'R']


In [28]:
U = dat.groupby(['Kick_Direction', 'Keeper_Direction']).goal.mean().unstack().values
U

array([[0.        , 1.        , 0.88888889],
       [1.        , 0.65      , 0.88      ],
       [1.        , 0.83333333, 0.55555556]])

***... continue from here...***
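
One possible continuation, again a sketch under the assumption that the game is zero-sum (the keeper's payoff is minus the scoring probability): look for an equilibrium in which both players mix over all three actions. Each player's mix must make the opponent indifferent across all of the opponent's actions, which is a linear system. The helper `full_support_mix` is hypothetical and not part of the notebook, and the candidate it returns is only an equilibrium if all probabilities are non-negative.

```python
import numpy as np

# Scoring probabilities from the array above.
# Rows: kicker plays C, L, R. Columns: keeper plays C, L, R.
U = np.array([[0.        , 1.        , 0.88888889],
              [1.        , 0.65      , 0.88      ],
              [1.        , 0.83333333, 0.55555556]])

def full_support_mix(M):
    '''Hypothetical helper: find x with sum(x) = 1 such that every row of M
    yields the same expected payoff v, i.e. M @ x = v * 1. Returns (x, v).'''
    n = M.shape[0]
    # Unknowns are (x_1, ..., x_n, v): n indifference equations
    # plus one adding-up constraint.
    lhs = np.block([[M, -np.ones((n, 1))],
                    [np.ones((1, n)), np.zeros((1, 1))]])
    rhs = np.append(np.zeros(n), 1.0)
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n], sol[n]

# Keeper's mix q makes the kicker indifferent across the rows of U.
q, v = full_support_mix(U)
# Kicker's mix p makes the keeper indifferent across the columns of U.
p, v_check = full_support_mix(U.T)

# The candidate is a valid equilibrium only if no probability is negative.
is_valid = bool((p >= 0).all() and (q >= 0).all())

print('kicker (C, L, R):', p.round(3))
print('keeper (C, L, R):', q.round(3))
print(f'scoring probability: {v:.3f}, valid: {is_valid}')
```

With these inputs all probabilities come out positive: the kicker's mix over (C, L, R) is roughly (0.22, 0.48, 0.31), the keeper's is roughly (0.18, 0.52, 0.29), and the equilibrium scoring probability is about 0.78. These numbers should be read with care, since the keeper stays in the centre in only 6% of the kicks, so the `C` column of the payoff matrix is estimated from very little data.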