## Penalty shootout data setup
Code to create additional per-shot data columns:
- 'sudden death': Was the shot taken during a sudden death round?
- 'could win': Could the shot immediately result in a victory if scored?
- 'must survive': Could the shot immediately result in a loss if missed?

In [30]:
# dependencies
import pandas as pd
import numpy as np
import plotly.express as px

In [5]:
pk = pd.read_csv("../00_data/pks_raw.csv")
pk.head()

Unnamed: 0,matchup,tournament,year,round,attacker_team,gk_team,attacker,goalkeeper,goal,missed,saved,shot_order,take_first,FT_tie,neutral_stadium,attacker_home
0,1,WC,1982,semi,France,Germany,Alain Giresse,Toni Schumacher,1.0,0.0,0.0,1.0,1.0,3,1,0.0
1,1,WC,1982,semi,France,Germany,Manuel Amoros,Toni Schumacher,1.0,0.0,0.0,2.0,1.0,3,1,0.0
2,1,WC,1982,semi,France,Germany,Dominique Rocheteau,Toni Schumacher,1.0,0.0,0.0,3.0,1.0,3,1,0.0
3,1,WC,1982,semi,France,Germany,Didier Six,Toni Schumacher,0.0,0.0,1.0,4.0,1.0,3,1,0.0
4,1,WC,1982,semi,France,Germany,Michel Platini,Toni Schumacher,1.0,0.0,0.0,5.0,1.0,3,1,0.0


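### Create a column to flag whether a given shot is taken during 'sudden death' rounds
- Per the FA Laws of the Game: 'If, after both teams have taken five kicks, the scores are level, kicks continue until one team has scored a goal more than the other from the same number of kicks.'
- In this data, `shot_order` is counted per team (it restarts at 1 for the second team in a matchup), so any shot with `shot_order` greater than 5 is a sudden death kick.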
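The rule above only maps onto a simple `shot_order > 5` test if kicks are numbered per team rather than across the whole shootout. The following is a minimal sketch of how that assumption could be checked; the `toy` frame is hypothetical data, and only the column names `matchup`, `attacker_team`, and `shot_order` are taken from the dataset.

```python
import pandas as pd

# Hypothetical toy shootout: one matchup, two teams, six kicks each.
toy = pd.DataFrame({
    "matchup": [1] * 12,
    "attacker_team": ["A"] * 6 + ["B"] * 6,
    "shot_order": list(range(1, 7)) * 2,
})

# If shot_order is counted per team, it restarts at 1 for each team and
# the two teams' kick counts differ by at most one within a matchup.
kicks = toy.groupby(["matchup", "attacker_team"])["shot_order"].agg(["min", "max"])
restarts_at_one = bool((kicks["min"] == 1).all())
max_gap = (
    kicks["max"]
    .groupby(level="matchup")
    .agg(lambda s: s.max() - s.min())
    .max()
)

# Under that assumption, any kick numbered above 5 is a sudden death kick.
toy["sudden_death"] = (toy["shot_order"] > 5).astype(int)
```

Running the same two checks on the real `pk` frame would show whether any matchup breaks the per-team numbering before the flag is trusted.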

In [28]:
# create sudden death column using numpy 'where': 1 if shot_order > 5, else 0

pk['sudden_death'] = np.where(pk['shot_order'] > 5, 1, 0)

pk.head()

Unnamed: 0,matchup,tournament,year,round,attacker_team,gk_team,attacker,goalkeeper,goal,missed,saved,shot_order,take_first,FT_tie,neutral_stadium,attacker_home,sudden_death
0,1,WC,1982,semi,France,Germany,Alain Giresse,Toni Schumacher,1.0,0.0,0.0,1.0,1.0,3,1,0.0,0
1,1,WC,1982,semi,France,Germany,Manuel Amoros,Toni Schumacher,1.0,0.0,0.0,2.0,1.0,3,1,0.0,0
2,1,WC,1982,semi,France,Germany,Dominique Rocheteau,Toni Schumacher,1.0,0.0,0.0,3.0,1.0,3,1,0.0,0
3,1,WC,1982,semi,France,Germany,Didier Six,Toni Schumacher,0.0,0.0,1.0,4.0,1.0,3,1,0.0,0
4,1,WC,1982,semi,France,Germany,Michel Platini,Toni Schumacher,1.0,0.0,0.0,5.0,1.0,3,1,0.0,0


In [37]:
# check that it worked: count shots inside and outside sudden death

print(pk['sudden_death'].value_counts())
fig = px.histogram(pk, x='sudden_death')
fig.show()

0    2020
1     263
Name: sudden_death, dtype: int64


In [40]:
# check a known example: Euro 2016 quarterfinal - Italy vs. Germany
pk[pk.matchup == 53]

Unnamed: 0,matchup,tournament,year,round,attacker_team,gk_team,attacker,goalkeeper,goal,missed,saved,shot_order,take_first,FT_tie,neutral_stadium,attacker_home,sudden_death
496,53,EU,2016,quarter,Italy,Germany,Lorenzo Insigne,Manuel Neuer,1.0,0.0,0.0,1.0,1.0,1,1,0.0,0
497,53,EU,2016,quarter,Italy,Germany,Simone Zaza,Manuel Neuer,0.0,1.0,0.0,2.0,1.0,1,1,0.0,0
498,53,EU,2016,quarter,Italy,Germany,Andrea Barzagli,Manuel Neuer,1.0,0.0,0.0,3.0,1.0,1,1,0.0,0
499,53,EU,2016,quarter,Italy,Germany,Graziano Pelle,Manuel Neuer,0.0,1.0,0.0,4.0,1.0,1,1,0.0,0
500,53,EU,2016,quarter,Italy,Germany,Leonardo Bonucci,Manuel Neuer,0.0,0.0,1.0,5.0,1.0,1,1,0.0,0
501,53,EU,2016,quarter,Italy,Germany,Emanuele Giaccherini,Manuel Neuer,1.0,0.0,0.0,6.0,1.0,1,1,0.0,1
502,53,EU,2016,quarter,Italy,Germany,Marco Parolo,Manuel Neuer,1.0,0.0,0.0,7.0,1.0,1,1,0.0,1
503,53,EU,2016,quarter,Italy,Germany,Mattia De Sciglio,Manuel Neuer,1.0,0.0,0.0,8.0,1.0,1,1,0.0,1
504,53,EU,2016,quarter,Italy,Germany,Matteo Darmian,Manuel Neuer,0.0,0.0,1.0,9.0,1.0,1,1,0.0,1
505,53,EU,2016,quarter,Germany,Italy,Toni Kroos,Gianluigi Buffon,1.0,0.0,0.0,1.0,0.0,1,1,0.0,0


### Create a column to flag whether a given shot could win the match
Generally, there are two possible ways to win a penalty shootout:
- 'best of five': building a lead within the first five rounds of shots that the opponent can no longer overturn with their remaining kicks
- 'sudden death': finishing a sudden death round one goal ahead of the opponent
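Both win conditions above can be reduced to one comparison: a shot could win if the shooter's goals so far plus one exceed the opponent's goals so far plus the kicks the opponent still has left in the current phase. The following is a minimal sketch of that idea, not a definitive implementation. The function name `could_win_one_matchup` and the toy data are hypothetical, and the sketch assumes that `shot_order` is counted per team, that `take_first` is 1 for the team kicking first, and that `goal` is 0 or 1.

```python
import pandas as pd

def could_win_one_matchup(shots: pd.DataFrame) -> pd.Series:
    """Flag shots that would win the shootout immediately if scored.

    Assumes `shots` holds a single matchup, `shot_order` is counted per
    team, `take_first` is 1 for the team kicking first, and `goal` is 0/1.
    """
    flags = pd.Series(0, index=shots.index, dtype=int)
    for idx, shot in shots.iterrows():
        k = shot["shot_order"]
        own = shots[shots["take_first"] == shot["take_first"]]
        opp = shots[shots["take_first"] != shot["take_first"]]
        # Kicks the opponent has already taken when this shot happens.
        opp_taken = k - 1 if shot["take_first"] == 1 else k
        own_goals = own.loc[own["shot_order"] < k, "goal"].sum()
        opp_goals = opp.loc[opp["shot_order"] <= opp_taken, "goal"].sum()
        # Opponent kicks still to come: up to kick 5 in the best-of-five
        # phase, or up to the current round during sudden death.
        opp_left = max(5, k) - opp_taken
        flags.loc[idx] = int(own_goals + 1 > opp_goals + opp_left)
    return flags

# Hypothetical shootout: team A kicks first and leads 3-1 after three rounds.
toy = pd.DataFrame({
    "attacker_team": ["A"] * 4 + ["B"] * 3,
    "take_first": [1] * 4 + [0] * 3,
    "shot_order": [1, 2, 3, 4, 1, 2, 3],
    "goal": [1, 1, 1, 1, 0, 0, 1],
})
toy["could_win"] = could_win_one_matchup(toy)  # only A's fourth kick is flagged
```

For the full data, the same function could be run once per matchup, for example by iterating over `pk.groupby('matchup')` and concatenating the returned Series so that the flags stay aligned with the original rows.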

In [41]:
# function to determine if a shot could win the match
# (placeholder: currently returns 1 for every matchup)

def could_win(df):
    return 1


In [None]:
# apply the function to each matchup
pk.groupby('matchup').apply(could_win)