Created by: [SmirkyGraphs](https://smirkygraphs.github.io/). Code: [Github](https://github.com/SmirkyGraphs/Python-Notebooks). Source: [NHL API](https://gitlab.com/dword4/nhlapi).
<hr>

# NHL Coaching Challenge - 2020-21 Season

Near the end of the season, during the playoffs, one of Tampa Bay's games against Montreal featured a very early timeout: on `2021-07-03`, in game 3, Montreal called the 196th timeout of the season after falling behind by 2 goals in the 1st period. The commentators remarked that, while early timeouts are rare, many teams end up losing a game without ever using their timeout at all.

This stuck with me, so I decided to dig up any interesting facts about timeouts that I could find for the 2020-21 season. I chose to stick to a single season rather than multiple seasons because, in earlier seasons, a failed coach's challenge cost the team its timeout, and I didn't want to mix data from the two rule sets.

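The code below loops over the game event JSON files in each season folder, flattens every play into a table, keeps only regular-season games, and flags coach's challenges (`CHALLENGE` events). It then counts challenges per game, numbers the games within each season, groups them into 10-game bins, and saves the cumulative number of challenges per bin to a CSV file.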
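For readers unfamiliar with the feed layout, here is a minimal sketch of the per-file step on a hand-made payload. The dictionary is a hypothetical, heavily trimmed stand-in for one live-feed file: it keeps only the keys `collect_season` reads, plus an `about.period` field added purely for illustration, and all event values are invented. It shows how `pd.json_normalize` flattens each play into dotted columns such as `result.eventTypeId`. Challenges are flagged here with a 0/1 column instead of the notebook's `.loc` assignment, which leaves `NaN` on other rows; both give the same count once summed per game.

```python
import pandas as pd

# Hypothetical, trimmed live-feed payload: only the keys the notebook reads,
# plus an "about.period" field added purely for illustration.
feed = {
    "gameData": {
        "game": {"pk": 2020020001, "season": "20202021", "type": "R"},
        "datetime": {"dateTime": "2021-01-13T22:00:00Z"},
    },
    "liveData": {
        "plays": {
            "allPlays": [
                {"result": {"eventTypeId": "FACEOFF"}, "about": {"period": 1}},
                {"result": {"eventTypeId": "CHALLENGE"}, "about": {"period": 2}},
                {"result": {"eventTypeId": "GOAL"}, "about": {"period": 3}},
            ]
        }
    },
}

# json_normalize turns nested keys into dotted column names,
# so {"result": {"eventTypeId": ...}} becomes the column "result.eventTypeId".
plays = pd.json_normalize(feed["liveData"]["plays"]["allPlays"])

# Game-level fields are broadcast onto every play row, as in collect_season.
plays = plays.assign(
    game_pk=feed["gameData"]["game"]["pk"],
    season=feed["gameData"]["game"]["season"],
    game_type=feed["gameData"]["game"]["type"],
)

# 1 for a coach's challenge, 0 for any other event.
plays["coaches_challenge"] = (plays["result.eventTypeId"] == "CHALLENGE").astype(int)
print(plays[["result.eventTypeId", "about.period", "coaches_challenge"]])
```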

<hr>

In [1]:
import json
import numpy as np
import pandas as pd
from pathlib import Path

In [2]:
def collect_season(folder):
    files = Path(f'../data/raw/live-feed/{folder}/').glob('*.json')
    
    frames = []
    count = 1
    for file in files:
        with open(file, 'r') as f:
            data = json.load(f)

        df = pd.json_normalize(data['liveData']['plays']['allPlays'])
        df = (df
          .assign(game_pk = data['gameData']['game']['pk'])
          .assign(season = data['gameData']['game']['season'])
          .assign(game_type = data['gameData']['game']['type'])
          .assign(date = data['gameData']['datetime']['dateTime'])
          .assign(game_id = count)
        )

        frames.append(df)
        count += 1
    
    return pd.concat(frames)

def get_season_stats(df):
    # filter for regular season
    df = df[df['game_type']=='R'].copy()
    
    # label challenges
    df.loc[df['result.eventTypeId'] == 'CHALLENGE', 'coaches_challenge'] = 1
    
    # get games + # of challenges
    df = df.groupby(['season', 'game_pk'], as_index=False)['coaches_challenge'].sum()
    
    # add a count of how many games were played
    df['game_id'] = df.groupby('season').cumcount() + 1
    
    return df
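To make `get_season_stats` concrete, this sketch runs the same steps on a hand-made table of six plays from three hypothetical games (the events are invented). Numbering games with `cumcount` right after the first `groupby` relies on that `groupby` sorting by `game_pk`, so `game_id` follows game-number order rather than file-reading order.

```python
import pandas as pd

# Hypothetical flattened plays from three games in one season (one row per event).
plays = pd.DataFrame({
    "season": ["20202021"] * 6,
    "game_pk": [2020020001, 2020020001, 2020020002, 2020020003, 2020020003, 2020020003],
    "game_type": ["R"] * 6,
    "result.eventTypeId": ["FACEOFF", "CHALLENGE", "GOAL", "CHALLENGE", "CHALLENGE", "FACEOFF"],
})

# Same steps as get_season_stats: regular season only, flag challenges,
# count them per game, then number the games within each season.
regular = plays[plays["game_type"] == "R"].copy()
regular["coaches_challenge"] = (regular["result.eventTypeId"] == "CHALLENGE").astype(int)
per_game = regular.groupby(["season", "game_pk"], as_index=False)["coaches_challenge"].sum()
per_game["game_id"] = per_game.groupby("season").cumcount() + 1
print(per_game)
```

The game with no challenge still gets a row (with a count of 0), which keeps `game_id` aligned with the number of games played.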

In [None]:
folders = Path('../data/raw/live-feed/')
folders = [f for f in folders.iterdir() if f.is_dir()]

data = []
for folder in folders:
    df = collect_season(folder.name)
    df = get_season_stats(df)
    data.append(df)
    
df = pd.concat(data)

In [None]:
# largest game count in any season (not just the most recent one), rounded down to the nearest 10
end_range = np.floor(df['game_id'].max() / 10) * 10

interval_range = pd.interval_range(start=0, freq=10, end=end_range)
df['binned'] = pd.cut(df['game_id'], bins=interval_range)
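The binning step can be checked on a small made-up table. Two hypothetical seasons of 25 and 12 games show that `end_range` comes from the longest season, and that `pd.cut` leaves games past the last full 10-game bin without a bin (`NaN`), so they drop out of the later per-bin sums.

```python
import numpy as np
import pandas as pd

# Hypothetical game numbers for two seasons of different lengths.
games = pd.DataFrame({
    "season": ["A"] * 25 + ["B"] * 12,
    "game_id": list(range(1, 26)) + list(range(1, 13)),
})

# Largest game_id over all seasons, rounded down to a multiple of 10 (25 -> 20).
end_range = int(np.floor(games["game_id"].max() / 10) * 10)

# Right-closed 10-game intervals: (0, 10], (10, 20].
bins = pd.interval_range(start=0, freq=10, end=end_range)
games["binned"] = pd.cut(games["game_id"], bins=bins)

# Games 21-25 of season A fall outside every interval and become NaN.
print(games["binned"].value_counts(sort=False))
print(games["binned"].isna().sum())
```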

In [None]:
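# cumulative challenges per 10-game bin (0-10, 10-20, etc.) for each season, then save
# right edge of the last bin each season reached, used to drop empty bins past a season's end
limit = df.groupby('season')['binned'].max().apply(lambda x: x.right).astype(float)
final_df = (df
    # observed=False keeps every bin for every season (the current pandas default, made explicit)
    .groupby(by=['season','binned'], observed=False)['coaches_challenge'].sum()
    .groupby(level='season').cumsum()
    .reset_index(name='coaches_challenge')
    # cast to float so bin_max and binned_limit compare as numbers, not categoricals
    .assign(bin_max=lambda x: x['binned'].apply(lambda iv: iv.right).astype(float))
    .merge(limit, how='left', on='season', suffixes=['', '_limit'])
    .query('bin_max <= binned_limit')
    .drop(columns='binned_limit')
)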

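# add a starting point (bin_max = 0, no challenges) for each season
# DataFrame.append was removed in pandas 2.0, so build the rows and concat instead
zero_rows = pd.DataFrame({'season': final_df['season'].unique(), 'bin_max': 0, 'coaches_challenge': 0})
final_df = pd.concat([final_df, zero_rows], ignore_index=True)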

final_df.to_csv('../data/clean/coaches-challenge-binned.csv', index=False)
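The cumulative step can likewise be illustrated on a small, already-binned table. This sketch skips the categorical bins and uses each bin's right edge (`bin_max`) directly; the seasons and counts are invented. It shows the per-season running total and the zero row added to each season with `pd.concat`, since `DataFrame.append` no longer exists in pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical challenge counts per 10-game bin, keyed by each bin's right edge.
binned = pd.DataFrame({
    "season": ["A", "A", "A", "B", "B"],
    "bin_max": [10, 20, 30, 10, 20],
    "coaches_challenge": [3, 1, 2, 0, 4],
})

# Running total within each season (the notebook's .groupby(level='season').cumsum()).
binned["coaches_challenge"] = binned.groupby("season")["coaches_challenge"].cumsum()

# One bin_max = 0 row per season so every cumulative curve starts at zero.
zeros = pd.DataFrame({
    "season": binned["season"].unique(),
    "bin_max": 0,
    "coaches_challenge": 0,
})
out = (
    pd.concat([binned, zeros], ignore_index=True)
    .sort_values(["season", "bin_max"], ignore_index=True)
)
print(out)
```

The final sort is only for readability; the notebook leaves the zero rows at the end of the frame.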

In [None]:
final_df.head()