# NHL API Scraper v1.1 (Beta)

A hockey scraper for the new NHL API. The `game_id` parameter is the same used to call the NHL API. This script tries to improve on the speed and efficiency of the v1.0 which was largely smashed together just to get something to work. The `hockey_scraper` library now works again and is currently faster than this scraper but may as well use this scraper as I can get the exact data that I want from it. 

### Implemented Functions

- `get_away_roster(game_id)`: returns a dictionary with player names and IDs of the away roster.
- `get_home_roster(game_id)`: returns a dictionary with player names and IDs of the home roster.
- `get_game_roster(game_id)`: returns a dictionary with player names and IDs of both teams.

- `get_away_positions(game_id)`: returns a dictionary with player IDs and player positions of the away roster.
- `get_home_positions(game_id)`: returns a dictionary with player IDs and player positions of the home roster.
- `get_game_positions(game_id)`: returns a dictionary with player IDs and player positions of both teams.

- `get_play_by_play(game_id)`: returns a pandas DataFrame of the play-by-play. 
- `get_multi_play_by_play([list of game ids])`: returns a pandas DataFrame of the play-by-play of all the listed games. 


### Other Functions to Implement

- ?

<br>

### Methods of Improvement

- Faster for loop methods (use 'for item in list' rather than 'for i in range(len(list))'
- Cutting out unnecessary loop run throughs
- Differentiation of strength state into how many skaters on ice

## Current State

- Need to correct get_play_by_play() function: ensure shifts are recorded correctly.

### Useful Endpoints

- https://api-web.nhle.com/v1/gamecenter/2023020001/play-by-play
- https://api-web.nhle.com/v1/player/8477474/landing
- https://api-web.nhle.com/v1/gamecenter/2023020001/boxscore

<br>

In [54]:
import requests
import pandas as pd
import numpy as np
from datetime import datetime

In [55]:
game_id = 2023020001

In [56]:
url = 'https://api-web.nhle.com/v1/gamecenter/{}/play-by-play'.format(game_id)
pbp = requests.get(url)
pbp_data = pbp.json()

In [57]:
pbp = pbp_data.get('plays')

In [58]:
types = [pbp[i].get('typeDescKey').upper() for i in range(len(pbp))]

In [59]:
set(types)

{'BLOCKED-SHOT',
 'DELAYED-PENALTY',
 'FACEOFF',
 'GAME-END',
 'GIVEAWAY',
 'GOAL',
 'HIT',
 'MISSED-SHOT',
 'PENALTY',
 'PERIOD-END',
 'PERIOD-START',
 'SHOT-ON-GOAL',
 'STOPPAGE',
 'TAKEAWAY'}

In [60]:
url = 'https://api-web.nhle.com/v1/gamecenter/{}/boxscore'.format(2023020200)
pbp = requests.get(url)
pbp_data = pbp.json()

In [61]:
pbp_data

{'id': 2023020200,
 'season': 20232024,
 'gameType': 2,
 'limitedScoring': False,
 'gameDate': '2023-11-09',
 'venue': {'default': 'Canada Life Centre'},
 'venueLocation': {'default': 'Winnipeg'},
 'startTimeUTC': '2023-11-10T01:00:00Z',
 'easternUTCOffset': '-05:00',
 'venueUTCOffset': '-06:00',
 'tvBroadcasts': [{'id': 375,
   'market': 'A',
   'countryCode': 'US',
   'network': 'BSSO',
   'sequenceNumber': 70},
  {'id': 292,
   'market': 'H',
   'countryCode': 'CA',
   'network': 'TSN3',
   'sequenceNumber': 75}],
 'gameState': 'OFF',
 'gameScheduleState': 'OK',
 'periodDescriptor': {'number': 3, 'periodType': 'REG'},
 'regPeriods': 3,
 'awayTeam': {'id': 18,
  'name': {'default': 'Predators'},
  'abbrev': 'NSH',
  'score': 3,
  'sog': 23,
  'logo': 'https://assets.nhle.com/logos/nhl/svg/NSH_light.svg',
  'placeName': {'default': 'Nashville'}},
 'homeTeam': {'id': 52,
  'name': {'default': 'Jets'},
  'abbrev': 'WPG',
  'score': 6,
  'sog': 37,
  'logo': 'https://assets.nhle.com/logo

In [62]:
pbp_data.get('playerByGameStats').get('awayTeam')

{'forwards': [{'playerId': 8476887,
   'sweaterNumber': 9,
   'name': {'default': 'F. Forsberg'},
   'position': 'L',
   'goals': 2,
   'assists': 0,
   'points': 2,
   'plusMinus': -2,
   'pim': 0,
   'hits': 0,
   'powerPlayGoals': 0,
   'shots': 3,
   'faceoffWinningPctg': 0.0,
   'toi': '20:51'},
  {'playerId': 8476925,
   'sweaterNumber': 10,
   'name': {'default': 'C. Sissons'},
   'position': 'C',
   'goals': 0,
   'assists': 0,
   'points': 0,
   'plusMinus': -1,
   'pim': 0,
   'hits': 2,
   'powerPlayGoals': 0,
   'shots': 0,
   'faceoffWinningPctg': 0.7,
   'toi': '16:45'},
  {'playerId': 8478508,
   'sweaterNumber': 13,
   'name': {'default': 'Y. Trenin'},
   'position': 'C',
   'goals': 0,
   'assists': 0,
   'points': 0,
   'plusMinus': -1,
   'pim': 0,
   'hits': 2,
   'powerPlayGoals': 0,
   'shots': 0,
   'faceoffWinningPctg': 0.0,
   'toi': '13:41'},
  {'playerId': 8474679,
   'sweaterNumber': 14,
   'name': {'default': 'G. Nyquist'},
   'position': 'C',
   'goals': 0

In [63]:
url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(8477474)
pbp = requests.get(url)
pbp_data = pbp.json()
pbp_data

{'playerId': 8477474,
 'isActive': False,
 'firstName': {'default': 'Madison'},
 'lastName': {'default': 'Bowey'},
 'sweaterNumber': 24,
 'position': 'D',
 'headshot': 'https://assets.nhle.com/mugs/nhl/latest/8477474.png',
 'heroImage': 'https://assets.nhle.com/mugs/actionshots/1296x729/8477474.jpg',
 'heightInInches': 73,
 'heightInCentimeters': 185,
 'weightInPounds': 207,
 'weightInKilograms': 94,
 'birthDate': '1995-04-22',
 'birthCity': {'default': 'Winnipeg'},
 'birthStateProvince': {'default': 'Manitoba'},
 'birthCountry': 'CAN',
 'shootsCatches': 'R',
 'draftDetails': {'year': 2013,
  'teamAbbrev': 'WSH',
  'round': 2,
  'pickInRound': 23,
  'overallPick': 53},
 'playerSlug': 'madison-bowey-8477474',
 'inTop100AllTime': 0,
 'inHHOF': 0,
 'featuredStats': {'season': 20212022,
  'regularSeason': {'subSeason': {'assists': 0,
    'gameWinningGoals': 0,
    'gamesPlayed': 2,
    'goals': 0,
    'otGoals': 0,
    'pim': 0,
    'plusMinus': -3,
    'points': 0,
    'powerPlayGoals': 0

In [64]:
(pbp_data.get('firstName').get('default') + " " + pbp_data.get('lastName').get('default')).upper()

'MADISON BOWEY'

## Player Names and IDs Dictionary

In [65]:
def get_away_roster(game_id):
    
    url = 'https://api-web.nhle.com/v1/gamecenter/{}/boxscore'.format(game_id)
    
    try:
        data = requests.get(url)
        boxscore = data.json()
        
    except Exception as e:
        print('URL does not exist for Game_Id {}'.format(game_id))
        return None
        
    else:
        away_roster = {}
        forwards = boxscore.get('playerByGameStats').get('awayTeam').get('forwards')
        defense = boxscore.get('playerByGameStats').get('awayTeam').get('defense')
        goalies = boxscore.get('playerByGameStats').get('awayTeam').get('goalies')

        for spot in forwards:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            away_roster.update({ spot.get('playerId') : (playerInfo.get('firstName').get('default') + " " + playerInfo.get('lastName').get('default')).upper()})
        
        for spot in defense:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            away_roster.update({spot.get('playerId') : (playerInfo.get('firstName').get('default') + " " + playerInfo.get('lastName').get('default')).upper()})
        
        for spot in goalies:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            away_roster.update({spot.get('playerId') : (playerInfo.get('firstName').get('default') + " " + playerInfo.get('lastName').get('default')).upper()})
                
        return away_roster

In [66]:
# Testing
# get_away_roster(game_id) 

In [67]:
def get_home_roster(game_id):
    
    url = 'https://api-web.nhle.com/v1/gamecenter/{}/boxscore'.format(game_id)
    
    try:
        data = requests.get(url)
        boxscore = data.json()
        
    except Exception as e:
        print('URL does not exist for Game_Id {}'.format(game_id))
        return None
        
    else:
        home_roster = {}
        forwards = boxscore.get('playerByGameStats').get('homeTeam').get('forwards')
        defense = boxscore.get('playerByGameStats').get('homeTeam').get('defense')
        goalies = boxscore.get('playerByGameStats').get('homeTeam').get('goalies')

        for spot in forwards:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            home_roster.update({ spot.get('playerId') : (playerInfo.get('firstName').get('default') + " " + playerInfo.get('lastName').get('default')).upper()})
        
        for spot in defense:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            home_roster.update({ spot.get('playerId') : (playerInfo.get('firstName').get('default') + " " + playerInfo.get('lastName').get('default')).upper()})
        
        for spot in goalies:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            home_roster.update({ spot.get('playerId') : (playerInfo.get('firstName').get('default') + " " + playerInfo.get('lastName').get('default')).upper()})
                
        return home_roster

In [68]:
# Testing
# get_home_roster(game_id)

In [69]:
def get_game_roster(game_id):
    
    away_roster = get_away_roster(game_id)
    home_roster = get_home_roster(game_id)
    
    game_roster = away_roster.update(home_roster)
    
    return away_roster

In [70]:
# Testing
# get_game_roster(2023020001)

<br>

## Getting Player Positions

For the purposes of my model, skaters are classified as either forwards or defencemen with no differentiation between centers or wingers. 

In [71]:
def get_away_positions(game_id):
    
    url = 'https://api-web.nhle.com/v1/gamecenter/{}/boxscore'.format(game_id)
    
    try:
        data = requests.get(url)
        boxscore = data.json()
        
    except Exception as e:
        print('URL does not exist for Game_Id {}'.format(game_id))
        return None
        
    else:
        away_roster = {}
        forwards = boxscore.get('playerByGameStats').get('awayTeam').get('forwards')
        defense = boxscore.get('playerByGameStats').get('awayTeam').get('defense')
        goalies = boxscore.get('playerByGameStats').get('awayTeam').get('goalies')

        for spot in forwards:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            away_roster.update({ spot.get('playerId') : 'F'})
        
        for spot in defense:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            away_roster.update({spot.get('playerId') : 'D'})
        
        for spot in goalies:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            away_roster.update({spot.get('playerId') : 'G'})
                
        return away_roster

In [72]:
# Testing
# get_away_positions(2023020001)

In [73]:
def get_home_positions(game_id):
    
    url = 'https://api-web.nhle.com/v1/gamecenter/{}/boxscore'.format(game_id)
    
    try:
        data = requests.get(url)
        boxscore = data.json()
        
    except Exception as e:
        print('URL does not exist for Game_Id {}'.format(game_id))
        return None
        
    else:
        home_roster = {}
        forwards = boxscore.get('playerByGameStats').get('homeTeam').get('forwards')
        defense = boxscore.get('playerByGameStats').get('homeTeam').get('defense')
        goalies = boxscore.get('playerByGameStats').get('homeTeam').get('goalies')

        for spot in forwards:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            home_roster.update({ spot.get('playerId') : 'F'})
        
        for spot in defense:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            home_roster.update({spot.get('playerId') : 'D'})
        
        for spot in goalies:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            home_roster.update({spot.get('playerId') : 'G'})
                
        return home_roster

In [74]:
# Testing
# get_home_positions(2023020001)

In [75]:
def get_game_positions(game_id):
    
    away_positions = get_away_positions(game_id)
    home_positions = get_home_positions(game_id)
    
    game_positions = away_positions.update(home_positions)
    
    return away_positions

In [76]:
# Testing
# get_game_positions(2023020001)

<br>

### Getting Goalies (Helper Function)

In [77]:
def get_goalies_id(game_id):
    
    url = 'https://api-web.nhle.com/v1/gamecenter/{}/boxscore'.format(game_id)
    
    try:
        data = requests.get(url)
        boxscore = data.json()
        
    except Exception as e:
        print('URL does not exist for Game_Id {}'.format(game_id))
        return None
        
    else:
        goalies = {}
        away_goalies = boxscore.get('playerByGameStats').get('awayTeam').get('goalies')
        home_goalies = boxscore.get('playerByGameStats').get('homeTeam').get('goalies')
        
        for spot in away_goalies:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            goalies.update({ spot.get('playerId') : (playerInfo.get('firstName').get('default') + " " + playerInfo.get('lastName').get('default')).upper()})
            
        for spot in home_goalies:
            url = 'https://api-web.nhle.com/v1/player/{}/landing'.format(spot.get('playerId'))
            playerInfo = requests.get(url)
            playerInfo = playerInfo.json()
            goalies.update({ spot.get('playerId') : (playerInfo.get('firstName').get('default') + " " + playerInfo.get('lastName').get('default')).upper()})
                
        return goalies

In [78]:
# Tester
# get_goalies_id(2023020105)

<br>

### Known Event typeCodes

In [79]:
typeCodes = {502 : 'FACEOFF', 503 : 'HIT', 504 : 'GIVEAWAY', 505 : 'GOAL', 506 : 'SHOT_ON_GOAL', 507 : 'MISSED_SHOT',
             508 : 'BLOCKED_SHOT', 509 : 'PENALTY', 516 : 'STOPPAGE', 520 : 'PERIOD_START', 521 : 'PERIOD_END',
             523 : 'SHOOTOUT_COMPLETE', 524 : 'GAME_END', 525 : 'TAKEAWAY', 535 : 'DELAYED_PENALTY', 
             537 : 'FAILED_SHOT_ATTEMPT'}

shotCodes = [505,506,507,508]

<br>

## Play-By-Play DataFrame

This combines the play-by-play and shift data. Most of the chosen columns were inspired by Harry Shomer's public hockey scraper that I had used previously. 

In [18]:
def parse_event(play_dict):
    
    event_dict_keys = ['Period','Event_tc','Event','Time_Elapsed','Strength','Ev_Zone','Type','Ev_Team','p1_name','p1_ID',
                       'p2_name','p2_ID','p3_name','p3_ID','xC','yC']
    
    event_dict = dict()
    
    # Common play items across all plays
    event_dict['Period'] = play_dict['period']
    event_dict['Event_tc'] = play_dict['typeCode']
    event_dict['Event'] = play_dict['typeDescKey'].upper()
    event_dict['Time_Elapsed'] = play_dict['timeRemaining']
    event_dict['Strength'] = play_dict['situationCode']
    event_dict['sort_order'] = play_dict['sortOrder']
      
    # Below is applicable for FACEOFF, HIT, GIVEAWAY, GOAL, SHOT_ON_GOAL, MISSED_SHOT, BLOCKED_SHOT, PENALTY, TAKEAWAY, DELAYED_PENALTY    
    if 'details' in play_dict.keys(): 
        if 'zoneCode' in play_dict['details'].keys():
            event_dict['Ev_Zone'] = play_dict['details']['zoneCode']
        if 'xCoord' in play_dict['details'].keys():
            event_dict['xC'] = play_dict['details']['xCoord']
            event_dict['yC'] = play_dict['details']['yCoord']
        
        if event_dict['Event_tc'] == 502: # Faceoffs
            if 'winningPlayerId' in play_dict['details'].keys():
                event_dict['p1_ID'] = play_dict['details']['winningPlayerId']
                event_dict['Ev_Team'] = play_dict['details']['eventOwnerTeamId']
            if 'losingPlayerId' in play_dict['details'].keys():
                event_dict['p2_ID'] = play_dict['details']['losingPlayerId']
            
        if event_dict['Event_tc'] == 503: # Hits
            if 'hittingPlayerId' in play_dict['details'].keys():
                event_dict['p1_ID'] = play_dict['details']['hittingPlayerId']
                event_dict['Ev_Team'] = play_dict['details']['eventOwnerTeamId']
            if 'hitteePlayerId' in play_dict['details'].keys():
                event_dict['p2_ID'] = play_dict['details']['hitteePlayerId']
  
        if event_dict['Event_tc'] == 504: # Giveaways
            if 'playerId' in play_dict['details'].keys():
                event_dict['p1_ID'] = play_dict['details']['playerId']
                event_dict['Ev_Team'] = play_dict['details']['eventOwnerTeamId']
            
        if event_dict['Event_tc'] in [505,506,507,508]: # Goals, Shots_On_Goal, Missed_Shots, Blocked_Shots
            if 'scoringPlayerId' in play_dict['details'].keys():
                event_dict['p1_ID'] = play_dict['details']['scoringPlayerId']
            if 'shootingPlayerId' in play_dict['details'].keys():
                event_dict['p1_ID'] = play_dict['details']['shootingPlayerId']
                event_dict['Ev_Team'] = play_dict['details']['eventOwnerTeamId']
            if 'assist1PlayerId' in play_dict['details'].keys():
                event_dict['p2_ID'] = play_dict['details']['assist1PlayerId']
            if 'assist2PlayerId' in play_dict['details'].keys():
                event_dict['p3_ID'] = play_dict['details']['assist2PlayerId']
            if 'blockingPlayerId' in play_dict['details'].keys():
                event_dict['p2_ID'] = play_dict['details']['blockingPlayerId']
            if 'type' in play_dict['details'].keys():
                event_dict['Type'] = play_dict['details']['shotType'].upper()
            if 'homeScore' in play_dict['details'].keys():
                event_dict['Home_Score'] = play_dict['details']['homeScore']
                event_dict['Away_Score'] = play_dict['details']['awayScore']
            
        if event_dict['Event_tc'] == 509: # Penalties
            if 'committedByPlayerId' in play_dict['details'].keys():
                event_dict['p1_ID'] = play_dict['details']['committedByPlayerId']
                event_dict['Ev_Team'] = play_dict['details']['eventOwnerTeamId']
            if 'drawnByPlayerId' in play_dict['details'].keys():
                event_dict['p2_ID'] = play_dict['details']['drawnByPlayerId']
            if 'descKey' in play_dict['details'].keys():
                event_dict['Type'] = play_dict['details']['typeCode'] + ' for ' + play_dict['details']['descKey'].upper()
            
        if event_dict['Event_tc'] == 525: # Takeaways
            if 'playerId' in play_dict['details'].keys():
                event_dict['p1_ID'] = play_dict['details']['playerId']
                event_dict['Ev_Team'] = play_dict['details']['eventOwnerTeamId']
            
        if event_dict['Event_tc'] == 535: # Delayed Penalties
            if 'eventOwnerTeamId' in play_dict['details'].keys():
                event_dict['Ev_Team'] = play_dict['details']['eventOwnerTeamId']
            
        # Failed_Shot_Attempts?
        
    return event_dict

In [19]:
def get_pbp_improvement_beta(game_id):
    
    url = 'https://api-web.nhle.com/v1/gamecenter/{}/play-by-play'.format(game_id)
    
    try:
        pbp = requests.get('https://api-web.nhle.com/v1/gamecenter/'+str(game_id)+'/play-by-play')
        pbp_data = pbp.json()
    
    except Exception as e:
        print('Unable to get play-by-play for Game_Id {}'.format(game_id))
        return None
        
    else:
        plays = pbp_data['plays']
        events = [parse_event(play) for play in plays]
        pbp_df = pd.DataFrame(events)
        
        awayTeam_Id, homeTeam_Id, game_roster = pbp_data.get('awayTeam').get('id'), pbp_data.get('homeTeam').get('id'), get_game_roster(game_id)
        goalie_ids = get_goalies_id(game_id)
        NoneType = type(None)
    
        away_on_ice_player_ids = ['awayPlayer1_id','awayPlayer2_id','awayPlayer3_id','awayPlayer4_id','awayPlayer5_id',
                                  'awayPlayer6_id']
        home_on_ice_player_ids = ['homePlayer1_id','homePlayer2_id','homePlayer3_id','homePlayer4_id','homePlayer5_id',
                                  'homePlayer6_id']
    
        cols = ['Game_Id','Date','Period','Event_tc','Event','Time_Elapsed','Strength','Ev_Zone','Type','Ev_Team','Away_Team',
                'Home_Team','p1_name','p1_ID','p2_name','p2_ID','p3_name','p3_ID','awayPlayer1','awayPlayer1_id','awayPlayer2',
                'awayPlayer2_id','awayPlayer3','awayPlayer3_id','awayPlayer4','awayPlayer4_id','awayPlayer5','awayPlayer5_id',
                'awayPlayer6','awayPlayer6_id','homePlayer1','homePlayer1_id','homePlayer2','homePlayer2_id','homePlayer3',
                'homePlayer3_id','homePlayer4','homePlayer4_id','homePlayer5','homePlayer5_id','homePlayer6','homePlayer6_id',
                'Away_Score','Home_Score','Away_Goalie','Away_Goalie_Id','Home_Goalie','Home_Goalie_Id','xC','yC']
        
        pbp_df['Away_Team'] = pbp_data.get('awayTeam').get('abbrev')
        pbp_df['Home_Team'] = pbp_data.get('homeTeam').get('abbrev')
        for col in cols:
            if col not in pbp_df.columns:
                pbp_df[col] = None
        
        # Adding Shift Data
        shifts = requests.get('https://api.nhle.com/stats/rest/en/shiftcharts?cayenneExp=gameId={}'.format(game_id))

        shift_data = shifts.json()
        shift_data.get('data')

        homePlayerList = ['homePlayer1_id','homePlayer2_id','homePlayer3_id','homePlayer4_id','homePlayer5_id','homePlayer6_id']
        homePlayerList_names = ['homePlayer1','homePlayer2','homePlayer3','homePlayer4','homePlayer5','homePlayer6']
        awayPlayerList = ['awayPlayer1_id','awayPlayer2_id','awayPlayer3_id','awayPlayer4_id','awayPlayer5_id','awayPlayer6_id']
        awayPlayerList_names = ['awayPlayer1','awayPlayer2','awayPlayer3','awayPlayer4','awayPlayer5','awayPlayer6']

        for i in range(len(shift_data.get('data'))):

            shift = shift_data.get('data')[i]

            period = shift.get('period')
            shift_start = datetime.strptime(shift.get('startTime'),'%M:%S')
            shift_end = datetime.strptime(shift.get('endTime'),'%M:%S')

            for j in range(len(pbp_df)):

                time_elapsed = datetime.strptime(pbp_df.at[j,'Time_Elapsed'],'%M:%S')

                awayPlayerList_loop = [pbp_df.at[j,'awayPlayer1_id'],pbp_df.at[j,'awayPlayer2_id'],pbp_df.at[j,'awayPlayer3_id'],
                                       pbp_df.at[j,'awayPlayer4_id'],pbp_df.at[j,'awayPlayer5_id'],pbp_df.at[j,'awayPlayer6_id']]
                homePlayerList_loop = [pbp_df.at[j,'homePlayer1_id'],pbp_df.at[j,'homePlayer2_id'],pbp_df.at[j,'homePlayer3_id'],
                                       pbp_df.at[j,'homePlayer4_id'],pbp_df.at[j,'homePlayer5_id'],pbp_df.at[j,'homePlayer6_id']]

                if (period == pbp_df.at[j,'Period']) & (shift_start <= time_elapsed < shift_end):

                    if shift.get('teamId') == awayTeam_Id:
                        if shift.get('playerId') not in awayPlayerList_loop:
                            for k in range(len(awayPlayerList)):
                                if pd.isna(pbp_df.at[j,awayPlayerList[k]]) == True:
                                    pbp_df.at[j,awayPlayerList[k]] = shift.get('playerId')
                                    pbp_df.at[j,awayPlayerList_names[k]] = game_roster.get(shift.get('playerId'))
                                    break

                    if shift.get('teamId') == homeTeam_Id:
                        if shift.get('playerId') not in homePlayerList_loop:
                            for k in range(len(homePlayerList)):
                                if pd.isna(pbp_df.at[j,homePlayerList[k]]) == True:
                                    pbp_df.at[j,homePlayerList[k]] = shift.get('playerId')
                                    pbp_df.at[j,homePlayerList_names[k]] = game_roster.get(shift.get('playerId'))
                                    break
        
        # Adding goalies to blocked shots
        for i in range(len(pbp_df)):
            for j in range(len(away_on_ice_player_ids)):
                for g_id in goalie_ids:
                    if g_id == pbp_df.at[i,away_on_ice_player_ids[j]]:
                        pbp_df.at[i,'Away_Goalie_Id'] = g_id
                        pbp_df.at[i,'Away_Goalie'] = game_roster.get(g_id)
                        break
            for j in range(len(home_on_ice_player_ids)):
                for g_id in goalie_ids:
                    if g_id == pbp_df.at[i,home_on_ice_player_ids[j]]:
                        pbp_df.at[i,'Home_Goalie_Id'] = g_id
                        pbp_df.at[i,'Home_Goalie'] = game_roster.get(g_id)
                        break
                        
        
        return pbp_df

In [20]:
%%time

pbp = get_pbp_improvement_beta(game_id)
pbp

CPU times: total: 46.9 ms
Wall time: 331 ms


Unnamed: 0,Period,Event_tc,Event,Time_Elapsed,Strength,sort_order,Ev_Zone,xC,yC,p1_ID,Ev_Team,p2_ID,Type,p3_ID,Home_Score,Away_Score
0,1,520,PERIOD-START,20:00,1551,10,,,,,,,,,,
1,1,502,FACEOFF,20:00,1551,11,N,0.0,0.0,8473994.0,25.0,8480012.0,,,,
2,1,503,HIT,19:19,1551,19,O,-95.0,-27.0,8476468.0,23.0,8480036.0,,,,
3,1,525,TAKEAWAY,19:14,1551,21,D,50.0,6.0,8476468.0,23.0,,,,,
4,1,506,SHOT-ON-GOAL,19:09,1551,22,O,-57.0,-4.0,8478444.0,23.0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
285,3,516,STOPPAGE,00:14,0651,788,,,,,,,,,,
286,3,502,FACEOFF,00:14,0651,789,D,69.0,22.0,8476468.0,23.0,8473994.0,,,,
287,3,508,BLOCKED-SHOT,00:09,0651,790,D,42.0,-35.0,8475794.0,23.0,8480012.0,,,,
288,3,521,PERIOD-END,00:00,0651,791,,,,,,,,,,


<br>

### Best Current Scraper

In [20]:
def get_play_by_play(game_id):
    
    url = 'https://api-web.nhle.com/v1/gamecenter/{}/play-by-play'.format(game_id)
    
    try:
        pbp = requests.get('https://api-web.nhle.com/v1/gamecenter/'+str(game_id)+'/play-by-play')
        pbp_data = pbp.json()
    
    except Exception as e:
        print('Unable to get play-by-play for Game_Id {}'.format(game_id))
        
    else:
        print('Scraping Game Id',game_id) # Print status message
    
        awayTeam_Id, homeTeam_Id, game_roster = pbp_data.get('awayTeam').get('id'), pbp_data.get('homeTeam').get('id'), get_game_roster(game_id)
        goalie_ids = get_goalies_id(game_id)
        NoneType = type(None)
    
        away_on_ice_player_ids = ['awayPlayer1_id','awayPlayer2_id','awayPlayer3_id','awayPlayer4_id','awayPlayer5_id',
                                  'awayPlayer6_id']
        home_on_ice_player_ids = ['homePlayer1_id','homePlayer2_id','homePlayer3_id','homePlayer4_id','homePlayer5_id',
                                  'homePlayer6_id']
    
        cols = ['Game_Id','Date','Period','Event_tc','Event','Time_Elapsed','Strength','Ev_Zone','Type','Ev_Team','Away_Team',
                'Home_Team','p1_name','p1_ID','p2_name','p2_ID','p3_name','p3_ID','awayPlayer1','awayPlayer1_id','awayPlayer2',
                'awayPlayer2_id','awayPlayer3','awayPlayer3_id','awayPlayer4','awayPlayer4_id','awayPlayer5','awayPlayer5_id',
                'awayPlayer6','awayPlayer6_id','homePlayer1','homePlayer1_id','homePlayer2','homePlayer2_id','homePlayer3',
                'homePlayer3_id','homePlayer4','homePlayer4_id','homePlayer5','homePlayer5_id','homePlayer6','homePlayer6_id',
                'Away_Score','Home_Score','Away_Goalie','Away_Goalie_Id','Home_Goalie','Home_Goalie_Id','xC','yC']

        pbp_df = pd.DataFrame(index=np.arange(len(pbp_data.get('plays'))),columns=cols)
        
        pbp_df['Game_Id'] = pbp_data.get('id')
        pbp_df['Date'] = pbp_data.get('gameDate')
        pbp_df['Away_Team'] = pbp_data.get('awayTeam').get('abbrev')
        pbp_df['Home_Team'] = pbp_data.get('homeTeam').get('abbrev')
    
        for i in range(len(pbp_data.get('plays'))):
            
            this_play = pbp_data.get('plays')[i]
    
            pbp_df.at[i,'Period'] = this_play.get('period')
            pbp_df.at[i,'Event_tc'] = this_play.get('typeCode')
            pbp_df.at[i,'Event'] = typeCodes.get(this_play.get('typeCode'))
            pbp_df.at[i,'Time_Elapsed'] = this_play.get('timeInPeriod')
            pbp_df.at[i,'Strength'] = str(this_play.get('situationCode'))
    
            if i == 0:
                pbp_df.at[i,'Away_Score'] = 0
                pbp_df.at[i,'Home_Score'] = 0
            elif this_play.get('typeCode') != 505:
                pbp_df.at[i,'Away_Score'] = pbp_df.at[i-1,'Away_Score']
                pbp_df.at[i,'Home_Score'] = pbp_df.at[i-1,'Home_Score']
   
            if this_play.get('typeCode') in [502,503,504,505,506,507,508,509,525,537]: # If Event has xC and yC
                pbp_df.at[i,'xC'] = this_play.get('details').get('xCoord')
                pbp_df.at[i,'yC'] = this_play.get('details').get('yCoord')
                pbp_df.at[i,'Ev_Zone'] = this_play.get('details').get('zoneCode')
                if this_play.get('details').get('eventOwnerTeamId') == awayTeam_Id:
                    pbp_df.at[i,'Ev_Team'] = pbp_data.get('awayTeam').get('abbrev')
                else:
                    pbp_df.at[i,'Ev_Team'] = pbp_data.get('homeTeam').get('abbrev')
            
            if this_play.get('typeCode') == 502: # If it's a faceoff
                pbp_df.at[i,'p1_ID'] = this_play.get('details').get('winningPlayerId')
                pbp_df.at[i,'p1_name'] = game_roster.get(this_play.get('details').get('winningPlayerId'))
                pbp_df.at[i,'p2_ID'] = this_play.get('details').get('losingPlayerId')
                pbp_df.at[i,'p2_name'] = game_roster.get(this_play.get('details').get('losingPlayerId'))
        
            if this_play.get('typeCode') == 503: # If it's a hit
                pbp_df.at[i,'p1_ID'] = this_play.get('details').get('hittingPlayerId')
                pbp_df.at[i,'p1_name'] = game_roster.get(this_play.get('details').get('hittingPlayerId'))
                pbp_df.at[i,'p2_ID'] = this_play.get('details').get('hitteePlayerId')
                pbp_df.at[i,'p2_name'] = game_roster.get(this_play.get('details').get('hitteePlayerId'))
    
            if this_play.get('typeCode') == 504: # If it's a giveaway
                pbp_df.at[i,'p1_ID'] = this_play.get('details').get('playerId')
                pbp_df.at[i,'p1_name'] = game_roster.get(this_play.get('details').get('playerId'))
            
            if this_play.get('typeCode') in shotCodes: # If the play is a shot type
                if this_play.get('details').get('eventOwnerTeamId') == pbp_data.get('awayTeam').get('id'): # Away team shooting
                    pbp_df.at[i,'Home_Goalie_Id'] = this_play.get('details').get('goalieInNetId')
                    pbp_df.at[i,'Home_Goalie'] = game_roster.get(this_play.get('details').get('goalieInNetId'))     
                if this_play.get('details').get('eventOwnerTeamId') == pbp_data.get('homeTeam').get('id'): # Home team shooting
                    pbp_df.at[i,'Away_Goalie_Id'] = this_play.get('details').get('goalieInNetId')
                    pbp_df.at[i,'Away_Goalie'] = game_roster.get(this_play.get('details').get('goalieInNetId'))            
        
            if this_play.get('typeCode') == 505: # If it's a goal
                if type(this_play.get('details').get('shotType')) != NoneType: # If shotType is available
                    pbp_df.at[i,'Type'] = this_play.get('details').get('shotType').upper()
                pbp_df.at[i,'p1_ID'] = this_play.get('details').get('scoringPlayerId')
                pbp_df.at[i,'p1_name'] = game_roster.get(this_play.get('details').get('scoringPlayerId'))
                pbp_df.at[i,'p2_ID'] = this_play.get('details').get('assist1PlayerId')
                pbp_df.at[i,'p2_name'] = game_roster.get(this_play.get('details').get('assist1PlayerId'))
                pbp_df.at[i,'p3_ID'] = this_play.get('details').get('assist2PlayerId')
                pbp_df.at[i,'p3_name'] = game_roster.get(this_play.get('details').get('assist2PlayerId'))
                pbp_df.at[i,'Away_Score'] = this_play.get('details').get('awayScore')
                pbp_df.at[i,'Home_Score'] = this_play.get('details').get('homeScore')

            if this_play.get('typeCode') == 506: # If it's a shot on goal
                if type(this_play.get('details').get('shotType')) != NoneType: # If shotType is not available
                    pbp_df.at[i,'Type'] = this_play.get('details').get('shotType').upper()
                pbp_df.at[i,'p1_ID'] = this_play.get('details').get('shootingPlayerId')
                pbp_df.at[i,'p1_name'] = game_roster.get(this_play.get('details').get('shootingPlayerId'))

            if this_play.get('typeCode') == 507: # If it's a missed shot
                if type(this_play.get('details').get('shotType')) != NoneType: # If shotType is not available
                    pbp_df.at[i,'Type'] = this_play.get('details').get('shotType').upper()
                pbp_df.at[i,'p1_ID'] = this_play.get('details').get('shootingPlayerId')
                pbp_df.at[i,'p1_name'] = game_roster.get(this_play.get('details').get('shootingPlayerId'))

            if this_play.get('typeCode') == 508: # If it's blocked shot
                pbp_df.at[i,'p1_ID'] = this_play.get('details').get('blockingPlayerId')
                pbp_df.at[i,'p1_name'] = game_roster.get(this_play.get('details').get('blockingPlayerId'))
                pbp_df.at[i,'p2_ID'] = this_play.get('details').get('shootingPlayerId')
                pbp_df.at[i,'p2_name'] = game_roster.get(this_play.get('details').get('shootingPlayerId'))  
        
            if this_play.get('typeCode') == 509: # If it's a penalty
                pbp_df.at[i,'Type'] = this_play.get('details').get('typeCode') + ' for ' + this_play.get('details').get('descKey').upper()
                pbp_df.at[i,'p1_ID'] = this_play.get('details').get('committedByPlayerId')
                pbp_df.at[i,'p1_name'] = game_roster.get(this_play.get('details').get('committedByPlayerId'))
                pbp_df.at[i,'p2_ID'] = this_play.get('details').get('drawnByPlayerId')
                pbp_df.at[i,'p2_name'] = game_roster.get(this_play.get('details').get('drawnByPlayerId'))

            if this_play.get('typeCode') == 516: # If it's a stoppage
                pbp_df.at[i,'Type'] = this_play.get('details').get('reason').upper()

            if this_play.get('typeCode') == 525: # If it's a takeaway
                pbp_df.at[i,'p1_ID'] = this_play.get('details').get('playerId')
                pbp_df.at[i,'p1_name'] = game_roster.get(this_play.get('details').get('playerId'))

    #      if this_play.get('typeCode') == 535: # If it's a delayed penalty


            if this_play.get('typeCode') == 537: # If it's a failed shot attempt
                if type(this_play.get('details').get('shotType')) != NoneType: # If shotType is not available
                    pbp_df.at[i,'Type'] = this_play.get('details').get('shotType').upper()
                pbp_df.at[i,'p1_ID'] = this_play.get('details').get('shootingPlayerId')
                pbp_df.at[i,'p1_name'] = game_roster.get(this_play.get('details').get('shootingPlayerId'))

        # Adding Shift Data
        shifts = requests.get('https://api.nhle.com/stats/rest/en/shiftcharts?cayenneExp=gameId={}'.format(game_id))

        shift_data = shifts.json()
        shift_data.get('data')

        homePlayerList = ['homePlayer1_id','homePlayer2_id','homePlayer3_id','homePlayer4_id','homePlayer5_id','homePlayer6_id']
        homePlayerList_names = ['homePlayer1','homePlayer2','homePlayer3','homePlayer4','homePlayer5','homePlayer6']
        awayPlayerList = ['awayPlayer1_id','awayPlayer2_id','awayPlayer3_id','awayPlayer4_id','awayPlayer5_id','awayPlayer6_id']
        awayPlayerList_names = ['awayPlayer1','awayPlayer2','awayPlayer3','awayPlayer4','awayPlayer5','awayPlayer6']

        for i in range(len(shift_data.get('data'))):

            shift = shift_data.get('data')[i]

            period = shift.get('period')
            shift_start = datetime.strptime(shift.get('startTime'),'%M:%S')
            shift_end = datetime.strptime(shift.get('endTime'),'%M:%S')

            for j in range(len(pbp_df)):

                time_elapsed = datetime.strptime(pbp_df.at[j,'Time_Elapsed'],'%M:%S')

                awayPlayerList_loop = [pbp_df.at[j,'awayPlayer1_id'],pbp_df.at[j,'awayPlayer2_id'],pbp_df.at[j,'awayPlayer3_id'],
                                       pbp_df.at[j,'awayPlayer4_id'],pbp_df.at[j,'awayPlayer5_id'],pbp_df.at[j,'awayPlayer6_id']]
                homePlayerList_loop = [pbp_df.at[j,'homePlayer1_id'],pbp_df.at[j,'homePlayer2_id'],pbp_df.at[j,'homePlayer3_id'],
                                       pbp_df.at[j,'homePlayer4_id'],pbp_df.at[j,'homePlayer5_id'],pbp_df.at[j,'homePlayer6_id']]

                if (period == pbp_df.at[j,'Period']) & (shift_start <= time_elapsed < shift_end):

                    if shift.get('teamId') == awayTeam_Id:
                        if shift.get('playerId') not in awayPlayerList_loop:
                            for k in range(len(awayPlayerList)):
                                if pd.isna(pbp_df.at[j,awayPlayerList[k]]) == True:
                                    pbp_df.at[j,awayPlayerList[k]] = shift.get('playerId')
                                    pbp_df.at[j,awayPlayerList_names[k]] = game_roster.get(shift.get('playerId'))
                                    break

                    if shift.get('teamId') == homeTeam_Id:
                        if shift.get('playerId') not in homePlayerList_loop:
                            for k in range(len(homePlayerList)):
                                if pd.isna(pbp_df.at[j,homePlayerList[k]]) == True:
                                    pbp_df.at[j,homePlayerList[k]] = shift.get('playerId')
                                    pbp_df.at[j,homePlayerList_names[k]] = game_roster.get(shift.get('playerId'))
                                    break
                                
        return pbp_df

In [21]:
# %%time

# # Testing
# get_play_by_play(game_id)

Scraping Game Id 2023020172
CPU times: total: 9.77 s
Wall time: 18.4 s


Unnamed: 0,Game_Id,Date,Period,Event_tc,Event,Time_Elapsed,Strength,Ev_Zone,Type,Ev_Team,...,homePlayer6,homePlayer6_id,Away_Score,Home_Score,Away_Goalie,Away_Goalie_Id,Home_Goalie,Home_Goalie_Id,xC,yC
0,2023020172,2023-11-04,1,520,PERIOD_START,00:00,1551,,,,...,ANDREI KUZMENKO,8483808,0,0,,,,,,
1,2023020172,2023-11-04,1,502,FACEOFF,00:00,1551,N,,DAL,...,ANDREI KUZMENKO,8483808,0,0,,,,,0,0
2,2023020172,2023-11-04,1,503,HIT,00:41,1551,O,,VAN,...,QUINN HUGHES,8480800,0,0,,,,,-95,-27
3,2023020172,2023-11-04,1,525,TAKEAWAY,00:46,1551,D,,VAN,...,QUINN HUGHES,8480800,0,0,,,,,50,6
4,2023020172,2023-11-04,1,506,SHOT_ON_GOAL,00:51,1551,O,SNAP,VAN,...,QUINN HUGHES,8480800,0,0,JAKE OETTINGER,8479979,,,-57,-4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
285,2023020172,2023-11-04,3,516,STOPPAGE,19:46,0651,,GOALIE-STOPPED-AFTER-SOG,,...,ELIAS PETTERSSON,8480012,0,2,,,,,,
286,2023020172,2023-11-04,3,502,FACEOFF,19:46,0651,D,,VAN,...,ELIAS PETTERSSON,8480012,0,2,,,,,69,22
287,2023020172,2023-11-04,3,508,BLOCKED_SHOT,19:51,0651,D,,VAN,...,ELIAS PETTERSSON,8480012,0,2,,,,,42,-35
288,2023020172,2023-11-04,3,521,PERIOD_END,20:00,0651,,,,...,,,0,2,,,,,,


<br>

## Getting Play-By-Play for Multiple Games

In [22]:
def get_multi_play_by_play(range_of_ids):
    
    df = get_play_by_play(range_of_ids[0])
    
    for i in range(1,len(range_of_ids)):
        df2 = get_play_by_play(range_of_ids[i])
        df = pd.concat([df,df2],axis=0)
        
    return df

In [23]:
# Testing
# get_multi_play_by_play(game_ids)

<br>