# API calls to retrieve data from the game endpoint of the NHL's stats API
## Three libraries needed for this one:
* Requests will be used to access the API itself.
* Json will be used to format the parsed result and write it to disk.
* Time is used to add a delay between API calls, which reduces the number of times the connection is forcibly closed by the host.

In [5]:
import requests
import json
import time

## Year_match function
Not all years have the same number of games played. This is problematic because using a single number would either lead to a lot of useless API calls returning "Game data not found" or would leave out a significant number of games that were played but never queried.
General solution: The number of games played in a regular NHL season is the number of teams in the league in that season times 41 (82-game season / 2 teams per game). There are two significant reasons that this does not apply to all seasons.
* Not all seasons are 82 games long. Examples:
    * 2020 season shortened to 56 games per team by COVID-19
    * 2019 season shortened unevenly by COVID-19
    * 2012 season shortened to 48 games per team by a lockout
* Not all seasons exist
    * 2004-2005 season cancelled by a labor dispute

Solution: Check whether the season exists (1917 or later, and not 2004). If it does not, return 1 so that the game loop has nothing to iterate over. Check whether the season had an uneven number of games per team (2019). If so, return the specific number of games played. For every other season, look up numberOfGames at https://statsapi.web.nhl.com/api/v1/seasons/{seasonID}, then count the entries in https://statsapi.web.nhl.com/api/v1/franchises whose (firstSeasonID, lastSeasonID) range contains seasonID. Multiply the two, divide by 2, and add 1 (the result is used as the exclusive end of a range) to get the general match. The function below hard-codes the results of this calculation rather than calling those endpoints.
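
The general case described above could be sketched as a pure function over responses that have already been downloaded and parsed. This is only a sketch: the helper names are hypothetical, and the field names `firstSeasonId` and `lastSeasonId`, along with the idea that active franchises simply lack a `lastSeasonId`, are assumptions about the response shape rather than something verified here.

```python
def season_id_for(year):
    """Build the 8-digit season ID from a starting year, e.g. 2017 -> 20172018."""
    return year * 10000 + year + 1


def count_active_franchises(franchises, season_id):
    """Count franchise entries whose first/last season range contains season_id."""
    active = 0
    for franchise in franchises:
        first = int(franchise["firstSeasonId"])   # assumed field name
        last = franchise.get("lastSeasonId")      # assumed missing/None if still active
        if first <= season_id and (last is None or season_id <= int(last)):
            active += 1
    return active


def general_match(number_of_games, franchises, year):
    """Teams * games per team / 2, plus 1 so the value works as a range() end."""
    teams = count_active_franchises(franchises, season_id_for(year))
    return teams * number_of_games // 2 + 1
```

With 30 active franchises and `number_of_games=82` this gives 1231, which matches the hard-coded value for a 30-team league.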

In [74]:
def year_match(year):
    if year < 1917:
        print("Please check your years and try again.")
        return 1  ##No NHL season before 1917, so there is nothing to pull
    elif year == 2020:
        return 869  ##Covid-related lockdown shortened the season (31 teams * 56 games / 2 = 868)
    elif year == 2022:
        return 361  ##FIX ME! Hard coded with current (as of 11/29/2022 at noon CST) number of games played in
                    ##the current season; this should ideally be redone to be variable by the
                    ##current season (year if date > July, year-1 if date<July), get default value from max(
                    ##https://statsapi.web.nhl.com/api/v1/schedule[dates[0[games[range(0, len(games))[gamePK]]]]]).
                    ##For days without games, iterate backward through ?date=today(ISO 8601) until games?
    elif year == 2019:
        return 1083  ##Season halted unevenly by COVID-19 after 1082 games
    elif year == 2012:
        return 721  ##Lockout-shortened year (30 teams * 48 games / 2 = 720)
    elif year == 2004:
        return 1  ##Year cancelled due to labor dispute
    elif year >= 2021:
        return 1313  ##32-team league means 1312 games. Heck yeah.
    elif year >= 2017:
        return 1272  ##31-team league means 1271 games.
    else:
        return 1231  ##30-team league means 1230 games. Only accurate from 2000 onward.
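
The FIX ME note in the function above suggests deriving the current season from today's date instead of hard-coding 2022. A minimal sketch of that one step follows. The function name is hypothetical, and treating July itself as part of the previous season is an assumption, since the note only covers dates before and after July.

```python
from datetime import date


def current_season_start(today=None):
    """Return the starting year of the season that `today` falls in."""
    if today is None:
        today = date.today()
    # Assumption: months after July belong to the season starting that year;
    # July and earlier belong to the season that started the year before.
    return today.year if today.month > 7 else today.year - 1
```

For the date mentioned in the note, `current_season_start(date(2022, 11, 29))` gives 2022.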

## Game_pull function
### Retrieve games from the NHL API. 

Long timeout because I also got a couple of timeout errors.

Encoding for file output in utf-8 to deal with the copyright symbol in every single response.
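
To illustrate why the output settings matter, here is a small standalone comparison using a made-up payload (not an actual API response). With the default `ensure_ascii=True` the copyright symbol is written as an escape sequence; with `ensure_ascii=False` it is written as-is, which is why the file has to be opened with a UTF-8 encoding.

```python
import json

# Made-up sample; the real responses carry a longer copyright string.
sample = {"copyright": "© example"}

escaped = json.dumps(sample)                       # non-ASCII characters are escaped
readable = json.dumps(sample, ensure_ascii=False)  # non-ASCII characters are kept
```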

Print statement ends in a carriage return to overwrite the current line instead of making a very tall and mostly useless stack of outputs.

Extremely long sleep here, because I kept getting connections reset with shorter sleeps.
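
One alternative to a single long sleep is to retry a failed call after a growing delay. The sketch below is not what this notebook does; `fetch_with_retry` and its parameters are hypothetical. It takes any zero-argument callable, so it can be tried without touching the network. It catches `OSError` because both Python's built-in connection errors and the exceptions raised by requests derive from it.

```python
import time


def fetch_with_retry(fetch, retries=3, base_delay=15, sleep=time.sleep):
    """Call fetch() until it succeeds or `retries` attempts have failed."""
    for attempt in range(retries):
        try:
            return fetch()
        except OSError:
            # Give up and re-raise once the last attempt has failed.
            if attempt == retries - 1:
                raise
            # Wait base_delay, then 2x, then 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
```

Inside the loop this could be used as `game = fetch_with_retry(lambda: requests.get(endpoint, timeout=30).json())`.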




In [19]:
def game_pull(yearStart, yearEnd):
    # This loop iterates through the years from yearStart up to, but not including, yearEnd
    for i in range(yearStart, yearEnd):
        # This loop iterates through the games in a given year
        for j in range(1, year_match(i)):
            # Build the gameID: 4-digit year, then 02 (regular season), then the 4-digit game number
            gameID = i*1000000 + 20000 + j
            # Construct the API endpoint using the gameID
            endpoint = f'https://statsapi.web.nhl.com/api/v1/game/{gameID}/feed/live'
            # Make a request to the API and get the game data in JSON format
            game = requests.get(endpoint, timeout=30).json()
            # Write the game data to a file named "game_" followed by the gameID
            with open("./games/game_"+str(gameID)+".json", "w", encoding='utf-8') as outfile:
                json.dump(game, outfile, ensure_ascii=False, indent=4, sort_keys=True)
            # Print the gameID to show progress and include a return character to overwrite the line on each iteration
            print(gameID, end="\r")
            # Sleep for 15 seconds to avoid overloading the API with requests
            time.sleep(15)

Same as the above cell, but this one first finishes the interrupted year, starting from game numberReached, and then repeats the original loop for the remaining years. Use it when you're cut off in the middle of a batch of requests.

In [28]:
# This function pulls game data from the NHL API and saves it to a file
def game_pull_resumed(yearReached, yearEnd, numberReached):
    # Loop through all the games in the specified range
    for j in range(numberReached, year_match(yearReached)):
        # Calculate the game ID using the year and game number
        gameID = yearReached*1000000 + 20000 + j
        
        # Construct the API endpoint for the game
        endpoint = f'https://statsapi.web.nhl.com/api/v1/game/{gameID}/feed/live'
        
        # Make a request to the endpoint and get the game data
        game = requests.get(endpoint, timeout=30).json()
        
        # Save the game data to a file
        with open("./games/game_"+str(gameID)+".json", "w", encoding='utf-8') as outfile:
            json.dump(game, outfile, ensure_ascii=False, indent=4, sort_keys=True)
        
        # Print the game ID to the console
        print(gameID, end="\r")
        
        # Sleep for 15 seconds
        time.sleep(15)
    # Continue above cell, starting at the next full year.
    for i in range(yearReached+1, yearEnd):
        for j in range(1, year_match(i)):
            gameID = i*1000000 + 20000 + j
            endpoint = f'https://statsapi.web.nhl.com/api/v1/game/{gameID}/feed/live'
            game = requests.get(endpoint, timeout=30).json()
            with open("./games/game_"+str(gameID)+".json", "w", encoding='utf-8') as outfile:
                json.dump(game, outfile, ensure_ascii=False, indent=4, sort_keys=True)
            print(gameID, end="\r")
            time.sleep(15)
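
The duplicated loop above could be avoided by generating the game IDs separately from the code that downloads them. A sketch, assuming a hypothetical `iter_game_ids` helper that takes a function such as `year_match` as its `games_in_year` argument:

```python
def iter_game_ids(year_start, year_end, games_in_year, first_game=1):
    """Yield game IDs from (year_start, first_game) up to, but not including, year_end."""
    for year in range(year_start, year_end):
        # Only the first year starts partway through; later years start at game 1.
        start = first_game if year == year_start else 1
        for number in range(start, games_in_year(year)):
            # Same arithmetic as above: year, then 02, then the 4-digit game number.
            yield year * 1000000 + 20000 + number
```

With this, a fresh run and a resumed run differ only in the `first_game` argument, for example `iter_game_ids(2020, 2021, year_match, first_game=612)`.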

In [34]:
game_pull(2012, 2021)

2012020656

ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

In [51]:
game_pull_resumed(2020, 2021, 612)

2020020868

# Shift Charts pull function
## Retrieve shift-by-shift player on ice data for games handled by the NHL API

Similar to the above, with a different endpoint and a shorter sleep (api.nhle.com seemed less prone to closing connections in testing).

When both of these functions have been called, you will have a .json file for each game in both the games and shifts directories.

Todo: Consider combining these into a single function that supports other NHL API endpoints as well, plus different unique identifiers (such as player ID, team ID, or date).
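
One possible shape for that combined function is sketched here. Everything about it is hypothetical: the name `pull`, the `{gameID}` placeholder in the URL template, and the injected `fetch` callable, which keeps the sketch independent of any one HTTP library. It only covers game IDs; other identifiers would need a different file-naming scheme.

```python
import json
import os
import time


def pull(url_template, out_dir, game_ids, fetch, delay=10, sleep=time.sleep):
    """Fetch one document per game ID and save each as <out_dir>/game_<id>.json."""
    os.makedirs(out_dir, exist_ok=True)  # create the output directory if missing
    for game_id in game_ids:
        # Fill the {gameID} placeholder and let the caller-supplied fetch do the request.
        data = fetch(url_template.format(gameID=game_id))
        path = os.path.join(out_dir, f"game_{game_id}.json")
        with open(path, "w", encoding="utf-8") as outfile:
            json.dump(data, outfile, ensure_ascii=False, indent=4, sort_keys=True)
        # Overwrite the progress line, then pause before the next request.
        print(game_id, end="\r")
        sleep(delay)
```

For the shift charts this might be called with the template `'https://api.nhle.com/stats/rest/en/shiftcharts?cayenneExp=gameId={gameID}'`, the directory `"./shifts"`, and `fetch=lambda url: requests.get(url, timeout=30).json()`.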

In [56]:
def shift_charts_pull(yearStart, yearEnd):
    # Loop over the years in the specified range
    for i in range(yearStart, yearEnd):
        # Loop over the games in the current year
        for j in range(1, year_match(i)):
            # Create a unique identifier for the game
            gameID = i * 1000000 + 20000 + j

            # Create an API endpoint for the shift data for the current game
            endpoint = f'https://api.nhle.com/stats/rest/en/shiftcharts?cayenneExp=gameId={gameID}'

            # Send a request to the endpoint and get the shift data
            game = requests.get(endpoint, timeout=30).json()

            # Save the shift data to a JSON file
            with open("./shifts/game_"+str(gameID)+".json", "w", encoding='utf-8') as outfile:
                json.dump(game, outfile, ensure_ascii=False, indent=4, sort_keys=True)

            # Print the current game ID and move the cursor to the start of the line
            # This is to show progress without printing a newline for each game
            print(gameID, end="\r")

            # Sleep for 10 seconds to avoid overloading the API
            time.sleep(10)

In [73]:
def shift_charts_pull_resumed(yearReached, yearEnd, numberReached):
    # Iterate over all remaining game IDs in the year reached by previous iteration
    for j in range(numberReached, year_match(yearReached)):
        gameID = yearReached*1000000 + 20000 + j
        endpoint = f'https://api.nhle.com/stats/rest/en/shiftcharts?cayenneExp=gameId={gameID}'
        game = requests.get(endpoint, timeout=30).json()
        
        # Save the game data to a file
        with open("./shifts/game_"+str(gameID)+".json", "w", encoding='utf-8') as outfile:
            json.dump(game, outfile, ensure_ascii=False, indent=4, sort_keys=True)
        
        # Print the game ID
        print(gameID, end="\r")
        
        # Wait 10 seconds before continuing
        time.sleep(10)
    # Loop over remaining years
    for i in range(yearReached+1, yearEnd):
        # This loop iterates through the games in a given year
        for j in range(1, year_match(i)):
            gameID = i*1000000 + 20000 + j
            endpoint = f'https://api.nhle.com/stats/rest/en/shiftcharts?cayenneExp=gameId={gameID}'
            game = requests.get(endpoint, timeout=30).json()
            
            with open("./shifts/game_"+str(gameID)+".json", "w", encoding='utf-8') as outfile:
                json.dump(game, outfile, ensure_ascii=False, indent=4, sort_keys=True)
            # Print the gameID to show progress and include a return character to overwrite the line on each iteration
            print(gameID, end="\r")
            time.sleep(10)

In [72]:
shift_charts_pull_resumed(2022, 2023, 148)

2022020360