# Build DIM Tables

After doing some work on the key boxscore and scoring data, I want to build out the framework for the dimension tables that could sit alongside those datasets:

- dim_team
- dim_game
- dim_date
- dim_schedule
- dim_score_type
- dim_sportsbook

Data comes from this RapidAPI listing: https://rapidapi.com/tank01/api/tank01-nfl-live-in-game-real-time-statistics-nfl

The free tier offers 100 calls per day.

## Imports

In [1]:
import pandas as pd
import json
import os
import requests

from dotenv import load_dotenv

## API Credentials

In [2]:
#Get API Key and host
load_dotenv()
api_token = os.getenv('nfl_api_key')
api_host = os.getenv('rapid_api_host')

In [3]:
#rapidapi headers
headers = {
    "X-RapidAPI-Key": api_token,
    "X-RapidAPI-Host": api_host
}
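
If either variable is missing from the `.env` file, `os.getenv` returns `None` and the request goes out without a usable key. A minimal sketch of failing early instead; `build_headers` and its `env` argument are hypothetical names, not part of the notebook:

```python
import os


def build_headers(key_var="nfl_api_key", host_var="rapid_api_host", env=None):
    """Build the RapidAPI headers, raising early if a variable is missing."""
    # env defaults to the real environment; passing a dict makes testing easy
    env = os.environ if env is None else env
    missing = [name for name in (key_var, host_var) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "X-RapidAPI-Key": env[key_var],
        "X-RapidAPI-Host": env[host_var],
    }
```

After `load_dotenv()`, calling `headers = build_headers()` would then either return the same two headers or stop with a readable message.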

## Manual Data Adds

These cover items that, as far as I can tell, appear in the API only by name. Looping through every possible value to discover them would be inefficient, so I will define them by hand as best I can.

- score types
- sportsbooks
- game types

### Create dictionaries for lookups

#### Score Type Table

In [4]:
score_type_dict = {
    "TD": 1,
    "SF": 2,
    "FG": 3
}

In [7]:
pd.DataFrame(score_type_dict.items(), columns=['score_type', 'score_type_id'])

  score_type  score_type_id
0         TD              1
1         SF              2
2         FG              3


#### Sportsbooks Table

In [5]:
sportsbook_dict = {
    "betmgm": 1,
    "bet365": 2,
    "fanduel": 3,
    "wynnbet": 4,
    "unibet": 5,
    "pointsbet": 6,
    "betrivers": 7,
    "ceasars_sportsbook": 8, #spelling kept as typed; confirm it matches the name the API returns
    "draftkings": 9,
}

#### Game Type Table

In [6]:
game_type_dict = {
    "Preseason": 1,
    "Regular Season": 2,
    "Postseason": 3
}
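
Each of these lookups is a `{name: id}` mapping, so the same small step can turn any of them into table rows. A rough sketch, where `lookup_to_rows` is a hypothetical helper and the column names are my own choice:

```python
def lookup_to_rows(lookup, name_col, id_col):
    """Turn a {name: id} lookup into a list of row dicts ordered by id."""
    return [
        {id_col: id_value, name_col: name}
        for name, id_value in sorted(lookup.items(), key=lambda item: item[1])
    ]


# Same mapping as the game type lookup defined in this section
game_type_dict = {"Preseason": 1, "Regular Season": 2, "Postseason": 3}
game_type_rows = lookup_to_rows(game_type_dict, "game_type", "game_type_id")
```

A list of dicts like `game_type_rows` can be passed straight to `pd.DataFrame(...)` when a frame is needed.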

## Team Data

This data will come from the "Get NFL Teams" endpoint in the Rapid API.

### Extract

In [4]:
#API Endpoint for NFL Team Info
url = "https://tank01-nfl-live-in-game-real-time-statistics-nfl.p.rapidapi.com/getNFLTeams"

In [6]:
#pull in only main team info, none of the rosters, schedules, stats, etc.
querystring = {"rosters":"false","schedules":"false","topPerformers":"false","teamStats":"false"}

In [7]:
#get team data
response = requests.get(url, headers=headers, params=querystring)

In [11]:
#export data file for reference
with open('../data/team_data.json', 'w') as file:
    json.dump(response.json(), file)
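
With only 100 calls per day, the exported file can also act as a cache so that rerunning the notebook does not spend another call. A minimal sketch, assuming the response is plain JSON; `load_or_fetch` is a hypothetical helper name:

```python
import json
import os


def load_or_fetch(cache_path, fetch):
    """Return cached JSON if the file exists; otherwise call fetch() and save it."""
    if os.path.exists(cache_path):
        with open(cache_path, "r") as file:
            return json.load(file)
    payload = fetch()  # only reached when no cached copy exists
    with open(cache_path, "w") as file:
        json.dump(payload, file)
    return payload
```

Here it could be called as `load_or_fetch('../data/team_data.json', lambda: requests.get(url, headers=headers, params=querystring).json())`. Deleting the file forces a fresh pull.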

### Transform

In [13]:
#drill into body of api response
teams = response.json()['body']
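
Indexing straight into `['body']` raises a bare `KeyError` if the API ever answers with something else, for example an error message instead of data. A small sketch of a guard; `extract_body` is a hypothetical name, and the only thing it assumes about the payload is the `body` key seen in this notebook:

```python
def extract_body(payload):
    """Return payload['body'], raising a readable error if it is absent."""
    if not isinstance(payload, dict) or "body" not in payload:
        # show only the start of the payload so the message stays short
        raise ValueError("Unexpected API payload: " + repr(payload)[:200])
    return payload["body"]
```

In this cell that would read `teams = extract_body(response.json())`.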

In [14]:
teams[0]

{'teamAbv': 'ARI',
 'teamCity': 'Arizona',
 'currentStreak': {'result': 'W', 'length': '1'},
 'loss': '2',
 'teamName': 'Cardinals',
 'nflComLogo1': 'https://static.www.nfl.com/image/private/f_auto/league/u9fltoslqdsyao8cpm0k',
 'teamID': '1',
 'tie': '0',
 'pa': '67',
 'pf': '72',
 'espnLogo1': 'https://a.espncdn.com/combiner/i?img=/i/teamlogos/nfl/500/ari.png',
 'wins': '1'}

In [17]:
#save record for each team in list
team_list = []

#create dictionary for team data
for team in teams:
    team_info = {
        "team_id": team['teamID'],
        "team_name_location": team['teamCity'],
        "team_name": team['teamName'],
        "team_abrv": team['teamAbv'],
        "team_logo_link": team['nflComLogo1']
    }

    team_list.append(team_info)

In [20]:
team_list[:2]

[{'team_id': '1',
  'team_name_location': 'Arizona',
  'team_name': 'Cardinals',
  'team_abrv': 'ARI',
  'team_logo_link': 'https://static.www.nfl.com/image/private/f_auto/league/u9fltoslqdsyao8cpm0k'},
 {'team_id': '2',
  'team_name_location': 'Atlanta',
  'team_name': 'Falcons',
  'team_abrv': 'ATL',
  'team_logo_link': 'https://static.www.nfl.com/image/private/f_auto/league/d8m7hzpsbrl6pnqht8op'}]
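
Since `team_id` is the natural key for dim_team, it may be worth checking for repeats before loading the rows anywhere. A quick sketch, with `duplicate_keys` as a hypothetical helper that works on any of the row lists built in this notebook:

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return the values of `key` that appear in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, count in counts.items() if count > 1)


# Two rows in the same shape as team_list
sample_rows = [
    {"team_id": "1", "team_abrv": "ARI"},
    {"team_id": "2", "team_abrv": "ATL"},
]
repeated = duplicate_keys(sample_rows, "team_id")
```

An empty list means every key is unique; for the full table, `len(team_list)` should also come out to 32.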

## Schedule

This will come from the "Get NFL Team Schedule" endpoint. I won't include a schedule_id; that could be added later in the database as a bigint column.

The querystring requires a team ID or abbreviation, plus the season. I'll prioritize 2022 for now.

My idea so far is to have one record per team per game. The API response includes the team abbreviation in the body of the JSON, so a conditional can check whether the home or away team matches that abbreviation and pull the matching team's information. Each game will still show up in both teams' schedules, but each pass saves only **that team's** record rather than both teams'.

### Extract

In [7]:
#API Endpoint for schedule info
sched_url = "https://tank01-nfl-live-in-game-real-time-statistics-nfl.p.rapidapi.com/getNFLTeamSchedule"

In [40]:
#pull in test with cardinals data
sched_querystring = {"teamID":"1","season":"2022"}

In [41]:
#get schedule data
sched_response = requests.get(sched_url, headers=headers, params=sched_querystring)

In [42]:
sched_response.json()['body']['team']

'ARI'

In [43]:
sched_response.json()['body']['schedule'][0]

{'gameID': '20220812_ARI@CIN',
 'seasonType': 'Preseason',
 'away': 'ARI',
 'teamIDHome': '7',
 'gameDate': '20220812',
 'gameStatus': 'Completed',
 'gameWeek': 'Preseason Week 1',
 'teamIDAway': '1',
 'home': 'CIN',
 'awayResult': 'W',
 'homePts': '23',
 'gameTime': '7:30p',
 'homeResult': 'L',
 'awayPts': '36'}
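
The schedule record carries a `gameDate` but no season field. If the season ever needs to be derived rather than passed in, one option is a date rule. This is only a sketch: `infer_season` is a hypothetical helper, and the March rollover is my assumption about where one season's games end and the next begin.

```python
from datetime import datetime


def infer_season(game_date, rollover_month=3):
    """Guess the season year from a 'YYYYMMDD' game date.

    Assumption: games dated before the rollover month (January and
    February playoff games) belong to the previous calendar year's season.
    """
    parsed = datetime.strptime(game_date, "%Y%m%d")
    return parsed.year if parsed.month >= rollover_month else parsed.year - 1
```

With this rule `infer_season('20220812')` gives 2022, and a January 2023 playoff date such as `'20230114'` also gives 2022.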

### Transform

#### Single Team Test

In [46]:
#drill to schedule details

schedule_body = sched_response.json()['body']

In [47]:
#save record for schedule in test
schedule_list = []

#identify team abrv to check against
team = schedule_body['team']

#drill into schedule
schedule = schedule_body['schedule']

#create dictionary for schedule data
for game in schedule:
    if game['home'] == team:
        schedule_info = {
            "game_id": game['gameID'],
            "team_id": game['teamIDHome'],
            "game_type_id": game_type_dict.get(game['seasonType'], ""),
            "season": 2022, #can I find a way to automate this?
            "game_week": game['gameWeek'],
            "is_home_team_flag": 1,
            "is_complete_flag": 1
        }
    else:
        schedule_info = {
            "game_id": game['gameID'],
            "team_id": game['teamIDAway'],
            "game_type_id": game_type_dict.get(game['seasonType'], ""),
            "season": 2022, #can I find a way to automate this?
            "game_week": game['gameWeek'],
            "is_home_team_flag": 0,
            "is_complete_flag": 1
        }
    
    schedule_list.append(schedule_info)

In [50]:
schedule_list[:2]

[{'game_id': '20220812_ARI@CIN',
  'team_id': '1',
  'game_type_id': 1,
  'season': 2022,
  'game_week': 'Preseason Week 1',
  'is_home_team_flag': 0,
  'is_complete_flag': 1},
 {'game_id': '20220821_BAL@ARI',
  'team_id': '1',
  'game_type_id': 1,
  'season': 2022,
  'game_week': 'Preseason Week 2',
  'is_home_team_flag': 1,
  'is_complete_flag': 1}]
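
The two branches of the loop differ only in which team ID and home flag they use, and `is_complete_flag` is hard-coded to 1. A possible refactor as a sketch: `build_schedule_row` is a hypothetical helper, and treating `"Completed"` as the only finished status is an assumption based on the single sample record shown earlier.

```python
game_type_dict = {"Preseason": 1, "Regular Season": 2, "Postseason": 3}


def build_schedule_row(game, team_abrv, season):
    """Build one schedule record for the team whose schedule was pulled."""
    is_home = game["home"] == team_abrv
    return {
        "game_id": game["gameID"],
        "team_id": game["teamIDHome"] if is_home else game["teamIDAway"],
        "game_type_id": game_type_dict.get(game["seasonType"], ""),
        "season": season,
        "game_week": game["gameWeek"],
        "is_home_team_flag": int(is_home),
        # Assumption: finished games are marked "Completed" by this endpoint;
        # any other status is treated as not complete.
        "is_complete_flag": int(game.get("gameStatus") == "Completed"),
    }


# Trimmed copy of the sample record returned for the Cardinals
sample_game = {
    "gameID": "20220812_ARI@CIN",
    "seasonType": "Preseason",
    "away": "ARI",
    "teamIDHome": "7",
    "gameStatus": "Completed",
    "gameWeek": "Preseason Week 1",
    "teamIDAway": "1",
    "home": "CIN",
}
row = build_schedule_row(sample_game, "ARI", 2022)
```

For the sample record this produces the same values as the first entry of `schedule_list`.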

#### Multiple Teams

Pull all team IDs from team table and prep to loop through each to save full schedule data.

I'll run with a subset to save API calls.

In [8]:
#set variables
sample_team_ids = ['1','18']
season = 2022

In [44]:
#setup list to save schedules to
test_schedule_list = []
test_list = []

#run with sample
for team_id in sample_team_ids:
    #set query string
    test_querystring = {"teamID": team_id, "season": str(season)}
    #call API
    test_response = requests.get(sched_url, headers=headers, params=test_querystring)
    #drill to schedule details
    test_schedule_body = test_response.json()['body']

    ###transform steps###
    #identify team abrv to check against
    team = test_schedule_body['team']

    #drill into schedule
    schedule = test_schedule_body['schedule']

    #test saving off new variables to split up extract and transform steps
    test_list.append([team, season, schedule])

    # #create dictionary for team data
    # for game in schedule:
    #     if game['home'] == team:
    #         schedule_info = {
    #             "game_id": game['gameID'],
    #             "team_id": game['teamIDHome'],
    #             "game_type_id": game_type_dict.get(game['seasonType'], ""),
    #             "season": season, #can I find a way to automate this?
    #             "game_week": game['gameWeek'],
    #             "is_home_team_flag": 1,
    #             "is_complete_flag": 1
    #         }
    #     else:
    #         schedule_info = {
    #             "game_id": game['gameID'],
    #             "team_id": game['teamIDAway'],
    #             "game_type_id": game_type_dict.get(game['seasonType'], ""),
    #             "season": season, #can I find a way to automate this?
    #             "game_week": game['gameWeek'],
    #             "is_home_team_flag": 0,
    #             "is_complete_flag": 1
    #         }
        
    #     test_schedule_list.append(schedule_info)    

#### Update - Test exporting just the raw data

Export into a list of sublists: [team, season, schedule data]

Then transform from this list.

In [45]:
#create empty list for this
new_list = []

#unpack team, season and schedule from each sublist
for team, season, schedule in test_list:
    #now loop through games
    for game in schedule:
        #create dictionary for schedule data
        if game['home'] == team:
            schedule_info = {
                "game_id": game['gameID'],
                "team_id": game['teamIDHome'],
                "game_type_id": game_type_dict.get(game['seasonType'], ""),
                "season": season, #can I find a way to automate this?
                "game_week": game['gameWeek'],
                "is_home_team_flag": 1,
                "is_complete_flag": 1
            }
        else:
            schedule_info = {
                "game_id": game['gameID'],
                "team_id": game['teamIDAway'],
                "game_type_id": game_type_dict.get(game['seasonType'], ""),
                "season": season, #can I find a way to automate this?
                "game_week": game['gameWeek'],
                "is_home_team_flag": 0,
                "is_complete_flag": 1
            }
        
        new_list.append(schedule_info)  


In [46]:
new_list

[{'game_id': '20220812_ARI@CIN',
  'team_id': '1',
  'game_type_id': 1,
  'season': 2022,
  'game_week': 'Preseason Week 1',
  'is_home_team_flag': 0,
  'is_complete_flag': 1},
 {'game_id': '20220821_BAL@ARI',
  'team_id': '1',
  'game_type_id': 1,
  'season': 2022,
  'game_week': 'Preseason Week 2',
  'is_home_team_flag': 1,
  'is_complete_flag': 1},
 {'game_id': '20220827_ARI@TEN',
  'team_id': '1',
  'game_type_id': 1,
  'season': 2022,
  'game_week': 'Preseason Week 3',
  'is_home_team_flag': 0,
  'is_complete_flag': 1},
 {'game_id': '20220911_KC@ARI',
  'team_id': '1',
  'game_type_id': 2,
  'season': 2022,
  'game_week': 'Week 1',
  'is_home_team_flag': 1,
  'is_complete_flag': 1},
 {'game_id': '20220918_ARI@LV',
  'team_id': '1',
  'game_type_id': 2,
  'season': 2022,
  'game_week': 'Week 2',
  'is_home_team_flag': 0,
  'is_complete_flag': 1},
 {'game_id': '20220925_LAR@ARI',
  'team_id': '1',
  'game_type_id': 2,
  'season': 2022,
  'game_week': 'Week 3',
  'is_home_team_flag':

## Game Data

This data will come from the "Get General Game Information" endpoint.

game_id will be our primary key and also the parameter passed to the endpoint, so I will need to pull all the unique game IDs from the schedule before pulling down the game data. For a whole season that is more calls than my daily API limit allows, so I will need to do this in steps. Here, I'll just test a small sample.

The game location and arena will need to come from the box score data. I already have to call that API to build another table, so I can try to adapt that code to save off a persistent object I can refer to later.
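
Because each game appears once per team in the schedule rows, the IDs need de-duplicating before they can be split into daily runs. A minimal sketch of both steps; `unique_game_ids` and `batches` are hypothetical helpers, and 100 is the free-tier daily limit mentioned at the top:

```python
def unique_game_ids(schedule_rows):
    """Collect game IDs in first-seen order, dropping repeats across teams."""
    return list(dict.fromkeys(row["game_id"] for row in schedule_rows))


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Rows shaped like the schedule records: the first game is listed for both teams
rows = [
    {"game_id": "20220812_ARI@CIN", "team_id": "1"},
    {"game_id": "20220812_ARI@CIN", "team_id": "7"},
    {"game_id": "20220821_BAL@ARI", "team_id": "1"},
]
game_ids = unique_game_ids(rows)
daily_batches = list(batches(game_ids, 100))
```

Each inner list in `daily_batches` is then one day's worth of calls. In practice the batch size would need to be smaller if other endpoints are called on the same day.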

### Extract

In [6]:
#set up variables
game_url = "https://tank01-nfl-live-in-game-real-time-statistics-nfl.p.rapidapi.com/getNFLGameInfo"

game_id_sample = ['20230114_LAC@JAX']#, '20221009_PHI@ARI', '20221020_NO@ARI', '20221121_SF@ARI']

In [7]:
#pull single sample
game_test_querystring = {"gameID":"20221020_NO@ARI"}
game_test_response = requests.get(game_url, headers=headers, params=game_test_querystring)

In [9]:
#view formatting
game_test_response.json()['body']

{'espnID': '401437791',
 'gameStatus': 'Final',
 'season': '2022',
 'gameDate': '20221020',
 'neutralSite': 'False',
 'teamIDHome': '1',
 'cbsLink': 'https://www.cbssports.com/nfl/gametracker/boxscore/NFL_20221020_NO@ARI',
 'gameTime': '8:15p',
 'teamIDAway': '23',
 'away': 'NO',
 'gameWeek': 'Week 7',
 'gameID': '20221020_NO@ARI',
 'seasonType': 'Regular Season',
 'espnLink': 'https://www.espn.com/nfl/boxscore/_/gameId/401437791',
 'home': 'ARI'}
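
Two fields in this response arrive as display strings: `gameTime` (`'8:15p'`) and `neutralSite` (`'False'`). A sketch of normalizing both, assuming every value follows the same shape as the samples in this notebook; `parse_game_time` and `to_flag` are hypothetical helper names:

```python
def parse_game_time(game_time):
    """Convert a string like '8:15p' into 24-hour 'HH:MM' text."""
    suffix = game_time[-1].lower()
    if suffix not in ("a", "p"):
        raise ValueError("Unexpected game time format: " + repr(game_time))
    hour_text, minute_text = game_time[:-1].split(":")
    hour = int(hour_text) % 12  # 12a becomes 0 and 12p becomes 12 after the shift below
    if suffix == "p":
        hour += 12
    return "{:02d}:{}".format(hour, minute_text)


def to_flag(text):
    """Map the API's 'True'/'False' strings to 1/0."""
    return int(str(text).strip().lower() == "true")
```

For the record above, `parse_game_time('8:15p')` returns `'20:15'` and `to_flag('False')` returns `0`.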

In [10]:
#recreate game_type_mapping
game_type_dict = {
    "Preseason": 1,
    "Regular Season": 2,
    "Postseason": 3
}

#### Test Game Location Read

Since the game location isn't available in this endpoint, the plan is to save it off in a JSON file and read it back in when we update the game data. I'll test that here.

In [39]:
lookup_game_id = 'errortest' #'20230108_ARI@SF'

In [42]:
with open('../data/persist_variables.json', 'r') as f:
    data = json.load(f)

#default to blanks so both variables exist even when nothing matches
loc = ''
arena = ''

for i in data['location_data']:
    if i['game_id'] == lookup_game_id:
        loc = i['game_location']
        arena = i['game_arena']
        break

In [43]:
print(loc)
print(arena)
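
When many games need a location, scanning the list once per game gets repetitive. A sketch of indexing the file by `game_id` first, using the same keys as the loop above; `index_locations` and `lookup_location` are hypothetical names and the sample values are made up:

```python
def index_locations(data):
    """Map game_id to its location record for direct lookups."""
    return {row["game_id"]: row for row in data.get("location_data", [])}


def lookup_location(index, game_id):
    """Return (location, arena), or blanks when the game is unknown."""
    row = index.get(game_id, {})
    return row.get("game_location", ""), row.get("game_arena", "")


# Made-up sample in the same shape as persist_variables.json
sample_data = {
    "location_data": [
        {
            "game_id": "20230108_ARI@SF",
            "game_location": "example city",
            "game_arena": "example arena",
        }
    ]
}
location_index = index_locations(sample_data)
loc, arena = lookup_location(location_index, "errortest")
```

An unknown ID such as `'errortest'` comes back as two empty strings, matching the blank output of the test above.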





In [11]:
#save a record for each game in the sample
game_list = []

#test on our sample
for gameID in game_id_sample:
    game_querystring = {"gameID": gameID}
    game_response = requests.get(game_url, headers=headers, params=game_querystring)

    #drill into game
    game = game_response.json()['body']

    #flag if the game is on a neutral site
    if game['neutralSite'] == 'True':
        neutral_site_flag = 1
    else:
        neutral_site_flag = 0

    #map game type - from our game_type_dict
    game_type_id = game_type_dict.get(game['seasonType'], game['seasonType'])

    #create dictionary for game data
    game_info = {
        "game_id": game['gameID'],
        "game_date_id": game['gameDate'],
        "game_type_id": game_type_id,
        "home_team_id": game['teamIDHome'],
        "away_team_id": game['teamIDAway'],
        "game_start_time": game['gameTime'],
        "game_location": '',
        "game_arena": '',
        "is_neutral_site_flag": neutral_site_flag,
        "espn_link": game['espnLink'],
        "cbs_link": game['cbsLink']
    }

    game_list.append(game_info)

In [12]:
game_list[:2]

[{'game_id': '20230114_LAC@JAX',
  'game_date_id': '20230114',
  'game_type_id': 3,
  'home_team_id': '15',
  'away_team_id': '18',
  'game_start_time': '8:15p',
  'game_location': '',
  'game_arena': '',
  'is_neutral_site_flag': 0,
  'espn_link': 'https://www.espn.com/nfl/boxscore/_/gameId/401437998',
  'cbs_link': 'https://www.cbssports.com/nfl/gametracker/boxscore/NFL_20230114_LAC@JAC'}]
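
`game_date_id` is the raw `YYYYMMDD` string, which could also serve as the key for dim_date. A sketch of building one dim_date row from it; the column names and `build_date_row` are my own assumptions, since that table is not defined in this notebook:

```python
from datetime import datetime

# Fixed English names so the result does not depend on the system locale
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_date_row(date_id):
    """Build a dim_date style record from a 'YYYYMMDD' string."""
    parsed = datetime.strptime(date_id, "%Y%m%d")
    return {
        "date_id": date_id,
        "full_date": parsed.date().isoformat(),
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "day_of_week": DAY_NAMES[parsed.weekday()],  # weekday(): Monday is 0
    }


date_row = build_date_row("20230114")
```

Running this over the unique `game_date_id` values in `game_list` would give one row per game day.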