# NBA Bet Prop Analysis

## Part 1 - Setup

This workbook covers the creation of the Bet Prop tables that will be stored in a SQLite3 database. To build them, I will scrape the bet lines for NBA and NFL games from the DraftKings website, then scrape the player projections for those games from SportsLine's website. Linking each player's bet lines with their SportsLine projections lets me find potentially good bet opportunities. However, judging how good an opportunity is takes more than the difference between the two numbers: I also need to factor in the odds of the bet, since a large difference is less valuable when the betting price is steep. To support this comparison, I need data on the historical range of outcomes for a player stat in order to estimate the probability of the bet hitting. Comparing that estimated probability with the implied probability of the odds gives the estimated "edge" I have on a bet. The larger the edge, the more favorable the bet should be.
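
As a rough illustration of the edge calculation described above, the sketch below converts American odds to an implied probability and compares it with a probability estimated from a player's historical outcomes. The function names and the sample game log are hypothetical and chosen only for illustration; the empirical hit rate is one simple way to estimate the probability, not necessarily the method used later in this workbook.

```python
def implied_probability(american_odds: str) -> float:
    """Convert American odds such as '+150' or '-110' to an implied probability (0-1)."""
    value = int(american_odds)
    if value > 0:
        return 100 / (value + 100)
    return -value / (-value + 100)


def estimated_over_probability(history, line):
    """Share of historical outcomes that finished above the line (hypothetical estimator)."""
    hits = sum(1 for outcome in history if outcome > line)
    return hits / len(history)


def edge(history, line, american_odds):
    """Estimated probability of the over minus the probability implied by the odds."""
    return estimated_over_probability(history, line) - implied_probability(american_odds)


# Hypothetical points log for one player's last 10 games
points_log = [22, 31, 18, 27, 25, 30, 19, 28, 24, 26]
print(round(edge(points_log, line=23.5, american_odds="-110"), 3))
```

A positive value suggests the bet is priced below the estimated probability of hitting; a negative value suggests the opposite.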

However, before risking any real money, I want to test whether this model actually performs. To do this, I will hit RapidAPI's API-NBA feed to pull in stats from the 2021 NBA season. By pulling in the bet results and comparing them to the bet odds and projections, I will be able to grade my betting model and see if it is profitable. The comparison of these stats with the odds and projections takes place in the second script, which also involves updating the SQLite tables daily with new data.
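
Grading could look something like the following sketch, which settles a handful of over/under bets at a flat one-unit stake and reports the profit per unit staked. The function names and the sample bets are made up for illustration and do not come from the tables built below.

```python
def decimal_odds(american_odds: str) -> float:
    """Convert American odds such as '+115' or '-110' to decimal odds."""
    value = int(american_odds)
    return 1 + (value / 100 if value > 0 else 100 / -value)


def grade_bet(pick, line, actual, american_odds, stake=1.0):
    """Return the profit for one settled over/under bet (a push returns 0)."""
    if actual == line:
        return 0.0
    won = (actual > line) if pick == "Over" else (actual < line)
    return stake * (decimal_odds(american_odds) - 1) if won else -stake


# Hypothetical settled bets: (pick, line, actual stat, American odds)
bets = [
    ("Over", 12.5, 15, "-110"),
    ("Under", 3.5, 5, "-125"),
    ("Over", 19.5, 24, "+115"),
]
profits = [grade_bet(*bet) for bet in bets]
roi = sum(profits) / len(bets)  # profit per unit staked
print([round(p, 2) for p in profits], round(roi, 3))
```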

In [18]:
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
import numpy as np
import json
from datetime import datetime, timedelta
from pytz import timezone
import sqlite3
from sqlalchemy import create_engine
import time

## Helper Functions 

In [19]:
def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""

In [20]:
def american2DecimalOdds(americanBetOdds):
    """
    @americanBetOdds (str) the American odds of the bet. Must be prefaced with a '+' or '-'
    
    Returns the bet odds in decimal format. 
    """
    try:
        if (americanBetOdds[0] == '+'):
            americanBetOdds = int(americanBetOdds[1:])
            decimalOdds = (americanBetOdds/100) + 1
            # Return the decimalOdds in a clean format
            return(round(decimalOdds,2))
        elif americanBetOdds[0] == '-':
            americanBetOdds = int(americanBetOdds[1:])
            decimalOdds = (100/americanBetOdds) + 1
            # Return the decimalOdds in a clean format
            return(round(decimalOdds,2))
        else:
            print("Bet odds must begin with a '+' or '-'")
    except:
        return(None)

In [21]:
def impliedOdds(betOdds):
    """
    @betOdds (str) the American odds of the bet. Must be prefaced with a '+' or '-'
    
    Takes in the betOdds and returns the implied probability of the bet
    """
    try:
        # First need to convert the American odds to decimal odds
        decimalOdds = american2DecimalOdds(betOdds)

        # Use the decimal odds to return the implied probability
        probability = 1/decimalOdds * 100

        # Return the probability rounded to the nearest whole number
        return(round(probability))
    except:
        return(None)

## DraftKings - NBA and NFL 

In [22]:
# Create a dataframe to append results to
bet_offers_df = pd.DataFrame(columns=['League', 'Game', 'StartDate', 'StartTime', 'Player', 'BetLabel', 
                                      'SportsLine_Projection', 'DK_Line', 'Outcome1_Label', 'Outcome1_Odds', 
                                      'Outcome2_Label', 'Outcome2_Odds'])

# Create a list of the DraftKings URLs to parse
urls = ["https://sportsbook.draftkings.com/leagues/basketball/88670846", 
        "https://sportsbook.draftkings.com/leagues/football/88670561"]

# Loop over the URLs and parse results
for url in urls:
    
    # Get the BeautifulSoup data from the DraftKings website
    dk_response = requests.get(url)
    dk_soup = BeautifulSoup(dk_response.text, "html.parser")

    # Narrow the Beautiful Soup extract to just the field of interest "window.__INITIAL_STATE__"

    # Filter out opening and closing <script> tags
    dk_scrape = str(list(list(list(dk_soup.children)[2])[3])[11]).replace("<script>","").replace("</script>", "")
    # Remove leading and trailing whitespace
    dk_scrape = dk_scrape.strip()
    # Split sections
    dk_scrape = dk_scrape.split(";\n")
    # Isolate to json of interest
    dk_scrape = dk_scrape[6].strip()
    # Format as json dictionary
    dk_scrape = dk_scrape.replace("window.__INITIAL_STATE__ = ","")
    dk_scrape = json.loads(dk_scrape)
    
    # Grab the sport ID from the scrape
    sportId = list(dk_scrape['eventGroups'].keys())[0]
    
    # From the full scrape of the page, pull a list of the games to loop over and extract data from
    games = dk_scrape['eventGroups'][sportId]['events'].keys()
    
    # Loop over games
    for index, game in enumerate(games):

        # Set the game JSON as variable
        game_details = dk_scrape['eventGroups'][sportId]['events'][game]
        # Get the eventId so I can scrape the actual props
        eventId = game_details['eventId']

        # Web scrape Draft Kings for player props
        props_url = f"https://sportsbook.draftkings.com/event/{eventId}"
        response = requests.get(props_url)
        soup = BeautifulSoup(response.text, "html.parser")
        
        # Clean BeautifulSoup response
        # Filter out opening and closing <script> tags
        scrape = str(list(list(list(soup.children)[2])[3])[11]).replace("<script>","").replace("</script>", "")
        # Remove leading and trailing whitespace
        scrape = scrape.strip()
        # Split sections
        scrape = scrape.split(";\n")
        # Isolate to json of interest
        scrape = scrape[6].strip()
        # Format as json dictionary
        scrape = scrape.replace("window.__INITIAL_STATE__ = ","")
        scrape = json.loads(scrape)
        
        # Parse the scrape results
        eventGroupId = list(scrape['eventGroups'].keys())[0]
        providerEventId = list(scrape['eventGroups'][eventGroupId]['events'].keys())[0]
        providerOfferId = list(scrape['offers'][eventGroupId].keys())[0]
        eventId = scrape['eventGroups'][eventGroupId]['events'][providerEventId]['eventId']
        game_details = scrape['eventGroups'][eventGroupId]['events'][providerEventId]
        game = game_details['name']
        eventGroup = game_details['eventGroupName']
        teamName1 = game_details['teamName1']
        teamName2 = game_details['teamName2']
        startDate = game_details['startDate']
        date, time = startDate.split('T')
        dt = date + ' ' + time[:8]
        dt = datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
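        # Convert to US Eastern time (assumes startDate is in UTC); unlike a fixed -5 hour offset, this handles daylight saving
        dt = timezone('UTC').localize(dt).astimezone(timezone('US/Eastern'))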
        date = dt.strftime('%a, %b %d')
        time = dt.strftime('%I:%M %p')

        # Isolate the bet offers from the beautiful soup scrape
        offers = scrape['offers'][eventGroupId]

        # Loop over offers and add to dataframe
        for index, offer in enumerate(offers):

            # Create a list for parsing the offers 
            offers_parsed = [eventGroup, game, date, time, '', '', '', '', '', '', '', '']

            # Parse the betting offer
            offer_dict = scrape['offers'][eventGroupId][offer]
            try:
                providerOfferId = offer_dict['providerOfferId']
                providerId = offer_dict['providerId']
                providerEventId = offer_dict['providerEventId']
                bet_label = offer_dict['label']
                isOpen = offer_dict['isOpen']
                outcomes = offer_dict['outcomes']
            except:
                continue

            # Assign to list
            offers_parsed[5] = bet_label    

            # Extract outcomes
            if len(outcomes) == 1:

                # Parse the outcome
                outcome_label = outcomes[0]['label']
                outcome_odds = outcomes[0]['oddsAmerican']
                offers_parsed[8] = outcome_label
                offers_parsed[9] = outcome_odds

                # Append the list to the dataframe 
                bet_offers_df.loc[len(bet_offers_df)] = offers_parsed

            elif len(outcomes) == 2:

                # Parse the outcomes
                for i, x in enumerate(outcomes):
                    if i == 0:
                        outcome_label = outcomes[i]['label']
                        try:
                            outcome_line = outcomes[i]['line']
                        except:
                            outcome_line = ''
                        outcome_odds = outcomes[i]['oddsAmerican']
                        offers_parsed[7] = outcome_line
                        offers_parsed[8] = outcome_label
                        offers_parsed[9] = outcome_odds
                    else:
                        outcome_label = outcomes[i]['label']
                        try:
                            outcome_line = outcomes[i]['line']
                        except:
                            continue
                        outcome_odds = outcomes[i]['oddsAmerican']
                        offers_parsed[10] = outcome_label
                        offers_parsed[11] = outcome_odds

                # Append the list to the dataframe 
                bet_offers_df.loc[len(bet_offers_df)] = offers_parsed

            else:

                continue

# Preview output
bet_offers_df.head()

|  | League | Game | StartDate | StartTime | Player | BetLabel | SportsLine_Projection | DK_Line | Outcome1_Label | Outcome1_Odds | Outcome2_Label | Outcome2_Odds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NBA | CHA Hornets @ NY Knicks | Mon, Jan 17 | 01:10 PM |  | Julius Randle Points + Rebounds |  | 28.5 | Over | 1100 | Under | -8000 |
| 1 | NBA | CHA Hornets @ NY Knicks | Mon, Jan 17 | 01:10 PM |  | Julius Randle Points + Assists + Rebounds |  | 31.5 | Over | 1000 | Under | -5000 |
| 2 | NBA | CHA Hornets @ NY Knicks | Mon, Jan 17 | 01:10 PM |  | Cody Martin Steals |  | 0.5 | Over | 750 | Under | -2000 |
| 3 | NBA | CHA Hornets @ NY Knicks | Mon, Jan 17 | 01:10 PM |  | Cody Martin Points + Assists |  | 13.5 | Over | 105 | Under | -140 |
| 4 | NBA | CHA Hornets @ NY Knicks | Mon, Jan 17 | 01:10 PM |  | Miles Bridges Turnovers |  | 1.5 | Over | 700 | Under | -1600 |
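
The positional indexing used above (`list(...)[2]`, `[3]`, `[11]`, then `[6]` after the split) depends on the exact page layout and is repeated for both requests. One possible alternative, sketched here under the assumption that the state is still embedded as a `window.__INITIAL_STATE__ = {...};` assignment somewhere in the page source, is to locate the assignment by name and let the JSON decoder find the end of the object. The helper name `extract_initial_state` and the sample HTML are hypothetical.

```python
import json
import re


def extract_initial_state(html: str) -> dict:
    """Find `window.__INITIAL_STATE__ = {...}` in raw HTML and parse the object."""
    match = re.search(r"window\.__INITIAL_STATE__\s*=\s*", html)
    if match is None:
        raise ValueError("window.__INITIAL_STATE__ not found in page")
    # raw_decode parses one JSON value starting at the given index
    # and ignores whatever text follows it
    state, _ = json.JSONDecoder().raw_decode(html, match.end())
    return state


# Tiny stand-in for a page (hypothetical content)
sample_html = """
<html><body><script>
window.__OTHER__ = {"a": 1};
window.__INITIAL_STATE__ = {"eventGroups": {"88670846": {"events": {}}}};
</script></body></html>
"""
state = extract_initial_state(sample_html)
print(list(state["eventGroups"].keys()))
```

Because the helper works on the raw response text, the same call could serve both the league page and the per-event page.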


## Fantasy Pros NBA Projections

Fantasy Pros locks its NBA projections behind a paywall, so I can only access the top 10 players. That is not really worth scraping.

In [129]:
# Web scrape Fantasy Pros for relevant information
url = 'https://www.fantasypros.com/nba/projections/daily-overall.php'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

projections_list = list(list(list(list(list(list(list(list(list(list(list(soup.children)[2])[3])[15])[13])[1])[1])[1])[11])[1])[3])
projections_list

['\n',
 <tr class="mpb-player-2332"><td class="player-label"><a class="player-name" href="/nba/players/james-harden.php">James Harden</a> <small>(BKN - PG,SG)</small> <a aria-hidden="true" class="fp-player-link fp-id-2332" fp-player-name="James Harden" href="#" tabindex="-1"></a></td>
 <td class="tooltip-top" data-tooltip="8th ranked defense vs. PG">vs OKC</td><td class="center">31.3</td>
 <td class="center">8.3</td>
 <td class="center">10.4</td>
 <td class="center">0.9</td>
 <td class="center">1.4</td>
 <td class="center">.450</td>
 <td class="center">.874</td>
 <td class="center">4.6</td>
 <td class="center">0</td>
 <td class="center">36.7</td>
 <td class="center">4.0</td>
 </tr>,
 '\n',
 <tr class="mpb-player-2803"><td class="player-label"><a class="player-name" href="/nba/players/giannis-antetokounmpo.php">Giannis Antetokounmpo</a> <small>(MIL - PF,C)</small> <a aria-hidden="true" class="fp-player-link fp-id-2803" fp-player-name="Giannis Antetokounmpo" href="#" tabindex="-1"></a></

## Fantasy Pros NFL Projections

In [130]:
# # Create lookup for the stat columns we are collecting
# stats_lookup = {'QB': {1: 'PASS_ATT', 2: 'CMP', 3: 'PASS_YDS', 4: 'PASS_TDS',
#              5: 'INTS', 6: 'RUSH_ATT', 7: 'RUSH_YDS', 8: 'RUSH_TDS', 10: 'FPTS'},
#     'RB': {1: 'RUSH_ATT', 2: 'RUSH_YDS', 3: 'RUSH_TDS', 4: 'REC', 5: 'REC_YDS',
#              6: 'REC_TDS', 8: 'FPTS'}, 
#     'WR': {1: 'REC', 2: 'REC_YDS', 3: 'REC_TDS', 4: 'RUSH_ATT', 5: 'RUSH_YDS',
#              6: 'RUSH_TDS', 8: 'FPTS'}, 
#     'TE': {1: 'REC', 2: 'REC_YDS', 3: 'REC_TDS', 5: 'FPTS'},
#     'DST': {1: 'SACK', 2: 'INT', 3: 'FR', 4: 'FF', 5: 'DEF_TD', 6: 'SAFETY',
#              7: 'PA', 8: 'YDS_AGN', 9: 'FPTS'},
#     'K': {1: 'FG', 2: 'FGA', 3: 'XPT', 4: 'FPTS'}}

# # Need to make an additional dictionary for the index of FPTS of each position
# position_fpts_index = {'QB': 10, 'RB': 8, 'WR': 8,
#                                 'TE': 5, 'DST': 9, 'K': 4}

# # Create a lookup between a full team name and the abbreviation
# team_abrv_lookup = {'New England Patriots': 'NE',
#                     'Washington Football Team': 'WAS',
#                     'Dallas Cowboys': 'DAL',
#                     'Baltimore Ravens': 'BAL',
#                     'Buffalo Bills': 'BUF',
#                     'Chicago Bears': 'CHI',
#                     'Indianapolis Colts': 'IND',
#                     'Las Vegas Raiders': 'OAK',
#                     'Kansas City Chiefs': 'KC',
#                     'Los Angeles Chargers': 'LAC', 
#                     'Carolina Panthers': 'CAR',
#                     'Denver Broncos': 'DEN',
#                     'Atlanta Falcons': 'ATL',
#                     'Tennessee Titans': 'TEN',
#                     'Minnesota Vikings': 'MIN',
#                     'Los Angeles Rams': 'LAR',
#                     'Tampa Bay Buccaneers': 'TB',
#                     'Green Bay Packers': 'GB',
#                     'Seattle Seahawks': 'SEA',
#                     'New Orleans Saints': 'NO',
#                     'Arizona Cardinals': 'ARI',
#                     'Miami Dolphins': 'MIA',
#                     'San Francisco 49ers': 'SF',
#                     'Cleveland Browns': 'CLE',
#                     'Pittsburgh Steelers': 'PIT',
#                     'Philadelphia Eagles': 'PHI',
#                     'Jacksonville Jaguars': 'JAX',
#                     'Detroit Lions': 'DET',
#                     'New York Jets': 'NYJ',
#                     'Cincinnati Bengals': 'CIN',
#                     'Houston Texans': 'HOU',
#                     'New York Giants': 'NYG'}

# # To add to the dataframe we'll need a consistent format of the lists
# projection_format = {'NFL_WEEK': 0, 'PLAYER': 1, 'POSITION': 2, 'TEAM': 3, 'PASS_ATT': 4, 
#                      'CMP': 5, 'PASS_YDS': 6, 'PASS_TDS': 7, 'INTS': 8, 'RUSH_ATT': 9,
#                      'RUSH_YDS': 10, 'RUSH_TDS': 11, 'REC': 12, 'REC_YDS': 13, 'REC_TDS': 14,
#                      'SACK': 15, 'INT': 16, 'FR': 17, 'FF': 18, 'DEF_TD': 19, 'SAFETY': 20,
#                      'PA': 21, 'YDS_AGN': 22, 'FG': 23, 'FGA': 24, 'XPT': 25, 'FPTS': 26}

# # Create an empty dataframe to append data to (excluding Def and K)
# fp_projections = pd.DataFrame(columns = ['LEAGUE', 'PLAYER', 'POSITION', 'TEAM', 'PASS_ATT', 
#                                          'CMP', 'PASS_YDS', 'PASS_TDS', 'INTS', 
#                                          'RUSH_ATT', 'RUSH_YDS', 'RUSH_TDS', 'REC', 
#                                          'REC_YDS', 'REC_TDS', 'SACK', 'INT',
#                                          'FR', 'FF', 'DEF_TD', 'SAFETY', 'PA',
#                                          'YDS_AGN', 'FG', 'FGA', 'XPT', 'FPTS'])

# # A dictionary of the URLs to hit for each position
# fantasy_pros_projection_urls = {'QB': 'https://www.fantasypros.com/nfl/projections/qb.php',
#                                 'RB': 'https://www.fantasypros.com/nfl/projections/rb.php?scoring=HALF',
#                                 'WR': 'https://www.fantasypros.com/nfl/projections/wr.php?scoring=HALF',
#                                 'TE': 'https://www.fantasypros.com/nfl/projections/te.php?scoring=HALF',
#                                 'DST': 'https://www.fantasypros.com/nfl/projections/dst.php',
#                                 'K': 'https://www.fantasypros.com/nfl/projections/k.php'}

# # Iterate over dictionary of URLs to add data to dictionary
# for position, url in fantasy_pros_projection_urls.items():  
#     # Get the stat keys lookup for position
#     lookup = stats_lookup[position]
#     # Lookup the fpts index 
#     fpts_index = position_fpts_index[position]
#     # Web scrape Fantasy Pros for relevant information
#     response = requests.get(url)
#     soup = BeautifulSoup(response.text, "html.parser")
#     projections_list = list(list(list(list(list(list(list(list(list(list(list(soup.children)[2])[3])[15])[13])[1])[1])[1])[11])[1])[3])
#     nfl_week = int(find_between(str(list(list(list(soup.children)[2])[1])[1]), 'Week ', ' ' + position))
#     for num, item in enumerate(projections_list):
#         if item != '\n':
#             # Create an empty list to update the data for
#             player_projections = [0] * 27
#             # Add the league name to the list
#             player_projections[0] = 'NFL'
#             # Split the projection string into a list to parse
#             proj_list = list(item)
#             # filter out the '\n'
#             proj_list = [x for x in proj_list if x != '\n']
#             # Extract the player name
#             player_name = str(proj_list[0]).split("fp-player-name=")[1]
#             start, stop = [m.start() for m in re.finditer('"', player_name)][0:2]
#             player_name = player_name[start+1:stop]
#             player_projections[1] = player_name
#             # Add the position to the list
#             player_projections[2] = position
#             # Extract the team of the player
#             team = list(proj_list[0])[1].strip() if position != 'DST' else team_abrv_lookup[find_between(str(list(proj_list[0])[0]), '>', '<')]
#             player_projections[3] = team
#             for i, stat in enumerate(proj_list):
#                 if i in lookup.keys():
#                     if i != fpts_index:
#                         # For default format of stat results
#                         stat_name = lookup[i] # get the stat name from lookup
#                         list_index = projection_format[stat_name] # get the list index for this stat
#                         result = re.search('<td class="center">(.*)</td>', str(stat))
#                         stat_result = float(result.group(1))
#                         player_projections[list_index] = stat_result
#                     else:
#                         # There is a different output for fantasy points
#                         stat_name = lookup[i] # get the stat name from lookup
#                         list_index = projection_format[stat_name] # get the list index for this stat
#                         result = re.search('<td class="center" data-sort-value="(.*)">(.*)</td>', str(stat))
#                         stat_result = float(result.group(2))
#                         player_projections[list_index] = stat_result
#                 else:
#                     continue
#             # Append the list to the dataframe         
#             fp_projections.loc[len(fp_projections)] = player_projections

# # Remove rows where fpts is 0, since they are not needed
# fp_projections = fp_projections[fp_projections['FPTS'] != 0.0]

# # Preview output
# fp_projections.head()

## SportsLine Projections

### NFL

In [131]:
# Web scrape SportsLine for relevant information
url = 'https://www.sportsline.com/nfl/expert-projections/simulation/'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Isolate soup to the table of interest
output = str(list(list(list(soup.children)[1])[1])[1])
# Format as json dictionary
output = output.replace("""<script id="__NEXT_DATA__" type="application/json">""","").replace("</script>","")
output = json.loads(output)
projections = output['props']['initialState']['fantasyState']['projectionsPageState']['data']['projections']

# Create a dataframe to append results to
NFL_SportsLine_Projections = pd.DataFrame(columns=['LEAGUE', 'PLAYER', 'POS', 'TEAM', 'GAME', 'FP', 'PASSYD', 
                                                   'RUSHYD', 'RECYD'])

# Loop over the projections and parse
for i, projection in enumerate(projections):
    # Create an empty list to update the data for
    player_projections = [0] * 9
    # Set LEAGUE columns
    player_projections[0] = 'NFL'
    # Loop over list of projections to scrape
    projectionFields = projection['projectionFields']
    # Create a counter to add additional fields to correct list position
    counter = 1
    for x in projectionFields:
        field = x['field']
        if field in NFL_SportsLine_Projections.columns:
            try:
                value = x['value']
                player_projections[counter] = value
            except KeyError:
                player_projections[counter] = 0
            counter += 1

    # Append the list to the dataframe         
    NFL_SportsLine_Projections.loc[len(NFL_SportsLine_Projections)] = player_projections

# Preview output            
NFL_SportsLine_Projections.head()

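|  | LEAGUE | PLAYER | POS | TEAM | GAME | FP | PASSYD | RUSHYD | RECYD |
|---|---|---|---|---|---|---|---|---|---|
| 0 | NFL | Tom Brady | QB | TB | PHI@TB | 25.68 | 297 | 3 | 0 |
| 1 | NFL | Patrick Mahomes | QB | KC | PIT@KC | 25.41 | 275 | 23 | 0 |
| 2 | NFL | Joe Burrow | QB | CIN | LV@CIN | 24.89 | 260 | 9 | 0 |
| 3 | NFL | Josh Allen | QB | BUF | NE@BUF | 22.69 | 230 | 43 | 0 |
| 4 | NFL | Dak Prescott | QB | DAL | SF@DAL | 21.95 | 268 | 10 | 0 |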
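
The loop above fills each row by position, so it relies on `projectionFields` arriving in the same order as the dataframe columns; a missing or reordered field would shift values into the wrong column. A name-keyed version is sketched below on a made-up record shaped like the `projectionFields` entries used above (a list of dicts with `field` and, optionally, `value`). The helper name and the sample data are hypothetical.

```python
COLUMNS = ['LEAGUE', 'PLAYER', 'POS', 'TEAM', 'GAME', 'FP', 'PASSYD', 'RUSHYD', 'RECYD']


def parse_projection(projection: dict, league: str) -> dict:
    """Build one row keyed by column name; fields without a value default to 0."""
    row = {column: 0 for column in COLUMNS}
    row['LEAGUE'] = league
    for entry in projection['projectionFields']:
        if entry['field'] in row:
            row[entry['field']] = entry.get('value', 0)
    return row


# Hypothetical record: fields out of order, RUSHYD has no value, RECYD is absent
sample = {'projectionFields': [
    {'field': 'POS', 'value': 'QB'},
    {'field': 'PLAYER', 'value': 'Example Player'},
    {'field': 'RUSHYD'},
    {'field': 'PASSYD', 'value': 250},
]}
row = parse_projection(sample, 'NFL')
print(row)
```

A list of such rows can be passed to `pd.DataFrame(rows, columns=COLUMNS)` in one call instead of appending with `.loc` row by row.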


### NBA

In [132]:
# Web scrape SportsLine for relevant information
url = 'https://www.sportsline.com/nba/expert-projections/simulation/'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Isolate soup to the table of interest
output = str(list(list(list(soup.children)[1])[1])[1])
# Format as json dictionary
output = output.replace("""<script id="__NEXT_DATA__" type="application/json">""","").replace("</script>","")
output = json.loads(output)
projections = output['props']['initialState']['fantasyState']['projectionsPageState']['data']['projections']

# Create a dataframe to append results to
NBA_SportsLine_Projections = pd.DataFrame(columns=['LEAGUE', 'PLAYER', 'POS', 'TEAM', 'GAME', 'FP', 'PTS', 'MIN', 'FG',
                                                   'FGA', 'AST', 'TRB', 'DRB', 'ORB', 'BK', 'ST', 'TO', 'FT', 'FTP', 'FGP'])
                                                   
# Loop over the projections and parse
for i, projection in enumerate(projections):
    # Create an empty list to update the data for
    player_projections = [0] * 20
    # Set LEAGUE columns
    player_projections[0] = 'NBA'
    # Loop over list of projections to scrape
    projectionFields = projection['projectionFields']
    # Create a counter to add additional fields to correct list position
    counter = 1
    for x in projectionFields:
        field = x['field']
        if field in NBA_SportsLine_Projections.columns:
            try:
                value = x['value']
                player_projections[counter] = value
            except KeyError:
                player_projections[counter] = 0
            counter += 1

    # Append the list to the dataframe         
    NBA_SportsLine_Projections.loc[len(NBA_SportsLine_Projections)] = player_projections

# Preview output
NBA_SportsLine_Projections.head()

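|  | LEAGUE | PLAYER | POS | TEAM | GAME | FP | PTS | MIN | FG | FGA | AST | TRB | DRB | ORB | BK | ST | TO | FT | FTP | FGP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NBA | Nikola Jokic | C | DEN | POR@DEN | 54.3 | 26.5 | 33 | 10.2 | 18.2 | 7.7 | 12.8 | 9.8 | 3.0 | 0.8 | 1.5 | 2.8 | 4.4 | 0.81 | 0.56 |
| 1 | NBA | James Harden | SG | BKN | OKC@BKN | 50.9 | 26.1 | 37 | 7.8 | 18.9 | 9.9 | 8.2 | 7.2 | 1.0 | 0.7 | 1.2 | 5.1 | 7.4 | 0.87 | 0.41 |
| 2 | NBA | Giannis Antetokounmpo | PF | MIL | GS@MIL | 50.1 | 27.8 | 33 | 9.9 | 18.3 | 6.0 | 11.4 | 9.3 | 2.1 | 1.4 | 1.0 | 3.5 | 6.8 | 0.69 | 0.54 |
| 3 | NBA | Ja Morant | PG | MEM | MIN@MEM | 42.3 | 24.0 | 34 | 9.1 | 18.0 | 7.7 | 5.1 | 4.2 | 0.8 | 0.5 | 1.2 | 3.9 | 4.3 | 0.76 | 0.51 |
| 4 | NBA | Stephen Curry | PG | GS | GS@MIL | 40.9 | 25.5 | 35 | 8.3 | 19.4 | 5.9 | 5.0 | 4.6 | 0.4 | 0.4 | 1.1 | 2.8 | 4.0 | 0.92 | 0.43 |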


In [133]:
# # Save the odds file as a csv for analysis
# NBA_SportsLine_Projections.to_csv('Data/NBA_SportsLine_Projections.csv', index=False)

## Connect Bet Lines to Projections 

### SportsLine 

In [136]:
# Get a list of the players in the projections tables
nba_players = NBA_SportsLine_Projections['PLAYER'].to_list()
nfl_players = NFL_SportsLine_Projections['PLAYER'].to_list()
SportsLine_players = nba_players + nfl_players

# For every row in bet_offers_df, check if the label contains the name of a player; if so, add it to the Player column
for i, row in bet_offers_df.iterrows():
    betLabel = row["BetLabel"]
    player = [x for x in SportsLine_players if x in betLabel]
    if len(player) > 0:
        # Extract the player name from the list
        player = player[0]
        # Update the Player field in the df
        bet_offers_df.loc[i, "Player"] = player
        # Update the BetLabel to remove the player name to isolate the stat
        bet_offers_df.loc[i, "BetLabel"] = betLabel.replace(player, "").strip()
        
# Link the SportsLine projection fields with the bet label tracked
stat_label_map = {
    'Assists': 'AST',
    'Points': 'PTS',
    'Rebounds': 'TRB'
}

stat_label_keys = list(stat_label_map.keys())

# Create a column for the SportsLine Projection
for i, row in bet_offers_df.iterrows():
    # Collect needed variables
    BetLabel = row["BetLabel"]
    bet_parts = BetLabel.split("+")
    bet_parts = [x.strip() for x in bet_parts]
    bet_parts.sort()
    player = row["Player"]

    if bet_parts is not None:

        if len(bet_parts) == 3:

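            # Check that the labels are those in the stat_label_map
            # If not, this is likely due to a miss in the Player matching that results in overlap in the BetLabel
            # bet_parts is sorted above, so compare against the sorted keys rather than relying on dict order
            if bet_parts == sorted(stat_label_keys):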
        
                # Create a variable to store the NBA_SportsLine_Projection
                NBA_SportsLine_Projection = 0
                # Loop over the bet parts and add the projections
                for part in bet_parts:
                    # Use the player name and the statistic to look up the value in the NBA_SportsLine_Projections table
                    projection = NBA_SportsLine_Projections.loc[NBA_SportsLine_Projections.PLAYER == player, 
                                                                stat_label_map[part]].iloc[0]
                    # Add the projection to the total
                    NBA_SportsLine_Projection += projection
                # Set the projection in the bet_offers_df table
                bet_offers_df.loc[i, "SportsLine_Projection"] = NBA_SportsLine_Projection

        elif len(bet_parts) == 2:
            
            # Check that both labels are in the stat_label_map
            # If not, this is likely due to a miss in the Player matching that results in overlap in the BetLabel
            if (bet_parts[0] in stat_label_keys) and (bet_parts[1] in stat_label_keys):
                # Create a variable to store the NBA_SportsLine_Projection
                NBA_SportsLine_Projection = 0
                # Loop over the bet parts and add the projections
                for part in bet_parts:
                    # Use the player name and the statistic to look up the value in the NBA_SportsLine_Projections table
                    projection = NBA_SportsLine_Projections.loc[NBA_SportsLine_Projections.PLAYER == player, 
                                                                stat_label_map[part]].iloc[0]
                    # Add the projection to the total
                    NBA_SportsLine_Projection += projection
                # Set the projection in the bet_offers_df table
                bet_offers_df.loc[i, "SportsLine_Projection"] = NBA_SportsLine_Projection

        elif len(bet_parts) == 1:

            if BetLabel in stat_label_map.keys():
                # Use the player name and the statistic to look up the value in the NBA_SportsLine_Projections table
                NBA_SportsLine_Projection = NBA_SportsLine_Projections.loc[NBA_SportsLine_Projections.PLAYER == player, 
                                                                           stat_label_map[BetLabel]].iloc[0]
                # Set the projection in the bet_offers_df table
                bet_offers_df.loc[i, "SportsLine_Projection"] = NBA_SportsLine_Projection

        else:
            
            continue
    
    else:
        
        continue

# Format columns
bet_offers_df['SportsLine_Projection'] = pd.to_numeric(bet_offers_df['SportsLine_Projection'], errors='coerce')
bet_offers_df['DK_Line'] = pd.to_numeric(bet_offers_df['DK_Line'], errors='coerce')
bet_offers_df['Outcome1_Label'] = bet_offers_df['Outcome1_Label'].astype(str)
bet_offers_df['Outcome2_Label'] = bet_offers_df['Outcome2_Label'].astype(str)

# Add a column for absolute value difference between the projection and line
bet_offers_df["Line2ProjDiff"] = abs(bet_offers_df['SportsLine_Projection'] - bet_offers_df['DK_Line'])
# Add columns with the implied probabilities of the odds
bet_offers_df['Outcome1_ImpliedProbability'] = bet_offers_df['Outcome1_Odds'].apply(impliedOdds)
bet_offers_df['Outcome2_ImpliedProbability'] = bet_offers_df['Outcome2_Odds'].apply(impliedOdds)
# Add a column for the bet "juice"
bet_offers_df['Bet_Juice'] = bet_offers_df['Outcome1_ImpliedProbability'] + bet_offers_df['Outcome2_ImpliedProbability'] - 100

# Drop rows which contain any NaN value in the selected columns
bet_offers_df = bet_offers_df.dropna(how='any', subset=['SportsLine_Projection', 'DK_Line'])

# Preview output
bet_offers_df.head()

|  | League | Game | StartDate | StartTime | Player | BetLabel | SportsLine_Projection | DK_Line | Outcome1_Label | Outcome1_Odds | Outcome2_Label | Outcome2_Odds | Line2ProjDiff | Outcome1_ImpliedProbability | Outcome2_ImpliedProbability | Bet_Juice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | NBA | GS Warriors @ MIL Bucks | Thu, Jan 13 | 07:30 PM | Grayson Allen | Assists | 1.7 | 0.5 | Over | -125 | Under | -105 | 1.2 | 56 | 51.0 | 7.0 |
| 8 | NBA | GS Warriors @ MIL Bucks | Thu, Jan 13 | 07:30 PM | Grayson Allen | Points + Rebounds | 16.1 | 14.5 | Over | -150 | Under | 115 | 1.6 | 60 | 47.0 | 7.0 |
| 9 | NBA | GS Warriors @ MIL Bucks | Thu, Jan 13 | 07:30 PM | Grayson Allen | Points | 12.8 | 12.5 | Over | -110 | Under | -120 | 0.3 | 52 | 55.0 | 7.0 |
| 10 | NBA | GS Warriors @ MIL Bucks | Thu, Jan 13 | 07:30 PM | Andrew Wiggins | Points + Rebounds | 20.7 | 19.5 | Over | -115 | Under | -115 | 1.2 | 53 | 53.0 | 6.0 |
| 11 | NBA | GS Warriors @ MIL Bucks | Thu, Jan 13 | 07:30 PM | Andrew Wiggins | Rebounds | 4.0 | 3.5 | Over | -105 | Under | -125 | 0.5 | 51 | 56.0 | 7.0 |
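
The `Bet_Juice` column is the amount by which the two implied probabilities exceed 100%. To make the arithmetic concrete, the sketch below reproduces the calculation for a -115/-115 market and also rescales the two probabilities so they sum to 100%, which is one common way to approximate a margin-free estimate. The function names are hypothetical, and unlike `impliedOdds` above, nothing is rounded along the way, so the result differs slightly from the rounded value in the table.

```python
def implied_pct(american_odds: str) -> float:
    """Implied probability in percent for American odds such as '+115' or '-115'."""
    value = int(american_odds)
    if value > 0:
        return 100 * 100 / (value + 100)
    return 100 * -value / (-value + 100)


def juice_and_fair(odds_1: str, odds_2: str):
    """Return the juice (percentage points over 100) and the rescaled probabilities."""
    p1, p2 = implied_pct(odds_1), implied_pct(odds_2)
    total = p1 + p2
    juice = total - 100
    # Rescale so the two sides sum to 100% (simple proportional adjustment)
    fair = (100 * p1 / total, 100 * p2 / total)
    return juice, fair


juice, fair = juice_and_fair("-115", "-115")
print(round(juice, 2), [round(p, 1) for p in fair])
```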


In [13]:
# # Save the odds file as a csv for analysis
# bet_offers_df.to_csv('Data/BetOdds.csv', index=False)

## NBA Stats - RapidAPI 

### NBA Teams 

In [10]:
# Hit RapidApi URL for NBA teams in the 'standard' league
url = "https://api-nba-v1.p.rapidapi.com/teams/league/standard"

headers = {
    'x-rapidapi-host': "api-nba-v1.p.rapidapi.com",
    'x-rapidapi-key': "40cfc5891cmshe48d38938873f74p12526ejsnb332fefe9905"
    }

response = requests.request("GET", url, headers=headers)

# Create a dataframe to append results to
NBA_Teams = pd.DataFrame(columns=['teamId', 'shortName', 'City', 'Nickname', 'fullName', 'Conference', 'Division'])

# Set the response as a JSON and extract the teams section
teams = json.loads(response.text)
teams = teams['api']['teams']

# Loop over the teams and parse needed fields
for team in teams:
    
    # Skip non-NBA Franchises
    nbaFranchise = team['nbaFranchise']

    if nbaFranchise == '1':
        
        # Grab needed fields
        teamId = team['teamId']
        shortName = team['shortName']
        city = team['city']
        nickname = team['nickname']
        fullName = team['fullName']
        conference = team['leagues']['standard']['confName']
        division = team['leagues']['standard']['divName']

        # Create list to add to dataframe
        nba_teams_list = [teamId, shortName, city, nickname, fullName, conference, division]

        # Append the list to the dataframe         
        NBA_Teams.loc[len(NBA_Teams)] = nba_teams_list

    else:
        
        continue
    
NBA_Teams

Unnamed: 0,teamId,shortName,City,Nickname,fullName,Conference,Division
0,1,ATL,Atlanta,Hawks,Atlanta Hawks,East,Southeast
1,2,BOS,Boston,Celtics,Boston Celtics,East,Atlantic
2,4,BKN,Brooklyn,Nets,Brooklyn Nets,East,Atlantic
3,5,CHA,Charlotte,Hornets,Charlotte Hornets,East,Southeast
4,6,CHI,Chicago,Bulls,Chicago Bulls,East,Central
5,7,CLE,Cleveland,Cavaliers,Cleveland Cavaliers,East,Central
6,8,DAL,Dallas,Mavericks,Dallas Mavericks,West,Southwest
7,9,DEN,Denver,Nuggets,Denver Nuggets,West,Northwest
8,10,DET,Detroit,Pistons,Detroit Pistons,East,Central
9,11,GSW,Golden State,Warriors,Golden State Warriors,West,Pacific


In [11]:
# Save the file as a csv for analysis
NBA_Teams.to_csv('Data/NBA_Teams.csv', index=False)
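
The team-parsing loop can also be written as a small function that collects one record per franchise, which makes it easy to check without calling the API. This is only a sketch: `parse_teams` is a hypothetical helper, and the sample payload is hand-written to mimic the fields the loop above reads (the second team is invented to exercise the filter).

```python
def parse_teams(payload):
    """Return one dict per NBA franchise from a teams payload."""
    rows = []
    for team in payload['api']['teams']:
        # Skip non-NBA franchises (the flag is the string '1', as in the loop above)
        if team['nbaFranchise'] != '1':
            continue
        standard = team['leagues']['standard']
        rows.append({
            'teamId': team['teamId'],
            'shortName': team['shortName'],
            'City': team['city'],
            'Nickname': team['nickname'],
            'fullName': team['fullName'],
            'Conference': standard['confName'],
            'Division': standard['divName'],
        })
    return rows


# Hand-written sample payload (assumption), trimmed to the fields used
sample = {'api': {'teams': [
    {'teamId': '1', 'shortName': 'ATL', 'city': 'Atlanta', 'nickname': 'Hawks',
     'fullName': 'Atlanta Hawks', 'nbaFranchise': '1',
     'leagues': {'standard': {'confName': 'East', 'divName': 'Southeast'}}},
    {'teamId': '999', 'shortName': 'XXX', 'city': 'Nowhere', 'nickname': 'Examples',
     'fullName': 'Nowhere Examples', 'nbaFranchise': '0',
     'leagues': {'standard': {'confName': '', 'divName': ''}}},
]}}

rows = parse_teams(sample)
print(len(rows), rows[0]['fullName'])  # 1 Atlanta Hawks
```

Passing `rows` to `pd.DataFrame(rows)` builds the whole table in one step instead of appending row by row with `.loc`.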

### NBA Games 

In [41]:
# Hit the RapidAPI URL for NBA games for the 2021 season
url = "https://api-nba-v1.p.rapidapi.com/games/seasonYear/2021"

headers = {
    'x-rapidapi-host': "api-nba-v1.p.rapidapi.com",
    'x-rapidapi-key': "40cfc5891cmshe48d38938873f74p12526ejsnb332fefe9905"
    }

response = requests.request("GET", url, headers=headers)

# Create a dataframe to append results to
NBA_Games = pd.DataFrame(columns=['gameId', 'seasonYear', 'League', 'homeTeamId', 'awayTeamId', 'Date', 'startTime', 
                                  'status', 'homeScore', 'awayScore'])

# Set the response as a JSON and extract the games section
games = json.loads(response.text)
games = games['api']['games']

# Loop over the games and parse needed fields
for game in games:
    
    # Only care about NBA games (not G-league)
    league = game['league']
    
    if league == 'standard':
        
        # Grab needed fields
        seasonYear = game['seasonYear']
        gameId = game['gameId']
        startTimeUTC = game['startTimeUTC']
        arena = game['arena']
        city = game['city']
        status = game['statusGame']
        awayTeamId = game['vTeam']['teamId']
        homeTeamId = game['hTeam']['teamId']
        awayScore = game['vTeam']['score']['points']
        homeScore = game['hTeam']['score']['points']
        
        # Convert startTime to Eastern time
        try:
            # Format string as datetime
            startTimeUTC = datetime.strptime(startTimeUTC, "%Y-%m-%dT%H:%M:%S.000Z")
            # Set UTC as timezone
            startTimeUTC = startTimeUTC.replace(tzinfo=timezone('UTC'))
            # Convert timezone to Eastern
            startTimeET = startTimeUTC.astimezone(timezone('US/Eastern'))
            # Split the date and time (do not name this `time`, which would shadow the time module)
            date = startTimeET.strftime("%Y-%m-%d")
            startTime = startTimeET.strftime("%H:%M")
        except ValueError:
            # Start time is not in the expected format: keep the raw value as the date
            date = startTimeUTC
            startTime = 'TBD'
        
        # Create list to add to dataframe
        nba_games_list = [gameId, seasonYear, league, homeTeamId, awayTeamId, date, startTime, status, homeScore, awayScore]
        
        # Append the list to the dataframe         
        NBA_Games.loc[len(NBA_Games)] = nba_games_list
        
    else:
        continue

NBA_Games.head()

Unnamed: 0,gameId,seasonYear,League,homeTeamId,awayTeamId,Date,startTime,status,homeScore,awayScore
0,10796,2021,standard,17,4,2021-10-03,15:30,Finished,97,123
1,10797,2021,standard,38,27,2021-10-04,19:00,Finished,123,107
2,10798,2021,standard,2,26,2021-10-04,19:30,Finished,98,97
3,10799,2021,standard,20,1,2021-10-04,19:30,Finished,125,99
4,10800,2021,standard,22,23,2021-10-04,20:00,Finished,117,114


In [109]:
# Save the file as a csv for analysis
NBA_Games.to_csv('Data/NBA_Games.csv', index=False)

### NBA Players 

In [15]:
# RapidAPI's NBA Players URL
url = "https://api-nba-v1.p.rapidapi.com/players/league/standard"

headers = {
    'x-rapidapi-host': "api-nba-v1.p.rapidapi.com",
    'x-rapidapi-key': "40cfc5891cmshe48d38938873f74p12526ejsnb332fefe9905"
    }

response = requests.request("GET", url, headers=headers)

# Create a dataframe to append results to
NBA_Players = pd.DataFrame(columns=['playerId', 'teamId', 'fullName', 'firstName', 'lastName', 'Position', 'yearsPro', 
                                    'startNBA', 'College', 'Country', 'DateOfBirth', 'Height', 'Weight'])

# Parse the response as JSON and extract the players section
players = json.loads(response.text)
players = players['api']['players']

# Loop over the players and parse needed fields
for player in players:

    # Only grab active NBA players; check that the 'standard' league exists before reading it
    leagues = player['leagues']
    
    if 'standard' in leagues and leagues['standard']['active'] == '1':
    
        # Grab needed fields
        playerId = player['playerId']
        firstName = player['firstName']
        lastName = player['lastName']
        fullName = firstName + ' ' + lastName
        teamId = player['teamId']
        yearsPro = player['yearsPro']
        college = player['collegeName']
        country = player['country']
        startNba = player['startNba']
        dateOfBirth = player['dateOfBirth']
        height = player['heightInMeters']
        weight = player['weightInKilograms']
        position = player['leagues']['standard']['pos']

        # Create list to add to dataframe
        nba_players_list = [playerId, teamId, fullName, firstName, lastName, position, yearsPro, startNba, college, 
                            country, dateOfBirth, height, weight]

        # Append the list to the dataframe         
        NBA_Players.loc[len(NBA_Players)] = nba_players_list
        
    else:
        
        continue
        
NBA_Players.head()

Unnamed: 0,playerId,teamId,fullName,firstName,lastName,Position,yearsPro,startNBA,College,Country,DateOfBirth,Height,Weight
0,2,28,Quincy Acy,Quincy,Acy,F,6,2012,Baylor,USA,1990-10-06,2.01,108.9
1,4,19,Steven Adams,Steven,Adams,C,8,2013,Pittsburgh,New Zealand,1993-07-20,2.11,120.2
2,5,26,Arron Afflalo,Arron,Afflalo,G,0,0,,,,,
3,8,4,LaMarcus Aldridge,LaMarcus,Aldridge,C-F,15,2006,Texas-Austin,USA,1985-07-19,2.11,113.4
4,17,15,Justin Anderson,Justin,Anderson,F-G,5,2015,Virginia,USA,1993-11-19,1.96,104.8


In [16]:
# Save the file as a csv for analysis
NBA_Players.to_csv('Data/NBA_Players.csv', index=False)

### NBA Game Stats 

In [78]:
# Create a dataframe to append results to
NBA_GameStats = pd.DataFrame(columns=['gameId', 'playerId', 'teamId', 'Points', 'Position', 'Minutes', 'FGM', 'FGA',
                                     'FGP', 'FTM', 'FTA', 'TPM', 'TPA', 'TPP', 'offReb', 'defReb', 'totReb', 'assists',
                                      'pFouls', 'steals', 'turnovers', 'blocks', 'plusMinus'])

In [104]:
# Get a list of gameIds for completed games
completedGameIds = set(NBA_Games[NBA_Games['status'] == 'Finished']['gameId'].tolist())

# Find a list of gameIds already in NBA_GameStats
NBA_GameStats_Games = set(NBA_GameStats['gameId'].tolist())

# Set the list of gameIds to search to only those completed games that we don't already have data for in NBA_GameStats
gameIds = [x for x in completedGameIds if x not in NBA_GameStats_Games]

# Limited to 100 requests per day before getting charged
counter = 0
limit = 100

In [105]:
# Loop over the gameIds and hit RapidAPI for the stats
for game in gameIds:
    
    # Stop once the daily request limit has been reached
    if counter >= limit:
        break

    # Limited to 10 requests per minute, so add a lag
    time.sleep(6)

    # Increment the counter
    counter += 1

    # Hit RapidAPI for this game
    url = f"https://api-nba-v1.p.rapidapi.com/statistics/players/gameId/{game}"

    headers = {
        'x-rapidapi-host': "api-nba-v1.p.rapidapi.com",
        'x-rapidapi-key': "40cfc5891cmshe48d38938873f74p12526ejsnb332fefe9905"
        }

    response = requests.request("GET", url, headers=headers)


    # Parse the response as JSON and extract the statistics section
    gameStats = json.loads(response.text)
    try:
        gameStats = gameStats['api']['statistics']
    except KeyError:
        print(gameStats)
        continue

    # Loop over the stats and parse needed fields
    for player in gameStats:

        # Grab needed fields
        gameId = player['gameId']
        playerId = player['playerId']
        teamId = player['teamId']
        Points = player['points']
        Position = player['pos']
        Minutes = player['min']
        FGM = player['fgm']
        FGA = player['fga']
        FGP = player['fgp']
        FTM = player['ftm']
        FTA = player['fta']
        TPM = player['tpm']
        TPA = player['tpa']
        TPP = player['tpp']
        offReb = player['offReb']
        defReb = player['defReb']
        totReb = player['totReb']
        assists = player['assists']
        pFouls = player['pFouls']
        steals = player['steals']
        turnovers = player['turnovers']
        blocks = player['blocks']
        plusMinus = player['plusMinus']

        # Create list to add to dataframe
        nba_playerStats_list = [gameId, playerId, teamId, Points, Position, Minutes, FGM, FGA, FGP, FTM, FTA, TPM, TPA, 
                                TPP, offReb, defReb, totReb, assists, pFouls, steals, turnovers, blocks, plusMinus]

        # Append the list to the dataframe         
        NBA_GameStats.loc[len(NBA_GameStats)] = nba_playerStats_list
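
The incremental-fetch pattern used here (skip games already stored, pause between requests, cap the number of requests per run) can be separated from the HTTP call so it can be tried without using up API quota. The sketch below makes several assumptions: `fetch_new_games` is a hypothetical helper, and `fetch` stands in for whatever function performs the real request and returns the parsed stats.

```python
import time


def fetch_new_games(completed_ids, stored_ids, fetch, limit=100, delay=6, sleep=time.sleep):
    """Fetch stats for completed games not yet stored, making at most `limit` requests."""
    # Only request games that are finished and not already saved
    pending = sorted(set(completed_ids) - set(stored_ids))
    results = {}
    for request_count, game_id in enumerate(pending):
        # Stop before exceeding the per-run request cap
        if request_count >= limit:
            break
        # Pause between consecutive requests to respect the per-minute limit
        if request_count > 0:
            sleep(delay)
        results[game_id] = fetch(game_id)
    return results


# Demo with a fake fetch function and a no-op sleep, so nothing is requested or delayed
requested = []

def fake_fetch(game_id):
    requested.append(game_id)
    return {'gameId': game_id}

results = fetch_new_games(
    completed_ids=['10815', '10816', '10817'],
    stored_ids=['10815'],
    fetch=fake_fetch,
    limit=1,
    sleep=lambda seconds: None,
)
print(list(results))  # ['10816']
```

With `limit=1`, only the first pending game is requested. The remaining game would be picked up on the next run, once the first game's stats have been stored.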

In [106]:
NBA_GameStats

Unnamed: 0,gameId,playerId,teamId,Points,Position,Minutes,FGM,FGA,FGP,FTM,...,TPP,offReb,defReb,totReb,assists,pFouls,steals,turnovers,blocks,plusMinus
0,10815,44,17,6,,15:13,2,4,50.0,2,...,0.0,0,2,2,0,4,0,3,1,-21
1,10815,126,17,14,C,24:55,5,14,35.7,3,...,33.3,1,7,8,2,0,2,0,2,-10
2,10815,286,17,6,,19:46,3,3,100,0,...,0.0,0,7,7,1,2,0,1,1,-9
3,10815,1007,17,11,SG,26:52,5,12,41.7,0,...,20.0,0,3,3,2,1,0,1,0,-15
4,10815,1867,17,11,SF,23:40,3,7,42.9,3,...,50.0,1,2,3,3,1,1,5,0,-1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18320,10103,21,17,16,,20:33,6,11,54.5,0,...,57.1,0,2,2,0,3,0,0,3,16
18321,10103,44,17,3,,5:30,1,3,33.3,0,...,33.3,0,1,1,0,1,0,0,1,3
18322,10103,157,17,6,,5:30,2,2,100,0,...,100,0,1,1,1,0,0,0,0,3
18323,10103,250,17,0,,0:00,0,0,0.0,0,...,0.0,0,0,0,0,0,0,0,0,0


In [112]:
# Save the file as a csv for analysis
NBA_GameStats.to_csv('Data/NBA_GameStats.csv', index=False)

## Upload Data to SQLite

In [17]:
# Connect to SQLite
conn = sqlite3.connect('Data/PropAnalysis.db')

# Create the connection to the SQLite database
engine = create_engine('sqlite:///Data/PropAnalysis.db', echo=True)
sqlite_connection = engine.connect()

# Save bet_offers_df dataframe to SQLite
sqlite_table = "bet_offers_df"
bet_offers_df.to_sql(sqlite_table, sqlite_connection, if_exists='replace')

# Save NBA_Teams dataframe to SQLite
sqlite_table = "NBA_Teams"
NBA_Teams.to_sql(sqlite_table, sqlite_connection, if_exists='replace')

# Save NBA_Games dataframe to SQLite
sqlite_table = "NBA_Games"
NBA_Games.to_sql(sqlite_table, sqlite_connection, if_exists='replace')

# Save NBA_Players dataframe to SQLite
sqlite_table = "NBA_Players"
NBA_Players.to_sql(sqlite_table, sqlite_connection, if_exists='replace')

# Save NBA_GameStats dataframe to SQLite
sqlite_table = "NBA_GameStats"
NBA_GameStats.to_sql(sqlite_table, sqlite_connection, if_exists='replace')

# Close connections when done
conn.close()
sqlite_connection.close()

2022-01-15 11:59:10,599 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("NBA_Players")
2022-01-15 11:59:10,600 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-01-15 11:59:10,602 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("NBA_Players")
2022-01-15 11:59:10,603 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-01-15 11:59:10,604 INFO sqlalchemy.engine.Engine SELECT name FROM sqlite_master WHERE type='table' ORDER BY name
2022-01-15 11:59:10,605 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-01-15 11:59:10,606 INFO sqlalchemy.engine.Engine PRAGMA main.table_xinfo("NBA_Players")
2022-01-15 11:59:10,607 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-01-15 11:59:10,609 INFO sqlalchemy.engine.Engine SELECT sql FROM  (SELECT * FROM sqlite_master UNION ALL   SELECT * FROM sqlite_temp_master) WHERE name = ? AND type = 'table'
2022-01-15 11:59:10,610 INFO sqlalchemy.engine.Engine [raw sql] ('NBA_Players',)
2022-01-15 11:59:10,611 INFO sqlalchemy.engine.Engine PRAGMA main.foreign_ke