## Betting Lines Scrape

This notebook scrapes sportsbookreview.com for betting lines (both totals and spreads) for each game in the 2014-2018 NBA seasons. Starting from the game data previously collected from basketball-reference.com, I take the dates on which games were played, query the site once per date, and collect that day's betting lines. The resulting historical data is saved to a JSON file.

In [1]:
import json
import requests
from bs4 import BeautifulSoup
import time
import csv
import pandas as pd

#### dataframe_loader: 

- returns a DataFrame containing one year's games, built from a list of JSON files that each hold one team's games for the season

In [2]:
def dataframe_loader(years_games):
    """Load one season's per-team JSON files into a single DataFrame."""
    years_stats = []
    for path in years_games:
        with open(path) as f:
            years_stats.append(json.load(f))
    # Flatten file -> games -> team rows into one list of team rows.
    all_games_year = [team for game_list in years_stats for game in game_list for team in game]
    return pd.DataFrame(all_games_year)
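To make the flattening step in `dataframe_loader` concrete, here is a small sketch on toy data. It assumes each loaded file is a list of games and each game is a list of team rows (one for the away team, one for the home team), which matches the alternating away/home rows in the output further down. The values are made up.

```python
# Toy stand-in for two loaded files: file -> games -> team rows (made-up values).
years_stats = [
    [[["g1", "TOR", "away"], ["g1", "ATL", "home"]]],  # file 1: one game
    [[["g2", "ORL", "away"], ["g2", "ATL", "home"]]],  # file 2: one game
]

# Same triple-nested comprehension as in dataframe_loader.
rows = [team for game_list in years_stats for game in game_list for team in game]
print(len(rows))  # 4
```

Each element of `rows` is one team's line for one game, which is why every game ID appears twice in the resulting DataFrame.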

In [3]:
gl_2014 = !ls *_2014.json
gl_2015 = !ls *_2015.json
gl_2016 = !ls *_2016.json
gl_2017 = !ls *_2017.json
gl_2018 = !ls *_2018.json
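The `!ls` shell escape only works inside IPython or Jupyter. A rough plain-Python equivalent, assuming the per-team files follow the same `*_<year>.json` naming pattern, might look like the sketch below. The helper name `game_files_for_year` and the demo file names are hypothetical.

```python
import glob
import os
import tempfile

def game_files_for_year(year, directory="."):
    """Return the sorted paths of files named like '<anything>_<year>.json'."""
    pattern = os.path.join(directory, f"*_{year}.json")
    return sorted(glob.glob(pattern))

# Demonstration on throwaway files (the file names here are made up).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("ATL_2014.json", "BOS_2014.json", "ATL_2015.json"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in game_files_for_year(2014, tmp)]

print(found)
```

The returned list can be passed to `dataframe_loader` in the same way as the output of `!ls`.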

In [4]:
df_2014 = dataframe_loader(gl_2014)
df_2015 = dataframe_loader(gl_2015)
df_2016 = dataframe_loader(gl_2016)
df_2017 = dataframe_loader(gl_2017)
df_2018 = dataframe_loader(gl_2018)



In [7]:
# Stack the five seasons into one DataFrame with a fresh 0..n-1 index.
df_all = pd.concat([df_2014, df_2015, df_2016, df_2017, df_2018], ignore_index=True)

In [9]:
# The game ID in column 0 starts with the game date as YYYYMMDD.
df_all['date'] = df_all[0].str[:8]
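The slice works because the game IDs begin with the date in `YYYYMMDD` form (for example `201311010ATL`). A minimal sketch of the same extraction with a sanity check added is shown below. The helper name `game_date` is hypothetical, and the validation step is an addition, not part of the original notebook.

```python
from datetime import datetime

def game_date(game_id):
    """Return the YYYYMMDD prefix of a game ID such as '201311010ATL'."""
    date_str = game_id[:8]
    # strptime raises ValueError if the prefix is not a real calendar date.
    datetime.strptime(date_str, "%Y%m%d")
    return date_str

print(game_date("201311010ATL"))  # 20131101
```

A check like this would surface malformed IDs before they are used to build query URLs.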

In [13]:
df_all.head()

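|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 201311010ATL | TOR | away | 240 | 40 | 88 | 0.455 | 7 | 23 | 0.304 | ... | 10 | 32 | 42 | 15 | 5 | 6 | 12 | 25 | 95 | 20131101 |
| 1 | 201311010ATL | ATL | home | 240 | 36 | 77 | 0.468 | 10 | 23 | 0.435 | ... | 7 | 30 | 37 | 24 | 7 | 3 | 13 | 12 | 102 | 20131101 |
| 2 | 201311090ATL | ORL | away | 240 | 38 | 82 | 0.463 | 8 | 17 | 0.471 | ... | 9 | 27 | 36 | 22 | 9 | 2 | 17 | 24 | 94 | 20131109 |
| 3 | 201311090ATL | ATL | home | 240 | 43 | 90 | 0.478 | 6 | 19 | 0.316 | ... | 7 | 32 | 39 | 36 | 9 | 6 | 12 | 15 | 104 | 20131109 |
| 4 | 201311130ATL | NYK | away | 240 | 37 | 88 | 0.42 | 12 | 34 | 0.353 | ... | 8 | 29 | 37 | 24 | 11 | 3 | 3 | 19 | 95 | 20131113 |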


In [11]:
date_list = df_all['date'].unique().tolist()

In [12]:
len(date_list)

816

#### get_betting_lines: 
- takes a date (a YYYYMMDD string), queries sportsbookreview.com for that day's page, and returns the day's betting lines paired with team names

In [62]:
def get_betting_lines(date):
    """Return a list of (date, team, betting line) tuples for one day."""
    response = requests.get(f'https://www.sportsbookreview.com/betting-odds/nba-basketball/merged/?date={date}')
    time.sleep(2)  # pause between requests to avoid hammering the site
    betting_page = BeautifulSoup(response.text, 'html.parser')
    # Team names, in page order.
    teams_list = [row.text for row in betting_page.find_all('div', {'class': 'eventLine-value'})]
    # Betting lines from the first matching book column of each 'holder-complete' event block.
    betting_lines = []
    for item in betting_page.find_all('div', {'class': 'event-holder holder-complete'}):
        for line in item.find('div', {'class': 'el-div eventLine-book'}):
            betting_lines.append(line.text)
    # Replace non-breaking spaces with ordinary spaces.
    betting_lines = [line.replace('\xa0', ' ') for line in betting_lines]

    # zip stops at the shortest input, so any unmatched teams or lines are dropped.
    dates = [date] * len(betting_lines)
    return list(zip(dates, teams_list, betting_lines))
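One detail of `get_betting_lines` worth noting: `zip` stops at the shortest of its inputs, so if the page yields a different number of team names than betting lines, the extra entries are dropped without any error. A minimal sketch of a stricter pairing is shown below. The helper `pair_strict` is hypothetical, and the team names and lines are made up for illustration.

```python
def pair_strict(date, teams, lines):
    """Pair teams with lines, raising if the two lists differ in length."""
    if len(teams) != len(lines):
        raise ValueError(
            f"{date}: {len(teams)} teams but {len(lines)} betting lines"
        )
    return [(date, team, line) for team, line in zip(teams, lines)]

# Made-up values, for illustration only.
teams = ["Toronto", "Atlanta"]
lines = ["+4 -110", "-4 -110"]
pairs = pair_strict("20131101", teams, lines)
print(pairs)
```

On Python 3.10 and later, `zip(teams, lines, strict=True)` provides a similar built-in check.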

In [65]:
all_lines = []
for date in date_list:
    all_lines.append(get_betting_lines(date))
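As written, a single failed request would stop this loop partway through the 816 dates. One possible way to keep going and record which dates failed is sketched below. The names `collect_lines` and `fake_fetch` are hypothetical; in the notebook, `get_betting_lines` would be passed in place of the stand-in fetcher.

```python
def collect_lines(dates, fetch):
    """Call fetch(date) for each date; return (results, failed_dates)."""
    results, failed = [], []
    for date in dates:
        try:
            results.append(fetch(date))
        except Exception as exc:  # broad on purpose: keep the crawl going
            print(f"{date}: {exc}")
            failed.append(date)
    return results, failed

# Stand-in fetcher for illustration only; it simulates one failing date.
def fake_fetch(date):
    if date == "20131102":
        raise RuntimeError("simulated network error")
    return [(date, "Team", "line")]

results, failed = collect_lines(["20131101", "20131102"], fake_fetch)
```

The `failed` list can then be retried separately once the main pass has finished.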
    

In [72]:
with open('all_gambling_lines.json', 'w') as f:
    json.dump(all_lines, f)
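When the file is read back, the structure differs slightly from `all_lines`: JSON has no tuple type, so each `(date, team, line)` tuple is restored as a list. The sketch below shows the round trip and one way to flatten the per-date lists into a single list of records. The sample values are made up.

```python
import json

# Same nesting as all_lines: one list per date, each holding (date, team, line) tuples.
# The values are made up for illustration.
sample = [[("20131101", "Toronto", "+4 -110")], []]

restored = json.loads(json.dumps(sample))
# Tuples come back as lists, and dates with no scraped lines stay as empty lists.
flat = [tuple(entry) for day in restored for entry in day]
print(flat)
```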