# NBA Game Ratings
#### Developed by Dan McDonough
#### February 8, 2021

The purpose of this notebook is to answer the question: "What's the best NBA game on TV right now?"

I often asked myself this question during grad school when I would throw on some league pass in the background while completing work.

I hope you find this notebook helpful in answering this question or offering ideas for pulling together NBA data.

## 1. Simulate Model of Random Game Scenarios

To start, I needed a way to rate games based on their watchability. The approach I decided on was to simulate various game scenarios and then manually rate them to create a labeled data set to model. I choose this route rather than developing a rule set, because I wanted the model to attempt to learn my subjective preferences and predict them for future game situations.

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import random
from scipy.stats import binom
from scipy.stats import norm
import statsmodels.api as sm
import warnings
warnings.filterwarnings("ignore")
import time
from IPython.display import clear_output
import datetime

I choose to simulate game scenarios with various time remaining, scoring margins, and quality of matchup. What I envisioned for matchup grades are:

A - Both teams are playoff teams
<br> B - One team is a playoff team, the other is a playoff contender
<br> C - Both teams are playoff contenders
<br> D - One team is a playoff contender the other is tanking
<br> E - Both teams are tanking

In [2]:
np.random.seed(0)
minutes = np.random.randint(0,48,100)

# I used the mean and standard deviation of historical point spread data to simulate lead
np.random.seed(0)
lead = abs(np.random.normal(3.09,6.43,100).round())

# I wanted to express matchup as an ordinal scale rather than an interval scale so I made this a categorical variable
np.random.seed(0)
matchup_dict = {1:'A',2:'B',3:'C',4:'D',5:'E'}
matchups = pd.Series(np.random.randint(1,6,100)).apply(lambda x: matchup_dict[x])

sim_df = pd.DataFrame({'Matchup':matchups,'Minutes':minutes,'Lead':lead})

For my rankings, these are admittedly (and intentionally) subjective. Here is what I had in mind with my ratings:

5 - You should probably stop second-screening and watch the game
<br> 4 - This is a good game worth watching
<br> 3 - Ok
<br> 2 - Not great
<br> 1 - This really isn't worth having on, go throw on some music instead

In [3]:
ratings = pd.Series([
1,3,3,1,1,3,3,4,2,4,
4,2,4,4,5,5,2,3,4,1,
1,4,1,4,1,3,5,4,1,1,
3,3,5,3,3,4,2,1,2,2,
3,1,3,2,4,1,5,1,3,2,
4,3,5,5,3,5,3,3,2,2,
5,1,3,1,5,5,3,1,1,1,
3,1,1,1,5,2,4,3,1,4,
5,4,5,5,1,1,4,3,1,3,
4,1,3,3,4,4,4,2,3,2])

sim_df = pd.concat([sim_df,ratings],axis=1)
sim_df.columns = ['Matchup','Minutes','Lead','Rating']


def model_prep(model_df, train_df=None):
    # add an interaction term between minutes and lead
    model_df['Min_Lead'] = model_df['Minutes'] * model_df['Lead']

    # dummy matchups and normalize numeric variables
    if train_df is None:
        model_df = pd.concat([pd.get_dummies(model_df['Matchup']),model_df[['Minutes','Lead','Min_Lead']]],axis=1)
        for column in ['Minutes','Lead','Min_Lead']:
            model_df[column] = (model_df[column] - model_df[column].mean())/model_df[column].std()
            
    else:
        # need alternate way to dummy matchups
        dummy_df = pd.get_dummies(model_df['Matchup'])
        for letter in ['A','B','C','D','E']:
            if letter in dummy_df.columns:
                pass
            else:
                dummy_df[letter] = 0
        
        
        model_df = pd.concat([dummy_df,model_df[['Minutes','Lead','Min_Lead']]],axis=1)
        for column in ['Minutes','Lead','Min_Lead']:
            model_df[column] = (model_df[column] - train_df[column].mean())/train_df[column].std()
            
    return model_df

sim_df_x = model_prep(sim_df)

Adding an interaction term between minutes and lead helped improve adjusted R-squared

In [4]:
lm = sm.OLS(sim_df['Rating'],sm.add_constant(sim_df_x)).fit()
print(lm.summary())

# Calculating min/max of predicted train values so I can later normalize predictions to a ten-point scale
maxval = lm.predict().max()
minval = lm.predict().min()

                            OLS Regression Results                            
Dep. Variable:                 Rating   R-squared:                       0.783
Model:                            OLS   Adj. R-squared:                  0.766
Method:                 Least Squares   F-statistic:                     47.30
Date:                Mon, 08 Feb 2021   Prob (F-statistic):           8.57e-28
Time:                        21:03:43   Log-Likelihood:                -98.928
No. Observations:                 100   AIC:                             213.9
Df Residuals:                      92   BIC:                             234.7
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.3798      0.057     41.893      0.0

## 2. Compile Team Data

In order to grade the quality of the matchup, I need to pull current standings data to assess playoff probabilities.

In [5]:
# pull standings from basketball reference
url = "https://www.basketball-reference.com/leagues/NBA_2021_standings.html"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# specify total games this season so that I can calculate games remaining per team
totalgames = 72

table_ids = {'east':'confs_standings_E','west':'confs_standings_W'}

def standings_frame(soup, totalgames, idval):
    table = soup.findAll(id=idval)[0]
    teams = table.findAll('a')
    teamlist = []
    for i in range(len(teams)):
        teamlist.append(str(teams[i]).split(">")[1].split("<")[0])
    
    data = table.findAll('td')
    winlist = []
    losslist = []
    winpctlist = []
    [winlist.append(int(str(data[x*7]).split(">")[1].split("<")[0])) for x in range(15)]
    [losslist.append(int(str(data[1+x*7]).split(">")[1].split("<")[0])) for x in range(15)]
    [winpctlist.append(float(str(data[2+x*7]).split(">")[1].split("<")[0])) for x in range(15)]

    conf_frame = pd.DataFrame({'team':teamlist,'win':winlist,'loss':losslist,'win_pct':winpctlist})
    conf_frame['team_short'] = conf_frame['team'].apply(lambda x: x.split(' ')[-1])
    
    # adjust for two-word name for portland
    conf_frame['team_short'] = conf_frame['team_short'].apply(lambda x: 'Trail Blazers' if x == 'Blazers' else x)
    conf_frame['remaining'] = totalgames - (conf_frame['win'] + conf_frame['loss'])
    
    return conf_frame

east_frame = standings_frame(soup, totalgames, idval=table_ids['east'])
west_frame = standings_frame(soup, totalgames, idval=table_ids['west'])

Next, I want to predict the probability of making the playoffs. I've chosen to use a very simple heuristic of calculating the binomial probabilty of winning a certain number of games, using the season-to-date win percentage as the true probability.

In [6]:
def calculate_probabilities(conf_frame, totalgames, direction):
    # the purpose of this function is to solve for the number of games need to make the playoffs
    
    # initialize variables
    if direction == 'up':
        wins_needed = 0
    else:
        wins_needed = int(totalgames)
        
    current_gap = 0
    while (True):
        # calculate required number of wins to make the playoffs
        conf_frame['required'] = wins_needed-conf_frame['win']    
        
        # floor at zero for teams that have already made it
        conf_frame['required'].apply(lambda x: max(x,0))
        
        # set probabilities to zero for each iteration
        conf_frame['prob'] = 0

        # calculate probabilities at current iteration
        for i in range(len(conf_frame)):
            conf_frame.iloc[i,7] = (1-binom.cdf(conf_frame.iloc[i]['required']-1,\
                                                conf_frame.iloc[i]['remaining'],conf_frame.iloc[i]['win_pct'])).round(2)

        # calculate how far off the sum of probabilities is from the target of 800%
        prior_gap = int(current_gap)
        current_gap = 8 - conf_frame['prob'].sum()

        # increment games needed until you reach a stopping point
        if direction == 'up':
            if current_gap < 0:
                wins_needed += 1
            else:
                break
                
        else:
            if current_gap > 0:
                wins_needed -= 1
            else:
                break

            
    return current_gap, conf_frame

In [7]:
def setup_prob(conf_frame, totalgames):
    # calculate incrementing both upwards and downwards and choose the better solution
    
    current_gap_up, frame_up = calculate_probabilities(conf_frame, totalgames, 'up')
    current_gap_down, frame_down = calculate_probabilities(conf_frame, totalgames, 'down')

    if current_gap_up <= current_gap_down:
        return frame_up
    else:
        return frame_down

In [8]:
combined_frame = pd.concat([setup_prob(east_frame, totalgames),setup_prob(west_frame, totalgames)],axis=0)

In [9]:
combined_frame

Unnamed: 0,team,win,loss,win_pct,team_short,remaining,required,prob
0,Philadelphia 76ers,17,7,0.708,76ers,48,15,1.0
1,Milwaukee Bucks,15,8,0.652,Bucks,49,17,1.0
2,Brooklyn Nets,14,11,0.56,Nets,47,18,1.0
3,Boston Celtics,12,10,0.545,Celtics,50,20,0.99
4,Indiana Pacers,12,12,0.5,Pacers,48,20,0.9
5,Atlanta Hawks,11,12,0.478,Hawks,49,21,0.8
6,Charlotte Hornets,11,13,0.458,Hornets,48,21,0.66
7,New York Knicks,11,14,0.44,Knicks,47,21,0.52
8,Toronto Raptors,10,13,0.435,Raptors,49,22,0.48
9,Cleveland Cavaliers,10,14,0.417,Cavaliers,48,22,0.33


## 3. Scrape Scoreboard

Next up I need to scrape the current NBA scoreboard.

In [10]:
def pull_scoreboard():
    url = "https://www.cbssports.com/nba/scoreboard/"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    games = soup.find_all(attrs={"class": "live-update"})
    return games

In [11]:
def parse_current_time(game):
    # returns number of minutes remaining in the game
    
    current_time = game.find_all(attrs={"class": "game-status emphasis"})

    if str(current_time[0]).split('<span>')[0].split('\n')[1].strip() == 'Halftime':
        final_minutes = 24
    elif str(current_time[0]).split('<span>')[0].split('\n')[1].strip().split(' ')[0] == 'End':
        quarter = int(str(current_time[0]).split('<span>')[0].split('\n')[1].strip().split(' ')[1])
        final_minutes = (4-quarter )*12
    else:
        span_split = str(current_time[0]).split('<span>')
        quarter = int(span_split[0][-1])
        span_split2 = str(current_time[0]).split('/span> ')
        time_split = span_split2[1].split(':')
        if len(time_split)==1:
            minutes = 0
            seconds = int(span_split2[1].split('.')[0])
        else:
            minutes = int(time_split[0])
            seconds = int(time_split[1].split('\n')[0])

        final_minutes = (4-quarter)*12 + minutes + round((seconds/60))
    return final_minutes

In [12]:
def parse_score(game):
    # returns team scores and size of lead
    
    away_score = int(game.findAll(attrs={'class':'in-progress-table'})[0].findAll('td')[5].string)
    home_score = int(game.findAll(attrs={'class':'in-progress-table'})[0].findAll('td')[11].string)
    lead = abs(away_score - home_score)
    return away_score, home_score, lead

In [13]:
def parse_channel(game):
    # returns channel for national broadcasts or 'league pass"
    
    channel = game.find_all(attrs={"class": "broadcaster"})[0].string.split('\n')[1].strip()
    if len(channel) > 0:
        if channel == 'NBAt':
            return "NBA TV"
        else:
            return channel
    else:
        return "League Pass"

In [14]:
def create_score_frame(games, combined_frame):
    # parse html into dataframe
    
    score_minutes = []
    away_scores = []
    home_scores = []
    score_leads = []
    away_teams = []
    home_teams = []
    away_probs = []
    home_probs = []
    status = []
    channels = []
    tipoffs = []

    for game in games:
            current_time = game.find_all(attrs={"class": "game-status emphasis"})
            if len(current_time) > 0:
                score_minutes.append(parse_current_time(game))
                away_score, home_score, lead = parse_score(game)
                away_scores.append(away_score)
                home_scores.append(home_score)
                score_leads.append(lead)
                status.append('Current')
                tipoffs.append('NA')

            else:
                pre_time = game.find_all(attrs={"class": "game-status pregame-date"})

                if len(pre_time) > 0:
                    tipoffs.append(str(pre_time[0].string).split('\n')[1].strip())
                    score_minutes.append(48)
                    away_scores.append(0)
                    home_scores.append(0)
                    score_leads.append(0)
                    status.append('Pre')
                else:
                    score_minutes.append(0)
                    away_score, home_score, lead = parse_score(game)
                    away_scores.append(away_score)
                    home_scores.append(home_score)
                    score_leads.append(lead)
                    status.append('Post')
                    tipoffs.append('NA')
            away_team = game.find_all(attrs={"class": "in-progress-table"})[0].find_all("a")[1].string
            away_teams.append(away_team)
            home_team = game.find_all(attrs={"class": "in-progress-table"})[0].find_all("a")[3].string
            home_teams.append(home_team)

            away_probs.append(combined_frame[combined_frame['team_short']==away_team]['prob'].values[0])
            home_probs.append(combined_frame[combined_frame['team_short']==home_team]['prob'].values[0])
            channels.append(parse_channel(game))

    game_df = pd.DataFrame({'Status':status,'Minutes':score_minutes,'Lead':score_leads,'Away Team':away_teams,'Home Team':home_teams,\
                            'Away Prob':away_probs,'Home Prob':home_probs,'Away Score':away_scores,'Home Score':home_scores,\
                           'Channel':channels,'Tipoff':tipoffs})
    
    return game_df

After scraping the scoreboard I need to score the matchups. I've applied the rule set below to match the original descriptions I described in the simulated game ratings.

In [15]:
def matchup(x):
    away_prob = x['Away Prob']
    home_prob = x['Home Prob']
    if min(away_prob,home_prob) >= 0.9:
        return 'A'
    elif ((max(away_prob,home_prob) >= 0.9) & (min(away_prob,home_prob) < 0.9) & (min(away_prob,home_prob) >= .05)):
        return 'B'
    elif ((max(away_prob,home_prob) < .9) & (min(away_prob,home_prob) >= .05)):
        return 'C'
    elif ((max(away_prob,home_prob) >= .05) & (min(away_prob,home_prob) < .05)):
        return 'D'
    elif (max(away_prob,home_prob) < .05):
        return 'E'

## 4. Rate Games

In [16]:
def score_games(game_df, sim_df, lm):
    # rate games
    
    game_df['Matchup'] = game_df.apply(matchup, axis=1)

    score_df = model_prep(game_df, sim_df)
    game_df['Rating'] = lm.predict(sm.add_constant(score_df, has_constant='add'))
    return game_df

def adjust_rating(x):
    # normalize ratings to ten-point scale
    
    return round((x-minval)/(maxval-minval)*10,1)

def display_frame(game_df, pre=False):
    # create output dataframe format

    if pre==False:
        game_df.sort_values(by='Rating', ascending=False, inplace=True)
        return_df = game_df[['Matchup','Minutes','Away Team','Home Team','Away Score',\
                                                      'Home Score','Channel','Rating']]
        return_df.columns = ['Matchup','Min Left','Away Team','Home Team','Away Score','Home Score','Channel','Rating']
        return return_df
    else:
        return_df = game_df[['Matchup','Tipoff','Away Team','Home Team','Channel','Rating']]
        return return_df

## 5. Refresh Scoreboard

In [17]:
def final_scoreboard(sim_df, combined_frame, lm):
    # scrape scoreboard data, rate games, and format dataframe for display

    games = pull_scoreboard()
    game_df = create_score_frame(games, combined_frame)


    # post games
    current_df = game_df[game_df['Status']=='Current'].copy()
    if len(current_df) > 0:
        current_df = score_games(current_df, sim_df, lm)
        current_df['Rating'] = current_df['Rating'].apply(adjust_rating)
        current_df = display_frame(current_df)

    # pre games
    pre_df = game_df[game_df['Status']=='Pre'].copy()
    if len(pre_df) > 0:
        pre_df['Matchup'] = pre_df.apply(matchup, axis=1)
        pre_df = score_games(pre_df, sim_df, lm)
        pre_df['Rating'] = pre_df['Rating'].apply(adjust_rating)
        pre_df = display_frame(pre_df, True)

    # post games
    post_df = game_df[game_df['Status']=='Post'].copy()
    if len(post_df) > 0:
        post_df = score_games(post_df, sim_df, lm)
        post_df['Rating'] = post_df['Rating'].apply(adjust_rating)
        post_df = display_frame(post_df)

    now = datetime.datetime.now()
    print("Last Updated At: ",now.strftime('%I:%M'),'\n')

    if len(current_df) > 0:
        print('Current Games:')
        display(current_df)

    if len(pre_df) > 0:
        print('\nUpcoming Games:')
        display(pre_df)

    if len(post_df) > 0:
        print('\nFinished Games:')
        display(post_df)

In [18]:
def run_scoreboard(sim_df, combined_frame,lm ):
    # ask user to specify how often they want to refresh the scoreboard and run continuously
    
    i = 0
    while (True):
        i += 1
        if i <= 1:
            refresh_rate = int(input("Home often (in minutes) do you want to refresh?"))
        if i > 1:
            time.sleep(refresh_rate*60)
        clear_output()
        final_scoreboard(sim_df, combined_frame, lm)

In [19]:
run_scoreboard(sim_df, combined_frame, lm)

Last Updated At:  09:03 

Current Games:


Unnamed: 0,Matchup,Min Left,Away Team,Home Team,Away Score,Home Score,Channel,Rating
1,C,10,Raptors,Grizzlies,101,97,League Pass,7.9
3,C,20,Warriors,Spurs,66,67,League Pass,7.6
4,B,24,Cavaliers,Suns,61,64,League Pass,6.7
0,D,9,Wizards,Bulls,87,80,League Pass,6.3
5,B,36,Bucks,Nuggets,37,42,NBA TV,5.1
2,E,20,Timberwolves,Mavericks,61,75,League Pass,1.0



Upcoming Games:


Unnamed: 0,Matchup,Tipoff,Away Team,Home Team,Channel,Rating
7,B,10:00pm,Thunder,Lakers,League Pass,4.1



Finished Games:


Unnamed: 0,Matchup,Min Left,Away Team,Home Team,Away Score,Home Score,Channel,Rating
6,C,0,Rockets,Hornets,94,119,League Pass,1.1


KeyboardInterrupt: 