# NBA Win Probability Model

Win probability models have become more popular in recent years. [inpredictable.com](https://stats.inpredictable.com/nba/wpCalc.php) hosts a well-documented NBA win probability model, and ESPN and other outlets publish comparable models. Michael Beuoy of `inpredictable.com` wrote [an article comparing the performance of his model and ESPN's](http://www.inpredictable.com/2018/01/judging-win-probability-models.html).

From a basketball perspective, a win probability model can inform on-court decision-making, especially if teams have some sense of how plays, tactics, or strategies affect win probability. The following analysis walks through how to construct a win probability model from NBA play-by-play data using the PlayByPlay class from the [py_ball](https://github.com/basketballrelativity/py_ball) Python package. Since the `inpredictable.com` model uses game time, point differential, possession, and Vegas point spread, this model will leverage the same inputs.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Arc
import itertools
from requests import get
import json

from PIL import Image
import time

from py_ball import playbyplay

HEADERS = {'Connection': 'close',
           'Host': 'stats.nba.com',
           'Origin': 'http://stats.nba.com',
           'Upgrade-Insecure-Requests': '1',
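           # Adjacent string literals are concatenated; each piece ends with a
           # space so the browser tokens stay separated
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/66.0.3359.117 Safari/537.36'}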

pd.options.mode.chained_assignment = None  # Disable the pandas SettingWithCopyWarning

In [2]:
game_id = '0021800749'
plays = playbyplay.PlayByPlay(headers=HEADERS,
                              endpoint='playbyplayv2',
                              game_id=game_id)

play_df = pd.DataFrame(plays.data['PlayByPlay'])
play_df.head(35)

Unnamed: 0,EVENTMSGACTIONTYPE,EVENTMSGTYPE,EVENTNUM,GAME_ID,HOMEDESCRIPTION,NEUTRALDESCRIPTION,PCTIMESTRING,PERIOD,PERSON1TYPE,PERSON2TYPE,...,PLAYER3_ID,PLAYER3_NAME,PLAYER3_TEAM_ABBREVIATION,PLAYER3_TEAM_CITY,PLAYER3_TEAM_ID,PLAYER3_TEAM_NICKNAME,SCORE,SCOREMARGIN,VISITORDESCRIPTION,WCTIMESTRING
0,0,12,2,21800749,,,12:00,1,0,0,...,0,,,,,,,,,7:11 PM
1,0,10,4,21800749,Jump Ball Zizic vs. Bryant: Tip to Green,,12:00,1,4,5,...,201145,Jeff Green,WAS,Washington,1610613000.0,Wizards,,,,7:11 PM
2,58,1,7,21800749,,,11:43,1,5,0,...,0,,,,,,2 - 0,-2,Green 7' Turnaround Hook Shot (2 PTS),7:11 PM
3,98,2,9,21800749,MISS Zizic 2' Cutting Layup Shot,,11:27,1,4,0,...,0,,,,,,,,,7:11 PM
4,0,4,10,21800749,Zizic REBOUND (Off:1 Def:0),,11:26,1,4,0,...,0,,,,,,,,,7:11 PM
5,72,1,11,21800749,Zizic 1' Putback Layup (2 PTS),,11:24,1,4,0,...,0,,,,,,2 - 2,TIE,,7:11 PM
6,6,2,12,21800749,,,10:59,1,5,0,...,0,,,,,,,,MISS Satoransky 1' Driving Layup,7:12 PM
7,0,4,13,21800749,Osman REBOUND (Off:0 Def:1),,10:57,1,4,0,...,0,,,,,,,,,7:12 PM
8,98,2,14,21800749,MISS Osman 1' Cutting Layup Shot,,10:45,1,4,0,...,0,,,,,,,,,7:12 PM
9,0,4,15,21800749,,,10:41,1,5,0,...,0,,,,,,,,Green REBOUND (Off:0 Def:1),7:12 PM
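
As a sketch of how two of the model inputs could be derived from the columns shown above, the snippet below converts `PERIOD` and `PCTIMESTRING` into seconds remaining in regulation and turns `SCOREMARGIN` into a numeric home-team margin. The function name `add_time_and_margin`, the output column names, and the overtime handling (only the time left in the current overtime period is counted) are assumptions for illustration, not necessarily how the rest of this analysis computes them. In the rows above, the margin reads `-2` right after the visiting team scores first, which suggests `SCOREMARGIN` is home score minus visitor score.

```python
import pandas as pd


def add_time_and_margin(play_df):
    """Hypothetical helper: add SECONDS_LEFT and HOME_MARGIN columns."""
    df = play_df.copy()

    # PCTIMESTRING is the period clock in 'MM:SS' format
    clock = df['PCTIMESTRING'].str.split(':', expand=True).astype(int)
    clock_secs = clock[0] * 60 + clock[1]

    # Full 12-minute periods still to be played after the current one
    # (assumption: overtime periods contribute only their own clock)
    periods_left = (4 - df['PERIOD'].astype(int)).clip(lower=0)
    df['SECONDS_LEFT'] = periods_left * 720 + clock_secs

    # SCOREMARGIN is populated only on scoring plays ('TIE' when level),
    # so carry the last known margin forward and start the game at 0
    margin = pd.to_numeric(df['SCOREMARGIN'].replace('TIE', '0'),
                           errors='coerce')
    df['HOME_MARGIN'] = margin.ffill().fillna(0).astype(int)
    return df


# Small sample mirroring the first few events of the game above
sample = pd.DataFrame({'PERIOD': [1, 1, 1, 1],
                       'PCTIMESTRING': ['12:00', '11:43', '11:27', '11:24'],
                       'SCOREMARGIN': [None, '-2', None, 'TIE']})
derived = add_time_and_margin(sample)
print(derived[['SECONDS_LEFT', 'HOME_MARGIN']])
```

Forward-filling the margin keeps every event row usable as a model observation, not only the scoring plays.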


The play-by-play data provides score differential, time remaining, and possession, but it does not include the Vegas point spread. Fortunately, [sportsdatabase.com](http://sportsdatabase.com/) contains point spread information for NBA games. The data can be accessed via an API that accepts queries written in the Sports Data Query Language (SDQL). The following function wraps that API to pull point spread information for a given date.

In [3]:
def sportsdb_api_wrapper(game_date):
    """ API wrapper for pulling data from
    sportsdatabase.com for games on the given date

    @param game_date (str): Date in 'YYYYMMDD' format

    Returns:

        resp_df (DataFrame): DataFrame containing
            metadata from NBA games on the day provided
    """

    # URL-encoded SDQL query: the requested fields, filtered to the given
    # date and to home teams so that each game appears only once
    base_url = "http://api.sportsdatabase.com/nba/query.json?" + \
               "sdql=date%2Cday%2Cseason%2Cteam%2Co:team%2Cpoints%2Co:points" + \
               "%2Crest%2Co:rest%2Cline%2Ctotal%2Covertime%40date%3D{game_date}+and+site%3Dhome" + \
               "&output=json&api_key=guest"
    headers = {'Connection': 'close',
               'Upgrade-Insecure-Requests': '1',
               'User-Agent': HEADERS['User-Agent']}

    api_response = get(base_url.format(game_date=game_date), headers=headers)

    # The body arrives wrapped in a json_callback(...) call with
    # single-quoted strings, so strip the wrapper and tabs, then swap
    # the quotes to obtain valid JSON
    raw_text = api_response.text.replace('json_callback(', '')
    raw_text = raw_text.replace(');\n', '')
    raw_text = raw_text.replace('\t', '')
    raw_text = raw_text.replace("'", '"')

    json_resp = json.loads(raw_text)
    col_names = json_resp['headers']
    values = json_resp['groups'][0]

    # 'columns' holds one list per field, so transpose to get one row per game
    resp_df = pd.DataFrame(values['columns']).T
    resp_df.columns = col_names
    return resp_df
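
The parsing half of the wrapper can be exercised without a network call. The sketch below isolates the clean-up and transpose steps in a hypothetical helper, `parse_sdql_response`, and runs it on a hand-written payload. The payload shape is inferred from the keys the wrapper reads (`headers`, `groups`, `columns`) and is an assumption, as is the choice to return an empty DataFrame when `groups` is missing or empty.

```python
import json

import pandas as pd


def parse_sdql_response(raw_text):
    """Hypothetical helper: turn a raw SDQL response body into a DataFrame."""
    # Same clean-up as the wrapper: drop the callback wrapper and tabs,
    # then convert single quotes to double quotes for json.loads
    cleaned = raw_text.replace('json_callback(', '').replace(');\n', '')
    cleaned = cleaned.replace('\t', '').replace("'", '"')
    payload = json.loads(cleaned)

    # Assumption: a date with no games yields no 'groups' entries
    if not payload.get('groups'):
        return pd.DataFrame(columns=payload.get('headers', []))

    # One list per field, so transpose to one row per game
    frame = pd.DataFrame(payload['groups'][0]['columns']).T
    frame.columns = payload['headers']
    return frame


# Hand-written payload in the assumed response shape (not real API output)
sample_text = ("json_callback({'headers': ['date', 'team', 'o:team', 'line'], "
               "'groups': [{'columns': [[20190129, 20190129], "
               "['Cavaliers', 'Nets'], ['Wizards', 'Bulls'], "
               "[6.5, -6.5]]}]});\n")
parsed = parse_sdql_response(sample_text)
print(parsed)
```

Because the transposed frame mixes strings and numbers, its columns come back with `object` dtype; numeric fields such as `line` may need `pd.to_numeric` before modeling.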

In [4]:
test_json = sportsdb_api_wrapper('20190129')

In [5]:
test_json

Unnamed: 0,date,day,season,team,o:team,points,o:points,rest,o:rest,line,total,overtime
0,20190129,Tuesday,2018,Cavaliers,Wizards,116,113,1,1,6.5,219.0,0
1,20190129,Tuesday,2018,Lakers,Seventysixers,105,121,1,2,8.0,229.5,0
2,20190129,Tuesday,2018,Magic,Thunder,117,126,1,1,5.5,222.0,0
3,20190129,Tuesday,2018,Nets,Bulls,122,117,0,1,-6.5,220.0,0
4,20190129,Tuesday,2018,Pistons,Bucks,105,115,3,1,7.5,218.0,0
5,20190129,Tuesday,2018,Rockets,Pelicans,116,121,1,2,-10.0,234.0,0
6,20190129,Tuesday,2018,Spurs,Suns,126,124,1,1,-13.0,226.0,0
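
Since the query filters on `site=home`, the `team` column holds the home team and `line` is that team's spread. The values above look consistent with the usual betting convention that a negative line marks the favorite (the Spurs at -13.0 against the Suns, for example), though that reading is an inference from the output and not something the API response states. A minimal sketch of looking up the spread for one game follows; `home_spread` is a hypothetical helper, and the small DataFrame reuses two rows from the output above.

```python
import pandas as pd


def home_spread(spread_df, home_team):
    """Hypothetical helper: return the home team's point spread as a float."""
    match = spread_df.loc[spread_df['team'] == home_team, 'line']
    if match.empty:
        raise KeyError('No game found with home team {}'.format(home_team))
    # The API columns may arrive as objects or strings, so cast explicitly
    return float(match.iloc[0])


# Two rows copied from the output above
spreads = pd.DataFrame({'team': ['Cavaliers', 'Spurs'],
                        'o:team': ['Wizards', 'Suns'],
                        'line': [6.5, -13.0]})
cavs_line = home_spread(spreads, 'Cavaliers')
print(cavs_line)
```

Matching on the home team's nickname is one way to attach a spread to the play-by-play game pulled earlier, which appears to be the Wizards at the Cavaliers on this date.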
