<img src="ORLPHX.jpg" style="float: right; width: 40%">

### Improving an estimator for the Over/Under

My current prediction method takes into account offensive rating, defensive rating, and pace of play (see stats.nba.com's [glossary](https://www.nba.com/stats/help/glossary) for how these are defined). However, it does not take into account lineup changes. Here is an example of where this can be an issue: the Phoenix Suns - Orlando Magic over/under was set at 209.5 (now at 212.5) by BetMGM (see [Yahoo Sports](https://sports.yahoo.com/nba/orlando-magic-phoenix-suns-2024111821/)). This is much lower than what my estimator gives.


A quick run-down on terminology:

    - OFF_RATING = POINTS / 100 POSSESSIONS
    - DEF_RATING = POINTS GIVEN UP / 100 POSSESSIONS 
    - PACE       = POSSESSIONS / 48 MINUTES

The reason to scale this way is that some games go into overtime and some teams play faster than others, so we want statistics that account for both. Ratings are quoted *per 100 possessions* because teams average pretty close to 100 possessions per game (usually a bit under). Using these formulas, we can create estimates for game totals.

In [40]:
import pandas as pd
import nba_api.stats.endpoints as e
import warnings
warnings.filterwarnings('ignore')
pd.set_option("display.max_columns",None)
pd.set_option("display.max_rows",None)
pd.options.display.width = 0
pd.options.display.max_colwidth = 100

In [41]:
A = e.LeagueDashTeamStats(season="2024-25",measure_type_detailed_defense="Advanced",date_from_nullable='2024-10-22',date_to_nullable='2024-11-18').get_data_frames()[0]
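# select Orlando (1610612753) and Phoenix (1610612756) by TEAM_ID instead of by row position
A.loc[:,['TEAM_ID','TEAM_NAME','GP','W','L','MIN','OFF_RATING','DEF_RATING','PACE','POSS']].set_index("TEAM_ID").loc[[1610612753,1610612756]]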

| TEAM_ID | TEAM_NAME | GP | W | L | MIN | OFF_RATING | DEF_RATING | PACE | POSS |
|---|---|---|---|---|---|---|---|---|---|
| 1610612753 | Orlando Magic | 14 | 8 | 6 | 672.0 | 108.3 | 103.5 | 98.75 | 1379 |
| 1610612756 | Phoenix Suns | 14 | 9 | 5 | 682.0 | 112.3 | 113.4 | 98.46 | 1400 |


In [47]:
o1,o2,d1,d2,p1,p2 = 108.3,112.3,103.5,113.4,98.75,98.46
p = (p1+p2)/2
o = (o1+o2)/2
d = (d1+d2)/2
estimator = p*(o+d)/100
print("over/under",round(estimator,2))

over/under 215.7
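
One way to read the formula `p*(o+d)/100` is sketched below. It rests on an assumption the cell does not state explicitly: each team's scoring rate is taken as the average of its own offensive rating and its opponent's defensive rating, and both teams play at the averaged pace $p$. The symbols $o_1, o_2, d_1, d_2, p$ match the variable names in the cell.

$$
\text{pts}_1 \approx \frac{p}{100}\cdot\frac{o_1 + d_2}{2},
\qquad
\text{pts}_2 \approx \frac{p}{100}\cdot\frac{o_2 + d_1}{2}
$$

$$
\text{total} = \text{pts}_1 + \text{pts}_2 \approx \frac{p}{100}\left(\frac{o_1 + o_2}{2} + \frac{d_1 + d_2}{2}\right) = \frac{p\,(o + d)}{100}
$$

With $p = 98.605$, $o = 110.3$ and $d = 108.45$ this gives $98.605 \times 218.75 / 100 \approx 215.70$.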



There are many factors that my estimator is not taking into account, one of those being lineup changes.
For the Magic, Banchero has been out since **Oct 31** and Wendell Carter Jr. since **Nov 3**.
For the Suns, Durant has been out since **Nov 8** and Bradley Beal since **Nov 12**.
Let's try to take this into account by changing how I get the offensive ratings, defensive ratings, and paces for each team.
I can adjust the `date_from_nullable` and `date_to_nullable` parameters to pull only more recent data.


In [48]:
A = e.LeagueDashTeamStats(season="2024-25",measure_type_detailed_defense="Advanced",date_from_nullable='2024-11-03',date_to_nullable='2024-11-18').get_data_frames()[0]
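# select Orlando (1610612753) and Phoenix (1610612756) by TEAM_ID instead of by row position
A.loc[:,['TEAM_ID','TEAM_NAME','GP','W','L','MIN','OFF_RATING','DEF_RATING','PACE','POSS']].set_index("TEAM_ID").loc[[1610612753,1610612756]]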

| TEAM_ID | TEAM_NAME | GP | W | L | MIN | OFF_RATING | DEF_RATING | PACE | POSS |
|---|---|---|---|---|---|---|---|---|---|
| 1610612753 | Orlando Magic | 8 | 5 | 3 | 384.0 | 106.2 | 99.4 | 97.25 | 776 |
| 1610612756 | Phoenix Suns | 8 | 4 | 4 | 389.0 | 112.8 | 117.7 | 97.17 | 788 |


In [50]:
o1,o2,d1,d2,p1,p2 = 106.2,112.8,99.4,117.7,97.25,97.17
o = (o1+o2)/2
d = (d1+d2)/2
p = (p1+p2)/2
estimator = p*(o+d)/100
print("over/under",round(estimator,2))

over/under 211.97
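
The two estimator cells above repeat the same arithmetic with different inputs. A minimal sketch of a reusable version follows. The function name `estimate_total` and the dict layout are hypothetical, introduced only for illustration; the numbers are copied from the two tables above.

```python
def estimate_total(team_a, team_b):
    """Estimate a regulation game total from two teams' advanced stats.

    Each team is a dict with OFF_RATING, DEF_RATING and PACE keys.
    Same formula as the cells above: average pace times the sum of the
    average offensive and defensive ratings, divided by 100.
    """
    pace = (team_a["PACE"] + team_b["PACE"]) / 2
    off = (team_a["OFF_RATING"] + team_b["OFF_RATING"]) / 2
    dfn = (team_a["DEF_RATING"] + team_b["DEF_RATING"]) / 2
    return pace * (off + dfn) / 100


# Oct 22 to Nov 18 window (first table)
orl_season = {"OFF_RATING": 108.3, "DEF_RATING": 103.5, "PACE": 98.75}
phx_season = {"OFF_RATING": 112.3, "DEF_RATING": 113.4, "PACE": 98.46}

# Nov 3 to Nov 18 window (second table)
orl_recent = {"OFF_RATING": 106.2, "DEF_RATING": 99.4, "PACE": 97.25}
phx_recent = {"OFF_RATING": 112.8, "DEF_RATING": 117.7, "PACE": 97.17}

full_season_total = estimate_total(orl_season, phx_season)
recent_total = estimate_total(orl_recent, phx_recent)
print("full window:", round(full_season_total, 2))
print("since Nov 3:", round(recent_total, 2))
```

Keeping the inputs in one dict per team makes it harder to mix values from different date windows by accident.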



As you can see, the new estimate is closer to the posted line.
But there are some other factors coming into play here.
One is travel: both teams are traveling west. The Suns played in Minnesota yesterday, and the Magic are coming off a homestand in Orlando.
Traveling all the way to Phoenix, Arizona and changing time zones should affect them, at least a little bit.
So the discrepancy that's left between 211.97 and the line (209.5, now 212.5) could be due to multiple factors.
It could be sharp bettors trying to get in early on the under, or it could be the sportsbooks factoring in time-zone travel and days of rest, among other things.
As an NBA betting nerd, I am excited to find out what actually happens.
If anything, I expect each team to have about 97 possessions during regulation time (both have averaged about that many per 48 minutes since Nov 3).

### Another way to estimate the over/under using OnOffDetails

In [51]:
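def get_on_off(team_id):
    # frames: [0] overall team, [1] splits with each player ON court, [2] splits with each player OFF court
    df = e.TeamPlayerOnOffDetails(team_id=team_id,season='2024-25',measure_type_detailed_defense="Advanced").get_data_frames()
    ON,OFF = df[1],df[2]
    # stack both so COURT_STATUS tells the rows apart
    return pd.concat([ON,OFF])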

PHX_ID = 1610612756
ORL_ID = 1610612753
PHX = get_on_off(PHX_ID)
ORL = get_on_off(ORL_ID)

In [52]:
BANCHERO_ID = 1631094
WENDELL_ID = 1628976
A = ORL['VS_PLAYER_ID'].isin([BANCHERO_ID,WENDELL_ID])
cols = ['TEAM_ABBREVIATION','VS_PLAYER_NAME','COURT_STATUS','MIN','OFF_RATING','DEF_RATING','PACE']
ORL.loc[A,cols]

| | TEAM_ABBREVIATION | VS_PLAYER_NAME | COURT_STATUS | MIN | OFF_RATING | DEF_RATING | PACE |
|---|---|---|---|---|---|---|---|
| 1 | ORL | Banchero, Paolo | On | 182.0 | 115.5 | 107.8 | 101.83 |
| 5 | ORL | Carter Jr., Wendell | On | 158.0 | 115.5 | 105.1 | 101.59 |
| 1 | ORL | Banchero, Paolo | Off | 490.0 | 105.1 | 101.0 | 98.19 |
| 5 | ORL | Carter Jr., Wendell | Off | 514.0 | 105.5 | 102.4 | 98.39 |


The Magic play a lot slower without Banchero in their lineup.

They are averaging 101.83 possessions per 48 minutes (PACE) when he's ON the floor and 98.19 when OFF.

In [53]:
orl_pace = (98.19+98.39)/2
orl_drtg = (101.0+102.4)/2
orl_ortg = (105.1+105.5)/2

In [54]:
KD_ID = 201142
BEAL_ID = 203078
# keep only the on/off rows for Durant and Beal
A = PHX['VS_PLAYER_ID'].isin([KD_ID,BEAL_ID])
cols = ['TEAM_ABBREVIATION','VS_PLAYER_NAME','COURT_STATUS','MIN','OFF_RATING','DEF_RATING','PACE']
PHX.loc[A,cols]

| | TEAM_ABBREVIATION | VS_PLAYER_NAME | COURT_STATUS | MIN | OFF_RATING | DEF_RATING | PACE |
|---|---|---|---|---|---|---|---|
| 1 | PHX | Beal, Bradley | On | 313.0 | 113.2 | 114.5 | 99.81 |
| 6 | PHX | Durant, Kevin | On | 349.0 | 113.7 | 111.0 | 99.5 |
| 1 | PHX | Beal, Bradley | Off | 369.0 | 108.5 | 109.5 | 99.99 |
| 6 | PHX | Durant, Kevin | Off | 333.0 | 109.5 | 114.7 | 98.46 |


Without KD or without Beal, the Suns' PACE barely changes, but their offensive and defensive ratings do. Since *both* players are out, it's a little harder to calculate the pace of play. I am just going to take the average of the ratings from when each of them is off the floor, even though having them both out of the lineup probably makes the Suns a lot worse.

In [55]:
import numpy as np
phx_pace = np.mean([99.99,98.46]) 
phx_drtg = np.mean([109.5,114.7])
phx_ortg = np.mean([108.5,109.5])
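
The hand-typed averages in the cells above can also be computed from the off-court rows directly, which avoids copy errors. The sketch below uses plain dicts as a stand-in for the DataFrame rows; the helper name `off_court_average` is hypothetical, and the values are copied from the two on/off tables above.

```python
from statistics import mean

# Off-court rows copied from the ORL and PHX on/off tables above.
off_rows = [
    {"team": "ORL", "player": "Banchero, Paolo",
     "OFF_RATING": 105.1, "DEF_RATING": 101.0, "PACE": 98.19},
    {"team": "ORL", "player": "Carter Jr., Wendell",
     "OFF_RATING": 105.5, "DEF_RATING": 102.4, "PACE": 98.39},
    {"team": "PHX", "player": "Beal, Bradley",
     "OFF_RATING": 108.5, "DEF_RATING": 109.5, "PACE": 99.99},
    {"team": "PHX", "player": "Durant, Kevin",
     "OFF_RATING": 109.5, "DEF_RATING": 114.7, "PACE": 98.46},
]


def off_court_average(rows, team):
    """Simple (unweighted) mean of each stat over one team's off-court rows."""
    team_rows = [r for r in rows if r["team"] == team]
    stats = ("OFF_RATING", "DEF_RATING", "PACE")
    return {s: mean(r[s] for r in team_rows) for s in stats}


orl = off_court_average(off_rows, "ORL")
phx = off_court_average(off_rows, "PHX")
print({k: round(v, 3) for k, v in orl.items()})
print({k: round(v, 3) for k, v in phx.items()})
```

This is the same unweighted average used in the cells above. It does not weight by minutes, and the two off-court samples for each team overlap, so it is only a rough proxy for the lineup with both players out.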

In [60]:
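# estimator for over/under
pace = np.mean([orl_pace,phx_pace])
o = np.mean([orl_ortg,phx_ortg])
d = np.mean([orl_drtg,phx_drtg])
# NOTE: `p` is the 97.21 pace left over from cell In [50], not the on/off `pace` (98.76) computed here;
# the printed total reflects `p`. With `pace` instead, the same formula gives about 211.39.
estimator = p*(o+d)/100
print("over/under",round(estimator,2))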

over/under 208.08


This estimate is lower. This may be due to small sample size, or...

### One more thing: The possibility of OT

About $6\%$ of games with spreads set near -4.5 go to overtime. This is *not to be ignored*!

Our current estimate for the *over/under* assumes the game is $48$ minutes. This is because 

PACE = POSSESSIONS / 48 minutes  

In [61]:
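# 94% of the time the game ends in regulation (48 min);
# 6% of the time it goes to one 5-minute overtime (53 min), scaling the total by 53/48
ot_estimator = .94*estimator+.06*(53/48)*estimator
print("over/under",round(ot_estimator,2))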

over/under 209.38
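
The blend in the cell above can be wrapped in a small function so the overtime probability is easy to vary. This is a sketch: the function name and parameters are hypothetical, and it assumes at most one 5-minute overtime played at the regulation scoring rate, which is what the 53/48 factor implies.

```python
def adjust_for_overtime(total, p_ot=0.06, regulation_min=48, ot_min=5):
    """Blend a regulation total with a one-overtime total.

    total: estimated points in a 48-minute game
    p_ot:  assumed probability that the game goes to overtime
    """
    ot_total = total * (regulation_min + ot_min) / regulation_min
    return (1 - p_ot) * total + p_ot * ot_total


# Regulation estimate printed by the on/off cell above
adjusted = adjust_for_overtime(208.08)
print("over/under", round(adjusted, 2))
```

Algebraically the adjustment is a flat multiplier of $1 + p_{ot} \cdot 5/48$, about $1.00625$ for a 6% overtime rate.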


My final number is 209.38. Hence, I'll take the under &#x1F600;