## NBA Moneyline Model

Since sportsbook odds are simply implied probabilities, I am building a logistic regression model to "price" the bets myself. By comparing the win probability the model gives a team with the probability implied by the sportsbook's moneyline odds, I can uncover value whenever my probability exceeds the sportsbook's.

#### Libraries

In [None]:
! pip install nba_api scikit-learn

In [2]:
from nba_api.stats.static import teams
from nba_api.stats.endpoints import leaguegamefinder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import pandas as pd

#### Data

I decided to use the following features (input variables) for each opposing team: Net Rating (NRtg), Simple Rating System (SRS), Effective Field Goal Percentage (eFG%), Offensive Rebound Percentage (ORB%), Turnover Percentage (TOV%), and Defensive Rebound Percentage (DRB%), plus whether the game was played at home or away (encoded as 1 for home, 0 for away). The target variable is the game result (win or loss).
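
For reference, these are the standard box-score definitions of the four percentage stats, as I understand them from basketball-reference's glossary (worth double-checking against the site). In the CSV, eFG% is reported as a fraction and the other three on a 0 to 100 scale.

$$
\text{eFG\%} = \frac{\text{FG} + 0.5 \times \text{3P}}{\text{FGA}}
$$

$$
\text{TOV\%} = 100 \times \frac{\text{TOV}}{\text{FGA} + 0.44 \times \text{FTA} + \text{TOV}}
$$

$$
\text{ORB\%} = 100 \times \frac{\text{ORB}}{\text{ORB} + \text{Opp DRB}}
\qquad
\text{DRB\%} = 100 \times \frac{\text{DRB}}{\text{DRB} + \text{Opp ORB}}
$$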

I pulled the data from basketball-reference.com. Here is what it looks like:

In [11]:
df = pd.read_csv('nba_24-25_advanced_team_stats.csv')
team_stats = df[['Team', 'SRS','NRtg', 'OeFG%','OTOV%','OORB%', 'DRB%']].copy()
team_stats['Team'] = team_stats['Team'].apply(lambda x: x.replace('*', ''))
print(team_stats.head())

                     Team    SRS  NRtg  OeFG%  OTOV%  OORB%  DRB%
0   Oklahoma City Thunder  12.70  12.8  0.560   10.3   24.2  74.6
1          Boston Celtics   8.28   9.5  0.561   10.8   25.7  76.0
2     Cleveland Cavaliers   8.81   9.5  0.578   11.6   25.9  74.8
3  Minnesota Timberwolves   5.15   5.1  0.554   13.0   25.8  75.1
4    Los Angeles Clippers   4.84   4.8  0.554   13.4   24.4  77.5


Now, we can start to build up the training and testing dataset. As an avid Bucks fan living in Boston, I will be betting on the Milwaukee Bucks at home in their last regular season matchup against the Boston Celtics as an example for this notebook. I begin by pulling data for every single regular season game.

In [12]:
team_id = '1610612749'  # Milwaukee Bucks team ID, already a string

gamefinder = leaguegamefinder.LeagueGameFinder(team_id_nullable=team_id,
                                        season_type_nullable='Regular Season')
games = gamefinder.get_data_frames()[0]
print(games.head())

  SEASON_ID     TEAM_ID TEAM_ABBREVIATION        TEAM_NAME     GAME_ID  \
0     22025  1610612749               MIL  Milwaukee Bucks  1522500063   
1     22025  1610612749               MIL  Milwaukee Bucks  1522500048   
2     22025  1610612749               MIL  Milwaukee Bucks  1522500030   
3     22025  1610612749               MIL  Milwaukee Bucks  1522500016   
4     22025  1610612749               MIL  Milwaukee Bucks  1522500006   

    GAME_DATE      MATCHUP WL  MIN  PTS  ...  FT_PCT  OREB  DREB  REB  AST  \
0  2025-07-18  MIL vs. MIA  L  201   92  ...   0.810    13    31   44   16   
1  2025-07-16    MIL @ CHI  L  199   96  ...   0.900     9    26   35   23   
2  2025-07-13    MIL @ LAC  L  201   91  ...   0.826     5    24   29   24   
3  2025-07-12  MIL vs. CLE  L  201   83  ...   0.652    12    18   30   16   
4  2025-07-10  MIL vs. DEN  W  199   90  ...   0.565     3    24   27   25   

   STL  BLK  TOV  PF  PLUS_MINUS  
0    9    9   17  17        -1.0  
1    6    6   15

Now, we can filter the last 200 games and grab the opponent, home/away, and result data in chronological order.

In [14]:
#map team abbreviation to mascot
team_dict = {'GSW': 'Warriors', 'CHI': 'Bulls', 'CLE': 'Cavaliers', 
            'ATL': 'Hawks', 'BOS': 'Celtics', 'BKN': 'Nets', 'CHA': 'Hornets', 
            'DAL': 'Mavericks', 'DEN': 'Nuggets', 'DET': 'Pistons', 
            'HOU': 'Rockets', 'IND': 'Pacers', 'LAC': 'Clippers', 
            'LAL': 'Lakers', 'MEM': 'Grizzlies', 'MIA': 'Heat', 'MIL': 'Bucks', 
            'MIN': 'Timberwolves', 'NOP': 'Pelicans', 'NYK': 'Knicks',
            'OKC': 'Thunder', 'ORL': 'Magic', 'PHI': '76ers', 'PHX': 'Suns',
            'POR': 'Blazers', 'SAC': 'Kings', 'SAS': 'Spurs', 
            'TOR': 'Raptors','UTA': 'Jazz', 'WAS': 'Wizards'}

def getOpponent(matchup):
    """Return (opponent abbreviation, home flag) from a MATCHUP string like 'MIL vs. MIA'."""
    if 'vs.' in matchup:
        return matchup.split('vs. ')[1], 1  # home game
    elif '@' in matchup:
        return matchup.split('@ ')[1], 0  # away game
    raise ValueError(f'Unrecognized matchup: {matchup}')

opponents = []
win_loss = []
home_away = []
# games is sorted most recent first, so head(200) holds the last 200 games
for _, row in games.head(200).iterrows():
    opponent, homeAway = getOpponent(row['MATCHUP'])
    opponents.append(team_dict[opponent])
    home_away.append(homeAway)
    win_loss.append(1 if row['WL'] == 'W' else 0)
# the lists are reversed below to put the games in chronological order
game_log = pd.DataFrame(
    {
        'Opponent': opponents[::-1],
        'H1/A0': home_away[::-1],
        'W/L': win_loss[::-1]
    }
)
print(game_log.head())
print(f'Rows: {len(game_log)}')

  Opponent  H1/A0  W/L
0    Magic      1    1
1    76ers      1    0
2  Wizards      0    1
3    Magic      0    1
4     Nets      1    1
Rows: 200
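
As a quick sanity check of the matchup parsing, here is a minimal standalone sketch. The function name `parse_matchup` is hypothetical (it is not part of the notebook or of nba_api), and it assumes `MATCHUP` strings always look like `MIL vs. MIA` (home) or `MIL @ CHI` (away), as in the output above.

```python
def parse_matchup(matchup: str) -> tuple[str, int]:
    """Return (opponent_abbreviation, home_flag) from a MATCHUP string.

    Assumes the two formats seen in the printed data:
    'MIL vs. MIA' (home game) and 'MIL @ CHI' (away game).
    """
    if " vs. " in matchup:
        return matchup.split(" vs. ")[1], 1  # home
    if " @ " in matchup:
        return matchup.split(" @ ")[1], 0    # away
    raise ValueError(f"Unrecognized matchup format: {matchup!r}")


print(parse_matchup("MIL vs. MIA"))  # ('MIA', 1)
print(parse_matchup("MIL @ CHI"))    # ('CHI', 0)
```

Raising an error on an unexpected format makes a bad row fail loudly instead of silently returning `None`.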


Now, we just need to fill in the feature columns using the basketball-reference CSV data loaded above.

In [16]:
# keep only the mascot (last word of the full team name) so it matches game_log['Opponent']
team_stats['Team'] = team_stats['Team'].str.split().str[-1]
nrtg = []
srs = []
eFG = []
off_reb_pct = []
turnover_pct = []
def_reb_pct = []
for opposition in game_log['Opponent']:
    temp = team_stats.loc[team_stats['Team'] == opposition]
    row = temp.iloc[0].to_dict()
    nrtg.append(row['NRtg'])
    srs.append(row['SRS'])
    eFG.append(row['OeFG%'])
    off_reb_pct.append(row['OORB%'])
    turnover_pct.append(row['OTOV%'])
    def_reb_pct.append(row['DRB%'])
features = {'NRtg': nrtg, 'SRS': srs, 'eFG%': eFG, 'ORB%': off_reb_pct,
            'TOV%': turnover_pct, 'DRB%': def_reb_pct}
for key in features:
    game_log[key] = features[key]
print(game_log.head())

  Opponent  H1/A0  W/L  NRtg    SRS   eFG%  ORB%  TOV%  DRB%
0    Magic      1    1  -0.2  -0.70  0.510  25.3  12.9  77.0
1    76ers      1    0  -6.3  -6.29  0.527  23.4  12.3  71.8
2  Wizards      0    1 -12.3 -12.14  0.512  22.7  13.6  71.9
3    Magic      0    1  -0.2  -0.70  0.510  25.3  12.9  77.0
4     Nets      1    1  -7.3  -6.95  0.516  24.5  13.7  75.0
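
The same lookup can also be expressed as a single join, which may be easier to read than building one list per column. This is only a sketch on toy data: `toy_log` and `toy_stats` are hypothetical stand-ins for `game_log` and `team_stats`, with a few values copied from the output above.

```python
import pandas as pd

# Toy stand-ins for game_log and team_stats (values taken from the printed tables)
toy_log = pd.DataFrame(
    {"Opponent": ["Magic", "76ers", "Magic"], "H1/A0": [1, 1, 0], "W/L": [1, 0, 1]}
)
toy_stats = pd.DataFrame(
    {"Team": ["76ers", "Magic"], "NRtg": [-6.3, -0.2], "SRS": [-6.29, -0.70]}
)

# A left join keeps every game in its original order and attaches the opponent's stats.
# validate="many_to_one" raises if a team appears more than once in toy_stats.
merged = toy_log.merge(
    toy_stats, left_on="Opponent", right_on="Team", how="left", validate="many_to_one"
).drop(columns="Team")

print(merged)
```

With a left join, an opponent missing from the stats table shows up as `NaN` rather than raising an `IndexError`, so it is worth checking `merged.isna().any()` afterwards.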


Lastly, we build the single-row input for the game we want to predict: the Bucks at home against the Celtics.

In [35]:
opponent = 'BOS'
prediction_opponent = team_dict[opponent.upper()]
venue = 1 #1 for home, 0 for away

opp_temp = team_stats.loc[team_stats['Team'] == prediction_opponent]
opp_row = opp_temp.iloc[0].to_dict()
predict_data = pd.DataFrame(
    {
        'H1/A0': [venue],
        'NRtg': [opp_row['NRtg']],
        'SRS': [opp_row['SRS']],
        'eFG%': [opp_row['OeFG%']],
        'ORB%': [opp_row['OORB%']],
        'TOV%': [opp_row['OTOV%']],
        'DRB%': [opp_row['DRB%']]
    }
)
print(predict_data)

   H1/A0  NRtg   SRS   eFG%  ORB%  TOV%  DRB%
0      1   9.5  8.28  0.561  25.7  10.8  76.0


#### Training the Model

We will use scikit-learn's train_test_split function to split the data into a training set to fit the model on and a test set to evaluate it. Because the games are in chronological order, shuffle should be set to False so that the model is trained on earlier games and tested on later ones.
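
To make the effect of an unshuffled split concrete, here is a small pure-Python sketch of what I understand `train_test_split(..., test_size=0.2, shuffle=False)` to do: the first 80% of rows become the training set and the last 20% the test set. The helper `chronological_split` is hypothetical and only mirrors that behaviour for illustration.

```python
import math


def chronological_split(rows, test_size=0.2):
    """Split an ordered sequence into (train, test) without shuffling.

    The test set is the final ceil(len(rows) * test_size) rows, so every
    training row comes before every test row.
    """
    n_test = math.ceil(len(rows) * test_size)
    n_train = len(rows) - n_test
    return rows[:n_train], rows[n_train:]


games_in_order = list(range(200))  # stand-in for 200 games, oldest first
train, test = chronological_split(games_in_order)

print(len(train), len(test))  # 160 40
print(train[-1], test[0])     # 159 160
```

With 200 games this gives 160 training games and 40 test games, and no test game precedes a training game.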

In [None]:
feature_cols = ['H1/A0', 'NRtg', 'SRS', 'eFG%', 'ORB%', 'TOV%', 'DRB%']
X = game_log[feature_cols] #features
y = game_log['W/L'] #target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
log_reg = LogisticRegression(C=0.5, solver='saga', max_iter=10**5)
log_reg.fit(X_train, y_train)

train_accuracy = log_reg.score(X_train, y_train)
test_accuracy = log_reg.score(X_test, y_test)

print(f"Training Accuracy: {round(train_accuracy, 3)}")
print(f"Test Accuracy: {round(test_accuracy, 3)}")

Training Accuracy: 0.669
Test Accuracy: 0.65


The model reaches 65% accuracy on the test set. Since home teams historically have had an advantage, we can use picking the home team every time as a baseline, which yields around 55% accuracy, so the model compares favorably. The training accuracy (66.9%) and test accuracy (65%) are also close, which suggests the model is not overfitting badly. Keep in mind that the test set contains only 40 games (20% of 200), so these figures are rough estimates.
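
The 55% figure is a general home-court number, so it can be useful to compute the same baseline on the games actually being evaluated. Below is a minimal sketch. The helper `home_baseline_accuracy` is hypothetical, and the toy inputs are the first five rows of `game_log` printed earlier rather than the real test set.

```python
def home_baseline_accuracy(home_flags, results):
    """Accuracy of always picking the home team.

    From the Bucks' point of view this means predicting a win (1) when they
    are home (flag 1) and a loss (0) when they are away (flag 0).
    """
    correct = sum(1 for h, r in zip(home_flags, results) if h == r)
    return correct / len(results)


# First five rows of game_log: H1/A0 and W/L columns
home = [1, 1, 0, 0, 1]
won = [1, 0, 1, 1, 1]

print(home_baseline_accuracy(home, won))  # 0.4
```

In the notebook, the comparable number for the held-out games would come from calling it as `home_baseline_accuracy(X_test['H1/A0'], y_test)`.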

#### Making the Prediction

In [37]:
result = log_reg.predict_proba(predict_data)
win_prob = result[0, 1]
print(f'Win Probability: {round(win_prob, 3)}')

Win Probability: 0.486


We see that the model gives the Bucks a 48.6% chance of winning at home against the Boston Celtics based on the team's historical performances. 
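
For intuition about where this number comes from: a binary logistic regression turns a weighted sum of the features into a probability with the logistic (sigmoid) function. The sketch below uses made-up coefficients purely for illustration; they are not the fitted values, which live in `log_reg.intercept_` and `log_reg.coef_`.

```python
import math


def win_probability(intercept, coefs, features):
    """Logistic regression probability: sigmoid(intercept + coefs . features)."""
    z = intercept + sum(c * x for c, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for two features: home flag and opponent NRtg
example_intercept = 0.1
example_coefs = [0.4, -0.06]
example_features = [1, 9.5]  # at home, opponent NRtg of 9.5

p = win_probability(example_intercept, example_coefs, example_features)
print(round(p, 3))
```

A positive home coefficient pushes the probability up, while a strong opponent (high NRtg with a negative coefficient) pulls it down. When the weighted sum is exactly zero, the probability is 0.5.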

#### Finding Value against the Sportsbooks

As previously mentioned, sportsbook odds imply a win probability, which is easy to calculate from the moneyline.

For negative odds:
$$
\text{Implied Probability} = \frac{|\text{Odds}|}{|\text{Odds}| + 100}
$$

For positive odds:
$$
\text{Implied Probability} = \frac{100}{\text{Odds} + 100}
$$

BetMGM gave the Milwaukee Bucks +165 odds to win in the matchup from November 2024, which implies a win probability of $100 / (165 + 100) \approx 37.7\%$.
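
The two formulas above can be sketched as one small function. The name `implied_probability` is hypothetical, and the 0.486 is the model probability reported earlier.

```python
def implied_probability(american_odds: float) -> float:
    """Convert American (moneyline) odds to the implied win probability."""
    if american_odds < 0:
        # Favorite: |odds| / (|odds| + 100)
        return -american_odds / (-american_odds + 100)
    # Underdog: 100 / (odds + 100)
    return 100 / (american_odds + 100)


book_prob = implied_probability(165)
model_prob = 0.486  # win probability from the model above

print(round(book_prob, 3))     # 0.377
print(model_prob > book_prob)  # True
```

A bet is a candidate for value only when the model probability is higher than the book's implied probability, as it is here.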

Therefore, we can calculate how much of an edge we have against the book using expected value. A winning \$1 bet at +165 odds returns \$2.65 (the \$1 stake plus \$1.65 profit), and a losing bet returns nothing. Let $p=0.486$ be our model's probability of winning the bet; the expected return is
$$
EV = (p \times 2.65) + ((1-p) \times 0) \approx 1.29
$$

So for a \$1 bet, we expect to receive approximately \$1.29 back, or \$0.29 in profit. Because the expected profit is positive, the model rates this as a value bet worth taking.
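
The same arithmetic as a short sketch, assuming a 1-unit stake by default. The function name `expected_return` is hypothetical.

```python
def expected_return(p_win: float, american_odds: float, stake: float = 1.0) -> float:
    """Expected total payout (stake included) of a moneyline bet.

    A losing bet pays 0, so only the winning branch contributes.
    """
    if american_odds > 0:
        profit_if_win = stake * american_odds / 100
    else:
        profit_if_win = stake * 100 / abs(american_odds)
    return p_win * (stake + profit_if_win)


ev = expected_return(0.486, 165)
print(round(ev, 2))        # 1.29
print(round(ev - 1.0, 2))  # 0.29
```

The expected return exceeds the stake exactly when the model probability is above the book's implied probability, so this check and the probability comparison always agree.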