## NBA accurate odds calculator (and tester)

This is the first version of my NBA accurate odds calculator. More complex and precise editions of this software have since been developed; they add a little machine learning and account for individual game stats such as field goal percentage and rebounds (obtained by scraping the NBA's official scorer's report for each game). Nevertheless, the core idea and procedures have remained the same across versions. If you wish to learn more about those later editions, do not hesitate to contact me.

This project only uses the data provided in betting_expanded.csv. The csv consists of ten columns.
Home Team / Away Team and Home Points / Away Points are self-explanatory.
Home Odds and Away Odds are the betting odds provided by Odds Portal in the European (decimal) format, which states the total amount you end up with if you bet $1 and win. All bets are on the winner of the game.
Home Pct and Away Pct indicate the home team's winning percentage in home games and the away team's winning percentage in road games, up to (and including) that row's game.
Home Ndf and Away Ndf indicate the home team's average net point differential at home and the away team's average net point differential on the road, up to (and including) that row's game.
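
To make the decimal odds format concrete, here is a minimal sketch. The helper names `payout` and `implied_probability` are made up for illustration and are not part of the notebook.

```python
def payout(decimal_odds, stake=1.0):
    """Total amount returned (stake included) when the bet wins."""
    return stake * decimal_odds


def implied_probability(decimal_odds):
    """Win probability at which a bet at these odds breaks even."""
    return 1.0 / decimal_odds


# A $10 bet at odds of 2.5 returns $25 in total, i.e. $15 of profit
print(payout(2.5, stake=10))
# Odds of 2.0 break even when the team wins half of the time
print(implied_probability(2.0))
```

The break-even probability is what the fair odds computed later are compared against: a bet only has positive expected value when the true win probability is higher than `1 / odds`.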

Although the code is not shown here, getting the csv data is not complicated. I used the betting data provided by oddsportal.com (which is pretty easy to scrape) and then ran another simple script to add the winning percentage and net differential columns to the dataframe.
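
As a rough idea of what that second script could look like, here is a sketch that builds the four running columns with pandas. It is not the original script: the toy rows, the `running_mean` helper and the choice to store Pct as a fraction between 0 and 1 are all assumptions.

```python
import pandas as pd

# Toy rows standing in for the scraped games (hypothetical values)
games = pd.DataFrame({
    "Home Team":   ["Bucks", "Bucks", "Heat", "Bucks"],
    "Away Team":   ["Heat", "Nets", "Nets", "Heat"],
    "Home Points": [110, 98, 105, 120],
    "Away Points": [100, 104, 99, 101],
})


def running_mean(values, teams):
    """Mean of `values` over each team's games so far, current game included."""
    return values.groupby(teams).transform(lambda s: s.expanding().mean())


home_margin = (games["Home Points"] - games["Away Points"]).astype(float)
home_win = (home_margin > 0).astype(float)

# Home columns only look at home games, away columns only at road games
games["Home Ndf"] = running_mean(home_margin, games["Home Team"])
games["Away Ndf"] = running_mean(-home_margin, games["Away Team"])
games["Home Pct"] = running_mean(home_win, games["Home Team"])
games["Away Pct"] = running_mean(1 - home_win, games["Away Team"])

print(games[["Home Team", "Away Team", "Home Ndf", "Away Ndf", "Home Pct", "Away Pct"]])
```

Grouping by team and taking an expanding mean keeps each value "up to (and including) that row's game", which matches the column definitions above.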

## Handling for the linear regressions

The key to calculating the mathematically exact odds for each team in a specific game is to correctly assess the team's strength against the rest of the league. In this version of the software this is done with a linear regression that plots the mean ndf (net point differential) of the opposing team against the net differential of the game.

For example, suppose that A is playing at home against B. B has, up to this point, lost by 4 on average on the road (away ndf of -4), and the game finishes with A winning by 7. On A's home regression plot we mark the point with coordinates (-4, 7): the opponent's away ndf on the x axis and the game's margin on the y axis. On B's away regression we mark (A's home ndf, -7).

It is reasonable to expect that the points will, more or less, draw a descending line, as you are expected to perform better against bad teams that have a below 0 ndf than against good teams that boast a tremendous ndf.
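
The sketch below fits such a line for a made-up team A with `np.polyfit`. The data points are hypothetical (only the first one, (-4, 7), comes from the example above). The notebook itself updates the line incrementally from running sums rather than refitting it each time.

```python
import numpy as np

# Hypothetical home games for team A:
# x = the visitor's away ndf before the game, y = A's margin in that game
opponent_away_ndf = np.array([-4.0, 2.0, -1.0, 5.0])
home_margin = np.array([7.0, -3.0, 4.0, -6.0])

# Degree-1 least-squares fit: margin ~ m * opponent_ndf + b
m, b = np.polyfit(opponent_away_ndf, home_margin, 1)

# Expected margin against a visitor that loses by 2 on average on the road
expected_margin = m * -2.0 + b
print(f"slope={m:.3f}, intercept={b:.3f}, expected margin={expected_margin:.3f}")
```

The slope comes out negative for this toy data, which is the descending line described above: the stronger the visitor, the smaller A's expected margin.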

This method has another use: a linear regression comes with a spread of the points around the line, a darn good measure of how inconsistent a team is.
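
To illustrate the idea, the sketch below measures the spread of the residuals around a fitted line for two made-up teams. The helper `spread_around_line` and the data are hypothetical. Note that the `std` field stored by the notebook's regressions is the running std of the margins themselves, so this is an illustration of the concept rather than a copy of that computation.

```python
import numpy as np


def spread_around_line(xs, ys):
    """Standard deviation of the residuals of a least-squares line through (xs, ys)."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    m, b = np.polyfit(xs, ys, 1)
    return np.std(ys - (m * xs + b))


opponent_ndf = [-4.0, 2.0, -1.0, 5.0]
steady_team = [7.0, -3.0, 4.0, -6.0]      # margins that stay close to the line
erratic_team = [15.0, 6.0, -9.0, -10.0]   # margins that scatter widely around it

print(spread_around_line(opponent_ndf, steady_team))
print(spread_around_line(opponent_ndf, erratic_team))
```

The erratic team gets a much larger spread even though both teams do worse against stronger opponents, so its results are harder to predict from the line alone.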

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('data/betting_expanded.csv')

In [3]:
teams = ['Bucks','Raptors','Celtics','Heat','Pacers','76ers','Nets','Magic','Wizards','Hornets','Bulls','Knicks',
         'Pistons','Hawks','Cavaliers','Lakers','Clippers','Nuggets','Jazz','Thunder','Rockets','Mavericks','Grizzlies',
         'Trail Blazers','Pelicans','Kings','Spurs','Suns','Timberwolves','Warriors']

In [4]:
home_ndf_regs, away_ndf_regs = {},{}
home_games_fetched, away_games_fetched = {},{}
for team in teams:
    home_ndf_regs[team], away_ndf_regs[team] = [None] * 41, [None] * 41
    home_games_fetched[team], away_games_fetched[team] = 0,0

In [9]:
for i, game in df.iterrows():
    home_team, away_team = game["Home Team"], game["Away Team"]
    home_n, away_n = home_games_fetched[home_team], away_games_fetched[away_team]

    # Extend each team's running regression with the point from this game
    home_game_update = update_ndf_regs(game, True, home_ndf_regs[home_team][home_n], home_n)
    away_game_update = update_ndf_regs(game, False, away_ndf_regs[away_team][away_n], away_n)

    home_ndf_regs[home_team][home_n] = home_game_update
    away_ndf_regs[away_team][away_n] = away_game_update

    home_games_fetched[home_team] += 1
    away_games_fetched[away_team] += 1

In [5]:
def update_ndf_regs(game, home, regression, n):
    # x: the opponent's average ndf, y: this game's margin from the team's point of view
    if home:
        x, y, team = game["Away Ndf"], game["Home Points"] - game["Away Points"], game["Home Team"]
    else:
        x, y, team = game["Home Ndf"], game["Away Points"] - game["Home Points"], game["Away Team"]

    if n == 0:
        # First game: a single point, so start with a flat line through it
        return {"sum_x": x, "sum_y": y, "sum_x2": x*x, "sum_xy": x*y, "m": 0.0, "b": y, "std": 0.0, "mean": y}

    # Later games: extend the running sums, mean and std stored for the previous game
    rels = fetch_relevants(home, n, team)
    n, mean, std = update_std((n, rels["mean"], rels["std"]), y)
    return append_relevants(rels, x, y, n, mean, std)

In [6]:
def fetch_relevants(home, n, team):
    # Regression state stored after the team's previous home (or away) game
    regs = home_ndf_regs if home else away_ndf_regs
    return regs[team][n-1]


In [7]:
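def append_relevants(rels, x, y, n, mean, std):
    # Running sums after adding the new point (x, y)
    sum_x, sum_y = rels["sum_x"] + x, rels["sum_y"] + y
    sum_x2, sum_xy = rels["sum_x2"] + x*x, rels["sum_xy"] + x*y

    # Least-squares slope from the running sums; stays 0 when every x is identical
    m = 0
    denom = n*sum_x2 - sum_x**2
    if denom != 0: m = (n*sum_xy - sum_x*sum_y) / denom
    # Note: the intercept uses sum_y from before this game's y is added, which is what
    # produced the results further down; the usual least-squares intercept uses sum_y
    b = (rels["sum_y"] - m*sum_x) / n

    return {"sum_x": sum_x, "sum_y": sum_y, "sum_x2": sum_x2, "sum_xy": sum_xy,
            "m": m, "b": b, "mean": mean, "std": std}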

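For reference, here is a standalone sketch of the textbook closed-form line computed from running sums, checked against `np.polyfit`. The function `line_from_sums` and the data points are hypothetical; it is meant as a sanity check of the formulas, not as a drop-in replacement for `append_relevants`.

```python
import numpy as np


def line_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Least-squares slope and intercept from the sums of x, y, x*x and x*y."""
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        # All x values are identical: fall back to a flat line through the mean of y
        return 0.0, sum_y / n
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b


xs = [-4.0, 2.0, -1.0, 5.0]
ys = [7.0, -3.0, 4.0, -6.0]

m, b = line_from_sums(
    len(xs),
    sum(xs),
    sum(ys),
    sum(x * x for x in xs),
    sum(x * y for x, y in zip(xs, ys)),
)
m_ref, b_ref = np.polyfit(xs, ys, 1)
print(m, b)
print(m_ref, b_ref)
```

Keeping only the four sums means each new game updates the line in constant time, without storing or refitting the earlier points.
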
In [8]:
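def update_std(existing, new):
    # One-step running update of the mean and std, in the style of Welford's algorithm
    (n, mean, old_std) = existing
    # Rebuild the sum of squared deviations from the previous std.
    # Note: the textbook form multiplies by n; n + 1 is what produced the results further down
    dif = old_std*old_std * (n + 1)
    n += 1
    delta1 = new - mean
    mean += delta1 / n
    delta2 = new - mean
    dif += delta1 * delta2
    return (n, mean, np.sqrt(dif/n))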
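
For comparison, this is a standalone sketch of the textbook Welford update, which keeps the running sum of squared deviations directly instead of rebuilding it from the std. The function name `running_mean_std` and the sample margins are made up for the example.

```python
import numpy as np


def running_mean_std(values):
    """Single-pass mean and population std (Welford's algorithm)."""
    n, mean, m2 = 0, 0.0, 0.0   # m2: running sum of squared deviations from the mean
    for value in values:
        n += 1
        delta = value - mean
        mean += delta / n
        m2 += delta * (value - mean)
    return mean, np.sqrt(m2 / n)


margins = [7.0, -3.0, 4.0, -6.0, 12.0]   # hypothetical game margins
mean, std = running_mean_std(margins)
print(mean, std)
print(np.mean(margins), np.std(margins))  # batch values for comparison
```

The single-pass values agree with the batch `np.mean` and `np.std` up to floating point error, which is a handy check when writing this kind of incremental update.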


## Iterate through df and append expected ndfs

In [10]:
home_games_fetched, away_games_fetched = {},{}
for team in teams:
    home_games_fetched[team], away_games_fetched[team] = 0,0

In [12]:
df['Home Expected Ndf'] = df.apply(lambda row: expected_ndf(row,True), axis=1)
df['Away Expected Ndf'] = df.apply(lambda row: expected_ndf(row,False), axis=1)

In [11]:
def expected_ndf(row, home):
    if home:
        team, target_ndf = row["Home Team"], row['Away Ndf']
        regs, fetched = home_ndf_regs, home_games_fetched
    else:
        team, target_ndf = row["Away Team"], row['Home Ndf']
        regs, fetched = away_ndf_regs, away_games_fetched

    # Regression stored for this game; it already includes this row's own result
    target_line = regs[team][fetched[team]]
    fetched[team] += 1
    # Evaluate the line at the opponent's ndf
    return target_line['m'] * target_ndf + target_line['b']

## Same for std columns

In [13]:
home_games_fetched, away_games_fetched = {},{}
for team in teams:
    home_games_fetched[team], away_games_fetched[team] = 0,0

In [15]:
df['Home Std'] = df.apply(lambda row: expected_std(row,True),axis=1)
df['Away Std'] = df.apply(lambda row: expected_std(row,False),axis=1)

In [14]:
def expected_std(row, home):
    # Running std of the team's margins, stored alongside this game's regression
    std = 0
    if home:
        std = home_ndf_regs[row['Home Team']][home_games_fetched[row['Home Team']]]['std']
        home_games_fetched[row['Home Team']] += 1
    else:
        std = away_ndf_regs[row['Away Team']][away_games_fetched[row['Away Team']]]['std']
        away_games_fetched[row['Away Team']] += 1

    # Half of that std is used as the team's spread
    return std / 2

## Fair odds

In [18]:
home_games_fetched, away_games_fetched = {},{}
for team in teams:
    home_games_fetched[team], away_games_fetched[team] = 0,0

In [19]:
df['Home Fair Odds'] = df.apply(lambda row: fair_odds(row,True),axis = 1)
df['Away Fair Odds'] = df.apply(lambda row: fair_odds(row,False),axis = 1)

In [51]:
def fair_odds(row, home):
    # Expected margin of the game from the home team's point of view
    expected_ndf = row['Home Expected Ndf'] - row['Away Expected Ndf']
    mean_std = (row['Home Std'] + row['Away Std']) / 2

    home_wins = expected_ndf >= 0

    # Probability that the predicted loser wins: the erf argument is never positive,
    # so this value is at most 0.5
    if mean_std != 0:
        underdog_probability = 1/2 * (1 + erf(-abs(expected_ndf)/mean_std / 2))
    else:
        # No spread available yet: fixed fallback value
        underdog_probability = 0.99

    # Clamp so that neither side ends up with infinite odds
    if underdog_probability == 1: underdog_probability = .99
    if underdog_probability == 0: underdog_probability = .01

    # The predicted winner gets odds 1 / (1 - p), the predicted loser gets 1 / p
    if home_wins == home: return 1 / (1 - underdog_probability)
    return 1 / underdog_probability
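
As a worked example of that conversion, the sketch below turns an expected margin and a spread into fair decimal odds for both sides, using the same erf expression and the standard library's `math.erf`. The function `fair_decimal_odds` and the input numbers are hypothetical, and the clamping of extreme probabilities is left out for brevity.

```python
import math


def fair_decimal_odds(expected_margin, spread):
    """Return (favorite_odds, underdog_odds) for a game with the given margin and spread."""
    # Probability that the predicted loser wins; at most 0.5 because the argument is <= 0
    p_underdog = 0.5 * (1 + math.erf(-abs(expected_margin) / spread / 2))
    p_favorite = 1 - p_underdog
    return 1 / p_favorite, 1 / p_underdog


favorite, underdog = fair_decimal_odds(expected_margin=5.0, spread=6.0)
print(favorite, underdog)

# The two implied probabilities add up to 1, so these odds carry no bookmaker margin
print(1 / favorite + 1 / underdog)
```

That last property is what makes the odds "fair": a bookmaker's odds normally imply probabilities that add up to more than 1, and the gap is their margin.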

In [17]:
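import math
def erf(x):
    # Polynomial approximation of the error function (Abramowitz and Stegun, formula 7.1.26).
    # The standard library's math.erf computes the same function.
    sign = 1 if x >= 0 else -1
    x = abs(x)

    a1 =  0.254829592
    a2 = -0.284496736
    a3 =  1.421413741
    a4 = -1.453152027
    a5 =  1.061405429
    p  =  0.3275911

    t = 1.0/(1.0 + p*x)
    y = 1.0 - (((((a5*t + a4)*t) + a3)*t + a2)*t + a1)*t*math.exp(-x*x)
    return sign*y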
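
A quick way to see how close this approximation is: the sketch below compares it with `math.erf` on a grid of points. The function is copied under the name `erf_approx` so the block runs on its own; the grid range is an arbitrary choice.

```python
import math


def erf_approx(x):
    """Same polynomial approximation as the erf cell, under a separate name."""
    sign = 1 if x >= 0 else -1
    x = abs(x)
    a1, a2, a3 = 0.254829592, -0.284496736, 1.421413741
    a4, a5, p = -1.453152027, 1.061405429, 0.3275911
    t = 1.0 / (1.0 + p * x)
    y = 1.0 - (((((a5 * t + a4) * t) + a3) * t + a2) * t + a1) * t * math.exp(-x * x)
    return sign * y


# Largest absolute difference to math.erf on a grid from -4 to 4 in steps of 0.01
max_error = max(abs(erf_approx(k / 100) - math.erf(k / 100)) for k in range(-400, 401))
print(max_error)
```

The difference stays far below anything that matters for betting odds, so swapping in `math.erf` would be a matter of convenience rather than accuracy.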

## Testing against benchmark (Vegas)

In [50]:
threshold, start, leverage_factor = 1, 700, 10
wins, losses, money, total_bet = 0, 0, 0, 0
for i, game in df.iterrows():
    # Minimum bookmaker odds required before betting on each side
    home_threshold = above_threshold(game['Home Fair Odds'], threshold)
    away_threshold = above_threshold(game['Away Fair Odds'], threshold)

    if home_threshold < game['Home Odds'] and i > start:
        # Bet on the home team
        stake = leverage(home_threshold, game['Home Odds'])
        total_bet += stake
        if game['Home Points'] > game['Away Points']:
            wins += 1
            money += stake * (game['Home Odds'] - 1)
        else:
            losses += 1
            money -= stake

    if away_threshold < game['Away Odds'] and i > start:
        # Bet on the away team.
        # Note: the stake is sized from home_threshold, which is what produced the
        # output below; away_threshold would be the symmetric choice
        stake = leverage(home_threshold, game['Away Odds'])
        total_bet += stake
        if game['Home Points'] < game['Away Points']:
            wins += 1
            money += stake * (game['Away Odds'] - 1)
        else:
            losses += 1
            money -= stake

print('Wins, Losses: ',wins,losses, '(',round(wins/(wins+losses),3),') %')
print('Average Bet size: ', total_bet/(wins+losses))
print('Total Money won: ',money)
print('Average bet result: +',money/(wins+losses))
print('Percentage of games bet: ',round(100*(wins+losses)/(len(df)-start),3), ' %')
print('Percentage of money bet won: ', round(100*money/total_bet,3),' %')

Wins, Losses:  131 67 ( 0.662 ) %
Average Bet size:  12.645540029799092
Total Money won:  1571.599963086846
Average bet result: + 7.937373550943667
Percentage of games bet:  37.429  %
Percentage of money bet won:  62.768  %


In [21]:
def above_threshold(value, threshold):
    return (value-1)*(threshold+1)+1

In [22]:
def leverage(fair,odds):
    return leverage_factor * odds / fair
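
To see how these two helpers interact, here is a small worked example with made-up odds. The `stake` function is the same formula as `leverage`, except that `leverage_factor` is passed in as an argument instead of being read from a global; that change and the sample numbers are assumptions for the sake of a self-contained block.

```python
def above_threshold(value, threshold):
    # Scale the net payout (odds - 1) of the fair odds by (threshold + 1)
    return (value - 1) * (threshold + 1) + 1


def stake(fair, odds, leverage_factor=10):
    # Bet more the further the bookmaker's odds are above the required odds
    return leverage_factor * odds / fair


fair_odds_value = 1.5   # hypothetical fair odds for a team
bookmaker_odds = 2.4    # hypothetical odds offered by the bookmaker

# With threshold = 1 the bookmaker must pay at least twice the fair net payout
required = above_threshold(fair_odds_value, threshold=1)

if bookmaker_odds > required:
    print("bet", stake(required, bookmaker_odds))
else:
    print("no bet")
```

With a threshold of 1, fair odds of 1.5 turn into required odds of 2.0, so only clearly mispriced games get a bet, and the stake grows with the ratio between the offered and the required odds.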

In [71]:
def difference_std(reg,xs,ys):
    difs = []
    for i,x in enumerate(xs):
        expected_y = reg[0] * x + reg[1]
        difs.append(ys[i]-expected_y)
    return np.std(difs)