# Looking for differences in predictions between Vegas odds and 538

Before Nate Silver became famous for his political predictions, one of his most notable accomplishments was developing a robust baseball projection system, [PECOTA](https://en.wikipedia.org/wiki/PECOTA). To this day, FiveThirtyEight still publishes a large number of sports predictions on its [website](https://projects.fivethirtyeight.com/2021-mlb-predictions/games/).

Given Silver's experience, skill, and previous success in sports analytics, I want to explore two questions:
 - Are 538's sports predictions more accurate than those of sports betting institutions?
 - If they are more accurate, can a person generate a profit by leveraging 538?
 
To explore these questions, I plan to:
 - Compare baseball outcome predictions made by 538 to those implied by DraftKings moneylines
 - Calculate and compare the error rates of 538's and DraftKings' predictions (measured with log loss, the error used in logistic regression)
 - Look at the expected outcomes when multiplying 538's probabilities by DraftKings' moneyline payouts
 - Look at the short-, medium-, and long-term profits (or losses) from betting on games with positive expected outcomes
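For the error-rate comparison in the plan above, one option is log loss (binary cross-entropy), the quantity that logistic regression minimizes. A minimal sketch, assuming a hypothetical `home_win` column (1 if the home team won, 0 otherwise) that the sample dataset does not contain yet, and probabilities stored as percentages like the `home_odds_538` and `home_odds_dk` columns:

```python
import math

def log_loss(probs_pct, outcomes):
    """Mean log loss for home-win probabilities given as percentages (0-100).

    outcomes: 1 if the home team won, 0 otherwise (results data is assumed).
    """
    eps = 1e-15  # clip so a 0% or 100% forecast does not hit log(0)
    total = 0.0
    for pct, won in zip(probs_pct, outcomes):
        p = min(max(pct / 100, eps), 1 - eps)
        total += -(won * math.log(p) + (1 - won) * math.log(1 - p))
    return total / len(outcomes)

# Baseline: always forecasting 50% scores ln(2)
print(round(log_loss([50, 50], [1, 0]), 4))  # 0.6931

# Once results are available (hypothetical column name):
# log_loss(dat["home_odds_538"], dat["home_win"])
# log_loss(dat["home_odds_dk"], dat["home_win"])
```

Lower is better, so whichever source scores lower over a season of games would be the more accurate forecaster by this measure.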

```python
import pandas as pd
```

### First, a breakdown of how a moneyline converts into an implied probability

#### For negative moneylines (favorites to win)

$$p = \frac{-1 \times \text{moneyline}}{(-1 \times \text{moneyline}) + 100}$$

For example, let's say a team has a moneyline of -140, meaning that a bet of 140 wins a profit of 100.

The equation becomes:

$$\frac{(-1)(-140)}{(-1)(-140) + 100}$$

This simplifies to:

$$\frac{140}{240}$$

which becomes

$$58.33\%$$

#### For positive moneylines (underdogs)

$$p = \frac{100}{\text{moneyline} + 100}$$

For example, let's say a team has a moneyline of +140, meaning that a bet of 100 wins a profit of 140.

The equation becomes:

$$\frac{100}{140 + 100}$$

This simplifies to:

$$\frac{100}{240}$$

which becomes

$$41.67\%$$
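Both worked examples can be double-checked with exact fractions using Python's standard `fractions` module. This is a small sketch; `implied_prob` is a hypothetical helper name, not part of the notebook. It also shows that a -140 favorite paired with a +140 underdog would sum to exactly 100%, i.e. a line with no bookmaker margin:

```python
from fractions import Fraction

def implied_prob(moneyline: int) -> Fraction:
    """Exact implied win probability for an American moneyline."""
    if moneyline < 0:
        # Favorite: risk |moneyline| to win a profit of 100
        return Fraction(-moneyline, -moneyline + 100)
    if moneyline > 0:
        # Underdog: risk 100 to win a profit of moneyline
        return Fraction(100, moneyline + 100)
    raise ValueError("moneyline must be nonzero")

fav = implied_prob(-140)  # 140/240 reduces to 7/12
dog = implied_prob(140)   # 100/240 reduces to 5/12

print(fav, f"{float(fav):.2%}")  # 7/12 58.33%
print(dog, f"{float(dog):.2%}")  # 5/12 41.67%
print(fav + dog)                 # 1
```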

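```python
## Let's build a function that converts a moneyline into an implied win percentage
def get_pct_from_moneyline(moneyline):

    moneyline = int(moneyline)

    ## Negative moneyline (favorite): risk |moneyline| to win 100
    if moneyline < 0:
        pct = (-1 * moneyline) / ((-1 * moneyline) + 100)

    ## Positive moneyline (underdog): risk 100 to win moneyline
    elif moneyline > 0:
        pct = 100 / (moneyline + 100)

    ## A moneyline of 0 is not a valid value (pct would otherwise be undefined)
    else:
        raise ValueError("moneyline must be nonzero")

    ## Return a percentage rounded to two decimals, e.g. 58.33
    pct = 100 * round(pct, 4)
    return pct
```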

```python
## Read the dataset
dat = pd.read_csv("sample_mlb_odds_data.csv")

## Apply our function to the DraftKings moneylines to get DraftKings' raw implied percentages
dat["raw_home_odds_dk"] = dat["home_moneyline_dk"].apply(get_pct_from_moneyline)
dat["raw_away_odds_dk"] = dat["away_moneyline_dk"].apply(get_pct_from_moneyline)

## Sportsbook percentages add up to 100 plus the padding, i.e. the book's estimated profit margin
dat["dk_total_padding_val"] = dat["raw_home_odds_dk"] + dat["raw_away_odds_dk"] - 100

## This padding is (THEORETICALLY) added equally to both sides, or at least in very close amounts,
## so we rescale the raw percentages back to a 100% total to get our final percentages
raw_total = dat["raw_home_odds_dk"] + dat["raw_away_odds_dk"]
dat["home_odds_dk"] = round(dat["raw_home_odds_dk"] / raw_total, 4) * 100
dat["away_odds_dk"] = round(dat["raw_away_odds_dk"] / raw_total, 4) * 100

dat.head(5)
```

| | Match_id | date | home_team | away_team | home_odds_538 | away_odds_538 | home_moneyline_dk | away_moneyline_dk | raw_home_odds_dk | raw_away_odds_dk | dk_total_padding_val | home_odds_dk | away_odds_dk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5/26 | Angels | Orioles | 60 | 40 | -140 | 120 | 58.33 | 45.45 | 3.78 | 56.21 | 43.79 |
| 1 | 2 | 5/26 | Reds | Brewers | 47 | 53 | -105 | -115 | 51.22 | 53.49 | 4.71 | 48.92 | 51.08 |
| 2 | 3 | 5/26 | White Sox | Blue Jays | 47 | 53 | -135 | 115 | 57.45 | 46.51 | 3.96 | 55.26 | 44.74 |
| 3 | 4 | 5/26 | Diamondbacks | Phillies | 43 | 57 | 150 | -170 | 40.0 | 62.96 | 2.96 | 38.85 | 61.15 |
| 4 | 5 | 5/26 | Cardinals | Pirates | 60 | 40 | -180 | 155 | 64.29 | 39.22 | 3.51 | 62.11 | 37.89 |
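The rescaling step divides each raw implied probability by their sum, which removes the margin in proportion to each side's raw probability. A minimal exact-arithmetic sketch for the Angels (-140) / Orioles (+120) row; `implied_prob` and `fair_probs` are hypothetical helper names. Without intermediate rounding, the Angels come out at 77/137, about 56.20%, and the margin at 5/132, about 3.79%. These differ slightly from the 56.21 and 3.78 in the table because the notebook rounds the raw percentages to two decimals before rescaling.

```python
from fractions import Fraction

def implied_prob(moneyline: int) -> Fraction:
    """Exact raw implied probability for an American moneyline (vig included)."""
    if moneyline < 0:
        return Fraction(-moneyline, -moneyline + 100)
    if moneyline > 0:
        return Fraction(100, moneyline + 100)
    raise ValueError("moneyline must be nonzero")

def fair_probs(home_ml: int, away_ml: int):
    """Rescale both raw implied probabilities so they sum to 1.

    Returns (home_prob, away_prob, margin), where margin is how far the
    raw probabilities exceed 1 (the book's padding).
    """
    raw_home = implied_prob(home_ml)
    raw_away = implied_prob(away_ml)
    total = raw_home + raw_away
    return raw_home / total, raw_away / total, total - 1

home, away, margin = fair_probs(-140, 120)  # Angels vs. Orioles
print(home, f"{float(home):.2%}")      # 77/137 56.20%
print(away, f"{float(away):.2%}")      # 60/137 43.80%
print(margin, f"{float(margin):.2%}")  # 5/132 3.79%
```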


```python
## Now let's look at the differences between 538's probabilities and DraftKings'
dat["diff_pct_home"] = dat["home_odds_538"] - dat["home_odds_dk"]
dat["diff_pct_away"] = dat["away_odds_538"] - dat["away_odds_dk"]

dat[["date", "home_team", "away_team", "diff_pct_home", "diff_pct_away"]].head(12)
```

| | date | home_team | away_team | diff_pct_home | diff_pct_away |
|---|---|---|---|---|---|
| 0 | 5/26 | Angels | Orioles | 3.79 | -3.79 |
| 1 | 5/26 | Reds | Brewers | -1.92 | 1.92 |
| 2 | 5/26 | White Sox | Blue Jays | -8.26 | 8.26 |
| 3 | 5/26 | Diamondbacks | Phillies | 4.15 | -4.15 |
| 4 | 5/26 | Cardinals | Pirates | -2.11 | 2.11 |
| 5 | 5/26 | Twins | Red Sox | 9.73 | -9.73 |
| 6 | 5/26 | Rangers | Indians | -4.02 | 4.02 |
| 7 | 5/26 | Nationals | Marlins | 0.0 | 0.0 |
| 8 | 5/26 | Giants | Mets | -3.18 | 3.18 |
| 9 | 5/26 | Dodgers | Padres | 5.82 | -5.82 |


#### Proof of concept

For this proof of concept, I am using a simple heuristic: only bet on a team when FiveThirtyEight's win probability for it is at least 5 percentage points higher than DraftKings' (normalized) probability.

For today (8/26), I will be placing a $1 bet on:
 - The Blue Jays beating the White Sox
 - The Twins beating the Red Sox
 - The Dodgers beating the Padres
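The 5-point rule amounts to a filter on the `diff_pct_home` and `diff_pct_away` columns computed above. A small sketch using a few rows copied from the output table into a hand-built DataFrame (not the full dataset); `picks_by_edge` is a hypothetical helper name:

```python
import pandas as pd

# A few rows copied from the diff table above
games = pd.DataFrame({
    "home_team": ["Angels", "White Sox", "Diamondbacks", "Twins", "Dodgers"],
    "away_team": ["Orioles", "Blue Jays", "Phillies", "Red Sox", "Padres"],
    "diff_pct_home": [3.79, -8.26, 4.15, 9.73, 5.82],
    "diff_pct_away": [-3.79, 8.26, -4.15, -9.73, -5.82],
})

def picks_by_edge(df, min_edge=5.0):
    """Teams whose 538 probability beats DraftKings' by at least min_edge points."""
    home = df.loc[df["diff_pct_home"] >= min_edge, "home_team"]
    away = df.loc[df["diff_pct_away"] >= min_edge, "away_team"]
    return list(home) + list(away)

print(picks_by_edge(games))  # ['Twins', 'Dodgers', 'Blue Jays']
```

Lowering `min_edge` to 4.0 would also pick up the Diamondbacks (4.15 points).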

#### Expected outcomes of these bets

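```python
## Total amount returned (stake + profit) by a winning bet at a given moneyline
def bet_outcome(bet, moneyline):

    moneyline = int(moneyline)

    ## Negative moneyline (betting the favorite): profit = bet * 100 / |moneyline|
    if moneyline < 0:
        outcome = (bet * 100) / abs(moneyline)

    ## Positive moneyline (betting the underdog): profit = bet * moneyline / 100
    elif moneyline > 0:
        outcome = (bet * moneyline) / 100

    ## A moneyline of 0 is not a valid value (outcome would otherwise be undefined)
    else:
        raise ValueError("moneyline must be nonzero")

    ## Add the stake back, with the profit rounded to cents
    outcome = bet + round(outcome, 2)
    return outcome
```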

#### My previous heuristic made a pretty large error

I forgot to consider the moneyline when deciding whom to bet on. Even where there was a difference of more than 5 percentage points between 538's probability and DraftKings', the actual payout of the bet would have been awful.

#### Basic Heuristic 2.0
I am only betting on teams that have an expected outcome of $1.05 or higher on a 1-dollar bet.

This means I will be betting on:
 - The Blue Jays beating the White Sox
 - The Diamondbacks beating the Phillies
 - The Twins beating the Red Sox
 - The Dodgers beating the Padres

```python
## Get the total return on a winning 1-dollar bet by applying bet_outcome to each original moneyline
dat["winnings_on_1_dollar_home"] = dat["home_moneyline_dk"].apply(lambda ml: bet_outcome(1, ml))
dat["winnings_on_1_dollar_away"] = dat["away_moneyline_dk"].apply(lambda ml: bet_outcome(1, ml))

## Expected outcome = winnings if the bet wins * 538's win probability (a losing bet returns 0)
dat["expected_outcome_home_538"] = dat["winnings_on_1_dollar_home"] * dat["home_odds_538"] / 100
dat["expected_outcome_away_538"] = dat["winnings_on_1_dollar_away"] * dat["away_odds_538"] / 100

dat[["date", "home_team", "away_team", "expected_outcome_home_538", "expected_outcome_away_538"]].head(11)
```

| | date | home_team | away_team | expected_outcome_home_538 | expected_outcome_away_538 |
|---|---|---|---|---|---|
| 0 | 5/26 | Angels | Orioles | 1.026 | 0.88 |
| 1 | 5/26 | Reds | Brewers | 0.9165 | 0.9911 |
| 2 | 5/26 | White Sox | Blue Jays | 0.8178 | 1.1395 |
| 3 | 5/26 | Diamondbacks | Phillies | 1.075 | 0.9063 |
| 4 | 5/26 | Cardinals | Pirates | 0.936 | 1.02 |
| 5 | 5/26 | Twins | Red Sox | 1.295 | 0.8253 |
| 6 | 5/26 | Rangers | Indians | 0.874 | 1.0354 |
| 7 | 5/26 | Nationals | Marlins | 0.955 | 0.955 |
| 8 | 5/26 | Giants | Mets | 0.8967 | 1.02 |
| 9 | 5/26 | Dodgers | Padres | 1.0614 | 0.84 |
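A useful cross-check on these expected outcomes: for either sign of moneyline M, the total return on a winning 1-dollar bet is exactly the reciprocal of DraftKings' raw (unnormalized) implied probability, since 1 + 100/|M| = (|M| + 100)/|M| and 1 + M/100 = (M + 100)/100. The expected outcome is therefore 538's probability divided by that raw probability. A minimal sketch; `expected_return` is a hypothetical helper, and it skips the notebook's rounding of the payout to cents, so results can differ in the third decimal (about 1.0286 versus 1.026 for the Angels):

```python
def expected_return(prob_538_pct, moneyline):
    """Expected total return on a 1-dollar bet: 538 probability / raw implied probability."""
    moneyline = int(moneyline)
    if moneyline < 0:
        raw_implied = -moneyline / (-moneyline + 100)
    elif moneyline > 0:
        raw_implied = 100 / (moneyline + 100)
    else:
        raise ValueError("moneyline must be nonzero")
    return (prob_538_pct / 100) / raw_implied

# (team, 538 win %, DraftKings moneyline) for the first five games in the table
sides = [
    ("Angels", 60, -140), ("Orioles", 40, 120),
    ("Reds", 47, -105), ("Brewers", 53, -115),
    ("White Sox", 47, -135), ("Blue Jays", 53, 115),
    ("Diamondbacks", 43, 150), ("Phillies", 57, -170),
    ("Cardinals", 60, -180), ("Pirates", 40, 155),
]

# Heuristic 2.0: keep bets whose expected return is at least 1.05 per dollar wagered
picks = [team for team, pct, ml in sides if expected_return(pct, ml) >= 1.05]
print(picks)  # ['Blue Jays', 'Diamondbacks']
```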


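```python
## Save the calculations; index=False keeps pandas from writing the row index as an extra unnamed column
dat.to_csv("sample_data_calculations_8_26.csv", index=False)
```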
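For the last step of the plan (short-, medium-, and long-term profit), each settled 1-dollar bet contributes its payout minus the stake if it wins and minus the stake if it loses, and a running sum gives the profit curve over time. A minimal sketch; `settle` and `running_profit` are hypothetical helpers, and the win/loss flags below are placeholders for illustration only, not real game results:

```python
from itertools import accumulate

def settle(payouts, won, stake=1.0):
    """Profit of each bet: payout - stake if it won, -stake if it lost.

    payouts: total return per winning bet of `stake` (e.g. from bet_outcome)
    won: True/False result for each bet (from real game results)
    """
    return [p - stake if w else -stake for p, w in zip(payouts, won)]

def running_profit(profits):
    """Cumulative profit after each bet."""
    return list(accumulate(profits))

# Placeholder results, for illustration only
profits = settle([2.15, 2.50, 1.83], [True, False, False])
print([round(x, 2) for x in profits])                  # [1.15, -1.0, -1.0]
print([round(x, 2) for x in running_profit(profits)])  # [1.15, 0.15, -0.85]
```

Slicing the running total at different horizons (a week, a month, a season) would give the short-, medium-, and long-term views.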