
# Don't Be a Sucker
## Part 4
by Casey Durfee <csdurfee@gmail.com>
Copyright 2025

## The Paradox of Mediocrity

If the lines are unbiased, then choosing your bets by flipping a coin would lead to winning 50% of the time. Losing substantially more or less than 50% is equally improbable, for a coin or a human. 

Imagine an "unlucky" bettor who lost 60% of the time. They would actually be the greatest of all time, because you could just take the opposite side of their bets and win 60% of the time.

If you assume the lines are fair, bad bettors don't go broke because they lose a lot. They go broke because they can't win often enough to beat the vig, or risk too much on each bet. They may do worse than flipping a coin, but not a lot worse. 
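To put a number on "beating the vig": at the standard -110 price you risk 1.1 units to win 1, so the break-even win rate is 1.1 / 2.1. Here's a minimal sketch. The function name `breakeven_win_rate` is my own, not from any library.

```python
def breakeven_win_rate(vig=1.1):
    """Win rate at which risking `vig` units to win 1 unit breaks even."""
    # Solve p * 1 - (1 - p) * vig = 0 for p.
    return vig / (1 + vig)


print(f"{breakeven_win_rate():.4f}")  # 0.5238
```

So a bettor who wins 52% of the time against fair lines is better than a coin and still losing money.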

If a coin wins 50% of the time and somebody who is really into gambling also wins 50% of the time, then it's not really a game of skill. It's not a thing you can get better at by learning the lore and following the conventional wisdom.

Being bad at betting means still being right half the time. Being good at betting means being right 56% of the time. I'm not sure how you would tell the difference, except on large volumes of bets. Bettors who lose always seem to be able to generate a story that explains why the bet didn't win. I'll have a lot more to say about these stories. But first, some math.
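One rough way to see why volume matters is to ask how often a pure coin flipper would post a 56% record by luck alone. This is a sketch using the binomial distribution from `scipy.stats` (explained later in this part). The helper name `prob_coin_looks_skilled` and the sample sizes are my own choices for illustration.

```python
from scipy.stats import binom


def prob_coin_looks_skilled(n_bets, target_pct=56):
    """Chance a 50% bettor wins at least target_pct percent of n_bets."""
    # Ceiling division on integers avoids floating point surprises.
    needed = -(-target_pct * n_bets // 100)
    # sf(k) is P(X > k), so sf(needed - 1) is P(X >= needed).
    return binom.sf(needed - 1, n_bets, 0.5)


for n in (10, 100, 1000):
    print(f"{n} bets: {prob_coin_looks_skilled(n):.4f}")
```

Over a handful of bets, the coin looks like an expert a large share of the time. It takes something on the order of a thousand bets before that becomes rare.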

### Handicapper vs Coin

Say that a skilled handicapper who wins 56% of the time decides on the bets they like, then they do another set of picks by flipping a coin and track who wins. There are usually around 7 NBA games a night. The vig is the standard -110.

1) What percent of the time will the handicapper get more wins than the coin?
2) What percent of the time will the coin get more wins? 
3) What percent of the time will they tie?
4) What percent of the time does the coin have a winning day? 
5) What percent of the time does the handicapper have a winning day? 

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats

rng = np.random.default_rng(2718)

In [2]:
WIN_PCT = .56
FAIR_COIN = .5
NIGHTS = 100000
GAMES_A_NIGHT = 7

VIG = 1.1
MIN_WIN_PCT = .524 # breakeven win rate vs the vig

def handi_vs_coin(games, unit="days", emotional_support_bet=False):
    handi_wins = handi_losses = handi_ties = handi_winning_days = coin_winning_days = 0
    handi_overall_wins = handi_overall_losses = coin_overall_wins = coin_overall_losses = 0

    for x in range(NIGHTS):
        handicapper_record = [(rng.random() < WIN_PCT) for x in range(games)] # 1 for win, 0 for loss
        coin_record = [(rng.random() < FAIR_COIN) for x in range(games)]

        if emotional_support_bet:
            handicapper_record.append((rng.random() < FAIR_COIN))

        handi_day_wins = sum(handicapper_record)
        handi_day_losses = len(handicapper_record) - handi_day_wins
        handi_overall_wins += handi_day_wins
        handi_overall_losses += handi_day_losses

        coin_day_wins = sum(coin_record)
        coin_day_losses = len(coin_record) - coin_day_wins
        coin_overall_wins += coin_day_wins
        coin_overall_losses += coin_day_losses

        # who won the day -- the handicapper or the coin?
        if handi_day_wins > coin_day_wins:
            handi_wins += 1
        elif handi_day_wins == coin_day_wins:
            handi_ties += 1
        else:
            handi_losses += 1

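        # who had a winning day (including cost of the vig)?
        # use the actual number of bets placed, which is games + 1 with the emotional support bet
        if handi_day_wins > (MIN_WIN_PCT * len(handicapper_record)):
            handi_winning_days += 1
        if coin_day_wins > (MIN_WIN_PCT * len(coin_record)):
            coin_winning_days += 1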

    handi_profit_loss = handi_overall_wins - (VIG * handi_overall_losses)
    coin_profit_loss = coin_overall_wins - (VIG * coin_overall_losses)

    # convert counts to percentages of NIGHTS instead of hard-coding a divisor of 1000
    print(f"w/l/t%: {100*handi_wins/NIGHTS:.3f}-{100*handi_losses/NIGHTS:.3f}-{100*handi_ties/NIGHTS:.3f}")
    print(f"handicapper winning {unit}: {100*handi_winning_days/NIGHTS:.3f}%, coin winning {unit}: {100*coin_winning_days/NIGHTS:.3f}%")
    print(f"handicapper overall record: {handi_overall_wins}-{handi_overall_losses} ({100*handi_overall_wins/(handi_overall_wins+handi_overall_losses):.2f} %)")
    print(f"handicapper profit: {handi_profit_loss:.2f}")
    print(f"coin overall record: {coin_overall_wins}-{coin_overall_losses}")
    print(f"coin profit: {coin_profit_loss:.2f}")
    

In [None]:
handi_vs_coin(7)

1. The handicapper beats the coin 49% of the time.
2. The coin beats the handicapper 31% of the time.
3. They tie 20% of the time.
4. The coin has a winning day 50% of the time.
5. The handicapper has a winning day 63% of the time.

Even though the coin has winning days half the time, it's still losing money long-term. Even though the handicapper is winning 56% of the time, they are having winning days 63% of the time.
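If the loop above feels slow, the same experiment can be sketched in a few vectorized lines. This is not the code that produced the numbers above. It draws each night's win total directly with `rng.binomial`, so the exact figures will differ slightly from run to run and from the loop version.

```python
import numpy as np

rng = np.random.default_rng(2718)
nights, games = 100_000, 7

# Wins per night for each picker, drawn directly instead of bet by bet.
handi = rng.binomial(games, 0.56, size=nights)
coin = rng.binomial(games, 0.50, size=nights)

# Wins needed to cover the vig at -110 (risk 1.1 to win 1).
breakeven = 1.1 / 2.1 * games

results = {
    "handicapper beats coin": np.mean(handi > coin),
    "coin beats handicapper": np.mean(coin > handi),
    "tie": np.mean(handi == coin),
    "coin winning day": np.mean(coin > breakeven),
    "handicapper winning day": np.mean(handi > breakeven),
}
for label, share in results.items():
    print(f"{label}: {share:.1%}")
```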

### Getting Good Odds on Odd Goods

The fact that 7 is odd is significant. If there are an even number of bets per day, the proportion of winning days is going to change dramatically. Hitting 4/7 or better is going to be easier than 5/8 or better, or 4/6 or better. Almost half the time, someone who went 4/7 is going to lose that 8th bet and not have a winning day.

Let's see how this plays out over a range of games per day.

In [None]:
for x in range(3,9):
    print(f">>>>>>>>>> betting {x} games")
    handi_vs_coin(x)
    print("\n")

As the sample size increases, the handicapper wins more often against the coin. There is no such thing as luck, only small sample size.

The odd number of bets is significant. Both the handicapper and the coin look more skilled (gauged by number of winning days) when taking an odd number of bets. Even at 3 games a day, the skilled bettor is winning 59% of the days.
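The odd/even effect can also be checked without simulating anything. This sketch computes the exact chance of a winning day for each number of bets, using the binomial distribution from `scipy.stats` (explained later in this part). The helper `p_winning_day` is my own name, and it assumes every bet is independent with the same win rate.

```python
from scipy.stats import binom

VIG = 1.1


def p_winning_day(n_bets, win_pct):
    """Exact chance that wins - VIG * losses > 0 over n_bets bets."""
    # Smallest win count that still shows a profit after the vig.
    min_wins = next(w for w in range(n_bets + 1) if w > VIG * (n_bets - w))
    # sf(k) is P(X > k), so this is P(X >= min_wins).
    return binom.sf(min_wins - 1, n_bets, win_pct)


for n in range(3, 9):
    print(f"{n} bets: handicapper {p_winning_day(n, 0.56):.3f}, "
          f"coin {p_winning_day(n, 0.5):.3f}")
```

The odd rows should come out noticeably higher than their even neighbors for both the handicapper and the coin.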

Let's look at a 49 game sample, roughly a week's worth of games, and 201 games, roughly a month.

In [None]:
handi_vs_coin(49, unit="weeks")

The coin is still beating the bettor 24% of the time, and having winning weeks 39% of the time. Even at this sample size, there's a significant difference between 49 games and 48 in the percentage of winning weeks.

In [None]:
handi_vs_coin(48, unit="weeks")

Here's roughly a month of games. The coin is still putting up winning months 24% of the time, while losing massively.

In [None]:
handi_vs_coin(201, unit="months")

The handicapper still has losing months 16% of the time on 201 bets.

There's a nugget of betting psychology in this. If you bet every day and want to maximize the number of winning days you have each week, you should take an odd number of bets. Even a coin can have winning days half the time.

No, this isn't one weird trick that statisticians hate. It all comes out in the wash. None of this changes the overall winning percentage or whether the bettor makes money in the long run, as you can see.
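The long-run picture comes down to one line of arithmetic: each win earns 1 unit and each loss costs 1.1. A quick sketch, with `expected_profit_per_bet` being my own helper name:

```python
def expected_profit_per_bet(win_pct, vig=1.1):
    """Average units won per bet when risking `vig` units to win 1."""
    return win_pct - vig * (1 - win_pct)


print(f"handicapper: {expected_profit_per_bet(0.56):+.3f}")  # +0.076
print(f"coin:        {expected_profit_per_bet(0.50):+.3f}")  # -0.050
```

How the bets are grouped into days doesn't appear anywhere in that formula, which is why the odd/even trick can't change the bottom line.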

### The emotional support bet

Say the handicapper from the above example has 6 bets they want to make. Each will win 56% of the time. They want to have as many winning days as possible. What if they took a seventh bet by flipping a coin?

In [None]:
handi_vs_coin(6)

In [None]:
handi_vs_coin(6, emotional_support_bet=True)

TODO: fix these numbers

In [None]:
39031.31 / 46885.20

In [None]:
(60.895 - 46.312)/ 46.312

The strategy works! The bettor's winning percentage drops to about 55%, which cuts into profits, but the share of winning days rises from about 46% to about 61%, a relative increase of roughly 30%.
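The same comparison can be worked out exactly instead of simulated. This sketch assumes independent bets and uses `scipy.stats.binom` (explained in the next section). The variable names are mine.

```python
from scipy.stats import binom

p, coin = 0.56, 0.5

# Six skilled bets: at -110, a winning day needs at least 4 wins out of 6.
six_bets = binom.sf(3, 6, p)

# Add a coin-flip bet: 4 wins out of 7 is still a winning day, so you need
# either 4+ skilled wins, or exactly 3 skilled wins plus a winning flip.
seven_bets = binom.sf(3, 6, p) + binom.pmf(3, 6, p) * coin

# Blended win rate across six skilled bets and one coin flip.
blended_win_pct = (6 * p + coin) / 7

print(f"winning days, 6 bets:     {six_bets:.3f}")
print(f"winning days, 6 + 1 bets: {seven_bets:.3f}")
print(f"blended win rate:         {blended_win_pct:.3f}")
```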


### Our pal, the binomial distribution
Why are 63% of days winning days for the skilled handicapper? Shouldn't it be 56%, to match their win rate? Where are those "free" winning days coming from? Shouldn't the coin have losing days more than 50% of the time with the vig figured in?

These might seem like silly, obvious questions, or they might not. Either way, stick with me. I swear there's a point to this.

The number of winning bets on a given night comes from something called the binomial distribution. It tells us the probability of a certain number of wins out of a number of trials, given a certain rate of success. If you flip a fair coin 10 times, how likely is it that you get more than 7 heads?
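That last question takes one line to answer with `scipy.stats.binom`. The survival function `sf(k)` gives the probability of strictly more than `k` successes.

```python
from scipy.stats import binom

# P(more than 7 heads in 10 flips of a fair coin) = P(8, 9 or 10 heads)
p_more_than_7 = binom.sf(7, 10, 0.5)
print(f"{p_more_than_7:.4f}")  # 0.0547
```

That's 56 of the 1024 equally likely sequences of heads and tails, or about 5.5%.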

The binomial is the same thing as what I was approximating above -- flipping a coin with the `rng.random()` function and adding up the binary results.

First, 7 coin flips (p=.5). The x axis is the number of successes and the y axis is the probability of that number.

In [None]:
coin = scipy.stats.binom(GAMES_A_NIGHT, FAIR_COIN)

coin_df = pd.DataFrame({
    'x': [x for x in range(GAMES_A_NIGHT+1)],
    'y': [coin.pmf(x) for x in range(GAMES_A_NIGHT+1)]

})
plt.bar(coin_df.x, coin_df.y)


Notice how nice and symmetrical it is. Going 3-4 and going 4-3 are equally likely. The mean is clearly 3.5.

What about the bettor, with a 56% chance of winning each bet?

In [None]:
bettor = scipy.stats.binom(GAMES_A_NIGHT, WIN_PCT)

binomial_df = pd.DataFrame({
    'x': [x for x in range(GAMES_A_NIGHT+1)],
    'y': [bettor.pmf(x) for x in range(GAMES_A_NIGHT+1)]

})
plt.bar(binomial_df.x, binomial_df.y)

The curve gets shifted to the right a little bit and now it's no longer symmetrical. Now the most likely outcome is 4. Since 4/7=57.1%, that makes sense. 2 and 6 are now roughly equally likely, instead of 2 and 5, though 2 is still more likely than 6, and 3 is more likely than 5.

If you think about it, it can't be symmetrical anymore. Our bell curve got shifted to the right because we increased the winning percentage. 

The distribution people think of, if they have ever thought about distributions, is the normal distribution.

We can plot the equivalent normal distribution over the top of the binomial outcomes. You can see, it's pretty close. But there are some problems.
1. The normal distribution is symmetrical and the binomial distribution isn't.
2. The normal distribution has an infinite range -- for instance, it would give a non-zero chance to getting 8 heads in 7 coin flips.
3. The normal distribution is continuous, which means it assigns weight to impossible outcomes like getting 3.3 heads in 7 coin flips.

In [None]:
mean = 7 * .56
sd = np.sqrt(7 * .56 * (1-.56))

x = np.linspace(0, 7, 200)
plt.plot(x, scipy.stats.norm.pdf(x, mean, sd), color="r")
plt.bar(binomial_df.x, binomial_df.y)

Just looking at the graph above, you can probably convince yourself that if we had a huge number of trials, the binomial distribution would look more and more like the normal distribution. But there can be significant differences when the number of trials is small.
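Here's one way to see how much the difference matters at 7 games. This sketch compares the exact chance of the handicapper winning 4 or more with two normal-curve estimates. The "continuity correction" (starting the tail at 3.5 instead of 4) is a standard patch for approximating a whole-number distribution with a continuous one. It isn't something used elsewhere in this notebook.

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 7, 0.56
mean, sd = n * p, np.sqrt(n * p * (1 - p))

exact = binom.sf(3, n, p)           # P(4 or more wins), exact binomial
naive = norm.sf(4, mean, sd)        # normal tail starting at 4
corrected = norm.sf(3.5, mean, sd)  # normal tail starting at 3.5

print(f"exact binomial:        {exact:.3f}")
print(f"naive normal:          {naive:.3f}")
print(f"normal, corrected:     {corrected:.3f}")
```

The naive normal estimate misses badly, because it ignores the fact that all of the probability for "4 wins" sits on a single whole number.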

### Are you normal?

I think we have a cognitive bias towards thinking everything is a normal distribution (the classic bell curve), where the left tail and right tail are the same.  

That's true for average folks and stats newbies alike. It's great! You don't have to remember the difference between left-skewed and right-skewed, for one thing. And the equation for the normal distribution has $\pi$ in it. Yeah, it's your old buddy from middle school geometry class! Small world, isn't it?

The binomial distribution can be approximated by the normal distribution (or the Poisson) under certain conditions. Approximations are fine when the number of trials is large, but prop bets are usually based on a fairly small number, where the difference can matter. Life is a game of inches and the inches are all around you.

### Being binomial doubles the chances of misunderstanding
Prop bets are where instead of betting on a team, you bet on an individual player's stats. For instance, whether they will get more or less than 7.5 rebounds. In order to understand these bets, we have to model them as a binomial distribution. They're not going to be symmetrical, so naive statistical intuition and the normal distribution will fail us.

To model it as a binomial, we determine the probability a player gets a rebound on the average possession (the rebounding rate), and the number of possessions the player is in the game for.

What does that mean if you're thinking about taking a prop bet like Giannis over/under 9 rebounds? Judging purely by the shape of the curve, which side is the slightly better deal near the middle of the distribution?

That's actually a trick question. I showed you a binomial distribution that was heavier on the left (making the under a possibly better value), but they can be heavier on the right as well.

Let's say we're looking at a prop bet on CJ McCollum over 3.5 rebounds. 

![cj mccollum prop bet from covers.com](img/cj.png)

We can pull his rebounds per 100 possessions rate from basketball-reference, then estimate the number of possessions by comparing it to his rebounds per game stat. Obviously, this is an extremely rough model, but we gotta start somewhere.

Notice this time that the binomial distribution is a little taller on the right -- 4 bigger than 2, 5 bigger than 1, etc. You have to squint pretty hard to see it as a normal distribution.

In [None]:
reb_per_100 = 5.3 # per 100 possessions numbers from bbref
reb_per_possession  = reb_per_100/ 100 
reb_per_game = 3.6
poss_per_game = round(100 * (reb_per_game/reb_per_100))

cj2 = scipy.stats.binom(poss_per_game, reb_per_possession)


cj2_df = pd.DataFrame({
    'x': [x for x in range(0,10)],
    'y': [cj2.pmf(x) for x in range(0,10)]

})
plt.bar(cj2_df.x, cj2_df.y)

The mean matches his rebounds per game, as expected.

In [None]:
cj2.mean()

In [None]:
# probability of going under
prob_under = sum(cj2_df[cj2_df.x < 3.5].y)
prob_under

Based on this simple model, there's a 51.1% chance of it going under. If it were exactly 50%, the fair price would be the same for the over and the under. Because it's imbalanced, the more likely side should pay out less when it wins: you have to put up more money than you stand to win, which is a bit like paying a higher vig on that side.

We can convert between the probability and the American style money line. In this case, a break-even price for the under would be -104, and correspondingly +104 for the over.

In [None]:
def convert_proba(proba):
    
    if proba > .5:
        money_line = -100 * (proba/(1-proba))
    else:
        money_line = 100 * ((1-proba) / proba)
    return round(money_line)


convert_proba(prob_under)

So, under 3.5 rebounds at -105 is a pretty fair line, based on our assumptions. We'd still lose money over the long run, though, because of the tiny difference between -104 and -105. If the line were instead +105 for the under, it would be a bet we could expect to make money on.
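To make "expect to make money" concrete, here's a sketch of the expected value per unit staked at a given American line. The helper names `payout_per_unit` and `expected_value` are mine, and the 51.1% figure is the rough model estimate from above, not a known truth about the bet.

```python
def payout_per_unit(line):
    """Profit on a winning 1-unit stake at an American money line."""
    return 100 / abs(line) if line < 0 else line / 100


def expected_value(prob_win, line):
    """Average profit per unit staked, given a win probability and a line."""
    return prob_win * payout_per_unit(line) - (1 - prob_win)


prob_under_model = 0.511  # from the simple binomial model above

for line in (-105, 105):
    print(f"under at {line:+d}: {expected_value(prob_under_model, line):+.4f}")
```

At -105 the expected value comes out a hair below zero, and at +105 it is comfortably positive.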

Even small changes to our assumptions can drastically change how good of a deal the bet is. In this game, he was playing against Chicago, which is a very fast paced team.  If we decided that CJ will play 10% more possessions than usual, due to faster pace, or a teammate being injured, or some other factor, the model changes:

In [None]:
reb_per_100 = 5.3 # per 100 possessions numbers from bbref
reb_per_possession  = reb_per_100/ 100 
reb_per_game = 3.6

poss_per_game = round(1.1 * 100 * (reb_per_game/reb_per_100)) # increase number of possessions by 10%

cj3 = scipy.stats.binom(poss_per_game, reb_per_possession)


cj3_df = pd.DataFrame({
    'x': [x for x in range(0,10)],
    'y': [cj3.pmf(x) for x in range(0,10)]

})
plt.bar(cj3_df.x, cj3_df.y)

In [None]:
new_proba = sum(cj3_df[cj3_df.x < 3.5].y)

new_proba

In [None]:
convert_proba(new_proba)

Under these assumptions, the fair price for under 3.5 rebounds would be +131, making the -105 offer a terrible deal.
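To see how touchy the model is, here's a sketch that sweeps the possessions assumption and recomputes the chance of the under each time. The multipliers are arbitrary values I picked for illustration, and `prob_under_at` is my own helper. It uses the binomial CDF rather than summing the bars, which gives the same answer.

```python
from scipy.stats import binom

reb_per_100, reb_per_game, line = 5.3, 3.6, 3.5
rate = reb_per_100 / 100
base_poss = 100 * reb_per_game / reb_per_100  # about 68 possessions


def prob_under_at(multiplier):
    """P(rebounds <= 3) if he plays `multiplier` times his usual possessions."""
    poss = round(multiplier * base_poss)
    return binom.cdf(int(line), poss, rate)


for m in (0.9, 1.0, 1.1, 1.2):
    print(f"{m:.1f}x possessions: P(under) = {prob_under_at(m):.3f}")
```

A 10% swing in playing time moves the under from a slight favorite to a clear underdog.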


To bet props rationally, you'd have to do all this binomial nonsense on every single bet to figure out whether the vig is fair or not. It's a multi-step calculation with lots of assumptions.

I'd guess there are apps and websites that can help, but it's still betting on math homework more than it is betting on sports.

It's not really a bet on whether CJ McCollum will have a good game tonight. He played poorly in the last game but got 4 rebounds, so the over would have paid. McCollum got 38 points the game before that, and only got 3 rebounds. He scored 50 points a month ago and got 3 rebounds.

It's really a bet on whether the player gets more minutes than average, the team plays at a certain pace, what the opponent's rebounding rate is, etc. 

If you're a sicko like me, that could be fun, but do you think people are doing this big math problem every time they place a prop bet? 

Of course not. They're probably just taking the over regardless of price, because overs are more fun. Rooting against CJ McCollum getting his 4th rebound at the end of a meaningless game where he already scored 50 points is pathological.

The sportsbooks aggressively push overs as well. Here's a screenshot from the ESPN Bet sportsbook:

![a screen only offering over prop bets](img/prop_overs2.png)

This is their main NBA betting page. They offer the ability to pick the overs on prop bets, but not the unders. You have to dig deep if you want to take them. I think they have to legally allow bets on both sides, but they sure don't make it easy.

If this is your first day in a capitalist country, welcome. If not, you should be suspicious about which side is a better value if they're only pushing one side of these bets.

The vig on prop bets is high enough that I don't know if savvy bettors could make money through arbitrage opportunities and keep the lines honest. What I've seen is that when a line on a prop bet moves, the books also increase the size of the vig.

The big idea I'm trying to convey is that one side of a prop bet will pretty much always be *slightly* more than 50%. Our naive mental model of probabilities assumes that things are symmetrical, but they're not, because the results have to be whole numbers. CJ McCollum is going to get 3 or 4 rebounds tonight, not 3.6.

The smaller the counts, the crazier the vig. Here's one for Brook Lopez over/under 1.5 assists.

![brook lopez assists](img/brolo.png)

+155/-190 isn't even close to 50/50. When you're betting the point spreads, no bet is really all that crazy since both sides should win about 50% of the time. But if the odds are imbalanced, that changes both the math and psychology of betting. More on that later.

In [1]:
def convert_line(line):
    if line < 0:
        return abs(line)/(abs(line)+100)
    else:
        return 100/(100+line)
    
convert_line(155)

0.39215686274509803

In [2]:
convert_line(-190)

0.6551724137931034

The sum of these two probabilities is greater than one, which is how the sportsbook makes money. For instance, the over 1.5 bet at +155 might only win 35% of the time, but it is priced as if it will win 39% of the time. That difference is essentially the vig: the percent of the amount bet that the sportsbook will have at the end of the night, assuming equal amounts of money on both sides.
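One common way to back out what the book "really" thinks is to scale the two implied probabilities so they sum to one. This is a sketch under the assumption that the vig is spread proportionally across both sides, which is a convention and not something the sportsbook publishes. The names `implied_prob` and `no_vig_probs` are mine.

```python
def implied_prob(line):
    """Win probability implied by an American money line, vig included."""
    if line < 0:
        return abs(line) / (abs(line) + 100)
    return 100 / (100 + line)


def no_vig_probs(over_line, under_line):
    """Scale both implied probabilities so they sum to exactly 1."""
    over, under = implied_prob(over_line), implied_prob(under_line)
    total = over + under
    return over / total, under / total


over, under = no_vig_probs(155, -190)
print(f"over: {over:.3f}, under: {under:.3f}")
```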

Bet365 has the line at +155/-190, which is roughly equivalent to a -110 vig.

In [5]:
convert_line(155) + convert_line(-190) - 1

0.047329276538201404

However, Caesar's has it at +139/-192, which is much more lucrative for the sportsbook.

In [6]:
convert_line(139) + convert_line(-192) - 1

0.0759442884163466

This makes it equivalent to a bet at -115 vig.

$\text{winPct} - ((1 - \text{winPct}) \times \text{vig}) = -7.59$

$50 - (50 \times \text{vig}) = -7.59$

$\text{vig} = 57.59 / 50 = 1.15$
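The three equations above can be wrapped in a small function so any pair of lines can be compared the same way. This is just a sketch of that arithmetic. `overround` and `equivalent_vig` are my own names.

```python
def implied_prob(line):
    """Win probability implied by an American money line, vig included."""
    if line < 0:
        return abs(line) / (abs(line) + 100)
    return 100 / (100 + line)


def overround(over_line, under_line):
    """How far the two implied probabilities sum past 1."""
    return implied_prob(over_line) + implied_prob(under_line) - 1


def equivalent_vig(over_line, under_line):
    """Solve 50 - 50 * vig = -100 * overround for vig, as in the equations above."""
    return 1 + 2 * overround(over_line, under_line)


print(f"+155/-190: {equivalent_vig(155, -190):.3f}")
print(f"+139/-192: {equivalent_vig(139, -192):.3f}")
```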

Next up: Which teams get the most money bet on them? Does this tell us anything about how sports betting works?