# An Illustration of the Gambler's Fallacy
### Flip a fair coin a large number of times.  Early on there may seem to be an excess of Heads or an excess of Tails.  For example, maybe you observe:  
$$H, T, T, H, H, H, T, H, H, H, H, H, H, T, H, T, H, T, T, T, H, H, H, H, H, \ldots$$
### It looks like an excess of Heads here in the first dozen or two flips.  If the game is to correctly guess H or T, one might be tempted to start betting steadily on T because T "should" be more likely to come up next. 
### That is the gambler's fallacy: the belief that we should expect a similar excess of Tails to balance out this apparent early excess of Heads.  Each flip is independent of the ones before it, so T is no more likely than H on the next flip.  Clearly H is in the lead right now, so at some point it should flip and T should take the lead, right?  And so for a large number of flips, H and T should each be in the lead about 50% of the time, no?  But that is not so . . .
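
The independence claim can be checked directly. The sketch below is not part of the original notebook: it looks at every run of `k` consecutive heads and measures how often the very next flip is tails. The names `rng`, `k`, `run_of_heads` and `tails_after_run`, the seed, and the use of NumPy's `default_rng` generator are choices made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(12345)         # arbitrary fixed seed (an assumption)
n_flips = 1_000_000
flips = rng.integers(0, 2, size=n_flips)   # 0 = heads, 1 = tails

k = 5                                      # length of the heads streak to look for
is_head = (flips == 0)

# run_of_heads[i] is True when flips i, i+1, ..., i+k-1 are all heads
run_of_heads = np.ones(n_flips - k, dtype=bool)
for j in range(k):
    run_of_heads &= is_head[j : n_flips - k + j]

# flips[k:][i] is the flip that immediately follows the window starting at i
next_flip = flips[k:]
tails_after_run = next_flip[run_of_heads].mean()

print("Streaks of %d heads found: %d" % (k, run_of_heads.sum()))
print("Fraction of tails right after such a streak: %.4f" % tails_after_run)
```

If the flips are independent, the printed fraction should sit close to 0.5 no matter how large `k` is made: a streak of heads does not make tails "due".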

In [76]:
from datetime import datetime
import numpy as np

# seed from the clock so that each run of the notebook gives a different sequence
np.random.seed(datetime.now().microsecond)

In [77]:
# specify a large number of times to flip the coin
T = 10**6   # must be an integer: recent NumPy versions reject a float size

# flip the coin T times
flips = np.random.randint(0,2,size=T).tolist()

# show the first several flips; declare "0" to be heads, "1" to be tails
print("The first several flips are: ", flips[:11])

The first several flips are:  [0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1]


In [78]:
# for large T the number of heads vs. number of tails should be nearly 50-50
perc_heads = 100.*(1.-sum(flips)/T) 
print("Percentage heads = %.3f%%" %          perc_heads   )
print("Percentage tails = %.3f%%" % ( 100. - perc_heads ) )

Percentage heads = 50.025%
Percentage tails = 49.975%


In [79]:
# No surprise so far, that is just as it should be . . . 

In [80]:
# Now it gets more interesting.  Determine after each flip t = 1, 2, ... , T
# whether heads or tails was IN THE LEAD.  After going through all the flips,
# determine what fraction of the total time heads or tails spent in the lead.
n_heads = 0
n_tails = 0

# lists for keeping track of the race
heads_count = list()
tails_count = list()
leader = list()

for flip in flips:
    
    if flip == 0:
        n_heads += 1   # count a zero as heads
    else:
        n_tails += 1
    
    # keep track of number of heads and tails after each flip
    heads_count.append(n_heads)
    tails_count.append(n_tails)
        
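    # record the current leader (0 = heads, 1 = tails); on a tie the previous
    # leader keeps the lead.  A tie cannot occur on the very first flip, so
    # leader is never empty when leader[-1] is read.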
    if n_heads > n_tails:
        leader.append(0)
    elif n_tails > n_heads:
        leader.append(1)
    else:
        leader.append(leader[-1])

# interim results (comment out for large T!)
# print("flips       = ", flips)
# print("heads_count = ", heads_count, end='\n\n')
# print("flips       = ", flips)
# print("tails_count = ", tails_count, end='\n\n')
# print("leader      = ", leader,end='\n\n')

# compute the percentage of time tails spent as the leader
tails_in_the_lead = 100. * sum(leader) / T

print("Heads was in the lead for %.1f%% of the time" % (100.-tails_in_the_lead))
print("Tails was in the lead for %.1f%% of the time" %       tails_in_the_lead )

Heads was in the lead for 92.4% of the time
Tails was in the lead for 7.6% of the time
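
For large `T` the explicit loop is slow. A vectorized sketch of the same bookkeeping follows, assuming the same encoding (0 = heads, 1 = tails) and the same tie rule (the previous leader keeps the lead). The function name `fraction_heads_in_lead` is hypothetical and does not appear in the original notebook.

```python
import numpy as np

def fraction_heads_in_lead(flips):
    """Fraction of flips after which heads leads; a tie goes to the previous leader."""
    flips = np.asarray(flips)
    # +1 for heads (0), -1 for tails (1): the running sum is heads minus tails
    diff = np.cumsum(1 - 2 * flips)
    sign = np.sign(diff)
    # for every position, find the index of the most recent non-tied position;
    # the first flip is never a tie, so index 0 is always a valid fallback
    idx = np.where(sign != 0, np.arange(len(sign)), 0)
    np.maximum.accumulate(idx, out=idx)
    leader_sign = sign[idx]
    return np.mean(leader_sign > 0)

# small hand-checkable case: the leader sequence is H, H (tie), T, T (tie), H
example = [0, 1, 1, 0, 0]
print("Heads in the lead for %.1f%% of the time" % (100. * fraction_heads_in_lead(example)))
```

Called on the `flips` list from the cells above, it should reproduce the heads percentage computed by the loop.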


### Re-run several times.  Very often one side is in the lead for most of the time, and only rarely do Heads and Tails each spend about 50% of the time in the lead.
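
Rather than re-running by hand, many independent runs can be simulated at once. The sketch below is an addition, not part of the original notebook. It compares the simulated fractions with the arcsine law for a symmetric random walk, under which the fraction of time one side leads falls below $x$ with probability approximately $\frac{2}{\pi}\arcsin\sqrt{x}$ for long runs. The sizes, the seed, the 45% to 55% and 10%/90% thresholds, and all variable names are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2024)     # arbitrary fixed seed (an assumption)
n_trials, n_flips = 2000, 2000        # illustrative sizes, kept small to limit memory

# +1 for heads (0), -1 for tails (1); one row per independent re-run
steps = 1 - 2 * rng.integers(0, 2, size=(n_trials, n_flips))
walk = np.cumsum(steps, axis=1)       # heads minus tails after each flip

# heads leads if it is strictly ahead, or if the counts are tied and heads
# was ahead one flip earlier (the same tie rule as the loop above)
prev = np.hstack([np.zeros((n_trials, 1), dtype=walk.dtype), walk[:, :-1]])
heads_leads = (walk > 0) | ((walk == 0) & (prev > 0))
frac = heads_leads.mean(axis=1)       # fraction of time heads led, per run

def arcsine_cdf(x):
    """Limiting probability that the fraction of time in the lead is below x."""
    return (2.0 / np.pi) * np.arcsin(np.sqrt(x))

near_even = np.mean((frac > 0.45) & (frac < 0.55))
lopsided = np.mean((frac < 0.10) | (frac > 0.90))
predicted_near_even = arcsine_cdf(0.55) - arcsine_cdf(0.45)
predicted_lopsided = 2.0 * arcsine_cdf(0.10)

print("Runs with a 45-55%% split of the lead: %.3f (arcsine law: %.3f)"
      % (near_even, predicted_near_even))
print("Runs with one side leading over 90%%:  %.3f (arcsine law: %.3f)"
      % (lopsided, predicted_lopsided))
```

The arcsine density is U-shaped, so lopsided splits of the lead are the most likely outcome and an even split is the least likely.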