# Problem: coin flips and tail probability 

Assume we flip a fair coin $n$ times. 

On average, we expect to get $n/2$ heads and $n/2$ tails.

What is the probability that we get $3n/4$ heads or more? 

## Simulation approach

To build some intuition, let us simulate the experiment. We first generate a single outcome and count how many heads we get.

In [1]:
import numpy as np
from math import exp

n = 100  # number of tosses; set n=40 to reduce the computational time of the simulation approach,
# and n=100 when computing the bounds and the exact value of the tail probability
p = 1/2  # probability of tails; heads then has probability 1-p
tosses = np.random.choice(['h', 't'], p=[1-p, p], size=n)
heads = np.count_nonzero(tosses == 'h')  # number of heads in this outcome
print(tosses)
print(heads)

['t' 't' 't' 'h' 't' 't' 't' 't' 't' 't' 'h' 'h' 'h' 'h' 'h' 't' 'h' 'h'
 'h' 'h' 'h' 't' 't' 'h' 't' 'h' 'h' 't' 'h' 't' 't' 't' 't' 'h' 'h' 't'
 'h' 'h' 't' 'h' 'h' 't' 't' 't' 't' 't' 't' 't' 'h' 'h' 'h' 't' 'h' 'h'
 't' 't' 'h' 't' 't' 'h' 't' 'h' 't' 'h' 'h' 't' 'h' 't' 't' 't' 'h' 'h'
 'h' 't' 'h' 't' 't' 't' 't' 't' 't' 't' 'h' 'h' 'h' 'h' 'h' 't' 'h' 'h'
 't' 'h' 'h' 't' 'h' 'h' 't' 't' 't' 'h']
48


Let us now generate a large number of outcomes and count how many times the number of heads is at least $0.75n$. How many runs do we need?

In [7]:
runs = int(1e5)  # number of simulated experiments
count = 0        # number of runs with at least 0.75*n heads
for i in range(runs):
    tosses = np.random.choice(['h', 't'], p=[1-p, p], size=n)
    heads = list(tosses).count('h')
    count += (heads >= 0.75*n)

print("number of runs satisfying the constraint = ",count)
print("estimated probability = {:.2e}".format(count/runs)) 

number of runs satisfying the constraint =  0
estimated probability = 0.00e+00


We see that estimating this probability by simulation is challenging: the event is so rare that we need a very large number of samples to obtain a statistically significant result.
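The loop above builds every sequence of tosses explicitly, which is slow. A minimal sketch of a faster variant, assuming we only need the number of heads in each run, draws the head counts directly from a binomial distribution. The function name `estimate_tail`, its parameters, and the fixed seed are illustrative choices, not part of the original notebook.

```python
import numpy as np

def estimate_tail(n, runs, p=0.5, threshold=0.75, seed=0):
    """Monte Carlo estimate of P(X >= threshold * n) for X ~ Binomial(n, p)."""
    rng = np.random.default_rng(seed)
    # Each entry is the number of heads observed in one run of n tosses.
    heads = rng.binomial(n, p, size=runs)
    hits = np.count_nonzero(heads >= threshold * n)
    return hits / runs

print("estimated probability (n=40) = {:.2e}".format(estimate_tail(40, 10**6)))
```

Even with this speed-up, the number of runs needed for $n=100$ remains very large, so the sketch only makes the experiment cheaper. It does not remove the underlying difficulty.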

## Analytical evaluation of the tail probability 
Let us now evaluate this probability exactly by summing the binomial PMF over all outcomes with at least $3n/4$ heads.

In [5]:
from math import exp,ceil,comb
a=ceil(3*n/4)

prob =0

for i in range(a,n+1,1):
    prob += comb(n,i)
    
prob=prob/(2**n)
print("exact probability ={:.2e}".format(prob)) 

exact probability =2.82e-07


Note that this probability decreases rapidly as we increase $n$.
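To see this decay concretely, the exact computation can be wrapped in a function and evaluated for several values of $n$. This is a small sketch: the helper name `exact_tail` and the chosen values of $n$ are assumptions made for illustration.

```python
from math import ceil, comb

def exact_tail(n):
    """Exact P(X >= 3n/4) for X ~ Binomial(n, 1/2)."""
    a = ceil(3 * n / 4)
    # Sum the binomial coefficients exactly (Python integers), then normalize.
    return sum(comb(n, i) for i in range(a, n + 1)) / 2**n

for n_val in (20, 40, 100, 200):
    print("n = {:3d}: exact probability = {:.2e}".format(n_val, exact_tail(n_val)))
```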

A good rule of thumb is that, to accurately estimate a target probability $\epsilon$ by simulation, we need about $100/\epsilon$ runs.
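One way to motivate this rule of thumb (an interpretation, not stated in the notebook itself) is through the relative standard error of the Monte Carlo estimate. The number of hits in $N$ runs is binomial with parameters $N$ and $\epsilon$, so the estimate has relative standard error $\sqrt{(1-\epsilon)/(N\epsilon)}$, which is roughly $10\%$ when $N = 100/\epsilon$. The helper name `relative_std_error` below is hypothetical.

```python
from math import sqrt

def relative_std_error(eps, runs):
    """Relative standard error of the Monte Carlo estimate of a probability eps."""
    # The hit count is Binomial(runs, eps), so the estimate has
    # standard deviation sqrt(eps * (1 - eps) / runs).
    return sqrt(eps * (1 - eps) / runs) / eps

eps = 2.82e-7            # exact tail probability for n = 100, computed above
runs_needed = 100 / eps  # rule of thumb
print("runs needed = {:.2e}".format(runs_needed))
print("relative standard error = {:.2f}".format(relative_std_error(eps, runs_needed)))
print("expected hits with 1e5 runs = {:.2e}".format(1e5 * eps))
```

The last line shows why the simulation with $10^5$ runs returned zero hits: the expected number of hits is far below one.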

## Markov's inequality 
Let us now check the tightness of the bound obtained using Markov's inequality.
Let $X$ denote the number of heads. Since $\mathbb{E}[X]=n/2$, the bound we get is $2/3$, independently of the value of $n$.

In [11]:
print("Markov bound = {:.2e}".format(2/3))

Markov bound = 6.67e-01


This bound is very loose in this scenario, which is not surprising. After all, Markov's inequality exploits only our knowledge of the mean of $X$. It is nevertheless an important tool for establishing more sophisticated bounds.
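For completeness, the value $2/3$ follows by applying Markov's inequality to the nonnegative random variable $X$ (the number of heads) with threshold $3n/4$:

$$
\mathbb{P}\left[X \geq \frac{3n}{4}\right] \leq \frac{\mathbb{E}[X]}{3n/4} = \frac{n/2}{3n/4} = \frac{2}{3}.
$$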

## Chebyshev's inequality

Since $\mathbb{V}\text{ar}[X]= n/4$, Chebyshev's inequality implies that the tail probability is upper-bounded by $4/n$. This bound now decreases with $n$, which is consistent with our observations. However, it is still rather loose.

In [12]:
print("Chebyshev bound = {:.2e}".format(4/n))

Chebyshev bound = 4.00e-02
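The value $4/n$ comes from bounding the one-sided event by the two-sided deviation from the mean and then applying Chebyshev's inequality with deviation $n/4$:

$$
\mathbb{P}\left[X \geq \frac{3n}{4}\right] \leq \mathbb{P}\left[\left|X - \frac{n}{2}\right| \geq \frac{n}{4}\right] \leq \frac{\mathbb{V}\text{ar}[X]}{(n/4)^2} = \frac{n/4}{n^2/16} = \frac{4}{n}.
$$

Since the distribution of $X$ is symmetric around $n/2$ for a fair coin, the one-sided probability is exactly half of the two-sided one, so the bound could be sharpened to $2/n$. This does not change its $1/n$ decay.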


## Chernoff's bound

In [3]:
from math import exp 

mu = n/2
delta = 1/2

ub1 = (exp(delta)/((1+delta)**(1+delta)))**mu
ub2 = (exp(-mu * delta**2 /3))
print('first Chernoff bound = {:.2e}'.format(ub1))
print('second Chernoff bound = {:.2e}'.format(ub2))

first Chernoff bound = 4.47e-03
second Chernoff bound = 1.55e-02


Although the bound decays exponentially with $n$, it is still loose for $n=100$ because the constant multiplying $n$ in the exponent is small.
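The two quantities computed above correspond to the standard multiplicative forms of the Chernoff bound for a sum $X$ of independent Bernoulli random variables with mean $\mu = \mathbb{E}[X]$. These are stated here as the usual textbook forms, since the notebook does not spell them out. For $\delta > 0$,

$$
\mathbb{P}\left[X \geq (1+\delta)\mu\right] \leq \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu},
$$

and, for $0 < \delta \leq 1$, the weaker but simpler bound

$$
\mathbb{P}\left[X \geq (1+\delta)\mu\right] \leq e^{-\mu\delta^2/3}.
$$

With $\mu = n/2$ and $\delta = 1/2$ we have $(1+\delta)\mu = 3n/4$, and the second bound becomes $e^{-n/24}$, which makes the small constant in the exponent explicit.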

## Chernoff's bound for independent $\text{Bern}(1/2)$ random variables

In [5]:
mu = n/2
delta = 1/2
ub = (exp(-mu * delta**2))
print('improved Chernoff bound = {:.2e}'.format(ub))

improved Chernoff bound = 3.73e-06


Note how much tighter this bound is!
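To compare all the bounds in this notebook side by side, the sketch below evaluates them together with the exact tail probability for a few values of $n$. The function name `tail_bounds` and the dictionary keys are illustrative. With $\mu = n/2$ and $\delta = 1/2$, the improved bound $e^{-\mu\delta^2}$ equals $e^{-n/8}$, which coincides with what Hoeffding's inequality gives for a deviation of $n/4$ from the mean.

```python
from math import ceil, comb, exp

def tail_bounds(n, delta=0.5):
    """Exact tail P(X >= (1 + delta) * n / 2) and the bounds from this notebook."""
    mu = n / 2
    a = ceil((1 + delta) * mu)
    return {
        "exact": sum(comb(n, i) for i in range(a, n + 1)) / 2**n,
        "Markov": 1 / (1 + delta),                # mu / ((1 + delta) * mu)
        "Chebyshev": (n / 4) / (delta * mu) ** 2,  # variance / deviation^2
        "Chernoff 1": (exp(delta) / (1 + delta) ** (1 + delta)) ** mu,
        "Chernoff 2": exp(-mu * delta**2 / 3),
        "improved Chernoff": exp(-mu * delta**2),
    }

for n_val in (40, 100, 200):
    print("n =", n_val)
    for name, value in tail_bounds(n_val).items():
        print("  {:<18s} {:.2e}".format(name, value))
```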