### The Euro Coin problem

A Belgian euro coin was spun 250 times. It came up heads 140 times and tails 110 times.
One stats lecturer claimed:   
"Looks suspicious to me, if it was unbiased, the chance of getting such an extreme result would be less than 7%"

So, do these data give evidence that the coin is biased rather than fair? 
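
The lecturer's "less than 7%" figure can be checked directly. The sketch below assumes that "such an extreme result" means a two-sided tail: any outcome at least 15 heads away from the expected 125 out of 250. That reading is our assumption, not something stated in the quote.

```python
from math import comb

n, heads = 250, 140
deviation = abs(heads - n / 2)  # 15 heads away from the expected 125

# Two-sided tail: outcomes at least as far from 125 as the one observed.
extreme = [k for k in range(n + 1) if abs(k - n / 2) >= deviation]
p_value = sum(comb(n, k) for k in extreme) / 2 ** n

print(round(p_value, 4))
```

Under a fair coin every sequence of 250 spins has probability 1/2^250, so the tail probability is just the number of extreme sequences divided by 2^250.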

First, estimate the probability that the coin lands face up. Second, evaluate whether the data support the hypothesis that the coin is biased.

In [1]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
from thinkbayes import Pmf, Suite, Percentile, CredibleInterval
from thinkplot import Pmf as Plot_Pmf 
from thinkplot import Show as Plot_Show 

The likelihood function is relatively easy. Let $H_x$ be the hypothesis that the probability of heads is $x$%. If $H_x$ is true, the probability of heads is $x/100$ and the probability of tails is $1 - x/100$.

In [2]:
class Euro(Suite):
    def Likelihood(self, data, hypo):
        x = hypo
        if data == 'H':
            return x/100.0
        else:
            return 1 - x/100.0
    

In [3]:
# Make a suite of 101 hypotheses (x = 0, 1, ..., 100) with a uniform prior.
suite = Euro(range(0, 101))

# We make some data to update it with. 
dataset = 'H' * 140 + 'T' * 110

print(dataset)

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT


In [4]:
# Update all 101 hypotheses with each piece of data in turn, i.e. update with H, then another H, and so on.
for data in dataset:
    suite.Update(data)

# The posterior results are shown below

<img src="thinkbayeseuro.png">

Once all 101 hypotheses have been updated with all of the data, the suite holds 101 posterior probabilities, one per hypothesis.
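
For readers without thinkbayes.py to hand, the same posterior can be sketched in plain Python. This is not the library's implementation: it applies the whole dataset in one step (140 heads and 110 tails), and the names `unnormalized` and `posterior` are ours.

```python
heads, tails = 140, 110
hypos = range(0, 101)  # x = 0..100, the probability of heads in percent

# Uniform prior times the likelihood of 140 heads and 110 tails.
unnormalized = {x: (x / 100) ** heads * (1 - x / 100) ** tails for x in hypos}
total = sum(unnormalized.values())
posterior = {x: p / total for x, p in unnormalized.items()}

most_probable = max(posterior, key=posterior.get)
print(most_probable, round(posterior[most_probable], 4))
```

Because the prior is uniform it only contributes a constant factor, which cancels when we normalize.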

### Summarizing the posterior

Again, there are several ways to summarize the posterior distribution. 
One option is to find the most likely value in the posterior distribution.  
thinkbayes.py provides a function for this:

In [5]:
def MaximumLikelihood(pmf):
    """Returns the value with the highest probability."""
    prob, val = max((prob,val) for val, prob in pmf.Items())
    return val


In [6]:
b = MaximumLikelihood(suite)  # takes the probability mass function as the argument
print(b)

56


The result is 56, which is also the observed percentage of heads, 140/250 = 56%.   
This suggests (correctly) that the observed percentage is the maximum likelihood estimator for the population.
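
A short derivation shows why. Writing $p$ for the probability of heads, $h$ for the number of heads and $t$ for the number of tails (our notation), the likelihood is proportional to $p^h (1-p)^t$, and maximizing its logarithm gives:

$$
\log L(p) = h \log p + t \log(1 - p), \qquad
\frac{d}{dp} \log L(p) = \frac{h}{p} - \frac{t}{1 - p} = 0
\;\Longrightarrow\;
\hat{p} = \frac{h}{h + t} = \frac{140}{250} = 0.56
$$

With a uniform prior the posterior is proportional to the likelihood, so the most probable hypothesis and the maximum likelihood estimate coincide.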

We might also summarize the posterior by computing the mean and median:

In [9]:
print('Mean', suite.Mean())

# Percentile(suite, p) returns the value that corresponds to percentile p.
print('Median', Percentile(suite, 50))

Mean 55.952380952380956
Median 56


Finally, we can compute a credible interval:

In [8]:
# Computes the central credible interval
print('CI', CredibleInterval(suite, 90))

CI (51, 61)
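
To make the idea concrete, here is a minimal sketch of a central 90% credible interval: it runs from the 5th to the 95th percentile of the posterior. The helper `percentile` is hypothetical and written for this note; it is not the thinkbayes.py function, although it follows the same idea of walking the cumulative probabilities.

```python
heads, tails = 140, 110
weights = [(x / 100) ** heads * (1 - x / 100) ** tails for x in range(101)]
total = sum(weights)
posterior = [w / total for w in weights]  # index x holds P(x | data)

def percentile(probs, p):
    """Return the smallest index whose cumulative probability reaches p percent."""
    cumulative = 0.0
    for x, prob in enumerate(probs):
        cumulative += prob
        if cumulative >= p / 100:
            return x
    return len(probs) - 1

interval = (percentile(posterior, 5), percentile(posterior, 95))
print(interval)
```

The same helper with `p = 50` gives the median.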


Back to the question: is the coin fair? The 90% posterior credible interval does not include 50%, which suggests that it is not.
But this does not answer the question of whether the data give evidence that the coin is biased rather than fair. To answer that, we have to be more precise about what it means to say that data constitute evidence for a hypothesis, which is the subject of the next chapter.

### One Last Thing

Since we want to know whether the coin is fair, it might be tempting to ask for the probability that x is 50%:

In [10]:
print(suite.Prob(50))

0.02097652612954468


This value is almost meaningless: the decision to evaluate 101 hypotheses was arbitrary. We could have divided the range into more or fewer pieces, and the probability of any single hypothesis would then have been smaller or larger, respectively.
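
A quick way to see this is to repeat the calculation on a finer grid. The function `prob_fair` below is a hypothetical helper written for this note; it assumes a uniform prior on an evenly spaced grid from 0 to 1.

```python
def prob_fair(n_hypos, heads=140, tails=110):
    """Posterior probability of x = 0.5 on an evenly spaced grid with a uniform prior."""
    grid = [i / (n_hypos - 1) for i in range(n_hypos)]
    weights = [x ** heads * (1 - x) ** tails for x in grid]
    return weights[(n_hypos - 1) // 2] / sum(weights)  # middle point is x = 0.5

coarse = prob_fair(101)   # the 101 hypotheses used so far
fine = prob_fair(1001)    # ten times finer
print(coarse, fine)
```

The data are identical in both runs, yet the probability assigned to exactly 50% shrinks as the grid gets finer, because the same posterior mass is shared among more hypotheses.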

### Swamping the priors

It is reasonable to choose a prior that gives higher probability to values of x near 50% and lower probability to extreme values.

As an example, we can construct a triangular prior:

<img src="thinkbayeseuro2.png">

In [11]:
def TrianglePrior():
    """Makes a Euro suite with a triangular prior that peaks at x = 50."""
    suite = Euro()
    for x in range(0, 51):
        suite.Set(x, x)
    for x in range(51, 101):
        suite.Set(x, 100-x)
    suite.Normalize()
    return suite

Even with substantially different priors, the posterior distributions are very similar (practically the same).
This is an example of swamping the priors: with enough data, people who start with different priors will tend to converge on the same posterior.
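
The comparison can be sketched without thinkbayes.py. The code below is an illustration under our own naming (`posterior`, `mean`, `uniform`, `triangle`); it updates both priors with the same 140 heads and 110 tails and compares the posterior means.

```python
heads, tails = 140, 110
hypos = range(101)

def posterior(prior):
    """Multiply a prior (dict x -> weight) by the likelihood and normalize."""
    weights = {x: prior[x] * (x / 100) ** heads * (1 - x / 100) ** tails
               for x in hypos}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

def mean(pmf):
    """Mean of a distribution stored as dict x -> probability."""
    return sum(x * p for x, p in pmf.items())

uniform = {x: 1 for x in hypos}
triangle = {x: x if x <= 50 else 100 - x for x in hypos}

post_uniform = posterior(uniform)
post_triangle = posterior(triangle)
print(mean(post_uniform), mean(post_triangle))
```

The triangular prior pulls the posterior slightly toward 50, but after 250 spins the two means differ by only a fraction of a percentage point.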

### The Beta distribution

One more optimization solves this problem even faster.   
So far we have used a Pmf object to represent a discrete set of values for x. This time we will use a continuous distribution: the beta distribution.

The beta distribution is defined on the interval from 0 to 1 (including both endpoints), so it is a natural choice for describing proportions and probabilities.

If you use a binomial likelihood function to update your prior, the beta distribution is a **conjugate prior**. That means that if the prior is a beta distribution, the posterior is a beta distribution too: prior and posterior are in the same family.

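If the prior is a beta distribution with parameters `alpha` and `beta`, and the data are `h` heads and `t` tails, the posterior is a beta distribution with parameters `alpha + h` and `beta + t`. In other words, you can do the update with **two** additions.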
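
The reason is easiest to see by multiplying the two densities and ignoring the normalizing constants. Here $x$ is the probability of heads on the 0 to 1 scale, and $h$ and $t$ are the observed counts of heads and tails (our notation):

$$
\underbrace{x^{\alpha - 1} (1 - x)^{\beta - 1}}_{\text{beta prior}}
\times
\underbrace{x^{h} (1 - x)^{t}}_{\text{binomial likelihood}}
\;\propto\;
x^{\alpha + h - 1} (1 - x)^{\beta + t - 1}
$$

The right-hand side has the form of a beta density with parameters $\alpha + h$ and $\beta + t$, so the whole update reduces to adding the counts to the parameters.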

thinkbayes.py provides a class that takes advantage of this:

In [12]:
class Beta(object):
    """Represents a beta distribution over the probability of heads."""

    def __init__(self, alpha=1, beta=1):
        # The defaults alpha=1, beta=1 give a uniform distribution.
        self.alpha = alpha
        self.beta = beta

    def Update(self, data):
        """Performs a Bayesian update.

        data: pair of integers, the number of heads and tails
        """
        heads, tails = data
        self.alpha += heads
        self.beta += tails

So, we have yet another way to solve the Euro problem:

In [15]:
from thinkbayes import Beta

beta = Beta()
beta.Update((140,110))
# Beta also provides Mean, which computes a simple function of alpha and beta:
print(beta.Mean())

0.5595238095238095


This gives a posterior mean of about 56%, which is the same result we got from using Pmfs.
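
The "simple function of alpha and beta" is the standard mean of a beta distribution, alpha / (alpha + beta). The sketch below computes it by hand and compares it with the mean of a grid posterior; the variable names are ours, and the grid is rebuilt here so the snippet runs on its own.

```python
alpha, beta = 1 + 140, 1 + 110   # uniform Beta(1, 1) prior plus the observed counts
beta_mean = alpha / (alpha + beta)

# Grid posterior over x = 0..100 percent with a uniform prior, for comparison.
weights = [(x / 100) ** 140 * (1 - x / 100) ** 110 for x in range(101)]
grid_mean = sum(x * w for x, w in enumerate(weights)) / sum(weights) / 100

print(beta_mean, grid_mean)
```

The two numbers agree closely because the grid is fine relative to the width of the posterior.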

Beta also provides EvalPdf, which evaluates the PDF of the beta distribution.
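
As an illustration of what evaluating the PDF involves, here is a stand-alone sketch. The function `beta_pdf` is hypothetical and is not the thinkbayes.py method; in particular, we include the normalizing constant, which EvalPdf may or may not do.

```python
from math import exp, lgamma, log

def beta_pdf(x, alpha, beta):
    """Normalized beta density at x, for 0 < x < 1."""
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    return exp(log_norm + (alpha - 1) * log(x) + (beta - 1) * log(1 - x))

alpha, beta = 141, 111   # posterior parameters after updating Beta(1, 1) with (140, 110)
xs = [i / 1000 for i in range(1, 1000)]
density = [beta_pdf(x, alpha, beta) for x in xs]

peak = xs[density.index(max(density))]
area = sum(density) / 1000   # Riemann sum; should be close to 1
print(peak, round(area, 4))
```

Working with logarithms and `lgamma` avoids overflow in the gamma functions, which would otherwise be astronomically large for parameters of this size.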