# Bayesian stats

Most of the content of this notebook is taken from the book "Think Bayes: Bayesian Statistics Made Simple" by **Allen B. Downey**.



## The cookie problem

We’ll get to Bayes’s theorem soon, but I want to motivate it with an example called the cookie problem.

Suppose there are two bowls of cookies. 
* Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies. 
* Bowl 2 contains 20 of each.

Now suppose you choose one of the bowls at random and, without looking, select a cookie at random. The cookie is vanilla. What is the probability that it came from Bowl 1?


This is a conditional probability; we want p(Bowl 1|vanilla), but it is not obvious how to compute it. If I asked a different question—the probability of a vanilla cookie given Bowl 1—it would be easy:

p(vanilla|Bowl 1) = 3/4

Sadly, p(A|B) is not the same as p(B|A), but there is a way to get from one to the other: Bayes’s theorem.
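As a worked sketch of where this is heading, Bayes’s theorem can be applied directly to the cookie problem. The only assumptions are the ones already stated: each bowl is chosen with probability 1/2, and the likelihoods follow from the cookie counts.

$$
p(\text{Bowl 1} \mid \text{vanilla}) = \frac{p(\text{Bowl 1})\; p(\text{vanilla} \mid \text{Bowl 1})}{p(\text{vanilla})}
$$

$$
p(\text{vanilla}) = \tfrac{1}{2} \cdot \tfrac{3}{4} + \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{5}{8},
\qquad
p(\text{Bowl 1} \mid \text{vanilla}) = \frac{\tfrac{1}{2} \cdot \tfrac{3}{4}}{\tfrac{5}{8}} = \tfrac{3}{5}
$$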

In the context of Bayes’s theorem, it is natural to use a Pmf (probability mass function) to map from each hypothesis to its probability. In the cookie problem, the hypotheses are B1 and B2. In Python, I represent them with the strings 'Bowl 1' and 'Bowl 2':

In [19]:
from thinkbayes import Pmf

# Prior: each bowl is equally likely to be chosen.
pmf = Pmf()
pmf.Set('Bowl 1', 0.5)
pmf.Set('Bowl 2', 0.5)

This distribution, which contains the priors for each hypothesis, is called the **prior distribution**. 

To update the distribution based on new data (the vanilla cookie), we multiply each prior by the corresponding likelihood. The likelihood of drawing a vanilla cookie from Bowl 1 is 3/4. The likelihood for Bowl 2 is 1/2.

In [20]:
pmf.Mult('Bowl 1', 0.75)  # probability of drawing a vanilla cookie from Bowl 1: 30/40 = 3/4
pmf.Mult('Bowl 2', 0.5)   # probability of drawing a vanilla cookie from Bowl 2: 20/40 = 1/2

Mult does what you would expect. It gets the probability for the given hypothesis and multiplies by the given likelihood.

After this update, the distribution is no longer normalized, but because these hypotheses are mutually exclusive and collectively exhaustive, we can renormalize:


In [21]:
pmf.Normalize()

0.625
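The value 0.625 is consistent with the total probability before normalization, 0.5 · 0.75 + 0.5 · 0.5. The same update can be sketched with a plain dictionary, which makes the multiply and normalize steps explicit. This is an illustration only, not the thinkbayes implementation; the names `prior`, `likelihood`, `unnormalized`, `total` and `posterior` are made up for this example.

```python
# Plain-dict sketch of the Bayesian update (illustrative, not thinkbayes code).
prior = {'Bowl 1': 0.5, 'Bowl 2': 0.5}
likelihood = {'Bowl 1': 0.75, 'Bowl 2': 0.5}  # p(vanilla | bowl)

# Multiply each prior by its likelihood; the result does not sum to 1 yet.
unnormalized = {hypo: prior[hypo] * likelihood[hypo] for hypo in prior}

# The normalizing constant is the total probability of the data, p(vanilla).
total = sum(unnormalized.values())

# Divide by the total so that the probabilities sum to 1 again.
posterior = {hypo: p / total for hypo, p in unnormalized.items()}

print(total)                # 0.625
print(posterior['Bowl 1'])  # 0.6
```

Dividing every entry by the same total keeps the ratio between the hypotheses unchanged, which is why normalizing after multiplying by the likelihoods is enough.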

The result is a distribution that contains the posterior probability for each hypothesis, which is called (wait for it) the **posterior distribution**.

Finally, we can get the posterior probability for Bowl 1:

In [22]:
print(pmf.Prob('Bowl 1'))  # posterior probability of Bowl 1

0.6


So the probability that the vanilla cookie came from Bowl 1 is 0.6.
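One way to sanity-check this result is by counting. Both bowls hold 40 cookies and are chosen with equal probability, so every one of the 80 cookies is equally likely to be drawn, and the posterior is just the share of vanilla cookies that sit in Bowl 1. The sketch below relies on that equal-size assumption and would not hold for bowls of different sizes; the `bowls` dictionary and the variable names are made up for this example.

```python
from fractions import Fraction

# Cookie counts per bowl, taken from the problem statement.
bowls = {
    'Bowl 1': {'vanilla': 30, 'chocolate': 10},
    'Bowl 2': {'vanilla': 20, 'chocolate': 20},
}

# Assumption: equal bowl sizes and equal bowl priors, so each cookie is
# equally likely and we can count vanilla cookies directly.
vanilla_total = sum(counts['vanilla'] for counts in bowls.values())

# Fraction of all vanilla cookies that are in Bowl 1.
posterior_bowl1 = Fraction(bowls['Bowl 1']['vanilla'], vanilla_total)

print(posterior_bowl1)         # 3/5
print(float(posterior_bowl1))  # 0.6
```

Using `Fraction` keeps the arithmetic exact, so the result 3/5 can be compared directly with the 0.6 obtained from the Pmf.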