# Bayesian stats

Most of the content of this notebook is drawn from the book "Think Bayes: Bayesian Statistics Made Simple" by **Allen B. Downey**.

## Conjoint probability

**Conjoint probability** is a fancy way to say the probability that two things are true. I write **p(A and B)** to mean the probability that A and B are both true.
If you learned about probability in the context of coin tosses and dice, you might have learned the following formula:

    p(A and B) = p(A) p(B)           WARNING: not always true

For example, if I toss two coins, and A means the first coin lands face up, and B means the second coin lands face up, then p(A) = p(B) = 0.5, and sure enough, p(A and B) = p(A) p(B) = 0.25.

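But this formula only works because in this case **A and B are independent**: knowing the outcome of the first toss does not change the probability of the second, so p(B|A) = p(B).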
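
The coin example can be checked by brute-force enumeration. The sketch below is only an illustration, not code from the book; the `prob` helper and the `"H"`/`"T"` labels are names made up here.

```python
from fractions import Fraction
from itertools import product

# All four equally likely outcomes of tossing two fair coins.
outcomes = list(product(["H", "T"], repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

p_a = prob(lambda o: o[0] == "H")            # A: first coin lands face up
p_b = prob(lambda o: o[1] == "H")            # B: second coin lands face up
p_a_and_b = prob(lambda o: o == ("H", "H"))  # both land face up

print(p_a, p_b, p_a_and_b)     # 1/2 1/2 1/4
print(p_a_and_b == p_a * p_b)  # True, because the tosses are independent
```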

In general, the probability of a conjunction is
    
    p(A and B) = p(A) p(B|A)
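
To see why the general formula matters, here is a small sketch with events that are not independent. It assumes a hypothetical bag of 3 vanilla cookies and 1 chocolate cookie (not part of the original example) from which two cookies are drawn without replacement; A is "the first cookie is vanilla" and B is "the second cookie is vanilla".

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical bag: 3 vanilla cookies and 1 chocolate cookie.
bag = ["V1", "V2", "V3", "C1"]

# Every ordered pair of distinct cookies is an equally likely two-cookie draw.
draws = list(permutations(bag, 2))

def prob(event, space):
    """Fraction of the outcomes in `space` for which `event` holds."""
    return Fraction(sum(1 for d in space if event(d)), len(space))

def first_vanilla(d):
    return d[0].startswith("V")

def second_vanilla(d):
    return d[1].startswith("V")

p_a = prob(first_vanilla, draws)
p_b = prob(second_vanilla, draws)

# p(B|A): restrict the sample space to the draws where A happened.
given_a = [d for d in draws if first_vanilla(d)]
p_b_given_a = prob(second_vanilla, given_a)

p_a_and_b = prob(lambda d: first_vanilla(d) and second_vanilla(d), draws)

print(p_a_and_b)          # 1/2
print(p_a * p_b_given_a)  # 1/2, the general formula holds
print(p_a * p_b)          # 9/16, the independence formula does not
```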
    
## The cookie problem

We’ll get to Bayes’s theorem soon, but I want to motivate it with an example called the cookie problem.

Suppose there are two bowls of cookies. 
* Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies. 
* Bowl 2 contains 20 of each.

Now suppose you choose one of the bowls at random and, without looking, select a cookie at random. The cookie is vanilla. What is the probability that it came from Bowl 1?


This is a conditional probability; we want p(Bowl 1|vanilla), but it is not obvious how to compute it. If I asked a different question—the probability of a vanilla cookie given Bowl 1—it would be easy:

    p(vanilla|Bowl 1) = 3/4
    
Sadly, p(A|B) is not the same as p(B|A), but there is a way to get from one
to the other: Bayes’s theorem.

**Bayes’s theorem**

At this point we have everything we need to derive Bayes’s theorem. We’ll start with the observation that conjunction is commutative; that is

    p(A and B) = p(B and A) for any events A and B.

Next, we write the probability of a conjunction:

    p(A and B) = p(A) p(B|A)

Since we have not said anything about what A and B mean, they are interchangeable. Interchanging them yields:

    p(B and A) = p(B) p(A|B)


That’s all we need. Pulling those pieces together, we get 

    p(B) p(A|B) = p(A) p(B|A)
    
This means there are two ways to compute the conjunction. If you have p(A), you multiply by the conditional probability p(B|A). Or you can do it the other way around; if you know p(B), you multiply by p(A|B). Either way you should get the same thing.

Finally, we can divide through by p(B):

    p(A|B) = p(A) p(B|A) / p(B)

And that’s **Bayes’s theorem!** It might not look like much, but it turns out to
be surprisingly powerful.
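
As a sanity check, the theorem can be verified on concrete numbers. The joint distribution below is invented purely for illustration; any table of non-negative values summing to 1 would do.

```python
from fractions import Fraction

# Hypothetical joint distribution over two events A and B (values sum to 1).
joint = {
    (True, True): Fraction(2, 10),    # A and B
    (True, False): Fraction(1, 10),   # A, not B
    (False, True): Fraction(3, 10),   # B, not A
    (False, False): Fraction(4, 10),  # neither
}

p_a = sum(p for (a, b), p in joint.items() if a)
p_b = sum(p for (a, b), p in joint.items() if b)
p_a_and_b = joint[(True, True)]

# Conditional probabilities from the definition p(X|Y) = p(X and Y) / p(Y).
p_b_given_a = p_a_and_b / p_a
p_a_given_b = p_a_and_b / p_b

# Both factorizations give the same conjunction.
assert p_a * p_b_given_a == p_b * p_a_given_b == p_a_and_b

# Bayes's theorem recovers p(A|B) from p(A), p(B|A), and p(B).
bayes = p_a * p_b_given_a / p_b
print(bayes, p_a_given_b)  # 2/5 2/5
```
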
For example, we can use it to solve the cookie problem. I’ll write B1 for the hypothesis that the cookie came from Bowl 1 and V for the vanilla cookie. Plugging these into Bayes’s theorem, we get:

    p(B1|V) = p(B1) p(V|B1) / p(V)
    
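* p(B1|V): the probability that the cookie came from Bowl 1, given that it is vanilla (the **posterior**)
* p(B1): the probability of picking Bowl 1 before seeing the cookie, 1/2 (the **prior**)
* p(V|B1): the probability of drawing a vanilla cookie from Bowl 1, 30/40 = 3/4 (the **likelihood**)
* p(V): the probability of drawing a vanilla cookie from either bowl; since the bowls are equally likely and hold the same number of cookies, this is (30+20)/80 = 5/8 (the **normalizing constant**)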


    p(B1|V) = (0.5)*(3/4)/(5/8) = 3/5 = 0.6
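
The same arithmetic can be reproduced with exact fractions. This is a minimal sketch, not code from the book; the `bowls` dictionary and the `likelihood` helper are names chosen here. It computes p(V) with the law of total probability instead of counting cookies directly, which gives the same 5/8.

```python
from fractions import Fraction

# Cookie counts from the problem statement.
bowls = {
    "Bowl 1": {"vanilla": 30, "chocolate": 10},
    "Bowl 2": {"vanilla": 20, "chocolate": 20},
}
prior = Fraction(1, 2)  # each bowl is equally likely to be chosen

def likelihood(bowl, flavor):
    """p(flavor | bowl): share of cookies of that flavor in the bowl."""
    counts = bowls[bowl]
    return Fraction(counts[flavor], sum(counts.values()))

# p(V) = p(B1) p(V|B1) + p(B2) p(V|B2)
p_v = sum(prior * likelihood(bowl, "vanilla") for bowl in bowls)

# Bayes's theorem: p(B1|V) = p(B1) p(V|B1) / p(V)
posterior_b1 = prior * likelihood("Bowl 1", "vanilla") / p_v

print(p_v)           # 5/8
print(posterior_b1)  # 3/5
```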

In the context of Bayes’s theorem, it is natural to use a Pmf (a probability mass function, provided here by the `thinkbayes` module) to map from each hypothesis to its probability. In the cookie problem, the hypotheses are B1 and B2. In Python, I represent them with strings:

In [19]:
from thinkbayes import Pmf

# Prior distribution: each bowl is equally likely before we see the cookie
pmf = Pmf()
pmf.Set('Bowl 1', 0.5)
pmf.Set('Bowl 2', 0.5)

This distribution, which contains the priors for each hypothesis, is called the **prior distribution**. 

To update the distribution based on new data (the vanilla cookie), we multiply each prior by the corresponding likelihood. The likelihood of drawing a vanilla cookie from Bowl 1 is 3/4. The likelihood for Bowl 2 is 1/2.

In [20]:
pmf.Mult('Bowl 1', 0.75)  # likelihood of a vanilla cookie from Bowl 1: 30/40 = 3/4
pmf.Mult('Bowl 2', 0.5)   # likelihood of a vanilla cookie from Bowl 2: 20/40 = 1/2

Mult does what you would expect. It gets the probability for the given hypothesis and multiplies by the given likelihood.

After this update, the distribution is no longer normalized, but because these hypotheses are mutually exclusive and collectively exhaustive, we can renormalize:


In [21]:
pmf.Normalize()

0.625

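The value returned by Normalize, 0.625, is the total probability before normalizing, which matches p(V) = 5/8.

The result is a distribution that contains the posterior probability for each hypothesis, which is called (wait for it) the **posterior distribution**.

Finally, we can get the posterior probability for Bowl 1: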

In [22]:
print(pmf.Prob('Bowl 1'))

0.6


And the answer is 0.6.
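
For readers without `thinkbayes` installed, the multiply-then-normalize update can be sketched with a plain dictionary. This is a stand-in written for illustration, not the library's implementation; the `update` function and its return values are assumptions of this sketch.

```python
# Plain-dict stand-in for the Pmf steps above (not the thinkbayes code).
prior = {"Bowl 1": 0.5, "Bowl 2": 0.5}

# Likelihood of drawing a vanilla cookie under each hypothesis.
likelihood_vanilla = {"Bowl 1": 0.75, "Bowl 2": 0.5}

def update(pmf, likelihood):
    """Multiply each prior by its likelihood, then renormalize.

    Returns the posterior dict and the normalizing constant.
    """
    unnormalized = {h: p * likelihood[h] for h, p in pmf.items()}
    total = sum(unnormalized.values())
    posterior = {h: p / total for h, p in unnormalized.items()}
    return posterior, total

posterior, total = update(prior, likelihood_vanilla)
print(total)                # 0.625
print(posterior["Bowl 1"])  # 0.6
```

Because `update` takes and returns a dictionary of the same shape, the posterior from one observation can be passed back in as the prior for the next.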