# Exercise 2.1

In "The Bayesian framework" I said that the solution to the cookie problem generalizes to the case where we draw multiple cookies with replacement.

But in the more likely scenario where we eat the cookies we draw, the likelihood of each draw depends on the previous draws.

Modify the solution in this chapter to handle selection without replacement. Hint: add instance variables to Cookie to represent the hypothetical state of the bowls, and modify Likelihood accordingly. You might want to define a Bowl object.

- Bowl 1 has 30 vanilla cookies and 10 chocolate cookies
- Bowl 2 has 20 vanilla cookies and 20 chocolate cookies

### Theory

The probability of picking a vanilla cookie at random is $\frac{50}{80}=\frac{5}{8}$.

We can check this with the law of total probability. Each bowl is picked with probability $\frac{1}{2}$.

$P(V)=\sum_i{P(V|B_i)P(B_i)} = P(V|B_1)P(B_1) + P(V|B_2)P(B_2) = \frac{30}{40}\frac{1}{2}+\frac{20}{40}\frac{1}{2}=\frac{30+20}{40+40}=\frac{5}{8}$

where $V$ denotes drawing a vanilla cookie and $B_i$ denotes picking bowl $i$.

From the problem statement, what is the probability that a vanilla cookie comes from bowl 1?

$P(B_1|V)=\frac{P(V|B_1)P(B_1)}{P(V)} = \frac{P(V|B_1)P(B_1)}{\sum_i{P(V|B_i)P(B_i)}} = \frac{(3/4)(1/2)}{(5/8)}=\frac{3}{5}$

and bowl 2

$P(B_2|V)=\frac{P(V|B_2)P(B_2)}{P(V)} = \frac{P(V|B_2)P(B_2)}{\sum_i{P(V|B_i)P(B_i)}} =\frac{(1/2)(1/2)}{(5/8)}=\frac{2}{5}$
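The two posteriors above can be checked with exact fractions. This is a minimal sketch, not part of the original solution: the helper name `bayes_posterior` and the dictionary layout are my own assumptions for illustration.

```python
from fractions import Fraction


def bayes_posterior(prior, likelihood):
    """Return the normalized posterior P(H | D) for each hypothesis H.

    Hypothetical helper for this check only; it is not part of thinkbayes.
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(D), by the law of total probability
    return {h: p / evidence for h, p in unnormalized.items()}


prior = {"Bowl 1": Fraction(1, 2), "Bowl 2": Fraction(1, 2)}
p_vanilla = {"Bowl 1": Fraction(30, 40), "Bowl 2": Fraction(20, 40)}

posterior = bayes_posterior(prior, p_vanilla)
for bowl in sorted(posterior):
    print("{} {}".format(bowl, posterior[bowl]))
# Bowl 1 3/5
# Bowl 2 2/5
```

Using `Fraction` instead of floats keeps the results in the same form as the hand calculation.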

For the problem at hand, we first pick a bowl at random (50% each), draw a vanilla cookie, eat it, and then draw another vanilla cookie from the same bowl. What is the probability that both cookies came from bowl 1?

$P(B_1|V_1,V_2)=\frac{P(B_1, V_1, V_2)}{P(V_1, V_2)}$

$P(B_1, V_1, V_2)=P(V_2|B_1,V_1)P(B_1,V_1)=P(V_2|B_1,V_1)P(B_1|V_1)P(V_1)$

The denominator looks like it needs a fair bit of calculation, but it is not so bad. We picked either bowl 1 or bowl 2, so we just compute the same kind of probability as above for both bowls.

$P(V_1,V_2)=\sum_{i \in \{1,2\}}{P(B_i,V_1,V_2)}=\sum_{i \in \{1,2\}}{P(V_2|B_i,V_1)P(B_i|V_1)P(V_1)}$

$P(B_1|V_1,V_2)=\frac{P(V_2|B_1,V_1)P(B_1|V_1)P(V_1)}{\sum_{i \in \{1,2\}}{P(V_2|B_i,V_1)P(B_i|V_1)P(V_1)}}=
\frac{P(V_2|B_1,V_1)P(B_1|V_1)}{\sum_{i \in \{1,2\}}{P(V_2|B_i,V_1)P(B_i|V_1)}} = 
\frac{\frac{29}{39}\frac{3}{5}}{\frac{29}{39}\frac{3}{5}+\frac{19}{39}\frac{2}{5}}=\frac{87}{125}$

$P(B_2|V_1,V_2)=\frac{38}{125}$
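As a quick arithmetic check, the last formula can be evaluated with exact fractions. This is only a sketch: the variable names `post_first`, `like_second` and `post_second` are mine, and the numbers are copied from the derivation above.

```python
from fractions import Fraction

# P(B_i | V_1): posterior after the first vanilla cookie
post_first = {1: Fraction(3, 5), 2: Fraction(2, 5)}
# P(V_2 | B_i, V_1): one vanilla cookie is already gone from a bowl of 40
like_second = {1: Fraction(29, 39), 2: Fraction(19, 39)}

# Denominator: sum over both bowls of P(V_2 | B_i, V_1) P(B_i | V_1)
evidence = sum(like_second[i] * post_first[i] for i in (1, 2))
post_second = {i: like_second[i] * post_first[i] / evidence for i in (1, 2)}

for i in (1, 2):
    print("P(B_{} | V_1, V_2) = {}".format(i, post_second[i]))
# P(B_1 | V_1, V_2) = 87/125
# P(B_2 | V_1, V_2) = 38/125
```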

### Practice

So much for the theory. Let's put it into practice. We just need to keep track of the cookies that we ate.

In [13]:
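class Bowl(object):
    """A bowl of cookies, stored as a map from cookie type to count."""

    def __init__(self, **mix):
        self.mix = mix

    def update(self, data):
        """Remove one cookie of type `data` from the bowl."""
        self.mix[data] -= 1

    def prob(self, data):
        """Probability of drawing a cookie of type `data` from the bowl."""
        return float(self.mix[data])/sum(self.mix.values())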

In [24]:
# %load code/cookie2.py
"""
This code was originally created by Allen B. Downey 
for "Think Bayes" and subsequently modified by me 
for the purpose of this exercise.
"""

from code.thinkbayes import Pmf

class Cookie(Pmf):
    """A map from string bowl ID to probability."""

    def __init__(self, hypos):
        """Initialize self.

        hypos: sequence of Bowl objects, one per hypothesis
        """
        Pmf.__init__(self)
        self.mixes = {}
        for i, hypo in enumerate(hypos):
            key = "Bowl {}".format(i+1)
            self.mixes[key] = hypo
            self.Set(key, 1)
        self.Normalize()

    def Update(self, data):
        """Updates the PMF with new data.

        data: string cookie type
        """
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
            # The cookie is eaten, so remove it from every hypothetical bowl.
            self.mixes[hypo].update(data)
        self.Normalize()

    def Likelihood(self, data, hypo):
        """The likelihood of the data under the hypothesis.

        data: string cookie type
        hypo: string bowl ID
        """
        bowl = self.mixes[hypo]
        like = bowl.prob(data)
        return like



In [37]:
hypos = [
    Bowl(vanilla=30, chocolate=10),
    Bowl(vanilla=20, chocolate=20)
]
pmf = Cookie(hypos)

In [38]:
pmf.Update('vanilla')

In [39]:
for hypo, prob in pmf.Items():
    print("{} {:.2f}%".format(hypo, prob*100))

Bowl 2 40.00%
Bowl 1 60.00%


In [40]:
pmf.Update('vanilla')

In [42]:
for hypo, prob in pmf.Items():
    print("{} {:.2f}%".format(hypo, prob*100))

Bowl 2 30.40%
Bowl 1 69.60%


This bit of code is the practical version of the theory above. Doing both is a good way to solidify your understanding of the basic Bayesian manipulations.
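A third, independent check is to simulate the experiment and count. The sketch below does not use `thinkbayes`; the function name `simulate`, the number of trials and the seed are arbitrary choices of mine. It keeps only the trials where both cookies are vanilla and reports how often they came from bowl 1, which should land near the 69.6% found above.

```python
import random


def simulate(trials=100000, seed=0):
    """Estimate P(Bowl 1 | two vanilla cookies drawn without replacement)."""
    rng = random.Random(seed)
    bowls = {
        "Bowl 1": ["vanilla"] * 30 + ["chocolate"] * 10,
        "Bowl 2": ["vanilla"] * 20 + ["chocolate"] * 20,
    }
    names = sorted(bowls)
    both_vanilla = 0
    both_vanilla_from_bowl1 = 0
    for _ in range(trials):
        name = rng.choice(names)  # pick a bowl, 50% each
        # sample() draws two distinct cookies, i.e. without replacement
        first, second = rng.sample(bowls[name], 2)
        if first == second == "vanilla":
            both_vanilla += 1
            if name == "Bowl 1":
                both_vanilla_from_bowl1 += 1
    return float(both_vanilla_from_bowl1) / both_vanilla


estimate = simulate()
print("{:.1f}%".format(100 * estimate))
```

The estimate fluctuates with the seed and the number of trials, so expect agreement only to within a fraction of a percentage point.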

----

## Some more fun

While I'm at it, why not ask a different question with the same setup?

1. Pick a bowl at random without identifying it
2. Choose a cookie at random from that bowl, observe what kind and eat it
3. Choose a bowl at random without identifying it
4. Choose a cookie at random from that second bowl

What is the probability that the second cookie will be vanilla, given that the first one was? You never learn which bowl either cookie came from. So I am looking for $P(V_2|V_1)$.

Let's do it with the law of total probability here as well.

$P(V_2|V_1)=\frac{P(V_1,V_2)}{P(V_1)}$

The numerator requires quite a bit of calculation. The idea is to compute the probability that both cookies are vanilla, whether they come from bowl 1 or bowl 2, with the two bowls chosen independently. That means the first cookie can come from either bowl, and so can the second. We have to consider all the combinations and that, my friend, is a pain.

I'll denote the event that cookie 1 comes from bowl $k$ by $B_{V_1;k}$, and similarly for cookie 2. So here we go:

$P(V_1,V_2)=\sum_j{\sum_i{P(V_1,V_2,B_{V_1;i},B_{V_2;j})}}=\sum_j{\sum_i{P(V_2|V_1,B_{V_1;i},B_{V_2;j})P(B_{V_1;i}|V_1,B_{V_2;j})P(V_1|B_{V_2;j})P(B_{V_2;j})}}$

That is the full chain-rule expansion, but some factors simplify thanks to independence. For example, in $P(B_{V_1;i}|V_1,B_{V_2;j})$, knowing that the second cookie comes from bowl $j$ provides no information about whether the first bowl was bowl $i$. That leaves $P(B_{V_1;i}|V_1)$.

The same logic applies to $P(V_1|B_{V_2;j})$, which reduces to $P(V_1)$.

Reduced to its simplest form:

$P(V_1,V_2)=\sum_j{\sum_i{P(V_2|V_1,B_{V_1;i},B_{V_2;j})P(B_{V_1;i}|V_1)P(V_1)P(B_{V_2;j})}}$

$P(V_2|V_1)=\frac{P(V_1,V_2)}{P(V_1)}=\frac{\sum_j{\sum_i{P(V_2|V_1,B_{V_1;i},B_{V_2;j})P(B_{V_1;i}|V_1)P(V_1)P(B_{V_2;j})}}}{P(V_1)}
=\sum_j{\sum_i{P(V_2|V_1,B_{V_1;i},B_{V_2;j})P(B_{V_1;i}|V_1)P(B_{V_2;j})}}$

$P(V_2|V_1)=\frac{1}{2}\left(\frac{3}{5}\left(\frac{29}{39}+\frac{20}{40}\right)+\frac{2}{5}\left(\frac{30}{40}+\frac{19}{39}\right)\right)$

In [45]:
0.5*(3./5*(29./39+20./40) + 2./5*(30./40+19./39))

0.6205128205128205
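The same number can be recovered without any of the simplifications, by enumerating every combination of first and second bowl and building $P(V_1)$ and $P(V_1,V_2)$ directly. This is a sketch under my own naming (`START`, `HALF`, `p_draw` are hypothetical helpers), using exact fractions so the result can be compared with the float above.

```python
from fractions import Fraction
from itertools import product

START = {1: {"vanilla": 30, "chocolate": 10},
         2: {"vanilla": 20, "chocolate": 20}}
HALF = Fraction(1, 2)  # each bowl choice is a fair coin flip


def p_draw(mix, kind):
    """Probability of drawing `kind` from a bowl holding `mix`."""
    return Fraction(mix[kind], sum(mix.values()))


p_v1 = Fraction(0)     # accumulates P(V_1)
p_v1_v2 = Fraction(0)  # accumulates P(V_1, V_2)
for i, j in product(START, repeat=2):  # i: first bowl, j: second bowl
    p_first = HALF * p_draw(START[i], "vanilla")
    # State of bowl j after a vanilla cookie was eaten from bowl i
    mix_j = dict(START[j])
    if i == j:
        mix_j["vanilla"] -= 1
    p_v1 += p_first * HALF
    p_v1_v2 += p_first * HALF * p_draw(mix_j, "vanilla")

p_v2_given_v1 = p_v1_v2 / p_v1
print("{} = {:.4f}".format(p_v2_given_v1, float(p_v2_given_v1)))
# 121/195 = 0.6205
```

The only place the first draw matters is the `i == j` branch: the second bowl is one vanilla cookie short only when it is the bowl the first cookie came from.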