# The Cookie Problem

Chapter 1 introduced the cookie problem. In this notebook we look for a computational approach to solving it.

First, a quick recap:

Suppose there are two bowls, A and B, each containing a different mix of vanilla (v) and chocolate (c) cookies, as shown in the table below:

bowl | vanilla | chocolate
--- | --- | --- 
A  | 30 | 10
B  | 20 | 20

If you pick a bowl at random and draw a vanilla cookie from it, which bowl did the cookie most likely come from?

If we use the table method from the end of chapter 1, we get the following table:

hypothesis | prior | likelihood | prior \* likelihood | posterior
--- | --- | --- | --- | ---
A  | 1/2 | 3/4 | 3/8 | 0.6
B  | 1/2 | 2/4 | 2/8 | 0.4

where the total likelihood is 3/8 + 2/8 = 5/8, and each posterior is the product prior \* likelihood divided by this total.
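Writing v for the event of drawing a vanilla cookie, the same arithmetic can be spelled out with Bayes' theorem. The likelihoods come straight from the cookie counts (30 of Bowl A's 40 cookies and 20 of Bowl B's 40 are vanilla):

$$
\begin{aligned}
P(v \mid A) &= \tfrac{30}{40} = \tfrac{3}{4}, \qquad P(v \mid B) = \tfrac{20}{40} = \tfrac{1}{2} \\
P(v) &= P(A)\,P(v \mid A) + P(B)\,P(v \mid B) = \tfrac{1}{2}\cdot\tfrac{3}{4} + \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{3}{8} + \tfrac{2}{8} = \tfrac{5}{8} \\
P(A \mid v) &= \frac{P(A)\,P(v \mid A)}{P(v)} = \frac{3/8}{5/8} = \tfrac{3}{5} = 0.6 \\
P(B \mid v) &= \frac{P(B)\,P(v \mid B)}{P(v)} = \frac{2/8}{5/8} = \tfrac{2}{5} = 0.4
\end{aligned}
$$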

## My own solution code

Although it is nice that Prof. Downey provides a package we can use, I also like to write my own (messy) code!

```python
import pandas as pd

# Create set of bowls
bowl_list = ["Bowl_A", "Bowl_B"]

# Create initial dataframe
df_dict = {"Bowls": bowl_list,
           "Priors": [1 / len(bowl_list) for _ in bowl_list]}

df_bowls = pd.DataFrame(df_dict)

df_bowls
```

Bowls | Priors
--- | ---
Bowl_A | 0.5
Bowl_B | 0.5

```python
# Likelihood of drawing a vanilla cookie from each bowl (30/40 and 20/40)
df_bowls["Likelihoods"] = pd.Series([3/4, 1/2])

df_bowls
```

Bowls | Priors | Likelihoods
--- | --- | ---
Bowl_A | 0.5 | 0.75
Bowl_B | 0.5 | 0.5

```python
# Multiply priors by likelihoods to get the unnormalised posteriors
df_bowls["P_times_L"] = df_bowls.Priors * df_bowls.Likelihoods

df_bowls
```

Bowls | Priors | Likelihoods | P_times_L
--- | --- | --- | ---
Bowl_A | 0.5 | 0.75 | 0.375
Bowl_B | 0.5 | 0.5 | 0.25

```python
# The total likelihood (probability of drawing vanilla from either bowl)
total_likelihood = df_bowls.P_times_L.sum()

# Normalise so that the posteriors sum to 1
df_bowls["Posterior"] = df_bowls["P_times_L"] / total_likelihood

df_bowls
```

Bowls | Priors | Likelihoods | P_times_L | Posterior
--- | --- | --- | --- | ---
Bowl_A | 0.5 | 0.75 | 0.375 | 0.6
Bowl_B | 0.5 | 0.5 | 0.25 | 0.4


A few quick pandas computations are enough to build the table we were looking for. Adding a third bowl would also be straightforward: we would only need to know the probability of drawing a vanilla cookie from it.
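As a sketch of that claim, here is the same calculation wrapped in a function and run with a third bowl. The helper name `bayes_table` is my own, and `Bowl_C` with its contents (10 vanilla, 30 chocolate) is purely hypothetical, made up for illustration:

```python
import pandas as pd


def bayes_table(bowls, likelihoods):
    """Hypothetical helper: build a Bayes table with a uniform prior.

    `likelihoods` holds P(vanilla | bowl) for each bowl, in the same order.
    """
    df = pd.DataFrame({"Bowls": bowls,
                       "Priors": [1 / len(bowls)] * len(bowls),
                       "Likelihoods": likelihoods})
    # Unnormalised posteriors
    df["P_times_L"] = df.Priors * df.Likelihoods
    # Divide by the total likelihood so the posteriors sum to 1
    df["Posterior"] = df.P_times_L / df.P_times_L.sum()
    return df


# Bowl_C is hypothetical: assume it holds 10 vanilla and 30 chocolate cookies
table = bayes_table(["Bowl_A", "Bowl_B", "Bowl_C"], [30/40, 20/40, 10/40])
print(table)
```

With three bowls, Bowl_A is still the most likely source of the vanilla cookie, but its posterior falls to 0.5, with Bowl_B at about 0.333 and Bowl_C at about 0.167.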

## Book Solution

Above is my own code, which follows the steps I would take to solve the problem by hand. Below is the solution given by Prof. Downey:

```python
from thinkbayes2 import Pmf

# Create instance of Pmf
pmf = Pmf()
pmf.Set('Bowl A', 0.5)
pmf.Set('Bowl B', 0.5)  # These values are the priors

pmf
```

```
Pmf({'Bowl A': 0.5, 'Bowl B': 0.5})
```

Now that we've set our priors, it's time to update them with the likelihood of drawing a vanilla cookie from each bowl.

```python
# Update priors with new data that we drew vanilla
pmf.Mult('Bowl A', 3/4)
pmf.Mult('Bowl B', 1/2)

pmf
```

```
Pmf({'Bowl A': 0.375, 'Bowl B': 0.25})
```

```python
# Manual check: Bowl A's unnormalised value divided by the total
0.375 / (0.375 + 0.25)
```

```
0.6
```

However, the values are no longer normalised: they sum to 0.625 (the total likelihood, 5/8) rather than 1. We normalise them by dividing each value by that total, which accounts for the fact that the vanilla cookie could have come from either bowl.

We're allowed to do this because our hypotheses are mutually exclusive (you can't draw from a little of A and a little of B at the same time) and collectively exhaustive (there is no mysterious Bowl C).
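To see what normalising does without relying on thinkbayes2, here is a plain-dictionary sketch. The `normalise` function is a hypothetical helper of my own, not the library's implementation:

```python
def normalise(dist):
    """Hypothetical helper: return a new dict whose values sum to 1.

    Dividing by the total is only valid when the hypotheses are mutually
    exclusive and collectively exhaustive.
    """
    total = sum(dist.values())  # the total likelihood, P(vanilla)
    if total == 0:
        raise ValueError("cannot normalise a distribution with zero total")
    return {hypo: value / total for hypo, value in dist.items()}


# Unnormalised values after multiplying the priors by the likelihoods
unnormalised = {"Bowl A": 0.375, "Bowl B": 0.25}
posterior = normalise(unnormalised)
print(posterior)  # {'Bowl A': 0.6, 'Bowl B': 0.4}
```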

```python
# Normalise data
pmf.Normalize()

print(pmf.Prob("Bowl A"))
```

```
0.6000000000000001
```


This is still the table method from above, but now expressed in a few lines of code!

## The Bayesian Framework

Once again, the code here is from the book, but I want to implement it myself because it's fun.

Prof. Downey creates a general Cookie class that inherits from Pmf, so that we can update it with each new cookie we draw.

```python
class Cookie(Pmf):
    """
    A Pmf over the bowl hypotheses for the cookie problem.
    """

    # Probability of each cookie flavour, given each bowl
    mixes = {'Bowl A': dict(vanilla=0.75, chocolate=0.25),
             'Bowl B': dict(vanilla=0.5, chocolate=0.5)}

    def __init__(self, hypos):
        Pmf.__init__(self)  # This has to be done to initialise the Pmf part of the class
        for hypo in hypos:
            self.Set(hypo, 1)  # Note that this is possible because we inherit from the Pmf class
        self.Normalize()  # Turns the equal weights into a uniform prior

    def Update(self, data):
        """
        Updates the probabilities based on new data and normalises automatically.
        """
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
        self.Normalize()

    def Likelihood(self, data, hypo):
        """
        Returns the probability of drawing the given flavour (data) from the given bowl (hypo).
        """
        mix = self.mixes[hypo]
        like = mix[data]
        return like
```

```python
# Define our hypotheses
hypos = ["Bowl A", "Bowl B"]

# Create instance of our class
cookies = Cookie(hypos)
```

```python
# Update our beliefs about the bowls after drawing a vanilla cookie
cookies.Update('vanilla')

# Print posterior probability for each hypothesis
for hypo, prob in cookies.Items():
    print(hypo, prob)
```

```
Bowl A 0.6000000000000001
Bowl B 0.4
```

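The most fun part of this example is that if we keep drawing vanilla cookies, the probability that they come from Bowl A keeps rising. This is not because Bowl A has more cookies (both bowls hold 40), but because a larger fraction of its cookies are vanilla: 3/4 versus 1/2. (Because the likelihoods in mixes never change, the class implicitly assumes each cookie is put back after it is drawn.)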
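To watch this happen, here is a plain-Python sketch of repeated updates that mirrors the Update and Likelihood logic of the Cookie class without needing thinkbayes2. The names `MIXES` and `update` are my own, and the sketch assumes cookies are drawn with replacement:

```python
# Same mixes as the Cookie class: P(flavour | bowl)
MIXES = {"Bowl A": {"vanilla": 0.75, "chocolate": 0.25},
         "Bowl B": {"vanilla": 0.5, "chocolate": 0.5}}


def update(dist, flavour):
    """One Bayesian update: multiply by the likelihood, then normalise."""
    unnorm = {bowl: p * MIXES[bowl][flavour] for bowl, p in dist.items()}
    total = sum(unnorm.values())
    return {bowl: p / total for bowl, p in unnorm.items()}


# Start from a uniform prior and draw vanilla five times in a row
# (assumes each cookie is replaced, so the likelihoods never change)
dist = {"Bowl A": 0.5, "Bowl B": 0.5}
history = []
for draw in range(1, 6):
    dist = update(dist, "vanilla")
    history.append(dist["Bowl A"])
    print(draw, round(dist["Bowl A"], 4))
```

Each vanilla draw multiplies the odds in favour of Bowl A by 0.75 / 0.5 = 1.5, so after n vanilla draws P(Bowl A) = 1 / (1 + (2/3)^n): 0.6, then about 0.69, 0.77, 0.84 and 0.88.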