# Bayes and the Cookie Problem

“Suppose you have two bowls of cookies. Bowl 1 contains 30 Vanilla cookies and 10 chocolate chip cookies. Bowl 2 contains 20 of each.
Now suppose you choose one of the bowls at random and, without looking, select a cookie at random. The cookie is Vanilla. What is the probability that it came from Bowl 1?”

Excerpt From: Allen B. Downey. “Think Bayes.” iBooks. 

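Conjunction is commutative, so the probability of A and B is the same as the probability of B and A:

$p(A\ and\ B) = p(B\ and\ A)$

We can write the left-hand side as a product that involves a conditional probability: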

$p(A\ and\ B) = p(A)p(B | A)$

but also:

$p(B\ and\ A) = p(B)p(A | B)$

Since the two left-hand sides are equal, the right-hand sides must be equal too:

$p(B)p(A|B) = p(A)p(B|A)$

This equality is the starting point: with a little algebra it can be applied to all kinds of situations. Bayes' theorem is simply the result of dividing both sides by $p(B)$ (assuming $p(B) > 0$), which solves for $P(A|B)$.

## Bayes' Theorem

$P(A|B) = \frac{P(A)P(B|A)}{P(B)}$

Now let's put this into action with the substitution A = Bowl1 and B = Vanilla.

Making the substitutions gives us:

$P(bowl1|vanilla) = \frac{P(bowl1)P(vanilla|bowl1)}{P(vanilla)}$

The question above asks for a conditional probability: the probability that the bowl is Bowl 1 given that the cookie is vanilla, $P(Bowl1 | Vanilla)$.

But what if we asked the question the other way around? What is the probability that the cookie is vanilla given that the bowl is Bowl 1? This is easy: Bowl 1 holds 30 + 10 = 40 cookies, 30 of which are vanilla, so $P(Vanilla | Bowl1) = 30/40 = .75$.


However, $P(A|B)$ is not, in general, the same as $P(B|A)$.

But we can get from one to the other using Bayes' theorem.

So, what is $P(bowl1 | vanilla)$? Let's put this into Bayes' equation:  $P(bowl1 | vanilla) = \frac{P(bowl1)P(vanilla | bowl1)}{P(vanilla)}$

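* P(bowl1) is the probability that we choose Bowl 1, which is .5
* P(vanilla|bowl1) is the probability of choosing vanilla given that we choose from Bowl 1, which is .75
* P(vanilla) is the probability of choosing vanilla in general. Because the bowls are equally likely and each holds 40 cookies, every one of the 80 cookies is equally likely to be drawn; 50 are vanilla and 30 are chocolate, so this is 50/80 or .625. Equivalently, by total probability, .5 × .75 + .5 × .5 = .625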


In [1]:
(.5 * .75) / .625

0.6
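As a sanity check, the same answer falls out of a direct count. The sketch below lists every cookie as a (bowl, flavor) pair and asks what fraction of the vanilla cookies sit in Bowl 1. This shortcut leans on an assumption that happens to hold here: both bowls hold 40 cookies and are equally likely to be chosen, so each of the 80 cookies is equally likely to be drawn. The variable names are illustrative and not taken from the book.

```python
from fractions import Fraction

# Every cookie as a (bowl, flavor) pair, using the counts from the problem.
cookies = (
    [("bowl1", "vanilla")] * 30 + [("bowl1", "chocolate")] * 10
    + [("bowl2", "vanilla")] * 20 + [("bowl2", "chocolate")] * 20
)

# Assumption: each of the 80 cookies is equally likely to be drawn, which
# holds because both bowls hold 40 cookies and are picked with equal odds.
vanilla = [bowl for bowl, flavor in cookies if flavor == "vanilla"]
p_bowl1_given_vanilla = Fraction(vanilla.count("bowl1"), len(vanilla))

print(p_bowl1_given_vanilla)         # 3/5
print(float(p_bowl1_given_vanilla))  # 0.6
```

If the bowls held different numbers of cookies, this count would no longer match the formula, and the Bayes calculation above would be the one to trust.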

### The diachronic interpretation

The diachronic interpretation reads Bayes' theorem as a way to update the probability of some hypothesis H in light of some body of data D:

$P(H|D) = \frac{P(H)P(D|H)}{P(D)}$


* P(H) - the probability of the hypothesis before we see the data: the **prior**
* P(H|D) - what we want to know/compute: the **posterior**
* P(D|H) - the probability of the data under the hypothesis: the **likelihood**
* P(D) - the probability of the data under any hypothesis: the **normalizing constant**


For example, the prior in our cookie problem is the probability that the cookie came from Bowl 1 before we know its flavor. Since the bowl was chosen at random, that is 50-50: P(bowl1) = .5.
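The four pieces fit together in a few lines of code. The following is a minimal sketch, not the book's implementation: the function name `update` and the dictionary layout are assumptions made for illustration, and it assumes the hypotheses are mutually exclusive and cover every possibility, so that P(D) can be found by summing P(H)P(D|H) over all of them.

```python
from fractions import Fraction

def update(priors, likelihoods):
    """Return posteriors P(H|D) from priors P(H) and likelihoods P(D|H)."""
    # Unnormalized posterior for each hypothesis: P(H) * P(D|H)
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    # Normalizing constant P(D): total probability of the data
    # summed over all hypotheses
    p_data = sum(unnormalized.values())
    return {h: value / p_data for h, value in unnormalized.items()}

# Cookie problem: the data D is "the cookie is vanilla".
priors = {"bowl1": Fraction(1, 2), "bowl2": Fraction(1, 2)}
likelihoods = {"bowl1": Fraction(30, 40), "bowl2": Fraction(20, 40)}

posteriors = update(priors, likelihoods)
print(posteriors["bowl1"])  # 3/5
print(posteriors["bowl2"])  # 2/5
```

Using `Fraction` keeps the arithmetic exact; plain floats would work the same way.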

### The Monty Hall Problem

* [Big Deal of the Day](https://www.youtube.com/watch?v=T5QYTrDReTo)
* [Explained](https://www.youtube.com/watch?v=mhlc7peGlGg)



#### A quick simulation

In [4]:
# Strategy 1: always keep the original door.
import random

wins = 0
losses = 0
trials = 1000000
for i in range(trials):
    # Hide the prize behind one of the three doors.
    winner = random.randint(0, 2)
    doors = [False, False, False]
    doors[winner] = True

    # The contestant picks a door and never changes it.
    pick = random.randint(0, 2)

    if doors[pick]:
        wins = wins + 1
    else:
        losses = losses + 1

print("Total Wins = ", wins)
print("Total Losses = ", losses)
print("Winning Percentage = ", wins / trials)


Total Wins =  333517
Total Losses =  666483
Winning Percentage =  0.333517


In [5]:
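# Strategy 2: always switch doors.
# The host opens a losing door that you did not pick, so switching wins
# whenever your first pick was wrong. The only way to lose is to have
# picked the winning door the first time.
import random  # repeated so this cell also runs on its own

wins = 0
losses = 0
trials = 1000000
for i in range(trials):
    # Hide the prize behind one of the three doors.
    winner = random.randint(0, 2)
    doors = [False, False, False]
    doors[winner] = True

    pick = random.randint(0, 2)

    if doors[pick]:
        # First pick was the winner, so switching away from it loses.
        losses = losses + 1
    else:
        # First pick was wrong, so the remaining unopened door is the winner.
        wins = wins + 1

print("Total Wins = ", wins)
print("Total Losses = ", losses)
print("Winning Percentage = ", wins / trials)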


Total Wins =  666013
Total Losses =  333987
Winning Percentage =  0.666013
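The switching loop above uses a shortcut: it never models the host opening a door. A more literal version, sketched below, plays out each step of the game and should land near the same two fractions. The helper name `play`, the fixed seed, and the smaller trial count are choices made for this illustration. The sketch also assumes the host always opens a losing door that the contestant did not pick, choosing at random when two such doors exist.

```python
import random

def play(switch, rng):
    """Play one game; return True if the contestant wins the prize."""
    doors = [0, 1, 2]
    winner = rng.choice(doors)
    pick = rng.choice(doors)

    # The host opens a door that is neither the contestant's pick nor the winner.
    opened = rng.choice([d for d in doors if d != pick and d != winner])

    if switch:
        # Move to the one door that is neither the original pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)

    return pick == winner

rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatable runs
trials = 100_000
for switch in (False, True):
    wins = sum(play(switch, rng) for _ in range(trials))
    print("switch =", switch, "winning fraction =", wins / trials)
```

Keeping should win about one third of the time and switching about two thirds, in line with the two runs above.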


### A Bayesian Classifier

* The classic example is spam filtering
* https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering


#### Asking the right question

* What is the probability that this message is spam given that it contains a particular word? For example: what is the probability that this message is spam if it contains the word viagra?

$P(Spam|viagra) = \frac{P(Spam)P(viagra|Spam)}{P(viagra)}$


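* But emails contain lots of words, so the per-word probabilities have to be combined into a single probability $p$ that the message is spam: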


$p = \frac{p_1 p_2 \cdots p_n}{p_1 p_2 \cdots p_n + (1-p_1)(1-p_2) \cdots (1-p_n)}$

Where

* $p_1$ is the probability that the message is spam given that it contains word 1
* $p_2$ is the probability that the message is spam given that it contains word 2
* $p_n$ is the probability that the message is spam given that it contains word n

* $(1-p_1)$ is the probability that the message is not spam given that it contains word 1
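The combining formula translates directly into code. This is a minimal sketch: the function name `combine` and the example probabilities are hypothetical, made up purely for illustration. It also inherits the "naive" assumption behind the formula, namely that the words are treated as independent pieces of evidence.

```python
import math

def combine(probs):
    """Combine per-word spam probabilities p_1..p_n into one probability p."""
    spam = math.prod(probs)                 # p_1 * p_2 * ... * p_n
    ham = math.prod(1 - p for p in probs)   # (1-p_1) * (1-p_2) * ... * (1-p_n)
    return spam / (spam + ham)

# Hypothetical per-word probabilities, invented for this example only.
word_probs = [0.9, 0.8, 0.3]
print(combine(word_probs))
```

With these made-up inputs the result is roughly 0.94: two strongly spam-leaning words outweigh one word that leans the other way. Note that the sketch divides by zero if one word has probability exactly 0 and another exactly 1, so real inputs would need to be kept strictly between the two.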
