# Chapter 2
___

### Distributions

```python
from thinkbayes import Pmf  # companion library of the book
```

```python
pmf = Pmf()  # create an empty probability mass function

for x in range(1, 7):
    pmf.Set(x, 1/6.0)  # assign probability 1/6 to each face of a fair die
```

```python
list(pmf.keys())
```

```
[1, 2, 3, 4, 5, 6]
```

```python
pmf = Pmf()

for w in 'hello how are you my friend truth is you are pretty my friend'.split():
    pmf.Incr(w, 1)  # add 1 to the value stored for word w; until the Pmf is
                    # normalized, these values are raw counts (frequencies), not probabilities
```

```python
pmf.Prob('you')  # returns the frequency (count) of the word
```

```
2
```

```python
pmf.Normalize()  # divides every count by the total and returns that total (13 words)
```

```
13
```

```python
pmf.Prob('you')  # now returns the probability of the word: 2/13
```

```
0.15384615384615385
```
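The same computation can be sketched with only the standard library, which makes the two steps explicit: count, then divide by the total. This uses `collections.Counter` and a plain dict in place of `Pmf`; it is an illustration of the idea, not how `thinkbayes` implements it.

```python
from collections import Counter

text = 'hello how are you my friend truth is you are pretty my friend'

# Count how many times each word occurs (the unnormalized "frequencies")
counts = Counter(text.split())
total = sum(counts.values())  # 13 words in total

# Divide each count by the total to get a normalized distribution
pmf = {word: n / total for word, n in counts.items()}

print(counts['you'])  # 2
print(total)          # 13
print(pmf['you'])     # 2/13, about 0.1538
```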

In a Bayesian setting, a PMF is a natural way to map each hypothesis to its probability:

$\text{PMF}: H_i \mapsto P(H_i)$

### The cookie problem

```python
# Bowl 1: number of cookies of each type
vanillas_b1 = 30
chocos_b1 = 10

# Bowl 2: number of cookies of each type
vanillas_b2 = 20
chocos_b2 = 20

# 1) Prior knowledge: I pick a bowl at random, then a cookie at random from that bowl.
# 2) New data: the cookie is vanilla.
# 3) Inference: what is the probability that the vanilla cookie came from bowl 1?
```

In the cookie problem we have two hypotheses:
- $H_1$: the cookie came from bowl 1
- $H_2$: the cookie came from bowl 2

```python
pmf = Pmf()
# encode the prior distribution (before we know which cookie came out)
pmf.Set('Bowl_1', 0.5)  # H1
pmf.Set('Bowl_2', 0.5)  # H2
```

To update the prior distribution with the **new data** (a vanilla cookie came out), we multiply each prior by its corresponding **likelihood**, the probability of drawing a vanilla cookie from that bowl, $P(\text{vanilla} \mid H_i)$:

```python
tag_VANILLA = "vanilla"
tag_CHOCOLATE = "chocolate"

# Likelihood P(vanilla | H_i): fraction of vanilla cookies in each bowl
likelihood_H1 = vanillas_b1 / (vanillas_b1 + chocos_b1)
likelihood_H2 = vanillas_b2 / (vanillas_b2 + chocos_b2)

print(f"Likelihood vanilla cookie from bowl 1: {likelihood_H1}")
print(f"Likelihood vanilla cookie from bowl 2: {likelihood_H2}")

# Multiply each prior by its likelihood (the result is unnormalized)
pmf.Mult('Bowl_1', likelihood_H1)
pmf.Mult('Bowl_2', likelihood_H2)
```

```
Likelihood vanilla cookie from bowl 1: 0.75
Likelihood vanilla cookie from bowl 2: 0.5
```


After this update the distribution is no longer normalized, but since the hypotheses are **MECE** (mutually exclusive and collectively exhaustive), we can renormalize:

```python
pmf.Normalize()  # returns the total before normalizing: P(vanilla) = 0.625
```

```
0.625
```
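The value returned by `Normalize` is the total probability of the data, i.e. the normalizing constant in Bayes's theorem. Writing $V$ for "a vanilla cookie came out", the arithmetic behind this step and the resulting posteriors is:

$$
\begin{aligned}
P(V) &= P(H_1)\,P(V \mid H_1) + P(H_2)\,P(V \mid H_2) = 0.5 \cdot 0.75 + 0.5 \cdot 0.5 = 0.625 \\
P(H_1 \mid V) &= \frac{P(H_1)\,P(V \mid H_1)}{P(V)} = \frac{0.375}{0.625} = 0.6 \\
P(H_2 \mid V) &= \frac{P(H_2)\,P(V \mid H_2)}{P(V)} = \frac{0.25}{0.625} = 0.4
\end{aligned}
$$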

The result is a distribution that contains the **posterior probability** for each hypothesis:

```python
pmf.Prob('Bowl_1'), pmf.Prob('Bowl_2')
```

```
(0.6000000000000001, 0.4)
```
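As a sanity check, the posterior can also be estimated by simulation: repeat the experiment many times, keep only the runs where the cookie was vanilla, and count how often it came from bowl 1. This is a minimal sketch using Python's `random` module; the number of trials and the seed are arbitrary choices, so the estimate is only approximately 0.6.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Bowl contents from the problem statement
bowls = {
    'Bowl_1': ['vanilla'] * 30 + ['chocolate'] * 10,
    'Bowl_2': ['vanilla'] * 20 + ['chocolate'] * 20,
}

n_trials = 100_000
vanilla_draws = 0        # trials where the cookie was vanilla
vanilla_from_bowl_1 = 0  # ...and it came from bowl 1

for _ in range(n_trials):
    bowl = random.choice(list(bowls))    # pick a bowl at random
    cookie = random.choice(bowls[bowl])  # pick a cookie from that bowl at random
    if cookie == 'vanilla':              # keep only runs that match the observed data
        vanilla_draws += 1
        if bowl == 'Bowl_1':
            vanilla_from_bowl_1 += 1

estimate = vanilla_from_bowl_1 / vanilla_draws
print(estimate)  # close to 0.6
```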

Let's rewrite the previous code with classes, to make it more general:

```python
class Cookie(Pmf):
    """ 
    PMF that maps hypotheses to their probabilities.
    Stores the priors and posteriors for each hypothesis given.
    """
    def __init__(self, hypos):
        """ Gives each hypothesis the same prior probability """
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo, 1)
        self.Normalize()
```

```python
# We have 2 hypotheses:
hypos = ['bowl_1', 'bowl_2']

pmf = Cookie(hypos)
```

```python
pmf.GetDict()
```

```
{'bowl_1': 0.5, 'bowl_2': 0.5}
```

```python
class Cookie(Pmf):
    """
    PMF that maps hypotheses (bowl IDs) to their probabilities.
    Holds the prior for each hypothesis, which becomes the posterior after Update.
    """
    # Likelihood of each cookie type for each bowl, computed in an earlier cell
    mixes = {
        'bowl_1': {tag_VANILLA: likelihood_H1, tag_CHOCOLATE: 1 - likelihood_H1},
        'bowl_2': {tag_VANILLA: likelihood_H2, tag_CHOCOLATE: 1 - likelihood_H2},
    }

    def __init__(self, hypos):
        """
        Gives each hypothesis the same prior probability.
        :param hypos: sequence of string bowl IDs
        """
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo, 1)
        self.Normalize()

    def Likelihood(self, data, hypo):
        """
        Returns the likelihood of observing `data` given that `hypo` is true.
        :param data: string cookie type
        :param hypo: string bowl ID
        """
        mix = self.mixes[hypo]  # cookie-type proportions for the bowl named by hypo
        like = mix[data]        # P(data | hypo)
        return like

    def Update(self, data):
        """
        Updates the distribution with `data`: multiplies each hypothesis
        by its likelihood, then renormalizes.
        :param data: string cookie type
        """
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
        self.Normalize()
```
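The pattern inside `Update` (multiply each hypothesis by its likelihood, then renormalize) does not depend on `thinkbayes`. Below is a minimal sketch with plain dicts; `bayes_update` and `cookie_likelihood` are hypothetical names introduced for illustration, and the bowl mixes are hard-coded from the cookie counts above.

```python
def bayes_update(prior, likelihood, data):
    """Return the posterior after observing `data` (hypothetical helper).

    prior: dict mapping hypothesis -> probability
    likelihood: function (data, hypo) -> P(data | hypo)
    """
    # Multiply each prior by the likelihood of the data under that hypothesis
    unnorm = {hypo: p * likelihood(data, hypo) for hypo, p in prior.items()}
    # The sum is the total probability of the data, used to renormalize
    total = sum(unnorm.values())
    return {hypo: p / total for hypo, p in unnorm.items()}


# Proportions of each cookie type per bowl (30/40, 10/40 and 20/40, 20/40)
mixes = {
    'bowl_1': {'vanilla': 0.75, 'chocolate': 0.25},
    'bowl_2': {'vanilla': 0.5, 'chocolate': 0.5},
}


def cookie_likelihood(data, hypo):
    return mixes[hypo][data]


posterior = bayes_update({'bowl_1': 0.5, 'bowl_2': 0.5}, cookie_likelihood, 'vanilla')
print(posterior)  # bowl_1 about 0.6, bowl_2 about 0.4
```

Returning a new dict rather than mutating the prior keeps every intermediate distribution available, which can be convenient when inspecting a sequence of updates.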

```python
pmf = Cookie(hypos)

new_data = 'vanilla'

pmf.Update(data=new_data)
```

```python
for hypo, prob in pmf.Items():
    print("Posterior for", hypo, ":", prob)
```

```
Posterior for bowl_1 : 0.6000000000000001
Posterior for bowl_2 : 0.4
```


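This approach generalizes easily to more data: each new observation is just another call to `Update`, with the current posterior acting as the prior. Because the likelihoods in `mixes` never change, this implicitly assumes each cookie is put back after it is drawn.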

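```python
# Successive draws of cookies. pmf is not reset, so these updates start from
# the posterior of the previous cell (one vanilla cookie already observed).
datapoints = [tag_VANILLA, tag_CHOCOLATE, tag_VANILLA]

for p in datapoints:
    pmf.Update(p)
```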

```python
for hypo, prob in pmf.Items():
    print("Posterior for", hypo, ":", prob)
```

```
Posterior for bowl_1 : 0.627906976744186
Posterior for bowl_2 : 0.37209302325581395
```
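Since `pmf` was not reset, these numbers reflect four draws in total: vanilla, vanilla, chocolate, vanilla. With fixed likelihoods the updates commute, so the same posterior can be reached in one step by multiplying the uniform prior by all four likelihoods and normalizing once. The sketch below does this with exact `fractions.Fraction` values (a choice made here for exactness; the notebook uses floats) and recovers 27/43 ≈ 0.6279 and 16/43 ≈ 0.3721.

```python
from fractions import Fraction

# Likelihoods as exact fractions (30/40 and 20/40 of each bowl is vanilla)
mixes = {
    'bowl_1': {'vanilla': Fraction(3, 4), 'chocolate': Fraction(1, 4)},
    'bowl_2': {'vanilla': Fraction(1, 2), 'chocolate': Fraction(1, 2)},
}

# All draws observed so far: the first vanilla plus the three in `datapoints`
draws = ['vanilla', 'vanilla', 'chocolate', 'vanilla']

# Start each hypothesis at the uniform prior and multiply in every likelihood
unnorm = {}
for hypo in mixes:
    p = Fraction(1, 2)
    for cookie in draws:
        p *= mixes[hypo][cookie]
    unnorm[hypo] = p

# Normalize once at the end
total = sum(unnorm.values())
posterior = {hypo: p / total for hypo, p in unnorm.items()}
print(posterior)  # {'bowl_1': Fraction(27, 43), 'bowl_2': Fraction(16, 43)}
```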


### The Monty Hall Problem