## Computational Statistics

### Distributions

Count the number of times each word appears in a sequence
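
As a minimal sketch of that idea using only the standard library (not the `Pmf` class used in the rest of this notebook), the counts can be built with `collections.Counter` and then divided by the total to give a probability for each word. The sample sentence and the variable names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample sequence, chosen only for illustration.
words = "the cat sat on the mat the end".split()

# Count how many times each word appears.
counts = Counter(words)

# Divide each count by the total to turn counts into probabilities.
total = sum(counts.values())
word_pmf = {word: count / total for word, count in counts.items()}

print(counts["the"])    # 3
print(word_pmf["the"])  # 0.375
```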

In [1]:
# Add the parent directory to Python's module search path so that bin.thinkbayes can be imported.
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
from bin.thinkbayes import Pmf

# Creates an instance of class Pmf (pmf) to represent the distribution of outcomes for a six-sided die:
# class Pmf inherits from _DictWrapper (an object which contains a dictionary)
pmf = Pmf()

# Assign probability 1/6 to each face --> {1: 1/6.0, 2: 1/6.0, 3: 1/6.0, ...}
for x in [1, 2, 3, 4, 5, 6]:
    pmf.Set(x, 1/6.0)  # Set is defined in the _DictWrapper class, so pmf inherits it.
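
For intuition, the same distribution can be sketched with a plain dictionary. This is only an illustration of the kind of mapping that `Pmf` wraps, not the `thinkbayes` implementation; the name `die_pmf` is an assumption.

```python
# Map each face of a six-sided die to its probability.
die_pmf = {face: 1 / 6.0 for face in range(1, 7)}

# The probabilities of a proper distribution sum to 1 (up to rounding).
total = sum(die_pmf.values())

print(sorted(die_pmf))           # [1, 2, 3, 4, 5, 6]
print(abs(total - 1.0) < 1e-12)  # True
```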

In [18]:
# Access the contents of pmf through its methods, e.g.:
# (The output below was produced after the Cookie Problem cells had run,
# so pmf holds the bowl hypotheses rather than the die.)
print(pmf.Values())  # gives only the keys
print(pmf.Items())   # gives the (key, value) pairs in the dictionary
print(pmf.Render())  # creates items for plotting

dict_keys(['Bowl 2', 'Bowl 1'])
dict_items([('Bowl 2', 0.4), ('Bowl 1', 0.6000000000000001)])
<zip object at 0x7f57d2dae308>


In [20]:
# help(pmf) # A list of the available classes and methods.

#### The Cookie Problem

In [3]:
pmf = Pmf()

# Hypothesis B1 and B2 (Bowl 1 and Bowl 2).
# This is the prior distribution (contains the priors for each hypothesis)
pmf.Set('Bowl 1', 0.5) # p(B1)
pmf.Set('Bowl 2', 0.5) # p(B2)


Now we have new data: a vanilla cookie! To update the distribution, we multiply each prior by the corresponding likelihood and then divide by the probability of the data, which gives p(B1|D) and p(B2|D).  
So for B1, this would be:  
    p(B1|D) = prior\*Prob of vanilla from B1 / Prob of vanilla from either bowl  
    p(B1|D) = p(B1)\*p(D|B1)/p(D)  
    p(B1) = 1/2 (there are two bowls)  
    p(D|B1) = 3/4 (ratio is 30:10 vanilla to choc)  
    p(D) = 5/8 (80 cookies altogether in both bowls, 50 are vanilla)  
    So:  
    p(B1|D) = (1/2\*3/4)/(5/8) = 3/5  

Likewise for B2:  
    p(B2|D) = prior\*Prob of vanilla from B2 / Prob of vanilla from either bowl  
    p(B2|D) = p(B2)\*p(D|B2)/p(D) = (1/2\*1/2)/(5/8) = 2/5  

The likelihood of drawing a vanilla cookie is 3/4 from Bowl 1 and 1/2 from Bowl 2.
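
The arithmetic can be checked with exact fractions. This is a minimal sketch using the standard-library `fractions` module rather than `Pmf`; the dictionary names are assumptions for illustration.

```python
from fractions import Fraction

# Priors p(B1), p(B2) and likelihoods p(D|B1), p(D|B2) for a vanilla cookie.
priors = {"Bowl 1": Fraction(1, 2), "Bowl 2": Fraction(1, 2)}
likelihoods = {"Bowl 1": Fraction(3, 4), "Bowl 2": Fraction(1, 2)}

# p(D): total probability of drawing a vanilla cookie from either bowl.
p_data = sum(priors[h] * likelihoods[h] for h in priors)

# Bayes' theorem: p(H|D) = p(H) * p(D|H) / p(D)
posteriors = {h: priors[h] * likelihoods[h] / p_data for h in priors}

print(p_data)                # 5/8
print(posteriors["Bowl 1"])  # 3/5
print(posteriors["Bowl 2"])  # 2/5
```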


In [4]:
# Mult gets the probability for the given hypothesis and multiplies it by the given likelihood.
pmf.Mult('Bowl 1', 0.75)
pmf.Mult('Bowl 2', 0.5)

After this update, the distribution is no longer normalized, but because these hypotheses are mutually exclusive and collectively exhaustive, we can renormalize:

In [5]:
pmf.Normalize()

0.625
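
The value returned by `Normalize()` here, 0.625, is the total probability before normalizing, which equals p(D) = 5/8. A rough plain-dictionary sketch of the multiply-then-normalize steps follows; the helper `normalize` is hypothetical and is not the `thinkbayes` implementation.

```python
def normalize(dist):
    """Scale the values of dist in place so they sum to 1; return the old total."""
    total = sum(dist.values())
    for key in dist:
        dist[key] /= total
    return total

# Prior times likelihood for each hypothesis (the unnormalized posterior).
unnormalized = {"Bowl 1": 0.5 * 0.75, "Bowl 2": 0.5 * 0.5}

constant = normalize(unnormalized)
print(constant)                # 0.625
print(unnormalized["Bowl 1"])  # approximately 0.6
```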

The result is a distribution that contains the posterior probability for each hypothesis. This is called the POSTERIOR DISTRIBUTION.

In [21]:
# Get the posterior probability for Bowl 1.
print(pmf.Prob('Bowl 1'))

print(pmf.Values())

0.6000000000000001
dict_keys(['Bowl 2', 'Bowl 1'])
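
The printed value 0.6000000000000001 rather than 0.6 is ordinary floating-point rounding, not an error in the update. One plausible source, assuming the normalization multiplies each value by a precomputed factor 1/total (an assumption about the implementation, not something shown in this notebook), can be sketched as:

```python
total = 0.375 + 0.25  # unnormalized total, exactly 0.625

# Dividing directly gives the float nearest to 3/5.
direct = 0.375 / total

# Multiplying by a precomputed reciprocal rounds twice and can differ in the last bit.
factor = 1.0 / total
scaled = 0.375 * factor

print(direct)                        # 0.6
print(scaled)                        # 0.6000000000000001 with IEEE 754 doubles
print(abs(scaled - direct) < 1e-15)  # True: the two agree to within rounding
```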
