# Introduction to Bayesian Statistics

---

### LEARNING OBJECTIVES

After this lesson, you will be able to:

- Derive and explain Bayes' theorem
- Explain the components of the Bayesian "world view" -- posterior, prior, likelihood

## Frequentist vs. Bayesian

**Frequentists** believe the "true" distribution is fixed (and not known). We can infer more about this "true" distribution by engaging in sampling, testing for effects, and studying relevant parameters of the population.

**Bayesians** believe that data informs us about the distribution, and as we receive more data our view of the distribution can be updated, further confirming or denying our previous beliefs (but never with certainty).

---


### Interpretations of probability

**FREQUENTIST PROBABILITY** 

Probability is the long-run proportion of "successes" or "positive occurrences" measured across a hypothetical infinite number of samples/events/trials:

### $$p = \lim_{n \to +\infty} \frac{k}{n}$$

where:

$p$ is the probability of an occurrence.

$k$ is the number of occurrences.

$n$ is the number of events.
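A quick simulation can make the limit concrete. This sketch assumes a coin whose true probability of heads is $p = 0.3$ (an arbitrary illustrative value) and shows the running frequency $k/n$ settling near $p$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible
p_true = 0.3  # hypothetical "true" probability of a success (assumed value)

for n in [10, 100, 1_000, 100_000]:
    flips = rng.random(n) < p_true   # n independent Bernoulli(p_true) trials
    k = flips.sum()                  # number of successes among the n trials
    print(f"n={n:>7}  k/n={k / n:.4f}")
```

With small $n$ the ratio can wander far from 0.3; with $n = 100{,}000$ it is typically within a fraction of a percent.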

---



### A review of (Bayesian) terminology 

**Probability**: a number between 0 and 1, inclusive, representing "a degree of belief in a fact or prediction".

**Conditional probability**: a probability of something given some background information; the conditional probability of A given that B is true is written $P\left(\;A\;|\;B\;\right)$

**Marginal probability**: the (non-conditional) probability of something occurring, $P\left(\;B\;\right)$

**Joint probability**: the probability that two things are true; joint probability of A and B is written $P\left(\;A\;\cap\;B\right)$
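These definitions can be checked by counting equally likely outcomes. This sketch uses one roll of a fair six-sided die (an illustrative example, not part of the lesson), with $A$ = "the roll is even" and $B$ = "the roll is greater than 3", and computes the conditional probability as joint over marginal, $P(A|B) = P(A \cap B) / P(B)$.

```python
from fractions import Fraction

FACES = range(1, 7)  # one roll of a fair die; each face has probability 1/6

def prob(event):
    """Probability of an event, given as a predicate on the face rolled."""
    return Fraction(sum(1 for face in FACES if event(face)), len(FACES))

def is_even(face):          # event A
    return face % 2 == 0

def greater_than_3(face):   # event B
    return face > 3

p_b = prob(greater_than_3)                                     # marginal P(B)
p_a_and_b = prob(lambda f: is_even(f) and greater_than_3(f))   # joint P(A ∩ B)
p_a_given_b = p_a_and_b / p_b                                  # conditional P(A | B)

print(p_b, p_a_and_b, p_a_given_b)  # 1/2 1/3 2/3
```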



**BAYESIAN PROBABILITY**

Probability is a representation of our uncertainty given what we know and believe to be true. Given a number of observed positive occurrences over a number of events *and our prior belief about the true probability of positive occurrences,* what is the *distribution of the true probability*?

### $$P\left(\;true\;|\;observed\;\right) = \frac{P\left(\;observed\;|\;true\;\right)}{P(\;observed\;)} P\left(\;true\;\right)$$

where:

$P\left(\;true\;|\;observed\;\right)$ is the **posterior probability**, a **conditional probability**: our updated belief about the true probability after taking into account what we observed.

$P\left(\;observed\;|\;true\;\right)$ is the **likelihood**, which is the probability of what we observed given a particular value of the true probability of occurrence.

${P(\;observed\;)}$ is the **marginal probability** of the observed data. 

$P\left(\;true\;\right)$ is the **prior probability**: what you believed the probability was before observing the events.
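To make "the distribution of the true probability" concrete, here is a grid-approximation sketch. It assumes we observed $k = 7$ positive occurrences in $n = 10$ events (hypothetical numbers) and a uniform prior over the true probability; each term of the formula above then maps to one array.

```python
import numpy as np
from scipy import stats

k, n = 7, 10                          # hypothetical data: 7 successes in 10 events
grid = np.linspace(0, 1, 101)         # candidate values of the "true" probability

prior = np.ones_like(grid) / grid.size       # P(true): uniform prior over the grid (assumption)
likelihood = stats.binom.pmf(k, n, grid)     # P(observed | true) for each candidate value
marginal = np.sum(likelihood * prior)        # P(observed): total over all candidates
posterior = likelihood * prior / marginal    # P(true | observed), sums to 1

print(grid[np.argmax(posterior)])  # most probable true value, close to 0.7
```

The result is a whole distribution over the true probability, not a single number; with a uniform prior its peak sits at the observed frequency $k/n$.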

---

## Bayes' theorem

Some of you might recognize the above formula as Bayes' theorem. Typically Bayes' theorem is written:

### $$P\left(\;A\;|\;B\;\right) = \frac{P\left(\;B\;|\;A\;\right)P\left(\;A\;\right)}{P(\;B\;)}$$

Where:

$A$ and $B$ are events, that is, anything to which we can assign a probability (which is essentially everything), with $P(B) > 0$. $P(B|A)$ and $P(A|B)$ are the probabilities of $B$ conditional on $A$ and vice versa.



This is just another way of writing:

### $$P\left(\;A\;\right)P\left(\;B\;|\;A\;\right) = P\left(\;B\;\right)P\left(\;A\;|\;B\;\right)$$

Which is derived from the fact that:

### $$P\left(\;A\;\cap\;B\right) = P\left(\;A\;\right)P\left(\;B\;|\;A\;\right) = P\left(\;B\;\right)P\left(\;A\;|\;B\;\right)$$

Where $P\left(\;A\;\cap\;B\right)$ is the probability of $A$ *and* $B$.
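The product rule can be verified by brute-force enumeration. This sketch uses a standard 52-card deck (an illustrative choice) with $A$ = "the card is a heart" and $B$ = "the card is red", checks that both factorizations of $P(A \cap B)$ agree, and then uses Bayes' theorem to recover $P(A|B)$.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) cards

def prob(event):
    """Fraction of the deck for which the predicate `event` is true."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_heart(card):                             # event A
    return card[1] == "hearts"

def is_red(card):                               # event B
    return card[1] in ("hearts", "diamonds")

p_a, p_b = prob(is_heart), prob(is_red)                   # 1/4 and 1/2
p_a_and_b = prob(lambda c: is_heart(c) and is_red(c))     # 1/4
p_b_given_a = p_a_and_b / p_a                             # 1
p_a_given_b = p_a_and_b / p_b                             # 1/2

# Both factorizations of the joint probability agree...
assert p_a * p_b_given_a == p_b * p_a_given_b == p_a_and_b
# ...so rearranging (Bayes' theorem) gives back P(A | B).
print(p_b_given_a * p_a / p_b)  # 1/2
```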

---

### Denominator of Bayes' theorem: the "total probability"

![](./assets/images/output_27_0.png)

---

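In the picture, the regions $A_1, \ldots, A_5$ do not overlap and together cover the whole sample space (they form a partition), and each of them includes a piece of the center oval. In this example the oval represents $B$.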

Basic probability defines the following relation: $$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

Intuitively, the relation indicates that $P(A|B)$ is a ratio of the part of A that is common with B, *over the entirety of $B$*. 

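Because the $A_i$ do not overlap and together cover every possible outcome, $B$ splits into the pieces $A_i \cap B$. So **the total probability is the exhaustive sum of the probabilities of the pieces of $B$ that fall in each $A_i$**: $$P(B) = \sum_{i} P(A_i \cap B) = \sum_{i} P(B|A_i)\,P(A_i)$$ This is simply the probability of $B$ in our set of events.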

So what is the purpose of the total probability with respect to the rest of Bayes formula? **In essence, it "normalizes" the numerator into a quantity between 0 and 1,** ensuring the left side of the formula is a probability.
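The law of total probability and the normalizing role of $P(B)$ can be sketched for five regions like those in the picture. The values of $P(A_i)$ and $P(B|A_i)$ below are made up for illustration; the only requirement is that the $P(A_i)$ sum to 1, because the $A_i$ partition the sample space.

```python
import numpy as np

# Hypothetical values for five regions A_1..A_5 that partition the sample space.
p_a = np.array([0.10, 0.25, 0.30, 0.20, 0.15])          # P(A_i); sums to 1
p_b_given_a = np.array([0.50, 0.20, 0.40, 0.10, 0.60])  # P(B | A_i)

# Law of total probability: P(B) = sum_i P(B | A_i) * P(A_i)
p_b = np.sum(p_b_given_a * p_a)

# Bayes' theorem for every region at once; dividing by P(B) normalizes the numerators.
p_a_given_b = p_b_given_a * p_a / p_b

print(p_b)                # total probability of B
print(p_a_given_b.sum())  # the posteriors over the partition sum to 1
```

Without the division by $P(B)$ the numerators would sum to $P(B)$ rather than to 1, which is exactly why the denominator is needed.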

---

### Solving probability using Bayes' theorem is easy when you know $P(B)$

Let's say we have two coins: coin **FAIR** and coin **RIGGED**.

    coin FAIR has a 50% chance of flipping heads.
    coin RIGGED has a 99% chance of flipping heads.
    
Your friend chooses one of the two coins at random. He flips the coin and gets heads. 

What is the probability that the coin flipped was **FAIR**?

> Check: what are the point probabilities for the prior, likelihood, and marginal probability of the data?

```python
# Our hypothesis is our belief that the coin flipped
# was fair before we saw the outcome.
# 0.5 since he chose at random.
hypothesis_fair = 0.5  # this is our prior

# probability that we would get heads given our hypothesis
# was true, that the coin is the fair one:
prob_flip_given_fair = 0.5  # this is the likelihood

# total probability of getting heads:
# 0.5 * 0.5 + 0.5 * 0.99 = (0.5 + 0.99) / 2 = 149/200
prob_heads = 149. / 200.  # this is the marginal

# solve for the probability our hypothesis is true given the flip:
hypothesis_true = (prob_flip_given_fair * hypothesis_fair) / prob_heads

print(hypothesis_true)  # about 0.336
```
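Rather than hard-coding `149./200.`, the marginal can be computed from the priors and likelihoods with the law of total probability, which also yields the posterior for the **RIGGED** coin. This is a sketch using the same numbers; the helper name `posterior` is our own, not a library function.

```python
def posterior(priors, likelihoods):
    """Return P(hypothesis | data) for each hypothesis via Bayes' theorem.

    `priors` and `likelihoods` are dicts keyed by hypothesis name.
    """
    # Law of total probability: P(data) = sum over h of P(data | h) * P(h)
    marginal = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / marginal for h in priors}

priors = {"FAIR": 0.5, "RIGGED": 0.5}     # the friend picks a coin at random
p_heads = {"FAIR": 0.5, "RIGGED": 0.99}   # P(heads | coin)

result = posterior(priors, p_heads)
print(result)  # FAIR about 0.336, RIGGED about 0.664
```

Because the marginal is built from the same priors and likelihoods, the posteriors always sum to 1.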

---

## Bayes' theorem in the context of statistical modeling

We can also interpret the equations above in the context of statistical modeling, which we've been doing tons of in this class:

### $$P\left(\;model\;|\;data\;\right) = \frac{P\left(\;data\;|\;model\;\right)}{P(\;data\;)} P\left(\;model\;\right)$$

Or in plain English:

**What is the probability of our model being true, given the data we have? This depends on the likelihood of the observed data under our model, on our prior belief that this model is true, and on the overall (marginal) probability of the data, which normalizes the result.**

---


### Bayes' theorem breakdown

![](./assets/images/bayes-rule-e1350930203949.png)

### Computational solutions with Bayes' theorem

Consider two shoppers' baskets in an e-commerce store.

Basket 1 has 30 cans of seltzer and 10 cans of V8. Basket 2 has 20 cans of each.

You pick one basket at random and then pick a can from it at random; the can is seltzer.

What's the probability it came from basket 1?

This is a very simple case, but it already lets us treat our beliefs as **prior distributions** over the two baskets and combine them with the likelihoods to get **posterior distributions**.

```python
hypo_dist = {'Basket1': .5, 'Basket2': .5}  # Priors
likelihood_dist = {'Basket1': .75, 'Basket2': .5}  # Likelihood: P(seltzer | basket)
marginal_prob = 5 / 8.0  # Normalizing constant: 0.5 * 0.75 + 0.5 * 0.5

print((hypo_dist['Basket1'] * likelihood_dist['Basket1']) / marginal_prob)
print((hypo_dist['Basket2'] * likelihood_dist['Basket2']) / marginal_prob)
```

```
0.6
0.4
```
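The posterior from one observation can serve as the prior for the next. As a sketch, assume (this is not part of the original problem) that the can is put back and a second can drawn from the same basket is also seltzer; updating with the 0.6 / 0.4 posteriors as the new priors gives:

```python
priors = {"Basket1": 0.6, "Basket2": 0.4}        # posterior from the first draw
likelihood = {"Basket1": 0.75, "Basket2": 0.5}   # P(seltzer | basket); unchanged if the can is replaced

# Normalizing constant for the second draw, via the law of total probability
marginal = sum(priors[b] * likelihood[b] for b in priors)
updated = {b: priors[b] * likelihood[b] / marginal for b in priors}
print(updated)  # Basket1 rises to about 0.692
```

The result matches a single update on both draws at once, starting from the 0.5 / 0.5 priors with likelihoods $0.75^2$ and $0.5^2$: updating one observation at a time or all at once gives the same posterior.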


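In practice, we more often use library functions, such as those in `scipy.stats`, to work with distributions. For example, this draws random samples from a uniform distribution: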

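```python
from scipy import stats

# Draw 2 random samples from a Uniform(0, 1) distribution.
# These are random draws, not likelihoods, so the values change on every run.
stats.uniform.rvs(loc=0, scale=1, size=2)
```

```
array([ 0.40924741,  0.69209744])
```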
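A more direct use of `scipy.stats` in this lesson is computing likelihoods. As a sketch, suppose (hypothetically) that three cans are drawn with replacement from the chosen basket and two of them are seltzer; `stats.binom.pmf(k, n, p)` gives the probability of that outcome under each basket.

```python
from scipy import stats

priors = {"Basket1": 0.5, "Basket2": 0.5}
p_seltzer = {"Basket1": 30 / 40, "Basket2": 20 / 40}  # share of seltzer cans in each basket

# Likelihood of 2 seltzers in 3 draws (with replacement), for each basket
likelihood = {b: stats.binom.pmf(2, 3, p_seltzer[b]) for b in priors}

marginal = sum(priors[b] * likelihood[b] for b in priors)
posterior = {b: priors[b] * likelihood[b] / marginal for b in priors}
print(posterior)  # Basket1 is about 0.53
```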

### Independent practice: the train problem

"A railroad numbers its locomotives in order 1...N. You see a locomotive with the number 60. Estimate how many locomotives the railroad has." What's the prior? What's the likelihood?

The prior is what we knew (or will assume) about N before our observation of data.

The likelihood is the probability of seeing the data for any given value of N.

How can you write a likelihood function for this problem?
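One possible likelihood: if the railroad has $N$ locomotives and we are equally likely to see any of them, then seeing number 60 has probability $1/N$ when $N \ge 60$ and 0 otherwise. The sketch below pairs that likelihood with a uniform prior over $N = 1, \ldots, 1000$; the upper bound of 1000 is an arbitrary assumption, and the estimate depends on it.

```python
import numpy as np

observed = 60
n_values = np.arange(1, 1001)                     # hypotheses: the railroad has N locomotives
prior = np.ones(n_values.size) / n_values.size    # uniform prior (assumed upper bound of 1000)

def likelihood(data, n):
    """P(seeing locomotive number `data` | the railroad has n locomotives)."""
    return np.where(n >= data, 1.0 / n, 0.0)

unnormalized = prior * likelihood(observed, n_values)
posterior = unnormalized / unnormalized.sum()     # divide by the marginal P(data)

print(n_values[np.argmax(posterior)])   # most probable N: 60
print(np.sum(n_values * posterior))     # posterior mean, roughly 333
```

The most probable single value is $N = 60$, but the posterior mean is much larger because the long tail of larger $N$ still carries probability.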

References and sources this lesson is modeled on:

http://ipython-books.github.io/featured-07/

http://stats.stackexchange.com/questions/31867/bayesian-vs-frequentist-interpretations-of-probability

http://jakevdp.github.io/blog/2014/03/11/frequentism-and-bayesianism-a-practical-intro/

https://simple.wikipedia.org/wiki/Bayes%27_theorem

https://en.wikipedia.org/wiki/Central_limit_theorem

http://www.cogsci.ucsd.edu/classes/SP07/COGS14/NOTES/binomial_ztest.pdf

https://en.wikipedia.org/wiki/Prior_probability#Uninformative_priors

https://arbital.com/p/bayes_rule/?l=1zq

https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/

http://www.yudkowsky.net/rational/bayes/

http://people.stern.nyu.edu/wgreene/MathStat/Notes-2-BayesianStatistics.pdf

http://stats.stackexchange.com/questions/58564/help-me-understand-bayesian-prior-and-posterior-distributions

http://pages.uoregon.edu/cfulton/posts/bernoulli_trials_bayesian.html

http://chrisstrelioff.ws/sandbox/2014/12/11/inferring_probabilities_with_a_beta_prior_a_third_example_of_bayesian_calculations.html

https://www.chrisstucchio.com/blog/2013/magic_of_conjugate_priors.html

---