# Module 2 Part 3:  The Bayesian framework and Random Variables

# Table of Contents

This module consists of 3 parts:

**Part 1** - Introduction to Probability

**Part 2** - Probability Distributions

**Part 3** - The Bayesian framework and Random Variables

Each part is provided in a separate notebook file. It is recommended that you follow the order of the notebooks.
   
Part 3:
* [Bayes' Theorem](#Bayes_Theorem)
    * [The Bayesian framework](#The_Bayesian_framework)


* [Properties of Random Variables](#Properties_of_Random_Variables)
    * [Expectation of a random variable](#Expectation_of_a_random_variable)
    * [Variability in expectation](#Variability_in_expectation)


<a id='Bayes_Theorem'></a>
# Bayes' Theorem

There are situations where we observe a particular event and need to compute the probability of one of its possible causes.

In other words, we know $P(A\ |\ B)$ and we want to know $P(B\ |\ A)$. This is where we apply Bayes' theorem.

For illustrative purposes, suppose we want to calculate $P(A \ and \ B)$. We can use the conditional probability equation in two ways:

$$
P(A \ and \ B)=P(A|B) \cdot P(B)
$$

or 

$$
P(A \ and \ B)=P(B|A) \cdot P(A)
$$

Both expressions equal $P(A \ and \ B)$, so we can say

$$
P(B|A) \cdot P(A)=P(A|B) \cdot P(B)
$$

This is one step away from Bayes' Theorem: we only need to solve for either of the two conditional probabilities. For example, dividing both sides by $P(A)$ (assuming $P(A) > 0$) gives $P(B|A) = \frac{P(A|B) \cdot P(B)}{P(A)}$.

We can now add the equation for the sum of conditional probabilities. Suppose, as before, that $A_1$, ..., $A_k$ represent all the disjoint outcomes for a variable or process $A$. Then:

$$
P(A_1\ |\ B)=\frac{P(B\ |\ A_1)\cdot P(A_1)}{P(B)}=\frac{P(B|A_1)\cdot P(A_1)}{P(B|A_1)\cdot P(A_1)+P(B|A_2)\cdot P(A_2)+\cdots+P(B|A_k)\cdot P(A_k)}
$$

Where

$$
P(B)= P(B|A_1)\cdot P(A_1)+P(B|A_2)\cdot P(A_2)+\cdots+P(B|A_k)\cdot P(A_k)
$$

Bayes’ Theorem can be thought of as a way of 'inverting' conditional probabilities. Sometimes we know $P(A|B)$ but we need to calculate $P(B|A)$.
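As a minimal sketch of the formula above, the following Python function computes the posterior probabilities $P(A_i|B)$ from the priors $P(A_i)$ and the likelihoods $P(B|A_i)$. The function name `bayes_posterior` and the numbers in the usage example are illustrative assumptions, not part of the course material.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(A_i | B) for each disjoint outcome A_i.

    priors[i]      is P(A_i); the priors are assumed to sum to 1.
    likelihoods[i] is P(B | A_i).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Numerators of Bayes' theorem: P(B | A_i) * P(A_i)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: P(B), the sum of all the numerators
    p_b = sum(joint)
    if p_b == 0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    return [j / p_b for j in joint]


# Hypothetical numbers: two outcomes with priors 0.3 and 0.7,
# and likelihoods P(B | A_1) = 0.9, P(B | A_2) = 0.2.
posterior = bayes_posterior([0.3, 0.7], [0.9, 0.2])
print(posterior)
```

Although $A_1$ has the smaller prior in this made-up example, observing $B$ makes it the more probable outcome, because $B$ is much more likely under $A_1$ than under $A_2$.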

<a id='The_Bayesian_framework'></a>
## The Bayesian framework

  * The Bayesian approach assumes that we always have a prior distribution even though the prior may be very vague, equiprobable, or even outright wrong
  * When we obtain new data, we update the prior distribution in light of the new data to get an updated probability distribution called the posterior distribution
  * The posterior distribution reflects our state of knowledge after collecting data
  
Bayesian inference is a big topic. We give a quick introduction here; it will be developed further in a later module.

Bayes’ theorem gives us a way to update the probability of a hypothesis, H, in light of some body of data, D.

Rewriting Bayes' theorem in terms of a hypothesis and data yields:

$$P(H|D)= \frac{P(D|H) \cdot P(H)}{P(D)}$$

Where:

$P(H)$ is the probability of the hypothesis before we see the data, called the **prior** probability

$P(H|D)$ is the probability of the hypothesis after we see the data, called the **posterior**

$P(D|H)$ is the probability of the data under the hypothesis H, called the **likelihood**

$P(D)$ is the probability of the data under any hypothesis, called the **evidence** or **normalizing constant**
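The update from prior to posterior can be sketched in a few lines of Python. The coin scenario, the function name `update`, and all numbers below are hypothetical and chosen only to illustrate how the posterior after one observation becomes the prior for the next.

```python
def update(prior, likelihood):
    """One Bayesian update over a dict of hypotheses.

    prior:      {hypothesis: P(H)}
    likelihood: {hypothesis: P(D | H)}
    Returns the posterior {hypothesis: P(H | D)}.
    """
    # Prior times likelihood for each hypothesis
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    # P(D): the evidence, used as the normalizing constant
    evidence = sum(unnormalized.values())
    return {h: value / evidence for h, value in unnormalized.items()}


# Hypothetical setup: a coin is either fair or biased towards heads.
prior = {"fair": 0.5, "biased": 0.5}
p_heads = {"fair": 0.5, "biased": 0.8}  # P(heads | H), assumed values

after_one = update(prior, p_heads)      # posterior after seeing one head
after_two = update(after_one, p_heads)  # that posterior is the new prior
print(after_one)
print(after_two)
```

Each observed head shifts probability from the "fair" hypothesis to the "biased" one, which is the updating process described in the bullet points above.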


### M&M Example
![image.png](attachment:image.png)
Image of M&Ms in all colours, including blue.

Image source: Plain-M&Ms-Pile.jpg, (2010).

**Example** (Downey, 2012).

In 1995, blue M&M's were introduced.
* Before then, the colour mix was 30% brown, 20% yellow, 20% red, 10% green, 10% orange, 10% tan.
* Afterward it was 24% blue, 20% green, 16% orange, 14% yellow, 13% red, 13% brown.

You have 2 bags. One is from 1994 and one from 1996 (but you don't know which is which).
You draw one M&M from each bag. One is yellow and one is green. 

What is the probability that the yellow one came from the 1994 bag?

Note: it is easy to calculate the probability of drawing a given colour from a specific bag. What we want to know, however, is:

*If we draw a specific colour, what is the probability that it is coming from a specific bag?*

* Hypothesis A: Bag 1 is from 1994, which implies that Bag 2 is from 1996.
* Hypothesis B: Bag 1 is from 1996 and Bag 2 is from 1994.

Here Bag 1 is the bag the yellow M&M came from and Bag 2 is the bag the green one came from.

Before we look at the colours, both hypotheses are equally likely, so each has a prior probability of $\frac{1}{2}$.

The probability of drawing a green or a yellow M&M from each bag is also easy to calculate:

$P(bag_{1994})=\frac{1}{2}$

$P(bag_{1996})=\frac{1}{2}$

$P(green|bag_{1994})=\frac{10}{100}$

$P(yellow|bag_{1994})=\frac{20}{100}$

$P(green|bag_{1996})=\frac{20}{100}$

$P(yellow|bag_{1996})=\frac{14}{100}$

| Hypothesis | Prior P(H) | Likelihood P(D\|H) | Prior $\cdot$ Likelihood P(H) $\cdot$ P(D\|H) | Posterior P(H\|D) |
| :-----------: | :-----------: | :--------------------: | :--------------------: | :--------------------: |
| A | $\frac{1}{2}$ | $\frac{20}{100} \cdot \frac{20}{100}$ | 0.02 | 0.74 |
| B | $\frac{1}{2}$ | $\frac{14}{100} \cdot \frac{10}{100}$ | 0.007 | 0.26 |


To calculate the **posterior**, we apply Bayes' formula. For Hypothesis A (the yellow M&M came from the 1994 bag):

$$
P(H|D)= \frac{P(D|H) \cdot P(H)}{P(D)}= \frac {\frac{20}{100} \cdot \frac{20}{100} \cdot \frac{1}{2}}{\frac{20}{100} \cdot \frac{20}{100} \cdot \frac{1}{2} + \frac{14}{100} \cdot \frac{10}{100} \cdot \frac{1}{2}}=0.74
$$
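The calculation can be checked with a short Python sketch that uses the colour mixes given in the example. It uses exact fractions to avoid rounding; the variable names and the helper `p` are illustrative choices, not part of the original example.

```python
from fractions import Fraction

# Colour mixes from the example, as percentages.
mix_1994 = {"brown": 30, "yellow": 20, "red": 20,
            "green": 10, "orange": 10, "tan": 10}
mix_1996 = {"blue": 24, "green": 20, "orange": 16,
            "yellow": 14, "red": 13, "brown": 13}


def p(mix, colour):
    """Probability of drawing the given colour from a bag with this mix."""
    return Fraction(mix[colour], 100)


# Both hypotheses are equally likely before any M&M is drawn.
prior = {"A": Fraction(1, 2), "B": Fraction(1, 2)}

# A: yellow came from the 1994 bag and green from the 1996 bag.
# B: yellow came from the 1996 bag and green from the 1994 bag.
likelihood = {
    "A": p(mix_1994, "yellow") * p(mix_1996, "green"),
    "B": p(mix_1996, "yellow") * p(mix_1994, "green"),
}

unnormalized = {h: prior[h] * likelihood[h] for h in prior}
evidence = sum(unnormalized.values())  # P(D)
posterior = {h: unnormalized[h] / evidence for h in prior}

for h in posterior:
    print(h, posterior[h], round(float(posterior[h]), 2))
```

The exact posteriors are $\frac{20}{27}$ for Hypothesis A and $\frac{7}{27}$ for Hypothesis B, which round to 0.74 and 0.26.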

<a id='Properties_of_Random_Variables'></a>
# Properties of Random Variables

<a id='Expectation_of_a_random_variable'></a>
## Expectation of a random variable

If a random variable $X$ has several possible outcomes, its expected value is the sum of each outcome's value multiplied by the probability of that outcome.

Let's illustrate this with an example.

My restaurant sells three things:

  * Pasta (10 dollars) – 50% of customers buy it
  
  * Pizza (8 dollars) – 40% of customers buy it
  
  * Salad (6 dollars) – 10% of customers buy it
  
 How much can I expect each customer to spend?  
 
$E = $ expected value

$E = \$10\cdot 0.5 + \$8\cdot 0.4 + \$6\cdot 0.1$

$E = \$8.80$

So, if I serve 100 customers, I can expect to make $100 \cdot \$8.80 = \$880$.

In general, if X takes outcomes $x_1$, ..., $x_k$ with probabilities $P(X = x_1)$, ..., $P(X = x_k)$, the expected value of X is the sum of each outcome multiplied by its corresponding probability:

$$
E(X)=x_1 \cdot P(X =x_1)+\cdots+x_k \cdot P(X =x_k)
$$
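The general formula translates directly into code. The sketch below applies it to the restaurant example; the function name `expected_value` and the tolerance used to check the probabilities are assumptions made for illustration.

```python
def expected_value(outcomes, probabilities):
    """E(X) = x_1 * P(X = x_1) + ... + x_k * P(X = x_k)."""
    if len(outcomes) != len(probabilities):
        raise ValueError("need exactly one probability per outcome")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probabilities))


prices = [10, 8, 6]       # pasta, pizza, salad (dollars)
shares = [0.5, 0.4, 0.1]  # fraction of customers buying each item

per_customer = expected_value(prices, shares)
print(f"Expected spend per customer: ${per_customer:.2f}")
print(f"Expected revenue from 100 customers: ${100 * per_customer:.2f}")
```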

<a id='Variability_in_expectation'></a>
## Variability in expectation

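The variance measures how far the outcomes typically fall from the expected value. It is calculated as the sum of the squared differences between each value and the mean, each multiplied by the probability of that value. The standard deviation (SD) is the square root of the variance.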

$E = \$10 \cdot 0.5 + \$8 \cdot 0.4 + \$6 \cdot 0.1 = \$8.8$

$V = (\$10-\$8.8)^2\cdot 0.5 + (\$8-\$8.8)^2\cdot0.4 + (\$6- \$8.8)^2\cdot 0.1$

$V = 0.72 + 0.256 + 0.784 = 1.76$

$SD = \sqrt{V} = \sqrt{1.76} \approx \$1.33$

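One standard deviation around the mean gives a range of roughly \$7.47 to \$10.13 for a single customer. For 100 customers whose purchases are independent, the expected total is \$880 with a standard deviation of $\sqrt{100} \cdot \$1.33 \approx \$13.27$, so we can expect a range of roughly \$867 to \$893.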

In general, if X takes outcomes $x_1$, ..., $x_k$ with probabilities $P(X = x_1)$, ..., $P(X = x_k)$ and expected value $\mu = E(X)$, then the variance of X, denoted by Var(X) or the symbol $\sigma^2$, is:

$$
\sigma^2 =(x_1-\mu)^2 \cdot P(X=x_1)+\cdots+ (x_k-\mu)^2 \cdot P(X=x_k)
$$
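As a rough sketch, the variance formula can be coded in the same way and applied to the restaurant example. The function name `variance` is an illustrative choice. The last step assumes that customers choose independently of one another, in which case the variances of individual customers add up; that assumption is not stated in the example itself.

```python
import math


def variance(outcomes, probabilities):
    """Var(X) = sum of (x_i - mu)^2 * P(X = x_i), with mu = E(X)."""
    mu = sum(x * p for x, p in zip(outcomes, probabilities))
    return sum((x - mu) ** 2 * p for x, p in zip(outcomes, probabilities))


prices = [10, 8, 6]       # pasta, pizza, salad (dollars)
shares = [0.5, 0.4, 0.1]  # fraction of customers buying each item

var = variance(prices, shares)
sd = math.sqrt(var)
print(f"Variance per customer: {var:.2f}")
print(f"Standard deviation per customer: ${sd:.2f}")

# Assumption: the n customers are independent, so Var(total) = n * Var(X).
n = 100
sd_total = math.sqrt(n * var)
print(f"Standard deviation of the total for {n} customers: ${sd_total:.2f}")
```

Under the independence assumption, the standard deviation of the total grows with $\sqrt{n}$ rather than with $n$, so the total for many customers is relatively more predictable than a single customer's spend.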

**End of Module.**

You have reached the end of this module.

If you have any questions, please reach out to your peers using the discussion boards. If you
and your peers are unable to come to a suitable conclusion, do not hesitate to reach out to
your instructor on the designated discussion board.

When you are comfortable with the content, and have practiced to your satisfaction, you may
proceed to any related assignments, and to the next module.


<a id='References'></a>
# References


Downey, A. (2012). Section 1.6: the M&M problem in *Think Bayes: Bayesian Statistics Made Simple, Version 1.0.9,* Green Tea Press. http://www.greenteapress.com/thinkbayes/html/index.html

Plain-M&Ms-Pile.jpg (2010). Retrieved Dec 5, 2018 from Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Plain-M%26Ms-Pile.jpg Public Domain
