In [None]:
# this is some code to get pretty highlighted cells for the questions - ignore this
from IPython.display import HTML
style1 = "<style>div.warn { background-color: #fcf2f2;border-color: #dFb5b4; border-left: 5px solid #dfb5b4; padding: 0.5em;}</style>"
HTML(style1)

This notebook covers some worked examples and some examples for you to try relating to **Block A, Chapter 1** in the notes (Introduction to Probability).  This is practice and core material for coursework 1. The green questions are those most closely related to the assessed work for this module. *After completing this notebook, you can attempt QNs 1 and 4 on coursework 1.*

## Revision of concepts:

The axioms of probability are:

Axiom 1 : $0 \le P(A) \le 1$

Axiom 2 : $P(\Omega) = 1$

Axiom 3 : $P(A_1 \cup A_2 \cup A_3 \cup \ldots) = P(A_1)+P(A_2)+P(A_3)+\ldots$ for mutually exclusive events $A_1, A_2, A_3, \ldots$

Starting from the definition of conditional probability, we arrive at Bayes Theorem, which states that:

$ P (A|B) = \dfrac{P(A \cap B) }{P(B)} = \dfrac{P(B|A)P(A)}{P(B|A)P(A)+P(B|A^c)P(A^c)}$ 

Bayes Rule is normally used to determine the probability of a specific model, $\theta$, given some data D, such that

$P(\theta|D) = \dfrac{P(D|\theta) P(\theta)}{P(D)}$

where $P(D|\theta)$ is the *likelihood*, $P(\theta)$ is the *prior*, and $P(D)$ is the *evidence*. $P(\theta|D)$ is the *posterior*. 

People feel uncomfortable about ‘priors’, since often they are a ‘best guess’.

Indeed, different analysts may have differing opinions about what the prior for a given experiment should be.  Although ‘frequentists’ disagree with the use of priors, note that technically they do assume one: they assume that all possibilities are equally likely (i.e. a ‘flat’ prior).

Clearly this is also wrong.  So priors are useful, but they must be clearly stated. They provide a formal means for the analyst to include previous information that is relevant to the experiment. They also allow you to test whether the model is good.

Bayesian vs Frequentist:

A Bayesian might argue “the prior probability is a logical necessity when assessing the probability of a model. It should be stated, and if it is unknown you can just use an uninformative (wide) prior” 

A frequentist might argue “setting the prior is subjective - two experimenters could use the same data to come to two different conclusions just by taking different priors”

Bayesian analysis has a somewhat formidable reputation for being extremely difficult… why is that?

- In general, the denominator can be difficult to evaluate
- Tricky integrals
- Often require numerical solutions
- Large (multivariate) parameter space
- In the 20th century, the development of Markov Chain Monte Carlo (MCMC) methods made the evaluation of the integrals and the probabilities much easier. 

You will learn more about this later in the course!

***
### Worked example of Bayes Theorem

Imagine that a box contains five coins, one of which is a joke (J) coin, with heads on both sides. A coin is selected at random from the box, and flipped 3 times. The result is 3 heads (3H). What is the probability that the coin is the joke coin?

First, we should define what we are trying to work out. 

We are interested in $P(J | 3H)$. We will let the normal coin be denoted by $C$. So using Bayes Theorem we can write,

$P(J | 3H) = \dfrac{ P(3H | J) P(J)}  {P(3H) }$

To get $P(3H)$ we need to add up the probabilities of all the ways of getting three heads.

$P(J | 3H) = \dfrac{ P(3H | J) P(J)}  {P(3H | J) P(J) + P(3H | C) P(C)}$

The probability of randomly selecting the joke coin is $P(J) = 1/5$. 

The probability of not selecting it is $P(J^c) = 1 - 1/5 = 4/5 = P(C)$. 

The probability of getting 3 heads with the joke coin is 1, so 

$P(3H | J) = 1$

The probability of getting 3 heads with a standard coin is $(1/2) \times (1/2) \times (1/2)$ (remember these are independent events), so

$P(3H | C) = 1/8$

$P(J | 3H) = \dfrac{ 1 \times 1/5}  {1 \times 1/5 ~ + ~1/8 \times 4/5} = 2 / 3$

So there's a 67% chance (2 in 3) that the coin we are seeing flipped is the joke coin!
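
The arithmetic above can be checked with a short script. This is only a minimal sketch: the function name `posterior_joke` and its parameters are chosen here for illustration, and exact fractions are used so that the result appears as 2/3 rather than a rounded decimal.

```python
from fractions import Fraction

def posterior_joke(n_heads, n_coins=5, n_joke=1):
    """P(J | n_heads heads in a row) for a box of n_coins coins,
    n_joke of which are double-headed (illustrative helper)."""
    prior_joke = Fraction(n_joke, n_coins)   # P(J)
    prior_fair = 1 - prior_joke              # P(C)
    like_joke = Fraction(1)                  # P(nH | J): always heads
    like_fair = Fraction(1, 2) ** n_heads    # P(nH | C): independent fair flips
    evidence = like_joke * prior_joke + like_fair * prior_fair  # P(nH)
    return like_joke * prior_joke / evidence

print(posterior_joke(3))   # 2/3
```

Calling the function with a larger `n_heads` shows how each extra head makes the joke coin more probable, while `n_heads=0` simply returns the prior of 1/5.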

***
### Your turn:

<div class="warn">** Question: ** <br><br>

Write down in words what the following expressions mean:<br>
 
- $P(B^c)$ <br>
- $P(C \cap D)$ <br>
- $P(C \cup D)$ <br>

</div>

- $P(B^c)$ Probability of event B not occurring <br>
- $P(C \cap D)$ Probability of both events C and D occurring <br> 
- $P(C \cup D)$ Probability of event C or D (or both) occurring <br>
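
One way to make these expressions concrete is to treat events as Python sets. The sketch below assumes a single roll of a fair die; the particular events `B`, `C` and `D` and the helper `prob` are made up for illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a fair die
B = {6}                      # "roll a six"
C = {2, 4, 6}                # "roll an even number"
D = {4, 5, 6}                # "roll more than three"

def prob(event):
    """Probability of an event when every outcome in omega is equally likely."""
    return Fraction(len(event), len(omega))

p_not_B = prob(omega - B)    # P(B^c): complement
p_C_and_D = prob(C & D)      # P(C ∩ D): intersection
p_C_or_D = prob(C | D)       # P(C ∪ D): union
print(p_not_B, p_C_and_D, p_C_or_D)   # 5/6 1/3 2/3
```

Note that `prob(C) + prob(D) - prob(C & D)` gives the same value as `prob(C | D)`, because outcomes in both events would otherwise be counted twice.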

<div class="alert alert-block alert-success">**Question:** <br><br>
You flip a coin 10 times. What is the probability that heads will not occur 5 times?    
</div>

** Answer: **

In [7]:
# P(not exactly 5 heads) = 1 - P(exactly 5 heads in 10 flips)
from math import comb

# Binomial probability: C(10, 5) orderings, each with probability 0.5**10
PH5 = comb(10, 5) * (0.5**5) * (0.5**5)
PHN5 = 1 - PH5
print(PHN5)

0.75390625


<div class="alert alert-block alert-success">**Question:** <br><br>
Define the terms: prior, likelihood, posterior in relation to Bayes Rule in terms of probability (use words).
</div>

**Answer:**

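$P(A|B) = \dfrac{P(B|A) P(A)}{P(B)}$

where $P(B|A)$ is the *likelihood* (the probability of observing $B$ if $A$ is true), $P(A)$ is the *prior* (the probability of $A$ before $B$ is taken into account), and $P(B)$ is the *evidence*. $P(A|B)$ is the *posterior* (the probability of $A$ once $B$ has been observed). 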

<div class="alert alert-block alert-success">**Question:** 

Suppose you think you have a rare disease where 1 in 10,000 people are affected. You go to your doctor, and she performs a test. The test is of a high quality, and is correct 99% of the time.

Unfortunately your results come back positive for the disease. What is your chance of having the disease? Is it (roughly): <br><br>

a) 99% <br>
b) 90% <br>
c) 10% or <br>
d) 1%? <br> <br>

Repeat the same as above, but change 1 in 10,000 to 1 in 100. What would the chances of having the disease be now? </div>

**Answer:**

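Let $A$ be the event that you have the disease and $B$ the event that the test is positive. <br>
$P(A|B) = \dfrac{P(B|A)P(A)}{P(B|A)P(A)+P(B|A^c)P(A^c)}$  <br>
$P(B|A) = 0.99, P(A) = 0.0001, P(A^c) = 0.9999, P(B|A^c) = 0.01$ <br>
$P(A|B) = \dfrac{0.99 \times 0.0001}{0.99 \times 0.0001 + 0.01 \times 0.9999}$ <br>
$P(A|B) = 0.00980392$, so the chance is roughly 1% and the answer is (d). <br>
With 1 in 100 instead, $P(A) = 0.01$ and $P(A^c) = 0.99$, giving $P(A|B) = \dfrac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = 0.5$.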
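
The same calculation can be wrapped in a small function to see how strongly the result depends on the prevalence. This is a sketch only: the name `posterior_disease` is illustrative, and it assumes that "correct 99% of the time" applies equally to people with and without the disease.

```python
def posterior_disease(prevalence, accuracy=0.99):
    """P(disease | positive test), assuming the test is right with
    probability `accuracy` for both sick and healthy people."""
    true_pos = accuracy * prevalence                # P(B|A) P(A)
    false_pos = (1 - accuracy) * (1 - prevalence)   # P(B|A^c) P(A^c)
    return true_pos / (true_pos + false_pos)

for prevalence in (1 / 10_000, 1 / 100):
    p = posterior_disease(prevalence)
    print(f"prevalence {prevalence:.4f}: P(disease | positive) = {p:.4f}")
```

For the rare disease, false positives among the many healthy people far outnumber the true positives, which is why the posterior is so much lower than the accuracy of the test.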

<div class="alert alert-block alert-success">** Question: ** <br><br>

 Given that $P(A \cap B) = P(B \cap A)$, show that Bayes Theorem can be written as <br>

$\dfrac{P(B|A)P(A)}{P(B|A)P(A)+P(B|A^c)P(A^c)}$  </div>

**Answer:**

Assuming $P(A \cap B) = P(B \cap A)$ <br>
and using $P(A|B) = \dfrac{P(A \cap B) }{P(B)}$, $P(B|A) = \dfrac{P(B \cap A) }{P(A)}$ and $P(B|A^c) = \dfrac{P(B \cap A^c) }{P(A^c)}$ <br>
Rearranging the second equation for $P(B \cap A)$ and substituting it into the first gives:<br>
$P(A|B) = \dfrac{P(B|A) P(A)}{P(B)}$ <br>
Lastly, we need to substitute an expression for $P(B)$. Since $B$ occurs either together with $A$ or together with $A^c$, $P(B) = P(B \cap A) + P(B \cap A^c)$. <br>
Rearranging the last two equations on the second line gives $P(B \cap A) = P(B|A)P(A)$ and $P(B \cap A^c) = P(B|A^c)P(A^c)$, so: <br>
$P(A|B) = \dfrac{P(B|A)P(A)}{P(B|A)P(A)+P(B|A^c)P(A^c)}$
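
As a numerical sanity check (not a proof), the two forms of Bayes Theorem can be compared on a small table of counts. The counts below are hypothetical, invented purely for illustration.

```python
from fractions import Fraction

# Hypothetical counts for 100 trials, split by whether A and B occurred.
counts = {("A", "B"): 24, ("A", "not B"): 6,
          ("not A", "B"): 35, ("not A", "not B"): 35}
total = sum(counts.values())

n_A = counts["A", "B"] + counts["A", "not B"]
n_Ac = counts["not A", "B"] + counts["not A", "not B"]

p_A = Fraction(n_A, total)                            # P(A)
p_Ac = 1 - p_A                                        # P(A^c)
p_B_given_A = Fraction(counts["A", "B"], n_A)         # P(B|A)
p_B_given_Ac = Fraction(counts["not A", "B"], n_Ac)   # P(B|A^c)

# Definition: P(A|B) = P(A ∩ B) / P(B)
direct = Fraction(counts["A", "B"], counts["A", "B"] + counts["not A", "B"])
# Expanded form with the denominator written out
expanded = p_B_given_A * p_A / (p_B_given_A * p_A + p_B_given_Ac * p_Ac)
print(direct, expanded)   # 24/59 24/59
```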

<div class=warn>  **Question:** <br><br>

Find out (from your lecture notes or other source) the definition of a discrete and a continuous variable and give an example of each. 
</div>

**Answer:**

A discrete variable takes distinct values that can be counted, whereas a continuous variable can take any value within a range, so its possible values cannot be counted.
An example of a discrete variable is the number of weights in a box.
An example of a continuous variable is time.

***

### So in the end what is probability?

In our examples of coin flipping, die rolling and card selecting, we introduced the notion that the probability of a particular outcome or event is simply the number of ways that event can occur, divided by the number of all possible outcomes.

But say you have a coin, and you want to know $P(H)$ -- how do you proceed? You could guess that the coin is fair and assign 0.5 to outcome heads/tails. But is the coin fair? One way you could test this is to perform lots of experiments (coin flips) and keep track of the outcome. If you do enough of these, eventually you will get an empirical measure,

$P(H) = \dfrac{n_H}{n_{\rm flips}}$

where $n_H$ is the number of heads that appeared in the experiment and $n_{\rm flips}$ is the number of times you flipped the coin (and counted the result). But when do you stop? Well, that depends on how accurately you want to know $P(H)$. But for now, we will simply note that this type of determination of $P$ is *frequentist*, in that the probability is defined by counting the instances of occurrence.
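
The empirical measure above can be explored with a simulated coin. This is a rough sketch: the function `estimate_p_heads`, the fixed `seed` and the choice of flip counts are all assumptions made for illustration, and a real experiment would of course not know `p_true` in advance.

```python
import random

def estimate_p_heads(n_flips, p_true=0.5, seed=0):
    """Flip a simulated coin n_flips times and return n_H / n_flips."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    n_heads = sum(rng.random() < p_true for _ in range(n_flips))
    return n_heads / n_flips

for n in (10, 100, 10_000, 1_000_000):
    print(n, estimate_p_heads(n))
```

With only a handful of flips the estimate can be some way from `p_true`; it typically settles closer to it as the number of flips grows, which is the sense in which "when do you stop?" depends on the accuracy you need.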

However, what about the probability that it will rain tomorrow? You can see straight away that such a probability is more difficult to define. In fact, the use of Bayes Theorem, and in particular the prior, introduces a much more vague idea that $P$ represents the belief that something will occur.