# Module 9 Part 1: Introduction to Bayesian Inference

## Introduction

Bayesian inference is a framework that takes a different approach from the frequentist statistics you have worked with so far. It brings the uncertainty in real-world problems front and centre, and uses it explicitly when calculating the probability of a particular outcome.

This module consists of 3 parts:

* **Part 1** - Introduction to Bayesian inference
* **Part 2** - The diachronic interpretation
* **Part 3** - Discriminative models

Each part is provided in a separate notebook file. It is recommended that you follow the order of the notebooks.

## Learning Outcomes

In this module, you will:
* Become familiar with Bayesian methodology
* Get comfortable with the concept of uncertainty
* Solve real-world problems using Bayesian inference
* Develop intuition for data stories

# Readings and Resources

We invite you to further supplement the content in this module with the following recommended texts.

* Davidson-Pilon, C. (2015). Chapter 1 *Bayesian methods for hackers: Probabilistic programming and Bayesian inference.* Addison-Wesley. https://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/

* Diez, D., Barr, C. & Çetinkaya-Rundel, M. (2017). Chapters 2-4 *OpenIntro Statistics (3rd Ed.)*. https://www.openintro.org/stat/textbook.php?stat_book=os

* Downey, A. (2012). Chapters 1 and 2 *Think Bayes*. Green Tea Press. http://www.greenteapress.com/thinkbayes/thinkbayes.pdf

* Witten, I.H., Frank, E., Hall, M.A., & Pal, C.J. (2017). *Data mining – Practical machine learning tools and techniques (4th Ed.).* Cambridge: Morgan Kaufmann (Elsevier).



# Table of Contents

[Bayesian Statistics](#bayesian_statistics)

[Bayesian Analysis](#bayesian_analysis)

[Conditional Probability](#conditional_probability)

## Introduction to Bayesian inference

### Bayesian statistics <a id='bayesian_statistics'></a>

How do we measure uncertainty? And how do we make decisions in its presence? One way to deal with uncertainty quantitatively is to reason in terms of probabilities.

There are two major frameworks that statisticians use to think about probabilities.

In the **Frequentist** framework, probabilities are the long-run relative frequencies of repeatable events. This approach works very well when we can imagine a hypothetical infinite sequence of repetitions.

In the **Bayesian** framework, probabilities represent our degree of belief, which takes into account what we know about a particular problem. The uncertainty of the relevant measurement(s) is integral to the framework.

For example, when we flip a fair coin many times, the frequentist approach assumes that the statistics of the coin do not change (i.e., the coin remains fair). In the Bayesian framework, however, our belief about the fairness of the coin may change as information comes in.

The Bayesian world-view interprets probability as a measure of believability in an event; that is, how confident we are that the event will occur. Frequentists, whose analysis is the more classical version of statistics, assume that probability is the long-run frequency of events. This makes sense for many events, but becomes harder to interpret for events that have no long-run frequency, such as the outcome of a one-off election.

### Bayesian analysis <a id='bayesian_analysis'></a>

Many people find the Bayesian approach more intuitive. We will use an example to contrast the frequentist and Bayesian approaches.

Consider the question:
*"What is the probability of a die being fair?"*

A frequentist would think like this: we can roll the die many times, but that's not going to change whether or not it's a fair die.

The probability is either 0 or 1.

* The frequentist approach tries to be objective in how it defines probabilities.
* However, this sometimes leads to interpretations that are not particularly intuitive.

A Bayesian would think like this: we can roll the die many times. If we have different information than somebody else, then our probabilities may be different.

Probabilities are updated as more data comes in.

* This is inherently a subjective approach to probability.
* The Bayesian framework rests on a mathematically rigorous foundation and obeys all the rules of probability (e.g., $0 \le p_i \le 1$ and $\sum_i p_i = 1$).
* Thinking in this way leads to much more intuitive results.

#### What types of questions is Bayesian analysis suited to answer?

Certain questions, such as coin flips, dice rolls, and other situations in which probabilities are static, are easily answered using a frequentist approach. However, when the probabilities are not static, Bayesian analysis becomes a more intuitive way to investigate a problem.

For example, user preferences on topics such as movies and products in online stores are complex, rely on many factors, and can change rapidly. In this instance, a frequentist approach is less useful because it is extremely difficult to predict the underlying statistics, whereas a Bayesian approach allows updating of our user preference model as more data comes in.

Thus, Bayesian analysis can be applied to all of the following kinds of questions:

* What is the probability of a coin being fair?
* What is the probability of getting a four when rolling a die?
* What is the probability of rain tomorrow?
* What is the probability that users prefer site A vs. site B?

![frequentists_vs_bayesians.png](attachment:frequentists_vs_bayesians.png "Frequentists vs Bayesians")

(xkcd comics, n.d.)

Image description: Cartoon example of the frequentist versus Bayesian approach.

Question: Did the sun just explode? (It's night, so we're not sure.)

A "neutrino detector" measures "whether the sun has gone nova". This machine rolls two dice. If they both come up six, the machine lies to us. Otherwise, it tells the truth.

A frequentist and a Bayesian are in a room with the machine. The machine rolls the dice and answers the question with a "Yes". The frequentist says, "The probability of this result happening by chance is 1/36 = 0.027. Since p < 0.05, I conclude that the sun has exploded." The Bayesian says, "Bet you $50 it hasn't."
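The Bayesian's bet can be made concrete with Bayes' theorem, which is derived later in this notebook. The comic gives no prior, so this sketch assumes a hypothetical prior of one in a million that the sun has just gone nova. The likelihoods follow from the machine's rule: it says "Yes" with probability 35/36 if the sun has exploded (it tells the truth) and 1/36 if it has not (it lies).

```python
from fractions import Fraction

# Hypothetical prior: the comic gives no number, so this value is an assumption.
prior_nova = Fraction(1, 1_000_000)

# Likelihoods from the detector's rule: it lies only when both dice show six (1/36).
p_yes_given_nova = Fraction(35, 36)     # truthful "Yes"
p_yes_given_no_nova = Fraction(1, 36)   # lying "Yes"

# Bayes' theorem, with P(Yes) summed over both hypotheses.
evidence = p_yes_given_nova * prior_nova + p_yes_given_no_nova * (1 - prior_nova)
posterior_nova = p_yes_given_nova * prior_nova / evidence

print(float(posterior_nova))
```

The posterior comes out at roughly $3.5 \times 10^{-5}$. The "Yes" makes an explosion about 35 times more plausible than before, but the event started out so unlikely that the Bayesian's bet is safe.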

### Conditional probability - review <a id='conditional_probability'></a>

Bayesian statistical analysis requires the correct application of probability concepts. We will now review the basics.

If two events are related to each other, what is the probability of event A happening given that we know event B happened?

#### Marginal and joint probabilities

Recall the example from Module 2 on the probability that a teen will go to college, depending on whether their parents did.

If a probability is *based on a single variable*, it is called a **marginal probability**. For example, a probability based solely on the $teen$ variable is a marginal probability:

$$P (teen\ college) = \frac{445}{792} = 0.56$$

The probability of outcomes for *two or more variables or processes* is called a **joint probability**:

$$P (teen\ college\ and\ parents\ not) = \frac{214}{792} = 0.27$$

#### Conditional probability and independence

The **conditional probability** of the outcome of interest $A$ given condition $B$ is computed as the following:

$$P(A\ |\ B) = \frac{P(A\cap B)}{P(B)}$$

(Recall from Module 2 that $P(A\ |\ B)$ means the probability of A *given* B, and $P(A\cap B)$ is the probability of A *and* B.)

Thus, in this example:

$$P (parents\ not\ given\ teen\ college) = \frac{P (teen\ college\ and\ parents\ not)}{P(teen\ college)} = \frac{\frac{214}{792}}{\frac{445}{792}} = \frac{214}{445} = 0.48 $$
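All three quantities can be computed directly from the counts quoted above. A minimal sketch in plain Python; the variable names are illustrative choices, not part of the Module 2 material.

```python
# Counts from the Module 2 example (792 teens in total).
total = 792
teen_college = 445                    # teens who went to college
teen_college_and_parents_not = 214    # of those, teens whose parents did not

# Marginal probability: based on the teen variable alone.
p_teen_college = teen_college / total

# Joint probability: both conditions hold at once.
p_joint = teen_college_and_parents_not / total

# Conditional probability: P(parents not | teen college) = P(joint) / P(teen college).
p_parents_not_given_teen_college = p_joint / p_teen_college

print(round(p_teen_college, 2), round(p_joint, 2), round(p_parents_not_given_teen_college, 2))
# prints: 0.56 0.27 0.48
```

Note that the total of 792 cancels in the conditional probability, which is why it reduces to $\frac{214}{445}$.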

Two events are called **independent** when:

$$P (A\ |\ B) = P (A)$$

Then:

$$P(A \cap B) = P(A) \cdot P(B)$$

In this example, if the events $teen\ college$ and $parents\ not$ were independent, knowing that the parents did not go to college would tell us nothing about the teen, so:

$$P (teen\ college\ given\ parents\ not) = P (teen\ college) = \frac{445}{792} = 0.56$$
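The product rule gives a simple numerical test for independence. The sketch below uses a hypothetical example that is not from the module: flipping a fair coin and rolling a fair die, where $P(heads) = \frac{1}{2}$, $P(six) = \frac{1}{6}$ and, since neither process affects the other, $P(heads \cap six) = \frac{1}{12}$. The function name `is_independent` is our own; `Fraction` keeps the arithmetic exact so the equality test is not disturbed by floating-point rounding.

```python
from fractions import Fraction


def is_independent(p_a, p_b, p_a_and_b):
    """Return True if P(A and B) equals P(A) * P(B), the product rule for independence."""
    return p_a_and_b == p_a * p_b


# Hypothetical example: a fair coin flip and a fair die roll.
p_heads = Fraction(1, 2)
p_six = Fraction(1, 6)
p_heads_and_six = Fraction(1, 12)

print(is_independent(p_heads, p_six, p_heads_and_six))  # True

# Equivalent check via conditioning: P(heads | six) == P(heads).
print(p_heads_and_six / p_six == p_heads)               # True
```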

#### Example

Suppose a box contains 100 T-shirts: 60 are blue and 40 are red.

Suppose also that 50 of the T-shirts are size small, and 30 of these are red.

* Marginal probability of red: $P(Red) = \frac{40}{100} = 0.4$
* Marginal probability of small: $P(Small) = \frac{50}{100} = 0.5$
* Joint probability of red and small: $P(Red\ and\ Small) = \frac{30}{100} = 0.3$

Question: if a T-shirt is chosen at random from the box, what is $P(Red\ |\ Small)$?

$$P(Red\ |\ Small) = \frac{P(Red\ and\ Small)}{P(Small)} = \frac{0.3}{0.5} = \frac{3}{5}$$

Alternatively, we can read this directly from the information given: of the 50 small T-shirts, 30 are red.

$$\frac{30}{50} = \frac{3}{5}$$

Both approaches give the same answer.
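Both routes can be checked by rebuilding the box of shirts. This sketch assumes every shirt that is not small can be lumped into a single "not small" size, since the example gives no other sizes; the remaining counts follow from the text (30 of the 40 red shirts are small, so 20 of the 60 blue shirts are small). The helper names are illustrative.

```python
from fractions import Fraction

# Rebuild the box: 40 red (30 small), 60 blue (20 small).
# Assumption: all non-small shirts share one "not small" label.
box = (
    [("red", "small")] * 30
    + [("red", "not small")] * 10
    + [("blue", "small")] * 20
    + [("blue", "not small")] * 40
)


def is_red(shirt):
    return shirt[0] == "red"


def is_small(shirt):
    return shirt[1] == "small"


def is_red_and_small(shirt):
    return is_red(shirt) and is_small(shirt)


def prob(event, shirts=box):
    """Fraction of shirts for which event(shirt) is True (every shirt equally likely)."""
    return Fraction(sum(1 for s in shirts if event(s)), len(shirts))


# Route 1: the definition P(Red | Small) = P(Red and Small) / P(Small).
via_definition = prob(is_red_and_small) / prob(is_small)

# Route 2: keep only the small shirts, then count the red ones among them.
small_shirts = [s for s in box if is_small(s)]
via_counting = prob(is_red, small_shirts)

print(via_definition, via_counting)   # 3/5 3/5
```

The same `prob` helper can be pointed at other events to check your answers to the questions that follow.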

Question: What is $P(Red\ |\ not\ Small)$?

Question: Is $P(Red\ and\ Small) = P(Red) \times P(Small)$? What does this tell us about whether colour and size are independent?

#### Bayes' theorem

Bayes' theorem is the theoretical basis for the Bayesian framework. It answers the question: what is $P(A\ |\ B)$ in terms of $P(B\ |\ A)$?

From conditional probability:

$$P(A\cap B) = P(A\ |\ B) \cdot P(B)$$

and similarly:

$$P(A\cap B) = P(B\ |\ A) \cdot P(A)$$

Thus,

$$P(A\ |\ B) \cdot P(B) = P(B\ |\ A) \cdot P(A)$$

and this can be rearranged as:

$$P(A\ |\ B) = \frac{P(B\ |\ A) \cdot P(A)}{P(B)}$$

This is known as **Bayes' theorem**, and it is one of the most important equations you will learn on this topic!
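Bayes' theorem translates directly into a one-line function. In this sketch (the name `bayes` is our own choice), it is checked against the T-shirt example, where it should reproduce $P(Red\ |\ Small) = \frac{3}{5}$ from $P(Small\ |\ Red)$.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# T-shirt example: A = Red, B = Small.
p_small_given_red = Fraction(30, 40)   # 30 of the 40 red shirts are small
p_red = Fraction(40, 100)
p_small = Fraction(50, 100)

print(bayes(p_small_given_red, p_red, p_small))   # 3/5
```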

Also, since by the law of total probability (where $A^C$ denotes the complement of $A$, i.e., "not $A$"):

$$P(B) = P(A) \cdot P(B\ |\ A) + P(A^C) \cdot P(B\ |\ A^C)$$

we can write:

$$P(A\ |\ B) = \frac{P(B\ |\ A) \cdot P(A)}{P(B\ |\ A) \cdot P(A) + P(B\ |\ A^C) \cdot P(A^C)}$$

This is known as the **extended form** of Bayes' theorem: the denominator $P(B)$ has been expanded using the law of total probability.

Thus, in the previous example,

$$P(Red\ |\ Small) = \frac{P(Small\ |\ Red) \cdot P(Red)}{P(Small\ |\ Red) \cdot P(Red)+P(Small\ |\ not\ Red) \cdot P(not\ Red)}$$

and since:

$$P(Small\ |\ Red) = \frac{30}{40} = \frac{3}{4}$$

$$P(Small\ |\ not\ Red) = \frac{20}{60} = \frac{1}{3}$$

$$P(Red) = \frac{4}{10} = \frac{2}{5}$$

$$P(not\ Red) = \frac{3}{5}$$

therefore:

$$P(Red\ |\ Small) = \frac{\frac{3}{4} \cdot \frac{2}{5}}{\frac{3}{4} \cdot \frac{2}{5}+\frac{1}{3} \cdot \frac{3}{5}} = \frac{\frac{3}{10}}{\frac{3}{10}+\frac{2}{10}} = \frac{3}{5}$$

This is the same answer as before. It may seem a roundabout way of getting there, but when the only data available are $P(B\ |\ A)$, $P(B\ |\ A^C)$ and $P(A)$, this is the form you need.
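The extended form can be sketched as a function that takes only $P(B\ |\ A)$, $P(B\ |\ A^C)$ and $P(A)$ and builds $P(B)$ itself. The function name `bayes_extended` is illustrative; the inputs are the T-shirt values above.

```python
from fractions import Fraction


def bayes_extended(p_b_given_a, p_b_given_not_a, p_a):
    """Bayes' theorem with P(B) expanded by the law of total probability."""
    p_not_a = 1 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a   # total probability of B
    return p_b_given_a * p_a / p_b


# T-shirt example: only the two conditional probabilities and the prior are needed.
p_red_given_small = bayes_extended(
    p_b_given_a=Fraction(3, 4),      # P(Small | Red)
    p_b_given_not_a=Fraction(1, 3),  # P(Small | not Red)
    p_a=Fraction(2, 5),              # P(Red)
)
print(p_red_given_small)   # 3/5
```

Because `1 - p_a` works for both `float` and `Fraction` inputs, the same function also accepts plain decimal probabilities such as those in the exercises below.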

#### Example: The cookie problem

Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies.

Bowl 2 contains 20 vanilla cookies and 20 chocolate cookies.

Suppose you choose one of the bowls at random and select a cookie at random. The cookie is vanilla. What is the probability that it came from Bowl 1?

We want to find $P(Bowl\ 1\ |\ Vanilla)$, but how to compute it is not obvious.

Using Bayes' theorem:

$$P(Bowl\ 1\ |\ Vanilla) = \frac{P(Vanilla\ |\ Bowl\ 1) \cdot P(Bowl\ 1)}{P(Vanilla)}$$

$P(Bowl\ 1) = \frac{1}{2}$ is the probability that we choose Bowl 1 (assuming each bowl is equally likely to be chosen).

$P(Vanilla\ |\ Bowl\ 1)$ is the probability of getting a vanilla cookie from Bowl 1 ($\frac{30}{40}$ or $\frac{3}{4}$)

$P(Vanilla)$ is the probability of drawing a vanilla cookie from either bowl: $(\frac{1}{2})(\frac{30}{40}) + (\frac{1}{2})(\frac{20}{40}) = \frac{5}{8}$.

$P(Bowl\ 1\ |\ Vanilla) = \frac{(\frac{3}{4})(\frac{1}{2})}{(\frac{5}{8})} = \frac{3}{5} = 0.6$

This may seem obvious from the nature of the question (there are 30 vanilla cookies in Bowl 1 and 50 vanilla cookies overall, so it seems clear that $P(Bowl\ 1\ |\ Vanilla) = 0.6$). However, that shortcut works only because both bowls hold the same number of cookies and are equally likely to be chosen. Many problems of this type are not so clear-cut, and the Bayesian method generalizes to those more complex situations!
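One way to see how the method generalizes is to write it for any number of hypotheses: multiply each prior by the likelihood of the observed data, then divide by the sum of those products, which plays the role of $P(Vanilla)$. A minimal sketch; the function name `update` and the dictionary layout are our own choices.

```python
from fractions import Fraction


def update(priors, likelihoods):
    """Multiply each prior by its likelihood, then normalise so the results sum to 1."""
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())   # this is P(data), here P(Vanilla)
    return {h: p / total for h, p in unnormalised.items()}


# Cookie problem: each bowl is equally likely to be chosen.
priors = {"Bowl 1": Fraction(1, 2), "Bowl 2": Fraction(1, 2)}
# Probability of drawing a vanilla cookie from each bowl.
likelihoods = {"Bowl 1": Fraction(30, 40), "Bowl 2": Fraction(20, 40)}

print(update(priors, likelihoods))
```

Bowl 1 comes out at $\frac{3}{5}$ and Bowl 2 at $\frac{2}{5}$. Adding a third bowl, or changing the priors so that one bowl is favoured, only means editing the two dictionaries.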

#### EXERCISE: Genetics and medicine

**A: Genetic testing**

Say that 1% of people have a certain genetic defect. Genetic testing is available for this defect, and the test detects the defect in 90% of people who have it.

However, the test also gives a positive result for 9.6% of people who do not have the defect (false positives).

If a person gets a positive test result, **what is the probability that they actually have the genetic defect, and what are the corresponding odds?**

```python
# your work here
```

How would your answer to the above change if 5% of people had the defect?

```python
# your work here
```

**B: A test for cancer**

Given the following statistics, what is the probability that a woman over 50 has cancer if she has a positive mammogram result?

* One percent of women over 50 have breast cancer.
* Ninety percent of women who have breast cancer test positive on mammograms.
* Eight percent of women who do not have breast cancer also test positive (false positives).

```python
# your work here
```

#### EXERCISE: Solution

**A: Genetic testing**

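```python
# Prior: 1% of people have the genetic defect.
P_Gene = 0.01
P_NOT_Gene = 1 - P_Gene

# Likelihoods: the test detects 90% of true defects and gives 9.6% false positives.
P_Pos_given_Gene = 0.9
P_Pos_given_NOT_Gene = 0.096

# Bayes' theorem, with P(Pos) expanded over "defect" and "no defect".
P_Gene_given_Pos = (P_Pos_given_Gene * P_Gene) / (
    P_Pos_given_Gene * P_Gene + P_Pos_given_NOT_Gene * P_NOT_Gene
)
```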

```python
P_Gene_given_Pos
```

```
0.0865051903114187
```

(This is much smaller than we might expect for a test that supposedly detects a defective gene 90% of the time! It's important to realise how our intuitions can lead us astray in problems like these.)

```python
# The odds of actually having the defect are P / (1 - P); here they are very small.
odd_to_have_Gene_defect = P_Gene_given_Pos / (1 - P_Gene_given_Pos)

odd_to_have_Gene_defect
```

```
0.09469696969696972
```

If 5% of people had the defect:

```python
# Repeat the calculation with a prevalence of 5%.
P_Gene = 0.05

# This also changes the probability of not having the defect.
P_NOT_Gene = 1 - P_Gene

P_Gene_given_Pos = (P_Pos_given_Gene * P_Gene) / (
    P_Pos_given_Gene * P_Gene + P_Pos_given_NOT_Gene * P_NOT_Gene
)
```

```python
P_Gene_given_Pos
```

```
0.3303964757709251
```

```python
# When the defect is more common (5% instead of 1%), false positives make up a smaller
# share of all positive results, so a positive test implies a much higher probability
# (and higher odds) of actually having the defect.
odd_to_have_Gene_defect_5 = P_Gene_given_Pos / (1 - P_Gene_given_Pos)
odd_to_have_Gene_defect_5
```

```
0.493421052631579
```
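To see the trend more generally, the calculation can be wrapped in a function of the prevalence and evaluated over a range of values. The function name and the extra prevalence values (10%, 25%, 50%) are our own additions for illustration; the 90% detection rate and 9.6% false positive rate are the exercise's.

```python
def prob_defect_given_positive(prevalence, sensitivity=0.9, false_positive_rate=0.096):
    """P(defect | positive test) via Bayes' theorem with P(positive) expanded."""
    true_pos = sensitivity * prevalence                   # P(Pos and defect)
    false_pos = false_positive_rate * (1 - prevalence)    # P(Pos and no defect)
    return true_pos / (true_pos + false_pos)


for prevalence in [0.01, 0.05, 0.10, 0.25, 0.50]:
    p = prob_defect_given_positive(prevalence)
    odds = p / (1 - p)
    print(f"prevalence {prevalence:.0%}: P(defect | positive) = {p:.3f}, odds = {odds:.3f}")
```

The first two rows reproduce the answers above; the rest show that the same test becomes far more informative as the defect becomes more common.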

**B: A test for cancer**

Given the following statistics, what is the probability that a woman over 50 has cancer if she has a positive mammogram result?

* One percent of women over 50 have breast cancer.
* Ninety percent of women who have breast cancer test positive on mammograms.
* Eight percent of women who do not have breast cancer also test positive (false positives).

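```python
# Prior and likelihoods from the mammogram statistics.
P_Cancer = 0.01
P_NOT_Cancer = 1 - P_Cancer
P_Pos_given_Cancer = 0.9
P_Pos_given_NOT_Cancer = 0.08

# Bayes' theorem, with P(Pos) expanded over "cancer" and "no cancer".
P_Cancer_given_Pos = (P_Pos_given_Cancer * P_Cancer) / (
    P_Pos_given_Cancer * P_Cancer + P_Pos_given_NOT_Cancer * P_NOT_Cancer
)
```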

```python
P_Cancer_given_Pos
```

```
0.10204081632653063
```

```python
# The odds of actually having cancer given a positive test:
odd_cancer_positive_test = P_Cancer_given_Pos / (1 - P_Cancer_given_Pos)

odd_cancer_positive_test
```

```
0.11363636363636366
```
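The same odds can also be reached without computing the probability first: multiply the prior odds of cancer by the likelihood ratio $P(Pos\ |\ Cancer) / P(Pos\ |\ not\ Cancer)$. This follows from Bayes' theorem, because $P(Pos)$ cancels when the posterior for "cancer" is divided by the posterior for "no cancer". A minimal sketch, using lowercase names of our own choosing so it does not overwrite the variables above:

```python
# Mammogram numbers from the exercise.
p_cancer = 0.01
p_pos_given_cancer = 0.9
p_pos_given_not_cancer = 0.08

prior_odds = p_cancer / (1 - p_cancer)                          # 1 : 99
likelihood_ratio = p_pos_given_cancer / p_pos_given_not_cancer  # 11.25
posterior_odds = prior_odds * likelihood_ratio

# Convert odds back to a probability: p = odds / (1 + odds).
posterior_prob = posterior_odds / (1 + posterior_odds)

print(round(posterior_odds, 4), round(posterior_prob, 4))
```

Both numbers match the results above, up to rounding.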

# References

xkcd comics. (n.d.). *Frequentists vs. Bayesians*. Retrieved Dec 20, 2018, from https://xkcd.com/1132/. Creative Commons Attribution-NonCommercial 2.5 License.



**End of Part 1**

This notebook makes up one part of this module. Now that you have completed this part, please proceed to the next notebook in this module.

If you have any questions, please reach out to your peers using the discussion boards. If you and your peers are unable to come to a suitable conclusion, do not hesitate to reach out to your instructor on the designated discussion board.