# Bayes' Theorem
> An introduction to one of the most important and counter-intuitive concepts in probability theory.

- toc: true 
- badges: false
- comments: true
- categories: [probability-theory]
- image: images/1.jpg

## 1. Pre-requisites

- You need to have knowledge of basic probability theory. If you are comfortable calculating probabilities of discrete events and with the sum rule and product rule, then you're good to go (if you're not, don't worry: I've tried to give a terse explanation using an example below). Try out this question.

**Que** - Bag A has 5 red and 3 blue balls, and Bag B has 6 red and 4 blue balls. A person chooses Bag A with probability 0.3 and Bag B with probability 0.7, then draws one ball from the chosen bag. What is the probability that he chooses Bag B and draws a blue ball from it? What is the total probability of him coming out with a red ball?


**Ans** - 
 - *Probability of choosing a blue ball from B = (Probability of choosing B).(Probability of him picking a blue ball **given that** he selected B)*. If your answer is 0.28, then you're comfortable enough with the product rule!!

 - *Probability of him coming out with a red ball = (Prob. of choosing A and a red ball from it) + (Prob. of choosing B and a red ball from it).* (Notice that you can calculate the terms inside the brackets using the formula in the first bullet point.) If your answer comes out to be 0.6075, then you're comfortable enough with the sum rule to go through this blog post!!

*Intuitively, the sum rule comes into play when there is a choice between mutually exclusive events (events that cannot both happen; they are generally separated by an **or**), e.g. the event of either choosing A and then a red ball **or** choosing B and then a red ball (as in the second answer above). The product rule comes into play when two events (possibly dependent on each other) occur together or one after the other, e.g. choosing bag B **and** then selecting a blue ball from it (as in the first answer above).*  
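
If you'd like to check the two answers numerically, here is a minimal Python sketch. It uses `fractions.Fraction` so the arithmetic stays exact; the names `p_bag`, `balls` and `p_colour_given_bag` are illustrative choices, not part of the question.

```python
from fractions import Fraction

# Probability of picking each bag
p_bag = {"A": Fraction(3, 10), "B": Fraction(7, 10)}
# Ball counts per bag: (red, blue)
balls = {"A": (5, 3), "B": (6, 4)}

def p_colour_given_bag(colour, bag):
    """Probability of drawing `colour` ("red" or "blue") once `bag` is chosen."""
    red, blue = balls[bag]
    count = red if colour == "red" else blue
    return Fraction(count, red + blue)

# Product rule: P(B and blue) = P(B) * P(blue | B)
p_B_and_blue = p_bag["B"] * p_colour_given_bag("blue", "B")

# Sum rule over the two mutually exclusive ways of ending up with a red ball
p_red = sum(p_bag[bag] * p_colour_given_bag("red", bag) for bag in p_bag)

print(float(p_B_and_blue))  # 0.28
print(float(p_red))         # 0.6075
```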

## 2. Notations
- **P(X)** = Probability of event X, e.g. in the question above P(A) = 0.3, where A = event of the person selecting bag A.
- **P(X|Y)** = Probability of X given that Y has occurred, e.g. in the question above the probability of him picking a blue ball given that he selected bag B is P(blue|B) = 0.4.

## 3. Formula and basic jargon

Let me spit out the formula for Bayes' theorem quickly. Subsequently, I'll explain every term of the formula in detail using an example and introduce some basic jargon along the way.
![](https://wikimedia.org/api/rest_v1/media/math/render/svg/87c061fe1c7430a5201eef3fa50f9d00eac78810)

Let's try to understand this formula using a familiar question involving bag A and bag B.

All the details of the question are the same as before. But now we are told that after everything has occurred, the person was found holding a blue ball. What is the probability that he drew that ball from Bag A? (Notice the sequence of events. It's easy to calculate the probability of drawing a blue ball once the person has selected Bag A, because these two events have a cause-and-effect relationship: choosing Bag A 'caused' the effect of drawing a blue ball. But in the problem stated above, we're already given the 'effect' and are asked about the 'cause' behind it.)  

Let us define some symbols first:

- P(A) = probability of selecting bag A, given as 0.3 in the question.
- P(B) = probability of selecting bag B, given as 0.7 in the question.
- P(b) = probability of selecting a blue ball.
- P(r) = probability of selecting a red ball.

We have to compute the probability of having selected bag A **given that** a blue ball was found in the person's hand, i.e. P(A|b).
Expanding this term using the formula gives us:

$$P(A \mid b) = \frac{P(b \mid A)\,P(A)}{P(b)} \tag{3.1}$$

The sentence "The person was found holding a blue ball" is called the **Evidence**. In the expansion above, the **Evidence** is written mathematically as P(b), which is the denominator of Eq. 3.1.
To solve the question, we begin by listing all the ways through which the **Evidence** could have occurred. There are two ways in which a blue ball could have landed in the person's hand:

1. He selected A **and** then picked up a blue ball **or** 2. He selected B **and** then picked up a blue ball.

Together, these ways make up P(b) in Eq. 3.1. Written mathematically (the sum rule across the two ways, the product rule within each way):

$$P(b) = P(A)\,P(b \mid A) + P(B)\,P(b \mid B) \tag{3.2}$$
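
A short Python sketch of Eq. 3.2, assuming the bag contents from the question (so P(b|A) = 3/8 and P(b|B) = 4/10); the dictionary names are illustrative.

```python
from fractions import Fraction

# Priors: probability of choosing each bag
prior = {"A": Fraction(3, 10), "B": Fraction(7, 10)}
# Likelihoods: probability of a blue ball given each bag (3 of 8 in A, 4 of 10 in B)
p_blue_given = {"A": Fraction(3, 8), "B": Fraction(4, 10)}

# Eq. 3.2: add up every way the evidence (a blue ball) can occur
p_blue = sum(prior[bag] * p_blue_given[bag] for bag in prior)
print(p_blue, float(p_blue))  # 157/400 0.3925
```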

With the help of this new parlance, let's convert our question into a hypothesis. Given the **Evidence** that a blue ball was found in the person's hand, let's say we *hypothesise* that the ball came from Bag A. Now we go ahead and test how well our hypothesis holds up.

Now, I want you to make an intuitive guess about the answer to our question (or the correctness of our hypothesis). What do you think is the probability that Bag A was selected by the person carrying the blue ball? One could argue that it is 0.3, or P(A), because if we asked a new person to select a bag, he would choose A with probability 0.3. So the previous person (the one carrying the blue ball) would also have selected A with probability 0.3. This is why P(A) is called the **Prior**: it reflects our *prior belief* about the selection of A *before seeing the* **Evidence**.
Now, what is the probability that he draws a blue ball given that he selected bag A? In other words, what is the probability of seeing the **Evidence** given that our **Hypothesis** is true? Mathematically, this is P(b|A), and it is called the **Likelihood**.

The **Likelihood** and the **Prior** make up the numerator in the formula of Bayes' theorem. Thus our numerator is

$$P(b \mid A)\,P(A) \tag{3.3}$$

Intuitively, we can think of it as follows:
Out of all the cases that make up our evidence (P(b)), we are only interested in the ones in which our hypothesis holds true. Thus only case 1 from the list above interests us, and we put that in our numerator. If you remember, this is basic probability: we calculate a probability as **cases that interest us / total number of cases**. For example, the probability of drawing a red card from a standard deck of cards is 26/52, or 0.5.

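Thus, finally, after putting all the pieces together we can calculate

$$P(A \mid b) = \frac{P(b \mid A)\,P(A)}{P(b)} = \frac{P(A)\,P(b \mid A)}{P(A)\,P(b \mid A) + P(B)\,P(b \mid B)} = \frac{0.3 \times 0.375}{0.3 \times 0.375 + 0.7 \times 0.4} \approx 0.2866$$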
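
Putting Eqs. 3.1 to 3.3 together, a Python sketch with exact fractions confirms the 0.2866 figure (the names are illustrative). It also computes the posterior for bag B: the two posteriors must sum to 1, because given a blue ball the bag was either A or B.

```python
from fractions import Fraction

prior = {"A": Fraction(3, 10), "B": Fraction(7, 10)}
p_blue_given = {"A": Fraction(3, 8), "B": Fraction(4, 10)}

# Denominator (Eq. 3.2): the evidence P(b)
p_blue = sum(prior[bag] * p_blue_given[bag] for bag in prior)

# Numerator (Eq. 3.3) divided by the evidence gives the posterior (Eq. 3.1)
p_A_given_blue = p_blue_given["A"] * prior["A"] / p_blue
p_B_given_blue = p_blue_given["B"] * prior["B"] / p_blue

print(round(float(p_A_given_blue), 4))  # 0.2866
print(round(float(p_B_given_blue), 4))  # 0.7134
```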


The calculated probability **P(A|b)** is called the **Posterior**. It is an updated version of our **Prior** based on new evidence. Now that we know the person was holding a blue ball, we believe that he chose bag A with probability ≈ 0.29 (**posterior**) instead of 0.3 (**prior**). Notice that the probability goes down, and intuitively this makes sense: Bag B has a larger proportion of blue balls than A (4/10 versus 3/8), so finding a blue ball in the person's hand shifts our belief towards B. This is the main motive of Bayes' theorem: it helps us continuously update our **Prior** beliefs as we collect new **Evidence**. 


#### Quick summary and technique to solve problems involving Bayes' Theorem:

- Find out what is **given** in the problem. The given part serves as the **Evidence**, which helps us assess our **Hypothesis**.

- List all the ways in which the **Evidence** could have occurred, calculate the probability of each way using the product rule, add them up using the sum rule, and write the total as the denominator.

- Pick out the way in which your **Hypothesis** holds true and put its probability in the numerator.
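
The three steps can be sketched as a small general-purpose Python function. `bayes_posterior` is a hypothetical helper written for illustration, not a library API, and it assumes the listed hypotheses are mutually exclusive and together cover every way the evidence can occur.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(hypothesis | evidence) for every hypothesis.

    priors:      dict mapping hypothesis -> P(hypothesis)
    likelihoods: dict mapping hypothesis -> P(evidence | hypothesis)
    Assumes the hypotheses are mutually exclusive and exhaustive.
    """
    # Step 2: probability of each way the evidence can occur (product rule)
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # ... and their total (sum rule), which is the denominator
    evidence = sum(joint.values())
    # Step 3: the way matching each hypothesis goes in the numerator
    return {h: joint[h] / evidence for h in joint}

# Bag example: the evidence is "a blue ball was drawn"
post = bayes_posterior({"A": 0.3, "B": 0.7}, {"A": 3 / 8, "B": 4 / 10})
print(round(post["A"], 4))  # 0.2866
```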

## 4. Interesting Study Demonstrating The Counter-Intuitiveness Of Bayes' Theorem
*(This part of the blog is inspired by a great video by [3Blue1Brown](https://youtu.be/HZGCoVF3YvM)).*


Let me ask you an interesting question.
**"Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail."** Having read this description, what do you think Steve's profession is: *librarian* or *farmer*?

(*This question was asked by Nobel laureate [Daniel Kahneman](https://en.wikipedia.org/wiki/Daniel_Kahneman) and [Amos Tversky](https://en.wikipedia.org/wiki/Amos_Tversky) in studies showing that humans are intuitively bad statisticians (even those who had PhDs in statistics): they tend to go with a vivid description and neglect the relevant base rates. Daniel Kahneman has written about these studies in his book "Thinking, Fast and Slow".*)

Most people would guess that Steve is a librarian because he fits the stereotypical image of one. Let's look at this problem from a Bayesian perspective. Let's say that the description in bold above is our **Evidence**, and we **Hypothesise** that *Steve is a librarian*. Let's calculate how probable our hypothesis is.

Steve is a random person taken from a representative sample. Let:

- P(E) = the probability of observing the above traits in a random person.
- P(F) = the probability of a random person being a farmer.
- P(L) = the probability of a random person being a librarian.

We have to consider the following questions to calculate the probability of our hypothesis given the evidence:
1. Out of 100 librarians, how many do you think fit the description in bold above? We are allowed to use our stereotypes in estimating the answer to this question. Let's say 85 librarians fit the evidence. Mathematically speaking, **given that** a person is a librarian, the probability of him fitting the evidence (he is shy and a "meek and tidy soul") is P(E|L) = 0.85.

1. Out of 100 farmers, how many do you think fit the description in bold above? Let's say 30 farmers fit the evidence (because we stereotypically think that farmers are less likely to be shy or a "meek and tidy soul"). Mathematically speaking, **given that** a person is a farmer, the probability of him fitting the evidence is P(E|F) = 0.3.

We also need to take into account some statistical facts to set our prior beliefs. At the time of the study, there were 20 farmers for every 1 librarian in America. So out of 210 people, 10 are librarians and 200 are farmers. Therefore, the probability of a random person being a farmer is P(F) = 200/210 ≈ 0.95, and the probability of a random person being a librarian is P(L) = 10/210 ≈ 0.05 (assuming our representative sample contains only farmers and librarians).


#### Listing all the ways in which the evidence can occur:

1. The person selected at random is a librarian **and** he is a "meek and tidy soul" **or** 2. The person selected at random is a farmer **and** he is a "meek and tidy soul".

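Writing this mathematically:

$$P(E) = P(L)\,P(E \mid L) + P(F)\,P(E \mid F)$$

The case which interests us is case 1. Thus,

$$P(L \mid E) = \frac{P(L)\,P(E \mid L)}{P(L)\,P(E \mid L) + P(F)\,P(E \mid F)} = \frac{0.05 \times 0.85}{0.05 \times 0.85 + 0.95 \times 0.3} \approx 0.13$$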
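
A quick Python check of this calculation, using the stereotype-based likelihoods and the rounded priors (0.05 and 0.95) from above; the variable names are illustrative.

```python
# Priors: 20 farmers per librarian, rounded as in the text
p_L, p_F = 0.05, 0.95
# Likelihoods: the stereotype-based guesses from the two questions above
p_E_given_L, p_E_given_F = 0.85, 0.30

# Denominator: every way a random person can fit the description
p_E = p_L * p_E_given_L + p_F * p_E_given_F
# Posterior: the librarian case divided by all cases
p_L_given_E = p_L * p_E_given_L / p_E
print(round(p_L_given_E, 2))  # 0.13
```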

**After doing the above calculation we find that the probability of Steve being a librarian is a mere 13%. In other words, only about 13 out of 100 "meek and tidy souls" are librarians. This seems surprising and counter-intuitive because we incorporated our stereotypes into the calculation (by saying that 85 out of 100 librarians fit the evidence), yet the final result says that our hypothesis (which complied with our stereotypes) is probably wrong.
An intuitive way of thinking about this is as follows:
There are way more farmers than librarians in the general population, so there are way more "meek and tidy souls" ploughing the fields (87 out of 100 as per our calculation) than meticulously keeping the books in the library. Take a sample of 210 people, of whom 10 are librarians and 200 are farmers. According to our stereotypical estimates, 85% of the 10 librarians, or ~9 librarians, are shy, while only 30% of the 200 farmers, or 60 farmers, are shy. Hence, out of 210 people, about 69 are shy and tidy souls, the majority of whom are farmers. So if we randomly found a guy named Steve and he turned out to be shy, he's probably a farmer.**
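
The same reasoning with counts instead of probabilities, as a Python sketch. With the exact 10/210 split the librarian share comes out at about 12.4% (17/137); the 13% figure comes from rounding the priors to 0.05 and 0.95. Either way, roughly 1 shy person in 8 is a librarian. The fractional "8.5 librarians" is an expected count, not a real head count.

```python
from fractions import Fraction

librarians, farmers = 10, 200  # 20 farmers for every librarian

# Expected number of people in each group who fit the description
shy_librarians = librarians * Fraction(85, 100)  # expected 8.5
shy_farmers = farmers * Fraction(30, 100)        # expected 60

share_librarians = shy_librarians / (shy_librarians + shy_farmers)
print(round(float(share_librarians), 3))  # 0.124
```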

## 5. Bayes' Theorem As A Way Of Updating Our Priors And Belief Systems
*(This part of the blog is inspired by this great video by [Veritasium](https://youtu.be/R13BD8qKeTg))*

Suppose you go to a doctor and he tells you that, unfortunately, your test for a disease came back positive. It is known that 0.1% of the population has the disease, and the test you took gives the correct result 99% of the time. You may be disheartened, because such an accurate test has declared you sick with a rare disease. Intuitively, you would think that there is a 99% chance that you have this disease. But let's look at this from a Bayesian perspective.


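- **Evidence** -> The test shows positive. Since the test is correct 99% of the time, the probability of a positive result if you have the disease is P(E|D) = 0.99, and the probability of a (false) positive result if you don't have it is 0.01.
- **Hypothesis** -> You have the disease (D), **given** the evidence.
- **Prior belief before seeing the evidence** -> Probability of your having the disease, P(D) = 0.001 (because 0.1% of the population has it and you're part of the population).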


#### Ways In Which The Evidence Can Be Observed (Test Results Can Come Out Positive):

1. You have the disease **and** the test comes out positive, **or** 2. You don't have the disease **and** the test shows positive (incorrectly).

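Mathematically:

$$P(E) = P(D)\,P(E \mid D) + P(\neg D)\,P(E \mid \neg D)$$

where P(¬D) = 1 - P(D) is the probability of not having the disease.

We are interested in case 1. Thus, the probability of you having the disease **given** a positive test result is

$$P(D \mid E) = \frac{P(D)\,P(E \mid D)}{P(D)\,P(E \mid D) + P(\neg D)\,P(E \mid \neg D)} = \frac{0.001 \times 0.99}{0.001 \times 0.99 + 0.999 \times 0.01} \approx 0.09$$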

**After the calculation, the probability of you having the disease comes out to be a mere 9%, which again seems counter-intuitive. Even after being declared positive by a pretty accurate test, you are probably healthy and the test result is probably false!**
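
A Python sketch of the calculation above, assuming, as the text does, that the test is 99% accurate for sick and healthy people alike (so the false-positive rate is 1%). The variable names are illustrative.

```python
p_D = 0.001               # prior: 0.1% of the population has the disease
p_pos_given_D = 0.99      # the test is correct for 99% of sick people
p_pos_given_notD = 0.01   # ...and gives a false positive for 1% of healthy people

# Denominator: both ways a positive result can occur
p_pos = p_D * p_pos_given_D + (1 - p_D) * p_pos_given_notD
# Posterior: probability of having the disease given one positive test
p_D_given_pos = p_D * p_pos_given_D / p_pos
print(round(p_pos, 5))          # 0.01098
print(round(p_D_given_pos, 3))  # 0.09
```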

This counter-intuitiveness stems from the fact that the probability of our **hypothesis given the evidence** depends heavily on our **prior**, i.e. the probability of our hypothesis being correct without the evidence (P(D) in the calculation above). In this particular example, the probability of having the disease before seeing the test result was so low (0.001) that even strong new evidence couldn't tip the balance in favour of our hypothesis that we have the disease.

Think of just 1000 people, including you. According to the given data, 1 of these 1000 has the disease. Let's say that he takes the test and is correctly identified as positive. The other 999 also take the test, which falsely flags 1% of these 999 healthy people, i.e. about 10 healthy people test positive. So now there are about 11 people in the entire population with positive test results, and you are one of them. Out of these 11 positive results only 1 is correct (1/11 ≈ 9%). That's why a positive result is not as bad as you might think!
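
To see the "1000 people" argument at a larger scale, here is a Monte Carlo sketch in Python using the standard `random` module. `simulate` is a hypothetical helper; it assumes each person's disease status and test error are independent random draws with the rates from the text. The estimate fluctuates with the random draws but should land near 0.09.

```python
import random

def simulate(n_people, prevalence=0.001, accuracy=0.99, seed=0):
    """Estimate P(disease | positive test) by simulating a population.

    Assumes the test is correct with probability `accuracy` for sick and
    healthy people alike, as in the text.
    """
    rng = random.Random(seed)
    positives = sick_positives = 0
    for _ in range(n_people):
        sick = rng.random() < prevalence
        correct = rng.random() < accuracy
        positive = sick if correct else not sick  # a wrong result flips the truth
        if positive:
            positives += 1
            sick_positives += sick  # True counts as 1
    return sick_positives / positives

print(simulate(300_000))  # close to 0.09; exact digits depend on the random draws
```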


### But What If You Take A Second Test And It Also Comes Back Positive?

Suppose, just to be sure, you get tested at a different lab and that result also comes back positive (assuming that this lab also gives correct results 99% of the time, and that its errors are independent of the first lab's). Now, what is the probability that you have the disease? Everything in the data remains the same except the **prior**. The basic definition of the prior is **"the probability that your hypothesis is true without the evidence"**. So in this case the **prior** is the probability of you having the disease before seeing the result of the second test. Therefore, the prior for the second test should be 9%, or 0.09 (the **Posterior** from the first test). Even though the first result was likely to be false, it served us by updating our **prior** from 0.001 to 0.09 by providing strong evidence.


The probability of having the disease **given** that the second test result is also positive is

$$\frac{0.99 \times 0.09}{0.99 \times 0.09 + 0.01 \times 0.91} \approx 0.91,$$

i.e. about 91%.
**So now there is a 91% chance that you are sick, and intuitively this makes sense, because the chance of two such accurate tests both giving false results is pretty low.**
**You started with the hypothesis that you are sick, with a probability of 0.1%. Then you collected evidence by going through a test, and that evidence updated your belief in the hypothesis to 9%. Subsequently you collected more evidence by going through another test, which further updated your belief in the hypothesis to 91%.**
**This case shows how Bayes' theorem serves us by updating our priors with the help of new evidence. The posteriors serve as priors the next time any evidence is collected. Iterating this process, regularly collecting new evidence and updating our priors, is how we scientifically solidify or falsify our hypotheses.**
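
The updating loop described above can be sketched in Python. `update` is a hypothetical helper that applies Bayes' theorem once per positive test; feeding each posterior back in as the next prior assumes the two tests' errors are independent given your true condition. Using the unrounded first posterior (≈ 0.0902) gives ≈ 0.9075 after the second test, which agrees with the 91% above.

```python
def update(prior, p_pos_given_D=0.99, p_pos_given_notD=0.01):
    """One Bayesian update after a positive test: returns P(D | positive)."""
    evidence = prior * p_pos_given_D + (1 - prior) * p_pos_given_notD
    return prior * p_pos_given_D / evidence

belief = 0.001  # prior before any test
for test in (1, 2):
    belief = update(belief)  # the posterior becomes the prior for the next test
    print(f"after positive test {test}: {belief:.4f}")
# after positive test 1: 0.0902
# after positive test 2: 0.9075
```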
