# Bayes' Theorem

## Independence and Exclusivity

There is an important distinction between independence and exclusivity. We learned in the previous lesson that two events A and B are independent if the occurrence of one doesn't change the probability of the other. In mathematical terms, we've seen A and B are independent if any of the conditions below are true:

\begin{equation}
P(A) = P(A|B)
\end{equation}

\begin{equation}
P(B) = P(B|A)
\end{equation}

\begin{equation}
P(A \cap B) = P(A) \cdot P(B) 
\end{equation}

We also learned that two events A and B are mutually exclusive if they cannot both occur at the same time. If one event happens, the other cannot happen, and vice versa. Examples of mutually exclusive events include:

+ Getting a 5 (event A) and getting a 3 (event B) when we roll a regular six-sided die — it's impossible to get both a 5 and a 3.
+ A coin lands on heads (event A) and tails (event B) — it's impossible to flip a coin and see it landing on both heads and tails.




If two events A and B are mutually exclusive, then it's impossible that they both occur, which means (A ∩ B) is an impossible event (and the probability of impossible events is always 0):

\begin{equation}
P(A \cap B) = 0 
\end{equation}

Both independence and exclusivity describe the relationship between two or more events, and we see that they have different mathematical meanings:

\begin{aligned}
\text{Independence} &\implies P(A \cap B) = P(A) \cdot P(B) \\
\text{Dependence} &\implies P(A \cap B) = P(A) \cdot P(B|A) \\
\text{Exclusivity} &\implies P(A \cap B) = 0 
\end{aligned}

Let's take a quick look at a few examples. Say we roll a fair six-sided die twice and consider these four events:

+ Event A: We get a 4 on the first roll.
+ Event B: We get a 2 on the second roll.
+ Event C: We get an even number on the first roll.
+ Event D: We get a 5 on the first roll.

If event A happens, then the probability of event B stays the same, since the result of the first roll doesn't influence the result of the second one in any way — this means A and B are __independent__. Also, we can get a 4 on the first roll (event A) and a 2 on the second roll (event B), which means A and B are __not mutually exclusive__.

Now let's look at the relationship between events A and C. If C happens, then the probability of A changes, and vice-versa. This means A and C are __dependent__. Also, if the outcome was 4, then we'd get a 4 (event A) and an even number (event C) at the same time, which means A and C are __not mutually exclusive__.

However, if we look at events A and D, we see they cannot possibly happen together: we cannot get both a 4 and a 5 on the first roll. This means events A and D are __mutually exclusive__. Note that mutually exclusive events with nonzero probabilities can never be independent: if D happens, the probability of A drops to 0, so P(A ∩ D) = 0 while P(A) · P(D) = 1/36.
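The three comparisons above can be checked by enumerating the sample space. The sketch below is only an illustration: the helper names `prob`, `independent`, and `mutually_exclusive` are assumptions introduced here, and it relies on all 36 outcomes of two rolls being equally likely. It tests independence with the product condition P(A ∩ B) = P(A) · P(B), so for A and D the check fails because P(A ∩ D) = 0 while P(A) · P(D) = 1/36.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (first roll, second roll) pairs.
sample_space = set(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event (a set of outcomes) when outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


A = {o for o in sample_space if o[0] == 4}      # a 4 on the first roll
B = {o for o in sample_space if o[1] == 2}      # a 2 on the second roll
C = {o for o in sample_space if o[0] % 2 == 0}  # an even number on the first roll
D = {o for o in sample_space if o[0] == 5}      # a 5 on the first roll


def independent(x, y):
    """True if P(x ∩ y) equals P(x) * P(y)."""
    return prob(x & y) == prob(x) * prob(y)


def mutually_exclusive(x, y):
    """True if x and y cannot happen together, that is P(x ∩ y) = 0."""
    return prob(x & y) == 0


print(independent(A, B), mutually_exclusive(A, B))  # True False
print(independent(A, C), mutually_exclusive(A, C))  # False False
print(independent(A, D), mutually_exclusive(A, D))  # False True
```

Using `Fraction` instead of floats keeps the equality checks exact.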

### Task

For the exercises below, consider the following probabilities:

+ The probability of being infected with HIV is 0.00014. That is P(HIV)=0.00014.
+ The probability of being infected with HIV given a positive result from an HIV test is 0.03. That is P(HIV|T+)=0.03.

Assess with True or False the following statements:

+ Events HIV and T+ are independent. If you think this statement is true, then assign the boolean True to statement_1, otherwise assign False.
+ Events HIV and HIV<sup>c</sup> are mutually exclusive. If you think this statement is true, then assign the boolean True to statement_2, otherwise assign False.
+ Events HIV<sup>c</sup> and T+ are dependent. If you think this statement is true, then assign the boolean True to statement_3, otherwise assign False.


In [None]:
statement_1 = False
statement_2 = True
statement_3 = True

Before we move on, recall that in the previous course we learned about the addition rule:

\begin{equation}
P(A \cup B) = P(A) + P(B) - P(A \cap B) 
\end{equation}

If events A and B are mutually exclusive, then P(A∩B)=0. Therefore, the addition rule for mutually exclusive events reduces to:

\begin{aligned}
P(A \cup B) &= P(A) + P(B) - 0 \\
P(A \cup B) &= P(A) + P(B)
\end{aligned}
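As a quick check of both forms of the addition rule, here is a minimal sketch on a single roll of a fair six-sided die. The event names (`five`, `three`, `even`, `greater_than_3`) and the `prob` helper are illustrative choices, not part of the lesson.

```python
from fractions import Fraction

# One roll of a fair six-sided die: six equally likely outcomes.
outcomes = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event (a set of faces) on a single fair roll."""
    return Fraction(len(event & outcomes), len(outcomes))


five = {5}
three = {3}
even = {2, 4, 6}
greater_than_3 = {4, 5, 6}

# Mutually exclusive events: the intersection is empty, so the rule reduces
# to a plain sum.
p_five_or_three = prob(five) + prob(three)

# Events that can overlap: the intersection must be subtracted once.
p_even_or_gt3 = prob(even) + prob(greater_than_3) - prob(even & greater_than_3)

print(p_five_or_three)  # 1/3
print(p_even_or_gt3)    # 2/3
```

Both results match the probabilities obtained by counting the faces in each union directly: {3, 5} has two faces and {2, 4, 5, 6} has four.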

With this in mind, let's consider the probabilities associated with an HIV test:

+ The probability of getting a positive test result given that a patient is not infected with HIV is 1.05%. That is P(T<sup>+</sup> | HIV<sup>c</sup>) = 0.0105.
+ The probability of getting a positive test result given that a patient is infected with HIV is 99.78%. That is P(T<sup>+</sup> | HIV) = 0.9978.
+ The probability of being infected with HIV is 0.14%. That is P(HIV) = 0.0014.
+ The probability of not being infected with HIV is 99.86%. That is P(HIV<sup>c</sup>) = 0.9986.

Now what if we just want to find P(T+), the probability that a person selected at random will get a positive result? There are two possible scenarios when someone gets a positive result:


1. The person is infected with HIV and gets a positive result.
2. The person is not infected with HIV and gets a positive result.

In the first scenario, note that two events happen: HIV and T<sup>+</sup>. In set notation, we write (HIV ∩ T<sup>+</sup>) if both HIV and T<sup>+</sup> occur.

In the second scenario, two events happen: HIV<sup>c</sup> and T<sup>+</sup>. In set notation, we write (HIV<sup>c</sup> ∩ T<sup>+</sup>) if both HIV<sup>c</sup> and T<sup>+</sup> happen.

Since there are only two possible scenarios, we can understand the event T<sup>+</sup> as the union of the events (HIV ∩ T<sup>+</sup>) and (HIV<sup>c</sup> ∩ T<sup>+</sup>):

\begin{equation}
T^+ = (HIV \cap T^+) \cup (HIV^C \cap T^+)
\end{equation}

We can visualize this in a Venn diagram:

![visual](img/cpm3_viz2.png)

The events (HIV ∩ T<sup>+</sup>) and (HIV<sup>c</sup> ∩ T<sup>+</sup>) are mutually exclusive (they cannot both happen at the same time), because a person who tested positive cannot both have and not have HIV. This means that we can calculate the probability of their union using the addition rule for mutually exclusive events that we saw earlier:

\begin{equation}
P(A \cup B) = P(A) + P(B)
\end{equation}

\begin{equation}
P(\overbrace{T^+}^{A \cup B}) = P((\overbrace{HIV \cap T^+}^{A}) \cup (\overbrace{HIV^C \cap T^+}^{B}))
\end{equation}

\begin{equation}
P(T^+) = P(HIV \cap T^+) + P(HIV^C \cap T^+)
\end{equation}

Using the multiplication rule on P(HIV ∩ T<sup>+</sup>) and P(HIV<sup>c</sup> ∩ T<sup>+</sup>), the last equation above becomes:

\begin{equation}
P(T^+) = P(HIV) \cdot P(T^+ | HIV) + P(HIV^C) \cdot P(T^+ | HIV^C)
\end{equation}

All the probabilities we need were listed earlier, which means we can find P(T+):

\begin{aligned}
P(T^+) &= 0.0014 \cdot 0.9978 + 0.9986 \cdot 0.0105 \\
&= 0.0119
\end{aligned}

We see P(T+) — the probability of testing positive — is only 1.19%. This is mostly because the probability of having HIV is very low in the first place.
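The same arithmetic can be reproduced in a few lines of Python. This is only a sketch: the variable names are chosen here for illustration, and the values are the four probabilities listed earlier.

```python
# Probabilities given in the text.
p_hiv = 0.0014                # P(HIV)
p_not_hiv = 1 - p_hiv         # P(HIV^c) = 0.9986
p_pos_given_hiv = 0.9978      # P(T+ | HIV)
p_pos_given_not_hiv = 0.0105  # P(T+ | HIV^c)

# Multiplication rule applied to each of the two scenarios.
p_hiv_and_pos = p_hiv * p_pos_given_hiv
p_not_hiv_and_pos = p_not_hiv * p_pos_given_not_hiv

# Addition rule for mutually exclusive events.
p_pos = p_hiv_and_pos + p_not_hiv_and_pos

print(round(p_hiv_and_pos, 4))      # 0.0014
print(round(p_not_hiv_and_pos, 4))  # 0.0105
print(round(p_pos, 4))              # 0.0119
```

Printing the two terms separately shows that most positive results come from the much larger group of people who are not infected.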

### Task

We can find the word "secret" in many spam emails. However, some emails are not spam even though they contain the word "secret." Let's say we know the following probabilities:

+ The probability of getting a spam email is 23.88%. That is P(Spam)=0.2388.
+ The probability of an email containing the word "secret" given that the email is spam is 48.02%. That is P("secret"|Spam)=0.4802.
+ The probability of an email containing the word "secret" given that the email is not spam is 12.84%. That is P("secret"|Spam<sup>c</sup>)=0.1284.


Calculate:

+ P(Spam<sup>c</sup>). Assign the result to p_non_spam.
+ P(Spam ∩ "secret"). Assign the result to p_spam_and_secret.
+ P(Spam<sup>c</sup> ∩ "secret"). Assign the result to p_non_spam_and_secret.
+ P("secret"). Assign the result to p_secret.


In [None]:
p_spam = 0.2388
p_secret_given_spam = 0.4802
p_secret_given_non_spam = 0.1284
p_non_spam = 1 - p_spam
p_spam_and_secret = p_spam * p_secret_given_spam
p_non_spam_and_secret = p_non_spam * p_secret_given_non_spam
p_secret = p_spam_and_secret + p_non_spam_and_secret

## A General Formula

Now we need to develop a general formula that reflects the way we calculated P(T+) on the previous screen:

\begin{equation}
P(T^+) = P(HIV \cap T^+) + P(HIV^C \cap T^+)
\end{equation}

Imagine that instead of T<sup>+</sup>, HIV, and HIV<sup>c</sup>, we have A, B, and B<sup>c</sup>:

![comp](img/cpm3_viz3.1.png)

With this in mind, we can now develop a general formula for P(A):

\begin{equation}
P(A) = P(B \cap A) + P(B^C \cap A)
\end{equation}

Using the multiplication rule on P(B ∩ A) and P(B<sup>c</sup> ∩ A), the above formula becomes:

\begin{equation}
P(A) = P(B) \cdot P(A|B) + P(B^C) \cdot P(A|B^C)
\end{equation}
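One possible way to wrap this formula in a reusable function is sketched below. The function name `total_probability` and its parameter names are assumptions made for this illustration. It is applied to the HIV test probabilities used earlier.

```python
def total_probability(p_b, p_a_given_b, p_a_given_not_b):
    """Return P(A) = P(B) * P(A|B) + P(B^c) * P(A|B^c)."""
    if not 0 <= p_b <= 1:
        raise ValueError("p_b must be a probability between 0 and 1")
    p_not_b = 1 - p_b  # B and B^c are complements
    return p_b * p_a_given_b + p_not_b * p_a_given_not_b


# A = T+, B = HIV, B^c = HIV^c
p_positive = total_probability(0.0014, 0.9978, 0.0105)
print(round(p_positive, 4))  # 0.0119
```

The function takes only P(B) because P(B<sup>c</sup>) is always 1 - P(B).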

### Task

An airline transports passengers using two types of planes: a Boeing 737 and an Airbus A320.

+ The Boeing operates 73% of the flights. Out of these flights, 3% arrive at the destination with a delay.
+ The Airbus operates the remaining 27% of the flights. Out of these flights, 8% arrive with a delay.

Convert the percentages above to probabilities:

1. Assign the probability of flying with a Boeing to p_boeing (to better understand what this probability means, imagine a passenger having bought a ticket with this airline — what's the probability that this passenger will be assigned to fly to her destination with a Boeing?).
2. Assign the probability of flying with an Airbus to p_airbus.
3. Assign the probability of arriving at the destination with a delay given that the passenger flies with a Boeing to p_delay_given_boeing.
4. Assign the probability of arriving at the destination with a delay given that the passenger flies with an Airbus to p_delay_given_airbus.

Calculate:

The probability that a passenger will arrive at her destination with a delay. Assign your answer to p_delay. 


In [None]:
p_boeing = 0.73
p_airbus = 0.27
p_delay_given_boeing = 3/100
p_delay_given_airbus = 8/100

p_delay = p_boeing * p_delay_given_boeing + p_airbus * p_delay_given_airbus

## Formula for Three Events

In the previous task, we used this formula to calculate the probability of having a delay when flying with a particular airline:

\begin{equation}
P(A) = P(B) \cdot P(A|B) + P(B^C) \cdot P(A|B^C)
\end{equation}

Recall that the airline transports passengers using two types of planes: a Boeing 737 and an Airbus A320. This allowed us to model P(Delay) as:

\begin{equation}
\overbrace{P(Delay)}^{P(A)} = \overbrace{P(Boeing) \cdot P(Delay|Boeing)}^{P(B) \cdot P(A|B)} + \overbrace{P(Airbus) \cdot P(Delay|Airbus)}^{P(B^C) \cdot P(A|B^C)}
\end{equation}

However, let's consider another airline which has three types of planes: a Boeing 737, an Airbus A320, and an ERJ 145.

+ The Boeing operates 58% of the flights. Out of these flights, 4% arrive at the destination with a delay.
+ The Airbus operates 31% of the flights. Out of these flights, 7% arrive with a delay.
+ The ERJ operates the remaining 11% of the flights. Out of these flights, 2% arrive with a delay.

A passenger buying a ticket with this airline will be assigned to only one of the three types of airplanes. This means that the sample space is made up of three events that are all mutually exclusive and exhaustive. On a Venn diagram, we have:

![three](img/cpm3_viz4.png)

Now let's add the event Delay on the above Venn diagram:

![three delay](img/cpm3_viz5.png)


Judging by the diagram, we can see that P(Delay) is:

\begin{equation}
P(Delay) = P(Boeing \cap Delay) + P(Airbus \cap Delay) + P(ERJ \cap Delay)
\end{equation}

Using the multiplication rule, the equation above becomes:

\begin{aligned}
P(Delay) &= P(Boeing) \cdot P(Delay|Boeing) + P(Airbus) \cdot P(Delay|Airbus) + P(ERJ) \cdot P(Delay|ERJ) \\
&= 0.58 \cdot 0.04 + 0.31 \cdot 0.07 + 0.11 \cdot 0.02 = 0.0471 \approx 0.05
\end{aligned}

To develop a more general formula, imagine that instead of the events Delay, Boeing, Airbus, and ERJ, we have events A, B<sub>1</sub>, B<sub>2</sub>, and B<sub>3</sub>:

\begin{equation}
\overbrace{P(A)}^{P(Delay)} = \overbrace{P(B_1)}^{P(Boeing)} \cdot P(A|B_1) + \overbrace{P(B_2)}^{P(Airbus)} \cdot P(A|B_2) + \overbrace{P(B_3)}^{P(ERJ)} \cdot P(A|B_3)
\end{equation}
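The same pattern works for any number of mutually exclusive and exhaustive events. The sketch below is one possible generalization, not a prescribed implementation: the function name `total_probability` and the list-based interface are assumptions made here. It is checked against the three-plane example from this section.

```python
import math


def total_probability(priors, conditionals):
    """Return P(A) as the sum of P(B_i) * P(A | B_i) over the events B_1, ..., B_n.

    priors:       [P(B_1), ..., P(B_n)], mutually exclusive and exhaustive
    conditionals: [P(A | B_1), ..., P(A | B_n)]
    """
    if len(priors) != len(conditionals):
        raise ValueError("need exactly one conditional probability per event B_i")
    if not math.isclose(sum(priors), 1.0):
        raise ValueError("the events B_i must be exhaustive: priors must sum to 1")
    return sum(p_b * p_a_given_b for p_b, p_a_given_b in zip(priors, conditionals))


# Boeing, Airbus, and ERJ shares of flights, and their delay rates.
p_delay_example = total_probability([0.58, 0.31, 0.11], [0.04, 0.07, 0.02])
print(round(p_delay_example, 4))  # 0.0471
```

The check that the priors sum to 1 guards against forgetting one of the events, which would silently underestimate P(A).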

### Task 

An airline transports passengers using three types of planes: a Boeing 737, an Airbus A320, and an ERJ 145.

+ The Boeing operates 62% of the flights. Out of these flights, 6% arrive at the destination with a delay.
+ The Airbus operates 35% of the flights. Out of these flights, 9% arrive with a delay.
+ The ERJ operates the remaining 3% of the flights. Out of these flights, 1% arrive with a delay.

Calculate the probability of delay and assign your result to p_delay.

In [None]:
p_boeing = 0.62
p_airbus = 0.35
p_erj = 0.03
p_delay_boeing = 0.06 
p_delay_airbus = 0.09
p_delay_erj = 0.01

# Weight each plane's delay rate by its share of flights, then add the terms.
p_delay = p_boeing * p_delay_boeing + p_airbus * p_delay_airbus + p_erj * p_delay_erj