We continue learning about conditional probability and its relationship to independence. Before we move to the central topic, a review of mutual exclusivity is useful, because it is common to confuse independence with mutual exclusivity. Events are independent when the occurrence of one event does not affect the probability of the other. Mathematically, two events are independent if the probability of their intersection is the product of their marginal probabilities:

![image.png](attachment:image.png)

Recall that this holds because, for independent events, the conditional probabilities are equal to the marginal probabilities. Mutual exclusivity is also a statement about the probability of an intersection. When two events are mutually exclusive, it is impossible for them to happen at the same time. That is, the probability of their intersection is 0.

![image.png](attachment:image.png)
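To make the contrast concrete, here is a minimal sketch in R that checks both properties for a pair of events. The probabilities are hypothetical values chosen for illustration, not figures from this lesson, and the variable names are our own.

```r
# Hypothetical probabilities, for illustration only.
p_a <- 0.5
p_b <- 0.4
p_a_and_b <- 0.2

# Independent: P(A and B) equals P(A) * P(B), up to floating-point tolerance.
is_independent <- isTRUE(all.equal(p_a_and_b, p_a * p_b))

# Mutually exclusive: P(A and B) equals 0.
is_mutually_exclusive <- isTRUE(all.equal(p_a_and_b, 0))
```

With these values the events are independent but not mutually exclusive. When both events have nonzero probability, they cannot be both at once: independence forces the intersection to have probability $P(A) \times P(B) > 0$.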

**Task**

Consider the following events and probabilities

![image.png](attachment:image.png)


**Answer**

```r
statement_1 <- FALSE
statement_2 <- TRUE
statement_3 <- TRUE
```

We also need to review the addition rule:

![image.png](attachment:image.png)

When we first dealt with this example, it was only in the context of conditional probabilities, such as $P(T|D)$. We are now equipped to move back from conditional to marginal probability. What if we just want to find $P(T)$, the probability that a person selected at random will get a positive result? In the context of this problem, there are two possible scenarios where someone can get a positive result:

1. A person gets a positive result and is infected with HIV.
2. A person gets a positive result, but is not infected with HIV.

When we think about positive test results, our first instinct is often to think that it means the disease is present. Conditional probability reminds us that this isn't the case and that false positives are possible. Ignoring these false positives can warp our understanding of how well a test performs.

Both of the described scenarios are intersections of events: $T \cap D$ and $T \cap D^C$. These two intersections represent the only two scenarios where a person can test positive, and they cannot happen at the same time, so we are able to describe the marginal probability of testing positive as the sum of these two probabilities.

$P(T) = P(T \cap D) + P(T \cap D^C)$

To build out an understanding of this formula, we'll develop the Venn Diagram:

![image.png](attachment:image.png)

First, notice that the entire sample space is made up of either $D$ or $D^C$. When a group of events fully makes up the sample space, we call them **exhaustive**. Assuming no intermediate level of disease, people can be described as either having HIV or not having it. Since someone cannot both have and not have HIV, we know that $D$ and $D^C$ are mutually exclusive. Since $D$ and $D^C$ have these two qualities, exhaustiveness and mutual exclusivity, we may call them a partition of the sample space.

We can now also add the event $T$ to the diagram above, which will show us visually why $T = (D \cap T) \cup (D^C \cap T)$.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

Now that we can represent $P(T)$ in terms of intersection probabilities, we can use the multiplication rule:
 
 ![image.png](attachment:image.png)
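In the notation used so far, applying the multiplication rule to each intersection can be sketched as follows. This is a symbolic restatement of the step described above, not a new result:

$$P(T) = P(T \cap D) + P(T \cap D^C) = P(T|D)\,P(D) + P(T|D^C)\,P(D^C)$$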

We see that $P(T)$ is only 1.19%. This is mostly because having HIV is rare in the first place. Ideally, we would want $P(T)$ to match the prevalence of the disease, $P(D)$; that is, only those with the disease would test positive.

**Task**

We can find the word "secret" in many spam emails. However, some emails are not spam even though they contain the word "secret." Let's define the following events and probabilities:

* **S**: the event that an email is spam
* **X**: the event that an email contains the word "secret"
* The probability of getting a spam email is 23.88%. That is, $P(S) = 0.2388$.
* The probability of an email containing the word "secret" given that the email is spam is 48.02%: $P(X|S) = 0.4802$.
* The probability of an email containing the word "secret" given that the email is not spam is 12.84%: $P(X|S^C) = 0.1284$.

Calculate:

* $P(S^C)$
* $P(S \cap X)$
* $P(S^C \cap X)$
* $P(X)$



**Answer**

```r
p_spam <- 0.2388
p_secret_given_spam <- 0.4802
p_secret_given_non_spam <- 0.1284
p_non_spam <- 1 - p_spam
p_spam_and_secret <- p_spam * p_secret_given_spam
p_non_spam_and_secret <- p_non_spam * p_secret_given_non_spam
p_secret <- p_spam_and_secret + p_non_spam_and_secret
```

Imagine that instead of $T$, $D$, and $D^C$, we use the more general events $A$, $B$, and $B^C$:

![image.png](attachment:image.png)

![image.png](attachment:image.png)

**Task**

An airline transports passengers using two types of planes: a Boeing 737 and an Airbus A320. Assume the following:

* The Boeing operates 73% of the flights. Out of these flights, 3% arrive at the destination with a delay.
* The Airbus operates the remaining 27% of flights. Out of these flights, 8% arrive with a delay.


Use the information above to calculate the following probabilities:

* Assign the probability of flying with a Boeing to `p_boeing` (to better understand what this probability means, imagine a passenger having bought a ticket with this airline — what's the probability that this passenger will be assigned to fly to her destination with a Boeing?).
* Assign the probability of flying with an Airbus to `p_airbus`.
* Assign the probability of arriving at the destination with a delay given that the passenger flies with a Boeing to `p_delay_given_boeing`.
* Assign the probability of arriving at the destination with a delay given that the passenger flies with an Airbus to `p_delay_given_airbus`.
* The probability that a passenger will arrive at her destination with a delay. Assign to `p_delay`.

**Answer**

```r
p_boeing <- 0.73
p_airbus <- 0.27
p_delay_given_boeing <- 0.03
p_delay_given_airbus <- 0.08
p_delay <- p_boeing * p_delay_given_boeing + p_airbus * p_delay_given_airbus
```

We applied the following formula to calculate the probability of having a delay when flying with a particular airline:

![image.png](attachment:image.png)

We'll consider how to extend the formula to incorporate more than just two events to calculate a marginal probability. Let's consider another airline which has three types of planes, instead of two: a Boeing 737, an Airbus A320, and an ERJ 145.

* The Boeing operates 58% of the flights. Out of these flights, 4% arrive at the destination with a delay.
* The Airbus operates 31% of the flights. Out of these flights, 7% arrive with a delay.
* The ERJ operates the remaining 11% of the flights. Out of these flights, 2% arrive with a delay.

![image.png](attachment:image.png)

Just as we did with the positive test result in the HIV example, we'll add the Delay event to the Venn diagram above:

![image.png](attachment:image.png)

The Delay event can also be reimagined as the union of multiple intersections. Since these intersections are mutually exclusive, we can calculate $P(Delay)$ as:
 
 ![image.png](attachment:image.png)

In this example, the event that we are conditioning on is the choice of airplane. To extend the formula to three events for the condition, we needed to make sure that we had all of the events necessary to create a partition of the sample space. We knew that the airline only had three types of planes, so these planes became our partition. Any other event that we consider alongside this partition can be rethought of as the union of mutually exclusive intersections.
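As a minimal sketch, the three-plane calculation can be carried out in R with the figures given above. The variable names follow the pattern of the earlier answers and are our own choice.

```r
# Share of flights operated by each type of plane (the partition).
p_boeing <- 0.58
p_airbus <- 0.31
p_erj <- 0.11

# Probability of a delay given each type of plane.
p_delay_given_boeing <- 0.04
p_delay_given_airbus <- 0.07
p_delay_given_erj <- 0.02

# Sum of the three mutually exclusive intersections:
# 0.0232 + 0.0217 + 0.0022 = 0.0471
p_delay <- p_boeing * p_delay_given_boeing +
  p_airbus * p_delay_given_airbus +
  p_erj * p_delay_given_erj
```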

![image.png](attachment:image.png)

**Task**

An airline transports passengers using three types of planes: a Boeing 737, an Airbus A320, and an ERJ 145.

* The Boeing operates 62% of the flights. Out of these flights, 6% arrive at the destination with a delay.
* The Airbus operates 35% of the flights. Out of these flights, 9% arrive with a delay.
* The ERJ operates the remaining 3% of the flights. Out of these flights, 1% arrive with a delay.

Calculate the probability of delay.

**Answer**

```r
p_boeing <- 0.62
p_airbus <- 0.35
p_erj <- 0.03
p_delay_boeing <- 0.06
p_delay_airbus <- 0.09
p_delay_erj <- 0.01
p_delay <- p_boeing * p_delay_boeing + p_airbus * p_delay_airbus + p_erj * p_delay_erj
```

![image.png](attachment:image.png)

In order for this formula to calculate the correct probability, we needed to make sure that the set of events $\{B_1, B_2, B_3\}$ formed a partition of the sample space. In the HIV example, having the disease or not having it made up all of the different possibilities. In the airline example, the three planes formed all of the different possibilities. Using this same line of reasoning, we could extend the formula to any arbitrary number of events for the condition.

Let's say that we have a sample space $\Omega$ that can be divided up into a partition of $n$ mutually exclusive and exhaustive events. We represent the number of events as the variable $n$ since we don't know it ahead of time.

$\Omega = B_1 \cup B_2 \cup \dots \cup B_n$

Using the same reasoning as we used above, the probability of **A** in this sample space made of **n** events is:

![image.png](attachment:image.png)

The above formula has a special name: the **Law of Total Probability**. When we have long sums, it's often more convenient to write them using the Greek letter $\Sigma$, pronounced "sigma". The Law of Total Probability is usually written using this summation sign:

![image.png](attachment:image.png)
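The summation maps directly onto a vectorized sum in R. The sketch below assumes the partition probabilities and the conditional probabilities are supplied as two numeric vectors in the same order. The function name `total_probability` and its argument names are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: Law of Total Probability for a partition B_1, ..., B_n.
# p_partition:         P(B_i) for each event in the partition
# p_a_given_partition: P(A | B_i), in the same order
total_probability <- function(p_partition, p_a_given_partition) {
  stopifnot(
    length(p_partition) == length(p_a_given_partition),
    # A partition is exhaustive, so its probabilities must sum to 1.
    isTRUE(all.equal(sum(p_partition), 1))
  )
  sum(p_a_given_partition * p_partition)
}

# Usage with the three-plane task above (62%, 35%, 3% of flights).
p_delay <- total_probability(
  p_partition = c(0.62, 0.35, 0.03),
  p_a_given_partition = c(0.06, 0.09, 0.01)
)
```

The check that the probabilities sum to 1 only guards against a partition that is not exhaustive. It cannot detect events that overlap, so mutual exclusivity still has to be argued from the problem itself.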

Earlier, we commented on the fact that conditional probabilities typically aren't the same when we switch the event and the condition: $P(A|B)$ vs. $P(B|A)$. Our review of the Law of Total Probability lets us flesh out the relationship between these two conditional probabilities. We saw that the probability of an intersection of two events can be written in one of two ways:

$P(A \cap B) = P(A|B) \times P(B)$

$P(A \cap B) = P(B|A) \times P(A)$

This works out because the intersection will not change no matter how we define the event and the condition in a conditional probability. Since these two are equal to each other, we can derive a relationship between the two flipped conditional probabilities.

![image.png](attachment:image.png)
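One way to write out this step, assuming $P(B) > 0$ so that the division is allowed, is to set the two expressions for the intersection equal and divide both sides by $P(B)$:

$$P(A|B)\,P(B) = P(B|A)\,P(A) \quad\Longrightarrow\quad P(A|B) = P(B|A) \times \frac{P(A)}{P(B)}$$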

This formula suggests that the two conditional probabilities are related by a ratio of probabilities:

![image.png](attachment:image.png)

This ratio doesn't have an immediate intuition, so we'll return to our airline example to develop our understanding.

We used the Law of Total Probability to figure out the probability of getting a delay, using information on the different airplanes and the chances of having a delay given that a particular airplane was used. We know each of the conditional probabilities of delay given a particular airplane, but let's say that we want to flip this probability. If we observe a delayed flight, what is the probability that the plane is a Boeing? In more familiar phrasing, what is the probability that the plane is a Boeing, given that it arrived with a delay?

The following was the data we had on an airline with just two airplanes: the Boeing and the Airbus.

* The Boeing operates 73% of the flights. Out of these flights, 3% arrive at the destination with a delay.
* The Airbus operates the remaining 27% of the flights. Out of these flights, 8% arrive with a delay.

![image.png](attachment:image.png)
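The same calculation can be sketched in R with the figures listed above. The name `p_boeing_given_delay` is our own; the remaining names match the earlier answer.

```r
p_boeing <- 0.73
p_airbus <- 0.27
p_delay_given_boeing <- 0.03
p_delay_given_airbus <- 0.08

# Denominator: marginal probability of a delay (Law of Total Probability).
p_delay <- p_boeing * p_delay_given_boeing + p_airbus * p_delay_given_airbus

# Numerator: the intersection of "Boeing" and "Delay" (multiplication rule).
# 0.0219 / 0.0435, roughly 0.503
p_boeing_given_delay <- (p_delay_given_boeing * p_boeing) / p_delay
```

Although the Boeing operates 73% of the flights, it accounts for only about half of the delayed ones, because its delay rate is lower than the Airbus's.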

Using our knowledge of conditional probability and the Law of Total Probability, we were able to "flip" the conditional probability and calculate it. The airline example was an application of **Bayes' Theorem** to solve a probability problem. Bayes' Theorem enables us to "flip" the conditional probability. The airline example represents a case where two events, an event and its complement, make up all of the possibilities. The denominator in Bayes' Theorem is a marginal probability, which we can calculate using the Law of Total Probability.

Mathematically, Bayes' Theorem can be written as:

![image.png](attachment:image.png)

![image.png](attachment:image.png)
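For a partition with more than two events, the theorem can be applied to every event at once. The sketch below is one possible way to do this in R. The function name `bayes_posterior` and its arguments are hypothetical, and the inputs are the three-plane figures from earlier in this lesson.

```r
# Hypothetical helper: posterior P(B_i | A) for every event in a partition.
# prior:      P(B_i) for each event in the partition
# likelihood: P(A | B_i), in the same order
bayes_posterior <- function(prior, likelihood) {
  stopifnot(
    length(prior) == length(likelihood),
    isTRUE(all.equal(sum(prior), 1))
  )
  joint <- likelihood * prior  # P(A and B_i), by the multiplication rule
  joint / sum(joint)           # divide by P(A), the Law of Total Probability
}

p_plane_given_delay <- bayes_posterior(
  prior = c(boeing = 0.58, airbus = 0.31, erj = 0.11),
  likelihood = c(boeing = 0.04, airbus = 0.07, erj = 0.02)
)
```

Because the denominator is the sum of the numerators, the resulting posterior probabilities always sum to 1.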

The result of flipping the event and condition is just a slight change in the notation. Now let's use Bayes' Theorem to find $P(Airbus|Delay)$.

**Task**

An airline transports passengers using two types of planes: a Boeing 737 and an Airbus A320.

* The Boeing operates 73% of the flights. Out of these flights, 3% arrive at the destination with a delay.
* The Airbus operates the remaining 27% of the flights. Out of these flights, 8% arrive with a delay.

Use Bayes' theorem to find **P(Airbus|Delay)**.

**Answer**

```r
p_boeing <- 0.73
p_airbus <- 0.27
p_delay_given_boeing <- 0.03
p_delay_given_airbus <- 0.08
p_delay <- p_boeing * p_delay_given_boeing + p_airbus * p_delay_given_airbus
p_airbus_delay <- (p_airbus * p_delay_given_airbus) / p_delay
```

Near the beginning, we considered an example around HIV testing. In reality, a patient will not know beforehand whether they have HIV. That is the point of a diagnostic test. They are hoping that the test result will correctly tell them about their HIV status, but we must account for the fact that the test is not perfect. Let's say a patient tests positive. We are faced with a flipping of the event and the condition. We know $P(T|D)$ and $P(T|D^C)$, but the actual probability of interest to the patient is $P(D|T)$, the probability of having HIV given that they had a positive test result.

Since we are looking to flip the event and condition, we can find the answer by applying Bayes' Theorem. Let's begin by expanding $P(D|T)$ using the conditional probability formula:
 
![image.png](attachment:image.png)
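In symbols, the expansion can be sketched as follows, using the multiplication rule in the numerator and the Law of Total Probability in the denominator:

$$P(D|T) = \frac{P(D \cap T)}{P(T)} = \frac{P(T|D)\,P(D)}{P(T|D)\,P(D) + P(T|D^C)\,P(D^C)}$$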

Notice that if a person tests positive, the probability of being infected with HIV increases quite a bit from the given prevalence of HIV, 0.14%. This change in probability happens because the patient has more information on hand. Using a test, a patient has a better idea about how likely they are to be infected with HIV. In its mathematical form, Bayes' Theorem may seem like just another probability concept, but all of us use it in our daily lives, whether we know it or not.

We all have beliefs about the world, some correct and some incorrect. If we think that R is a relatively useless skill, we can express this belief as a low probability that R is useful. However, let's also say that we explored some job postings and saw that R was mentioned in almost all of them. After seeing this, we'd be inclined to think that R is more useful than we originally thought. That is, after seeing some evidence about R's usefulness, we updated our belief about it. This is the essence of Bayes' Theorem! We all have prior beliefs, but we are also open to changing our beliefs in light of evidence.

In the above example, we've considered the probability of being infected with HIV in two scenarios:

1. Before doing any test: $P(D)$
2. After testing positive: $P(D|T)$


The probability of being infected with HIV before doing any test is called **the prior probability** ("prior" means "before"). Without any other information, our best guess is that the probability of having HIV is just how common it is in the population. The probability of being infected with HIV after testing positive is called **the posterior probability** ("posterior" means "after"). So, in this case, the prior probability is 0.14%, and the posterior probability is 11.74%. The patient uses the positive test to update their beliefs about their HIV status.
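To see how strongly the positive test shifts the belief, we can compare the two figures quoted above. This is only a quick sketch, and the variable names are illustrative.

```r
prior <- 0.0014     # P(D): prevalence, before any test
posterior <- 0.1174 # P(D|T): after a positive test

# How many times more likely infection is after the positive result.
ratio <- posterior / prior  # roughly 84
```

Even after this large update, the posterior is still well below 50%, because the prior was so small to begin with.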

This was just a light introduction to Bayes' Theorem. An entire field of research is built on this theorem, so it's good to know the foundation that research rests on.

**Task**

Many spam emails contain the word "secret". However, some emails are not spam even though they contain the word "secret". Let's say we have the following events and probabilities:

* **S**: the event that an email is spam
* **X**: the event that an email has the word "secret" in it
* The probability of getting a spam email is 23.88%. That is, $P(S) = 0.2388$.
* The probability of an email containing the word "secret" given that the email is spam is 48.02%: $P(X|S) = 0.4802$.
* The probability of an email containing the word "secret" given that the email is not spam is 12.84%: $P(X|S^C) = 0.1284$.

Using this information:

* Use Bayes' theorem to find $P(S|X)$.
* Assign the prior probability of getting a spam email to `prior`.
* Assign the posterior probability of getting a spam email (after we see the email contains the word "secret") to `posterior`.
* Calculate the ratio between the posterior and the prior probability.

**Answer**

```r
p_spam <- 0.2388
p_secret_given_spam <- 0.4802
p_secret_given_non_spam <- 0.1284

p_non_spam <- 1 - p_spam
p_secret <- p_spam * p_secret_given_spam + p_non_spam * p_secret_given_non_spam
p_spam_given_secret <- (p_spam * p_secret_given_spam) / p_secret

prior <- p_spam
posterior <- p_spam_given_secret

ratio <- posterior / prior
```

In this file, we managed to cover a few core probability concepts:

* We took an in-depth look at how independence is different from exclusivity.
* We learned to calculate marginal probabilities using the Law of Total Probability.
* We learned to flip conditional probabilities using Bayes' theorem.