## Precision and Recall

Remember that when performing binary classification, our universe of observations is divided into four categories:

- True positives (**TP**): These are cases in which we predicted yes correctly (e.g. the email is spam, and we predicted spam).
- True negatives (**TN**): We predicted no, and the result was no (e.g. the email is not spam, and we predicted not spam).
- False positives (**FP**): We predicted yes, but the result was no (e.g. the email is not spam, and we predicted spam).
- False negatives (**FN**): We predicted no, but the result was yes (e.g. the email is spam, and we predicted not spam).

We can use these four categories to calculate the **precision** and **recall** of a classification model.

**Precision** is the number of true positives divided by the number of true positives plus the number of false positives.  

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$

- Think: of all the times we predicted yes, how many times were we right?

**Recall** is the number of true positives divided by the number of true positives plus the number of false negatives.

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$

- Think: What proportion of actual positives did we correctly predict?

Depending on the situation, we may want to maximize precision or recall, and there is often a tradeoff between the two. For example, if we want to minimize false positives, we can set a high threshold for predicting yes. This typically increases precision but decreases recall, because fewer of the actual positives clear the higher threshold.
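The threshold tradeoff can be illustrated with a minimal sketch. The labels and scores below are made up for illustration, and `precision_recall` is a hypothetical helper, not part of any library. The loop applies three thresholds to the same scores so that precision and recall can be compared side by side.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 means 'yes'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted yes, was yes
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted yes, was no
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted no, was yes
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Hypothetical data: true labels and the model's scores for "yes".
y_true = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10]

for threshold in (0.25, 0.50, 0.85):
    # Predict yes only when the score reaches the threshold.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    precision, recall = precision_recall(y_true, y_pred)
    print(f"threshold={threshold:.2f}  precision={precision:.3f}  recall={recall:.3f}")
```

On this toy data, raising the threshold removes false positives first, so precision rises while recall falls. Real score distributions are noisier, and precision is not guaranteed to increase at every step.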

### Example 1 - Spam filter

Suppose we have a dataset of 1000 emails, 500 of which are spam. We have a spam filter that flags 200 emails as spam. Of the 200 emails flagged as spam, 150 are actually spam. What are the precision and recall of the spam filter?

#### Solution

In this example, we have 150 true positives and 50 false positives. Therefore, the precision is:

$$Precision = \frac{150}{150 + 50} = 0.75$$

The dataset contains 500 spam emails, of which the filter caught 150, so the remaining 500 - 150 = 350 are false negatives. Therefore, the recall is:

$$Recall = \frac{150}{150 + 350} = 0.3$$
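The same arithmetic can be checked in a few lines of Python. This is only a sketch; the variable names are chosen here for illustration, and the counts come directly from the problem statement.

```python
total_spam = 500     # actual spam emails in the dataset
flagged = 200        # emails the filter flagged as spam
flagged_spam = 150   # flagged emails that really are spam

tp = flagged_spam                # flagged and actually spam
fp = flagged - flagged_spam      # flagged but not spam
fn = total_spam - flagged_spam   # spam that the filter missed

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"TP={tp}, FP={fp}, FN={fn}")
print(f"precision={precision:.2f}, recall={recall:.2f}")
```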



#### Question

What is the *negative predictive value* of the filter?

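**Hint**: The negative predictive value is the probability that a negative prediction is correct. For the spam filter, it is the probability that an email is not spam given that the filter did not flag it, $P(\neg Spam | \neg Flagged)$.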




In [None]:
# your answer here

### Example 2 - Medical test

Suppose we have a medical test that is 99% accurate. That is, the probability of a false positive is 1% and the probability of a false negative is 1%. Suppose 0.5% of people have a certain disease. If a person tests positive, what is the probability that they actually have the disease?

#### Q1

Calculate the precision and recall of the medical test.



In [None]:
# your answer here

#### Q2 

What are the odds of having the disease if you test positive?

In [None]:
# your answer here

## Conditional Probability

### Introduction


Conditional probability is the probability of an event given that another event has occurred. For example, if you are interested in the probability that it is raining outside, given that it is cloudy, you are dealing with conditional probability.

### Conditional Probability Formula

The conditional probability formula is:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

Where $P(A|B)$ is the probability of A given B, $P(A \cap B)$ is the probability of A and B, and $P(B)$ is the probability of B.
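To make the formula concrete, here is a small sketch that counts outcomes in a finite sample space. The two-dice setup and the events `A` and `B` are assumptions chosen for illustration (they are not part of the rain example above). Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, given as a predicate over outcomes."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


def A(o):
    return o[0] + o[1] == 8   # the two dice sum to 8


def B(o):
    return o[0] % 2 == 0      # the first die is even


p_b = prob(B)
p_a_and_b = prob(lambda o: A(o) and B(o))
p_a_given_b = p_a_and_b / p_b   # P(A|B) = P(A and B) / P(B)

print(f"P(B) = {p_b}, P(A and B) = {p_a_and_b}, P(A|B) = {p_a_given_b}")
```

Here $P(A|B) = 1/6$, which differs from the unconditional $P(A) = 5/36$: knowing that B occurred changes the probability of A.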

### Bayes' Theorem

Bayes' Theorem is a formula that describes how to update the probabilities of hypotheses when given evidence. It follows directly from the definition of conditional probability, yet it is a powerful tool for reasoning about a wide range of problems involving belief updates.

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$

Where $P(A|B)$ is the probability of A given B, $P(B|A)$ is the probability of B given A, $P(A)$ is the probability of A, and $P(B)$ is the probability of B.
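As a short derivation sketch, assuming $P(A) > 0$ and $P(B) > 0$: the conditional probability formula can be applied in both directions, which gives two expressions for the same joint probability.

$$P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$$

Dividing the last two expressions by $P(B)$ recovers the theorem:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$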

### The Law of Total Probability

The law of total probability is a fundamental rule relating marginal probabilities to conditional probabilities. It expresses the total probability of an outcome that can be realized via several distinct events, hence the name.

$$P(A) = \sum_{i=1}^n P(A|B_i)P(B_i)$$

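Where $P(A)$ is the probability of A, $P(A|B_i)$ is the probability of A given $B_i$, and $P(B_i)$ is the probability of $B_i$. The events $B_1, \dots, B_n$ must be mutually exclusive and together cover every possible outcome (that is, they form a partition of the sample space).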
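The sum can be sketched as a small function. The function name `total_probability` and the three-machine numbers are hypothetical, chosen only to illustrate the formula. The check that the $P(B_i)$ sum to 1 reflects the assumption that the $B_i$ form a partition.

```python
import math


def total_probability(p_a_given_b, p_b):
    """Compute P(A) = sum_i P(A|B_i) * P(B_i), assuming the B_i form a partition."""
    if len(p_a_given_b) != len(p_b):
        raise ValueError("need one conditional probability per event B_i")
    if not math.isclose(sum(p_b), 1.0):
        raise ValueError("the P(B_i) must sum to 1 for the B_i to form a partition")
    return sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))


# Hypothetical example: three machines make 50%, 30%, and 20% of all parts,
# with defect rates of 1%, 2%, and 5% respectively.
p_machine = [0.5, 0.3, 0.2]
p_defect_given_machine = [0.01, 0.02, 0.05]

p_defect = total_probability(p_defect_given_machine, p_machine)
print(f"P(defect) = {p_defect:.3f}")
```

The result is $0.005 + 0.006 + 0.010 = 0.021$, the overall share of defective parts.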




## Bayes' Theorem

### Scenario I:
We want to create a basic spam filter that determines the probability that an email is spam given the occurrence of the word "offer."

#### Concepts:
- **Spam Probability ($P(Spam)$):** The overall probability that an email is spam.
- **Word Occurrence Probability:** Probability of the word "offer" appearing in spam and non-spam emails.
- **Conditional Probability:** Probability that an email is spam given the occurrence of the word "offer" ($P(Spam | "offer")$).

#### Example:
Suppose:
- 30% of emails in our dataset are spam ($P(Spam) = 0.3$).
- "offer" appears in 60% of spam emails ($P("offer" | Spam) = 0.6$) and 10% of non-spam emails ($P("offer" | ¬Spam) = 0.1$).

#### Applying Bayes' Theorem:
We want to calculate $P(Spam | "offer")$, the probability that an email is spam given it contains the word "offer."

Using Bayes' theorem:


$$P(Spam | "offer") = \frac{P("offer" | Spam) \times P(Spam)}{P("offer")}$$


Substitute the given values and calculate the probabilities:

$$ P(Spam | "offer") = \frac{0.6 \times 0.3}{P("offer")} $$

To find $P("offer")$, use the Law of Total Probability:

$$ P("offer") = P("offer" | Spam) \times P(Spam) + P("offer" | ¬Spam) \times P(¬Spam) $$

$$ P("offer") = 0.6 \times 0.3 + (0.1 \times 0.7) = 0.18 + 0.07 = 0.25 $$

Finally:

$$ P(Spam | "offer") = \frac{0.6 \times 0.3}{0.25} = 0.72 $$

#### Conclusion:
Given the presence of the word "offer" in an email, the probability that the email is spam is 72%, up from the prior of 30%.

This simplified example demonstrates how Bayes' theorem can be used in spam filtering by considering the occurrence of a single word to estimate the probability of an email being spam. In real applications, more sophisticated models and multiple features are used to enhance accuracy.
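The calculation above can be reproduced with a short sketch. The function `posterior` is a hypothetical helper written for this example. It combines the Law of Total Probability (for the denominator) with Bayes' theorem, assuming only two hypotheses: the email is spam or it is not.

```python
def posterior(p_evidence_given_h, p_h, p_evidence_given_not_h):
    """Return P(H | evidence) for a hypothesis H and its complement."""
    p_not_h = 1 - p_h
    # Law of Total Probability: P(evidence) over H and not-H.
    p_evidence = p_evidence_given_h * p_h + p_evidence_given_not_h * p_not_h
    # Bayes' theorem.
    return p_evidence_given_h * p_h / p_evidence


p_spam_given_offer = posterior(
    p_evidence_given_h=0.6,      # P("offer" | Spam)
    p_h=0.3,                     # P(Spam)
    p_evidence_given_not_h=0.1,  # P("offer" | not Spam)
)
print(f'P(Spam | "offer") = {p_spam_given_offer:.2f}')
```

Changing the prior `p_h` while keeping the two word probabilities fixed shows how strongly the result depends on the base rate of spam.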

### Questions

#### Q1

Treating every email that contains the word "offer" as flagged, what are the precision and recall of the spam filter?

In [None]:
# your answer here

#### Q2

Of all the emails in our data set, what proportion are spam and contain the word "offer"?

**Hint**: $P(Spam \cap "offer")$


In [None]:
# your answer here

#### Q3

What proportion of the emails in our data set are not spam and don't contain the word "offer"?

**Hint**: $1 - P(Spam \cup "offer")$

In [1]:
# your answer here

## Scenario II:
Consider a particular medical test for a rare disease. The test has a sensitivity of 90% (the probability of a positive test given the patient has the disease) and a specificity of 95% (the probability of a negative test given the patient does not have the disease).

### Given Information:
- The prevalence of this disease in the general population is 1 in 1000 (P(Disease) = 0.001).

### Using Bayes' Theorem:
$$ P(Disease | Positive Test) = \frac{P(Positive Test | Disease) \times P(Disease)}{P(Positive Test)} $$

Given:
- Sensitivity ($P(Positive Test | Disease)$) = 0.90
- Prevalence ($P(Disease)$) = 0.001
- We need to calculate $P(Positive Test)$, which requires considering both true positives and false positives.

### Tasks
1. Use the specificity to find $P(Positive Test | ¬Disease)$, the probability of a positive test given the absence of the disease. It is the complement of the specificity.

2. Calculate the probability of a positive test, considering both scenarios (having the disease and not having the disease) using the Law of Total Probability.
$$ P(Positive Test) = P(Positive Test | Disease) \times P(Disease) + P(Positive Test | ¬Disease) \times P(¬Disease) $$

3. Substitute the calculated probabilities into Bayes' theorem to find the probability that a person actually has the disease given a positive test result.


In [2]:
# your answer here


## Scenario III:
In a certain city, a crime has occurred, and the police have arrested a suspect based on the available evidence. The accuracy of the forensic test used in this city is as follows:
- The test correctly identifies the suspect in 98% of cases when the suspect is guilty (sensitivity).
- The test correctly excludes the suspect in 97% of cases when the suspect is innocent (specificity).

### Given Information:
- Prior to any evidence, the probability of a randomly selected person being guilty of this crime is 0.1% (P(Guilty) = 0.001).

### Task:
Calculate the updated probability that the suspect is guilty given that the test results are positive (P(Guilty | Positive Test)) using Bayes' theorem.

### Using Bayes' Theorem:
$$ P(Guilty | Positive Test) = \frac{P(Positive Test | Guilty) \times P(Guilty)}{P(Positive Test)} $$

Given:
- Sensitivity ($P(Positive Test | Guilty)$) = 0.98
- Prior probability of guilt ($P(Guilty)$) = 0.001
- We need to calculate $P(Positive Test)$, which requires considering both true positives and false positives.

Apply Bayes' theorem, using the provided sensitivity, specificity, and prior probability, to calculate the updated probability of guilt given a positive test result.

In [3]:
# your answer here