Sometimes we are interested in **conditional probabilities**: probabilities that take into account that some condition has occurred. We use conditional probability often in our daily lives, and we might not even realize it. We may be getting dressed for school or work and notice it is awfully cloudy outside. We may wonder, "It looks overcast today. I wonder if it'll rain later?"

This question is different from just asking, "What is the chance that it will rain?" Cloudiness is associated with an increased chance of rain, so it is more likely to rain when the sky is cloudy.

The presence of conditions slightly alters the probability calculations, so we will learn how to incorporate these changes into our current knowledge of probability.

Suppose a fair six-sided die is rolled and, before we're able to observe the actual result, we're given some new information: the die showed an odd number. With this new piece of information, should we reconsider our original calculation of **P(5)**? Will the probability of getting a 5 still be **1/6**, or does the new information allow us to recalculate?

When we don't know whether the result is odd or not, the possible outcomes of the experiment are still **{1,2,3,4,5,6}**. After we find out the number is odd, the possible outcomes narrow down to **{1,3,5}**, since the other outcomes don't match the condition. In other words, the new information serves to **reduce** the size of the original sample space from **{1,2,3,4,5,6}** to **{1,3,5}**.
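The reduction of the sample space can be sketched in R. This is a minimal illustration that assumes a fair die, so every outcome is equally likely; the variable names are illustrative and not part of the lesson.

```r
# Original sample space of a fair six-sided die
omega <- 1:6

# New information: the result is odd, so keep only the odd outcomes
reduced <- omega[omega %% 2 == 1]

# Probability of a 5 before and after learning that the result is odd
p_5 <- sum(omega == 5) / length(omega)
p_5_given_odd <- sum(reduced == 5) / length(reduced)

print(c(p_5, p_5_given_odd))
```

Within the reduced sample space, the probability of a 5 rises from 1/6 to 1/3.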

**Task**

A fair six-sided die is rolled. Before we see the result, we are told that the number we got is less than 5. With this in mind, calculate:

1. The probability of getting a 3. 
2. The probability of getting a 6. 
3. The probability of getting an odd number. 
4. The probability of getting an even number.

**Answer**

`p_3 <- 1/4
p_6 <- 0/4
p_odd <- 2/4
p_even <- 2/4`

**Task**

A student is randomly selected. We are given some information that they were born during the winter season. Assume the winter months are **December**, **January**, and **February**. For the purposes of this exercise, we'll also assume that each month is equally likely to have been the month the student was born in. Calculate:

1. The conditional probability that they were born in December.
2. The conditional probability that they were born in a 31-day month.
3. The conditional probability that they were born during summer.
4. The conditional probability that they were born in a month which ends in letter **"r"**.

**Answer**

`p_december <- 1/3
p_31 <- 2/3
p_summer <- 0/3
p_ends_r <- 1/3`
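These answers can be checked by treating the three winter months as the reduced sample space. The sketch below relies on two assumptions that the exercise does not state: February is given 28 days, and summer is taken to be June, July, and August.

```r
# Days per month (assumption: a non-leap year, so February has 28 days)
days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
names(days) <- month.name  # built-in vector of English month names

# Condition: the student was born in winter
winter <- c("December", "January", "February")
# Assumption for illustration: summer is June, July, and August
summer <- c("June", "July", "August")

# With equally likely months, each probability is a proportion within winter
p_december <- mean(winter == "December")
p_31 <- mean(days[winter] == 31)
p_summer <- mean(winter %in% summer)
p_ends_r <- mean(grepl("r$", winter))

print(c(p_december, p_31, p_summer, p_ends_r))
```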

The size of a set, meaning the number of elements it contains, has a special name: the **cardinality** of the set.

We abbreviate the cardinality of a sample space as **card(Ω)**. Referring back to the original sample space of a fair die roll, its cardinality was 6.

**card(Ω) = 6**

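The **cardinality** of a set or event corresponds to the total number of possible outcomes for that event.

For a conditional probability **P(A|B)**, the condition **B** acts as the reduced sample space, so the number of possible outcomes is the cardinality of **B**. The number of outcomes satisfying the event is the cardinality of the intersection of **A and B**. Our formula becomes: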

![image.png](attachment:image.png)

We now have a formula for **conditional probability**.
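Written out in symbols, the cardinality form described above is the following, assuming that every outcome in the sample space is equally likely and that **B** contains at least one outcome:

$$
P(A \mid B) = \frac{\text{card}(A \cap B)}{\text{card}(B)}
$$

For the earlier die example, with **A = {5}** and **B = {1,3,5}**, this gives 1/3.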

**Task**

Two fair six-sided dice are rolled at the same time, and the two results are added together. The diagram below shows all the possible results that we can get from adding the two numbers together.

![image.png](attachment:image.png)

Find **P(A|B)**, where **A** is the event that the sum is an **even number**, and **B** is the event that the sum is **less than eight**.

1. Find the cardinality of **B**. Note that identical sums must be counted separately when they come from different pairs of dice values.
2. Find the cardinality of the intersection of **A and B**.
3. Calculate **P(A|B)**.

**Answer**

`card_b <- 21
card_a_and_b <- 9
p_a_given_b <- card_a_and_b / card_b`
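The cardinalities above can also be counted by enumerating all 36 outcomes rather than reading them off the diagram. This is a sketch under the exercise's assumption of two fair dice; `sums`, `a`, and `b` are illustrative names.

```r
# All 36 equally likely outcomes: rows are die 1, columns are die 2
sums <- outer(1:6, 1:6, "+")

b <- sums < 8        # event B: the sum is less than eight
a <- sums %% 2 == 0  # event A: the sum is an even number

card_b <- sum(b)            # number of outcomes in B
card_a_and_b <- sum(a & b)  # number of outcomes in both A and B
p_a_given_b <- card_a_and_b / card_b

print(c(card_b, card_a_and_b, p_a_given_b))
```

Because `outer` keeps one cell per pair of dice values, identical sums from different pairs are counted separately, as the task requires.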

We'll now use the **conditional probability** formula in the context of a common real-world example. A team of biologists wants to measure the accuracy of a new HIV test they developed. HIV is the virus that causes AIDS, a disease that affects the immune system. They used the new test on 53 people, and the results are summarized in the table below:

![image.png](attachment:image.png)

Reading the table above, we can glean some important details about the test:

* 23 people are infected with HIV
* 30 people are not infected with HIV (the complement of being infected with HIV)
* 45 people tested positive for HIV
* 8 people tested negative for HIV
* Of the 23 people who were infected, 21 tested positive for HIV
* Of the 30 people who were not infected, 24 tested positive for HIV

This represents a classic problem with diagnostic tests and diseases. Doctors want to know the true disease state of a person, but are only able to learn about it through a test. The problem is that tests are not perfect. They may declare that someone with the disease does not have it (**false negative**), or they may conclude that someone without the disease actually has it (**false positive**). Both of these outcomes are harmful, so we want tests that minimize these erroneous results.

The team now intends to use these results to calculate probabilities for new patients and figure out whether the test is reliable enough to use in hospitals. They want to know:

* What is the probability of testing positive, given that a patient is infected with HIV?
* What is the probability of testing negative, given that a patient is not infected with HIV?

We'll define two events to represent what we care about. **T** will be the event that the test is positive, and **D** will be the event that the patient has HIV. For example, **P(T|D)** is the probability of testing positive, given that the patient is infected with HIV. Using our conditional probability formula, we can fill in this information:

![image.png](attachment:image.png)

![image.png](attachment:image.png)

The probability of testing positive given that the patient is infected with HIV is 91.30%. At face value, this result suggests that the new test is fairly good at detecting HIV when it is actually present. However, we must consider this percentage in terms of actual numbers of people. If we used this test on 10,000 patients infected with HIV, only about 9,130 patients would get a correct diagnosis, while the other 870 or so would not. Given the severity of HIV, the team should probably conclude that the test needs more refinement with respect to detecting the virus among the infected.
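The calculation of **P(T|D)** can be reproduced from the counts listed earlier. The sketch below rebuilds the table as a matrix; the two negative counts (2 and 6) are not listed directly and are derived here as 23 - 21 and 30 - 24. The object names are illustrative.

```r
# Counts from the study: rows = true status, columns = test result
# (negatives derived: 23 - 21 = 2 and 30 - 24 = 6)
results <- matrix(c(21, 2,
                    24, 6),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(status = c("infected", "not_infected"),
                                  test = c("positive", "negative")))

card_d <- sum(results["infected", ])             # infected people
card_t_and_d <- results["infected", "positive"]  # infected and tested positive

p_t_given_d <- card_t_and_d / card_d
print(round(p_t_given_d, 4))

# Expected number of correct diagnoses among 10,000 infected patients
print(round(10000 * p_t_given_d))
```

The column totals of `results` should match the 45 positive and 8 negative tests reported above, which is a quick way to check that the table was entered correctly.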

**Task**

Calculate $P(T^C|D^C)$, the probability of testing negative given that a patient is not infected with HIV.

**Answer**

`p_negative_given_non_hiv <- 6/30
print(p_negative_given_non_hiv)`

"The probability of testing negative given that a patient does not
have HIV is 20%. This means that for every 10,000 healthy
patients tested, only about 2000 will get a correct diagnosis, while the
other 8000 will not. This rate is unacceptable, and it would be dangerous
to have it used in hospitals."

Sometimes, there may be cases where we do not know the cardinality of an event. This could happen in cases where the event is extremely rare or complex. In these cases, we would not be able to use our current conditional probability formula, so it would be good to have a formula that does not rely on the cardinalities. If we use the probabilities of both the numerator and denominator events, we will find that the result comes out exactly the same. 

Using the table above, we see that:

![image.png](attachment:image.png)

Using these values in place of the cardinalities, we would get:

![image.png](attachment:image.png)

Using the cardinalities to calculate the conditional probability was convenient when they were known. In general, we will know more about the individual probabilities of events rather than their cardinalities, so using probabilities is more useful. This allows us to define a formula for conditional probability purely in terms of probabilities instead of set cardinalities. Thus, for any two events **A** and **B**, **P(A|B)** is

![image.png](attachment:image.png)
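In symbols, the probability form is $P(A \mid B) = P(A \cap B) / P(B)$, which requires $P(B) > 0$. As a quick check that it agrees with the cardinality form, the sketch below uses the counts from the 53-person study; the variable names are illustrative.

```r
n <- 53              # total number of people tested
p_t_and_d <- 21 / n  # P(T and D): infected and tested positive
p_d <- 23 / n        # P(D): infected

from_probabilities <- p_t_and_d / p_d
from_cardinalities <- 21 / 23

# The total n cancels, so both forms give the same value
print(all.equal(from_probabilities, from_cardinalities))
```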

This formula is useful when we only know probabilities. For instance, let's say a different test is used to diagnose a patient. The conditional probability **P(T|D)** is more useful from a doctor's perspective: it tells the doctor how well the test detects HIV when the virus is present. From a patient's perspective, this information is less useful. A patient usually wants to know something slightly different: **P(D|T)**, the probability of actually having HIV, given that the test was positive. A patient does not know whether they are infected, so they want to know their chances of having HIV from the test result alone. It is critical to know that, in general, **P(A|B)** is not the same as **P(B|A)**. That is, if we swap the event of interest and the condition, the resulting probability will generally be different.
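To see how different the two directions can be, the sketch below reuses the counts from the earlier 53-person study (not the new test discussed next). Only the condition, and therefore the denominator, changes.

```r
card_t_and_d <- 21  # infected and tested positive
card_d <- 23        # infected
card_t <- 45        # tested positive

p_t_given_d <- card_t_and_d / card_d  # condition on being infected
p_d_given_t <- card_t_and_d / card_t  # condition on testing positive

print(round(c(p_t_given_d, p_d_given_t), 3))
```

In that sample, about 91% of infected people tested positive, but only about 47% of the people who tested positive were actually infected.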

Let's consider a new test. A patient tests positive for HIV and wants to find **P(D|T)**. Using the probability form of our conditional probability formula, we get:

![image.png](attachment:image.png)

This result can offer a bit of relief to a worried patient. The probability that the test is positive and the subject has HIV is small. Even though the test was positive, the probability that they actually have the disease is still relatively small.

**Task**

A company offering a browser-based task manager tool intends to do some targeted advertising based on people's browsers. The data they collected about their users is described in the table below:

![image.png](attachment:image.png)

Find:

* **P(Premium|Chrome)** — the probability that a randomly chosen user has a premium subscription, provided their browser is Chrome. 
* **P(Basic|Safari)** — the probability that a randomly chosen user has a basic subscription, provided their browser is Safari. 
* **P(Free|Firefox)** — the probability that a randomly chosen user has a free subscription, provided their browser is Firefox. 
* Between a Chrome user and a Safari user, who is more likely to have a premium subscription? If we think a Chrome user is the answer, then assign the string `'Chrome'` to a variable named `more_likely_premium`; otherwise assign `'Safari'`. To solve this exercise, we'll also need to calculate **P(Premium|Safari)**.

**Answer**

`p_premium_given_chrome <- 158/2762
p_basic_given_safari <- 274/1288
p_free_given_firefox <- 2103/2285
more_likely_premium <- 'Safari' # because P(Premium | Safari) > P(Premium | Chrome)`