<div class="alert alert-block alert-danger">

# Multiplication Rules and Conditional Probability (COMPLETE)

    
</div>

Made in collaboration with [Skew the Script](https://skewthescript.org/) and [CourseKata](https://coursekata.org/).

<img src="https://i.postimg.cc/ty1GkxB8/Skew-the-Script-Logo.png" title="Skew the script logo" width=200 align = left>

<img src="https://i.postimg.cc/tXcF0nzD/Course-Kata-logo.png" title="CourseKata logo" width=200 align = right>

In [None]:
# Load the CourseKata library
suppressPackageStartupMessages({
    library(coursekata)
})

### 1.0 - Revisiting the Question of Honesty in Dating Profiles

We are going to revisit the online dating profiles data that we previously explored. If you recall, the `profiles` dataset is composed of public online dating profiles from 2012. We filtered for people who identified as age 36, male, and heterosexual, living in the San Francisco area. After filtering, we collected data for each profile that reported their height and yearly earnings.

Reminder: *Why are we looking at just the age 36 profiles?*

- It’s a lower-bound for the age when men typically earn their median salaries.
- It’s an age when men who are dating online might really start feeling the pressure to find someone.

In [None]:
profiles <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vT-uzkNU7wCiNtwmk4mehDYGgz1bmYP-w6Uoo3x5N5WCOY2lceCueKW4RVHnGr0PMI9p4eCaWbCvaDI/pub?gid=950707983&single=true&output=csv")
head(profiles)

Previously, we considered the idea that some people may want to make their profiles more appealing, and if they are taking cues from pop culture (such as romantic comedies), this may suggest to them that social standards tend to favor:

- Tall
- Wealthy
- Heterosexual
- White Dudes

So, perhaps such a man thinks to himself: *"To be more appealing, maybe I should make myself appear a little taller and a little wealthier in my online profile."*

<img src="https://i.postimg.cc/D21ZmbSz/Addition-Rule-Tall-Wealthy2.png" title="Tall Wealthy Man" width = 500/>

So we looked at: *What percent of heterosexual men are insecure enough to lie online?*

To answer this, we needed to know the probability of randomly selecting an online profile of a tall and wealthy man, so we developed a probability model.

First, we split the profiles into equal groups based on the median.

**Median Height and Median Income**

- The estimated 2012 median yearly earnings for individual men in San Francisco County: **\$59,397**

- Male median height (U.S.): **69.2 in. (5’9”)**

*Source Note: Income median is from the 2012 American Community Survey (data.census.gov) and height median is from the Centers for Disease Control and Prevention, 2018.*

In [None]:
# Create a new variable by splitting 'height' at the median height of US men
profiles$height2 <- profiles$height <= 69.2
profiles$height2 <- factor(profiles$height2, levels = c(TRUE,FALSE), labels = c("Short", "Tall"))

# Create a new variable that splits income at the median for San Francisco
profiles$income2 <- profiles$income <=59397
profiles$income2 <- factor(profiles$income2, levels = c(TRUE, FALSE), labels = c("Low-Earner", "High-Earner"))

head(profiles)

Then, we categorized the 192 men in the `profiles` dataset using the following convention:

<img src="https://i.postimg.cc/DvSm1Thp/Addition-Rule-Table-of-Categories.png" title="Table of Categories" width = 800/>

We then used these conventions to determine our expected probabilities and to compare them to the probabilities we actually found in our data.

For example, the expected probabilities for a DGP of "honest" profiles:

1. $P(T)$ = 0.50
2. $P(H)$ = 0.50
3. $P(T \cup H)$ = 0.75
4. $P(T \cap H)$ = 0.25 

Compare those to the probabilities we found in our sample:

1. $P(T)$ = 64.1%
2. $P(H)$ = 77.6%
3. $P(T \cup H)$ = 89.1%
4. $P(T \cap H)$ = 52.6%

Now, let's expand upon these ideas with our current data analysis.

**Our key analysis today will be:** 

*Can we use probability to find the secure men (the “honest” profiles)?*

### 2.0 - Conditional Probability

First, let's talk about conditional probability. A *condition* is a *given* in a problem.

We can notate "given" with a vertical line (|) like so:

P(A|B) = Probability of A occurring given that B has already occurred (or you know for sure will occur).

#### We would like to know the probability of finding an honest profile *given* that so many were above the median.

Let's use a Venn diagram to help us think through this.

In the `profiles` data frame we find 21 short low-earners, 22 tall low-earners, 48 short high-earners, and 101 tall high-earners.

If we let:

T = event of selecting a tall man

H = event of selecting a high-income earner

we get the following representation:

<img src="https://i.postimg.cc/kqkzrjvk/Chi-Sq-Mult-Rules-T-H-Diagram.png" title="Diagram of T and H" width=400>

Let's remind ourselves what we found last time when we calculated the probability of randomly selecting a tall man from the dataset.

P(T) = 

*Hint: add up all the possibilities in the realm of "T" and divide that by the total number of possibilities.*

In [None]:
# P(T)
(22 + 101) / 192

We get a probability of 64.1%, even though we would expect only about 50% of the cases to be above the median height!
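The four sample probabilities listed earlier can be reproduced from the same cell counts. The sketch below is only an illustration: it hard-codes the four counts quoted above instead of reading the `profiles` data frame, and the object names (`counts`, `n`, `p_T`, and so on) are our own choices, not part of the original lesson.

```r
# Cell counts quoted in the text (assumed to match the profiles data).
# matrix() fills column by column, so the first column is the low-earners.
counts <- matrix(
  c(21, 22,     # low-earners:  short, tall
    48, 101),   # high-earners: short, tall
  nrow = 2,
  dimnames = list(height = c("Short", "Tall"),
                  income = c("Low-Earner", "High-Earner"))
)

n <- sum(counts)                                   # total number of profiles

p_T       <- sum(counts["Tall", ]) / n             # P(T)
p_H       <- sum(counts[, "High-Earner"]) / n      # P(H)
p_T_and_H <- counts["Tall", "High-Earner"] / n     # P(T and H)
p_T_or_H  <- (n - counts["Short", "Low-Earner"]) / n  # P(T or H): everyone except short low-earners

# Express as percentages, rounded to one decimal place
round(100 * c(T = p_T, H = p_H, T_or_H = p_T_or_H, T_and_H = p_T_and_H), 1)
```

Under these assumptions the rounded values should line up with the sample percentages listed earlier (64.1%, 77.6%, 89.1%, and 52.6%).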


Note that, while we usually divide by the total number of possibilities for these types of general probability problems, for a conditional probability situation we divide by something different.

For example, let's find P(T|H): The probability of selecting someone tall *given* they are a high-income earner.



<img src="https://i.postimg.cc/Q8Xy503y/Chi-Sq-Mult-Rules-P-T-given-H.png" title="Diagram of T given H" width=400>

Because of the *given*, we know they have to be a high-income earner so we are going to ignore everything else. So our new denominator is just the new set of possibilities: 101 + 48 = 149.

So that's what the *given* does: it slices the data so that the probability is calculated out of just that smaller total.

Find P(T|H).

In [None]:
# P(T|H)
101 / 149

This brings us to the formula for what we just did:

$$P(A|B) = \frac {P(A \cap B)}{P(B)}$$


Or, in our context:

$$P(T|H) = \frac {P(T \cap H)}{P(H)}$$
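As a quick check, plugging the counts from the diagram into this formula gives the same answer as dividing 101 by 149 directly, because the total of 192 cancels:

$$P(T|H) = \frac{P(T \cap H)}{P(H)} = \frac{101/192}{149/192} = \frac{101}{149} \approx 0.678$$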

**2.1:** Translate the formula into words.

<div class="alert alert-block alert-warning">

**Sample Response**
 
The probability of getting someone tall, given that they are also a high-earner, equals the probability of being tall and a high-earner divided by the probability of being a high-earner (the new denominator).

</div>

Let's also establish an "intuitive rule" for conditional probability so we don't have to rely on the formula too much:

<img src="https://i.postimg.cc/2zpz2Rxx/Chi-Sq-Mult-Rules-Intuitive-Rule.png" title="Intuitive Rule of T given H" width=300>

***Intuitive Rule***

To find the probability of T given H, put the *given* in the denominator (the people who meet the condition of being high-earners) and, in the numerator, the people within that group who are also in the "event" of being tall.

So we are doing the same thing that we did before when trying to figure out $P(T \cap H)$, but now we are dividing by the *given* (instead of dividing by the total of all the possible events).

Instead of Venn diagrams, let's practice using this rule in two-way tables as well. 

For instance, if we wanted to use the table below to find $P(T)$ we could just divide the total number of tall events by the total number of events (123/192 = 64.1%).

<img src="https://i.postimg.cc/X35ZwW6Q/Chi-Sq-Mult-Rules-Conds-in-two-way-tables.png" title="Conditions in two way tables" width=400>

**2.2:** Extend the intuitive rule for conditional probabilities to find the probability of being tall *given* they are ***NOT*** a high-earner (note: the superscript $C$ means "not"): $P(T|H^C)$

<img src="https://i.postimg.cc/y1N3z3DP/Chi-Sq-Mult-Rules-T-given-Not-H.png" title="Table of T given Not H" width=450>

<div class="alert alert-block alert-warning">

**Sample Response**
 
$P(T|H^C)$ = 22/43 = 51%

*Note*: This would be the same if we were using the percentages instead of the raw counts (e.g., 0.115/0.224 = 51%)

</div>
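One way to get every conditional probability in a two-way table at once is base R's `prop.table()`. The sketch below is illustrative only: it rebuilds the table from the four counts quoted earlier (rather than from the `profiles` data frame), and the names `counts` and `cond_on_income` are our own.

```r
# Two-way table of counts quoted in the text (rows = height, columns = income)
counts <- matrix(
  c(21, 22, 48, 101),
  nrow = 2,
  dimnames = list(height = c("Short", "Tall"),
                  income = c("Low-Earner", "High-Earner"))
)

# margin = 2 divides each column by its own column total, so each column
# becomes the distribution of height *given* that income group
cond_on_income <- prop.table(counts, margin = 2)
round(cond_on_income, 3)

# Pull out the two conditional probabilities discussed above
p_T_given_notH <- cond_on_income["Tall", "Low-Earner"]   # 22 / 43
p_T_given_H    <- cond_on_income["Tall", "High-Earner"]  # 101 / 149
```

Choosing `margin = 2` is what makes the income group the *given*; using `margin = 1` instead would condition on height.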

### 3.0 - Independence

Let's talk about the rule of independence. We will need to understand this in order to properly apply the upcoming multiplication rule and answer our key analysis question. 

**Independence:** Two events (A & B) are independent if knowing the outcome of one event does not affect the probability that the other event will occur.

$P(A|B) = P(A)$

Knowing that $B$ (or, in our context, $H$) occurred does NOT affect the probability of $A$ (or, in our context, $T$) occurring.

**3.1:** Try it: Are the events “selecting a tall man” and “selecting a man who is a low-earner” independent?

<div class="alert alert-block alert-warning">

**Sample Response**
 
We find that the probability of being tall, without knowing about their earnings (without a *given*), is:

$P(T) = 64.1\%$

However, the probability of being tall *given* they were *not* a high-earner (i.e., when we know they are a low-earner) is:

$P(T|H^C) = 51.2\%$


Answer:
When we knew the person was a low-earner (when $H^C$ was “given”), the probability of selecting a tall person shrank. So, the events of selecting a low-earner and selecting someone who is tall are not independent.

In other words, their earnings gave us information about their height. So, earnings and height are not independent.

</div>
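The independence check can also be written out as a short comparison in R. This is a sketch under the assumption that the four counts quoted earlier (21, 22, 48, 101) describe the data; the variable names are ours.

```r
n <- 192                                   # total profiles

p_T            <- (22 + 101) / n           # P(T): no condition
p_T_given_notH <- 22 / (21 + 22)           # P(T | H^C): low-earners only
p_T_given_H    <- 101 / (48 + 101)         # P(T | H): high-earners only

# If height and earnings were independent, these three would be equal
round(c(p_T = p_T,
        p_T_given_notH = p_T_given_notH,
        p_T_given_H = p_T_given_H), 3)

# Size of the gap between the unconditional and conditional probability
gap <- p_T - p_T_given_notH
round(gap, 3)
```

Note that this only compares sample proportions; by itself it does not tell us whether a gap of this size is larger than what chance alone might produce.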

***Independent vs Mutually Exclusive***

It can be easy to confuse independence with mutual exclusivity, but they are different things. 

**Mutually Exclusive:** When events have no intersection (they cannot both occur).

<img src="https://i.postimg.cc/QDmWqTSC/Chi-Sq-Mult-Rules-Indep-vs-Mutually-Exc.png" title="Mutually Exclusive Events" width=400>

So, for the situation above:

$P(A) = 0.80$

But:

$P(A|B) = 0.00$

This is because they are mutually exclusive, so if you know $B$ occurred, you also know that $A$ definitely did not occur. Thus, mutually exclusive events are the opposite of independent; in fact, they are completely *dependent*. Knowing that $B$ occurred completely changed the probability of $A$, dropping it to zero.

### 4.0 - The Multiplication Rule

Now, let's talk about the multiplication rule.

The formal multiplication rule (for all events):

$P(A \cap B) = P(A) * P(B|A)$

But for independent events:

$P(A \cap B) = P(A) * P(B)$

And, like before, let's supplement this formula with an intuitive rule:

- "And" means *multiply*
- Account for dependent events
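To see why we need to account for dependence, we can try both versions of the rule on the dating profile counts from earlier. This is a rough sketch with variable names of our own choosing, assuming the counts 21, 22, 48, and 101 quoted above.

```r
n <- 192

p_T         <- 123 / n        # P(T)
p_H         <- 149 / n        # P(H)
p_H_given_T <- 101 / 123      # P(H | T): high-earners among the tall men
observed    <- 101 / n        # P(T and H) counted directly from the data

# General rule: P(T and H) = P(T) * P(H | T)
general <- p_T * p_H_given_T

# Shortcut for independent events: P(T) * P(H)
shortcut <- p_T * p_H

round(c(observed = observed, general = general, shortcut = shortcut), 3)
```

The general rule reproduces the observed proportion exactly, while the independence shortcut comes out lower (about 0.497 versus 0.526), which is what we would expect when the two events are not independent.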

***Tree Diagrams***

Tree diagrams can help us model multiple dependent or successive events (events that depend on each other or that happen right after each other). 

Each branch represents a different event.

Let's consider this example: You are trying to figure out the probability of getting into your dream school.

Tree Diagram Problem Setup (Dream School)
1. You have a 0.65 probability of getting higher than average GPA and ACT scores.
2. If you have a higher than average GPA/ACT, you have a 0.83 chance of being admitted.
3. If you have a below average GPA/ACT score, you have a 0.39 chance of being admitted.



<img src="https://i.postimg.cc/66N4NyRb/Chi-Sq-Mult-Rules-Tree-Diagram.png" title="tree diagram" width=800>

Now, let's find some probabilities.

Let:

H = Event of getting a higher than average ACT/GPA

A = Event of being admitted to your dream school

**4.1:** Find $P(H \cap A)$ (the probability of getting higher than average scores *AND* getting admitted).

Hint: remember *and* means *multiply*.

<div class="alert alert-block alert-warning">

**Sample Response**
 
If we follow that branch of the tree, we see that $0.65 * 0.83 \approx 0.54$.

$P(H \cap A) = 54\%$

</div>

**4.2:** Next, let's consider $P(A)$.

There are multiple paths to being admitted to the school. Below you see the probabilities of being admitted for both paths. You can be admitted either by path 1 (54%) *OR* by path 2 (14%). Remember from our previous lesson: "or" means *add*. So we just add the two probabilities together!

<img src="https://i.postimg.cc/y1hqjy4N/Chi-Sq-Mult-Rules-Tree-Diagram-2.png" title="tree diagram 2" width=800>

<div class="alert alert-block alert-warning">

**Sample Response**
 
We get an overall probability of being admitted of 0.54 + 0.14 = 0.68, or $P(A) = 68\%$.

</div>
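The two admission paths can also be computed in R from the three probabilities given in the problem setup. This is a minimal sketch; the variable names (`path1`, `path2`, and so on) are ours, and it keeps the unrounded products before rounding at the end.

```r
# Probabilities given in the problem setup
p_H            <- 0.65   # P(H): higher than average GPA/ACT
p_A_given_H    <- 0.83   # P(A | H)
p_A_given_notH <- 0.39   # P(A | H^C)

p_notH <- 1 - p_H        # complement rule: P(H^C)

# "And" means multiply along each branch
path1 <- p_H * p_A_given_H          # P(H and A)
path2 <- p_notH * p_A_given_notH    # P(H^C and A)

# "Or" means add; the two paths cannot both happen, so nothing is double-counted
p_A <- path1 + path2

round(c(path1 = path1, path2 = path2, p_A = p_A), 2)
```

Rounded to two decimal places, these should agree with the 54%, 14%, and 68% shown in the tree diagram.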

Ok, so how can we use this tree diagram to help us find a conditional probability (a *given*)?

Let's find $P(H|A)$: The probability of having attained a higher than average ACT/GPA *given* that you were admitted to your dream school.

Looking at the *given* first--$P(A)$--we had found that the overall probability of getting admitted was $68\%$. If we apply our intuitive rule we will recall that "given" means **divide by** the *given*, so we know our denominator will be $68\%$. This is how we scale the probability by the chance of the *given*.

Now, among these paths we need to find the path of the event we are looking for (where there is a higher than average ACT/GPA). Among these paths that's the top path, where we multiply our two possibilities to get our numerator: $0.65 * 0.83 = 0.54$.
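As a sketch, the same division can be carried out from the original branch probabilities without rounding the intermediate steps. The variable names here are our own.

```r
# Probabilities given in the problem setup
p_H            <- 0.65   # P(H)
p_A_given_H    <- 0.83   # P(A | H)
p_A_given_notH <- 0.39   # P(A | H^C)

# Numerator: the path with a higher than average GPA/ACT AND admission
numerator <- p_H * p_A_given_H

# Denominator: the given, P(A), which is the sum of both admission paths
denominator <- numerator + (1 - p_H) * p_A_given_notH

# "Given" means divide by the given
p_H_given_A <- numerator / denominator
round(p_H_given_A, 3)
```

Using unrounded values gives roughly 0.798, while dividing the rounded figures (0.54 / 0.68) gives roughly 0.794. The small difference comes only from rounding.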



In [None]:
# Find P(H|A)
0.54/0.68

# OR
54/68

### 5.0 - 5 Intuitive Probability Rules

Let's recap the five intuitive rules we have developed about probability.

**5 Intuitive Probability Rules:**

1. Probabilities are between 0 and 1 (inclusive)

2. Complement rule: $P(A^C) = 1 - P(A)$

3. “Or” means add, beware of double-counts

4. “Given” means divide by the given

5. “And” means multiply, adjust for dependence
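To tie the five rules together, here is a short R sketch that applies each one to the dating profile counts quoted earlier (21, 22, 48, 101). It is meant as an illustration only, and the variable names are our own.

```r
n <- 192
p_T       <- 123 / n    # P(T)
p_H       <- 149 / n    # P(H)
p_T_and_H <- 101 / n    # P(T and H)

# Rule 1: probabilities are between 0 and 1 (inclusive)
rule1_ok <- all(c(p_T, p_H, p_T_and_H) >= 0 & c(p_T, p_H, p_T_and_H) <= 1)

# Rule 2: complement rule, P(T^C) = 1 - P(T)
p_notT <- 1 - p_T

# Rule 3: "or" means add, then subtract the overlap so it is not double-counted
p_T_or_H <- p_T + p_H - p_T_and_H

# Rule 4: "given" means divide by the given
p_T_given_H <- p_T_and_H / p_H

# Rule 5: "and" means multiply, using the conditional probability
# to adjust for dependence
p_and_check <- p_H * p_T_given_H

round(c(p_notT = p_notT, p_T_or_H = p_T_or_H,
        p_T_given_H = p_T_given_H, p_and_check = p_and_check), 3)
```

The last value should match $P(T \cap H)$ from the data, since rules 4 and 5 are the same relationship rearranged.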



### 6.0 - Discussion

We have found that if we randomly sample among the profiles of age 36 heterosexual men:

P(above median height): **64.1%**

P(above median income): **77.6%**

However, these probabilities are only **50%** in the population! Thus, we have some reason to doubt that all of the heights and earnings reported in these profiles are accurate...

But what if we just look at the people who reported that they make below median earnings (the people who weren't inflating their income)?

First, recall that if we just look at the people who are tall, $P(T) = 64.1\%$:

<img src="https://i.postimg.cc/NsmJmvnr/Chi-Sq-Mult-Rules-Low-Earner-Honesty.png" title="Low-Earner Honesty" width=600>

But, if we find the conditional distribution (“given” they are a low-earner) and ignore all the high-earners, we’re close to the 50-50 split we’d expect around the median height!

<img src="https://i.postimg.cc/8TTxBcdB/Chi-Sq-Mult-Rules-P-of-T-given-not-H.png" title="Probability of T given not H" width=700>

So is it possible that "low-earner" may act as an indicator that someone is honest?

**6.1: Discussion Question** 

Do you believe that filtering results for just the men who reported below-median earnings would return more honest matches? Explain your reasoning.

<div class="alert alert-block alert-warning">

**Sample Response**
 
Yes. In our last lesson, we found that if the men on dating websites lied about one variable, it was probably their income. 77.6% of the men on the site had above-median earnings (among 36-year-old men in San Francisco), even though we’d expect this proportion to be closer to 50% in the population. In addition,
their earning patterns by age didn’t match the patterns we see nationally. Furthermore, single men in
the U.S. tend to make less than married men, so there’s no compelling reason to believe men who
choose to make profiles on OKCupid would make more than their peers in the population.

If we filter just for the men who self-reported earnings below the median annual income, it’s likely that
they are the ones who are choosing to make honest profiles. Our data supports this theory: conditioning
on low-earners results in a more even split of short and tall men, which is exactly what we’d expect in
estimates around the median population height. So, there is evidence that filtering for the low-earnings
profiles will yield more honest dates.

</div>