# Basics of Probability

## Assignments
Now it’s time to compute some probabilities. Keep track of your work in a Google document or markdown file that you can share with your mentor.

### Drill set 1
**1.** Calculate the probability of flipping a balanced coin four times and getting each pattern: HTTH, HHHH and TTHH.

**Solution:**

Each coin flip is an independent event. For a balanced coin, the probability of heads is 0.5 (50%) on every flip, and so is the probability of tails. Therefore, we can compute the probability of each pattern by multiplying the probabilities of the individual flips.

In [1]:
p_h = 0.5
p_t = 0.5

In [2]:
p_htth = p_h*p_t*p_t*p_h
p_hhhh = p_h*p_h*p_h*p_h
p_tthh = p_t*p_t*p_h*p_h

print('The probability of getting HTTH is {}.'.format(p_htth))
print('The probability of getting HHHH is {}.'.format(p_hhhh))
print('The probability of getting TTHH is {}.'.format(p_tthh))

The probability of getting HTTH is 0.0625.
The probability of getting HHHH is 0.0625.
The probability of getting TTHH is 0.0625.
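All three patterns come out the same because any specific sequence of four independent fair flips has probability 0.5 to the fourth power. A minimal sketch of a general helper follows; the function name `sequence_probability` and its `p_heads` parameter are illustrative and not part of the original assignment.

```python
from itertools import product


def sequence_probability(pattern, p_heads=0.5):
    """Probability of observing an exact sequence of independent flips."""
    prob = 1.0
    for flip in pattern:
        if flip == 'H':
            prob *= p_heads
        elif flip == 'T':
            prob *= 1 - p_heads
        else:
            raise ValueError('pattern may only contain H and T')
    return prob


for pattern in ('HTTH', 'HHHH', 'TTHH'):
    print(pattern, sequence_probability(pattern))

# Sanity check: the 16 possible four-flip patterns cover every outcome,
# so their probabilities should add up to 1.
total = sum(sequence_probability(''.join(p)) for p in product('HT', repeat=4))
print('Sum over all patterns:', total)
```

Changing `p_heads` would let the same helper handle an unbalanced coin, where the three patterns would no longer be equally likely.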


**2.** If a list of people has 24 women and 21 men, then the probability of choosing a man from the list is 21/45. What is the probability of not choosing a man?

**Solution:**

The probability of not choosing a man is **24/45**.

We can compute this by subtracting the probability of choosing a man, 21/45, from 1. Since 1 is equivalent to 45/45, we get 45/45 - 21/45 = 24/45.
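The complement rule can be checked with exact fractions. This is a small sketch using Python's standard `fractions` module; the variable names are illustrative. Note that `Fraction` reduces to lowest terms automatically, so 24/45 is displayed as 8/15.

```python
from fractions import Fraction

women, men = 24, 21
total_people = women + men

p_man = Fraction(men, total_people)
p_not_man = 1 - p_man  # complement rule: P(not man) = 1 - P(man)

print('P(man) =', p_man)
print('P(not man) =', p_not_man)
```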


**3.** The probability that Bernice will travel by plane sometime in the next year is 10%. The probability of a plane crash at any time is .005%. What is the probability that Bernice will be in a plane crash sometime in the next year?

**Solution:**

Here we are interested in the probability that Bernice travels and there is a crash. Treating the two events as independent, we multiply the two probabilities together to find the chance that Bernice will be in a plane crash.

In [3]:
p_travel = 0.10
p_crash = 0.00005
p_travel_and_crash = p_travel*p_crash

print('The probability that Bernice will be in a plane crash is {}.'.format(p_travel_and_crash))

The probability that Bernice will be in a plane crash is 5e-06.


**Correction**

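The reference answer to this question is 0.0005. I am not sure why this is the probability. One possible explanation is that it is expressed as a percentage: 10% of 0.005% is 0.0005%, which is the same quantity as the 5e-06 computed above.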
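To check whether the two figures agree once units are accounted for, the sketch below redoes the calculation starting from percentages and reports the result both as a fraction and as a percentage. Reading the reference answer as a percentage is an assumption, not something the assignment states.

```python
p_travel_pct = 10.0    # percent, from the problem statement
p_crash_pct = 0.005    # percent, from the problem statement

# Convert percentages to plain probabilities before multiplying.
p_travel = p_travel_pct / 100
p_crash = p_crash_pct / 100

p_both = p_travel * p_crash
p_both_pct = p_both * 100  # convert the result back to a percentage

print('As a fraction:   {:.7f}'.format(p_both))
print('As a percentage: {:.4f}%'.format(p_both_pct))
```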

**4.** A data scientist wants to study the behavior of users on the company website. Each time a user clicks on a link on the website, there is a 5% chance that the user will be asked to complete a short survey about their behavior on the website. The data scientist uses the survey data to conclude that, on average, users spend 15 minutes surfing the company website before moving on to other things. What is wrong with this conclusion?

**Solution:**

This conclusion is only based on the results of the 5% of people who actually complete the survey. The other 95% of users who did not complete the survey could exhibit completely different user behavior.

**Correction**

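The longer users surf the website, the more links they will click on, increasing the likelihood that they will be asked to fill out a survey. The survey therefore over-represents users with long sessions, so the 15-minute average is likely biased upward.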
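The size of this effect can be sketched directly. Assuming each click triggers the survey independently with probability 0.05, a user who clicks n links is asked at least once with probability 1 - 0.95^n. The click counts below are hypothetical values chosen for illustration.

```python
p_survey_per_click = 0.05  # from the problem statement


def p_surveyed_at_least_once(n_clicks, p=p_survey_per_click):
    """Chance that a user who clicks n_clicks links sees at least one survey."""
    return 1 - (1 - p) ** n_clicks


# Hypothetical click counts for a short, medium, and long session.
for n_clicks in (1, 10, 50):
    print(n_clicks, 'clicks:', round(p_surveyed_at_least_once(n_clicks), 3))
```

A user with a long session is many times more likely to end up in the survey data than a user who clicks once and leaves.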

### Drill set 2

Now it's time to use Bayes' rule to compute some conditional probabilities. First look over the numbers and estimate each of the four probabilities, using your intuition. Then, calculate the probabilities using Bayes' rule. Keep track of your work in a Google document or markdown file that you can share with your mentor.

A diagnostic test has a 98% probability of giving a positive result when applied to a person suffering from Thripshaw's Disease, and 10% probability of giving a (false) positive when applied to a non-sufferer. It is estimated that 0.5% of the population are sufferers. Suppose that the test is now administered to a person whose disease status is unknown. Calculate the probability that the test will:

1. Be positive
2. Correctly diagnose a sufferer of Thripshaw's
3. Correctly identify a non-sufferer of Thripshaw's
4. Misclassify the person

#### Intuition
First, use intuition to estimate the answers.

**1.** Be positive

Intuition: 0.98 * 0.005 = 0.0049 or 0.49%.

**2.** Correctly diagnose a sufferer of Thripshaw's

Intuition: 98%.

**3.** Correctly identify a non-sufferer of Thripshaw's

Intuition: 90%.

**4.** Misclassify the person.

Intuition: 10%.

#### Bayes' Formula
Now, use Bayes' formula to compute the answers.

First, list the probabilities given in the assignment and the probabilities that can be derived from them.

In [4]:
# A = have disease
# B = test positive
p_A = 0.005
p_not_A = 1 - p_A
p_B_given_A = 0.98
p_not_B_given_A = 1 - p_B_given_A
p_B_given_not_A = 0.10
p_not_B_given_not_A = 1 - p_B_given_not_A

print('P(A) = {}  [given]'.format(p_A))
print('P(~A) = {}  [derived]'.format(p_not_A))
print('P(B|A) = {} [given]'.format(p_B_given_A))
print('P(~B|A) = {}  [derived]'.format(p_not_B_given_A))
print('P(B|~A) = {}  [given]'.format(p_B_given_not_A))
print('P(~B|~A) = {}  [derived]'.format(p_not_B_given_not_A))

P(A) = 0.005  [given]
P(~A) = 0.995  [derived]
P(B|A) = 0.98 [given]
P(~B|A) = 0.020000000000000018  [derived]
P(B|~A) = 0.1  [given]
P(~B|~A) = 0.9  [derived]


**1.** Be positive

In [5]:
p_B = p_B_given_A*p_A + p_B_given_not_A*p_not_A
print('P(B) = {}'.format(p_B))

P(B) = 0.1044


**2.** Correctly diagnose a sufferer of Thripshaw's

In [6]:
p_A_given_B = (p_A*p_B_given_A)/p_B
print('P(A|B) = {}'.format(p_A_given_B))

P(A|B) = 0.046934865900383135
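A result this far from intuition is easier to trust when it is rebuilt from whole-number counts. The sketch below applies the given rates to a hypothetical population of 100,000 people (the population size is an assumption chosen to give round numbers) and recovers the same P(B) and P(A|B).

```python
population = 100_000  # hypothetical population size, chosen for round numbers

sufferers = population * 5 // 1000           # 0.5% have the disease
non_sufferers = population - sufferers

true_positives = sufferers * 98 // 100       # 98% of sufferers test positive
false_positives = non_sufferers * 10 // 100  # 10% of non-sufferers test positive

all_positives = true_positives + false_positives

print('Sufferers who test positive:    ', true_positives)
print('Non-sufferers who test positive:', false_positives)
print('P(B) =', all_positives / population)
print('P(A|B) =', true_positives / all_positives)
```

Because non-sufferers vastly outnumber sufferers, false positives dominate the pool of positive tests, which is why P(A|B) is so much smaller than P(B|A).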


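**Correction:**

The question asks for the probability that the test is positive given that the person is a sufferer, P(B|A), which is given as 0.98. The value computed in cell 6, P(A|B), is a different quantity: the probability that a person who tests positive actually has the disease.

In [7]:
print('P(B|A) = {}'.format(p_B_given_A))

P(B|A) = 0.98


**3.** Correctly identify a non-sufferer of Thripshaw's

In [8]:
print('P(~B|~A) = {}'.format(p_not_B_given_not_A))

P(~B|~A) = 0.9


**4.** Misclassify the person.

In [9]:
p_misclassified = p_B_given_not_A + p_not_B_given_A
print('P(misclassified) = {}'.format(p_misclassified))

P(misclassified) = 0.12000000000000002


**Correction:**

The two conditional error rates cannot simply be added. Each must be weighted by the share of the population it applies to. Equivalently, subtract the probability of a correct classification from 1.

In [10]:
p_misclassified = 1 - (p_B_given_A*p_A + p_not_B_given_not_A*p_not_A)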
print('P(misclassified) = {}'.format(p_misclassified))

P(misclassified) = 0.09960000000000002
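As a cross-check, the same figure can be obtained by listing the joint probability of each of the four possible outcomes and adding the two error cases. This is a sketch under the problem's stated rates; the dictionary layout and outcome labels are illustrative.

```python
p_A = 0.005              # P(disease), given
p_B_given_A = 0.98       # P(positive | disease), given
p_B_given_not_A = 0.10   # P(positive | no disease), given

# Joint probability of each (disease status, test result) combination.
joint = {
    'true positive': p_A * p_B_given_A,
    'false negative': p_A * (1 - p_B_given_A),
    'false positive': (1 - p_A) * p_B_given_not_A,
    'true negative': (1 - p_A) * (1 - p_B_given_not_A),
}

for outcome, prob in joint.items():
    print('{:<15} {:.4f}'.format(outcome, prob))

# A misclassification is either a false negative or a false positive.
p_misclassified = joint['false negative'] + joint['false positive']
print('P(misclassified) = {:.4f}'.format(p_misclassified))
```

Almost all of the misclassification probability comes from false positives, because non-sufferers make up 99.5% of the population.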
