Essential Math for Data Science
---
#### Chapter 2: Probability
___

##### Joint Probability

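For independent events: $P(A \text{ AND } B) = P(A) \times P(B)$

When B depends on A (for example, when drawing without replacement): $P(A \text{ AND } B) = P(A) \times P(B|A)$, where $P(B|A)$ is the probability of B given that A has occurred.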

Exercise:

You draw two cards from a standard 52-card deck without replacement. Let:

- Event A: Drawing an Ace first.
- Event B: Drawing a King second.
1. Calculate the joint probability of both events A and B occurring.
2. How does the probability change if the cards are drawn with replacement?

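In [7]:
total_cards = 52
event_a = 4 / total_cards  # P(A): an Ace on the first draw

# Without replacement: 51 cards remain after the Ace, 4 of them Kings
p_without_replacement = event_a * (4 / (total_cards - 1))
# With replacement: the Ace goes back, so the second draw is independent of the first
p_with_replacement = event_a * (4 / total_cards)

print("Probability without replacement: %.5f" % p_without_replacement)
print("Probability with replacement: %.5f" % p_with_replacement)

Probability without replacement: 0.00603
Probability with replacement: 0.00592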
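
As a cross-check on the floating-point results, the same two cases can be worked with exact fractions. This is only a minimal sketch using Python's standard `fractions` module; the variable names are illustrative and do not come from the book.

```python
from fractions import Fraction

# P(A): an Ace on the first draw
p_ace_first = Fraction(4, 52)

# Without replacement the deck has 51 cards left, still holding all 4 Kings,
# so P(B|A) = 4/51 and P(A and B) = P(A) * P(B|A).
p_without_replacement = p_ace_first * Fraction(4, 51)

# With replacement the second draw sees the full deck again,
# so the draws are independent and P(A and B) = P(A) * P(B).
p_with_replacement = p_ace_first * Fraction(4, 52)

print("Without replacement:", p_without_replacement)  # 4/663
print("With replacement:   ", p_with_replacement)     # 1/169
```

Keeping the values as fractions makes it easy to see that removing the Ace shrinks the denominator of the second factor from 52 to 51, which is why the probability is slightly higher without replacement.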


##### Union Probability

The probability of event A or event B occurring.

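`Sum rule of probability:` $P(A \text{ OR } B) = P(A) + P(B) - P(A \text{ AND } B)$

The joint probability is subtracted so that outcomes belonging to both events are not counted twice. For independent events, $P(A \text{ AND } B) = P(A) \times P(B)$.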

Exercise:

You draw a single card from a standard 52-card deck. Let:

- Event A: Drawing a King.
- Event B: Drawing a Heart.
1. Calculate the probability of drawing a card that is either a King or a Heart (Event A or Event B).

In [13]:
import math

event_a = 4/52   # P(A): drawing a King
event_b = 13/52  # P(B): drawing a Heart

# Exactly one card is both a King and a Heart, so P(A and B) = 1/52.
p_a_and_b = 1/52
# For a single draw, rank and suit are independent, so P(A) * P(B) gives the same value.
print(math.isclose(event_a * event_b, p_a_and_b))

probability = event_a + event_b - p_a_and_b
print("The probability of either event happening is: %.5f"% probability)

True
The probability of either event happening is: 0.30769
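
One way to sanity-check the sum rule is to enumerate the deck and count the cards directly. The sketch below assumes a simple `(rank, suit)` tuple representation of the deck; that representation and the variable names are illustrative choices, not something taken from the book.

```python
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) tuples

kings = {card for card in deck if card[0] == "K"}
hearts = {card for card in deck if card[1] == "Hearts"}

# Direct count: cards that are a King, a Heart, or both
p_direct = len(kings | hearts) / len(deck)

# Sum rule: P(A) + P(B) - P(A and B); the intersection is the King of Hearts
p_sum_rule = (len(kings) + len(hearts) - len(kings & hearts)) / len(deck)

print(f"{p_direct:.5f} {p_sum_rule:.5f}")  # 0.30769 0.30769
```

The set union counts the King of Hearts once, which is exactly what subtracting the intersection achieves in the formula.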


##### Conditional Probability and Bayes’ Theorem

___
###### Conditional Probability

Conditional probability is the probability of event A occurring given that event B has occurred.

It is written $P(A \text{ GIVEN } B)$ or $P(A|B)$.
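
A conditional probability can be computed from the standard definition $P(B|A) = \frac{P(A \text{ AND } B)}{P(A)}$. As a rough sketch, the code below revisits the two-card draw without replacement, enumerates every ordered pair of distinct cards, and computes the probability of a King second given an Ace first. The deck representation and all variable names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = list(product(ranks, suits))

# Every ordered pair of distinct cards models two draws without replacement.
draws = list(permutations(deck, 2))

# Event A: Ace first. Event A and B: Ace first, then King second.
ace_first = [d for d in draws if d[0][0] == "A"]
ace_then_king = [d for d in ace_first if d[1][0] == "K"]

p_a = Fraction(len(ace_first), len(draws))
p_a_and_b = Fraction(len(ace_then_king), len(draws))

# Definition of conditional probability: P(B|A) = P(A and B) / P(A)
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)  # 4/51
```

The result matches the intuition that, once an Ace is removed, 4 of the remaining 51 cards are Kings.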


___
###### Bayes' Theorem

$P(A|B)=$ $\frac{P(B|A)P(A)}{P(B)}$

Exercise:

Suppose there is a disease that affects 1% of the population. A test for this disease is 99% accurate, meaning:

- If a person has the disease, the test will correctly identify it 99% of the time (true positive rate).
- If a person does not have the disease, the test will correctly identify this 99% of the time (true negative rate).

However, we want to find the probability that a person actually has the disease given that they tested positive.

In [14]:
# Given probabilities
P_A = 0.01  # Probability of having the disease
P_not_A = 0.99  # Probability of not having the disease
P_B_given_A = 0.99  # True positive rate
P_B_given_not_A = 0.01  # False positive rate

# Calculate P(B)
P_B = (P_B_given_A * P_A) + (P_B_given_not_A * P_not_A)

# Calculate P(A|B) using Bayes' Theorem
P_A_given_B = (P_B_given_A * P_A) / P_B

print(f"The probability that a person has the disease given that they tested positive is: {P_A_given_B:.2f}")

The probability that a person has the disease given that they tested positive is: 0.50
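
The 50% result can feel surprising, so it may help to redo the calculation with whole-number counts. The sketch below assumes a hypothetical population of 10,000 people (a number chosen only for convenience) and applies the same 1% prevalence and 99% accuracy figures from the exercise.

```python
population = 10_000  # hypothetical population size, chosen for round numbers

sick = population * 1 // 100     # 1% have the disease -> 100 people
healthy = population - sick      # 9,900 people

true_positives = sick * 99 // 100     # 99% of the sick test positive -> 99
false_positives = healthy * 1 // 100  # 1% of the healthy test positive -> 99

# P(disease | positive) = true positives / all positives
p_disease_given_positive = true_positives / (true_positives + false_positives)
print(f"{p_disease_given_positive:.2f}")  # 0.50
```

Because the disease is rare, the small false positive rate applied to the large healthy group produces as many positives as the high true positive rate applied to the small sick group.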


___
###### Binomial Distribution

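Using SciPy's `binom.pmf` to compute the binomial distribution: the probability of exactly k successes in n = 10 independent trials, each with success probability p = 0.8.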

In [22]:
from scipy.stats import binom

n = 10
p = 0.8

for k in range(n + 1):
    probability = binom.pmf(k, n, p)
    print("{0} - {1:.3f}".format(k, probability))


0 - 0.000
1 - 0.000
2 - 0.000
3 - 0.001
4 - 0.006
5 - 0.026
6 - 0.088
7 - 0.201
8 - 0.302
9 - 0.268
10 - 0.107
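
For reference, the binomial probability mass function is $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$. The following is a minimal sketch that evaluates this formula with only the standard library; `binomial_pmf` is a hypothetical helper written for illustration, not part of SciPy, and it should agree with the table above up to rounding.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 10
p = 0.8

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
for k, probability in enumerate(pmf):
    print("{0} - {1:.3f}".format(k, probability))

# The probabilities over all possible k should sum to 1 (up to float error).
print("total: {0:.3f}".format(sum(pmf)))
```

Writing the formula out makes the three ingredients visible: the number of ways to place k successes, the probability of those successes, and the probability of the remaining failures.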
