# Statistics & Probability



---

### **Basic Descriptive Statistics**
1. Write a function to compute the mean of a dataset.  
2. Compute the median of a dataset and verify its robustness to outliers.  
3. Calculate the mode of a dataset.  
4. Implement a function to compute the variance and standard deviation of a dataset.  
5. Write a function to calculate the interquartile range (IQR) of a dataset.  
6. Detect outliers in a dataset using the IQR method.  
7. Compute the correlation coefficient between two variables.  
8. Visualize a dataset using a histogram and calculate skewness.  
9. Write a function to compute the covariance matrix of a multivariate dataset.  
10. Standardize a dataset to have a mean of 0 and a variance of 1.

---

### **Probability Basics**
11. Simulate rolling a fair six-sided die and compute probabilities of outcomes.  
12. Implement a function to calculate the probability of an event using relative frequency.  
13. Simulate flipping a biased coin and estimate its probability of heads.  
14. Write a function to calculate the complement of an event.  
15. Use Python to compute conditional probability $ P(A|B) $ given sample data.  
16. Verify the Law of Total Probability using simulated data.  
17. Simulate and calculate the probability of drawing specific cards from a deck.  
18. Compute joint probabilities for two events using Python.  
19. Verify Bayes’ Theorem with a real-world example.  
20. Visualize a probability distribution (e.g., uniform, normal).

---

### **Discrete and Continuous Random Variables**
21. Generate a binomial random variable and compute its mean and variance.  
22. Simulate a Poisson process and calculate probabilities of specific events.  
23. Implement and visualize the probability mass function (PMF) of a discrete random variable.  
24. Generate a normal random variable and compute probabilities for specific intervals.  
25. Compute the cumulative distribution function (CDF) of a normal distribution.  
26. Use the Central Limit Theorem to approximate the sum of random variables.  
27. Implement and visualize the probability density function (PDF) of a normal distribution.  
28. Simulate and compute probabilities for an exponential distribution.  
29. Fit a normal distribution to a given dataset and estimate its parameters.  
30. Compare the behavior of discrete vs. continuous random variables.

---

### **Sampling and Estimation**
31. Implement simple random sampling on a dataset.  
32. Write a function to compute the sample mean and sample variance.  
33. Simulate and analyze sampling distributions of the mean.  
34. Perform stratified sampling on a dataset.  
35. Estimate population parameters using maximum likelihood estimation (MLE).  
36. Simulate bootstrap sampling to compute confidence intervals.  
37. Compute the bias and variance of an estimator using simulation.  
38. Use Python to verify the Law of Large Numbers.  
39. Simulate and compute the impact of sample size on estimation accuracy.  
40. Write a function to calculate standard error for a sample mean.

---

### **Hypothesis Testing**
41. Perform a one-sample $ t $-test to check if a sample mean differs from a known value.  
42. Perform a two-sample $ t $-test to compare the means of two datasets.  
43. Implement and interpret a chi-square test for independence.  
44. Conduct an ANOVA test to compare means of multiple groups.  
45. Perform a permutation test for a given hypothesis.  
46. Implement and interpret a Mann-Whitney U test for non-parametric data.  
47. Simulate Type I and Type II errors for hypothesis tests.  
48. Write a function to compute p-values from test statistics.  
49. Visualize the rejection region of a hypothesis test.  
50. Perform a hypothesis test to determine if a dataset follows a normal distribution.

---


In [95]:
import statistics as stats
import numpy as np
import random
from collections import Counter

## -- Basic Descriptive --

In [None]:
# Compute the mean of dataset

data = np.array([12,4,5,6,7,8,2323])
print(np.mean(data))
print(stats.mean(data))


In [None]:
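# Calculate the median of a dataset and verify its robustness to outliers

# The median is robust to outliers because it depends only on the middle value(s) of the sorted data, not on the magnitude of every data point, so extreme values have little to no effect on it.

data = [1, 2, 3, 4, 100]
dataWithoutOutlier = [1, 2, 3, 4]

print(np.median(data))
print(stats.median(data))

# Dropping the outlier (100) barely moves the median but shifts the mean a lot
print("Median without outlier:", stats.median(dataWithoutOutlier))
print("Mean with / without outlier:", stats.mean(data), stats.mean(dataWithoutOutlier))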

In [None]:
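# Mode of a dataset

data = np.array([12,4,54,5,6,67,8,341])

# Every value here occurs exactly once, so each value is a mode;
# stats.mode (Python 3.8+) returns the first one encountered.
# Use stats.multimode(data) to list all modes.
print(stats.mode(data))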

In [None]:
# compute the variance and standard deviation

data = [2, 4, 6, 8, 10]

# Population variance and standard deviation
populationVariance = stats.pvariance(data)
populationStandardDeviation = stats.pstdev(data)

# Sample variance and standard deviation
sampleVariance = stats.variance(data)
sampleStandardDeviation = stats.stdev(data)

print("Population Variance:", populationVariance)
print("Population Standard Deviation:", populationStandardDeviation)
print("Sample Variance:", sampleVariance)
print("Sample Standard Deviation:", sampleStandardDeviation)

The **Interquartile Range (IQR)** is a measure of statistical dispersion that represents the range between the first quartile (Q1) and the third quartile (Q3). It is used to describe the middle 50% of the data and is robust to outliers.

---

### Steps to Calculate the Inter-quartile Range (IQR):
1. **Sort the dataset** in ascending order.
2. **Find the median** of the dataset. This is the **second quartile (Q2)**.
3. **Find the first quartile (Q1)**:
   - This is the median of the lower half of the data (values below Q2).
4. **Find the third quartile (Q3)**:
   - This is the median of the upper half of the data (values above Q2).
5. **Calculate the IQR**:
   $
   \text{IQR} = Q3 - Q1
   $

---

### Example:
Dataset: $\{3, 7, 8, 5, 12, 14, 21, 13, 18\}$

1. **Sort the data**:
   $
   \{3, 5, 7, 8, 12, 13, 14, 18, 21\}
   $

2. **Find the median (Q2)**:
   - There are 9 values, so the median is the 5th value:  
   $
   Q2 = 12
   $

3. **Find the first quartile (Q1)**:
   - Lower half of the data: $\{3, 5, 7, 8\}$  
   - Median of the lower half:  
     $
     Q1 = \frac{5 + 7}{2} = 6
     $

4. **Find the third quartile (Q3)**:
   - Upper half of the data: $\{13, 14, 18, 21\}$  
   - Median of the upper half:  
     $
     Q3 = \frac{14 + 18}{2} = 16
     $

5. **Calculate the IQR**:
   $
   \text{IQR} = Q3 - Q1 = 16 - 6 = 10
   $

So, the inter-quartile range is **10**.
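
The median-of-halves procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the helper name `iqr_median_of_halves` is made up for this example, and it assumes the convention used in the worked example (the median itself is excluded from both halves when the number of values is odd).

```python
import statistics


def iqr_median_of_halves(values):
    """Return (Q1, Q3, IQR) using the median-of-halves method (hypothetical helper)."""
    if len(values) < 2:
        raise ValueError("need at least two values to form two halves")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return q1, q3, q3 - q1


data = [3, 7, 8, 5, 12, 14, 21, 13, 18]
q1, q3, iqr = iqr_median_of_halves(data)
print(q1, q3, iqr)  # 6.0 16.0 10.0
```

Library routines may use a different quartile convention. `np.percentile`, for instance, interpolates linearly by default and returns Q1 = 7 and Q3 = 14 (IQR = 7) for this dataset, so it is worth checking which convention a result is based on.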



In [None]:
# Calculate the inter-quartile range of data set

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]

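# Calculate Q1 (25th percentile) and Q3 (75th percentile)
# Note: np.percentile interpolates linearly by default, so for this dataset it gives
# Q1 = 7 and Q3 = 14 (IQR = 7), not the median-of-halves values 6 and 16 from the worked example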
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)

# Calculate IQR
IQR = Q3 - Q1

print("Q1:", Q1)
print("Q3:", Q3)
print("IQR:", IQR)

In [None]:
# detect Outliers using IQR
# The IQR measures the spread of the middle 50% of the data, and outliers are defined as data points that fall significantly below or above the "fences" calculated using the IQR.



data = [1, 3, 5, 7, 9, 11, 13, 15, 17, 50]

# Calculate Q1, Q3, and IQR
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1

# Define fences
lowerFence = Q1 - 1.5 * IQR
upperFence = Q3 + 1.5 * IQR

# Detect outliers
outliers = [x for x in data if x < lowerFence or x > upperFence]

print("Q1:", Q1)
print("Q3:", Q3)
print("IQR:", IQR)
print("Lower Fence:", lowerFence)
print("Upper Fence:", upperFence)
print("Outliers:", outliers)

# The IQR method is robust for detecting outliers in skewed datasets.

The **correlation coefficient** measures the strength and direction of the linear relationship between two variables. The most common measure is the **Pearson correlation coefficient**, which ranges from **-1 to 1**:
- **1**: Perfect positive linear relationship,
- **-1**: Perfect negative linear relationship,
- **0**: No linear relationship.

---

### Formula for Pearson Correlation Coefficient:
The Pearson correlation coefficient ($ r $) between two variables $ X $ and $ Y $ is calculated as:

$
r = \frac{\sum{(x_i - \bar{x})(y_i - \bar{y})}}{\sqrt{\sum{(x_i - \bar{x})^2} \sum{(y_i - \bar{y})^2}}}
$

Where:
- $ x_i $ and $ y_i $ are individual data points,
- $ \bar{x} $ and $ \bar{y} $ are the means of $ X $ and $ Y $, respectively.

---

### Steps to Compute the Correlation Coefficient:
1. **Calculate the mean** of $ X $ ($ \bar{x} $) and the mean of $ Y $ ($ \bar{y} $).
2. **Compute the deviations** from the mean for each data point:
   - $ (x_i - \bar{x}) $ and $ (y_i - \bar{y}) $.
3. **Multiply the deviations** for each pair of data points:
   - $ (x_i - \bar{x})(y_i - \bar{y}) $.
4. **Sum the products** of deviations:
   - $ \sum{(x_i - \bar{x})(y_i - \bar{y})} $.
5. **Square the deviations** for $ X $ and $ Y $, then sum them:
   - $ \sum{(x_i - \bar{x})^2} $ and $ \sum{(y_i - \bar{y})^2} $.
6. **Divide the sum of products** by the square root of the product of the sums of squared deviations:
   - $ r = \frac{\sum{(x_i - \bar{x})(y_i - \bar{y})}}{\sqrt{\sum{(x_i - \bar{x})^2} \sum{(y_i - \bar{y})^2}}} $.

---

### Example:
Let’s calculate the correlation coefficient between $ X $ and $ Y $:

| $ X $ | $ Y $ |
|--------|--------|
| 1      | 2      |
| 2      | 4      |
| 3      | 5      |
| 4      | 4      |
| 5      | 5      |

1. **Calculate the means**:
   - $ \bar{x} = \frac{1 + 2 + 3 + 4 + 5}{5} = 3 $,
   - $ \bar{y} = \frac{2 + 4 + 5 + 4 + 5}{5} = 4 $.

2. **Compute deviations and their products**:

   | $ X $ | $ Y $ | $ x_i - \bar{x} $ | $ y_i - \bar{y} $ | $ (x_i - \bar{x})(y_i - \bar{y}) $ | $ (x_i - \bar{x})^2 $ | $ (y_i - \bar{y})^2 $ |
   |--------|--------|---------------------|---------------------|--------------------------------------|-------------------------|-------------------------|
   | 1      | 2      | -2                  | -2                  | 4                                    | 4                       | 4                       |
   | 2      | 4      | -1                  | 0                   | 0                                    | 1                       | 0                       |
   | 3      | 5      | 0                   | 1                   | 0                                    | 0                       | 1                       |
   | 4      | 4      | 1                   | 0                   | 0                                    | 1                       | 0                       |
   | 5      | 5      | 2                   | 1                   | 2                                    | 4                       | 1                       |

3. **Sum the columns**:
   - $ \sum{(x_i - \bar{x})(y_i - \bar{y})} = 4 + 0 + 0 + 0 + 2 = 6 $,
   - $ \sum{(x_i - \bar{x})^2} = 4 + 1 + 0 + 1 + 4 = 10 $,
   - $ \sum{(y_i - \bar{y})^2} = 4 + 0 + 1 + 0 + 1 = 6 $.

4. **Calculate the correlation coefficient**:
   $
   r = \frac{6}{\sqrt{10 \times 6}} = \frac{6}{\sqrt{60}} = \frac{6}{7.746} \approx 0.775
   $

So, the correlation coefficient is approximately **0.775**, indicating a **strong positive linear relationship**.
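
The hand calculation can be reproduced directly from the deviation formula. The following is a small sketch, assuming two equal-length sequences with at least two points and non-constant values; the function name `pearson_r` is chosen for this illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed from the deviation formula (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]   # deviations of X from its mean
    dy = [y - mean_y for y in ys]   # deviations of Y from its mean
    sum_products = sum(a * b for a, b in zip(dx, dy))
    sum_sq_x = sum(a * a for a in dx)
    sum_sq_y = sum(b * b for b in dy)
    # Raises ZeroDivisionError if either variable is constant (r is undefined there)
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)


X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
print(round(pearson_r(X, Y), 3))  # 0.775
```

The intermediate sums (6, 10, and 6) match the column totals in the table, which makes the function convenient for checking a manual calculation step by step.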

---


### Key Takeaways:
- The correlation coefficient ($ r $) measures the **linear relationship** between two variables.
- It ranges from **-1 to 1**, where:
  - $ r = 1 $: Perfect positive correlation,
  - $ r = -1 $: Perfect negative correlation,
  - $ r = 0 $: No correlation.
- Use Python libraries like `numpy`, `pandas`, or `scipy` for quick calculations.


In [None]:
# Correlation coefficient between two variables 
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

r = np.corrcoef(X, Y)[0, 1]
print("Correlation Coefficient (r):", r)

The **covariance matrix** is a square matrix that summarizes the variances and covariances of a **multivariate dataset**. It is a key concept in statistics and machine learning, especially in dimensionality reduction techniques like Principal Component Analysis (PCA).

---

### What is a Covariance Matrix?
For a dataset with $ p $ variables (features), the covariance matrix is a $ p \times p $ matrix where:
- The **diagonal elements** represent the **variances** of each variable.
- The **off-diagonal elements** represent the **covariances** between pairs of variables.

---

### Formula for Covariance Matrix:
Given a dataset with $ p $ variables and $ n $ observations, the covariance matrix $ \Sigma $ is calculated as:

$
\Sigma = \frac{1}{n-1} \cdot (X - \bar{X})^T (X - \bar{X})
$

Where:
- $ X $ is the $ n \times p $ data matrix (each row is an observation, each column is a variable),
- $ \bar{X} $ is the $ 1 \times p $ vector of means for each variable,
- $ (X - \bar{X}) $ is the mean-centered data matrix,
- $ ^T $ denotes the transpose of a matrix.

---

### Steps to Compute the Covariance Matrix:
1. **Center the data** by subtracting the mean of each variable.
2. **Compute the product** of the transposed centered data matrix and the centered data matrix, $ (X - \bar{X})^T (X - \bar{X}) $.
3. **Divide by $ n-1 $** (for sample covariance) or $ n $ (for population covariance).

---

### Example:
Consider a dataset with 3 variables ($ X_1, X_2, X_3 $) and 4 observations:

| $ X_1 $ | $ X_2 $ | $ X_3 $ |
|----------|----------|----------|
| 1        | 2        | 3        |
| 4        | 5        | 6        |
| 7        | 8        | 9        |
| 10       | 11       | 12       |

1. **Compute the mean of each variable**:
   - $ \bar{X_1} = \frac{1 + 4 + 7 + 10}{4} = 5.5 $,
   - $ \bar{X_2} = \frac{2 + 5 + 8 + 11}{4} = 6.5 $,
   - $ \bar{X_3} = \frac{3 + 6 + 9 + 12}{4} = 7.5 $.

2. **Center the data** by subtracting the means:

   | $ X_1 - \bar{X_1} $ | $ X_2 - \bar{X_2} $ | $ X_3 - \bar{X_3} $ |
   |-----------------------|-----------------------|-----------------------|
   | -4.5                  | -4.5                  | -4.5                  |
   | -1.5                  | -1.5                  | -1.5                  |
   | 1.5                   | 1.5                   | 1.5                   |
   | 4.5                   | 4.5                   | 4.5                   |

3. **Compute the product of the transposed centered data matrix and the centered data matrix**:
   - Let $ A = X - \bar{X} $. Then:
     $
     A^T A = \begin{bmatrix}
     -4.5 & -1.5 & 1.5 & 4.5 \\
     -4.5 & -1.5 & 1.5 & 4.5 \\
     -4.5 & -1.5 & 1.5 & 4.5
     \end{bmatrix}
     \begin{bmatrix}
     -4.5 & -4.5 & -4.5 \\
     -1.5 & -1.5 & -1.5 \\
     1.5 & 1.5 & 1.5 \\
     4.5 & 4.5 & 4.5
     \end{bmatrix}
     $
   - The result is a $ 3 \times 3 $ matrix in which every entry equals $ (-4.5)^2 + (-1.5)^2 + 1.5^2 + 4.5^2 = 45 $.

4. **Divide by $ n-1 $** (for sample covariance):
   $
   \Sigma = \frac{1}{4-1} \cdot A^T A = \begin{bmatrix} 15 & 15 & 15 \\ 15 & 15 & 15 \\ 15 & 15 & 15 \end{bmatrix}
   $

---

### Key Takeaways:
- The **covariance matrix** summarizes the relationships between variables in a multivariate dataset.
- Diagonal elements represent **variances**, and off-diagonal elements represent **covariances**.
- Use Python libraries like `numpy` or `pandas` for efficient computation.
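
As a cross-check of the formula, the covariance matrix can be built by hand from the centered data and compared with `np.cov`. This is a sketch under the same layout assumption as above (rows are observations, columns are variables); the function name `covariance_matrix` and its `ddof` parameter are illustrative choices, not part of any library.

```python
import numpy as np


def covariance_matrix(X, ddof=1):
    """Covariance matrix from the centered data; ddof=1 is sample, ddof=0 is population."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]                      # number of observations (rows)
    centered = X - X.mean(axis=0)       # subtract each column's mean
    return centered.T @ centered / (n - ddof)


X = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
])

sigma = covariance_matrix(X)
print(sigma)                                         # every entry is 15.0
print(np.allclose(sigma, np.cov(X, rowvar=False)))   # True
```

Every entry is 15 here because the three columns differ only by a constant shift, so they share the same variance and are perfectly correlated.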


In [None]:
# Compute the covariance matrix of a multivariate dataset

# Define the dataset
X = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])

# Compute the covariance matrix
covarianceMatrix = np.cov(X, rowvar=False)  # rowvar=False means columns are variables
print("Covariance Matrix:")
print(covarianceMatrix)

Standardizing a dataset to have a **mean of 0** and a **variance of 1** is a common preprocessing step in data analysis and machine learning. This process is also known as **z-score normalization**. Standardization ensures that all features are on the same scale, which is particularly important for algorithms that are sensitive to the magnitude of features (e.g., PCA, k-means, SVM).

---

### Steps to Standardize a Dataset:
1. **Compute the mean** of each feature (column) in the dataset.
2. **Compute the standard deviation** of each feature.
3. **Standardize each value** using the formula:
   $
   z = \frac{x - \mu}{\sigma}
   $
   Where:
   - $ x $ is the original value,
   - $ \mu $ is the mean of the feature,
   - $ \sigma $ is the standard deviation of the feature.

---

### Example:
Consider the following dataset with 2 features ($ X_1 $ and $ X_2 $):

| $ X_1 $ | $ X_2 $ |
|----------|----------|
| 1        | 2        |
| 2        | 3        |
| 3        | 4        |
| 4        | 5        |

1. **Compute the mean** of each feature:
   - $ \mu_{X_1} = \frac{1 + 2 + 3 + 4}{4} = 2.5 $,
   - $ \mu_{X_2} = \frac{2 + 3 + 4 + 5}{4} = 3.5 $.

2. **Compute the standard deviation** of each feature:
   - For $ X_1 $:
     $
     \sigma_{X_1} = \sqrt{\frac{(1-2.5)^2 + (2-2.5)^2 + (3-2.5)^2 + (4-2.5)^2}{4}} = \sqrt{\frac{2.25 + 0.25 + 0.25 + 2.25}{4}} = \sqrt{1.25} \approx 1.118
     $
   - For $ X_2 $:
     $
     \sigma_{X_2} = \sqrt{\frac{(2-3.5)^2 + (3-3.5)^2 + (4-3.5)^2 + (5-3.5)^2}{4}} = \sqrt{\frac{2.25 + 0.25 + 0.25 + 2.25}{4}} = \sqrt{1.25} \approx 1.118
     $

3. **Standardize each value**:
   - For $ X_1 $:
     $
     z_{X_1} = \frac{x - \mu_{X_1}}{\sigma_{X_1}}
     $
   - For $ X_2 $:
     $
     z_{X_2} = \frac{x - \mu_{X_2}}{\sigma_{X_2}}
     $

   Applying this to each value:

   | $ X_1 $ | $ X_2 $ | $ z_{X_1} $ | $ z_{X_2} $ |
   |----------|----------|---------------|---------------|
   | 1        | 2        | $\frac{1-2.5}{1.118} \approx -1.34$ | $\frac{2-3.5}{1.118} \approx -1.34$ |
   | 2        | 3        | $\frac{2-2.5}{1.118} \approx -0.45$ | $\frac{3-3.5}{1.118} \approx -0.45$ |
   | 3        | 4        | $\frac{3-2.5}{1.118} \approx 0.45$ | $\frac{4-3.5}{1.118} \approx 0.45$ |
   | 4        | 5        | $\frac{4-2.5}{1.118} \approx 1.34$ | $\frac{5-3.5}{1.118} \approx 1.34$ |

   The standardized dataset is:

   | $ z_{X_1} $ | $ z_{X_2} $ |
   |---------------|---------------|
   | -1.34         | -1.34         |
   | -0.45         | -0.45         |
   | 0.45          | 0.45          |
   | 1.34          | 1.34          |

---


### Key Takeaways:
- Standardization transforms the data to have a **mean of 0** and a **standard deviation of 1**.
- It is essential for algorithms that are sensitive to feature scales.
- Use Python libraries like `scikit-learn` or `numpy` for efficient standardization.
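
A quick way to confirm that standardization worked is to check the column means and standard deviations afterwards. The sketch below assumes the population standard deviation (`ddof=0`), which is what the worked example uses; the helper name `standardize` is introduced here for illustration.

```python
import numpy as np


def standardize(X, ddof=0):
    """Column-wise z-scores; ddof=0 divides by n, matching the worked example."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=ddof)
    if np.any(std == 0):
        raise ValueError("cannot standardize a constant column")
    return (X - mean) / std


X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
Z = standardize(X)

print(np.round(Z, 2))
print(np.allclose(Z.mean(axis=0), 0.0))  # True: each column has mean 0
print(np.allclose(Z.std(axis=0), 1.0))   # True: each column has (population) std 1
```

If the sample standard deviation (`ddof=1`) is used instead, the z-scores shrink slightly and the population standard deviation of the result is no longer exactly 1, so the choice should be made deliberately and applied consistently.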


In [None]:
# standardize a dataset to have a mean of 0 and variance of 1

# Define the dataset
X = np.array([
    [1, 2],
    [2, 3],
    [3, 4],
    [4, 5]
])

# Compute mean and standard deviation
mean = np.mean(X, axis=0)
std = np.std(X, axis=0)

# Standardize the data
xStandardized = (X - mean) / std

print("Standardized Dataset:")
print(xStandardized)

## -- Probability Basics --

In [None]:
# Simulate rolling a fair six-sided die and compute the probabilities of the outcomes


def rollDice() -> int:
    return random.randint(1, 6)


def simulateRolls(rolls=10) -> list:
    return [rollDice() for _ in range(rolls)]


total: list = simulateRolls()
print("TotalRolls:", total)


def largeRollSimulation(rolls=1000):
    rolling: list = simulateRolls(rolls=rolls)
    counts: Counter = Counter(rolling)
    # Compute probabilities as relative frequencies (counts / number of rolls)
    probabilities: dict = {face: count / rolls for face, count in sorted(counts.items())}
    print("Outcome Counts:", counts)
    print("Probabilities:", probabilities)


largeRollSimulation()




### Formula for Relative Frequency:
The probability of an event $ A $ using relative frequency is given by:

$
P(A) = \frac{\text{Number of times event } A \text{ occurs}}{\text{Total number of trials}}
$

---

### Steps to Calculate Relative Frequency:
1. **Perform the experiment** or simulation multiple times (e.g., roll a die, flip a coin).
2. **Count the number of times** the event of interest occurs.
3. **Divide by the total number of trials** to get the relative frequency.


---

### Key Takeaways:
- Relative frequency is an empirical way to estimate probabilities based on observed data.
- It is particularly useful when theoretical probabilities are unknown or difficult to compute.
- Use Python to simulate experiments and calculate relative frequencies efficiently.
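
The three steps can be wrapped in one reusable function. The following sketch is one possible shape for it: `relative_frequency`, `roll_die`, and `is_four` are names invented for this example, and a seeded `random.Random` instance is assumed so that reruns give the same estimates.

```python
import random


def relative_frequency(event, experiment, trials, rng):
    """Estimate P(event) as (number of trials where event occurs) / (total trials)."""
    hits = sum(1 for _ in range(trials) if event(experiment(rng)))
    return hits / trials


def roll_die(rng):
    return rng.randint(1, 6)


def is_four(outcome):
    return outcome == 4


rng = random.Random(0)  # fixed seed for reproducibility
for trials in (100, 10_000, 100_000):
    estimate = relative_frequency(is_four, roll_die, trials, rng)
    print(trials, round(estimate, 4))
```

As the number of trials grows, the estimate should settle near the theoretical value $ \frac{1}{6} \approx 0.1667 $, although any single run can deviate, especially for small trial counts.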


In [None]:

# Simulate rolling a die
def rollDie():
    return random.randint(1, 6)

# Simulate multiple die rolls
def simulateRolls(num_rolls):
    return [rollDie() for _ in range(num_rolls)]

# Number of trials
numRolls = 1000
rolls = simulateRolls(numRolls)

# Count the number of times a 4 appears
eventCount = rolls.count(4)

# Calculate relative frequency
relativeFrequency = eventCount / numRolls

print("Number of times 4 appears:", eventCount)
print("Relative Frequency of rolling a 4:", relativeFrequency)

In [None]:
# Simulate flipping a biased coin and estimate its probability of heads

# Define the bias (probability of heads)
p = 0.7  # Example: 70% chance of heads

# Simulate a single biased coin flip
def biasedCoinFlip(p):
    return "Heads" if random.random() < p else "Tails"

# Simulate multiple biased coin flips
def simulateFlips(nFlips, p):
    return [biasedCoinFlip(p) for _ in range(nFlips)]

def probabilityHead(nFlips: int = 1000):
    # Simulate the flips
    flips: list[str] = simulateFlips(nFlips, p)

    # Count the number of heads
    headsCount: int = flips.count("Heads")

    # Calculate the relative frequency of heads
    relativeFrequencyHeads: float = headsCount / nFlips

    print("Number of heads:", headsCount)
    print("Relative Frequency of heads:", relativeFrequencyHeads)

probabilityHead()

The **complement of an event** is a fundamental concept in probability. The complement of an event $ A $, denoted as $ A^c $ or $ \overline{A} $, represents all outcomes that are **not** in $ A $. The probability of the complement of an event is given by:

$
P(A^c) = 1 - P(A)
$

---

### Key Properties of Complements:
1. **Mutually Exclusive**:
   - An event $ A $ and its complement $ A^c $ cannot occur simultaneously.
   - $ A \cap A^c = \emptyset $ (they are disjoint).

2. **Exhaustive**:
   - Either $ A $ or $ A^c $ must occur.
   - $ A \cup A^c = S $, where $ S $ is the sample space.

3. **Probability**:
   - The sum of the probabilities of an event and its complement is always 1:
     $
     P(A) + P(A^c) = 1
     $

---

### Steps to Calculate the Complement of an Event:
1. **Identify the event $ A $** and its probability $ P(A) $.
2. **Use the complement formula**:
   $
   P(A^c) = 1 - P(A)
   $

---

### Example 1: Simple Event
Suppose you roll a fair six-sided die. Let $ A $ be the event of rolling a **6**. The probability of $ A $ is:
$
P(A) = \frac{1}{6}
$

The complement $ A^c $ is the event of **not rolling a 6**. The probability of $ A^c $ is:
$
P(A^c) = 1 - P(A) = 1 - \frac{1}{6} = \frac{5}{6}
$

---

### Example 2: Compound Event
Suppose you draw a card from a standard deck of 52 cards. Let $ A $ be the event of drawing a **heart**. The probability of $ A $ is:
$
P(A) = \frac{13}{52} = \frac{1}{4}
$

The complement $ A^c $ is the event of **not drawing a heart**. The probability of $ A^c $ is:
$
P(A^c) = 1 - P(A) = 1 - \frac{1}{4} = \frac{3}{4}
$

---

### Example 3: Real-World Scenario
Suppose the probability of rain tomorrow is $ 0.3 $. Let $ A $ be the event of **rain tomorrow**. Then:
$
P(A) = 0.3
$

The complement $ A^c $ is the event of **no rain tomorrow**. The probability of $ A^c $ is:
$
P(A^c) = 1 - P(A) = 1 - 0.3 = 0.7
$

---

### Key Takeaways:
- The complement of an event $ A $ represents all outcomes **not** in $ A $.
- The probability of the complement is $ P(A^c) = 1 - P(A) $.
- Complements are useful for simplifying probability calculations, especially when it’s easier to calculate $ P(A^c) $ than $ P(A) $.
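
A classic case where the complement is the easier route is "at least one" questions. The sketch below assumes independent trials with the same success probability; `prob_at_least_one` is a name made up for this example, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def prob_at_least_one(p_single, trials):
    """P(event happens at least once in `trials` independent attempts)."""
    p_none = (1 - p_single) ** trials   # complement: the event never happens
    return 1 - p_none


# Probability of rolling at least one 6 in four rolls of a fair die
p = prob_at_least_one(Fraction(1, 6), 4)
print(p, round(float(p), 4))  # 671/1296 0.5177
```

Computing this directly would require summing the probabilities of exactly one, two, three, and four sixes, whereas the complement needs only $ \left(\frac{5}{6}\right)^4 $.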


In [None]:
# Calculate the complement of an event

def complementEvent(P: float = 0.6) -> float:
    # P = probability of event A (e.g., rain)
    if not 0 <= P <= 1:
        raise ValueError("P must be between 0 and 1")
    pComplement = 1 - P
    print("Probability of event A:", P)
    print("Probability of complement of event A:", pComplement)
    return pComplement

complementEvent()


**Conditional probability** is the probability of an event $ A $ occurring given that another event $ B $ has already occurred. It is denoted as $ P(A|B) $ and is calculated using the formula:

$
P(A|B) = \frac{P(A \cap B)}{P(B)}
$

Where:
- $ P(A \cap B) $ is the probability of both $ A $ and $ B $ occurring,
- $ P(B) $ is the probability of event $ B $.

If you have **sample data**, you can estimate $ P(A|B) $ using relative frequencies.

---

### Steps to Compute Conditional Probability from Sample Data:
1. **Identify the relevant events**:
   - Let $ A $ and $ B $ be two events of interest.

2. **Count the occurrences**:
   - $ N $: Total number of observations in the sample.
   - $ N_B $: Number of observations where event $ B $ occurs.
   - $ N_{A \cap B} $: Number of observations where both $ A $ and $ B $ occur.

3. **Compute the probabilities**:
   - $ P(B) = \frac{N_B}{N} $,
   - $ P(A \cap B) = \frac{N_{A \cap B}}{N} $.

4. **Calculate the conditional probability**:
   $
   P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{N_{A \cap B}}{N_B}
   $

---

### Example:
Suppose you have the following sample data for a survey of 100 people:

| Event          | Number of People |
|----------------|------------------|
| $ B $: Smoker | 30               |
| $ A \cap B $: Smoker and has lung disease | 10 |

Here:
- $ N = 100 $ (total number of people),
- $ N_B = 30 $ (number of smokers),
- $ N_{A \cap B} = 10 $ (number of smokers with lung disease).

1. **Compute $ P(B) $**:
   $
   P(B) = \frac{N_B}{N} = \frac{30}{100} = 0.3
   $

2. **Compute $ P(A \cap B) $**:
   $
   P(A \cap B) = \frac{N_{A \cap B}}{N} = \frac{10}{100} = 0.1
   $

3. **Compute $ P(A|B) $**:
   $
   P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.1}{0.3} \approx 0.333
   $

So, the probability of having lung disease given that a person is a smoker is approximately **0.333** (or 33.3%).

---

### Key Takeaways:
- Conditional probability $ P(A|B) $ measures the likelihood of event $ A $ occurring given that event $ B $ has occurred.
- It is calculated as $ P(A|B) = \frac{P(A \cap B)}{P(B)} $.
- When working with sample data, you can estimate $ P(A|B) $ using relative frequencies.
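
When the sample is available as individual records rather than summary counts, $ P(A|B) $ can be estimated by filtering. The sketch below rebuilds the survey from the example as a list of `(is_smoker, has_lung_disease)` tuples. The number of non-smokers with lung disease is not given in the table, so the value 5 is a made-up placeholder; it does not affect $ P(A|B) $. The function name `conditional_probability` is also an illustrative choice.

```python
def conditional_probability(records, event_a, event_b):
    """Estimate P(A|B) as N_{A and B} / N_B from individual records."""
    in_b = [r for r in records if event_b(r)]
    if not in_b:
        raise ValueError("B never occurs in this sample, so P(A|B) is undefined")
    in_a_and_b = [r for r in in_b if event_a(r)]
    return len(in_a_and_b) / len(in_b)


# Each record is (is_smoker, has_lung_disease)
records = (
    [(True, True)] * 10      # smokers with lung disease
    + [(True, False)] * 20   # smokers without lung disease
    + [(False, True)] * 5    # hypothetical count: not given in the survey table
    + [(False, False)] * 65
)

p = conditional_probability(
    records,
    event_a=lambda r: r[1],  # A: has lung disease
    event_b=lambda r: r[0],  # B: is a smoker
)
print(round(p, 3))  # 0.333
```

Restricting to the records where $ B $ holds and then counting $ A $ is the same as dividing $ N_{A \cap B} $ by $ N_B $, so the total sample size $ N $ cancels out.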


In [None]:
# Compute Conditional Probability P(A|B) given sample data

def conditionalProbability(observations=1000, observationB= 20, observationAB= 10):
    # P(B)
    PB = observationB / observations
    # Compute P(A ∩ B)
    PAB = observationAB / observations
    # Compute P(A|B)
    PGivenAB = PAB / PB
    print("P(A|B):", PGivenAB)
    return PGivenAB

conditionalProbability()


The **Law of Total Probability** is a fundamental rule in probability theory that allows you to calculate the total probability of an event by considering all possible scenarios or partitions of the sample space. It states:

If $ B_1, B_2, \dots, B_n $ are mutually exclusive and exhaustive events (i.e., they partition the sample space), then for any event $ A $:

$
P(A) = \sum_{i=1}^n P(A|B_i) \cdot P(B_i)
$

To **verify the Law of Total Probability using simulated data**, you can:
1. Simulate data that follows a known distribution or process.
2. Partition the data into mutually exclusive and exhaustive events $ B_1, B_2, \dots, B_n $.
3. Compute $ P(A|B_i) $ and $ P(B_i) $ for each partition.
4. Verify that $ P(A) = \sum_{i=1}^n P(A|B_i) \cdot P(B_i) $.

---

### Example: Verifying the Law of Total Probability
Let’s say we have a biased coin and two dice:
- The coin has a probability $ P(H) = 0.6 $ of landing heads and $ P(T) = 0.4 $ of landing tails.
- If the coin lands heads, we roll a fair 6-sided die.
- If the coin lands tails, we roll a biased 6-sided die where the probability of rolling a 6 is $ 0.5 $, and the other numbers are equally likely.

We want to verify the Law of Total Probability for the event $ A $: **Rolling a 6**.

---

### Theoretical Calculation
Using the Law of Total Probability:
$
P(A) = P(A|H) \cdot P(H) + P(A|T) \cdot P(T)
$

- $ P(H) = 0.6 $, $ P(T) = 0.4 $.
- If the coin lands heads, we roll a fair die: $ P(A|H) = \frac{1}{6} $.
- If the coin lands tails, we roll a biased die: $ P(A|T) = 0.5 $.

So:
$
P(A) = \left(\frac{1}{6}\right) \cdot 0.6 + 0.5 \cdot 0.4 = 0.1 + 0.2 = 0.3
$

The theoretical probability of rolling a 6 is **0.3**.

---


###  Verify the Law of Total Probability
From the simulation:
- The simulated probability $ P(A) \approx 0.2998 $.
- The theoretical probability $ P(A) = 0.3 $.

The results are very close, verifying the Law of Total Probability.

---

### Key Takeaways:
1. The Law of Total Probability allows you to compute $ P(A) $ by considering all possible scenarios ($ B_i $).
2. You can verify the law using simulated data by:
   - Partitioning the sample space into mutually exclusive and exhaustive events.
   - Computing $ P(A|B_i) $ and $ P(B_i) $ for each partition.
   - Confirming that $ P(A) = \sum_{i=1}^n P(A|B_i) \cdot P(B_i) $.
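
One way to carry out these steps is to record, for every simulated trial, which partition occurred and whether $ A $ occurred, and then estimate each term separately. The sketch below follows the coin-and-dice example with $ P(H) = 0.6 $; the function name `simulate`, the seed, and the dictionary layout are assumptions made for this illustration.

```python
import random


def simulate(trials, p_heads=0.6, seed=0):
    """Return per-partition counts and per-partition counts of event A (rolling a 6)."""
    rng = random.Random(seed)
    counts = {"H": 0, "T": 0}   # how often each partition B_i occurs
    sixes = {"H": 0, "T": 0}    # how often A occurs within each partition
    for _ in range(trials):
        if rng.random() < p_heads:
            side = "H"
            roll = rng.randint(1, 6)   # fair die
        else:
            side = "T"
            # biased die: 6 with probability 0.5, otherwise 1-5 equally likely
            roll = 6 if rng.random() < 0.5 else rng.randint(1, 5)
        counts[side] += 1
        if roll == 6:
            sixes[side] += 1
    return counts, sixes


trials = 100_000
counts, sixes = simulate(trials)

# Left-hand side: P(A) estimated directly
p_a = (sixes["H"] + sixes["T"]) / trials

# Right-hand side: sum of P(A|B_i) * P(B_i), each estimated from the data
total = sum((sixes[s] / counts[s]) * (counts[s] / trials) for s in ("H", "T"))

print("P(A) estimated directly:", p_a)
print("Sum of P(A|B_i) * P(B_i):", total)
print("P(A|H), P(A|T):", sixes["H"] / counts["H"], sixes["T"] / counts["T"])
```

With empirical frequencies the two sides agree up to floating-point rounding by construction, so the informative checks are that both land near the theoretical 0.3 and that the per-partition estimates are close to $ \frac{1}{6} $ and $ 0.5 $.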


In [None]:
# Verify Law of Total Probability using simulated Data

# Parameters
headsProbability = 0.6  # Probability of heads (matches the example: P(H) = 0.6)
tailsProbability = 0.4  # Probability of tails (P(T) = 0.4)
numOfTrials = 100000  # Number of simulations

# Simulate the process
countA = 0  # Count of event A (rolling a 6)

for _ in range(numOfTrials):
    # Flip the coin
    if random.random() < headsProbability:
        # Roll a fair die
        roll = random.randint(1, 6)
    else:
        # Roll a biased die
        if random.random() < 0.5:
            roll = 6
        else:
            roll = random.randint(1, 5)
    
    # Check if event A occurs
    if roll == 6:
        countA += 1

# Compute P(A) from simulation
PASimulated = countA / numOfTrials

print("Simulated P(A):", PASimulated)
print("Theoretical P(A):", 0.3)

In [None]:
# Simulate and Calculate the probability of drawing specific card from deck

def randomDeckOfCards(specificCard="Ace of Spades", simulations=100000):
    # specificCard: the card to look for (e.g., "Ace of Spades")

    # Define the deck of cards
    suits: list[str] = ["Hearts", "Diamonds", "Clubs", "Spades"]
    ranks: list[str] = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "Jack", "Queen", "King", "Ace"]
    deck = [f"{rank} of {suit}" for suit in suits for rank in ranks]
    countSpecificCard = 0

    # Simulate drawing a card
    for _ in range(simulations):
        # Shuffle the deck
        random.shuffle(deck)
        # Draw the top card
        drawnCard = deck[0]
        # Check if it's the specific card
        if drawnCard == specificCard:
            countSpecificCard += 1

    # Calculate the empirical probability
    empiricalProbability = countSpecificCard / simulations

    print("Specific Card:", specificCard)
    print("Theoretical Probability:", 1/52)
    print("Empirical Probability:", empiricalProbability)

randomDeckOfCards(specificCard="Queen of Hearts")

**Joint probability** refers to the probability of two events occurring together. It is denoted as $ P(A \cap B) $ or $ P(A, B) $, and it represents the likelihood of both events $ A $ and $ B $ happening simultaneously.

---

### Formula for Joint Probability:
The joint probability of two events $ A $ and $ B $ is given by:

$
P(A \cap B) = P(A) \cdot P(B|A)
$

Or equivalently:

$
P(A \cap B) = P(B) \cdot P(A|B)
$

Where:
- $ P(A) $ is the probability of event $ A $,
- $ P(B|A) $ is the probability of event $ B $ given that $ A $ has occurred,
- $ P(B) $ is the probability of event $ B $,
- $ P(A|B) $ is the probability of event $ A $ given that $ B $ has occurred.

If events $ A $ and $ B $ are **independent**, then:
$
P(A \cap B) = P(A) \cdot P(B)
$

---

### Steps to Compute Joint Probability:
1. **Identify the events**:
   - Define events $ A $ and $ B $.

2. **Determine if the events are independent**:
   - If $ A $ and $ B $ are independent, use $ P(A \cap B) = P(A) \cdot P(B) $.
   - If $ A $ and $ B $ are dependent, use $ P(A \cap B) = P(A) \cdot P(B|A) $ or $ P(A \cap B) = P(B) \cdot P(A|B) $.

3. **Compute the probabilities**:
   - Calculate $ P(A) $, $ P(B) $, and (if necessary) $ P(B|A) $ or $ P(A|B) $.

4. **Calculate the joint probability**:
   - Use the appropriate formula to compute $ P(A \cap B) $.

---

### Example 1: Independent Events
Suppose you roll a fair six-sided die and flip a fair coin. Let:
- $ A $: Rolling a **3** on the die.
- $ B $: Flipping **heads** on the coin.

Since the die roll and coin flip are independent:
$
P(A) = \frac{1}{6}, \quad P(B) = \frac{1}{2}
$

The joint probability is:
$
P(A \cap B) = P(A) \cdot P(B) = \frac{1}{6} \cdot \frac{1}{2} = \frac{1}{12} \approx 0.0833
$
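
The result above can also be checked by simulation. This is a minimal sketch using relative frequency; the function name `simulate_die_and_coin`, the trial count, and the fixed seed are assumptions made for repeatability, not part of the original exercise.

```python
import random


def simulate_die_and_coin(trials=100_000, seed=0):
    """Estimate P(roll a 3 and flip heads) by relative frequency."""
    rng = random.Random(seed)  # fixed seed only so reruns are repeatable
    hits = 0
    for _ in range(trials):
        roll = rng.randint(1, 6)    # fair six-sided die
        heads = rng.random() < 0.5  # fair coin
        if roll == 3 and heads:
            hits += 1
    return hits / trials


estimate = simulate_die_and_coin()
print("Simulated P(A ∩ B):", estimate)
print("Theoretical P(A ∩ B):", 1 / 12)
```

With this many trials the estimate should land close to $ 1/12 $, though it will not match exactly because of sampling noise.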

---

### Example 2: Dependent Events
Suppose you draw two cards from a standard deck of 52 cards without replacement. Let:
- $ A $: First card is an **Ace**.
- $ B $: Second card is also an **Ace**.

Here, $ A $ and $ B $ are dependent events: drawing the first card without replacement changes which cards remain for the second draw.

1. Compute $ P(A) $:
   $
   P(A) = \frac{4}{52} = \frac{1}{13}
   $

2. Compute $ P(B|A) $:
   - If the first card is an Ace, there are now 3 Aces left in the remaining 51 cards.
   $
   P(B|A) = \frac{3}{51} = \frac{1}{17}
   $

3. Compute the joint probability:
   $
   P(A \cap B) = P(A) \cdot P(B|A) = \frac{1}{13} \cdot \frac{1}{17} = \frac{1}{221} \approx 0.0045
   $
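
The same value can be confirmed by brute force. The sketch below builds a 52-card deck and counts every ordered two-card draw without replacement; the card representation as `(rank, suit)` tuples is an assumption chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["Ace"] + [str(n) for n in range(2, 11)] + ["Jack", "Queen", "King"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Without replacement, every ordered pair of distinct cards is equally likely.
draws = list(permutations(deck, 2))
both_aces = [d for d in draws if d[0][0] == "Ace" and d[1][0] == "Ace"]

p_both_aces = Fraction(len(both_aces), len(draws))
print(len(both_aces), "of", len(draws), "ordered draws")
print("P(A ∩ B) =", p_both_aces, "≈", float(p_both_aces))
```

Counting ordered pairs mirrors the product $ P(A) \cdot P(B|A) $: there are 4 choices for the first Ace and 3 for the second, out of $ 52 \cdot 51 $ possible draws.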

---


### Key Takeaways:
- Joint probability measures the likelihood of two events occurring together.
- For **independent events**, $ P(A \cap B) = P(A) \cdot P(B) $.
- For **dependent events**, $ P(A \cap B) = P(A) \cdot P(B|A) $ or $ P(A \cap B) = P(B) \cdot P(A|B) $.
- Once the component probabilities are known, the joint probability is a single multiplication, which is straightforward to compute in Python.


In [134]:
# Compute Joint Probabilities of Two Events



def independentEvent() -> float:
    # Probabilities
    pA: float = 1 / 6  # Probability of rolling a 3
    pB: float = 1 / 2  # Probability of flipping heads

    # Joint probability for independent events
    pApB: float = pA * pB

    print("Joint Probability P(A ∩ B):", pApB)
    return pApB


def dependentEvent() -> float:
    # Probabilities
    pA: float = 4 / 52  # Probability of first card being an Ace
    pBgivenA: float = 3 / 51  # Probability of second card being an Ace given the first was an Ace

    # Joint probability for dependent events
    pApB: float = pA * pBgivenA

    print("Joint Probability P(A ∩ B):", pApB)
    return pApB


independentEvent()
dependentEvent()

Joint Probability P(A ∩ B): 0.08333333333333333
Joint Probability P(A ∩ B): 0.004524886877828055


0.004524886877828055

**Bayes' Theorem** is a fundamental concept in probability that allows us to update the probability of an event based on new information. It is stated as:

$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$

Where:
- $ P(A|B) $: Posterior probability of event $ A $ given event $ B $,
- $ P(B|A) $: Likelihood of event $ B $ given event $ A $,
- $ P(A) $: Prior probability of event $ A $,
- $ P(B) $: Total probability of event $ B $.

To **verify Bayes' Theorem with a real-world example**, let’s use a medical testing scenario.

---

### Real-World Example: Medical Testing
Let $ A $ be the event that a person has the disease and $ B $ the event that the test is positive. Suppose:
- A disease affects **1%** of the population ($ P(A) = 0.01 $).
- A test for the disease is **99% accurate**:
  - If a person has the disease, the test is positive **99%** of the time (the test's sensitivity, $ P(B|A) = 0.99 $).
  - If a person does not have the disease, the test is negative **99%** of the time (the test's specificity, $ P(B^c|A^c) = 0.99 $).

We want to find:
- The probability that a person has the disease given that they tested positive ($ P(A|B) $).

---

### Step 1: Theoretical Calculation Using Bayes' Theorem
1. **Identify the likelihood $ P(B|A) $**:
   - $ P(B|A) = 0.99 $.

2. **Identify the prior $ P(A) $**:
   - $ P(A) = 0.01 $, so $ P(A^c) = 1 - 0.01 = 0.99 $.

3. **Compute the false-positive rate $ P(B|A^c) $**:
   - $ P(B|A^c) = 1 - P(B^c|A^c) = 1 - 0.99 = 0.01 $.

4. **Compute $ P(B) $ using the Law of Total Probability**:
   - $ P(B) = P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c) $,
   - $ P(B) = (0.99 \cdot 0.01) + (0.01 \cdot 0.99) = 0.0099 + 0.0099 = 0.0198 $.

5. **Apply Bayes' Theorem**:
   $
   P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} = \frac{0.99 \cdot 0.01}{0.0198} = \frac{0.0099}{0.0198} = 0.5
   $

So, the probability that a person has the disease given that they tested positive is **50%**.
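
One way to build intuition for this result is to count people instead of multiplying probabilities. The sketch below applies the stated rates to a hypothetical cohort of 10,000 people; the cohort size is an assumption picked so that every count comes out as a whole number.

```python
population = 10_000  # hypothetical cohort size

sick = population * 1 // 100  # 1% prevalence -> 100 people
healthy = population - sick   # 9,900 people

true_positives = sick * 99 // 100     # 99% of the sick test positive
false_positives = healthy * 1 // 100  # 1% of the healthy test positive

# Among everyone who tests positive, what fraction is actually sick?
p_disease_given_positive = true_positives / (true_positives + false_positives)

print("True positives:", true_positives)
print("False positives:", false_positives)
print("P(A|B):", p_disease_given_positive)
```

The healthy group is 99 times larger than the sick group, so its 1% false-positive rate produces as many positive tests as the sick group's 99% detection rate.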


---

### Key Takeaways:
1. **Bayes' Theorem** allows us to update probabilities based on new evidence.
2. In this example:
   - Even with a **99% accurate test**, the probability of having the disease after testing positive is only **50%** due to the low prevalence of the disease.
3. Simulations can be used to verify theoretical results and build intuition.
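
To see how strongly the prevalence drives the answer, the sketch below evaluates Bayes' Theorem for several prevalence values while holding the test's accuracy at 99%. The function name `posterior`, its parameter names, and the prevalence values other than 1% are assumptions for illustration. Here sensitivity means $ P(B|A) $ and specificity means $ P(B^c|A^c) $.

```python
def posterior(prevalence, sensitivity=0.99, specificity=0.99):
    """P(disease | positive test) from Bayes' Theorem."""
    false_positive_rate = 1 - specificity
    # Law of Total Probability: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    return sensitivity * prevalence / p_positive


# Hypothetical prevalence values; 0.01 matches the example in the text.
for prevalence in (0.001, 0.01, 0.1, 0.5):
    print(f"prevalence={prevalence:<6} P(A|B)={posterior(prevalence):.3f}")
```

The posterior rises with prevalence: the rarer the disease, the larger the share of positive tests that are false positives.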


In [135]:
# Verify Bayes Theorem with a Real World Example

import random


def realWorldBayesTheorem():
    # Parameters
    pA = 0.01  # Prevalence of the disease
    pBgivenA = 0.99  # Probability of testing positive given the disease
    pBgivenNotA = 0.01  # Probability of testing positive given no disease

    # Number of simulations
    trials = 100000

    # Counters
    countAandB = 0  # Number of people with the disease and testing positive
    countB = 0  # Number of people testing positive

    # Simulate
    for _ in range(trials):
        # Determine if the person has the disease
        hasDisease = random.random() < pA
        # Determine if the test is positive
        if hasDisease:
            testPositive = random.random() < pBgivenA
        else:
            testPositive = random.random() < pBgivenNotA
        # Update counters
        if testPositive:
            countB += 1
            if hasDisease:
                countAandB += 1

    # Compute P(A|B) from simulation (assumes at least one positive test occurred)
    pAgivenBSimulated = countAandB / countB

    # Theoretical P(A|B)
    pB = pBgivenA * pA + pBgivenNotA * (1 - pA)
    pAgivenBTheoretical = (pBgivenA * pA) / pB

    print("Simulated P(A|B):", pAgivenBSimulated)
    print("Theoretical P(A|B):", pAgivenBTheoretical)


realWorldBayesTheorem()

Simulated P(A|B): 0.4844625573102394
Theoretical P(A|B): 0.5
