# Module 14: Probability Theory

## Topics Covered
1. Sample Spaces and Events
2. Conditional Probability and Bayes' Theorem
3. Random Variables
4. Probability Distributions (Binomial, Poisson, Normal)
5. Expected Value and Variance
6. Central Limit Theorem
7. Joint and Marginal Distributions
8. Maximum Likelihood Estimation

## Learning Objectives

By the end of this module, you will be able to:
- Understand and calculate probabilities using sample spaces and events
- Apply conditional probability and Bayes' Theorem to real-world problems
- Work with different types of random variables and probability distributions
- Calculate expected values and variances for decision-making
- Understand the Central Limit Theorem and its importance in data science
- Analyze joint and marginal distributions in multivariate scenarios
- Apply Maximum Likelihood Estimation to estimate parameters

---
# Section 1: Sample Spaces and Events
---

## What are Sample Spaces and Events?

A **sample space** is the set of all possible outcomes of a random experiment. An **event** is any subset of the sample space: a collection of outcomes we're interested in.

For example, when rolling a six-sided die, the sample space is {1, 2, 3, 4, 5, 6}. An event might be "rolling an even number," which corresponds to the subset {2, 4, 6}.
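If the die is assumed to be fair, every outcome is equally likely and the event has probability 3/6 = 0.5. The sketch below checks that figure by simulation. The helper name `estimate_event_probability`, the trial count, and the seed are illustrative choices, not part of the module.

```python
import random

def estimate_event_probability(event, sample_space, trials=100_000, seed=0):
    """Estimate P(event) by drawing equally likely outcomes from sample_space."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    outcomes = sorted(sample_space)    # rng.choice needs a sequence, not a set
    hits = sum(rng.choice(outcomes) in event for _ in range(trials))
    return hits / trials

sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}

exact = len(even) / len(sample_space)               # counting argument
estimate = estimate_event_probability(even, sample_space)
print(f"Exact: {exact:.3f}, simulated: {estimate:.3f}")
```

The simulated value should land close to the exact one, and the gap typically shrinks as `trials` grows.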

Understanding sample spaces and events is the foundation of probability theory. In data science, we use these concepts to model uncertainty in customer behavior, financial markets, medical diagnoses, and countless other applications.

### Why This Matters in Data Science

In real-world data science, you'll often need to calculate the probability of specific outcomes. For instance, what's the probability that a customer will click on an ad? What's the chance that a transaction is fraudulent? These questions all start with defining the sample space and the events of interest.

In [None]:
# Example 1: Sample Space and Events with a Die Roll
import numpy as np
import pandas as pd
from itertools import product

# Sample space for rolling a single die
sample_space_die = {1, 2, 3, 4, 5, 6}
print("Sample Space (one die):", sample_space_die)

# Event: Rolling an even number
event_even = {2, 4, 6}
print("Event (even number):", event_even)

# Probability of event
prob_even = len(event_even) / len(sample_space_die)
print(f"Probability of rolling even: {prob_even:.3f}")

In [None]:
# Example 2: Sample Space for Two Dice (Practical Application)
# This is similar to analyzing combinations in A/B testing or game theory

# Generate sample space for two dice
sample_space_two_dice = list(product(range(1, 7), repeat=2))
print(f"Total outcomes with two dice: {len(sample_space_two_dice)}")
print("First 10 outcomes:", sample_space_two_dice[:10])

# Event: Sum equals 7
event_sum_7 = [(d1, d2) for d1, d2 in sample_space_two_dice if d1 + d2 == 7]
print(f"\nOutcomes where sum = 7: {event_sum_7}")
print(f"Probability of sum = 7: {len(event_sum_7) / len(sample_space_two_dice):.3f}")

## Practice Exercise 1.1

**Task:** A company's customer database shows that customers can be classified by two attributes:
- Subscription type: ['Free', 'Premium', 'Enterprise']
- Region: ['North', 'South', 'East', 'West']

1. Create the sample space of all possible customer types
2. Define an event for "Premium or Enterprise customers from East or West regions"
3. If each combination is equally likely, calculate the probability of this event

**Expected Output:**
```
Total possible customer types: 12
Event outcomes: 4
Event: [('Premium', 'East'), ('Premium', 'West'), ('Enterprise', 'East'), ('Enterprise', 'West')]
Probability: 0.333
```

In [None]:
# Your code here


In [None]:
# Solution 1.1

# Define attributes
subscription_types = ['Free', 'Premium', 'Enterprise']
regions = ['North', 'South', 'East', 'West']

# Create sample space
sample_space = list(product(subscription_types, regions))
print(f"Total possible customer types: {len(sample_space)}")

# Define event: Premium or Enterprise from East or West
event = [(sub, reg) for sub, reg in sample_space 
         if sub in ['Premium', 'Enterprise'] and reg in ['East', 'West']]
print(f"Event outcomes: {len(event)}")
print(f"Event: {event}")

# Calculate probability
probability = len(event) / len(sample_space)
print(f"Probability: {probability:.3f}")

---
# Section 2: Conditional Probability and Bayes' Theorem
---

## What is Conditional Probability?

**Conditional probability** is the probability of an event occurring given that another event has already occurred. We write this as P(A|B), read as "the probability of A given B."

The formula is: **P(A|B) = P(A and B) / P(B)**, which is defined only when P(B) > 0.
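As a small worked case, the formula can be applied by counting outcomes for two fair dice. The events chosen here (A: the sum is 8, B: the first die is even) and the helper `prob` are illustrative assumptions, not taken from the module.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for two dice
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if sum(o) == 8}      # the sum is 8
B = {o for o in sample_space if o[0] % 2 == 0}    # the first die is even

if prob(B) == 0:
    raise ValueError("P(A|B) is undefined when P(B) = 0")

p_a_given_b = prob(A & B) / prob(B)               # P(A and B) / P(B)
print(f"P(A)   = {prob(A)}")
print(f"P(A|B) = {p_a_given_b}")
```

Here P(A) is 5/36 while P(A|B) is 1/6, so knowing that the first die is even raises the probability of a sum of 8.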

**Bayes' Theorem** is a powerful formula that lets us reverse conditional probabilities. It states:

**P(A|B) = [P(B|A) × P(A)] / P(B)**
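The theorem follows from applying the definition of conditional probability in both directions. A short derivation, writing P(A ∩ B) for P(A and B) and assuming both P(A) > 0 and P(B) > 0:

```latex
\begin{align*}
P(A \cap B) &= P(A \mid B)\,P(B) \\
P(A \cap B) &= P(B \mid A)\,P(A) \\
\Rightarrow\quad P(A \mid B)\,P(B) &= P(B \mid A)\,P(A) \\
\Rightarrow\quad P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}
\end{align*}
```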

### Why This Matters in Data Science

Bayes' Theorem is fundamental to many machine learning algorithms, including Naive Bayes classifiers, spam filters, and medical diagnosis systems. It allows us to update our beliefs based on new evidence - for example, updating the probability that an email is spam after seeing certain keywords.
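The idea of updating a belief as evidence arrives can be sketched in a few lines. All numbers below are hypothetical, chosen only for illustration, and the function name `bayes_update` is an assumption. Applying the update once per keyword also assumes the keywords are conditionally independent given the class, which is the simplifying assumption behind Naive Bayes.

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | evidence) from a prior P(H) and the two likelihoods."""
    numerator = p_evidence_given_h * prior
    # Law of total probability: P(evidence) over H and not-H
    evidence = numerator + p_evidence_given_not_h * (1 - prior)
    if evidence == 0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / evidence

# Hypothetical inputs: prior spam rate and, per keyword,
# (P(keyword | spam), P(keyword | not spam))
prior_spam = 0.20
keywords = {"winner": (0.40, 0.02), "urgent": (0.30, 0.05)}

belief = prior_spam
for word, (p_in_spam, p_in_ham) in keywords.items():
    belief = bayes_update(belief, p_in_spam, p_in_ham)
    print(f"After seeing '{word}': P(Spam) = {belief:.3f}")
```

Each posterior becomes the prior for the next keyword, so the belief moves step by step as evidence accumulates.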

In [None]:
# Example 1: Conditional Probability - Email Campaign
# Analyzing customer response based on email opens

# Given data from email campaign
total_customers = 1000
opened_email = 300
made_purchase = 150
opened_and_purchased = 120

# P(Purchase | Opened)
prob_purchase_given_open = opened_and_purchased / opened_email
print(f"P(Purchase | Opened Email) = {prob_purchase_given_open:.3f}")

# P(Opened | Purchase)
prob_open_given_purchase = opened_and_purchased / made_purchase
print(f"P(Opened Email | Purchase) = {prob_open_given_purchase:.3f}")

# Overall probabilities
prob_open = opened_email / total_customers
prob_purchase = made_purchase / total_customers
print(f"\nP(Opened) = {prob_open:.3f}")
print(f"P(Purchase) = {prob_purchase:.3f}")

In [None]:
# Example 2: Bayes' Theorem - Medical Test Accuracy
# Classic application: disease diagnosis

# Given probabilities
prob_disease = 0.01  # 1% of population has the disease
prob_positive_given_disease = 0.95  # Test sensitivity (true positive rate)
prob_positive_given_no_disease = 0.05  # False positive rate

# Calculate P(Positive) using law of total probability
prob_no_disease = 1 - prob_disease
prob_positive = (prob_positive_given_disease * prob_disease + 
                prob_positive_given_no_disease * prob_no_disease)

# Apply Bayes' Theorem: P(Disease | Positive)
prob_disease_given_positive = (prob_positive_given_disease * prob_disease) / prob_positive

print(f"P(Disease) = {prob_disease:.3f}")
print(f"P(Positive Test) = {prob_positive:.3f}")
print(f"P(Disease | Positive Test) = {prob_disease_given_positive:.3f}")
print(f"\nInterpretation: Even with a positive test, there's only a {prob_disease_given_positive*100:.1f}% chance of actually having the disease!")

## Practice Exercise 2.1

**Task:** An e-commerce company has the following data:
- 5% of customers are classified as "high-value" customers
- 80% of high-value customers use the mobile app
- 30% of regular customers use the mobile app

Use Bayes' Theorem to calculate: If a customer uses the mobile app, what is the probability they are a high-value customer?

**Expected Output:**
```
P(Uses App) = 0.325
P(High-Value | Uses App) = 0.123
```

In [None]:
# Your code here


In [None]:
# Solution 2.1

# Given probabilities
prob_high_value = 0.05
prob_app_given_high_value = 0.80
prob_app_given_regular = 0.30
prob_regular = 1 - prob_high_value

# Calculate P(Uses App) using law of total probability
prob_app = (prob_app_given_high_value * prob_high_value + 
           prob_app_given_regular * prob_regular)

# Apply Bayes' Theorem
prob_high_value_given_app = (prob_app_given_high_value * prob_high_value) / prob_app

print(f"P(Uses App) = {prob_app:.3f}")
print(f"P(High-Value | Uses App) = {prob_high_value_given_app:.3f}")

---
# Module Summary

## Key Takeaways

- **Sample spaces and events** form the foundation of probability, helping us model and calculate the likelihood of outcomes
- **Conditional probability and Bayes' Theorem** allow us to update probabilities based on new information, essential for classification and diagnosis
- **Random variables** (discrete and continuous) provide a mathematical framework for modeling uncertainty in data
- **Probability distributions** (Binomial, Poisson, Normal) are patterns that appear repeatedly in real-world data
- **Expected value and variance** quantify central tendency and risk, crucial for decision-making under uncertainty
- **Central Limit Theorem** is the bridge between sample statistics and population parameters, enabling statistical inference
- **Joint and marginal distributions** help analyze relationships between multiple variables
- **Maximum Likelihood Estimation** provides a principled method for fitting probability models to data

## Next Module

In Module 15: Advanced Statistics, we'll build on probability theory to explore statistical inference, hypothesis testing, confidence intervals, and the differences between Bayesian and Frequentist approaches. You'll learn how to make data-driven decisions with quantified uncertainty.

## Additional Practice

For extra practice, try these challenges:

1. **Risk Analysis**: A product has a 2% defect rate. If you sample 200 units, what's the probability of finding more than 5 defects? Use the Binomial distribution.

2. **Bayes in Action**: Implement a spam filter using Bayes' Theorem. Given that 30% of emails are spam, and the word "FREE" appears in 80% of spam but only 10% of legitimate emails, calculate P(Spam | contains "FREE").

3. **CLT Simulation**: For each sample size (n = 5, 10, 30, 50), simulate 1000 samples of n die rolls and compute the mean of each sample. Plot the distribution of the 1000 sample means for each n and observe how it approaches a normal shape as n grows.

4. **Customer Segmentation**: Create a joint distribution for customer age groups and product categories. Calculate marginal distributions and test for independence.

5. **MLE Challenge**: Generate 100 data points from a normal distribution with unknown parameters. Use MLE to estimate both the mean and standard deviation, then compare to the true values you used to generate the data.