# P&S Module 6: Discrete Probability Distributions

**Course:** Probability and Statistics

**Week 6:** Discrete Probability Distributions

**Topics Covered:**
- Binomial Distribution (Properties, Mean, Variance)
- Poisson Distribution (Properties, Mean, Variance)
- Relationship between Binomial and Poisson
- Real-world Applications

---

## Introduction

In Week 5, we learned about random variables and how to calculate their mean and variance. Now we'll study two important **discrete probability distributions** that appear frequently in real-world applications.

**What are Probability Distributions?**

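A probability distribution describes all possible values a random variable can take and how likely each one is. For a discrete random variable, it is given by a **probability mass function (PMF)**, P(X = k), which is nonnegative and sums to 1 over all possible values k.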

**Why Study These Distributions?**

Instead of calculating probabilities from scratch every time, we can use well-known distributions:
- **Binomial**: Models the number of successes in n independent success/failure trials
- **Poisson**: Models the number of events occurring in a fixed interval of time or space

**Real-world Applications:**
- **Binomial**: Quality control (defective items), clinical trials (treatment success), surveys (yes/no responses)
- **Poisson**: Customer arrivals, phone calls per hour, accidents per month, emails received per day

---

## Part A: Binomial Distribution

### When to Use Binomial Distribution?

The binomial distribution models situations where:
1. **Fixed number of trials (n)**: You perform the experiment n times
2. **Two outcomes**: Each trial results in "success" or "failure"
3. **Independent trials**: The outcome of one trial does not affect the others
4. **Constant probability**: P(success) = p is the same on every trial
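The four conditions can be made concrete by simulating the experiment directly. The sketch below is a minimal illustration, assuming NumPy's `default_rng` generator (the seed and the 10,000 repetitions are arbitrary choices): each row is one experiment of n independent trials that each succeed with probability p, and counting the successes in a row gives one draw of a binomial random variable.

```python
import numpy as np

# Reproducible random generator (seed 0 is an arbitrary choice)
rng = np.random.default_rng(seed=0)

n, p = 20, 0.3          # illustrative values, matching Problem 1
n_experiments = 10_000  # how many times we repeat the whole n-trial experiment

# Each row is one experiment: n independent trials, True = success with probability p
trials = rng.random((n_experiments, n)) < p

# Counting the successes in each row gives one Binomial(n, p) observation per row
counts = trials.sum(axis=1)

print("Average number of successes:", counts.mean())  # should be close to n * p = 6
print("Smallest and largest counts:", counts.min(), counts.max())
```

Building the counts from individual trials (rather than calling a binomial sampler) keeps each of the four conditions visible in the code.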

### Common Examples:

- ✓ Flipping a coin n times (heads/tails)
- ✓ Quality control: testing n items (defective/good)
- ✓ Medical trials: treating n patients (cured/not cured)
- ✓ Survey: asking n people (yes/no)

### Mathematical Definition

If X ~ Binomial(n, p), then:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

Where:
- **n** = number of trials
- **k** = number of successes (0, 1, 2, ..., n)
- **p** = probability of success on each trial
- $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ = binomial coefficient
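As a sanity check, the formula can be typed in directly and compared against SciPy. This sketch assumes Python 3.8+ (for `math.comb`) and uses SciPy's `binom.pmf(k, n, p)`; the helper name `binomial_pmf` is just for illustration.

```python
from math import comb
from scipy.stats import binom

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed directly from the formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.3  # illustrative parameters (same as Problem 1)

# Hand-written formula vs. SciPy for a few values of k
for k in [0, 5, 6, 10]:
    print(f"k = {k:2d}: formula = {binomial_pmf(k, n, p):.6f}, scipy = {binom.pmf(k, n, p):.6f}")

# A valid PMF sums to 1 over its support k = 0, 1, ..., n
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(f"Sum of P(X = k) for k = 0..{n}: {total:.10f}")
```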

### Mean and Variance

**Mean (Expected Value):**

$$E[X] = \mu = n \cdot p$$

**Variance:**

$$\text{Var}(X) = \sigma^2 = n \cdot p \cdot (1-p)$$

**Standard Deviation:**

$$\sigma = \sqrt{n \cdot p \cdot (1-p)}$$

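**Intuition:**
- Mean: On average, you expect np successes (e.g., 20 bulbs × 0.3 = 6 defective)
- Variance: np(1-p) measures the spread around the mean; for a fixed n it is largest at p = 0.5 and shrinks toward 0 as p approaches 0 or 1, because the outcome of each trial becomes nearly certain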
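The formulas np and np(1-p) can be checked against the general definitions $E[X] = \sum_k k\,P(X=k)$ and $\text{Var}(X) = \sum_k (k-\mu)^2 P(X=k)$ from Week 5. A minimal sketch with the illustrative values n = 20, p = 0.3:

```python
import numpy as np
from scipy.stats import binom

n, p = 20, 0.3        # illustrative values
k = np.arange(n + 1)  # full support: 0, 1, ..., n
pmf = binom.pmf(k, n, p)

# Definitions of mean and variance applied to the PMF
mean_from_pmf = np.sum(k * pmf)
var_from_pmf = np.sum((k - mean_from_pmf) ** 2 * pmf)

print(f"E[X]   from PMF = {mean_from_pmf:.4f}, formula n*p       = {n * p:.4f}")
print(f"Var(X) from PMF = {var_from_pmf:.4f}, formula n*p*(1-p) = {n * p * (1 - p):.4f}")

# For a fixed n, the variance n*p*(1-p) peaks at p = 0.5
for q in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(f"p = {q}: variance = {n * q * (1 - q):.2f}")
```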

---

### Problem 1: Defective Light Bulbs

A factory produces light bulbs. Each bulb has a **30% chance** of being defective, independently of the other bulbs.

You randomly select **20 bulbs**.

**Questions:**
1. What is the mean and variance?
2. What is P(X ≤ 5)? (at most 5 defective)
3. What is P(X ≥ 5)? (at least 5 defective)
4. Visualize the PMF

**Setup:**
- n = 20 (number of bulbs)
- p = 0.3 (probability of defective)
- X = number of defective bulbs

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

# Problem 1: Defective light bulbs
print("=" * 60)
print("Problem 1: Binomial Distribution - Defective Light Bulbs")
print("=" * 60)

# Parameters
n = 20     # number of trials (bulbs selected)
p = 0.3    # probability of defective

# Calculate theoretical mean and variance
mean_theoretical = n * p
variance_theoretical = n * p * (1 - p)
std_theoretical = np.sqrt(variance_theoretical)

print(f"\nParameters:")
print(f"  n = {n} (number of bulbs)")
print(f"  p = {p} (probability defective)")

print(f"\nTheoretical Results:")
print(f"  Mean = n × p = {n} × {p} = {mean_theoretical:.2f}")
print(f"  Variance = n × p × (1-p) = {n} × {p} × {1-p} = {variance_theoretical:.2f}")
print(f"  Standard Deviation = {std_theoretical:.2f}")

# Create binomial distribution object
dist = binom(n, p)

# Calculate probabilities
prob_at_most_5 = dist.cdf(5)  # P(X ≤ 5)
prob_at_least_5 = 1 - dist.cdf(4)  # P(X ≥ 5) = 1 - P(X ≤ 4)
prob_exactly_5 = dist.pmf(5)  # P(X = 5)

print(f"\nProbabilities:")
print(f"  P(X ≤ 5) = {prob_at_most_5:.4f} ({prob_at_most_5*100:.2f}%)")
print(f"  P(X ≥ 5) = {prob_at_least_5:.4f} ({prob_at_least_5*100:.2f}%)")
print(f"  P(X = 5) = {prob_exactly_5:.4f} ({prob_exactly_5*100:.2f}%)")

# Verify with simulation
print(f"\n{'Simulation Verification (10,000 trials)':^60}")
print("-" * 60)
simulated_data = np.random.binomial(n, p, size=10000)
mean_simulated = np.mean(simulated_data)
variance_simulated = np.var(simulated_data)

print(f"  Simulated Mean: {mean_simulated:.2f} (Theoretical: {mean_theoretical:.2f})")
print(f"  Simulated Variance: {variance_simulated:.2f} (Theoretical: {variance_theoretical:.2f})")

print("=" * 60)

In [None]:
# Visualize the Binomial PMF

x_values = np.arange(0, n+1)
pmf_values = dist.pmf(x_values)

# Create visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Complete PMF
ax1.bar(x_values, pmf_values, color='skyblue', edgecolor='black', alpha=0.7)
ax1.axvline(mean_theoretical, color='red', linestyle='--', linewidth=2, label=f'Mean = {mean_theoretical:.1f}')
ax1.set_xlabel('Number of Defective Bulbs (k)', fontsize=11)
ax1.set_ylabel('Probability P(X = k)', fontsize=11)
ax1.set_title(f'Binomial PMF: n={n}, p={p}', fontsize=13, fontweight='bold')
ax1.legend(fontsize=10)
ax1.grid(axis='y', alpha=0.3)
ax1.set_xticks(x_values)

# Plot 2: Highlight P(X ≤ 5)
colors = ['orange' if xi <= 5 else 'lightblue' for xi in x_values]
ax2.bar(x_values, pmf_values, color=colors, edgecolor='black', alpha=0.7)
ax2.set_xlabel('Number of Defective Bulbs (k)', fontsize=11)
ax2.set_ylabel('Probability P(X = k)', fontsize=11)
ax2.set_title(f'Highlighting P(X ≤ 5) = {prob_at_most_5:.4f}', fontsize=13, fontweight='bold')
ax2.grid(axis='y', alpha=0.3)
ax2.set_xticks(x_values)

# Add text annotation
ax2.text(8, max(pmf_values)*0.8, f'Orange: P(X ≤ 5)\nBlue: P(X > 5)',
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5), fontsize=10)

plt.tight_layout()
plt.show()

### Interpretation

From the results:

1. **Mean = 6**: On average, we expect 6 defective bulbs out of 20
2. **Variance = 4.2** (standard deviation ≈ 2.05): The typical spread around the mean
3. **P(X ≤ 5) ≈ 0.416**: There's about a 42% chance of getting 5 or fewer defective bulbs
4. **P(X ≥ 5) ≈ 0.762**: There's about a 76% chance of getting 5 or more defective bulbs

**Key Observation:**

The distribution is centered around the mean (6), with about 97.5% of the probability mass between 2 and 10 defective bulbs (inclusive).
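Note that P(X ≤ 5) and P(X ≥ 5) add up to more than 1: both events include X = 5, so that outcome is counted twice. The sketch below checks this identity and the "between 2 and 10" claim, using a frozen `binom(20, 0.3)` object as in the cell above; `sf(x)` is SciPy's survival function, 1 - cdf(x).

```python
from scipy.stats import binom

dist = binom(20, 0.3)  # Problem 1: n = 20 bulbs, p = 0.3

p_le_5 = dist.cdf(5)  # P(X ≤ 5)
p_ge_5 = dist.sf(4)   # P(X ≥ 5) = P(X > 4) = 1 - P(X ≤ 4)
p_eq_5 = dist.pmf(5)  # P(X = 5)

# Both events contain X = 5, so: P(X ≤ 5) + P(X ≥ 5) = 1 + P(X = 5)
print(f"P(X ≤ 5) + P(X ≥ 5) = {p_le_5 + p_ge_5:.4f}")
print(f"1 + P(X = 5)        = {1 + p_eq_5:.4f}")

# Mass between 2 and 10 inclusive: P(X ≤ 10) - P(X ≤ 1)
p_2_to_10 = dist.cdf(10) - dist.cdf(1)
print(f"P(2 ≤ X ≤ 10)       = {p_2_to_10:.4f}")
```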

### 📝 TO DO #1: Experiment with Different Parameters

Try changing n and p to see how the distribution changes:

In [None]:
# TO DO #1: Experiment with binomial parameters

def plot_binomial(n, p, title_suffix=""):
    """Plot binomial distribution for given parameters"""
    mean = n * p
    variance = n * p * (1 - p)

    x = np.arange(0, n+1)
    pmf = binom.pmf(x, n, p)

    plt.figure(figsize=(10, 5))
    plt.bar(x, pmf, color='lightcoral', edgecolor='black', alpha=0.7)
    plt.axvline(mean, color='darkred', linestyle='--', linewidth=2, label=f'Mean = {mean:.2f}')
    plt.xlabel('Number of Successes (k)', fontsize=11)
    plt.ylabel('Probability P(X = k)', fontsize=11)
    plt.title(f'Binomial PMF: n={n}, p={p} {title_suffix}', fontsize=13, fontweight='bold')
    plt.legend()
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.show()

    print(f"n = {n}, p = {p}")
    print(f"  Mean = {mean:.2f}")
    print(f"  Variance = {variance:.2f}\n")

# Original scenario
print("Original Scenario:")
plot_binomial(20, 0.3)

# TO DO: Try these scenarios by uncommenting

# Scenario 1: Same n, different p (higher success rate)
# print("Scenario 1: Higher success rate (p=0.7)")
# plot_binomial(20, 0.7, "- More successes expected")

# Scenario 2: Larger n, same p
# print("Scenario 2: More trials (n=50)")
# plot_binomial(50, 0.3, "- More trials")

# Scenario 3: Fair coin (p=0.5)
# print("Scenario 3: Fair coin (p=0.5)")
# plot_binomial(20, 0.5, "- Symmetric distribution")

print("--- YOUR TURN ---")
print("Uncomment the scenarios above to see different distributions!")
print("Notice how changing p affects the shape and mean!")

### 📝 TO DO #2: Real-World Application

Design your own binomial problem and solve it:

In [None]:
# TO DO #2: Create your own binomial problem

# Example: Free throw basketball
print("Example: Basketball Free Throws")
print("-" * 50)

# A player makes 70% of free throws
n_shots = 10  # Takes 10 shots
p_make = 0.70  # 70% success rate

# Create distribution
basketball_dist = binom(n_shots, p_make)
mean_makes = n_shots * p_make

print(f"Setup: Player takes {n_shots} free throws")
print(f"  Success rate: {p_make*100}%")
print(f"  Expected makes: {mean_makes:.1f}")

# Calculate some probabilities
prob_all_makes = basketball_dist.pmf(n_shots)  # P(X = n_shots): every shot made
prob_at_least_7 = 1 - basketball_dist.cdf(6)   # P(X ≥ 7) = 1 - P(X ≤ 6)

print(f"\nProbabilities:")
print(f"  P(makes all {n_shots}) = {prob_all_makes:.4f}")
print(f"  P(makes at least 7) = {prob_at_least_7:.4f}")

# Visualize
x = np.arange(0, n_shots+1)
pmf = basketball_dist.pmf(x)
plt.figure(figsize=(10, 5))
plt.bar(x, pmf, color='orange', edgecolor='black', alpha=0.7)
plt.axvline(mean_makes, color='red', linestyle='--', linewidth=2, label=f'Expected = {mean_makes:.1f}')
plt.xlabel('Number of Successful Free Throws')
plt.ylabel('Probability')
plt.title('Basketball Free Throws Distribution')
plt.legend()
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print("\n--- YOUR TURN ---")
print("Create your own scenario:")
print("1. Flipping 15 coins")
print("2. Medical treatment with 85% success rate, 30 patients")
print("3. Quality control: 5% defect rate, inspect 50 items")

---

## Part B: Poisson Distribution

### When to Use Poisson Distribution?

The Poisson distribution models the **number of events** occurring in a **fixed interval** of time or space when:

1. **Events occur independently**
2. **Events occur at a constant average rate**
3. **Two events cannot occur at exactly the same instant**

### Common Examples:

- ✓ Number of customers arriving per hour
- ✓ Number of phone calls received per minute
- ✓ Number of emails received per day
- ✓ Number of accidents per month
- ✓ Number of typos per page
- ✓ Number of radioactive decay events per second

### Mathematical Definition

If X ~ Poisson(λ), then:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Where:
- **λ (lambda)** = average rate (expected number of events)
- **k** = actual number of events (0, 1, 2, 3, ...)
- **e** ≈ 2.71828 (Euler's number)
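The Poisson formula is just as easy to evaluate by hand. A minimal sketch using only the standard `math` module, compared against SciPy's `poisson.pmf(k, mu)`; the helper name `poisson_pmf` is illustrative. Because the support 0, 1, 2, ... is unbounded, the check that the probabilities sum to 1 uses a truncated sum (k < 50), assuming the tail beyond that is negligible for λ = 4.5.

```python
from math import exp, factorial
from scipy.stats import poisson

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed directly from the formula."""
    return lam**k * exp(-lam) / factorial(k)

lam = 4.5  # illustrative rate (same as Problem 2)

# Hand-written formula vs. SciPy for a few values of k
for k in [0, 3, 5, 10]:
    print(f"k = {k:2d}: formula = {poisson_pmf(k, lam):.6f}, scipy = {poisson.pmf(k, lam):.6f}")

# No upper limit on k, but the terms shrink fast: a truncated sum is already ~1
truncated_total = sum(poisson_pmf(k, lam) for k in range(50))
print(f"Sum of P(X = k) for k = 0..49: {truncated_total:.10f}")
```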

### Mean and Variance

**Mean:**

$$E[X] = \mu = \lambda$$

**Variance:**

$$\text{Var}(X) = \sigma^2 = \lambda$$

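**Key Property:** For Poisson, **mean = variance = λ**

This equality is a hallmark of the Poisson distribution. In practice, count data whose sample variance is far from the sample mean suggest that a Poisson model may be a poor fit (although equal mean and variance alone do not prove that the data are Poisson).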
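Numerically, "mean = variance" can be seen by applying the definitions of mean and variance to the Poisson PMF. The sketch truncates the infinite support at k = 99 (assumed negligible beyond that for λ = 4.5) and also asks SciPy for the first two moments via `poisson.stats(..., moments="mv")`.

```python
import numpy as np
from scipy.stats import poisson

lam = 4.5
k = np.arange(100)  # truncated support 0..99; the remaining tail is negligible for lam = 4.5
pmf = poisson.pmf(k, lam)

mean_est = np.sum(k * pmf)                   # E[X] = sum of k * P(X = k)
var_est = np.sum((k - mean_est) ** 2 * pmf)  # Var(X) = sum of (k - mu)^2 * P(X = k)
print(f"From the PMF: mean ≈ {mean_est:.6f}, variance ≈ {var_est:.6f}")

# SciPy's closed-form moments for comparison
m, v = poisson.stats(lam, moments="mv")
print(f"From SciPy:   mean = {float(m)}, variance = {float(v)}, λ = {lam}")
```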

---

### Problem 2: Customer Arrivals

A store receives an average of **4.5 customers per hour**.

Let X = number of customers arriving in one hour.

**Questions:**
1. What distribution does X follow?
2. What is the mean and variance?
3. Visualize the PMF
4. What is P(X = 3)?
5. What is P(X ≥ 5)?

**Setup:**
- X ~ Poisson(λ = 4.5)

In [None]:
from scipy.stats import poisson

# Problem 2: Customer arrivals (Poisson distribution)
print("=" * 60)
print("Problem 2: Poisson Distribution - Customer Arrivals")
print("=" * 60)

# Parameter
lam = 4.5  # lambda: average customers per hour

# Theoretical mean and variance
mean_poisson = lam
variance_poisson = lam
std_poisson = np.sqrt(lam)

print(f"\nParameters:")
print(f"  λ = {lam} (average customers per hour)")

print(f"\nTheoretical Results:")
print(f"  Mean = λ = {mean_poisson}")
print(f"  Variance = λ = {variance_poisson}")
print(f"  Standard Deviation = √λ = {std_poisson:.4f}")

# Calculate probabilities
prob_exactly_3 = poisson.pmf(3, lam)
prob_exactly_5 = poisson.pmf(5, lam)
prob_at_least_5 = 1 - poisson.cdf(4, lam)  # P(X ≥ 5) = 1 - P(X ≤ 4)

print(f"\nProbabilities:")
print(f"  P(X = 3) = {prob_exactly_3:.4f} ({prob_exactly_3*100:.2f}%)")
print(f"  P(X = 5) = {prob_exactly_5:.4f} ({prob_exactly_5*100:.2f}%)")
print(f"  P(X ≥ 5) = {prob_at_least_5:.4f} ({prob_at_least_5*100:.2f}%)")

# Simulation
print(f"\n{'Simulation Verification (10,000 trials)':^60}")
print("-" * 60)
simulated_poisson = np.random.poisson(lam, size=10000)
mean_sim = np.mean(simulated_poisson)
var_sim = np.var(simulated_poisson)

print(f"  Simulated Mean: {mean_sim:.4f} (Theoretical: {mean_poisson})")
print(f"  Simulated Variance: {var_sim:.4f} (Theoretical: {variance_poisson})")

print("=" * 60)

In [None]:
# Visualize Poisson PMF

x_poisson = np.arange(0, 15)
pmf_poisson = poisson.pmf(x_poisson, lam)

plt.figure(figsize=(10, 6))
plt.bar(x_poisson, pmf_poisson, color='lightgreen', edgecolor='black', alpha=0.7)
plt.axvline(lam, color='darkgreen', linestyle='--', linewidth=2, label=f'Mean = λ = {lam}')
plt.xlabel('Number of Customers (k)', fontsize=11)
plt.ylabel('Probability P(X = k)', fontsize=11)
plt.title(f'Poisson PMF: λ = {lam}', fontsize=13, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(axis='y', alpha=0.3)
plt.xticks(x_poisson)
plt.tight_layout()
plt.show()

print("Interpretation:")
print(f"  - Most likely outcomes are around {int(lam)} customers")
print(f"  - Very unlikely to have 0 or 10+ customers")
print(f"  - Distribution is slightly right-skewed")

### 📝 TO DO #3: Different Poisson Scenarios

Explore how changing λ affects the distribution:

In [None]:
# TO DO #3: Experiment with different λ values

def plot_poisson(lam, title_suffix=""):
    """Plot Poisson distribution for given lambda"""
    x = np.arange(0, int(lam * 3 + 10))
    pmf = poisson.pmf(x, lam)

    plt.figure(figsize=(10, 5))
    plt.bar(x, pmf, color='mediumpurple', edgecolor='black', alpha=0.7)
    plt.axvline(lam, color='darkviolet', linestyle='--', linewidth=2, label=f'λ = {lam}')
    plt.xlabel('Number of Events (k)', fontsize=11)
    plt.ylabel('Probability P(X = k)', fontsize=11)
    plt.title(f'Poisson PMF: λ = {lam} {title_suffix}', fontsize=13, fontweight='bold')
    plt.legend()
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.show()

    print(f"λ = {lam}: Mean = {lam}, Variance = {lam}\n")

# Compare different λ values
print("Comparing Different Poisson Distributions:\n")

# Small λ (rare events)
print("Scenario 1: Rare events (λ = 1)")
plot_poisson(1, "- Rare events")

# TO DO: Uncomment to see more scenarios

# Medium λ
# print("Scenario 2: Moderate events (λ = 5)")
# plot_poisson(5, "- Moderate rate")

# Large λ (frequent events)
# print("Scenario 3: Frequent events (λ = 15)")
# plot_poisson(15, "- Frequent events")

print("--- YOUR TURN ---")
print("Notice:")
print("  - As λ increases, the distribution spreads out")
print("  - Larger λ looks more symmetric (approaches normal)")
print("  - Variance increases with λ (because Var = λ)")

### 📝 TO DO #4: Real-World Poisson Problems

Create and solve your own Poisson problems:

In [None]:
# TO DO #4: Real-world Poisson applications

# Example 1: Email arrivals
print("Example 1: Email Arrivals")
print("-" * 50)

lam_emails = 8  # Average 8 emails per hour
email_dist = poisson(lam_emails)

print(f"Average emails per hour: {lam_emails}")
print(f"  P(no emails) = {email_dist.pmf(0):.4f}")
print(f"  P(more than 10 emails) = {1 - email_dist.cdf(10):.4f}")

# Example 2: Website visits
print("\nExample 2: Website Visits")
print("-" * 50)

lam_visits = 12  # Average 12 visits per minute
visit_dist = poisson(lam_visits)

print(f"Average visits per minute: {lam_visits}")
print(f"  P(exactly 12 visits) = {visit_dist.pmf(12):.4f}")
print(f"  P(between 10 and 15 visits) = {visit_dist.cdf(15) - visit_dist.cdf(9):.4f}")

print("\n--- YOUR TURN ---")
print("Create scenarios for:")
print("1. Phone calls to a call center (average 20 per hour)")
print("2. Typos per page (average 2 per page)")
print("3. Accidents on a highway (average 3 per month)")
print("\nFor each, calculate:")
print("  - P(X = 0)")
print("  - P(X > average)")
print("  - P(X is within 1 standard deviation of mean)")

---

## Part C: Relationship Between Binomial and Poisson

### Poisson as Approximation to Binomial

Under certain conditions, the Poisson distribution can **approximate** the Binomial distribution.

### When to Use Poisson Approximation?

**Conditions:**
1. **n is large**
2. **p is small**
3. **λ = np is moderate**

These are rules of thumb rather than hard cutoffs. A common version: the approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.

**Approximation:**

$$\text{Binomial}(n, p) \approx \text{Poisson}(\lambda = np)$$

### Why Does This Work?

When n is large and p is small:
- We have many trials (n)
- But success is rare (p small)
- This is exactly what Poisson models: rare events!
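A short derivation makes this precise. Assume p = λ/n, so that np = λ stays fixed, and let n → ∞ with k held fixed. The binomial PMF splits into four factors:

$$\binom{n}{k}\left(\frac{\lambda}{n}\right)^{k}\left(1-\frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^{k}}{k!}\cdot\underbrace{\frac{n(n-1)\cdots(n-k+1)}{n^{k}}}_{\to\,1}\cdot\underbrace{\left(1-\frac{\lambda}{n}\right)^{n}}_{\to\,e^{-\lambda}}\cdot\underbrace{\left(1-\frac{\lambda}{n}\right)^{-k}}_{\to\,1}$$

so that

$$\lim_{n\to\infty}\binom{n}{k}\left(\frac{\lambda}{n}\right)^{k}\left(1-\frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^{k}e^{-\lambda}}{k!}$$

which is exactly the Poisson(λ) PMF. The key step is the standard limit $(1-\lambda/n)^{n} \to e^{-\lambda}$.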

### Practical Benefit:

**Binomial** calculation: $\binom{n}{k} p^k (1-p)^{n-k}$ (tedious for large n: a huge binomial coefficient multiplied by very small powers)

**Poisson** calculation: $\frac{\lambda^k e^{-\lambda}}{k!}$ (much simpler!)
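The improvement of the approximation can be watched numerically: hold λ = np = 4.5 fixed, let n grow (so p = λ/n shrinks), and track the largest gap between the two PMFs. A minimal sketch; the list of n values and the window k = 0..29 are arbitrary choices (for n < 30, SciPy returns 0 for binomial probabilities with k > n).

```python
import numpy as np
from scipy.stats import binom, poisson

lam = 4.5          # keep λ = n * p fixed
k = np.arange(30)  # window covering essentially all of the Poisson(4.5) mass

diffs = {}
for n in [10, 50, 100, 1000, 10_000]:
    p = lam / n
    # Largest pointwise gap between Binomial(n, p) and Poisson(λ) over the window
    diffs[n] = np.max(np.abs(binom.pmf(k, n, p) - poisson.pmf(k, lam)))
    print(f"n = {n:>6}, p = {p:.5f}: max |Binomial - Poisson| = {diffs[n]:.6f}")
```

The gap shrinks as n grows with np fixed, matching the "many trials, rare success" reasoning above.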

---

### Problem 3: Comparing Binomial and Poisson

**Scenario:** A factory produces 100 items. Each item has a 4.5% chance of being defective.

**Binomial approach:**
- n = 100, p = 0.045
- X ~ Binomial(100, 0.045)

**Poisson approximation:**
- λ = np = 100 × 0.045 = 4.5
- X ≈ Poisson(4.5)

**Question:** How good is the approximation?

In [None]:
# Problem 3: Binomial vs Poisson comparison

print("=" * 60)
print("Problem 3: Poisson Approximation to Binomial")
print("=" * 60)

# Parameters
n = 100
p = 0.045
lam = n * p  # λ = np = 4.5

print(f"\nSetup:")
print(f"  Binomial: n = {n}, p = {p}")
print(f"  Poisson: λ = np = {n} × {p} = {lam}")

# Check conditions for approximation
print(f"\nConditions for Poisson approximation:")
print(f"  ✓ n = {n} is large (≥ 20)")
print(f"  ✓ p = {p} is small (≤ 0.05)")
print(f"  ✓ λ = {lam} is moderate (< 10)")
print(f"\n  → Poisson approximation should work well!")

# Create distributions
binomial_dist = binom(n, p)
poisson_dist = poisson(lam)

# Compare probabilities for specific values
print(f"\n{'k':<5} {'Binomial':<15} {'Poisson':<15} {'Difference':<15}")
print("-" * 60)
for k in [0, 2, 4, 5, 6, 8, 10]:
    prob_binom = binomial_dist.pmf(k)
    prob_poisson = poisson_dist.pmf(k)
    diff = abs(prob_binom - prob_poisson)
    print(f"{k:<5} {prob_binom:<15.6f} {prob_poisson:<15.6f} {diff:<15.6f}")

# Compare means and variances
print(f"\n{'Statistic':<20} {'Binomial':<15} {'Poisson':<15}")
print("-" * 60)
print(f"{'Mean':<20} {n*p:<15.4f} {lam:<15.4f}")
print(f"{'Variance':<20} {n*p*(1-p):<15.4f} {lam:<15.4f}")

print("=" * 60)

In [None]:
# Visual comparison of Binomial vs Poisson

x_range = np.arange(0, 15)
pmf_binomial = binomial_dist.pmf(x_range)
pmf_poisson = poisson_dist.pmf(x_range)

plt.figure(figsize=(12, 5))

# Plot both distributions
plt.plot(x_range, pmf_binomial, 'o-', label=f'Binomial(n={n}, p={p})',
         color='blue', markersize=8, linewidth=2)
plt.plot(x_range, pmf_poisson, 's--', label=f'Poisson(λ={lam})',
         color='red', markersize=6, linewidth=2, alpha=0.7)

plt.xlabel('Number of Defective Items (k)', fontsize=12)
plt.ylabel('Probability P(X = k)', fontsize=12)
plt.title('Comparison: Binomial vs Poisson Approximation', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.xticks(x_range)
plt.tight_layout()
plt.show()

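# Quantify "closely matches" from the computed PMFs instead of hard-coding a number
max_diff = np.max(np.abs(pmf_binomial - pmf_poisson))

print("Observation:")
print("  The Poisson approximation (red) closely matches")
print("  the exact Binomial (blue) for this scenario!")
print(f"\n  Maximum difference in probabilities (k = 0..14): {max_diff:.4f}")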

### 📝 TO DO #5: When Does Approximation Break Down?

Test the approximation under different conditions:

In [None]:
# TO DO #5: Test approximation under different conditions

def compare_binomial_poisson(n, p, title):
    """Compare Binomial and Poisson distributions"""
    lam = n * p

    print(f"\n{title}")
    print(f"  n = {n}, p = {p}, λ = {lam}")

    # Check conditions
    conditions_met = []
    if n >= 20:
        conditions_met.append("✓ n ≥ 20")
    else:
        conditions_met.append("✗ n < 20")

    if p <= 0.05:
        conditions_met.append("✓ p ≤ 0.05")
    else:
        conditions_met.append("✗ p > 0.05")

    if lam < 10:
        conditions_met.append("✓ λ < 10")
    else:
        conditions_met.append("✗ λ ≥ 10")

    print(f"  Conditions: {', '.join(conditions_met)}")

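    # Calculate max difference over the full binomial support 0..n
    # (a fixed window such as 0..29 misses the bulk of the mass when np is large)
    x = np.arange(0, n+1)
    pmf_b = binom.pmf(x, n, p)
    pmf_p = poisson.pmf(x, lam)
    max_diff = np.max(np.abs(pmf_b - pmf_p))

    print(f"  Max difference: {max_diff:.6f}")

    # Illustrative cut-offs for this exercise, not a standard criterion
    if max_diff < 0.005:
        print(f"  → Excellent approximation!")
    elif max_diff < 0.01:
        print(f"  → Good approximation")
    else:
        print(f"  → Poor approximation")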

# Test cases
print("Testing Poisson Approximation Under Different Conditions")
print("=" * 60)

# Good approximation (all conditions met)
compare_binomial_poisson(100, 0.045, "Case 1: All conditions met")

# TO DO: Uncomment these to test

# Case 2: p too large
# compare_binomial_poisson(100, 0.5, "Case 2: p too large (0.5)")

# Case 3: n too small
# compare_binomial_poisson(10, 0.05, "Case 3: n too small (10)")

# Case 4: λ too large
# compare_binomial_poisson(200, 0.3, "Case 4: λ too large (60)")

# Case 5: Borderline case
# compare_binomial_poisson(50, 0.1, "Case 5: Borderline (n=50, p=0.1)")

print("\n--- YOUR TURN ---")
print("Uncomment cases above to see when approximation works well!")
print("Notice: the maximum difference grows when the conditions aren't met")

---

## Summary

### Key Concepts Covered:

**1. Binomial Distribution**
- Models: n independent trials with success probability p
- Formula: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
- Mean: np
- Variance: np(1-p)
- Use when: Fixed trials, two outcomes, constant probability

**2. Poisson Distribution**
- Models: Number of events in fixed interval
- Formula: $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$
- Mean: λ
- Variance: λ (same as mean!)
- Use when: Events occur independently at constant rate

**3. Relationship**
- Poisson approximates Binomial when:
  - n is large (≥ 20)
  - p is small (≤ 0.05)
  - λ = np is moderate (< 10)
- Simplifies calculations for large n

### Comparison Table:

| Feature | Binomial | Poisson |
|---------|----------|---------|
| **Models** | Fixed trials | Events in interval |
| **Parameters** | n, p | λ |
| **Mean** | np | λ |
| **Variance** | np(1-p) | λ |
| **Example** | Coin flips | Customer arrivals |
| **When to use** | Fixed n, success/failure | Counting rare events |

### Important Formulas:

**Binomial:**
- PMF: $P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}$
- Mean: $\mu = np$
- Variance: $\sigma^2 = np(1-p)$

**Poisson:**
- PMF: $P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}$
- Mean: $\mu = \lambda$
- Variance: $\sigma^2 = \lambda$

---

## Practice Problems

**Problem A: Binomial**

A basketball player makes 75% of free throws. She takes 15 shots.

- (a) What distribution does the number of successful shots follow?
- (b) Find the mean and variance
- (c) What is P(makes at least 10 shots)?
- (d) What is P(makes exactly 12 shots)?

**Problem B: Poisson**

A website receives an average of 6 visitors per minute.

- (a) What is P(no visitors in 1 minute)?
- (b) What is P(more than 8 visitors in 1 minute)?
- (c) What is the probability of getting exactly the expected number?
- (d) Find P(between 4 and 8 visitors, inclusive)

**Problem C: Approximation**

A factory produces 200 items per day with a 2% defect rate.

- (a) Set up both Binomial and Poisson models
- (b) Check if Poisson approximation is valid
- (c) Calculate P(X ≤ 5) using both distributions
- (d) Compare the results

**Problem D: Applications**

For each scenario, identify if it's Binomial or Poisson and explain why:

1. Number of misprints on a newspaper page
2. Number of heads in 50 coin flips
3. Number of customers calling in 10 minutes
4. Number of students passing an exam (out of 30)
5. Number of car accidents on a highway per week

---

## Next Steps

**In Week 7, you'll explore:**
- **Normal (Gaussian) Distribution**: The most important continuous distribution
- **Standard Normal Distribution**: Z-scores and probability tables
- **Normal Approximation**: Using normal to approximate binomial

### Key Takeaways from Week 6:

1. **Binomial**: Fixed trials, success/failure
2. **Poisson**: Counting events in an interval
3. **Mean = Variance = λ** is the signature of Poisson (for the binomial, the variance np(1-p) is always smaller than the mean np)
4. **Approximation**: Poisson simplifies binomial for large n, small p
5. **Real applications**: Quality control, customer arrivals, reliability

### When to Use Which?

**Use Binomial when:**
- You know the exact number of trials (n)
- Each trial is independent
- Probability of success is constant
- Example: Testing 20 products, each has 10% defect rate

**Use Poisson when:**
- Counting events in time/space
- Don't know exact number of "trials"
- Events are rare but can occur many times
- Example: Number of customers per hour (don't know how many "potential" customers)

**Use Poisson approximation when:**
- Binomial calculation too complex
- Conditions met (n large, p small, λ moderate)

### Practice Tips:

- Always identify distribution type first
- Check if approximation is valid before using
- Visualize with PMF plots
- Verify calculations with simulation
- Remember: Mean and variance tell you a lot!

**Keep practicing with the TO DO sections!**

---

**Great job completing Week 6! 🎉**

You now understand two fundamental discrete distributions used throughout statistics and data science!