# Lab 1: Introduction to Probability

## Learning Objectives

By the end of this lab, you will:
- Understand basic probability theory and axioms
- Calculate conditional probabilities
- Work with random variables and distributions
- Apply probability to real-world problems
- Build intuition for Bayesian reasoning

## Why Probability in AI?

Most real-world AI problems involve **uncertainty**:
- Sensor readings are noisy
- Information is incomplete
- Future events are unpredictable
- Models are approximations

Probability theory provides a principled framework for reasoning under uncertainty.

## Real-World Applications

- 🏥 **Medical Diagnosis**: What's the probability a patient has disease X given symptoms?
- 📧 **Spam Filtering**: How likely is this email to be spam?
- 🚗 **Self-Driving Cars**: What will other vehicles do next?
- 🎮 **Game AI**: What strategy will the opponent use?
- 📈 **Finance**: What's the risk of this investment?


In [None]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from typing import Dict, List, Tuple, Set
from itertools import product
import scipy.stats as stats
from collections import Counter

# Set random seed for reproducibility
np.random.seed(42)

# Plot settings
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (10, 6)

## Part 1: Probability Basics

### Key Concepts

**Sample Space (Ω)**: Set of all possible outcomes
- Coin flip: {H, T}
- Die roll: {1, 2, 3, 4, 5, 6}
- Weather: {Sunny, Rainy, Cloudy}

**Event (E)**: A subset of the sample space
- "Roll an even number": {2, 4, 6}
- "Get heads": {H}

**Probability (P)**: A measure of how likely an event is
- Range: 0 ≤ P(E) ≤ 1
- P(Ω) = 1 (something must happen)
- P(∅) = 0 (impossible event)

### Probability Axioms (Kolmogorov)

1. **Non-negativity**: P(E) ≥ 0 for all events E
2. **Normalization**: P(Ω) = 1
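3. **Additivity**: For mutually exclusive events A and B (A ∩ B = ∅):
   - P(A ∪ B) = P(A) + P(B)
   - More generally, the same holds for any countable collection of pairwise disjoint events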
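
The axioms can be checked directly on a small example. The sketch below is only an illustration: it assumes a fair six-sided die, and the names `omega`, `p`, `prob`, `even` and `low_odd` are made up for this check. It uses exact fractions so the equalities hold without floating-point tolerance.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die with exact rational probabilities.
omega = {1, 2, 3, 4, 5, 6}
p = {outcome: Fraction(1, 6) for outcome in omega}

def prob(event):
    """P(E) for an event E, given as a subset of omega."""
    return sum((p[o] for o in event), Fraction(0))

even = {2, 4, 6}
low_odd = {1, 3}  # shares no outcome with `even`

# Axiom 1: non-negativity
assert all(prob({o}) >= 0 for o in omega)
# Axiom 2: normalization
assert prob(omega) == 1
# Axiom 3: additivity for mutually exclusive events
assert even & low_odd == set()
assert prob(even | low_odd) == prob(even) + prob(low_odd)

print(prob(even), prob(low_odd), prob(even | low_odd))  # 1/2 1/3 5/6
```

Additivity is only guaranteed for disjoint events. If the two events overlapped, adding their probabilities would count the shared outcomes twice.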


In [None]:
class ProbabilitySpace:
    """A simple probability space for discrete events."""
    
    def __init__(self, outcomes: List[str], probabilities: List[float] = None):
        """
        Initialize a probability space.
        
        Args:
            outcomes: List of possible outcomes
            probabilities: List of probabilities (uniform if not provided)
        """
        self.outcomes = outcomes
        
        if probabilities is None:
            # Uniform distribution
            self.probabilities = {o: 1/len(outcomes) for o in outcomes}
        else:
            if len(probabilities) != len(outcomes):
                raise ValueError("Number of probabilities must match outcomes")
            if not np.isclose(sum(probabilities), 1.0):
                raise ValueError("Probabilities must sum to 1")
            if any(p < 0 for p in probabilities):
                raise ValueError("Probabilities must be non-negative")
            
            self.probabilities = dict(zip(outcomes, probabilities))
    
    def prob(self, event: Set[str]) -> float:
        """Calculate probability of an event."""
        return sum(self.probabilities[o] for o in event if o in self.outcomes)
    
    def sample(self, n: int = 1) -> List[str]:
        """Sample from the probability space."""
        return np.random.choice(
            self.outcomes,
            size=n,
            p=list(self.probabilities.values())
        ).tolist()
    
    def __repr__(self):
        return f"ProbabilitySpace({self.outcomes})"


# Example: Fair die
die = ProbabilitySpace(['1', '2', '3', '4', '5', '6'])
print("Fair die:")
print(f"P(roll 3) = {die.prob({'3'})}")
print(f"P(roll even) = {die.prob({'2', '4', '6'})}")
print(f"P(roll > 4) = {die.prob({'5', '6'})}")

# Example: Biased coin
biased_coin = ProbabilitySpace(['H', 'T'], [0.7, 0.3])
print("\nBiased coin (70% heads):")
print(f"P(heads) = {biased_coin.prob({'H'})}")
print(f"P(tails) = {biased_coin.prob({'T'})}")

### Visualizing Probability Distributions

In [None]:
def visualize_distribution(prob_space: ProbabilitySpace, title: str = "Probability Distribution"):
    """Visualize a probability distribution."""
    outcomes = list(prob_space.probabilities.keys())
    probs = list(prob_space.probabilities.values())
    
    plt.figure(figsize=(10, 6))
    bars = plt.bar(outcomes, probs, color='skyblue', edgecolor='navy', alpha=0.7)
    
    # Add probability labels on bars
    for bar, prob in zip(bars, probs):
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height,
                f'{prob:.3f}',
                ha='center', va='bottom', fontweight='bold')
    
    plt.xlabel('Outcome', fontsize=12, fontweight='bold')
    plt.ylabel('Probability', fontsize=12, fontweight='bold')
    plt.title(title, fontsize=14, fontweight='bold')
    plt.ylim(0, max(probs) * 1.2)
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.show()

# Visualize our examples
visualize_distribution(die, "Fair Six-Sided Die")
visualize_distribution(biased_coin, "Biased Coin (70% Heads)")

## Part 2: Conditional Probability

### What is Conditional Probability?

**Conditional Probability**: The probability of event A given that event B has occurred.

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$

**Intuition**: We're restricting our sample space to only cases where B is true.

**Example**: 
- P(rain tomorrow) = 0.3
- P(rain tomorrow | cloudy today) = 0.7

Knowing it's cloudy increases our belief that it will rain!
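
The "restricted sample space" intuition can be made concrete by counting. The following is a minimal sketch assuming a fair die, with illustrative events `A` and `B` that do not come from the lab itself. It computes P(A|B) in two ways, once with the formula and once by treating B as the new sample space.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # fair die, all outcomes equally likely
A = {4, 5, 6}                # "roll greater than 3"
B = {2, 4, 6}                # "roll an even number"

def prob(event, space):
    """Probability of `event` when every outcome in `space` is equally likely."""
    return Fraction(len(event & space), len(space))

p_a = prob(A, omega)                                        # unconditional P(A)
p_a_given_b_formula = prob(A & B, omega) / prob(B, omega)   # P(A ∩ B) / P(B)
p_a_given_b_restricted = prob(A, B)                         # B as the new sample space

print(p_a, p_a_given_b_formula, p_a_given_b_restricted)     # 1/2 2/3 2/3
```

Both routes give the same answer. Learning that the roll is even raises the probability of "greater than 3" from 1/2 to 2/3.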


In [None]:
def conditional_probability(p_a_and_b: float, p_b: float) -> float:
    """
    Calculate P(A|B) given P(A and B) and P(B).
    
    Args:
        p_a_and_b: Probability of both A and B
        p_b: Probability of B
    
    Returns:
        P(A|B)
    """
    if p_b == 0:
        raise ValueError("Cannot condition on zero-probability event")
    return p_a_and_b / p_b


# Example: Medical test
print("Medical Test Example")
print("=" * 50)

# Suppose:
# - 1% of population has disease
# - Test has 95% sensitivity (detects disease when present)
# - Test has 5% false positive rate

p_disease = 0.01
p_test_pos_given_disease = 0.95  # True positive rate
p_test_pos_given_healthy = 0.05  # False positive rate

# P(test positive AND disease) = P(disease) * P(test+|disease)
p_test_pos_and_disease = p_disease * p_test_pos_given_disease

# P(test positive AND healthy) = P(healthy) * P(test+|healthy)
p_healthy = 1 - p_disease
p_test_pos_and_healthy = p_healthy * p_test_pos_given_healthy

# P(test positive) = sum of both cases
p_test_pos = p_test_pos_and_disease + p_test_pos_and_healthy

print(f"P(disease) = {p_disease:.3f}")
print(f"P(test +) = {p_test_pos:.3f}")
print(f"P(test + AND disease) = {p_test_pos_and_disease:.3f}")

# What's the probability of having disease given positive test?
p_disease_given_test_pos = conditional_probability(p_test_pos_and_disease, p_test_pos)
print(f"\nP(disease | test +) = {p_disease_given_test_pos:.3f}")
print(f"\nSurprise! Even with a positive test, only {p_disease_given_test_pos*100:.1f}% chance of disease!")
print("This is because the disease is rare.")

### Visualization: Conditional Probability

In [None]:
# Visualize the medical test example
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Population breakdown
categories = ['Has Disease\n& Test+', 'Healthy\n& Test+', 'Has Disease\n& Test-', 'Healthy\n& Test-']
probabilities = [
    p_test_pos_and_disease,
    p_test_pos_and_healthy,
    p_disease * (1 - p_test_pos_given_disease),
    p_healthy * (1 - p_test_pos_given_healthy)
]
colors = ['red', 'orange', 'darkred', 'green']

ax1.bar(categories, probabilities, color=colors, alpha=0.7, edgecolor='black')
ax1.set_ylabel('Probability', fontweight='bold')
ax1.set_title('Population Breakdown', fontweight='bold', fontsize=14)
ax1.grid(axis='y', alpha=0.3)

for i, (cat, prob) in enumerate(zip(categories, probabilities)):
    ax1.text(i, prob, f'{prob:.4f}', ha='center', va='bottom', fontweight='bold')

# Conditional probability visualization
test_pos_breakdown = ['Has Disease\n(given Test+)', 'Healthy\n(given Test+)']
conditional_probs = [
    p_disease_given_test_pos,
    1 - p_disease_given_test_pos
]

ax2.bar(test_pos_breakdown, conditional_probs, color=['red', 'green'], alpha=0.7, edgecolor='black')
ax2.set_ylabel('Conditional Probability', fontweight='bold')
ax2.set_title('Given Positive Test Result', fontweight='bold', fontsize=14)
ax2.set_ylim(0, 1)
ax2.grid(axis='y', alpha=0.3)

for i, (cat, prob) in enumerate(zip(test_pos_breakdown, conditional_probs)):
    ax2.text(i, prob, f'{prob:.3f}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

## Part 3: Independence

### What is Independence?

Events A and B are **independent** if:
$$P(A|B) = P(A)$$

Or equivalently (this form also applies when P(B) = 0):
$$P(A \cap B) = P(A) \cdot P(B)$$

**Intuition**: Knowing B tells us nothing about A.

**Examples**:
- Coin flips are independent
- Die rolls are independent
- Drawing cards **with replacement** is independent
- Drawing cards **without replacement** is NOT independent
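
The card examples can be verified by listing every possible pair of draws. This sketch assumes a hypothetical four-card deck (two Aces, two Kings) to keep the enumeration small; `deck`, `positions` and `check` are illustrative names.

```python
from fractions import Fraction
from itertools import permutations, product

deck = ['A', 'A', 'K', 'K']      # hypothetical 4-card deck: two Aces, two Kings
positions = range(len(deck))     # index the cards so identical ranks stay distinct

def check(draws):
    """Return P(1st Ace), P(2nd Ace), P(both Aces) over equally likely ordered draws."""
    draws = list(draws)
    n = len(draws)
    p_first = Fraction(sum(deck[i] == 'A' for i, _ in draws), n)
    p_second = Fraction(sum(deck[j] == 'A' for _, j in draws), n)
    p_both = Fraction(sum(deck[i] == 'A' and deck[j] == 'A' for i, j in draws), n)
    return p_first, p_second, p_both

with_repl = check(product(positions, repeat=2))    # 16 ordered pairs, repeats allowed
without_repl = check(permutations(positions, 2))   # 12 ordered pairs, no repeats

for label, (p1, p2, p12) in [("with replacement", with_repl),
                             ("without replacement", without_repl)]:
    print(f"{label}: P(both) = {p12}, P(1st) * P(2nd) = {p1 * p2}, "
          f"independent = {p12 == p1 * p2}")
```

In both schemes each draw is an Ace with marginal probability 1/2. Only the joint probability differs (1/4 versus 1/6), and the joint probability is what the independence test compares against the product.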


In [None]:
def are_independent(p_a: float, p_b: float, p_a_and_b: float, tolerance: float = 1e-6) -> bool:
    """
    Check if events A and B are independent.
    
    Args:
        p_a: P(A)
        p_b: P(B)
        p_a_and_b: P(A ∩ B)
        tolerance: Numerical tolerance for floating point comparison
    
    Returns:
        True if events are independent
    """
    expected = p_a * p_b
    return abs(p_a_and_b - expected) < tolerance


# Example 1: Two independent coin flips
print("Example 1: Two Independent Coin Flips")
print("=" * 50)
p_first_heads = 0.5
p_second_heads = 0.5
p_both_heads = 0.25

independent = are_independent(p_first_heads, p_second_heads, p_both_heads)
print(f"P(1st heads) = {p_first_heads}")
print(f"P(2nd heads) = {p_second_heads}")
print(f"P(both heads) = {p_both_heads}")
print(f"Expected if independent: {p_first_heads * p_second_heads}")
print(f"Are they independent? {independent}")

# Example 2: Drawing cards without replacement
print("\nExample 2: Drawing Cards Without Replacement")
print("=" * 50)
# P(1st card is Ace) = 4/52
# P(2nd card is Ace | 1st is Ace) = 3/51
# P(both Aces) = (4/52) * (3/51)
p_first_ace = 4/52
p_second_ace = 4/52  # Marginal probability
p_both_aces = (4/52) * (3/51)

independent = are_independent(p_first_ace, p_second_ace, p_both_aces)
print(f"P(1st Ace) = {p_first_ace:.4f}")
print(f"P(2nd Ace) = {p_second_ace:.4f}")
print(f"P(both Aces) = {p_both_aces:.4f}")
print(f"Expected if independent: {p_first_ace * p_second_ace:.4f}")
print(f"Are they independent? {independent}")
print("Not independent! First draw affects the second.")

## Part 4: Joint and Marginal Probabilities

### Joint Probability

**Joint Probability**: Probability of multiple events occurring together.
- P(A, B) or P(A ∩ B)

### Marginal Probability

**Marginal Probability**: Probability of one event, ignoring others.
- Obtained by summing joint probabilities
- P(A = a) = Σ_b P(A = a, B = b), summing over every value b that B can take
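
When the joint distribution is stored as a table, marginalizing amounts to summing along one axis. The sketch below uses a made-up 2×3 table (the numbers are hypothetical) and NumPy's `sum(axis=...)`.

```python
import numpy as np

# Hypothetical joint table P(A, B): rows index values of A, columns values of B.
joint = np.array([
    [0.10, 0.20, 0.10],   # A = a0
    [0.25, 0.05, 0.30],   # A = a1
])
assert np.isclose(joint.sum(), 1.0)   # a valid joint distribution sums to 1

p_a = joint.sum(axis=1)   # sum over B (columns) -> P(A)
p_b = joint.sum(axis=0)   # sum over A (rows)    -> P(B)

print("P(A) =", p_a)
print("P(B) =", p_b)
```

Each marginal is itself a probability distribution, so `p_a` and `p_b` both sum to 1.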


In [None]:
class JointDistribution:
    """Represent a joint probability distribution."""
    
    def __init__(self):
        self.distribution = {}
    
    def set_prob(self, event: Tuple, prob: float):
        """Set probability for a joint event."""
        self.distribution[event] = prob
    
    def get_prob(self, event: Tuple) -> float:
        """Get probability of joint event."""
        return self.distribution.get(event, 0.0)
    
    def marginal(self, variable_index: int) -> Dict:
        """Calculate marginal probability for a variable."""
        marginals = {}
        for event, prob in self.distribution.items():
            value = event[variable_index]
            marginals[value] = marginals.get(value, 0) + prob
        return marginals
    
    def conditional(self, given: Dict[int, object]) -> 'JointDistribution':
        """Calculate conditional distribution given some variables."""
        # Filter events matching the given values
        filtered = {}
        for event, prob in self.distribution.items():
            match = all(event[idx] == val for idx, val in given.items())
            if match:
                filtered[event] = prob
        
        # Normalize
        total = sum(filtered.values())
        if total > 0:
            filtered = {e: p/total for e, p in filtered.items()}
        
        result = JointDistribution()
        result.distribution = filtered
        return result
    
    def __repr__(self):
        return f"JointDistribution with {len(self.distribution)} events"


# Example: Weather and Traffic
# Variables: Weather (Sunny/Rainy), Traffic (Light/Heavy)
joint = JointDistribution()

# Set joint probabilities
joint.set_prob(('Sunny', 'Light'), 0.40)
joint.set_prob(('Sunny', 'Heavy'), 0.10)
joint.set_prob(('Rainy', 'Light'), 0.15)
joint.set_prob(('Rainy', 'Heavy'), 0.35)

print("Joint Distribution: P(Weather, Traffic)")
print("=" * 50)
for event, prob in joint.distribution.items():
    print(f"P{event} = {prob:.2f}")

# Calculate marginals
print("\nMarginal Probabilities:")
print("=" * 50)
weather_marginal = joint.marginal(0)
print("P(Weather):")
for weather, prob in weather_marginal.items():
    print(f"  P({weather}) = {prob:.2f}")

traffic_marginal = joint.marginal(1)
print("\nP(Traffic):")
for traffic, prob in traffic_marginal.items():
    print(f"  P({traffic}) = {prob:.2f}")

# Conditional probability
print("\nConditional Probability:")
print("=" * 50)
given_rainy = joint.conditional({0: 'Rainy'})
print("P(Traffic | Rainy):")
traffic_given_rainy = given_rainy.marginal(1)
for traffic, prob in traffic_given_rainy.items():
    print(f"  P({traffic} | Rainy) = {prob:.2f}")

### Visualizing Joint Distributions

In [None]:
# Create heatmap for joint distribution
weather_values = ['Sunny', 'Rainy']
traffic_values = ['Light', 'Heavy']

joint_matrix = np.array([
    [joint.get_prob(('Sunny', 'Light')), joint.get_prob(('Sunny', 'Heavy'))],
    [joint.get_prob(('Rainy', 'Light')), joint.get_prob(('Rainy', 'Heavy'))]
])

fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(joint_matrix, cmap='YlOrRd', aspect='auto')

# Set ticks and labels
ax.set_xticks(range(len(traffic_values)))
ax.set_yticks(range(len(weather_values)))
ax.set_xticklabels(traffic_values)
ax.set_yticklabels(weather_values)

# Add text annotations
for i in range(len(weather_values)):
    for j in range(len(traffic_values)):
        text = ax.text(j, i, f'{joint_matrix[i, j]:.2f}',
                      ha="center", va="center", color="black",
                      fontsize=16, fontweight='bold')

ax.set_xlabel('Traffic', fontsize=12, fontweight='bold')
ax.set_ylabel('Weather', fontsize=12, fontweight='bold')
ax.set_title('Joint Probability: P(Weather, Traffic)', fontsize=14, fontweight='bold')

# Add colorbar
cbar = plt.colorbar(im, ax=ax)
cbar.set_label('Probability', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.show()

## Part 5: Random Variables and Distributions

### Random Variables

A **random variable** is a function that maps outcomes to numbers.
- Discrete: Takes finitely or countably many values (number of heads, die roll)
- Continuous: Takes any value in an interval of real numbers (height, temperature)
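
To see a random variable as "a function on outcomes", consider this small sketch. It assumes two fair coin flips, and the function `num_heads` is an illustrative choice of random variable. The distribution of the variable follows from adding up the probabilities of all outcomes that map to each value.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: two fair coin flips, so each of the 4 outcomes has probability 1/4.
omega = list(product('HT', repeat=2))

def num_heads(outcome):
    """Random variable X: maps an outcome such as ('H', 'T') to a number."""
    return outcome.count('H')

# Distribution of X: count how many equally likely outcomes map to each value.
counts = Counter(num_heads(o) for o in omega)
pmf = {x: Fraction(c, len(omega)) for x, c in sorted(counts.items())}

for x, px in pmf.items():
    print(f"P(X = {x}) = {px}")
```

Two different outcomes, ('H', 'T') and ('T', 'H'), map to the same value 1, which is why P(X = 1) is twice as large as P(X = 0) or P(X = 2).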

### Common Distributions

1. **Uniform**: All outcomes equally likely
2. **Bernoulli**: Single binary trial (coin flip)
3. **Binomial**: n independent Bernoulli trials
4. **Normal (Gaussian)**: Bell curve; arises when many small independent effects add up
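
The link between Bernoulli and Binomial can be spelled out with the binomial formula P(X = k) = C(n, k) · pᵏ · (1 − p)ⁿ⁻ᵏ. The helper `binomial_pmf` below is a hand-written sketch for illustration, not a replacement for a library implementation.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): choose which k of the n trials succeed."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(X = 5) = {pmf[5]:.4f}")        # 252 / 1024
print(f"Sum over k = {sum(pmf):.4f}")    # probabilities over all k add to 1

# With n = 1 the binomial reduces to a single Bernoulli trial
print(f"Bernoulli(0.7): P(1) = {binomial_pmf(1, 1, 0.7):.1f}, "
      f"P(0) = {binomial_pmf(0, 1, 0.7):.1f}")
```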


In [None]:
# Visualize common distributions
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Uniform distribution
ax = axes[0, 0]
x_uniform = np.arange(1, 7)
p_uniform = np.ones(6) / 6
ax.bar(x_uniform, p_uniform, color='skyblue', edgecolor='navy', alpha=0.7)
ax.set_title('Uniform Distribution (Fair Die)', fontweight='bold', fontsize=12)
ax.set_xlabel('Outcome')
ax.set_ylabel('Probability')
ax.set_ylim(0, 0.3)
ax.grid(axis='y', alpha=0.3)

# 2. Bernoulli distribution
ax = axes[0, 1]
x_bernoulli = [0, 1]
p_bernoulli = [0.3, 0.7]
ax.bar(x_bernoulli, p_bernoulli, color='lightcoral', edgecolor='darkred', alpha=0.7)
ax.set_title('Bernoulli Distribution (p=0.7)', fontweight='bold', fontsize=12)
ax.set_xlabel('Outcome')
ax.set_ylabel('Probability')
ax.set_xticks([0, 1])
ax.set_xticklabels(['Failure', 'Success'])
ax.set_ylim(0, 1)
ax.grid(axis='y', alpha=0.3)

# 3. Binomial distribution
ax = axes[1, 0]
n, p = 10, 0.5
x_binomial = np.arange(0, n+1)
p_binomial = stats.binom.pmf(x_binomial, n, p)
ax.bar(x_binomial, p_binomial, color='lightgreen', edgecolor='darkgreen', alpha=0.7)
ax.set_title(f'Binomial Distribution (n={n}, p={p})', fontweight='bold', fontsize=12)
ax.set_xlabel('Number of Successes')
ax.set_ylabel('Probability')
ax.grid(axis='y', alpha=0.3)

# 4. Normal distribution
ax = axes[1, 1]
mu, sigma = 0, 1
x_normal = np.linspace(-4, 4, 100)
p_normal = stats.norm.pdf(x_normal, mu, sigma)
ax.plot(x_normal, p_normal, 'b-', linewidth=2, label=f'μ={mu}, σ={sigma}')
ax.fill_between(x_normal, p_normal, alpha=0.3)
ax.set_title('Normal (Gaussian) Distribution', fontweight='bold', fontsize=12)
ax.set_xlabel('Value')
ax.set_ylabel('Probability Density')
ax.legend()
ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

## Part 6: Law of Total Probability

### The Law

If {B₁, B₂, ..., Bₙ} partition the sample space (mutually exclusive, jointly exhaustive, each with P(Bᵢ) > 0), then:

$$P(A) = \sum_{i=1}^{n} P(A|B_i) \cdot P(B_i)$$

**Intuition**: Break down a complex probability into simpler conditional probabilities.
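
The law can be verified on a sample space small enough to enumerate. This sketch assumes two fair dice, takes A to be "the sum is at least 10", and partitions the outcomes by the value of the first die; all of these choices are illustrative.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))       # two fair dice, 36 outcomes

def prob(event):
    """Probability of an event (a list of outcomes) under equally likely outcomes."""
    return Fraction(len(event), len(omega))

A = [o for o in omega if sum(o) >= 10]             # event of interest

# Partition the sample space by the value of the first die: B_1, ..., B_6
partition = [[o for o in omega if o[0] == k] for k in range(1, 7)]

total = Fraction(0)
for B_i in partition:
    A_and_B = [o for o in B_i if o in A]
    p_a_given_b = Fraction(len(A_and_B), len(B_i))  # P(A | B_i)
    total += p_a_given_b * prob(B_i)                # add P(A | B_i) * P(B_i)

print(total, prob(A))  # 1/6 1/6
```

The weighted sum over the partition matches the probability obtained by counting A directly.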


In [None]:
def total_probability(conditionals: List[float], priors: List[float]) -> float:
    """
    Calculate P(A) using law of total probability.
    
    Args:
        conditionals: List of P(A|Bi) values
        priors: List of P(Bi) values
    
    Returns:
        P(A)
    """
    if len(conditionals) != len(priors):
        raise ValueError("Must have same number of conditionals and priors")
    
    return sum(c * p for c, p in zip(conditionals, priors))


# Example: Factory defect rates
print("Factory Defect Example")
print("=" * 50)
print("Three factories produce components:")
print("- Factory A: 50% of production, 1% defect rate")
print("- Factory B: 30% of production, 2% defect rate")
print("- Factory C: 20% of production, 5% defect rate")
print()

# P(defect | factory)
p_defect_given_factory = [0.01, 0.02, 0.05]

# P(factory)
p_factory = [0.50, 0.30, 0.20]

# P(defect) = sum of P(defect|factory) * P(factory)
p_defect = total_probability(p_defect_given_factory, p_factory)

print(f"Overall defect rate: {p_defect:.4f} or {p_defect*100:.2f}%")
print()
print("Calculation:")
for factory, (p_d_f, p_f) in enumerate(zip(p_defect_given_factory, p_factory), ord('A')):
    contribution = p_d_f * p_f
    print(f"  Factory {chr(factory)}: {p_d_f} × {p_f} = {contribution:.4f}")
print(f"  Total: {p_defect:.4f}")

## Part 7: Simulation and Monte Carlo Methods

When analytical solutions are hard, we can **simulate**!

**Monte Carlo Method**: Use random sampling to approximate probabilities.
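
A Monte Carlo estimate is more useful when it comes with a sense of its own accuracy. The sketch below assumes a simple event with a known answer (at least two heads in three fair coin flips, exact probability 0.5) and reports the binomial standard error √(p̂(1 − p̂)/n) alongside the estimate. The generator `rng` and its seed are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)       # local generator; the seed is arbitrary
n_trials = 100_000

# Each row is one trial of 3 fair coin flips (1 = heads, 0 = tails)
flips = rng.integers(0, 2, size=(n_trials, 3))
hits = flips.sum(axis=1) >= 2        # event: at least two heads

estimate = hits.mean()               # fraction of trials where the event occurred
std_error = np.sqrt(estimate * (1 - estimate) / n_trials)

print(f"Estimate: {estimate:.4f} ± {std_error:.4f} (exact value: 0.5)")
```

Because the standard error shrinks like 1/√n, cutting the error in half takes roughly four times as many trials.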


In [None]:
def monte_carlo_probability(event_function, n_trials: int = 10000) -> float:
    """
    Estimate probability using Monte Carlo simulation.
    
    Args:
        event_function: Function that returns True when event occurs
        n_trials: Number of simulation trials
    
    Returns:
        Estimated probability
    """
    successes = sum(1 for _ in range(n_trials) if event_function())
    return successes / n_trials


# Example 1: Birthday paradox
def birthday_collision(n_people: int = 23) -> bool:
    """Simulate if any two people share a birthday."""
    birthdays = np.random.randint(1, 366, size=n_people)
    return len(birthdays) != len(set(birthdays))

print("Birthday Paradox Simulation")
print("=" * 50)
for n in [10, 23, 30, 50]:
    prob = monte_carlo_probability(lambda: birthday_collision(n), n_trials=10000)
    print(f"P(collision with {n:2d} people) ≈ {prob:.3f}")

print("\nSurprise! With just 23 people, ~50% chance of shared birthday!")


# Example 2: Estimating π
def point_in_circle() -> bool:
    """Check if random point in unit square is inside unit circle."""
    x, y = np.random.uniform(-1, 1, 2)
    return x**2 + y**2 <= 1

print("\nEstimating π using Monte Carlo")
print("=" * 50)
for n_trials in [100, 1000, 10000, 100000]:
    prob_in_circle = monte_carlo_probability(point_in_circle, n_trials)
    pi_estimate = prob_in_circle * 4  # Area of circle / area of square = π/4
    error = abs(pi_estimate - np.pi)
    print(f"Trials: {n_trials:6d} | π ≈ {pi_estimate:.5f} | Error: {error:.5f}")

print(f"\nTrue value: π = {np.pi:.5f}")

## Exercises

### Exercise 1: Dice Probabilities
Calculate the probability of rolling a sum of 7 with two fair dice.

In [None]:
# TODO: Calculate P(sum = 7) with two dice
# Hint: Count favorable outcomes and divide by total outcomes

# Your code here
pass

### Exercise 2: Conditional Probability
A bag contains 3 red balls and 5 blue balls. Two balls are drawn without replacement.
What's P(second is red | first is red)?

In [None]:
# TODO: Calculate conditional probability
# Your code here
pass

### Exercise 3: Monte Carlo Simulation
Estimate the probability of getting at least one 6 when rolling a die 4 times.

In [None]:
# TODO: Use Monte Carlo to estimate P(at least one 6 in 4 rolls)
# Your code here
pass

## Summary

### Key Takeaways

1. **Probability measures uncertainty** - Essential for real-world AI
2. **Conditional probability** - How evidence updates beliefs
3. **Independence** - When events don't affect each other
4. **Joint & Marginal** - Working with multiple variables
5. **Distributions** - Common patterns of randomness
6. **Simulation** - Powerful tool when math is hard

### Next Steps

In Lab 2, we'll learn about **Bayes' Theorem** - the foundation of probabilistic AI!

### Further Reading

- Khan Academy: Probability and Statistics
- "Probability Theory: The Logic of Science" by E.T. Jaynes
- "Introduction to Probability" by Bertsekas & Tsitsiklis
