# Lab 2: Bayes' Theorem and Inference

## Learning Objectives

By the end of this lab, you will:
- Understand and derive Bayes' theorem
- Apply Bayesian inference to real problems
- Build a naive Bayes classifier from scratch
- Use pgmpy for probabilistic inference
- Design systems that update beliefs with evidence

## Why Bayes' Theorem?

Bayes' theorem is the **foundation of probabilistic AI**. It tells us how to:
- Update beliefs when we get new evidence
- Reason backward from effects to causes
- Make decisions under uncertainty

**Real-world applications**:
- 🏥 Medical diagnosis from symptoms
- 📧 Spam filtering from email content  
- 🔍 Search engines ranking results
- 🤖 Robot localization from sensors
- 💬 Language understanding


In [None]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from typing import Dict, List, Tuple, Optional
from collections import defaultdict, Counter
import pandas as pd
from scipy import stats

# Set random seed
np.random.seed(42)

# Plot settings
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (12, 6)

## Part 1: Understanding Bayes' Theorem

### The Formula

$$P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)}$$

Where:
- **P(H|E)** = **Posterior**: Probability of hypothesis H given evidence E
- **P(E|H)** = **Likelihood**: Probability of evidence E if hypothesis H is true
- **P(H)** = **Prior**: Initial probability of hypothesis H
- **P(E)** = **Evidence**: Total probability of evidence E (normalization constant)

### Intuition

Think of Bayes' theorem as **updating beliefs**:
1. Start with a prior belief P(H)
2. Observe evidence E
3. Update to posterior belief P(H|E)

### Derivation

From conditional probability:
- P(H|E) = P(H,E) / P(E)
- P(E|H) = P(H,E) / P(H)
- Therefore: P(H,E) = P(E|H) × P(H)
- Substitute: P(H|E) = P(E|H) × P(H) / P(E)
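
As a quick numerical check of the derivation, the sketch below builds a small joint distribution over H and E and confirms that the direct definition of P(H|E) and the Bayes' theorem form agree. The joint probabilities are hypothetical values chosen only for illustration.

```python
# Hypothetical joint distribution P(H, E) over two binary variables (sums to 1)
joint = {
    (True, True): 0.20,    # P(H, E)
    (True, False): 0.10,   # P(H, not E)
    (False, True): 0.20,   # P(not H, E)
    (False, False): 0.50,  # P(not H, not E)
}

# Marginals obtained by summing the joint over the other variable
p_h = sum(p for (h, e), p in joint.items() if h)
p_e = sum(p for (h, e), p in joint.items() if e)

# Direct definition of conditional probability: P(H|E) = P(H,E) / P(E)
p_h_given_e_direct = joint[(True, True)] / p_e

# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
p_e_given_h = joint[(True, True)] / p_h
p_h_given_e_bayes = p_e_given_h * p_h / p_e

print(f"Direct: {p_h_given_e_direct:.4f}")
print(f"Bayes:  {p_h_given_e_bayes:.4f}")
```

Both routes give the same number because they are algebraic rearrangements of the same joint probability P(H,E).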


In [None]:
def bayes_theorem(prior: float, likelihood: float, evidence: float) -> float:
    """
    Calculate posterior probability using Bayes' theorem.
    
    Args:
        prior: P(H) - initial probability of hypothesis
        likelihood: P(E|H) - probability of evidence given hypothesis
        evidence: P(E) - total probability of evidence
    
    Returns:
        posterior: P(H|E) - updated probability of hypothesis
    """
    if evidence == 0:
        raise ValueError("Evidence probability cannot be zero")
    
    posterior = (likelihood * prior) / evidence
    return posterior


def bayes_with_normalization(prior: float, likelihood: float, 
                            prior_complement: float, likelihood_complement: float) -> float:
    """
    Calculate posterior using law of total probability for normalization.
    
    P(E) = P(E|H)·P(H) + P(E|¬H)·P(¬H)
    
    Args:
        prior: P(H)
        likelihood: P(E|H)
        prior_complement: P(¬H)
        likelihood_complement: P(E|¬H)
    
    Returns:
        posterior: P(H|E)
    """
    # Calculate evidence using law of total probability
    evidence = likelihood * prior + likelihood_complement * prior_complement
    
    return bayes_theorem(prior, likelihood, evidence)


# Example: Medical diagnosis revisited
print("Medical Test Example with Bayes' Theorem")
print("=" * 60)

# Given information
p_disease = 0.01  # 1% of population has disease (prior)
p_test_pos_given_disease = 0.95  # 95% sensitivity (likelihood)
p_test_pos_given_healthy = 0.05  # 5% false positive rate

# Calculate P(disease | test positive)
p_healthy = 1 - p_disease
posterior = bayes_with_normalization(
    prior=p_disease,
    likelihood=p_test_pos_given_disease,
    prior_complement=p_healthy,
    likelihood_complement=p_test_pos_given_healthy
)

print(f"Prior probability: P(disease) = {p_disease:.3f}")
print(f"Likelihood: P(test+ | disease) = {p_test_pos_given_disease:.3f}")
print(f"False positive rate: P(test+ | healthy) = {p_test_pos_given_healthy:.3f}")
print()
print(f"Posterior probability: P(disease | test+) = {posterior:.3f}")
print()
print(f"Interpretation: Even with a positive test, only {posterior*100:.1f}% chance of disease.")
print(f"The prior matters! Disease is rare, so most positives are false positives.")
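
The same result can be read off as counts of people, which often makes the role of the prior easier to see. The sketch below reuses the rates from the example above; the cohort size of 10,000 is an arbitrary assumption made for illustration.

```python
# Natural-frequency view of the medical test example.
# The cohort size is an arbitrary illustrative choice; the rates match the example.
population = 10_000
p_disease = 0.01            # prior: P(disease)
sensitivity = 0.95          # P(test+ | disease)
false_positive_rate = 0.05  # P(test+ | healthy)

sick = population * p_disease                     # people with the disease
healthy = population - sick                       # people without it
true_positives = sick * sensitivity               # sick people who test positive
false_positives = healthy * false_positive_rate   # healthy people who test positive

# Among everyone who tests positive, the fraction who are actually sick
posterior = true_positives / (true_positives + false_positives)

print(f"{true_positives:.0f} true positives vs {false_positives:.0f} false positives")
print(f"P(disease | test+) = {posterior:.3f}")
```

Because healthy people vastly outnumber sick people, the false positives (about 495) outnumber the true positives (about 95), which is why the posterior stays near 16%.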

### Visualizing Bayesian Updating

In [None]:
def visualize_bayesian_update(prior: float, likelihood: float, 
                             likelihood_complement: float, title: str = "Bayesian Update"):
    """
    Visualize how Bayes' theorem updates beliefs.
    """
    prior_complement = 1 - prior
    
    # Calculate posterior
    posterior = bayes_with_normalization(
        prior, likelihood, prior_complement, likelihood_complement
    )
    posterior_complement = 1 - posterior
    
    # Create visualization
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    # Prior distribution
    categories = ['Hypothesis', '¬Hypothesis']
    prior_probs = [prior, prior_complement]
    colors = ['#3498db', '#e74c3c']
    
    bars1 = ax1.bar(categories, prior_probs, color=colors, alpha=0.7, edgecolor='black', linewidth=2)
    ax1.set_ylabel('Probability', fontsize=12, fontweight='bold')
    ax1.set_title('Prior Beliefs (Before Evidence)', fontsize=13, fontweight='bold')
    ax1.set_ylim(0, 1)
    ax1.grid(axis='y', alpha=0.3)
    
    for bar, prob in zip(bars1, prior_probs):
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height,
                f'{prob:.3f}', ha='center', va='bottom', fontsize=14, fontweight='bold')
    
    # Posterior distribution
    posterior_probs = [posterior, posterior_complement]
    bars2 = ax2.bar(categories, posterior_probs, color=colors, alpha=0.7, edgecolor='black', linewidth=2)
    ax2.set_ylabel('Probability', fontsize=12, fontweight='bold')
    ax2.set_title('Posterior Beliefs (After Evidence)', fontsize=13, fontweight='bold')
    ax2.set_ylim(0, 1)
    ax2.grid(axis='y', alpha=0.3)
    
    for bar, prob in zip(bars2, posterior_probs):
        height = bar.get_height()
        ax2.text(bar.get_x() + bar.get_width()/2., height,
                f'{prob:.3f}', ha='center', va='bottom', fontsize=14, fontweight='bold')
    
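    # Summarize the relative change in belief (evidence can raise or lower it)
    change = posterior - prior
    change_pct = (change / prior * 100) if prior > 0 else 0
    direction = 'increases' if change >= 0 else 'decreases'
    
    # Show the caller-supplied title above both panels
    fig.suptitle(title, fontsize=14, fontweight='bold')
    fig.text(0.5, 0.02, 
            f'Evidence {direction} belief in hypothesis by {abs(change_pct):.1f}% (relative to the prior)',
            ha='center', fontsize=12, fontweight='bold',
            bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
    
    plt.tight_layout(rect=[0, 0.05, 1, 0.95])
    plt.show()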


# Visualize the medical test example
visualize_bayesian_update(
    prior=0.01,
    likelihood=0.95,
    likelihood_complement=0.05,
    title="Medical Test"
)

## Part 2: Sequential Bayesian Updating

One of the most powerful aspects of Bayesian reasoning is **sequential updating**.

**Key insight**: Today's posterior becomes tomorrow's prior!

When we get multiple pieces of evidence (assumed conditionally independent given H, so each update only needs P(Eᵢ|H) and P(Eᵢ|¬H)):
1. Start with prior P(H)
2. Update with evidence E₁ → get posterior P(H|E₁)
3. Use P(H|E₁) as new prior
4. Update with evidence E₂ → get P(H|E₁,E₂)
5. Repeat...
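
The steps above can be sketched as follows. Assuming the pieces of evidence are conditionally independent given the hypothesis, updating one piece at a time gives the same posterior as a single update with the product of the likelihoods. The `update` helper is a hypothetical stand-in written for this sketch, and the likelihood pairs are illustrative values.

```python
import math

def update(prior, p_e_given_h, p_e_given_not_h):
    """One Bayes update for a binary hypothesis (hypothetical helper)."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))

prior = 0.05
# Illustrative (P(E|H), P(E|not H)) pairs, one per piece of evidence
evidence = [(0.90, 0.10), (0.85, 0.15), (0.95, 0.05)]

# Sequential: each posterior becomes the prior for the next update
sequential = prior
for p_true, p_false in evidence:
    sequential = update(sequential, p_true, p_false)

# Batch: one update using the joint likelihood, which factorizes
# into a product under the conditional independence assumption
joint_true = math.prod(p_true for p_true, _ in evidence)
joint_false = math.prod(p_false for _, p_false in evidence)
batch = update(prior, joint_true, joint_false)

print(f"Sequential: {sequential:.4f}")
print(f"Batch:      {batch:.4f}")
```

If the pieces of evidence are correlated given H (for example, two symptoms that nearly always occur together), multiplying their likelihoods counts the same information twice and overstates the posterior.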


In [None]:
class BayesianUpdater:
    """Track belief updates as evidence arrives."""
    
    def __init__(self, prior: float, hypothesis_name: str = "H"):
        """
        Initialize Bayesian updater.
        
        Args:
            prior: Initial probability of hypothesis
            hypothesis_name: Name of hypothesis for display
        """
        self.current_belief = prior
        self.hypothesis_name = hypothesis_name
        self.history = [prior]
        self.evidence_log = []
    
    def update(self, likelihood_if_true: float, likelihood_if_false: float, 
              evidence_name: str = "Evidence"):
        """
        Update belief with new evidence.
        
        Args:
            likelihood_if_true: P(E|H)
            likelihood_if_false: P(E|¬H)
            evidence_name: Description of evidence
        """
        prior = self.current_belief
        prior_complement = 1 - prior
        
        # Update using Bayes' theorem
        posterior = bayes_with_normalization(
            prior, likelihood_if_true,
            prior_complement, likelihood_if_false
        )
        
        self.current_belief = posterior
        self.history.append(posterior)
        self.evidence_log.append(evidence_name)
        
        return posterior
    
    def get_belief(self) -> float:
        """Get current belief."""
        return self.current_belief
    
    def plot_history(self):
        """Plot belief evolution over time."""
        plt.figure(figsize=(12, 6))
        
        steps = range(len(self.history))
        plt.plot(steps, self.history, 'o-', linewidth=2, markersize=10, 
                color='#3498db', markeredgecolor='navy', markeredgewidth=2)
        
        # Add labels for each evidence
        for i, evidence in enumerate(self.evidence_log, 1):
            plt.annotate(evidence, 
                        xy=(i, self.history[i]), 
                        xytext=(10, 10),
                        textcoords='offset points',
                        ha='left',
                        fontsize=9,
                        bbox=dict(boxstyle='round,pad=0.3', facecolor='yellow', alpha=0.5),
                        arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
        
        plt.xlabel('Evidence Step', fontsize=12, fontweight='bold')
        plt.ylabel(f'P({self.hypothesis_name})', fontsize=12, fontweight='bold')
        plt.title(f'Bayesian Belief Updating for "{self.hypothesis_name}"', 
                 fontsize=14, fontweight='bold')
        plt.ylim(0, 1)
        plt.grid(alpha=0.3)
        plt.tight_layout()
        plt.show()
    
    def __repr__(self):
        return f"BayesianUpdater(belief={self.current_belief:.4f})"


# Example: Doctor diagnosing a patient
print("Sequential Diagnosis Example")
print("=" * 60)

# Start with base rate of flu in population
updater = BayesianUpdater(prior=0.05, hypothesis_name="Has Flu")
print(f"Initial belief: P(Flu) = {updater.get_belief():.3f}")
print()

# Evidence 1: Patient has fever
# P(fever | flu) = 0.90, P(fever | no flu) = 0.10
belief = updater.update(
    likelihood_if_true=0.90,
    likelihood_if_false=0.10,
    evidence_name="Has Fever"
)
print(f"After observing fever: P(Flu | fever) = {belief:.3f}")

# Evidence 2: Patient has cough
# P(cough | flu) = 0.85, P(cough | no flu) = 0.15
belief = updater.update(
    likelihood_if_true=0.85,
    likelihood_if_false=0.15,
    evidence_name="Has Cough"
)
print(f"After observing cough: P(Flu | fever, cough) = {belief:.3f}")

# Evidence 3: Rapid flu test positive
# P(test+ | flu) = 0.95, P(test+ | no flu) = 0.05
belief = updater.update(
    likelihood_if_true=0.95,
    likelihood_if_false=0.05,
    evidence_name="Positive Flu Test"
)
print(f"After positive test: P(Flu | all evidence) = {belief:.3f}")
print()
print(f"Final diagnosis: {belief*100:.1f}% confident patient has flu")

# Visualize the updating process
updater.plot_history()

## Part 3: Naive Bayes Classifier

### What is Naive Bayes?

A **classification algorithm** based on Bayes' theorem with a "naive" independence assumption.

**Goal**: Given features x₁, x₂, ..., xₙ, predict class C

$$P(C|x_1,...,x_n) = \frac{P(x_1,...,x_n|C) \cdot P(C)}{P(x_1,...,x_n)}$$

**Naive assumption**: Features are conditionally independent given the class.

$$P(x_1,...,x_n|C) = P(x_1|C) \cdot P(x_2|C) \cdot ... \cdot P(x_n|C)$$

This simplifies computation dramatically!
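
To make the factorization concrete, here is a minimal sketch that scores two classes by multiplying a prior with one likelihood per feature and then normalizing. All probabilities are hypothetical values picked for illustration, not estimates from any dataset, and `naive_posterior` is a name introduced only for this sketch.

```python
# Hypothetical class priors P(C)
priors = {"spam": 0.5, "ham": 0.5}

# Hypothetical P(word is present | class) for three binary features
p_present = {
    "spam": {"free": 0.8, "money": 0.7, "meeting": 0.1},
    "ham":  {"free": 0.1, "money": 0.2, "meeting": 0.6},
}

def naive_posterior(observed):
    """Return P(C | features) using the naive independence assumption."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for word, present in observed.items():
            p = p_present[cls][word]
            # Use P(present|C) if the word appears, otherwise P(absent|C)
            score *= p if present else (1 - p)
        scores[cls] = score
    # Dividing by the sum plays the role of P(x_1,...,x_n)
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

posterior = naive_posterior({"free": True, "money": True, "meeting": False})
for cls, prob in posterior.items():
    print(f"P({cls} | features) = {prob:.4f}")
```

With n binary features, the full joint P(x₁,...,xₙ|C) would need on the order of 2ⁿ parameters per class, while the naive factorization needs only n.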

### Why "Naive"?

Features are rarely truly independent (e.g., in spam detection, "free" and "money" often appear together).
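But Naive Bayes often **classifies well even when the assumption is violated**: picking the most probable class only requires the classes to be ranked correctly, not the probabilities themselves to be accurate.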


In [None]:
class NaiveBayesClassifier:
    """Naive Bayes classifier from scratch."""
    
    def __init__(self):
        self.class_priors = {}  # P(C)
        self.feature_likelihoods = {}  # P(feature|C)
        self.classes = []
        self.features = []
    
    def fit(self, X: List[List], y: List):
        """
        Train the classifier.
        
        Args:
            X: List of feature lists (samples)
            y: List of class labels
        """
        n_samples = len(X)
        self.classes = list(set(y))
        
        # Calculate class priors P(C)
        class_counts = Counter(y)
        for cls in self.classes:
            self.class_priors[cls] = class_counts[cls] / n_samples
        
        # Count feature values per class, and track every value seen for each
        # feature across all classes (needed for correct smoothing)
        self.feature_likelihoods = {cls: defaultdict(lambda: defaultdict(int)) 
                                   for cls in self.classes}
        feature_values = defaultdict(set)
        
        for features, label in zip(X, y):
            for feature_idx, feature_value in enumerate(features):
                self.feature_likelihoods[label][feature_idx][feature_value] += 1
                feature_values[feature_idx].add(feature_value)
        
        # Convert counts to probabilities with Laplace smoothing.
        # Smoothing over all values of the feature (not just those seen in this
        # class) gives zero-count values a nonzero probability and keeps
        # P(value|C) summing to 1 for each feature.
        for cls in self.classes:
            n_samples_class = class_counts[cls]
            for feature_idx, values in feature_values.items():
                n_unique_values = len(values)
                for feature_value in values:
                    count = self.feature_likelihoods[cls][feature_idx][feature_value]
                    # Laplace smoothing: add 1 to numerator and n_unique to denominator
                    self.feature_likelihoods[cls][feature_idx][feature_value] = \
                        (count + 1) / (n_samples_class + n_unique_values)
    
    def predict_proba(self, features: List) -> Dict[str, float]:
        """
        Calculate probability for each class.
        
        Args:
            features: List of feature values
        
        Returns:
            Dictionary mapping class to probability
        """
        posteriors = {}
        
        for cls in self.classes:
            # Start with prior
            posterior = np.log(self.class_priors[cls])
            
            # Multiply by likelihoods (add in log space)
            for feature_idx, feature_value in enumerate(features):
                if feature_value in self.feature_likelihoods[cls][feature_idx]:
                    likelihood = self.feature_likelihoods[cls][feature_idx][feature_value]
                else:
                    # Value missing from the fitted table: fall back to a small
                    # floor probability (a heuristic, not Laplace smoothing)
                    likelihood = 1e-6
                posterior += np.log(likelihood)
            
            posteriors[cls] = posterior
        
        # Convert from log space and normalize
        max_log_posterior = max(posteriors.values())
        posteriors = {cls: np.exp(p - max_log_posterior) for cls, p in posteriors.items()}
        total = sum(posteriors.values())
        posteriors = {cls: p / total for cls, p in posteriors.items()}
        
        return posteriors
    
    def predict(self, features: List) -> str:
        """
        Predict the most likely class.
        
        Args:
            features: List of feature values
        
        Returns:
            Predicted class
        """
        posteriors = self.predict_proba(features)
        return max(posteriors, key=posteriors.get)
    
    def __repr__(self):
        return f"NaiveBayesClassifier(classes={self.classes})"


# Example: Simple spam classifier
print("Spam Classifier Example")
print("=" * 60)

# Training data: [has_word_free, has_word_money, has_word_meeting]
X_train = [
    [1, 1, 0],  # spam
    [1, 1, 0],  # spam
    [0, 1, 0],  # spam
    [1, 0, 0],  # spam
    [0, 0, 1],  # ham
    [0, 0, 1],  # ham
    [0, 0, 1],  # ham
    [0, 1, 1],  # ham
]

y_train = ['spam', 'spam', 'spam', 'spam', 'ham', 'ham', 'ham', 'ham']

# Train classifier
nb = NaiveBayesClassifier()
nb.fit(X_train, y_train)

print("Class priors:")
for cls, prob in nb.class_priors.items():
    print(f"  P({cls}) = {prob:.3f}")

# Test email: has "free" and "money", no "meeting"
test_email = [1, 1, 0]
print("\nTest email features: [has_free=1, has_money=1, has_meeting=0]")
print()

probs = nb.predict_proba(test_email)
prediction = nb.predict(test_email)

print("Posterior probabilities:")
for cls, prob in probs.items():
    print(f"  P({cls} | features) = {prob:.4f}")

print(f"\nPrediction: {prediction}")
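
The classifier above adds log probabilities and subtracts the largest log score before exponentiating. The sketch below illustrates why: with many features, multiplying raw probabilities underflows to zero, while the log-space version still produces usable posteriors. The likelihood values are hypothetical and equal class priors are assumed.

```python
import math

# Hypothetical likelihoods: 1,000 features, each slightly more probable under class B
lik_a = [0.0200] * 1000
lik_b = [0.0201] * 1000

# Direct multiplication underflows: both products round to exactly 0.0,
# so normalizing them would mean dividing zero by zero
prod_a = math.prod(lik_a)
prod_b = math.prod(lik_b)
print(f"Direct products: {prod_a}, {prod_b}")

# Summing logs keeps the scores finite (equal priors assumed, so they are omitted)
log_a = sum(math.log(p) for p in lik_a)
log_b = sum(math.log(p) for p in lik_b)

# Subtract the largest log score before exponentiating, then normalize
max_log = max(log_a, log_b)
w_a = math.exp(log_a - max_log)
w_b = math.exp(log_b - max_log)
post_a = w_a / (w_a + w_b)
post_b = w_b / (w_a + w_b)
print(f"P(A | features) = {post_a:.4f}, P(B | features) = {post_b:.4f}")
```

Subtracting the maximum does not change the normalized result, because the shared factor cancels between numerator and denominator.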

## Part 4: Text Classification with Naive Bayes

Let's build a more realistic text classifier that works with actual text documents.


In [None]:
class TextNaiveBayes:
    """Naive Bayes for text classification."""
    
    def __init__(self):
        self.class_priors = {}
        self.word_probs = {}  # P(word|class)
        self.vocab = set()
        self.classes = []
    
    def tokenize(self, text: str) -> List[str]:
        """Simple tokenization: lowercase and split."""
        return text.lower().split()
    
    def fit(self, documents: List[str], labels: List[str]):
        """
        Train on text documents.
        
        Args:
            documents: List of text documents
            labels: List of class labels
        """
        self.classes = list(set(labels))
        n_docs = len(documents)
        
        # Calculate class priors
        class_counts = Counter(labels)
        for cls in self.classes:
            self.class_priors[cls] = class_counts[cls] / n_docs
        
        # Count words per class
        word_counts = {cls: defaultdict(int) for cls in self.classes}
        total_words = {cls: 0 for cls in self.classes}
        
        for doc, label in zip(documents, labels):
            words = self.tokenize(doc)
            self.vocab.update(words)
            for word in words:
                word_counts[label][word] += 1
                total_words[label] += 1
        
        # Calculate word probabilities with Laplace smoothing
        vocab_size = len(self.vocab)
        for cls in self.classes:
            self.word_probs[cls] = {}
            for word in self.vocab:
                count = word_counts[cls][word]
                # Laplace smoothing
                self.word_probs[cls][word] = (count + 1) / (total_words[cls] + vocab_size)
    
    def predict_proba(self, document: str) -> Dict[str, float]:
        """Calculate probability for each class."""
        words = self.tokenize(document)
        posteriors = {}
        
        for cls in self.classes:
            # Start with log prior
            log_posterior = np.log(self.class_priors[cls])
            
            # Add log likelihood for each word
            for word in words:
                if word in self.word_probs[cls]:
                    log_posterior += np.log(self.word_probs[cls][word])
                else:
                    # Out-of-vocabulary word: this adds the same constant to every
                    # class, so it does not change which class ranks highest
                    log_posterior += np.log(1 / (len(self.vocab) + 1))
            
            posteriors[cls] = log_posterior
        
        # Normalize
        max_log = max(posteriors.values())
        posteriors = {cls: np.exp(p - max_log) for cls, p in posteriors.items()}
        total = sum(posteriors.values())
        posteriors = {cls: p / total for cls, p in posteriors.items()}
        
        return posteriors
    
    def predict(self, document: str) -> str:
        """Predict class for document."""
        probs = self.predict_proba(document)
        return max(probs, key=probs.get)


# Example: Movie review sentiment analysis
print("Movie Review Sentiment Analysis")
print("=" * 60)

# Training data
train_reviews = [
    "this movie is amazing I loved it",
    "great film wonderful acting",
    "best movie ever so good",
    "loved every minute fantastic",
    "terrible movie waste of time",
    "awful film so boring",
    "worst movie ever horrible",
    "hated it very bad",
]

train_labels = ['positive', 'positive', 'positive', 'positive',
                'negative', 'negative', 'negative', 'negative']

# Train classifier
text_nb = TextNaiveBayes()
text_nb.fit(train_reviews, train_labels)

print(f"Vocabulary size: {len(text_nb.vocab)}")
print(f"Classes: {text_nb.classes}")
print()

# Test on new reviews
test_reviews = [
    "this movie is great I loved it",
    "terrible film very boring",
    "amazing acting wonderful story",
]

print("Test Reviews:")
print("=" * 60)
for review in test_reviews:
    probs = text_nb.predict_proba(review)
    prediction = text_nb.predict(review)
    
    print(f"\nReview: \"{review}\"")
    print(f"Prediction: {prediction}")
    print(f"Confidence:")
    for cls, prob in probs.items():
        print(f"  {cls}: {prob:.4f}")
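
To see what Laplace smoothing does to the word probabilities, the sketch below compares unsmoothed and smoothed estimates on two of the training reviews. The `word_probs` helper and its `alpha` parameter are hypothetical names introduced for this illustration; `alpha=1` corresponds to the add-one smoothing used in `TextNaiveBayes.fit`.

```python
from collections import Counter

def word_probs(tokens, vocab, alpha=1.0):
    """Estimate P(word | class) with add-alpha smoothing (hypothetical helper)."""
    counts = Counter(tokens)
    denominator = len(tokens) + alpha * len(vocab)
    return {word: (counts[word] + alpha) / denominator for word in vocab}

# One positive and one negative review from the training data
positive_tokens = "great film wonderful acting".split()
negative_tokens = "awful film so boring".split()
vocab = set(positive_tokens) | set(negative_tokens)

unsmoothed = word_probs(positive_tokens, vocab, alpha=0.0)
smoothed = word_probs(positive_tokens, vocab, alpha=1.0)

# "boring" never appears in the positive review
print(f"Unsmoothed P(boring | positive) = {unsmoothed['boring']:.4f}")
print(f"Smoothed   P(boring | positive) = {smoothed['boring']:.4f}")
print(f"Smoothed probabilities sum to {sum(smoothed.values()):.4f}")
```

Without smoothing, a single word with zero count would force the whole product for that class to zero (and its log to negative infinity), no matter how strongly the other words point to that class.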

## Part 5: Using pgmpy for Bayesian Inference

Now let's use **pgmpy**, a Python library for probabilistic graphical models, for more sophisticated probabilistic reasoning.


In [None]:
# Note: pgmpy needs to be installed
# pip install pgmpy

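try:
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination
    try:
        # Newer pgmpy releases name the discrete model class DiscreteBayesianNetwork
        from pgmpy.models import DiscreteBayesianNetwork as BayesianNetwork
    except ImportError:
        # Older releases only provide BayesianNetwork
        from pgmpy.models import BayesianNetwork
    
    PGMPY_AVAILABLE = True
except ImportError:
    print("pgmpy not installed. Run: pip install pgmpy")
    PGMPY_AVAILABLE = False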

if PGMPY_AVAILABLE:
    # Create a simple Bayesian network
    # Model: Rain → Grass Wet
    #       Sprinkler → Grass Wet
    
    print("Bayesian Network with pgmpy")
    print("=" * 60)
    
    # Define network structure
    model = BayesianNetwork([('Rain', 'GrassWet'), ('Sprinkler', 'GrassWet')])
    
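    # Define CPDs (Conditional Probability Distributions)
    # State 0 = False, state 1 = True; each row of `values` is one state of the variable
    cpd_rain = TabularCPD(variable='Rain', variable_card=2, values=[[0.7], [0.3]])
    cpd_sprinkler = TabularCPD(variable='Sprinkler', variable_card=2, values=[[0.6], [0.4]])
    
    # P(GrassWet | Rain, Sprinkler)
    # Columns follow the evidence order, with the last evidence variable varying fastest:
    # (Rain=0, Sprinkler=0), (Rain=0, Sprinkler=1), (Rain=1, Sprinkler=0), (Rain=1, Sprinkler=1)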
    cpd_grass = TabularCPD(
        variable='GrassWet',
        variable_card=2,
        values=[
            [0.99, 0.2, 0.1, 0.01],  # P(GrassWet=0 | ...)
            [0.01, 0.8, 0.9, 0.99]   # P(GrassWet=1 | ...)
        ],
        evidence=['Rain', 'Sprinkler'],
        evidence_card=[2, 2]
    )
    
    # Add CPDs to model
    model.add_cpds(cpd_rain, cpd_sprinkler, cpd_grass)
    
    # Check if model is valid
    assert model.check_model()
    print("Model is valid!")
    print()
    
    # Perform inference
    inference = VariableElimination(model)
    
    # Query: What's P(Rain | GrassWet=1)?
    result = inference.query(variables=['Rain'], evidence={'GrassWet': 1})
    print("Query: P(Rain | Grass is wet)")
    print(result)
    print()
    
    # Query: What's P(Sprinkler | GrassWet=1)?
    result = inference.query(variables=['Sprinkler'], evidence={'GrassWet': 1})
    print("Query: P(Sprinkler | Grass is wet)")
    print(result)
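
The two queries above can be cross-checked by brute-force enumeration, which needs no library. This sketch copies the probabilities from the CPDs above and applies Bayes' theorem directly; the variable names are chosen for this illustration only.

```python
from itertools import product

# Priors copied from the CPDs above (state 1 = True)
p_rain = {0: 0.7, 1: 0.3}
p_sprinkler = {0: 0.6, 1: 0.4}

# P(GrassWet=1 | Rain, Sprinkler), keyed by (rain, sprinkler)
p_wet_given = {(0, 0): 0.01, (0, 1): 0.80, (1, 0): 0.90, (1, 1): 0.99}

# Joint P(Rain=r, Sprinkler=s, GrassWet=1) for every parent combination
joint_wet = {
    (r, s): p_rain[r] * p_sprinkler[s] * p_wet_given[(r, s)]
    for r, s in product((0, 1), repeat=2)
}

# P(GrassWet=1) is the normalization constant (the evidence term)
p_wet = sum(joint_wet.values())

# Posteriors: sum the joint over the other parent, then normalize
p_rain_given_wet = sum(p for (r, s), p in joint_wet.items() if r == 1) / p_wet
p_sprinkler_given_wet = sum(p for (r, s), p in joint_wet.items() if s == 1) / p_wet

print(f"P(GrassWet=1)             = {p_wet:.4f}")
print(f"P(Rain=1 | GrassWet=1)      = {p_rain_given_wet:.4f}")
print(f"P(Sprinkler=1 | GrassWet=1) = {p_sprinkler_given_wet:.4f}")
```

Enumeration grows exponentially with the number of variables, which is the cost that algorithms such as variable elimination are designed to reduce.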

## Exercises

### Exercise 1: Drug Testing
A drug test is 98% accurate (both sensitivity and specificity). Only 0.5% of the population uses the drug.
If someone tests positive, what's the probability they actually use the drug?

In [None]:
# TODO: Calculate P(uses drug | positive test)
# Your code here
pass

### Exercise 2: Email Spam Filter
Build a spam filter that uses word frequencies. Train on provided data and test accuracy.

In [None]:
# TODO: Build and evaluate spam filter
# Your code here
pass

### Exercise 3: Sequential Diagnosis
Create a medical diagnosis system that updates beliefs as symptoms are reported.
Use at least 3 symptoms and show the belief evolution.

In [None]:
# TODO: Build sequential diagnosis system
# Your code here
pass

## Summary

### Key Takeaways

1. **Bayes' Theorem** - The foundation of probabilistic AI
2. **Prior → Posterior** - How evidence updates beliefs
3. **Sequential updating** - Yesterday's posterior is today's prior
4. **Naive Bayes** - Simple but powerful classification
5. **Text classification** - Real-world NLP application
6. **pgmpy** - A library for probabilistic graphical models and inference

### Why Bayes Matters

- **Principled uncertainty**: Formal framework for reasoning
- **Incorporates prior knowledge**: Use domain expertise
- **Updates with evidence**: Learns from data
- **Explains predictions**: Interpretable probabilities

### Next Steps

In Lab 3, we'll learn about **Bayesian Networks** - powerful graphical models for complex reasoning!
