[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/buildLittleWorlds/ml-math-with-densworld/blob/main/modules/01-statistics-probability/notebooks/05-bayesian-classification.ipynb)

# Lesson 5: Bayesian Classification

*"The Archives hold 300 manuscripts attributed to Grigsu Haldo, the great Stone School philosopher. But how many did he actually write? Mink Pavar's forgeries are legendary—even experts cannot always tell them apart. To convict Mink, we must ask not 'Is this a forgery?' but 'Given what we observe, how likely is it to be a forgery?'"*  
— Opening statement, Senate Inquiry into the Pavar Forgeries

---

## The Core Problem

The Capital Archives have discovered that some manuscripts attributed to famous philosophers are actually forgeries, likely created by Mink Pavar. We have a labeled training set of 300 manuscripts—some genuine, some known forgeries—with measurable features like vocabulary richness, sentence length, and philosophical term density.

Given a new manuscript, how should we combine these features with our prior beliefs to estimate the probability it's a forgery?

This is the domain of **Bayesian classification**.

---

## Learning Objectives

By the end of this lesson, you will:
1. Understand Bayes' Theorem and its components (prior, likelihood, posterior)
2. Build intuition for how evidence updates beliefs
3. Evaluate classifiers using confusion matrices, precision, and recall
4. Implement Naive Bayes classification from scratch

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Set random seed for reproducibility
np.random.seed(42)

# Nice plotting defaults
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

# Colab-ready data loading
BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/ml-math-with-densworld/main/data/"

# Load the manuscript features dataset
manuscripts = pd.read_csv(BASE_URL + "manuscript_features.csv")

print(f"Loaded {len(manuscripts)} manuscript records")
print(f"Forgeries: {manuscripts['is_forgery'].sum()} ({manuscripts['is_forgery'].mean():.1%})")
manuscripts.head()

## Part 1: The Forgery Problem

### Exploring the Data

Mink Pavar was a brilliant forger who created manuscripts in the style of famous philosophers, then sold them to collectors and archives. Let's examine the features that might distinguish forgeries from genuine manuscripts.

In [None]:
# Compare features between genuine and forged manuscripts
genuine = manuscripts[manuscripts['is_forgery'] == False]
forged = manuscripts[manuscripts['is_forgery'] == True]

features = ['avg_sentence_length', 'vocabulary_richness', 'philosophical_term_density', 
            'stylometric_variance', 'era_marker_score']

print("Feature Comparison: Genuine vs Forged Manuscripts")
print("=" * 70)
print(f"{'Feature':<30} {'Genuine Mean':<15} {'Forged Mean':<15} {'Difference':<10}")
print("-" * 70)

for feature in features:
    gen_mean = genuine[feature].mean()
    forg_mean = forged[feature].mean()
    diff = forg_mean - gen_mean
    print(f"{feature:<30} {gen_mean:<15.4f} {forg_mean:<15.4f} {diff:+.4f}")

In [None]:
# Visualize the distributions
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for idx, feature in enumerate(features):
    ax = axes[idx]
    ax.hist(genuine[feature], bins=20, alpha=0.6, label='Genuine', color='steelblue', density=True)
    ax.hist(forged[feature], bins=20, alpha=0.6, label='Forged', color='coral', density=True)
    ax.set_xlabel(feature.replace('_', ' ').title())
    ax.set_ylabel('Density')
    ax.legend()
    ax.set_title(f'Distribution of {feature}')

# Hide the 6th subplot if we only have 5 features
if len(features) < 6:
    axes[5].axis('off')

plt.suptitle('Feature Distributions: Genuine vs Forged Manuscripts', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("\nKey observations:")
print("  - Some features show clear separation (good for classification)")
print("  - Others overlap substantially (less discriminative)")
print("  - No single feature perfectly separates the classes")

## Part 2: Bayes' Theorem

### The Foundation of Bayesian Thinking

Bayes' Theorem tells us how to update our beliefs in light of new evidence:

$$P(\text{Forgery} | \text{Evidence}) = \frac{P(\text{Evidence} | \text{Forgery}) \times P(\text{Forgery})}{P(\text{Evidence})}$$

Let's break down each component:

| Term | Name | Meaning |
|------|------|--------|
| P(Forgery \| Evidence) | **Posterior** | Probability it's a forgery, given what we observed |
| P(Evidence \| Forgery) | **Likelihood** | How likely is this evidence if the manuscript is a forgery? |
| P(Forgery) | **Prior** | Our initial belief before seeing evidence |
| P(Evidence) | **Normalizing constant** | Probability of seeing this evidence under any scenario |
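
As a quick numerical check of the formula, here is a minimal sketch with invented numbers rather than values from the dataset: a 20% prior forgery rate, and a yes/no clue (say, an anachronistic word) that shows up in 70% of forgeries and 10% of genuine manuscripts. All three numbers and the variable names are assumptions for illustration.

```python
# Hypothetical numbers chosen for illustration only (not taken from the dataset)
prior_forgery = 0.20                 # P(Forgery)
prior_genuine = 1 - prior_forgery    # P(Genuine)
lik_forgery = 0.70                   # P(Evidence | Forgery)
lik_genuine = 0.10                   # P(Evidence | Genuine)

# Normalizing constant: total probability of the evidence under both scenarios
p_evidence = lik_forgery * prior_forgery + lik_genuine * prior_genuine

# Bayes' Theorem
posterior_forgery = lik_forgery * prior_forgery / p_evidence

print(f"P(Evidence)           = {p_evidence:.3f}")
print(f"P(Forgery | Evidence) = {posterior_forgery:.3f}")
```

With these made-up inputs, the clue raises the probability of forgery from 20% to roughly 64%: strong evidence, but still short of certainty because genuine manuscripts are so much more common.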

### A Simple Example

Let's start with one feature: `stylometric_variance`. Mink Pavar was brilliant but inconsistent—forged manuscripts tend to have higher variance in writing style.
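
For reference, the Gaussian model assumed in the next cells can be written out explicitly. Each class $c$ (genuine or forgery) gets its own mean $\mu_c$ and standard deviation $\sigma_c$, estimated from the training data, and the class-conditional density of a feature value $x$ is:

$$p(x | c) = \frac{1}{\sigma_c \sqrt{2\pi}} \exp\left(-\frac{(x - \mu_c)^2}{2\sigma_c^2}\right)$$

Because $x$ is continuous, this is a probability density, not a probability, so its value can exceed 1. That causes no trouble, since the densities appear in both the numerator and the denominator of the posterior and only their ratio matters:

$$P(\text{Forgery} | x) = \frac{p(x | \text{Forgery}) \times P(\text{Forgery})}{p(x | \text{Forgery}) \times P(\text{Forgery}) + p(x | \text{Genuine}) \times P(\text{Genuine})}$$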

In [None]:
# Calculate class statistics for stylometric_variance
feature = 'stylometric_variance'

# Prior probabilities
p_forgery = manuscripts['is_forgery'].mean()
p_genuine = 1 - p_forgery

print("Prior Probabilities (before seeing any evidence)")
print("=" * 50)
print(f"P(Forgery) = {p_forgery:.3f}  ({p_forgery:.1%})")
print(f"P(Genuine) = {p_genuine:.3f}  ({p_genuine:.1%})")

# Class-conditional distributions (assuming Gaussian)
genuine_mean = genuine[feature].mean()
genuine_std = genuine[feature].std()
forged_mean = forged[feature].mean()
forged_std = forged[feature].std()

print(f"\nClass-Conditional Distributions for '{feature}'")
print("-" * 50)
print(f"Genuine: μ = {genuine_mean:.4f}, σ = {genuine_std:.4f}")
print(f"Forged:  μ = {forged_mean:.4f}, σ = {forged_std:.4f}")

In [None]:
# Visualize Bayesian updating for a specific observation
test_value = 0.15  # A manuscript with stylometric_variance = 0.15

# Calculate likelihoods P(Evidence | Class)
likelihood_genuine = stats.norm.pdf(test_value, genuine_mean, genuine_std)
likelihood_forged = stats.norm.pdf(test_value, forged_mean, forged_std)

# Calculate unnormalized posteriors
unnorm_genuine = likelihood_genuine * p_genuine
unnorm_forged = likelihood_forged * p_forgery

# Normalize
normalizer = unnorm_genuine + unnorm_forged
posterior_genuine = unnorm_genuine / normalizer
posterior_forged = unnorm_forged / normalizer

print(f"Bayesian Classification for a manuscript with {feature} = {test_value}")
print("=" * 60)
print(f"\n1. Prior probabilities:")
print(f"   P(Genuine) = {p_genuine:.3f}")
print(f"   P(Forgery) = {p_forgery:.3f}")

print(f"\n2. Likelihoods P(Evidence | Class), as probability densities (values may exceed 1):")
print(f"   P({feature}={test_value} | Genuine) = {likelihood_genuine:.4f}")
print(f"   P({feature}={test_value} | Forgery) = {likelihood_forged:.4f}")

print(f"\n3. Posterior probabilities (after seeing evidence):")
print(f"   P(Genuine | Evidence) = {posterior_genuine:.3f}  ({posterior_genuine:.1%})")
print(f"   P(Forgery | Evidence) = {posterior_forged:.3f}  ({posterior_forged:.1%})")

print(f"\n4. Classification: {'FORGERY' if posterior_forged > 0.5 else 'GENUINE'}")

In [None]:
# Visualize how posterior changes with feature value
x_range = np.linspace(0, 0.4, 200)

posteriors = []
for x in x_range:
    lik_gen = stats.norm.pdf(x, genuine_mean, genuine_std)
    lik_forg = stats.norm.pdf(x, forged_mean, forged_std)
    unnorm_forg = lik_forg * p_forgery
    unnorm_gen = lik_gen * p_genuine
    post_forg = unnorm_forg / (unnorm_forg + unnorm_gen)
    posteriors.append(post_forg)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Class-conditional distributions
x_pdf = np.linspace(0, 0.4, 200)
axes[0].plot(x_pdf, stats.norm.pdf(x_pdf, genuine_mean, genuine_std), 
             'b-', linewidth=2, label='P(x | Genuine)')
axes[0].plot(x_pdf, stats.norm.pdf(x_pdf, forged_mean, forged_std), 
             'r-', linewidth=2, label='P(x | Forgery)')
axes[0].axvline(test_value, color='green', linestyle='--', linewidth=2, label=f'x = {test_value}')
axes[0].set_xlabel('Stylometric Variance')
axes[0].set_ylabel('Likelihood')
axes[0].set_title('Class-Conditional Distributions')
axes[0].legend()

# Right: Posterior probability
axes[1].plot(x_range, posteriors, 'purple', linewidth=2)
axes[1].axhline(0.5, color='gray', linestyle='--', label='Decision boundary')
axes[1].axvline(test_value, color='green', linestyle='--', linewidth=2, label=f'x = {test_value}')
axes[1].fill_between(x_range, posteriors, 0.5, where=np.array(posteriors) > 0.5, 
                     alpha=0.3, color='red', label='Classify as Forgery')
axes[1].fill_between(x_range, posteriors, 0.5, where=np.array(posteriors) <= 0.5, 
                     alpha=0.3, color='blue', label='Classify as Genuine')
axes[1].set_xlabel('Stylometric Variance')
axes[1].set_ylabel('P(Forgery | x)')
axes[1].set_title('Posterior Probability of Forgery')
axes[1].legend(loc='upper left')
axes[1].set_ylim(0, 1)

plt.tight_layout()
plt.show()

## Part 3: Confusion Matrix and Performance Metrics

Before building a full classifier, let's understand how to evaluate one.

### The Confusion Matrix

| | Predicted Genuine | Predicted Forgery |
|---|---|---|
| **Actually Genuine** | True Negative (TN) | False Positive (FP) |
| **Actually Forgery** | False Negative (FN) | True Positive (TP) |

### Key Metrics

- **Accuracy**: (TP + TN) / Total — Overall correctness
- **Precision**: TP / (TP + FP) — When we say "forgery," how often are we right?
- **Recall**: TP / (TP + FN) — Of all forgeries, how many did we catch?
- **F1 Score**: Harmonic mean of precision and recall
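
To make the F1 definition concrete, the harmonic mean can be written out and checked by hand. The counts used here are made up for illustration and are not results from the dataset: TP = 40, FP = 10, FN = 20, TN = 230.

$$\text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,\text{TP}}{2\,\text{TP} + \text{FP} + \text{FN}}$$

$$\text{Precision} = \frac{40}{50} = 0.80, \quad \text{Recall} = \frac{40}{60} \approx 0.67, \quad \text{F1} = \frac{80}{110} \approx 0.73, \quad \text{Accuracy} = \frac{270}{300} = 0.90$$

Accuracy looks strong at 0.90 even though a third of the forgeries are missed. When one class is much rarer than the other, precision and recall tell more of the story.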

In [None]:
def evaluate_classifier(y_true, y_pred):
    """Calculate and display classification metrics."""
    
    # Confusion matrix components
    tp = np.sum((y_true == True) & (y_pred == True))
    tn = np.sum((y_true == False) & (y_pred == False))
    fp = np.sum((y_true == False) & (y_pred == True))
    fn = np.sum((y_true == True) & (y_pred == False))
    
    # Metrics
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
    
    # Display confusion matrix
    print("Confusion Matrix")
    print("=" * 40)
    print(f"{'':20} {'Pred Genuine':>12} {'Pred Forgery':>12}")
    print(f"{'Actual Genuine':20} {tn:>12} {fp:>12}")
    print(f"{'Actual Forgery':20} {fn:>12} {tp:>12}")
    
    print(f"\nMetrics")
    print("-" * 40)
    print(f"Accuracy:  {accuracy:.3f}  (Overall correctness)")
    print(f"Precision: {precision:.3f}  (When we say 'forgery', are we right?)")
    print(f"Recall:    {recall:.3f}  (Of all forgeries, how many caught?)")
    print(f"F1 Score:  {f1:.3f}  (Harmonic mean of precision & recall)")
    
    return {'accuracy': accuracy, 'precision': precision, 'recall': recall, 'f1': f1}

# Test with our single-feature classifier
def classify_single_feature(x, feature='stylometric_variance'):
    """Classify using a single feature with Gaussian Naive Bayes."""
    gen_mean = genuine[feature].mean()
    gen_std = genuine[feature].std()
    forg_mean = forged[feature].mean()
    forg_std = forged[feature].std()
    
    lik_gen = stats.norm.pdf(x, gen_mean, gen_std)
    lik_forg = stats.norm.pdf(x, forg_mean, forg_std)
    
    unnorm_gen = lik_gen * p_genuine
    unnorm_forg = lik_forg * p_forgery
    
    post_forg = unnorm_forg / (unnorm_forg + unnorm_gen)
    return post_forg > 0.5

# Apply to all manuscripts
predictions = manuscripts['stylometric_variance'].apply(classify_single_feature)

print("Single-Feature Classifier (stylometric_variance)")
print("=" * 50 + "\n")
metrics = evaluate_classifier(manuscripts['is_forgery'].values, predictions.values)

## Part 4: Naive Bayes with Multiple Features

### The "Naive" Assumption

Naive Bayes assumes that features are **conditionally independent** given the class. This means:

$$P(x_1, x_2, ..., x_n | \text{Class}) = P(x_1 | \text{Class}) \times P(x_2 | \text{Class}) \times ... \times P(x_n | \text{Class})$$

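This assumption rarely holds exactly; stylistic features such as sentence length and vocabulary richness may well be correlated within a class. It often works well in practice anyway, because classification only requires the correct class to receive the highest posterior, not that the posterior itself be well calibrated.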
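
A small sketch may help show how the factorization is used, and why implementations usually add log-likelihoods instead of multiplying likelihoods. The per-class means and standard deviations are invented for illustration, and `toy_params`, `toy_priors`, and `log_joint` are hypothetical names. Only the standard library is used, so the block runs without the dataset.

```python
import math
from statistics import NormalDist

# Hypothetical per-class Gaussian parameters (mean, std) for two features
toy_params = {
    "genuine": [(0.10, 0.03), (18.0, 4.0)],
    "forgery": [(0.20, 0.05), (22.0, 5.0)],
}
toy_priors = {"genuine": 0.8, "forgery": 0.2}

def log_joint(x, cls):
    """log P(cls) + sum_i log p(x_i | cls), using the naive independence assumption."""
    total = math.log(toy_priors[cls])
    for value, (mu, sigma) in zip(x, toy_params[cls]):
        total += math.log(NormalDist(mu, sigma).pdf(value))
    return total

observation = [0.17, 21.0]
log_scores = {cls: log_joint(observation, cls) for cls in toy_priors}

# Normalize safely: subtract the largest log score before exponentiating
top = max(log_scores.values())
unnormalized = {cls: math.exp(s - top) for cls, s in log_scores.items()}
norm_const = sum(unnormalized.values())
toy_posterior = {cls: u / norm_const for cls, u in unnormalized.items()}
print(toy_posterior)

# Why logs: multiplying many small densities underflows, summing their logs does not
underflowed = 1.0
for _ in range(100):
    underflowed *= 1e-5
print(underflowed)               # 0.0 in double precision
print(100 * math.log(1e-5))      # roughly -1151.3, still usable
```

Subtracting the maximum before calling `exp` leaves the normalized result unchanged, since the common factor cancels in the ratio, while keeping every exponent at or below zero.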

### Implementation

In [None]:
class NaiveBayesClassifier:
    """Gaussian Naive Bayes classifier implemented from scratch."""
    
    def __init__(self):
        self.class_priors = {}
        self.class_means = {}
        self.class_stds = {}
        self.classes = None
        self.features = None
    
    def fit(self, X, y):
        """Learn class-conditional distributions from training data."""
        self.features = X.columns.tolist()
        self.classes = y.unique()
        
        for cls in self.classes:
            # Prior probability
            self.class_priors[cls] = (y == cls).mean()
            
            # Class-conditional means and stds
            cls_data = X[y == cls]
            self.class_means[cls] = cls_data.mean()
            self.class_stds[cls] = cls_data.std() + 1e-6  # Add small value to avoid division by zero
        
        return self
    
    def predict_proba(self, X):
        """Return posterior probabilities for each class."""
        posteriors = []
        
        for idx in range(len(X)):
            row = X.iloc[idx]
            class_posteriors = {}
            
            for cls in self.classes:
                # Start with log prior
                log_posterior = np.log(self.class_priors[cls])
                
                # Add log likelihoods for each feature
                for feature in self.features:
                    mean = self.class_means[cls][feature]
                    std = self.class_stds[cls][feature]
                    log_likelihood = stats.norm.logpdf(row[feature], mean, std)
                    log_posterior += log_likelihood
                
                class_posteriors[cls] = log_posterior
            
            # Convert from log to probability and normalize
            max_log = max(class_posteriors.values())
            exp_posteriors = {cls: np.exp(lp - max_log) for cls, lp in class_posteriors.items()}
            total = sum(exp_posteriors.values())
            normalized = {cls: p / total for cls, p in exp_posteriors.items()}
            posteriors.append(normalized)
        
        return posteriors
    
    def predict(self, X, threshold=0.5):
        """Predict class labels."""
        posteriors = self.predict_proba(X)
        predictions = []
        for p in posteriors:
            # For binary classification, use threshold
            if len(self.classes) == 2:
                predictions.append(p[True] > threshold)
            else:
                predictions.append(max(p, key=p.get))
        return np.array(predictions)

# Train the classifier
X = manuscripts[features]
y = manuscripts['is_forgery']

nb_classifier = NaiveBayesClassifier()
nb_classifier.fit(X, y)

print("Naive Bayes Classifier Trained")
print("=" * 50)
print(f"Features used: {features}")
print(f"\nPrior probabilities:")
for cls, prior in nb_classifier.class_priors.items():
    label = "Forgery" if cls else "Genuine"
    print(f"  P({label}) = {prior:.3f}")

In [None]:
# Evaluate multi-feature classifier
predictions_multi = nb_classifier.predict(X)

print("Multi-Feature Naive Bayes Classifier")
print("=" * 50 + "\n")
metrics_multi = evaluate_classifier(y.values, predictions_multi)

print(f"\nComparison with single-feature classifier:")
print(f"  Single feature accuracy: {metrics['accuracy']:.3f}")
print(f"  Multi-feature accuracy:  {metrics_multi['accuracy']:.3f}")
print(f"  Improvement: {(metrics_multi['accuracy'] - metrics['accuracy'])*100:.1f} percentage points")

## Part 5: The Precision-Recall Tradeoff

In the forgery investigation, there's a fundamental tension:

- **High Precision**: Only accuse manuscripts we're very confident are forgeries (risk: miss some forgeries)
- **High Recall**: Catch as many forgeries as possible (risk: falsely accuse genuine manuscripts)

By adjusting our decision threshold, we can trade off between these goals.
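
Before sweeping thresholds on the real posteriors, the tradeoff can be checked by hand on a toy example. The six scores and labels are invented for illustration, and `precision_recall_at` is a hypothetical helper, not part of the notebook's classifier.

```python
# Hypothetical posterior scores P(Forgery | x) and true labels for six manuscripts
toy_scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
toy_labels = [True, True, False, True, False, False]

def precision_recall_at(threshold, scores, labels):
    """Precision and recall when every score above `threshold` is called a forgery."""
    flagged = [s > threshold for s in scores]
    tp = sum(f and is_forg for f, is_forg in zip(flagged, labels))
    fp = sum(f and not is_forg for f, is_forg in zip(flagged, labels))
    fn = sum((not f) and is_forg for f, is_forg in zip(flagged, labels))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

for cutoff in (0.2, 0.5, 0.7):
    prec, rec = precision_recall_at(cutoff, toy_scores, toy_labels)
    print(f"threshold={cutoff:.1f}  precision={prec:.2f}  recall={rec:.2f}")
```

On this toy data, raising the threshold from 0.2 to 0.7 lifts precision from 0.60 to 1.00 while recall drops from 1.00 to 0.67: fewer false accusations, at the price of a missed forgery.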

In [None]:
# Calculate precision and recall at various thresholds
thresholds = np.linspace(0.1, 0.9, 17)
posteriors = nb_classifier.predict_proba(X)
# Stored as forgery_probs so the scalar prior p_forgery from Part 2 is not overwritten
forgery_probs = np.array([p[True] for p in posteriors])

precisions = []
recalls = []

for thresh in thresholds:
    preds = forgery_probs > thresh
    tp = np.sum((y == True) & (preds == True))
    fp = np.sum((y == False) & (preds == True))
    fn = np.sum((y == True) & (preds == False))
    
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    
    precisions.append(precision)
    recalls.append(recall)

# Plot precision-recall tradeoff
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Precision and Recall vs Threshold
axes[0].plot(thresholds, precisions, 'b-', linewidth=2, label='Precision')
axes[0].plot(thresholds, recalls, 'r-', linewidth=2, label='Recall')
axes[0].axvline(0.5, color='gray', linestyle='--', label='Default threshold')
axes[0].set_xlabel('Decision Threshold', fontsize=12)
axes[0].set_ylabel('Score', fontsize=12)
axes[0].set_title('Precision and Recall vs. Threshold', fontsize=14)
axes[0].legend()
axes[0].set_ylim(0, 1)

# Right: Precision-Recall Curve
axes[1].plot(recalls, precisions, 'purple', linewidth=2)
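# Mark the point whose threshold is closest to 0.5 instead of relying on the grid midpoint
idx_half = int(np.argmin(np.abs(thresholds - 0.5)))
axes[1].scatter([recalls[idx_half]], [precisions[idx_half]], 
                color='red', s=100, zorder=5, label='Threshold = 0.5')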
axes[1].set_xlabel('Recall', fontsize=12)
axes[1].set_ylabel('Precision', fontsize=12)
axes[1].set_title('Precision-Recall Curve', fontsize=14)
axes[1].legend()
axes[1].set_xlim(0, 1)
axes[1].set_ylim(0, 1)

plt.tight_layout()
plt.show()

print("Interpretation:")
print("  - Higher threshold → recall can only fall or stay flat; precision usually rises")
print("  - Lower threshold → recall can only rise or stay flat; precision usually falls")
print("  - The 'ideal' point depends on the costs of each error type")

## Part 6: Case Study — A Specific Manuscript

Let's walk through the Bayesian classification of a specific manuscript that the Senate is investigating.
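
The feature-by-feature analysis in this part rests on the log-odds form of Naive Bayes: the posterior log-odds equal the prior log-odds plus one log-likelihood ratio per feature, so each feature's push toward "forgery" or "genuine" can be read off separately. The sketch below checks that identity on invented numbers. `toy_prior`, `toy_densities`, and the density values are assumptions for illustration, not values from the dataset.

```python
import math

toy_prior = 0.25  # hypothetical P(Forgery)

# Hypothetical densities per feature: (p(x_i | Forgery), p(x_i | Genuine))
toy_densities = {
    "feature_a": (2.0, 0.5),   # four times more typical of forgeries
    "feature_b": (0.3, 0.9),   # three times more typical of genuine work
    "feature_c": (1.2, 1.0),   # nearly uninformative
}

# Log-odds form: prior log-odds plus one additive contribution per feature
log_odds = math.log(toy_prior / (1 - toy_prior))
for name, (dens_forg, dens_gen) in toy_densities.items():
    contribution = math.log(dens_forg / dens_gen)
    log_odds += contribution
    print(f"{name}: {contribution:+.3f}")

# Convert log-odds back to a probability with the logistic function
posterior_from_log_odds = 1 / (1 + math.exp(-log_odds))

# Same posterior computed directly from Bayes' Theorem, for comparison
num = toy_prior * math.prod(f for f, _ in toy_densities.values())
den = num + (1 - toy_prior) * math.prod(g for _, g in toy_densities.values())
print(f"{posterior_from_log_odds:.4f} vs {num / den:.4f}")
```

Positive contributions favor forgery and negative ones favor genuine. Their magnitudes are directly comparable, which is what makes a per-feature breakdown meaningful.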

In [None]:
# Pick a manuscript to investigate
suspect_manuscript = manuscripts[manuscripts['attributed_author'] == 'Grigsu Haldo'].iloc[0]

print("Senate Investigation: Manuscript Analysis")
print("=" * 60)
print(f"Manuscript ID: {suspect_manuscript['manuscript_id']}")
print(f"Attributed to: {suspect_manuscript['attributed_author']}")
print(f"School: {suspect_manuscript['attributed_school']}")
print(f"Composition Year: {suspect_manuscript['composition_year']}")
print(f"\nActual status: {'FORGERY' if suspect_manuscript['is_forgery'] else 'GENUINE'}")
if suspect_manuscript['is_forgery']:
    print(f"True author: {suspect_manuscript['true_author']}")

print(f"\n{'='*60}")
print("Feature Analysis")
print(f"{'='*60}")

# Single-row DataFrame for the classifier; cast to float because a row taken
# from a mixed-type DataFrame has object dtype
X_suspect = suspect_manuscript[features].to_frame().T.astype(float)

print(f"\n{'Feature':<30} {'Value':<10} {'Gen Prob':<10} {'Forg Prob':<10} {'Favors':<10}")
print("-" * 70)

log_odds_contributions = []

for feature in features:
    val = suspect_manuscript[feature]
    
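    gen_prob = stats.norm.pdf(val, genuine[feature].mean(), genuine[feature].std())
    forg_prob = stats.norm.pdf(val, forged[feature].mean(), forged[feature].std())
    
    # Log-likelihood ratio, computed in log space so a vanishing density cannot yield log(0)
    log_ratio = (stats.norm.logpdf(val, forged[feature].mean(), forged[feature].std())
                 - stats.norm.logpdf(val, genuine[feature].mean(), genuine[feature].std()))
    log_odds_contributions.append(log_ratio)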
    
    favors = "Forgery" if forg_prob > gen_prob else "Genuine"
    print(f"{feature:<30} {val:<10.4f} {gen_prob:<10.4f} {forg_prob:<10.4f} {favors:<10}")

# Final posterior
posteriors = nb_classifier.predict_proba(X_suspect)
p_forg = posteriors[0][True]

print(f"\n{'='*60}")
print(f"Bayesian Conclusion")
print(f"{'='*60}")
print(f"Prior P(Forgery): {nb_classifier.class_priors[True]:.3f}")
print(f"Posterior P(Forgery | Evidence): {p_forg:.3f}")
print(f"\nClassification: {'FORGERY' if p_forg > 0.5 else 'GENUINE'}")
print(f"Confidence: {max(p_forg, 1-p_forg):.1%}")

## Part 7: Cross-Validation

We've been evaluating on the same data we trained on, which can lead to overly optimistic results. Let's use cross-validation to get a more honest estimate of performance.
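
To see what "stratified" k-fold splitting means before using the library version, here is a minimal standard-library sketch. It deals the indices of each class out to the folds in turn, so every fold keeps roughly the overall forgery rate. `stratified_fold_ids` is a hypothetical helper written for illustration; it is not how scikit-learn's `StratifiedKFold` is implemented, and the labels are invented.

```python
import random
from collections import defaultdict

def stratified_fold_ids(labels, n_splits=5, seed=42):
    """Assign each sample a fold id so that every class is spread evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's samples out to folds 0, 1, ..., n_splits - 1 in turn
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_splits
    return fold_of

# Hypothetical labels: 20 forgeries among 100 manuscripts
toy_y = [True] * 20 + [False] * 80
fold_ids = stratified_fold_ids(toy_y)

for k in range(5):
    held_out = [i for i, f in enumerate(fold_ids) if f == k]
    n_forged = sum(toy_y[i] for i in held_out)
    print(f"fold {k}: {len(held_out)} held-out samples, {n_forged} forgeries")
```

Each fold ends up with 20 held-out manuscripts, 4 of them forgeries, matching the 20% base rate. Without stratification, a fold could by chance contain very few forgeries, which would make its precision and recall unstable.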

In [None]:
from sklearn.model_selection import StratifiedKFold

# 5-fold cross-validation
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_results = {'accuracy': [], 'precision': [], 'recall': [], 'f1': []}

print("5-Fold Cross-Validation Results")
print("=" * 60)
print(f"{'Fold':<6} {'Accuracy':<12} {'Precision':<12} {'Recall':<12} {'F1':<12}")
print("-" * 60)

for fold, (train_idx, test_idx) in enumerate(kfold.split(X, y), 1):
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
    
    # Train
    clf = NaiveBayesClassifier()
    clf.fit(X_train, y_train)
    
    # Predict
    preds = clf.predict(X_test)
    
    # Calculate metrics
    tp = np.sum((y_test == True) & (preds == True))
    tn = np.sum((y_test == False) & (preds == False))
    fp = np.sum((y_test == False) & (preds == True))
    fn = np.sum((y_test == True) & (preds == False))
    
    accuracy = (tp + tn) / len(y_test)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
    
    cv_results['accuracy'].append(accuracy)
    cv_results['precision'].append(precision)
    cv_results['recall'].append(recall)
    cv_results['f1'].append(f1)
    
    print(f"{fold:<6} {accuracy:<12.3f} {precision:<12.3f} {recall:<12.3f} {f1:<12.3f}")

print("-" * 60)
print(f"{'Mean':<6} {np.mean(cv_results['accuracy']):<12.3f} {np.mean(cv_results['precision']):<12.3f} "
      f"{np.mean(cv_results['recall']):<12.3f} {np.mean(cv_results['f1']):<12.3f}")
print(f"{'Std':<6} {np.std(cv_results['accuracy']):<12.3f} {np.std(cv_results['precision']):<12.3f} "
      f"{np.std(cv_results['recall']):<12.3f} {np.std(cv_results['f1']):<12.3f}")

## Summary

| Concept | Key Insight | Application |
|---------|-------------|-------------|
| Bayes' Theorem | Posterior ∝ Likelihood × Prior | Update beliefs with evidence |
| Prior | Initial belief before seeing evidence | Base rate of forgeries |
| Likelihood | P(evidence \| class) | How typical is this manuscript for each class? |
| Posterior | P(class \| evidence) | Final probability of forgery |
| Naive Bayes | Assume features are conditionally independent given the class | Multiply per-feature likelihoods |
| Confusion Matrix | TP, TN, FP, FN | Understanding error types |
| Precision | TP / (TP + FP) | When we accuse, are we right? |
| Recall | TP / (TP + FN) | How many forgeries do we catch? |

---

## Exercises

### Exercise 1: Feature Importance

Which feature contributes most to the classification? Calculate the average absolute log-odds contribution for each feature across all manuscripts.

In [None]:
# Exercise 1: Your code here
# Hint: For each feature, calculate log(P(x|Forgery) / P(x|Genuine)) for each manuscript


### Exercise 2: Threshold Optimization

The Senate wants to minimize false accusations (false positives) while still catching at least 80% of forgeries (recall ≥ 0.80). What threshold should we use?

In [None]:
# Exercise 2: Your code here
# Hint: Search through thresholds to find the highest precision that achieves recall ≥ 0.80


### Exercise 3: Prior Sensitivity

How sensitive is our classifier to the prior? Suppose a skeptic believes only 5% of manuscripts are forgeries, while an accuser believes 40% are. How would their posterior probabilities differ for the same manuscript?

In [None]:
# Exercise 3: Your code here
# Hint: Modify the prior and recalculate posteriors for a specific manuscript


### Exercise 4: Author-Specific Classification

Build a classifier specifically for manuscripts attributed to Grigsu Haldo. Does focusing on one author improve performance?

In [None]:
# Exercise 4: Your code here


---

## Module Complete!

Congratulations! You've completed Module 1: Statistics & Probability. You now understand:

1. **Uncertainty Intuition**: Populations, samples, and random variables
2. **Distributions as Terrain**: Normal, skewed, and fat-tailed distributions
3. **Central Limit Theorem**: Why averaging works
4. **Hypothesis Testing**: Signal vs. noise, p-values, multiple comparisons
5. **Bayesian Classification**: Updating beliefs with evidence

In Module 2, we'll explore **Linear Algebra**—the mathematics of vectors and matrices—through the lens of creature similarity, map projections, and feature engineering.

*"The Archives are full of numbers. But numbers without interpretation are just ink on parchment. What we've learned today is how to make those numbers speak."*  
— Closing statement, Senate Inquiry into the Pavar Forgeries