[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/buildLittleWorlds/ml-math-with-densworld/blob/main/modules/03-calculus/notebooks/01-derivatives-sensitivity.ipynb)

# Lesson 1: Derivatives as Sensitivity

*"If I adjust my stratagem by the smallest fraction, how much closer—or further—does the Tower's fall become? This is the question that haunts my nights. Every siege is a function; every choice, a variable. I must learn to feel the gradient of war."*  
— The Colonel, private journals, Year 15 of the Siege

---

## The Core Reframe

In school, you learned:
> "The derivative is the slope of the tangent line."

That's geometrically true, but it is **not the most useful framing** for machine learning, or for understanding the Colonel's siege of the Tower of Mirado.

Here's the ML interpretation:
> "The derivative measures **sensitivity** — how much the output changes when we nudge the input."

For the Colonel besieging the Tower:
> "If I adjust my attack strategy slightly, how much does my progress toward breaching the Tower change?"

---

## Learning Objectives

By the end of this lesson, you will:
1. Think of derivatives as sensitivity measures, not just slopes
2. Understand what "high" vs "low" derivatives mean practically
3. See why this matters for training ML models—and for the Colonel's siege
4. Compute numerical derivatives and interpret their meaning
5. Connect derivatives to the concept of optimization

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(42)

# Nice plotting defaults
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

# Colab-ready data loading
BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/ml-math-with-densworld/main/data/"

# Load the siege progress data
siege = pd.read_csv(BASE_URL + "siege_progress.csv")
stratagem = pd.read_csv(BASE_URL + "stratagem_details.csv")

print(f"Loaded {len(siege)} months of siege records")
print(f"Loaded {len(stratagem)} individual stratagem attempts")
print(f"Siege duration: {siege['year'].max()} years")
siege.head()

## The Colonel's Optimization Problem

*"Twenty years I have spent beneath the shadow of this Tower. Twenty years of trial and failure, of blood and patience. The Tower does not negotiate. It simply waits. But I have learned something in these decades: there is a landscape beneath the landscape—a terrain of cause and effect that I navigate blind, feeling for the downhill slope."*  
— The Colonel, addressing his officers, Year 19

The Colonel has been besieging the Tower of Mirado for 20 years. His **loss function** is simple: the distance from breaching the Tower (1 - progress_score). His **parameters** are his stratagems—ladder assaults, grappling hooks, tunneling, parley, waiting.

**The Question**: If I change my approach, how much does my loss change?

- If the answer is "a lot" → High sensitivity → Large derivative
- If the answer is "barely at all" → Low sensitivity → Small derivative
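
To make "a lot" versus "barely at all" concrete, here is a minimal sketch that applies the same small nudge to two made-up functions. The names `sensitivity`, `steep`, and `flat` are illustrative assumptions and are not part of the siege data.

```python
def sensitivity(f, x, nudge=0.01):
    """Change in output per unit change in input, measured with a small nudge."""
    return (f(x + nudge) - f(x)) / nudge

# Two hypothetical responses to a change in approach (illustrative only)
def steep(x):
    return 5.0 * x    # output reacts strongly to the input

def flat(x):
    return 0.05 * x   # output barely reacts to the input

high = sensitivity(steep, 1.0)
low = sensitivity(flat, 1.0)

print(f"steep: {high:.2f} (high sensitivity, large derivative)")
print(f"flat:  {low:.2f} (low sensitivity, small derivative)")
```

The nudge is identical in both calls; only the function's response differs, and that response per unit of nudge is what the derivative reports.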

---

## Part 1: The Sensitivity Intuition

In [None]:
# Visualize progress over time
fig, axes = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

# Progress score over time
axes[0].plot(siege['month_total'], siege['progress_score'], 'b-', linewidth=1.5)
axes[0].fill_between(siege['month_total'], 0, siege['progress_score'], alpha=0.3)
axes[0].set_ylabel('Progress Score', fontsize=11)
axes[0].set_title('The Colonel\'s Siege: 20 Years of Effort', fontsize=13)
axes[0].set_ylim(0, 0.3)

# Loss (1 - progress) over time
axes[1].plot(siege['month_total'], siege['loss'], 'r-', linewidth=1.5)
axes[1].fill_between(siege['month_total'], siege['loss'], 1, alpha=0.3, color='red')
axes[1].set_ylabel('Loss (1 - Progress)', fontsize=11)
axes[1].set_xlabel('Month of Siege', fontsize=11)
axes[1].set_title('What We Want to Minimize: Distance from Breaching the Tower', fontsize=13)

plt.tight_layout()
plt.show()

print(f"Starting progress: {siege['progress_score'].iloc[0]:.1%}")
print(f"Final progress: {siege['progress_score'].iloc[-1]:.1%}")
print(f"\nThe Colonel has made {siege['progress_score'].iloc[-1]:.1%} progress in 20 years.")
print("The Tower of Mirado does not fall easily.")

## Part 2: Sensitivity Changes Over Time

For non-linear systems, the sensitivity **depends on where you are**.

Early in the siege, when progress is low, the Tower's defenses may respond differently than later. The same change in strategy might have dramatically different effects depending on the current state.
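
A small sketch of this idea, using a hypothetical saturating curve rather than anything fitted to the siege records: the same nudge is applied at a low-effort point and at a high-effort point. The function `progress` and the helper `nudge_effect` are assumptions made for illustration.

```python
def progress(effort):
    """Hypothetical saturating response (an assumption, not fitted to the data)."""
    return effort / (1.0 + effort)

def nudge_effect(f, x, h=1e-3):
    """Output change per unit input change for a small nudge h at point x."""
    return (f(x + h) - f(x)) / h

early = nudge_effect(progress, 0.0)  # start of the curve
late = nudge_effect(progress, 4.0)   # far along the curve

print(f"Sensitivity at effort 0: {early:.3f}")
print(f"Sensitivity at effort 4: {late:.3f}")
```

The nudge size never changes, yet its effect is far smaller at the later point. That is what "the sensitivity depends on where you are" means for a non-linear function.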

*"In the early years, every assault seemed to matter. A ladder gained or lost could shift the momentum. Now, in Year 15, the increments grow smaller. The Tower has adapted to us, as we have adapted to it."*  
— The Colonel

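The code below uses the month-to-month change in progress (`progress_delta`), i.e. the rate of change of progress with respect to time, as a stand-in for sensitivity to effort. Let's look at how it varies: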

In [None]:
# The 'progress_delta' column shows how much progress changed each month.
# It is a discrete approximation of d(progress)/d(time), with a step of one month.

fig, axes = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

# Progress delta (the "derivative")
colors = ['green' if d > 0 else 'red' for d in siege['progress_delta']]
axes[0].bar(siege['month_total'], siege['progress_delta'], color=colors, alpha=0.7, width=1)
axes[0].axhline(0, color='black', linewidth=0.5)
axes[0].set_ylabel('Progress Change (Δ)', fontsize=11)
axes[0].set_title('Monthly Change in Progress\n(Green = Progress, Red = Regression)', fontsize=13)
axes[0].set_ylim(-0.03, 0.08)

# Rolling average of progress delta (smoothed derivative)
rolling_avg = siege['progress_delta'].rolling(window=12).mean()
axes[1].plot(siege['month_total'], rolling_avg, 'purple', linewidth=2)
axes[1].axhline(0, color='black', linewidth=0.5)
axes[1].fill_between(siege['month_total'], 0, rolling_avg, 
                     where=rolling_avg > 0, alpha=0.3, color='green')
axes[1].fill_between(siege['month_total'], rolling_avg, 0, 
                     where=rolling_avg < 0, alpha=0.3, color='red')
axes[1].set_ylabel('12-Month Rolling Average', fontsize=11)
axes[1].set_xlabel('Month of Siege', fontsize=11)
axes[1].set_title('Smoothed Sensitivity: How Effective is the Colonel\'s Effort?', fontsize=13)

plt.tight_layout()
plt.show()

print("Notice how the derivative (rate of progress) varies over time.")
print("Some periods show high sensitivity—small changes yield big results.")
print("Other periods show near-zero sensitivity—the Colonel is stuck.")

## Part 3: Why ML Cares About Sensitivity

In machine learning, we have:
- **Parameters** (weights): the knobs we can turn (like choosing a stratagem)
- **Loss** (error): what we want to minimize (like distance from breaching the Tower)

The derivative tells us: **"If I adjust this parameter a tiny bit, how much will the error change?"**

This is exactly what the Colonel needs to know!
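
As a minimal sketch of that question in an ML setting, consider a one-parameter model `y_hat = w * x` with mean squared error as the loss. The dataset, the function names `loss` and `dloss_dw`, and the nudge size are all assumptions chosen for illustration.

```python
import numpy as np

# Tiny made-up dataset where the true relationship is y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

def loss(w):
    """Mean squared error of the one-parameter model y_hat = w * x."""
    return np.mean((w * x - y) ** 2)

def dloss_dw(w, h=1e-6):
    """Nudge w by h and measure how much the loss changes per unit of nudge."""
    return (loss(w + h) - loss(w)) / h

for w in [0.0, 1.0, 2.0, 3.0]:
    print(f"w = {w:.1f} | loss = {loss(w):7.3f} | d(loss)/dw = {dloss_dw(w):8.3f}")
```

Below `w = 2` the derivative is negative (increasing `w` lowers the error), above it the derivative is positive, and at `w = 2` it is essentially zero because the model already fits this toy data.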

### Connecting to the Siege

The Colonel has been recording an `estimated_gradient`—his sense of which direction reduces loss. But his estimates are noisy. He can't see the true loss landscape; he navigates by feel.

*"I cannot see the true gradient. The Tower reveals nothing. I estimate, I guess, I intuit. Some months my intuition is sharp; others, I am blind. This is the curse of optimization in the dark."*  
— The Colonel

In [None]:
# Compare actual progress to the Colonel's estimated gradient
fig, ax = plt.subplots(figsize=(10, 6))

scatter = ax.scatter(siege['estimated_gradient'], siege['progress_delta'], 
                     alpha=0.5, c=siege['month_total'], cmap='viridis', s=40)
ax.axhline(0, color='black', linewidth=0.5)
ax.axvline(0, color='black', linewidth=0.5)

# Add regression line
z = np.polyfit(siege['estimated_gradient'], siege['progress_delta'], 1)
p = np.poly1d(z)
x_line = np.linspace(siege['estimated_gradient'].min(), siege['estimated_gradient'].max(), 100)
ax.plot(x_line, p(x_line), 'r--', linewidth=2, label='Trend')

ax.set_xlabel('Colonel\'s Estimated Gradient (his sense of direction)', fontsize=11)
ax.set_ylabel('Actual Progress Delta', fontsize=11)
ax.set_title('How Well Does the Colonel Estimate the True Gradient?\n(Color = Month of Siege)', fontsize=13)
ax.legend()
plt.colorbar(scatter, label='Month')
plt.tight_layout()
plt.show()

correlation = siege['estimated_gradient'].corr(siege['progress_delta'])
print(f"Correlation between estimated gradient and actual progress change: {correlation:.3f}")
print("\nThe Colonel's estimates are noisy—he can't see the true loss landscape.")
print("Stochastic gradient descent faces a similar challenge: it works from noisy gradient estimates.")

## Part 4: The Sign of the Derivative

The **sign** of the derivative is crucial:

| Derivative Sign | Meaning | What to do |
|-----------------|---------|------------|
| Positive (+) | More effort → More loss | **Reduce** effort in this direction |
| Negative (-) | More effort → Less loss | **Increase** effort in this direction |
| Zero (0) | At a stationary point: a minimum, a maximum, or a flat spot | You might be done, but check that it is a minimum |

For the Colonel: if a stratagem consistently leads to negative progress, he should try something else.
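
The rule in the table can be sketched as a short loop that repeatedly steps against the sign of the derivative. The toy `loss`, the `derivative` helper, and the `learning_rate` value are assumptions for illustration; this is a sketch of the idea, not the Colonel's actual procedure.

```python
def loss(w):
    """Toy loss with its minimum at w = 3 (an illustrative assumption)."""
    return (w - 3.0) ** 2

def derivative(f, x, h=1e-6):
    """Finite-difference estimate of f'(x), nudging x both up and down."""
    return (f(x + h) - f(x - h)) / (2 * h)

learning_rate = 0.1  # hypothetical step size
final_w = {}

for start in [0.0, 6.0]:
    w = start
    for _ in range(50):
        # Positive derivative -> decrease w; negative derivative -> increase w
        w -= learning_rate * derivative(loss, w)
    final_w[start] = w
    print(f"start = {start:.1f} -> w after 50 steps = {w:.4f}")
```

Starting on either side of the minimum, the sign of the derivative points the update toward `w = 3`, and the steps shrink as the derivative approaches zero.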

Let's analyze which stratagems work:

In [None]:
# Analyze effectiveness by stratagem
stratagem_analysis = siege.groupby('stratagem_attempted').agg({
    'progress_delta': ['mean', 'std', 'count'],
    'morale_index': 'mean',
    'supply_level': 'mean'
}).round(4)

stratagem_analysis.columns = ['avg_progress', 'std_progress', 'count', 'avg_morale', 'avg_supplies']
stratagem_analysis = stratagem_analysis.sort_values('avg_progress', ascending=False)

print("Stratagem Effectiveness Analysis:")
print("=" * 70)
print(stratagem_analysis.to_string())

print("\n" + "=" * 70)
print("Interpretation:")
print("- Positive avg_progress = stratagem tends to help (negative derivative of loss)")
print("- Negative avg_progress = stratagem tends to hurt (positive derivative of loss)")
print("- High std = unpredictable results—noisy gradient estimates")

In [None]:
# Visualize stratagem effectiveness
fig, ax = plt.subplots(figsize=(12, 6))

stratagems = stratagem_analysis.index
progress = stratagem_analysis['avg_progress']
errors = stratagem_analysis['std_progress']
counts = stratagem_analysis['count']

colors = ['green' if p > 0 else 'red' for p in progress]
bars = ax.bar(stratagems, progress, color=colors, alpha=0.7, edgecolor='black')
ax.errorbar(stratagems, progress, yerr=errors, fmt='none', color='black', capsize=5)

# Add count labels
for bar, count in zip(bars, counts):
    height = bar.get_height()
    ax.annotate(f'n={count}',
                xy=(bar.get_x() + bar.get_width() / 2, height),
                xytext=(0, 3 if height > 0 else -15),
                textcoords="offset points",
                ha='center', va='bottom', fontsize=9)

ax.axhline(0, color='black', linewidth=1)
ax.set_ylabel('Average Progress per Month', fontsize=11)
ax.set_xlabel('Stratagem', fontsize=11)
ax.set_title('Which Stratagems Move the Colonel Toward His Goal?\n(Error bars = standard deviation)', fontsize=13)
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

## Part 5: Computing Derivatives Numerically

You don't always need the mathematical formula for a derivative.

The **numerical derivative** approximates it by actually nudging the input and measuring the change:

$$f'(x) \approx \frac{f(x + h) - f(x)}{h}$$

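where $h$ is a small number (such as 0.00001, the default used in the code below). Making $h$ too small backfires: floating-point round-off in $f(x + h) - f(x)$ starts to dominate the result.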

This is literally: "nudge x by h, see how much f changes, divide by the nudge size."
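
A short Taylor-expansion argument shows how good this approximation is. This is standard calculus rather than anything specific to the siege data, and it assumes $f$ is smooth near $x$. Expanding $f$ around $x$:

$$f(x + h) = f(x) + h\,f'(x) + \frac{h^2}{2} f''(x) + O(h^3)$$

Subtracting $f(x)$ and dividing by $h$ gives

$$\frac{f(x + h) - f(x)}{h} = f'(x) + \frac{h}{2} f''(x) + O(h^2)$$

so the error of the forward difference shrinks in proportion to $h$. Nudging in both directions cancels the $f''$ term:

$$\frac{f(x + h) - f(x - h)}{2h} = f'(x) + \frac{h^2}{6} f'''(x) + O(h^4)$$

The error of this centered difference shrinks in proportion to $h^2$, which is why it is usually the more accurate of the two for the same $h$.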

*"I do this instinctively. I try a slight variation—one more ladder, one fewer sapper—and observe the result. The Tower is my function; my choices are the input; progress is the output. I am computing gradients without knowing the formula."*  
— The Colonel

In [None]:
def numerical_derivative(f, x, h=1e-5):
    """Compute derivative of f at point x using finite differences."""
    return (f(x + h) - f(x)) / h

def numerical_derivative_centered(f, x, h=1e-5):
    """Compute derivative using centered differences (more accurate)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Example: A simple loss function (like the Colonel's distance from the Tower)
def loss_function(effort):
    """Simulated loss: decreases with effort, but with diminishing returns."""
    return 1 / (1 + 0.1 * effort)  # Starts at 1, approaches 0

# Test the numerical derivative at different effort levels
effort_levels = [0, 5, 10, 20, 50, 100]
print("Numerical Derivatives of Loss Function:")
print("-" * 60)
print(f"{'Effort':>10} | {'Loss':>10} | {'Derivative':>12} | Interpretation")
print("-" * 60)

for effort in effort_levels:
    loss = loss_function(effort)
    deriv = numerical_derivative(loss_function, effort)
    interp = "High sensitivity" if abs(deriv) > 0.005 else "Low sensitivity"
    print(f"{effort:>10} | {loss:>10.4f} | {deriv:>12.6f} | {interp}")

print("\nNote: All derivatives are negative (more effort → less loss)")
print("But the magnitude decreases—diminishing returns!")

In [None]:
# Visualize the loss function and its derivative
effort_range = np.linspace(0, 100, 200)
losses = [loss_function(e) for e in effort_range]
derivatives = [numerical_derivative(loss_function, e) for e in effort_range]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss function
axes[0].plot(effort_range, losses, 'b-', linewidth=2)
axes[0].set_xlabel('Effort Level', fontsize=11)
axes[0].set_ylabel('Loss (Distance from Goal)', fontsize=11)
axes[0].set_title('The Loss Function\n(What the Colonel wants to minimize)', fontsize=12)
axes[0].grid(True, alpha=0.3)

# Derivative (sensitivity)
axes[1].plot(effort_range, derivatives, 'r-', linewidth=2)
axes[1].axhline(0, color='black', linewidth=0.5)
axes[1].set_xlabel('Effort Level', fontsize=11)
axes[1].set_ylabel('Derivative (Sensitivity)', fontsize=11)
axes[1].set_title('The Derivative\n(How much does effort reduce loss?)', fontsize=12)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Left: The loss decreases with effort, but flattens out.")
print("Right: The derivative (sensitivity) is always negative but approaches zero.")
print("\nThis is diminishing returns: early effort is highly effective, later effort less so.")

## Part 6: The Stratagem Details — Gradient Estimation in Practice

The Colonel's records include detailed stratagem-level data with both his **estimated gradient** and the **actual gradient** (computed after the fact). This lets us see how well he navigates the loss landscape.
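
Before looking at the real records, here is a synthetic sketch of what "correlated but noisy" gradient estimates look like. Every number in it is simulated: the arrays `true_gradient` and `estimated_gradient`, the noise scale, and the sample size are assumptions, not values from `stratagem_details.csv`.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable

# Hypothetical "true" gradients and noisy estimates of them
true_gradient = rng.normal(loc=0.0, scale=1.0, size=500)
noise = rng.normal(loc=0.0, scale=0.5, size=500)
estimated_gradient = true_gradient + noise

# Error defined as estimated minus actual
gradient_error = estimated_gradient - true_gradient

correlation = np.corrcoef(estimated_gradient, true_gradient)[0, 1]
sign_agreement = np.mean(np.sign(estimated_gradient) == np.sign(true_gradient))
mean_abs_error = np.mean(np.abs(gradient_error))

print(f"Correlation:         {correlation:.3f}")
print(f"Direction agreement: {sign_agreement:.1%}")
print(f"Mean absolute error: {mean_abs_error:.3f}")
```

Even with substantial noise, the estimate usually gets the direction right. For an optimizer, knowing the direction matters most, since the direction decides which way to step.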

In [None]:
# Examine the stratagem details
print("Stratagem Details — The Colonel's Gradient Estimation Record:")
print("=" * 90)
cols = ['stratagem_id', 'stratagem_type', 'estimated_gradient', 'actual_gradient', 
        'gradient_error', 'progress_delta', 'outcome_category']
print(stratagem[cols].head(15).to_string(index=False))

In [None]:
# How accurate are the Colonel's gradient estimates?
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Scatter: estimated vs actual
axes[0].scatter(stratagem['estimated_gradient'], stratagem['actual_gradient'], 
                alpha=0.5, c='steelblue', edgecolor='white', s=50)
# Perfect estimation line
lims = [min(stratagem['estimated_gradient'].min(), stratagem['actual_gradient'].min()),
        max(stratagem['estimated_gradient'].max(), stratagem['actual_gradient'].max())]
axes[0].plot(lims, lims, 'r--', linewidth=2, label='Perfect estimation')
axes[0].set_xlabel('Colonel\'s Estimated Gradient', fontsize=11)
axes[0].set_ylabel('Actual Gradient', fontsize=11)
axes[0].set_title('Gradient Estimation Accuracy', fontsize=12)
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Distribution of gradient errors
axes[1].hist(stratagem['gradient_error'], bins=30, color='coral', edgecolor='black', alpha=0.7)
axes[1].axvline(0, color='black', linewidth=1, linestyle='--')
axes[1].axvline(stratagem['gradient_error'].mean(), color='red', linewidth=2, 
                label=f'Mean error: {stratagem["gradient_error"].mean():.4f}')
axes[1].set_xlabel('Gradient Error (Estimated - Actual)', fontsize=11)
axes[1].set_ylabel('Frequency', fontsize=11)
axes[1].set_title('Distribution of the Colonel\'s Estimation Errors', fontsize=12)
axes[1].legend()

plt.tight_layout()
plt.show()

correlation = stratagem['estimated_gradient'].corr(stratagem['actual_gradient'])
mean_abs_error = stratagem['gradient_error'].abs().mean()
print(f"Correlation between estimated and actual: {correlation:.3f}")
print(f"Mean absolute error: {mean_abs_error:.4f}")
print("\nThe Colonel's estimates are correlated with truth, but noisy.")
print("Stochastic gradient descent works the same way: it optimizes with noisy gradient estimates.")

## Part 7: What Makes a Good Gradient Estimate?

Let's look at which stratagem types give the Colonel better gradient estimates:

In [None]:
# Analyze gradient estimation quality by stratagem type
gradient_quality = stratagem.groupby('stratagem_type').agg({
    'gradient_error': ['mean', 'std', 'count'],
    'was_optimal_direction': 'mean'  # How often did he get the direction right?
}).round(4)

gradient_quality.columns = ['mean_error', 'std_error', 'count', 'correct_direction_%']
gradient_quality['correct_direction_%'] = (gradient_quality['correct_direction_%'] * 100).round(1)
gradient_quality = gradient_quality.sort_values('correct_direction_%', ascending=False)

print("Gradient Estimation Quality by Stratagem Type:")
print("=" * 70)
print(gradient_quality.to_string())

print("\nInterpretation:")
print("- 'correct_direction_%' = how often the Colonel correctly identified uphill/downhill")
print("- Some stratagems give clearer feedback than others")
print("- Reconnaissance-type actions tend to improve gradient estimation")

---

## Exercises

### Exercise 1: Stratagem Analysis

The Colonel used 'parley' many times with near-zero average progress. Using the concept of derivatives, explain why he might have kept trying it despite poor results.

*Hint: Consider the variance and the cost of the stratagem.*

In [None]:
# Exercise 1: Why did the Colonel keep trying parley?
# Analyze parley's characteristics

parley_data = stratagem[stratagem['stratagem_type'] == 'parley']

print("Parley Stratagem Analysis:")
print(f"Number of attempts: {len(parley_data)}")
print(f"Mean progress delta: {parley_data['progress_delta'].mean():.4f}")
print(f"Std progress delta: {parley_data['progress_delta'].std():.4f}")
print(f"Risk level (mean): {parley_data['risk_level'].mean():.2f}")
print(f"Personnel committed (mean): {parley_data['personnel_committed'].mean():.1f}")
print(f"Supply cost (mean): {parley_data['supply_cost'].mean():.1f}")
print(f"Casualties (total): {parley_data['casualties'].sum()}")

# Your interpretation:
print("\n" + "="*50)
print("INTERPRETATION:")
print("Parley has near-zero expected gradient (no progress),")
print("but it has LOW COST and LOW RISK.")
print("The Colonel uses it when other options are too expensive.")
print("It's like a 'safe' step in gradient descent—no progress, but no regression either.")

### Exercise 2: Numerical Derivatives from Siege Data

Compute the numerical derivative of the actual siege progress score at months 50, 100, 150, and 200. How does sensitivity change over time?

In [None]:
# Exercise 2: Compute numerical derivatives from siege data
# The progress_delta column IS the discrete derivative!

months_to_analyze = [50, 100, 150, 200]

print("Sensitivity Analysis at Different Time Points:")
print("-" * 60)
print(f"{'Month':>8} | {'Progress':>10} | {'Delta (Deriv)':>14} | {'12-mo Avg':>12}")
print("-" * 60)

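for month in months_to_analyze:
    # Find the row position for this month; month_total need not equal the row index
    positions = np.flatnonzero(siege['month_total'].to_numpy() == month)
    if len(positions) == 0:
        continue  # this month is not in the records
    pos = positions[0]
    row = siege.iloc[pos]
    # 12-month window around this point: 6 rows before through 5 rows after
    start = max(0, pos - 6)
    end = min(len(siege), pos + 6)
    rolling = siege.iloc[start:end]['progress_delta'].mean()
    print(f"{month:>8} | {row['progress_score']:>10.4f} | {row['progress_delta']:>14.4f} | {rolling:>12.4f}")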

print("\nInterpretation:")
print("The derivative fluctuates month-to-month (noise),")
print("but the rolling average shows the underlying trend.")
print("Sensitivity tends to decrease as the siege progresses—")
print("the Tower becomes harder to crack with each year.")

### Exercise 3: Perfect Information

If the Colonel had perfect knowledge of the gradient (no estimation error), how might his strategy have been different? Analyze the cases where his estimate was most wrong.

In [None]:
# Exercise 3: What if the Colonel had perfect gradient knowledge?

# Find the stratagems with the largest gradient errors
stratagem_sorted = stratagem.sort_values('gradient_error', key=abs, ascending=False)

print("Top 10 Largest Gradient Estimation Errors:")
print("=" * 100)
cols = ['stratagem_id', 'stratagem_type', 'estimated_gradient', 'actual_gradient', 
        'gradient_error', 'outcome_category', 'casualties']
print(stratagem_sorted[cols].head(10).to_string(index=False))

# How many disasters could have been avoided?
disasters = stratagem[stratagem['outcome_category'] == 'disaster']
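# Cast to bool first so `~` is a logical NOT even if the column was loaded as 0/1 integers
wrong_direction_disasters = disasters[~disasters['was_optimal_direction'].astype(bool)]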

print(f"\n\nTotal disasters: {len(disasters)}")
print(f"Disasters where Colonel went wrong direction: {len(wrong_direction_disasters)}")
print(f"Total casualties in wrong-direction disasters: {wrong_direction_disasters['casualties'].sum()}")
print("\nWith perfect gradient information, the Colonel could have avoided")
print(f"approximately {len(wrong_direction_disasters)} disasters and {wrong_direction_disasters['casualties'].sum()} casualties.")

### Exercise 4: Design Your Own Stratagem

Based on the gradient analysis, propose an optimal stratagem mix for the Colonel. Which stratagems should he favor? Which should he avoid?

In [None]:
# Exercise 4: Design optimal stratagem mix

# Calculate efficiency: progress per casualty and per supply
strat_efficiency = stratagem.groupby('stratagem_type').agg({
    'progress_delta': 'mean',
    'casualties': 'mean',
    'supply_cost': 'mean',
    'gradient_error': lambda x: x.abs().mean(),  # Mean absolute error
    'was_optimal_direction': 'mean'
}).round(4)

strat_efficiency.columns = ['avg_progress', 'avg_casualties', 'avg_supply', 'mae_gradient', 'direction_accuracy']

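# Calculate efficiency metrics
# The small offsets (+0.1 casualties, +1 supply) avoid division by zero
# for stratagems with no casualties or no supply cost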
strat_efficiency['progress_per_casualty'] = (
    strat_efficiency['avg_progress'] / (strat_efficiency['avg_casualties'] + 0.1)
).round(4)
strat_efficiency['progress_per_supply'] = (
    strat_efficiency['avg_progress'] / (strat_efficiency['avg_supply'] + 1)
).round(4)

print("Stratagem Efficiency Analysis:")
print("=" * 100)
print(strat_efficiency.to_string())

print("\n" + "="*50)
print("OPTIMAL STRATEGY RECOMMENDATION:")
print("="*50)
best_progress = strat_efficiency['avg_progress'].idxmax()
best_efficiency = strat_efficiency['progress_per_casualty'].idxmax()
best_gradient = strat_efficiency['direction_accuracy'].idxmax()

print(f"Highest average progress: {best_progress}")
print(f"Best progress per casualty: {best_efficiency}")
print(f"Best gradient estimation: {best_gradient}")
print("\nRecommendation: Mix high-progress stratagems with low-cost")
print("reconnaissance to maintain accurate gradient estimates.")

---

## Summary

| Concept | Key Insight | Colonel's Siege Example |
|---------|-------------|------------------------|
| **Derivative** | Measures sensitivity: how output changes per unit input change | How much does progress change when the Colonel adjusts his stratagem? |
| **Sign of Derivative** | Tells direction: positive = uphill, negative = downhill | Negative derivative of loss means the stratagem helps |
| **Magnitude** | Tells intensity: large = high sensitivity, small = diminishing returns | Early siege: high sensitivity; Late siege: low sensitivity |
| **Numerical Derivative** | Approximate by nudging: f'(x) ≈ (f(x+h) - f(x)) / h | Try a variation, observe the result, estimate the gradient |
| **Gradient Estimation** | In practice, we estimate gradients with noise | The Colonel can't see the true loss landscape |
| **Diminishing Returns** | Sensitivity decreases as you approach the optimum | The Tower becomes harder to crack with each year |

---

## Key Takeaways

1. **Derivatives measure sensitivity**: "If I nudge input, how much does output change?"

2. **The magnitude tells you how sensitive**: Large derivative = small input changes cause big output changes

3. **The sign tells you the direction**: Negative derivative of loss = more effort helps!

4. **In ML, we use this to improve models**: The derivative of loss with respect to parameters tells us how to adjust them

5. **The Colonel's challenge is our challenge**: He can't see the true loss landscape, only noisy estimates—just like stochastic gradient descent.

---

## Next Lesson

In **Lesson 2: The Gradient — Your Compass**, we'll extend this idea to functions with multiple inputs. The Colonel must choose between multiple stratagems simultaneously—he needs a whole *vector* of sensitivities. That's the gradient, his compass in the fog of war.

*"One derivative tells me how hard to push. But I have many levers to pull—ladders, tunnels, grapples, siege engines. I need a compass that points in n dimensions. I need the gradient."*  
— The Colonel