# Day 3 — "Rolling Down the Hill": Gradient Descent and Optimization Intuition

Training a neural network is like rolling a marble through fog: gradients tell you which way is downhill, even when you cannot see the full landscape.

## 1. Core Intuition

- The loss surface is a mountain; weights are the marble's coordinates.
- Gradients are the local slopes that guide you toward lower loss.
- Steep slopes produce large steps; gentle slopes produce small ones, so progress slows as the surface flattens.
- Each update nudges the model downhill, sculpting it into a better predictor.
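To make "local slope" concrete, here is a minimal sketch that estimates the slope along each axis by probing the loss a tiny distance either side of the current point, then takes one step against it. The names `toy_loss` and `numerical_gradient`, the starting point, and the step size are illustrative assumptions, not code from the repository.

```python
def toy_loss(x, y):
    """Hypothetical stand-in loss: a round bowl with its minimum at (0, 0)."""
    return 0.5 * (x * x + y * y)


def numerical_gradient(f, x, y, h=1e-6):
    """Estimate the slope along each axis with central differences."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy


x, y = 2.5, -2.0
gx, gy = numerical_gradient(toy_loss, x, y)

# Step against the slope: subtracting the gradient moves the marble downhill.
step = 0.1
x_new, y_new = x - step * gx, y - step * gy

print("slope:", gx, gy)
print("loss before:", toy_loss(x, y), "loss after:", toy_loss(x_new, y_new))
```

The loss printed after the step is lower than the loss before it, which is all a single gradient descent update promises.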

## 2. Mathematical Story — The Downhill Equation

| Concept | Formula | Intuitive Meaning |
| --- | --- | --- |
| Loss function | `L(θ)` | Landscape we minimize (prediction error). |
| Gradient | `∇_θ L = [∂L/∂θ₁, ∂L/∂θ₂, …]` | Local slope / direction of steepest ascent. |
| Update rule | `θ_{t+1} = θ_t - η ∇_θ L(θ_t)` | Step opposite the gradient to descend. |
| Learning rate | `η` | Step size controlling how far each update travels. |
| Optimum | `∇_θ L = 0` | Flat spot; every valley bottom is flat, though saddles and peaks are flat too. |

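Example bowl: `L(x, y) = 0.5 (x² + y²)` ⇒ `∇L = [x, y]`, so the update rule becomes `θ_{t+1} = (1 - η) θ_t`. Each iteration multiplies both coordinates by `(1 - η)`, and they shrink toward zero whenever `0 < η < 2`.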
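The shrink factor can be checked numerically. The following is a minimal standard-library sketch, not the repository's `GradientDescentRunner`; the function names `bowl_gradient` and `gradient_descent` are assumptions made for illustration. It runs the update rule on the example bowl and compares the final iterate with the closed form `(1 - η)^t · θ_0`.

```python
def bowl_gradient(point):
    """Gradient of L(x, y) = 0.5 * (x**2 + y**2), which is simply [x, y]."""
    return list(point)


def gradient_descent(start, lr, steps):
    """Apply theta <- theta - lr * grad repeatedly and record every iterate."""
    trajectory = [list(start)]
    for _ in range(steps):
        current = trajectory[-1]
        grad = bowl_gradient(current)
        trajectory.append([c - lr * g for c, g in zip(current, grad)])
    return trajectory


start = [2.5, -2.0]
lr, steps = 0.2, 10

trajectory = gradient_descent(start, lr, steps)
final = trajectory[-1]

# Closed form for this bowl: every step multiplies each coordinate by (1 - lr).
predicted = [(1 - lr) ** steps * c for c in start]

print("iterated: ", final)
print("predicted:", predicted)
```

The two printed vectors agree up to floating-point rounding, confirming that on this bowl gradient descent is just repeated multiplication by `(1 - η)`.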

## 3. Python Implementation — The Math as Code

`days/day03/code/gradient_descent.py` encapsulates the quadratic bowl, its gradient, and a reusable runner.

```python
from __future__ import annotations

import sys
from pathlib import Path

import numpy as np


def find_repo_root(marker: str = "days") -> Path:
    """Walk upward from the working directory until a folder containing `marker` is found."""
    path = Path.cwd()
    while path != path.parent:
        if (path / marker).exists():
            return path
        path = path.parent
    raise RuntimeError("Run this notebook from inside the repository tree.")


# Make the repository importable so `days.day03...` resolves from any subfolder.
REPO_ROOT = find_repo_root()
if str(REPO_ROOT) not in sys.path:
    sys.path.append(str(REPO_ROOT))

from days.day03.code.gradient_descent import GradientDescentRunner, QuadraticBowl

# Ten descent steps on the bowl, starting from (2.5, -2.0) with learning rate 0.2.
runner = GradientDescentRunner(bowl=QuadraticBowl(), lr=0.2, steps=10)
path = runner.run([2.5, -2.0])
print(path)
```


The coordinates shrink toward `(0, 0)`, the minimum of the valley. A larger learning rate covers more ground per step but can overshoot the minimum and oscillate or even diverge; a smaller one is stable but converges slowly.
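The trade-off is easiest to see in one dimension. The sketch below assumes the loss `L(x) = 0.5 x²` (one axis of the example bowl from Section 2), where each update multiplies `x` by `(1 - η)`. The helper name `run_1d` and the particular learning rates are illustrative choices, and the thresholds apply to this unit-curvature bowl only; the repository's `QuadraticBowl` may be scaled differently.

```python
def run_1d(x0, lr, steps):
    """Gradient descent on L(x) = 0.5 * x**2, whose gradient is x."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * xs[-1])
    return xs


# 0 < lr < 1: steady approach; lr = 1: lands on the minimum in one step;
# 1 < lr < 2: overshoots and flips sign but still shrinks; lr > 2: grows.
for lr in (0.05, 0.5, 1.0, 1.5, 2.5):
    xs = run_1d(2.5, lr, 5)
    print(f"lr={lr}: " + ", ".join(f"{x:.3f}" for x in xs))
```

Reading the rows top to bottom shows the three regimes in turn: slow but steady, fast, and then sign-flipping iterates that eventually grow instead of shrinking.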

## 4. Visualization — Watching the Marble Roll Downhill

`days/day03/code/visualizations.py` generates:
1. Gradient descent trajectory on a contour plot.
2. Learning-rate comparison (slow vs. ideal vs. oscillatory).
3. Gradient field quiver plot (optional still image).

```python
from days.day03.code.visualizations import (
    anim_gradient_descent,
    anim_learning_rates,
    render_gradient_field,
)

RUN_ANIMATIONS = False

if RUN_ANIMATIONS:
    assets = [
        anim_gradient_descent(),
        anim_learning_rates(),
        render_gradient_field(),
    ]
    for asset in assets:
        print(f"Saved asset → {asset}")
else:
    print('Set RUN_ANIMATIONS = True to regenerate GIFs/PNGs in days/day03/outputs/.')
```


## 5. Deep Learning & Computer Vision Connections

| Concept | In Practice |
| --- | --- |
| Gradient descent | Core learning rule for neural nets (SGD, Adam build on it). |
| Learning rate | Governs stability vs. speed of convergence. |
| Loss surface | Maps every weight setting to a model error; loss-landscape plots serve as a diagnostic. |
| Gradient | Guides saliency maps and adversarial example crafting. |
| Local minima | Different solutions with similar performance; initialization matters. |

## 6. Mini Exercises

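1. Sweep `η` from `0.05` → `0.5` and beyond, and note where oscillation and divergence appear (on the example bowl from Section 2, the coordinates flip sign once `η > 1` and diverge once `η > 2`).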
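2. Modify `bowl_loss` to include stronger cross terms (e.g., `+ 0.8xy`) to twist the valley. On the example bowl `0.5 (x² + y²)`, a cross-term coefficient of magnitude 1 or more, such as `+ 2xy`, removes the minimum and turns the bowl into a saddle.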
3. Start from multiple initial points and plot their trajectories on the same contour.
4. Add random Gaussian noise to the gradient to mimic stochastic updates.
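As a possible starting point for exercise 4, here is a minimal sketch that adds Gaussian noise to the gradient of the example bowl `0.5 (x² + y²)`. It uses only the standard library and does not touch the repository code; the function name `noisy_descent`, the noise level, and the seed are assumptions chosen for illustration.

```python
import random


def noisy_descent(start, lr, steps, noise_std, seed=0):
    """Gradient descent on L(x, y) = 0.5 * (x**2 + y**2) with noisy gradients."""
    rng = random.Random(seed)  # seeded so repeated runs give the same path
    point = list(start)
    trajectory = [list(point)]
    for _ in range(steps):
        # The true gradient is [x, y]; add independent Gaussian noise per coordinate.
        grad = [c + rng.gauss(0.0, noise_std) for c in point]
        point = [c - lr * g for c, g in zip(point, grad)]
        trajectory.append(list(point))
    return trajectory


clean = noisy_descent([2.5, -2.0], lr=0.2, steps=50, noise_std=0.0)
noisy = noisy_descent([2.5, -2.0], lr=0.2, steps=50, noise_std=0.5)

print("clean end point:", clean[-1])
print("noisy end point:", noisy[-1])
```

With `noise_std=0.0` the run reduces to ordinary gradient descent and settles at the origin; with noise, the marble keeps jittering around the minimum instead of coming to rest, which is the behaviour stochastic updates are meant to mimic.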


## 7. Key Takeaways

| Concept | Meaning |
| --- | --- |
| Gradient | Local direction of steepest ascent (flip sign to descend). |
| Descent step | `-η ∇L` — the nudge that lowers loss. |
| Learning rate | Trade-off between speed and stability. |
| Loss surface | High-dimensional landscape encoding model performance. |
| Convergence | Landing in a flat, low-loss region ⇒ learned parameters. |

> Optimization is not magic—it’s just a marble following gravity in a high-dimensional valley.