# Day 8 — "Hessians, Curvature & Second-Order Optimization"

Gradients tell you which way is downhill; Hessians describe how that downhill path bends. Second-order methods use this curvature to rescale each step, which makes ill-conditioned loss landscapes easier to navigate.

## 1. Core Intuition

- Gradient = slope; Hessian = how the slope changes (curvature).
- Knowing curvature lets you take smaller steps in high-curvature directions and larger steps in flat ones.
- Second-order methods use this curvature information to avoid the zigzag that gradient descent shows in narrow valleys.
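
The bullets above can be made concrete with a second-order Taylor expansion. This is a sketch for intuition: the symbols $g$, $\Delta$, $\lambda_i$ and $v_i$ are introduced here for illustration, and the Newton formula assumes the Hessian $H$ is positive definite.

$$
L(\theta + \Delta) \approx L(\theta) + g^\top \Delta + \tfrac{1}{2}\,\Delta^\top H \Delta,
\qquad g = \nabla L(\theta),\quad H = \nabla^2 L(\theta)
$$

Setting the derivative of the right-hand side with respect to $\Delta$ to zero gives $g + H\Delta = 0$, so

$$
\Delta_{\text{Newton}} = -H^{-1} g = -\sum_i \frac{v_i^\top g}{\lambda_i}\, v_i,
\qquad H v_i = \lambda_i v_i
$$

Each gradient component is divided by the curvature $\lambda_i$ along its eigenvector $v_i$: large curvature shrinks the step, small curvature stretches it.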


## 2. Curvature Concepts

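| Curvature type | Hessian eigenvalues | Behavior |
| --- | --- | --- |
| Positive | all > 0 (positive definite) | Bowl: a critical point here is a local minimum (the global one if the loss is convex everywhere). |
| Negative | all < 0 (negative definite) | Dome: a critical point here is a local maximum, i.e. you're on a peak. |
| Mixed | both signs (indefinite) | Saddle: up in one direction, down in another. |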
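
The three cases in the table correspond to the signs of the Hessian's eigenvalues. The sketch below shows one way to check this numerically; `classify_curvature` and its tolerance `tol` are hypothetical names chosen for illustration, not part of the Day 8 code.

```python
import numpy as np


def classify_curvature(H, tol=1e-12):
    """Label a symmetric Hessian by the signs of its eigenvalues."""
    eigvals = np.linalg.eigvalsh(H)  # real eigenvalues, ascending order
    if np.all(eigvals > tol):
        return "positive definite (bowl)"
    if np.all(eigvals < -tol):
        return "negative definite (peak)"
    if eigvals[0] < -tol and eigvals[-1] > tol:
        return "indefinite (saddle)"
    return "semidefinite (at least one flat direction)"


print(classify_curvature(np.array([[2.0, 0.0], [0.0, 1.0]])))
print(classify_curvature(np.array([[-2.0, 0.0], [0.0, -1.0]])))
print(classify_curvature(np.array([[2.0, 0.0], [0.0, -1.0]])))
```

The fourth return value covers the borderline case the table leaves out: an eigenvalue close to zero means the second-order test alone cannot decide.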


## 3. Hessian Matrix

- `H = ∇²L(θ)` collects all second partial derivatives.
- Eigenvectors of `H` give the principal curvature directions; the matching eigenvalues give the sign and strength of the curvature along each one.
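
As a small illustration, the eigendecomposition can be computed with NumPy. The matrix below is the Hessian printed in Section 4; the variable names are chosen here for illustration.

```python
import numpy as np

# Hessian values copied from the Section 4 output.
H = np.array([[6.0, 0.8], [0.8, 2.0]])

# eigh is intended for symmetric matrices: eigenvalues come back in
# ascending order, eigenvectors as the columns of eigvecs.
eigvals, eigvecs = np.linalg.eigh(H)

for lam, v in zip(eigvals, eigvecs.T):
    print(f"curvature {lam:.3f} along direction {np.round(v, 3)}")

# Ratio of largest to smallest curvature: how stretched the bowl is.
print("condition number:", eigvals[-1] / eigvals[0])
```

Both eigenvalues are positive (roughly 1.85 and 6.15), so this surface is a bowl that is about three times more curved in one direction than in the other.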


## 4. Python Implementation — Hessian & Newton Steps

`days/day08/code/hessian_demo.py` computes loss, gradient, Hessian, and Newton updates.

In [1]:
from __future__ import annotations

import sys
from pathlib import Path
import numpy as np


def find_repo_root(marker: str = "days") -> Path:
    path = Path.cwd()
    while path != path.parent:
        if (path / marker).exists():
            return path
        path = path.parent
    raise RuntimeError("Run this notebook from inside the repository tree.")

REPO_ROOT = find_repo_root()
if str(REPO_ROOT) not in sys.path:
    sys.path.append(str(REPO_ROOT))

from days.day08.code.hessian_demo import grad, hessian, loss, newton_step

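w = np.array([2.5, -2.0])
print('Loss at w:', loss(w))
print('Gradient at w:', grad(w))
# The Hessian of a quadratic loss is constant, so no evaluation point is passed here.
print('Hessian:\n', hessian(None))
# Judging by the output below, newton_step returns the updated point, not the displacement.
print('Newton step:', newton_step(w))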


Loss at w: 18.75
Gradient at w: [13.4 -2. ]
Hessian:
 [[6.  0.8]
 [0.8 2. ]]
Newton step: [ 0.00000000e+00 -2.22044605e-16]
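
The printed values are consistent with the quadratic `L(w) = 0.5 * wᵀ H w`, whose minimum sits at the origin. The self-contained sketch below reproduces them under that assumption; it is not the contents of `hessian_demo.py`, and the `quad_*` names are hypothetical, chosen so they do not shadow the imported functions.

```python
import numpy as np

# Assumed form of the loss: L(w) = 0.5 * w^T H w, with H taken from the output above.
H = np.array([[6.0, 0.8], [0.8, 2.0]])


def quad_loss(w):
    return 0.5 * w @ H @ w


def quad_grad(w):
    return H @ w


def quad_newton_step(w):
    # Solve H d = g instead of forming the inverse explicitly.
    return w - np.linalg.solve(H, quad_grad(w))


w = np.array([2.5, -2.0])
print("Loss:", quad_loss(w))
print("Gradient:", quad_grad(w))
print("After one Newton step:", quad_newton_step(w))
```

On a quadratic with a positive definite Hessian, a single Newton step lands on the minimizer, which explains why the last printed vector is zero up to floating-point rounding.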


## 5. Visualization — Newton vs Gradient Descent

`days/day08/code/visualizations.py` renders the surface and an animation comparing GD to Newton.

In [2]:
from days.day08.code.visualizations import render_surface, anim_newton_vs_gd

RUN_ANIMATIONS = False

if RUN_ANIMATIONS:
    surf = render_surface('Quadratic Loss Surface', '00_loss_surface.png')
    gif = anim_newton_vs_gd()
    print('Saved assets:', surf, gif)
else:
    print('Set RUN_ANIMATIONS = True to regenerate Day 8 figures in days/day08/outputs/.')


Set RUN_ANIMATIONS = True to regenerate Day 8 figures in days/day08/outputs/.
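
Without rendering any figures, the same comparison can be made by counting iterations. This is a minimal sketch, assuming the quadratic `L(w) = 0.5 * wᵀ H w` with the Hessian printed in Section 4; the learning rate, tolerance and the helper `run_gd` are illustrative choices, not values taken from `visualizations.py`.

```python
import numpy as np

H = np.array([[6.0, 0.8], [0.8, 2.0]])  # assumed Hessian, from the Section 4 output
w0 = np.array([2.5, -2.0])


def run_gd(w, lr, tol=1e-8, max_iter=10_000):
    """Gradient descent on L(w) = 0.5 * w^T H w; returns all iterates."""
    path = [w.copy()]
    while np.linalg.norm(H @ w) > tol and len(path) <= max_iter:
        w = w - lr * (H @ w)
        path.append(w.copy())
    return np.array(path)


gd_path = run_gd(w0, lr=0.1)

# One Newton update: subtract the solution of H d = gradient.
newton_path = np.array([w0, w0 - np.linalg.solve(H, H @ w0)])

print("GD iterations:", len(gd_path) - 1)
print("Newton iterations:", len(newton_path) - 1)
print("Final GD point:", gd_path[-1])
print("Final Newton point:", newton_path[-1])
```

Gradient descent needs many small steps because its single learning rate has to stay stable in the most curved direction, while Newton's update rescales each direction separately.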


## 7. Why Newton’s Method Is Rare in DL

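- For a model with `n` parameters the Hessian has `n × n` entries (millions × millions even for modest networks).
- Storing it takes O(n²) memory, and inverting it takes O(n³) time.
- Mini-batch sampling makes Hessian estimates noisy.

Yet many modern optimizers borrow second-order ideas: Adam and RMSProp rescale each coordinate using running averages of squared gradients, while natural gradient and K-FAC precondition with approximations to the Fisher information matrix.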

## 8. Mini Exercises

1. Reduce the learning rate to see GD converge more gracefully on curved valleys.
2. Inject noise into gradients and observe how Newton jumps become unstable.
3. Approximate Hessian eigenvalues numerically using finite differences.
4. Compare Adam’s steps to Newton’s step on this quadratic, treating Adam’s per-coordinate scaling as a diagonal approximation to the curvature.
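
As a possible starting point for Exercise 3, the sketch below estimates a Hessian from central differences of a gradient function and then takes its eigenvalues. `finite_difference_hessian`, `eps` and the stand-in gradient `quad_grad` are hypothetical; the stand-in assumes the quadratic `L(w) = 0.5 * wᵀ H w` with the Hessian printed in Section 4.

```python
import numpy as np


def finite_difference_hessian(grad_fn, w, eps=1e-5):
    """Central-difference Hessian estimate built from gradient evaluations."""
    n = w.size
    H_est = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        # Column i approximates the derivative of the gradient along coordinate i.
        H_est[:, i] = (grad_fn(w + e) - grad_fn(w - e)) / (2 * eps)
    return 0.5 * (H_est + H_est.T)  # symmetrize to damp rounding noise


H_true = np.array([[6.0, 0.8], [0.8, 2.0]])  # assumed, from the Section 4 output


def quad_grad(w):
    return H_true @ w


H_est = finite_difference_hessian(quad_grad, np.array([2.5, -2.0]))
print("Estimated Hessian:\n", H_est)
print("Estimated eigenvalues:", np.linalg.eigvalsh(H_est))
```

If the `grad` function imported in Section 4 accepts NumPy arrays as shown there, it can be passed in place of `quad_grad` to check the estimate against the printed Hessian.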


## 9. Key Takeaways

| Concept | Meaning |
| --- | --- |
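| Hessian | Matrix of all second partial derivatives; describes curvature. |
| Eigenvalues | Sign and magnitude of curvature along each eigenvector direction. |
| Newton step | Rescales the gradient by inverse curvature; no zigzag on quadratics. |
| Curvature-aware optimizers | Adam/RMSProp rescale each coordinate cheaply using squared-gradient averages; natural gradient and K-FAC approximate a full curvature matrix. |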
| Practicality | Full Hessians are infeasible for deep nets; approximations rule. |