# Day 6 — "Convexity, Curvature & Loss Landscapes": Seeing the Shape of Learning

Understanding optimization means understanding the terrain gradients travel across. Bowls are easy; twisted valleys and wavy surfaces explain why deep nets need momentum, learning-rate tuning, and careful initialization.

## 1. Core Intuition

- Convex landscapes (smooth bowls) have no traps: every local minimum is a global minimum, and a strictly convex bowl has exactly one.
- Non-convex landscapes contain ridges, saddles, and plateaus typical in deep nets.
- Optimization is hiking with gradients; landscape shape dictates speed and stability.
- Better terrain intuition explains tricks like momentum, weight decay, and overparameterization.
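
The bowl-versus-rugged distinction above can be probed numerically. The sketch below samples random chords and counts how often the surface rises above the chord, which can never happen for a convex function. The two functions `bowl` and `wavy` are hypothetical stand-ins defined here for illustration; they are not the repo's `convex_bowl` or `waves`.

```python
import math
import random


def bowl(x, y):
    # Hypothetical convex bowl (illustrative, not the repo's convex_bowl).
    return 0.5 * (x * x + y * y)


def wavy(x, y):
    # Hypothetical non-convex surface (illustrative, not the repo's waves).
    return math.sin(3 * x) * math.cos(3 * y)


def count_chord_violations(f, trials=2000, seed=0, tol=1e-12):
    """Count sampled chords where the surface pokes above the chord."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        p = (rng.uniform(-2, 2), rng.uniform(-2, 2))
        q = (rng.uniform(-2, 2), rng.uniform(-2, 2))
        t = rng.random()
        # Point on the segment between p and q, and the chord height there.
        mid = (t * p[0] + (1 - t) * q[0], t * p[1] + (1 - t) * q[1])
        chord = t * f(*p) + (1 - t) * f(*q)
        if f(*mid) > chord + tol:
            violations += 1
    return violations


bowl_violations = count_chord_violations(bowl)
wavy_violations = count_chord_violations(wavy)
print("bowl violations:", bowl_violations)
print("wavy violations:", wavy_violations)
```

A zero count does not prove convexity, since only finitely many chords are sampled, but a single violation is enough to rule it out.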

## 2. Concepts Explained Intuitively

| Concept | Intuition | Why it matters |
| --- | --- | --- |
| Convexity | One bowl, one bottom; the chord between any two points on the surface never dips below it. | No local traps: gradient descent with a small enough step size reaches the global minimum. |
| Curvature | How sharp or flat the bowl is; captured by Hessian eigenvalues. | Determines learning-rate choices and zigzag behavior. |
| Non-convexity | Multiple valleys, saddles, plateaus. | Explains slow training & need for momentum/initialization. |
| Sharp vs flat minima | Narrow vs wide basins. | Flat minima correlate with better generalization. |
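
To make the curvature row concrete, here is a minimal sketch that computes the Hessian eigenvalues of a quadratic bowl and checks the classic step-size limit `2 / lambda_max` for gradient descent on a quadratic. The Hessian entries and the helper names are assumptions chosen for illustration, and the code uses only the standard library.

```python
import math


def sym2x2_eigenvalues(h11, h12, h22):
    """Closed-form eigenvalues of the symmetric matrix [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    return mean - radius, mean + radius


def gd_distance_from_minimum(h11, h12, h22, lr, steps=200, start=(1.0, 1.0)):
    """Run GD on f(w) = 0.5 * w^T H w and return the final distance to the minimum at 0."""
    x, y = start
    for _ in range(steps):
        gx = h11 * x + h12 * y  # the gradient of the quadratic is H @ w
        gy = h12 * x + h22 * y
        x, y = x - lr * gx, y - lr * gy
    return math.hypot(x, y)


# Hypothetical Hessian of an elongated bowl (illustrative values only).
H11, H12, H22 = 10.0, 2.0, 1.0
lam_min, lam_max = sym2x2_eigenvalues(H11, H12, H22)
lr_limit = 2.0 / lam_max  # GD on a quadratic is stable only below this step size

dist_stable = gd_distance_from_minimum(H11, H12, H22, lr=0.9 * lr_limit)
dist_unstable = gd_distance_from_minimum(H11, H12, H22, lr=1.1 * lr_limit)

print(f"eigenvalues: {lam_min:.3f}, {lam_max:.3f}")
print(f"condition number: {lam_max / lam_min:.1f}")
print(f"lr limit 2/lambda_max: {lr_limit:.4f}")
print(f"distance after 200 steps, lr below limit: {dist_stable:.2e}")
print(f"distance after 200 steps, lr above limit: {dist_unstable:.2e}")
```

The sharpest direction sets the largest usable learning rate, while the flattest direction sets how slowly the remaining error decays. A large ratio between the two (the condition number) is what produces the zigzag behavior.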

## 3. Python Implementation — Surfaces & Paths

`days/day06/code/landscapes.py` defines convex bowls, banana valleys, wavy surfaces, and gradient descent paths.

In [1]:
```python
from __future__ import annotations

import sys
from pathlib import Path


def find_repo_root(marker: str = "days") -> Path:
    """Walk upward from the current directory until a folder containing `marker` is found."""
    path = Path.cwd()
    while path != path.parent:
        if (path / marker).exists():
            return path
        path = path.parent
    raise RuntimeError("Run this notebook from inside the repository tree.")


# Make the `days` package importable regardless of where the notebook was launched.
REPO_ROOT = find_repo_root()
if str(REPO_ROOT) not in sys.path:
    sys.path.append(str(REPO_ROOT))

from days.day06.code.landscapes import convex_bowl, banana, waves, banana_grad, gd_path, GDConfig

# Evaluate each surface at the same point for a quick sanity check.
print('Convex bowl at (1,1) =', convex_bowl(1.0, 1.0))
print('Banana valley at (1,1) =', banana(1.0, 1.0))
print('Waves at (1,1) =', waves(1.0, 1.0))

# Run gradient descent on the banana valley and inspect the last three iterates.
config = GDConfig(lr=1e-3, steps=100)
path = gd_path([-1.5, 1.5], banana_grad, config)
print('GD path tail on banana valley:', path[-3:])
```


```text
Convex bowl at (1,1) = 1.0
Banana valley at (1,1) = 1.0
Waves at (1,1) = 0.6546487134128409
GD path tail on banana valley: [[-1.19588247  1.5148514 ]
 [-1.19551692  1.51400423]
 [-1.19515214  1.5131568 ]]
```
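
Because a descent path is only as trustworthy as the gradient feeding it, a finite-difference check is a cheap sanity test. The sketch below uses the textbook Rosenbrock function with `a=1, b=100`, which is an assumption: the repo's `banana` and `banana_grad` may use a different parameterization, and the names here are local stand-ins.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    # Textbook Rosenbrock function (assumed form, not necessarily the repo's banana).
    return (a - x) ** 2 + b * (y - x * x) ** 2


def rosenbrock_grad(x, y, a=1.0, b=100.0):
    # Analytic partial derivatives of the function above.
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x * x)
    dy = 2.0 * b * (y - x * x)
    return dx, dy


def numeric_grad(f, x, y, h=1e-5):
    # Central differences: second-order accurate in the step h.
    dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dx, dy


point = (-1.5, 1.5)  # same starting point as the GD run above
analytic = rosenbrock_grad(*point)
numeric = numeric_grad(rosenbrock, *point)
max_abs_error = max(abs(a_ - n_) for a_, n_ in zip(analytic, numeric))

print("analytic gradient:", analytic)
print("numeric gradient: ", numeric)
print("max abs error:    ", max_abs_error)
```

The same check can be pointed at the imported `banana` and `banana_grad`, provided they accept scalar `x, y` arguments as the calls in the cell above suggest for `banana`.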


## 4. Visualization — Surfaces & Banana Valley GD

`days/day06/code/visualizations.py` renders the surfaces and animates gradient descent on the banana valley.

In [2]:
```python
from days.day06.code.visualizations import (
    render_surface,
    anim_banana_descent,
)

# Flip to True to regenerate the surface figures and the descent GIF.
RUN_ANIMATIONS = False

if RUN_ANIMATIONS:
    convex = render_surface(convex_bowl, 'Convex Bowl', '01_convex_bowl.png')
    banana_img = render_surface(banana, 'Rosenbrock Banana Valley', '02_banana_valley.png', cmap='viridis', azim=20)
    waves_img = render_surface(waves, 'Non-Convex Wavy Landscape', '03_wavy_surface.png', cmap='plasma', azim=30)
    gd_gif = anim_banana_descent()
    for asset in [convex, banana_img, waves_img, gd_gif]:
        print('Saved asset →', asset)
else:
    print('Set RUN_ANIMATIONS = True to regenerate Day 6 figures in days/day06/outputs/.')
```


```text
Set RUN_ANIMATIONS = True to regenerate Day 6 figures in days/day06/outputs/.
```


## 5. Deep Learning & CV Connections

| Concept | Why it matters in DL/CV |
| --- | --- |
| Convexity | Rare in deep nets but foundational; convex analysis provides the theoretical guarantees. |
| Curvature/Hessian | Governs stability, informs learning-rate schedules, and motivates second-order/adaptive methods. |
| Sharp vs flat minima | Sharp minima tend to overfit; flat minima tend to generalize better (batch size and weight decay influence which kind training finds). |
| Saddle points | Common in large, high-dimensional networks; the gradient shrinks near them, so plain GD can stall without momentum. |
| Overparameterization | Larger models often produce smoother landscapes, making optimization easier. |
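
The saddle-point row can be illustrated with a toy surface. This sketch runs plain gradient descent and heavy-ball momentum on `f(x, y) = x^2 - y^2`, starting almost exactly on the ridge, and counts the steps needed to leave the saddle's neighborhood. The function, the hyperparameters, and the escape threshold are all assumptions picked for illustration.

```python
def saddle_grad(x, y):
    # Gradient of f(x, y) = x^2 - y^2, which has a saddle at the origin.
    return 2.0 * x, -2.0 * y


def steps_to_escape(lr=0.05, beta=0.0, start=(1.0, 1e-8), max_steps=1000):
    """Count updates until |y| > 1; beta=0 is plain GD, beta>0 is heavy-ball momentum."""
    x, y = start
    vx, vy = 0.0, 0.0
    for step in range(1, max_steps + 1):
        gx, gy = saddle_grad(x, y)
        vx = beta * vx - lr * gx  # velocity accumulates past gradients
        vy = beta * vy - lr * gy
        x, y = x + vx, y + vy
        if abs(y) > 1.0:
            return step
    return max_steps


plain_steps = steps_to_escape(beta=0.0)
momentum_steps = steps_to_escape(beta=0.9)
print("plain GD steps to escape:   ", plain_steps)
print("momentum GD steps to escape:", momentum_steps)
```

Plain GD spends most of its steps near the origin, where the gradient is tiny. Momentum compounds the small push along the escape direction and leaves sooner. The toy function is unbounded below, so "escape" here only means leaving the flat region, not converging.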

## 6. Mini Exercises

1. Modify the wavy surface to add multiple frequencies and render again.
2. Vary the learning rate (1e-4 vs 1e-2) in `anim_banana_descent` and observe overshoot vs progress.
3. Inject momentum from Day 4 into the banana valley GD path and compare speed.
4. Plot the gradient field as a quiver plot to visualize saddle points: arrows shrink toward zero length near a saddle.
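
As a possible starting point for exercise 2, the sketch below sweeps the learning rate on a textbook Rosenbrock function and reports whether the run diverged. It is a standalone stand-in, not the repo's `gd_path` or `anim_banana_descent`; the function form (`a=1, b=100`), the divergence threshold, and the helper names are assumptions.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    # Textbook Rosenbrock function (assumed form, not necessarily the repo's banana).
    return (a - x) * (a - x) + b * (y - x * x) * (y - x * x)


def rosenbrock_grad(x, y, a=1.0, b=100.0):
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x * x)
    dy = 2.0 * b * (y - x * x)
    return dx, dy


def run_gd(lr, steps=100, start=(-1.5, 1.5), blowup=1e3):
    """Plain GD; stops early and flags divergence once an iterate leaves [-blowup, blowup]."""
    x, y = start
    for step in range(1, steps + 1):
        gx, gy = rosenbrock_grad(x, y)
        x, y = x - lr * gx, y - lr * gy
        if abs(x) > blowup or abs(y) > blowup:
            return {"lr": lr, "diverged": True, "steps": step, "loss": float("inf")}
    return {"lr": lr, "diverged": False, "steps": steps, "loss": rosenbrock(x, y)}


start_loss = rosenbrock(-1.5, 1.5)
results = {lr: run_gd(lr) for lr in (1e-4, 1e-3, 1e-2)}

print(f"start loss: {start_loss:.2f}")
for lr, result in results.items():
    status = "diverged" if result["diverged"] else f"loss {result['loss']:.3f}"
    print(f"lr={lr:g}: {status} after {result['steps']} steps")
```

Under these assumptions the largest step size overshoots the narrow valley within a few updates, while the smaller ones stay stable but make slow progress along the valley floor.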


## 7. Key Takeaways

| Concept | Meaning |
| --- | --- |
| Convexity | Single global minimum, easy optimization. |
| Curvature | Direction-dependent steepness shaping gradient paths. |
| Hessian eigenvalues | Encode landscape shape and difficulty. |
| Saddle points | Zero-gradient points that curve up in some directions and down in others; GD slows dramatically near them. |
| Overparameterization | Often smooths loss surfaces, helping large nets converge. |

> To understand learning, study the land your gradients traverse.