# 🧪 Workshop 2: From Gradient Structure to Optimisation Dynamics

This workshop builds directly on Workshop 1 and marks the transition from gradient interpretation to explicit optimisation dynamics, opening Part 2 of the series.

Where Workshop 1 focused on what gradients are and how they are structured, this workshop focuses on what gradients do when they are applied repeatedly. Gradients are no longer treated as static sensitivity maps, but as drivers of parameter evolution.

The central shift in perspective is:
- from *“what does this gradient look like?”*
- to *“what happens when I follow it?”*

---

**Conceptual emphasis**

The workshop develops intuition for:
- how a single gradient descent step turns sensitivity into motion,
- how repeated local updates accumulate into global behaviour,
- how objective structure influences optimisation trajectories,
- and how gradient geometry affects stability, speed, and convergence.

Rather than introducing full training pipelines, the focus remains on controlled, interpretable systems where optimisation dynamics can be reasoned about directly.
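
As a minimal sketch of such a controlled system, the snippet below applies plain gradient steps to a two-parameter quadratic whose curvature differs by a factor of 25 between its axes. The objective, the starting point, and the `step_size` value are all illustrative assumptions rather than part of the workshop material; the point is only to watch repeated local updates accumulate into an anisotropic trajectory.

```python
import torch

# Hypothetical toy objective: f(theta) = 0.5 * (theta_0^2 + 25 * theta_1^2).
# The curvature differs by a factor of 25 between the two axes.
curvature = torch.tensor([1.0, 25.0])

def objective(theta):
    return 0.5 * (curvature * theta ** 2).sum()

theta = torch.tensor([2.0, 2.0], requires_grad=True)
step_size = 0.03  # assumed value, small enough for both axes to contract
trajectory = [theta.detach().clone()]

for _ in range(50):
    (grad,) = torch.autograd.grad(objective(theta), theta)
    with torch.no_grad():
        theta -= step_size * grad  # one step: sensitivity becomes motion
    trajectory.append(theta.detach().clone())

trajectory = torch.stack(trajectory)  # shape (51, 2)
print(trajectory[[0, 1, 5, 50]])
```

With these assumed values, each step scales the first coordinate by 0.97 and the second by 0.25, so the steep direction collapses within a few steps while the shallow direction is still far from zero after 50 steps.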

--- 

**Key ideas explored include**:
- gradient descent as repeated application of vector–Jacobian products,
- implicit objective functions defined by upstream weighting,
- the relationship between gradient magnitude, direction, and parameter motion,
- conditioning and anisotropy in gradient-driven updates,
- how symmetry, nonlinearity, and curvature shape optimisation paths,
- and visualising optimisation as movement through parameter space.
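
The first two ideas can be checked numerically. The sketch below uses a hypothetical map `y = tanh(A @ x)` with a random fixed matrix `A` and an arbitrary upstream vector `v` (both are assumptions chosen for illustration) and compares three routes to the same quantity: the vector-Jacobian product, the gradient of the implicit scalar `sum(v * y)`, and an explicit `v @ J`.

```python
import torch

torch.manual_seed(0)

A = torch.randn(4, 3)                     # hypothetical fixed linear map
x = torch.randn(3, requires_grad=True)
y = torch.tanh(A @ x)                     # non-scalar output, shape (4,)
v = torch.randn(4)                        # arbitrary upstream weighting

# Route 1: vector-Jacobian product with upstream gradient v.
(vjp,) = torch.autograd.grad(y, x, grad_outputs=v, retain_graph=True)

# Route 2: ordinary gradient of the implicit scalar objective sum_i v_i * y_i.
(grad_of_scalar,) = torch.autograd.grad((v * y).sum(), x)

# Route 3: build the full Jacobian (shape (4, 3)), then form v^T J explicitly.
J = torch.autograd.functional.jacobian(lambda z: torch.tanh(A @ z), x.detach())
explicit = v @ J

print(torch.allclose(vjp, grad_of_scalar, atol=1e-6))
print(torch.allclose(vjp, explicit, atol=1e-6))
```

Routes 1 and 2 never materialise the Jacobian, which is why choosing an upstream weighting is a cheap way to define an objective.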

--- 

**How this workshop fits in the series**

This workshop serves as the conceptual bridge between:
- gradient flow and sensitivity analysis (Workshop 1),
- and more advanced optimisation topics such as learning rates, curvature, and second-order effects (later in Part 2).

By the end of this workshop, gradient descent is no longer a formula, but a geometric process whose behaviour can be anticipated from gradient structure alone.

---

**What this workshop deliberately does not cover**
- Neural network modules (`nn.Module`)
- Optimisers such as Adam, RMSProp, etc.
- Datasets, batching, or training loops

Those elements are introduced only after optimisation dynamics are conceptually understood.

---

**Recommended prerequisites**
- Completion of Tutorials 1–4
- Workshop 1: From Gradient Flow to Optimisation Intuition
- Comfort with gradients, Jacobians, and basic optimisation ideas
- Familiarity with linear algebra and nonlinear mappings

---

**Author: Angze Li**

**Last updated: 2026-02-19**

**Version: v1.0**

## 🧩 Problem: Designing an Objective via Upstream Gradients

> Optimisation is not only about how to minimise a loss;
> it is also about what objective you choose.

In this problem, you will implicitly define an objective by choosing an upstream gradient.

Consider:
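```python
import torch

# Five points in 3-D; gradients will be accumulated in X.grad
X = torch.randn(5, 3, requires_grad=True)

# Pairwise inner products, squashed elementwise by tanh
Y = torch.tanh(X @ X.T)
```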
Here:
- `Y` is a **5×5 tensor** measuring pairwise interactions,
- the output is *symmetric and non-scalar*.
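
These structural claims can be verified directly. The check below is a small sketch; the fixed seed is an assumption added only for reproducibility.

```python
import torch

torch.manual_seed(0)
X = torch.randn(5, 3, requires_grad=True)
Y = torch.tanh(X @ X.T)

print(Y.shape)                             # torch.Size([5, 5])
print(torch.allclose(Y, Y.T, atol=1e-6))   # symmetric up to floating-point error

# Diagonal entries are tanh of the squared norm of each row of X,
# so they are non-negative regardless of the sign of the entries of X.
row_sq_norms = (X ** 2).sum(dim=1)
print(torch.allclose(torch.diagonal(Y), torch.tanh(row_sq_norms), atol=1e-6))
```

Because `Y` is not a scalar, calling `Y.backward()` with no argument raises an error; an upstream gradient of the same shape has to be supplied.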

---

### 🔧 Task
1. Construct an upstream gradient matrix `V` (with the same shape as `Y`) such that:
    - diagonal entries of Y are emphasised,
    - off-diagonal entries are penalised.
2. Call:
```python
Y.backward(V)
```
3. Inspect `X.grad`.

---

### 🧠 Questions to think about
- What implicit scalar objective are you optimising?
- How does changing the diagonal/off-diagonal weighting affect `X.grad`?
- Which entries of `X` are encouraged to grow or shrink?
- Can you interpret this as encouraging **self-similarity** over **cross-similarity**?

---

### 💡 Hint (optional)

> You are not optimising `Y` directly.
> You are optimising a **weighted trace-like** functional of `Y`.
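
One way to make the hint concrete, assuming `V` is a constant that does not depend on `X`: calling `Y.backward(V)` fills `X.grad` with the gradient of the scalar

$$
L(X) = \sum_{i,j} V_{ij}\, Y_{ij}, \qquad Y = \tanh\!\left(X X^{\top}\right).
$$

As an illustration only, take $V = a\,I - b\,(\mathbf{1}\mathbf{1}^{\top} - I)$ with $a, b > 0$ (this notation and choice are assumptions, not the model answer). Then

$$
L(X) = a\,\mathrm{tr}(Y) \;-\; b \sum_{i \neq j} Y_{ij},
$$

and, writing $G = V \odot (1 - Y \odot Y)$ for the upstream gradient after it passes back through the elementwise $\tanh$,

$$
\frac{\partial L}{\partial X} = \left(G + G^{\top}\right) X .
$$

Under this sign convention $L$ is read as a quantity to increase, so moving along `X.grad` raises the diagonal entries of `Y` and lowers the off-diagonal ones; flipping the sign of `V` gives the equivalent descent formulation.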

---

### 🎯 Why this problem matters (Bridge to Part 2)

This problem quietly introduces:
- custom objective design,
- structure-aware optimisation,
- gradients as *design tools*, not just training signals.

Without using:
- optimisers,
- learning rates,
- training loops,

you have already answered:

> “If I *were* to optimise this system, in what direction would the parameters move?”

That is exactly the mindset needed for Part 2.

## Solution

**After you have finished this problem, please refer to the full version of this notebook for the model answer and the trailer to Part 2.**