# Goal of this Lecture Note

These notes investigate how the *learning rate*, and in particular its magnitude, affects the behavior of gradient descent, starting with the simplest possible setting.

---

## Setup

Consider the general formulation of gradient descent.  
Let $\beta \in \mathbb{R}^d$ be the parameters we want to estimate, and let $\mathcal{L}(\beta)$ be the objective (e.g., a loss function) we want to minimize.  
Gradient descent updates the parameters according to

$$
\hat{\beta}_{t+1} = \hat{\beta}_t - \lambda \nabla_{\beta} \mathcal{L}(\hat{\beta}_t),
$$

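where $\hat{\beta}_t$ denotes the estimate after $t$ updates, starting from an initial guess $\hat{\beta}_0$, and $\lambda > 0$ is the **learning rate**.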
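The update rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the names `gradient_descent`, `grad`, `beta0`, `lam`, and `n_steps` are assumptions made here, and the example objective $\mathcal{L}(\beta) = \|\beta\|^2$ is a placeholder chosen only because its gradient, $2\beta$, is easy to write down.

```python
from typing import Callable, List

Vector = List[float]


def gradient_descent(
    grad: Callable[[Vector], Vector],
    beta0: Vector,
    lam: float,
    n_steps: int,
) -> List[Vector]:
    """Iterate beta_{t+1} = beta_t - lam * grad(beta_t); return all iterates."""
    beta = list(beta0)
    path = [beta]
    for _ in range(n_steps):
        g = grad(beta)
        # Move against the gradient, scaled by the learning rate.
        beta = [b_j - lam * g_j for b_j, g_j in zip(beta, g)]
        path.append(beta)
    return path


def example_grad(beta: Vector) -> Vector:
    """Gradient of the placeholder objective L(beta) = sum_j beta_j^2."""
    return [2.0 * b_j for b_j in beta]


# With lam = 0.25, each step halves every coordinate of beta.
path = gradient_descent(example_grad, [1.0, -2.0], lam=0.25, n_steps=3)
print(path[-1])
```

Returning the whole path instead of only the final iterate makes it easy to inspect how the choice of `lam` shapes the trajectory.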

---

## Why the Magnitude of the Learning Rate Matters

The learning rate controls the fundamental trade-off between **speed** and **stability**.

The negative gradient gives the *direction* of steepest descent, while the learning rate scales the *distance* traveled along that direction: each step has length $\lambda \, \|\nabla_{\beta} \mathcal{L}(\hat{\beta}_t)\|$.

- If **$\lambda$** is *small*, each update is cautious. Convergence is stable but slow.
- If **$\lambda$** is *large but still below the stability threshold*, updates are aggressive and convergence is fast.
- If **$\lambda$** is *too large*, the algorithm overshoots the minimum, oscillates, and can eventually diverge.
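The three regimes listed above can be observed on a toy one-dimensional objective. The sketch below assumes $\mathcal{L}(\beta) = \beta^2$ (a stand-in, not the regression loss studied in these notes), whose gradient is $2\beta$. Each update then multiplies the iterate by $1 - 2\lambda$, so the iteration contracts exactly when $0 < \lambda < 1$. The function name `run_gd_1d` and the three values of $\lambda$ are illustrative choices.

```python
def run_gd_1d(lam: float, beta0: float = 1.0, n_steps: int = 20) -> float:
    """Run gradient descent on L(beta) = beta^2 and return the final iterate."""
    beta = beta0
    for _ in range(n_steps):
        beta = beta - lam * 2.0 * beta  # gradient of beta^2 is 2 * beta
    return beta


# Small, large-but-stable, and too-large learning rates for this toy problem.
results = {lam: run_gd_1d(lam) for lam in (0.05, 0.45, 1.05)}

for lam, beta in results.items():
    print(f"lambda = {lam}: |beta_20| = {abs(beta):.3e}")
```

For this objective, the small rate shrinks the iterate slowly, the rate near $0.5$ shrinks it very quickly, and the rate above $1$ makes the iterate flip sign and grow in magnitude at every step.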

To study these behaviors precisely and transparently, we now turn to a simple model where everything can be computed explicitly: **linear regression with an intercept and slope**.

---

## A Simple Test Case: Linear Regression

We consider the parameter vector

$$
\beta =
\begin{pmatrix}
\beta^{(0)} \\
\beta^{(1)}
\end{pmatrix}
\in \mathbb{R}^2,
$$

and the (mean squared error) loss function

$$
\mathcal{L}(\beta)
=
\frac{1}{N}
\sum_{i=1}^N \big( y_i - \beta^{(0)} - \beta^{(1)} x_i \big)^2,
$$

where the dataset is

$$
\mathcal{D} = \{(x_i, y_i)\}_{i=1}^N.
$$

This setting is simple enough to allow us to derive the gradient, Hessian, and the exact dynamical system generated by gradient descent, making it ideal for studying the role of the learning rate.
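As a concrete reference point, the mean squared error loss defined above can be evaluated directly. The following is a minimal sketch: the function name `mse_loss` and the three-point dataset are hypothetical, with the data generated from $y = 1 + 2x$ so that the minimizer is known in advance.

```python
from typing import Sequence


def mse_loss(
    beta0: float, beta1: float, xs: Sequence[float], ys: Sequence[float]
) -> float:
    """Mean squared error (1/N) * sum_i (y_i - beta0 - beta1 * x_i)^2."""
    n = len(xs)
    return sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys)) / n


# Hypothetical noise-free data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]

print(mse_loss(1.0, 2.0, xs, ys))  # loss at the true intercept and slope
print(mse_loss(0.0, 0.0, xs, ys))  # loss at the origin
```

At the true parameters every residual vanishes, so the loss is zero. At $\beta = (0, 0)^\top$ the residuals are simply the $y_i$, giving $(1 + 9 + 25)/3 = 35/3$.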


## Calculating the Gradient