# 📘 Lecture 8.4: Solving an Unconstrained Optimization Problem (Part 2)

## 🧠 Objective

To understand the importance of step size (also called learning rate) in iterative optimization algorithms like gradient descent, especially for unconstrained optimization problems.

---

## 🧮 Recap of the Basic Gradient Descent Update Rule

We aim to minimize a differentiable function $f(x)$.
Gradient descent update rule:

$$
x_{t+1} = x_t - \eta_t \cdot f'(x_t)
$$

* $x_t$: Current iterate
* $f'(x_t)$: Gradient (or derivative) at current iterate
* $\eta_t$: Step size (also called learning rate) at time $t$
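
The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `gradient_descent`, its parameters, and the constant step size of 0.25 in the usage example are all assumptions chosen for demonstration. The objective $f(x) = (x - 5)^2$ is the one used later in these notes.

```python
from typing import Callable, List


def gradient_descent(
    grad: Callable[[float], float],
    x0: float,
    step_size: Callable[[int], float],
    num_steps: int,
) -> List[float]:
    """Return the iterates x_0, ..., x_{num_steps} of 1-D gradient descent.

    `grad` computes f'(x); `step_size` maps the time index t to eta_t.
    (Hypothetical helper, written only to illustrate the update rule.)
    """
    xs = [x0]
    for t in range(num_steps):
        x_t = xs[-1]
        # x_{t+1} = x_t - eta_t * f'(x_t)
        xs.append(x_t - step_size(t) * grad(x_t))
    return xs


# f(x) = (x - 5)^2 has derivative f'(x) = 2 (x - 5).
# The constant step size 0.25 is an illustrative assumption.
iterates = gradient_descent(
    grad=lambda x: 2.0 * (x - 5.0),
    x0=2.0,
    step_size=lambda t: 0.25,
    num_steps=20,
)
print(iterates[-1])  # close to the minimizer x = 5
```

Passing the step size as a function of $t$ makes it easy to swap in the different schedules discussed in the rest of these notes.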

---

## ⚠️ Motivation: Why Step Size Matters

* The **direction** of the update (i.e., $-f'(x_t)$) may be correct, but its **magnitude** (set by the step size $\eta_t$) determines whether:

  * We **oscillate** (steps too large)
  * We **stall** (steps too small)
  * We **converge** (steps chosen properly)

---

## 🧩 Issue 1: No Step Size (Constant Step Size = 1)

Example: $f(x) = (x - 5)^2$

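* If you always use $\eta_t = 1$, then since $f'(x) = 2(x - 5)$ the update becomes $x_{t+1} = x_t - 2(x_t - 5) = 10 - x_t$:

  * The iterates **oscillate** around the minimum at $x = 5$, jumping back and forth between $x_0$ and $10 - x_0$ without ever getting closer.
  * The farther the starting point is from the optimum, the wider the oscillation.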
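
A short sketch makes the oscillation concrete. The starting point $x_0 = 2$ and the number of iterations are assumptions chosen for illustration; the function and the step size of 1 come from the example above.

```python
def grad_f(x: float) -> float:
    """Derivative of f(x) = (x - 5)^2."""
    return 2.0 * (x - 5.0)


x = 2.0  # assumed starting point
trajectory = [x]
for _ in range(6):
    x = x - 1.0 * grad_f(x)  # step size fixed at 1
    trajectory.append(x)

print(trajectory)  # [2.0, 8.0, 2.0, 8.0, 2.0, 8.0, 2.0]
```

The iterate bounces between 2 and 8, which sit symmetrically on either side of the minimizer 5, and the distance to the minimum never shrinks.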

---

## 💡 Introducing Step Size $\eta_t$

To control how much we move in the direction of the negative gradient:

* Multiply the direction by a **scalar** $\eta_t$
* $\eta_t \in \mathbb{R}_+$
* Ideally, $\eta_t$ should be:

  * **Positive**
  * **Decreasing** with time (to ensure convergence)
  * **Not too small** (to prevent stagnation)

---

## 🔁 First Attempt: Exponentially Decreasing Step Size

$$
\eta_t = \frac{1}{2^t}
$$

Example:

* $\eta_0 = 1$
* $\eta_1 = \frac{1}{2}$
* $\eta_2 = \frac{1}{4}$
* $\eta_3 = \frac{1}{8}$
* …

### 📉 Problem with this Step Size

* Suppose:

  * Start at $x_0 = 2$
  * Minimum is at $x^\star = 5$
  * Assume the gradient is always $-1$, so each update moves right by exactly $\eta_t$ (i.e., the direction is always correct and constant)
* Update:

  * $x_1 = x_0 + 1 = 3$
  * $x_2 = x_1 + 0.5 = 3.5$
  * $x_3 = x_2 + 0.25 = 3.75$
  * ...
  * Eventually, $x_t \to 4$, **never reaches** 5

### ❌ Root Cause

* The total distance covered is a geometric series:

  $$
  \sum_{t=0}^\infty \frac{1}{2^t} = 2
  $$

* If you need to travel a distance of more than 2, this step size **can never get you there**. Here the required distance is $5 - 2 = 3 > 2$.

* So, though the **direction** is always correct, the **magnitude** is insufficient → we get stuck.
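
The stalling behaviour can be checked numerically. The sketch below simulates the toy setting from this section (start at 2, gradient fixed at $-1$ so every step moves right by $\eta_t$). The helper name `run_constant_gradient` and the choice of 50 steps are assumptions made for illustration.

```python
def run_constant_gradient(step_size, x0=2.0, gradient=-1.0, num_steps=50):
    """Apply x_{t+1} = x_t - eta_t * gradient with a fixed gradient.

    Hypothetical helper for the toy model in these notes: the gradient
    does not depend on x, so only the step sizes decide how far we get.
    """
    x = x0
    for t in range(num_steps):
        x = x - step_size(t) * gradient
    return x


# Exponentially decreasing schedule eta_t = 1 / 2^t.
x_final = run_constant_gradient(lambda t: 1.0 / 2**t)
print(x_final)  # stays just below 4, well short of the minimum at 5
```

Running more steps does not help: the remaining step sizes are too small to add up to the missing distance.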

---

## 🎯 Conflicting Goals for Step Size

1. **Reduce step size** over time → avoid oscillation.
2. **Don’t reduce it too much** → ensure you can reach the minimum, even from far.

This tension motivates the search for a better strategy.

---

## ✅ Improved Step Size Strategy: Harmonic Sequence

$$
\eta_t = \frac{1}{t+1}
$$

Sequence:

* $\eta_0 = 1$
* $\eta_1 = \frac{1}{2}$
* $\eta_2 = \frac{1}{3}$
* $\eta_3 = \frac{1}{4}$
* …

### ✨ Why This Works Better

* Though values decrease, the total sum **diverges**:

$$
\sum_{t=0}^\infty \frac{1}{t+1} = \infty \quad \text{(harmonic series)}
$$

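* So: even with decreasing step sizes, the **total movement** is unbounded, and no point is out of reach merely because the steps shrink.
* This removes the obstacle seen with $\frac{1}{2^t}$: the algorithm is no longer limited to minima within a fixed distance of the starting point.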
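
As a rough check, the same toy setting used for the exponential schedule (start at 2, gradient fixed at $-1$) can be rerun with the harmonic schedule. The helper `steps_to_reach` and its `max_steps` cap are assumptions introduced for this sketch. Because the gradient is held constant, the toy model only tests whether the target is reachable, not how the iterates settle once they get there.

```python
def steps_to_reach(target, x0=2.0, gradient=-1.0, max_steps=1000):
    """Return the first t at which x_t >= target under eta_t = 1 / (t + 1).

    Hypothetical helper for the constant-gradient toy model; returns None
    if the target is not reached within `max_steps` updates.
    """
    x = x0
    for t in range(max_steps):
        if x >= target:
            return t
        x = x - (1.0 / (t + 1)) * gradient  # harmonic step size
    return None


print(steps_to_reach(5.0))  # 11, since 1 + 1/2 + ... + 1/11 is about 3.02 > 3
```

The harmonic schedule does get past 5, which the exponential schedule could not do. The price is speed: partial sums of the harmonic series grow only logarithmically, so far-away targets take many steps.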

---

## ✅ Comparison of Step Size Sequences

| Schedule           | Formula         | Sum of Step Sizes   | Behavior                   |
| ------------------ | --------------- | ------------------- | -------------------------- |
| Exponential (bad)  | $\frac{1}{2^t}$ | Converges (to 2)    | May get stuck              |
| Harmonic (good)    | $\frac{1}{t+1}$ | Diverges ($\infty$) | Can reach a far-off minimum |

---

## 🔬 Intuition Summary

* Gradient gives **direction**
* Step size decides **how far**
* Step size must:

  * Be **decreasing** (avoid overshooting)
  * But **not too quickly** (to avoid stagnation)
* A good step size like $\eta_t = \frac{1}{t+1}$ balances both goals.

---

## 🧠 Key Takeaways

* **Direction vs. Magnitude**: The right direction isn’t enough; the magnitude matters too.
* **Too large step size** → oscillation
* **Too small step size** → stagnation
* **Good step size**:

  * Decreases with time
  * Has infinite cumulative sum (e.g., harmonic series)

---

## 📘 Learning Outcome Recap

* Understood the critical role of **step size** in optimization.
* Analyzed **failures** and **successes** of different step size schedules.
* Recognized the need to balance **convergence** and **progress**.
* Learned how the **harmonic sequence** helps resolve this tension effectively.