Here are **in-depth notes** for **L8.3: Solving an Unconstrained Optimization Problem (Part 1)** from the *Foundations of Machine Learning Theory* course. The lecture focuses on understanding optimization from a computational perspective, starting with a very simple unconstrained example, and introduces the foundational idea behind **gradient descent**.

---

## 🔍 Lecture Focus

* How to solve unconstrained optimization problems.
* Deriving computational (algorithmic) methods for problems where analytical, pen-and-paper solutions are impractical.
* Establishing the motivation for **iterative gradient-based methods**.

---

## 🧩 1. What is an Unconstrained Optimization Problem?

### General Optimization Problem:

Minimize:

$$
f(x)
$$

Subject to:

* $g_i(x) \leq 0$ (Inequality constraints)
* $h_j(x) = 0$ (Equality constraints)

> ✅ **Unconstrained Optimization** means **no constraints** on $x$:

$$
\min_{x \in \mathbb{R}} f(x)
$$

---

## 🧪 2. Simple Motivating Example

### Problem:

$$
\min_{x \in \mathbb{R}} (x - 5)^2
$$

### Intuition:

* Clearly minimized at $x = 5$
* Minimum value = $0$
* Reason: $(x - 5)^2 \geq 0$ for all real $x$, with equality only when $x = 5$

---

## 🧮 3. Classical (Analytical) Approach

Let $f(x) = (x - 5)^2$

### Step-by-step:

* Compute derivative:

  $$
  f'(x) = 2(x - 5)
  $$
* Set derivative to 0 to find stationary point:

  $$
  f'(x) = 0 \Rightarrow x = 5
  $$
* Confirm it’s a minimum (via second derivative test or function shape)
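
Here is the second derivative test for this $f$, worked out:

$$
f''(x) = \frac{d}{dx}\bigl[2(x - 5)\bigr] = 2 > 0 \ \text{ for all } x
\quad\Rightarrow\quad
x = 5 \text{ is a minimum, with } f(5) = 0.
$$

Because $f'' > 0$ everywhere, $f$ is convex, so its single stationary point $x = 5$ is also the global minimum.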

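> ⚠️ This approach works for simple functions but **breaks down** whenever $f'(x) = 0$ cannot be solved in closed form, for example:

* Highly nonlinear functions, where $f'(x) = 0$ has no convenient algebraic solution
* Polynomials whose derivative has degree 5 or higher, since such equations have no general solution formula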

---

## 🧪 Example That Breaks the Classical Method

$$
\min_{x \in \mathbb{R}} \; f(x) = 3x^6 + 2x^5 + 3x^3 + 5x^2 + 2
$$

* First derivative:

  $$
  f'(x) = 18x^5 + 10x^4 + 9x^2 + 10x
  $$
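* Solving $f'(x) = 0$ analytically is impractical:

  * $f'(x)$ is a degree-5 polynomial
  * There is no general formula in radicals for the roots of polynomials of degree 5 or higher (Abel–Ruffini theorem). Here $f'(x) = x(18x^4 + 10x^3 + 9x + 10)$ happens to factor, but the remaining quartic is still far too messy to solve by hand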
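
Solving $f'(x) = 0$ may be hard, but evaluating $f$ and $f'$ at any given point is cheap. That is exactly what the iterative methods below rely on. The following is a minimal plain-Python sketch that checks the hand-derived derivative against a central finite-difference estimate. The helper names (`f`, `f_prime`, `central_difference`) and the step `h` are illustrative choices, not part of the lecture.

```python
def f(x: float) -> float:
    """Objective from the example: 3x^6 + 2x^5 + 3x^3 + 5x^2 + 2."""
    return 3 * x**6 + 2 * x**5 + 3 * x**3 + 5 * x**2 + 2


def f_prime(x: float) -> float:
    """Hand-derived derivative: 18x^5 + 10x^4 + 9x^2 + 10x."""
    return 18 * x**5 + 10 * x**4 + 9 * x**2 + 10 * x


def central_difference(g, x: float, h: float = 1e-6) -> float:
    """Approximate g'(x) with the symmetric difference quotient (g(x+h) - g(x-h)) / 2h."""
    return (g(x + h) - g(x - h)) / (2 * h)


# Compare the closed-form derivative with the numerical estimate at a few points.
for x in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    exact = f_prime(x)
    approx = central_difference(f, x)
    print(f"x={x:5.2f}  f'(x)={exact:10.5f}  numeric={approx:10.5f}")
```

The two columns should agree to several decimal places. This confirms the derivative but says nothing about where its roots lie.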

---

## ⚙️ 4. Need for Systematic Computational Procedure

We need a method that:

* Doesn’t rely on solving high-degree polynomial equations
* Is **iterative** and **computer-friendly**
* Works for a wide variety of functions

---

## 🌀 5. Iterative Update Idea (Gradient Descent)

Start from an arbitrary initial guess $x_0 \in \mathbb{R}$.

### Goal:

Improve the guess $x_t$ at each iteration $t = 0, 1, 2, \ldots$ so that it moves toward the minimizer.

### General Update Rule:

$$
x_{t+1} = x_t + d
$$

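* $d$ is the **update** added to the current guess. Its sign sets the **direction of movement** and its size sets how far we move (it may depend on $x_t$)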
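
The generic rule $x_{t+1} = x_t + d$ can be sketched as a loop that applies the update repeatedly and records each iterate. This minimal Python sketch assumes $d$ is supplied as a function of the current iterate. The `direction` parameter is a hypothetical placeholder, since a good choice of $d$ has not been derived yet.

```python
from typing import Callable, List


def iterate(x0: float, direction: Callable[[float], float], num_steps: int) -> List[float]:
    """Apply x_{t+1} = x_t + d(x_t) for num_steps steps and return the whole trajectory.

    `direction` is a hypothetical placeholder for whatever rule picks d.
    """
    trajectory = [x0]
    x = x0
    for _ in range(num_steps):
        x = x + direction(x)  # move from the current guess by d
        trajectory.append(x)
    return trajectory


# Toy usage with a hand-picked (hypothetical) rule: always step by -1.
print(iterate(8.0, lambda x: -1.0, 3))  # [8.0, 7.0, 6.0, 5.0]
```

A fixed rule like "always step by -1" only works if we already know which side of the minimum we start on. That is why $d$ needs to depend on $x$.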

---

## 🧭 6. Direction: What Makes a Good $d$?

Let’s look at the function again:

$$
f(x) = (x - 5)^2
$$

The graph of $f$ is a parabola with its lowest point at $x = 5$, so:

* If $x > 5$, move **left** to reduce $f(x)$
* If $x < 5$, move **right** to reduce $f(x)$

### Therefore:

* If $x > 5$, we want $d < 0$
* If $x < 5$, we want $d > 0$

So $d$ must:

* Be a **function of $x$**
* Change **direction** depending on whether $x$ is greater or less than 5

---

## 🧮 7. Using the Derivative to Determine Direction

Let:

$$
f(x) = (x - 5)^2 \Rightarrow f'(x) = 2(x - 5)
$$

Observation:

* If $x > 5$, $f'(x) > 0$
* If $x < 5$, $f'(x) < 0$

But we want the **opposite sign** for movement!

### So we define:

$$
d = -f'(x)
$$

This gives us:

* $d < 0$ when $x > 5$ → move left
* $d > 0$ when $x < 5$ → move right

✅ Matches desired behavior!
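
Here is a quick numerical check of the rule $d = -f'(x)$. It is a minimal sketch, and the test points and the 0.01 scaling of the move are arbitrary choices made only to keep each move small.

```python
def f(x: float) -> float:
    return (x - 5) ** 2


def f_prime(x: float) -> float:
    return 2 * (x - 5)


def direction(x: float) -> float:
    """Proposed movement direction: the negative derivative."""
    return -f_prime(x)


for x in [2.0, 4.9, 5.1, 9.0]:
    d = direction(x)
    # A small move along d should lower the objective on either side of x = 5.
    x_new = x + 0.01 * d
    print(f"x={x}: d={d:+.2f}, f(x)={f(x):.4f} -> f(x + 0.01 d)={f(x_new):.4f}")
```

On both sides of $x = 5$, the sign of $d$ points toward the minimum, and a small move along it lowers $f$.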

---

## 🔁 8. Iterative Gradient Descent Algorithm (Prototype)

### Update Rule:

$$
x_{t+1} = x_t - f'(x_t)
$$

For $f(x) = (x - 5)^2$, we have:

$$
x_{t+1} = x_t - 2(x_t - 5) = 10 - x_t
$$

---

## 🧪 9. Example with Bad Behavior (Why We Need Step Size)

Start with $x_0 = 10$

### First step:

$$
f'(10) = 2(10 - 5) = 10 \Rightarrow x_1 = 10 - 10 = 0
$$

### Second step:

$$
f'(0) = 2(0 - 5) = -10 \Rightarrow x_2 = 0 - (-10) = 10
$$

And this continues:

$$
x_3 = 0, \quad x_4 = 10, \quad x_5 = 0, \ldots
$$

⛔ **Oscillation!** The iterates never reach the minimizer $x = 5$. In fact, any start $x_0 \neq 5$ bounces forever between $x_0$ and $10 - x_0$.
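
The prototype rule is easy to simulate. This minimal sketch reproduces the 10, 0, 10, 0, ... pattern. The helper name `prototype_descent` is illustrative.

```python
def f_prime(x: float) -> float:
    # Derivative of f(x) = (x - 5)^2
    return 2 * (x - 5)


def prototype_descent(x0: float, num_steps: int) -> list:
    """Run the prototype rule x_{t+1} = x_t - f'(x_t) (no step size); return all iterates."""
    xs = [x0]
    for _ in range(num_steps):
        xs.append(xs[-1] - f_prime(xs[-1]))
    return xs


print(prototype_descent(10.0, 5))  # [10.0, 0.0, 10.0, 0.0, 10.0, 0.0]
```

Starting exactly at $x_0 = 5$ is the only case where the iterates stay at the minimizer, because $f'(5) = 0$.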

---

## 🚨 10. Root Cause: Overshooting the Minimum

* Direction is **correct**
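* But the **step is too large**: $|f'(x_t)| = 2|x_t - 5|$ is twice the distance to the minimizer, so each update overshoots $x = 5$ by exactly the original distance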

### Insight:

We need to **control the magnitude** of the update:

* Moving too far = overshooting
* Need to take **smaller steps**

---

## 🔧 11. Fix: Introduce a Step Size (Learning Rate)

Introduce $\eta > 0$ (small scalar step size)

### Modified Update Rule:

$$
x_{t+1} = x_t - \eta f'(x_t)
$$

Where:

* $\eta$ controls how fast/slow we move
* Choosing $\eta$ is crucial:

  * Too large: oscillate or diverge
  * Too small: converge slowly
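
For $f(x) = (x - 5)^2$, the effect of $\eta$ can be worked out exactly. Subtracting 5 from both sides of the update gives

$$
x_{t+1} - 5 = x_t - 5 - 2\eta(x_t - 5) = (1 - 2\eta)(x_t - 5),
$$

so the distance to the minimizer is multiplied by $|1 - 2\eta|$ at every step. The iterates converge for $0 < \eta < 1$, oscillate forever at $\eta = 1$ (the prototype rule), and diverge for $\eta > 1$. The following sketch compares a few values of $\eta$, starting from $x_0 = 10$. It is plain Python, and `gradient_descent` is an illustrative helper, not a library function.

```python
def f_prime(x: float) -> float:
    # Derivative of f(x) = (x - 5)^2
    return 2 * (x - 5)


def gradient_descent(x0: float, eta: float, num_steps: int) -> float:
    """1-D gradient descent: x_{t+1} = x_t - eta * f'(x_t). Returns the final iterate."""
    x = x0
    for _ in range(num_steps):
        x = x - eta * f_prime(x)
    return x


# Compare several step sizes, all starting from x_0 = 10 as in the oscillation example.
for eta in [0.01, 0.1, 0.5, 1.0, 1.1]:
    x_final = gradient_descent(10.0, eta, num_steps=50)
    print(f"eta={eta:<4}  x_50={x_final:.6g}")
```

With these settings, $\eta = 0.5$ lands exactly on $x = 5$ after one step (since $1 - 2\eta = 0$). $\eta = 1.0$ is back at 10 after an even number of steps, and $\eta = 1.1$ blows up. $\eta = 0.01$ is still well above 5 after 50 steps, which illustrates slow convergence. The special value 0.5 is specific to this quadratic. For general functions, $\eta$ usually has to be tuned.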

---

## 📌 Summary of Key Concepts

| Concept                           | Description                                                  |
| --------------------------------- | ------------------------------------------------------------ |
| **Unconstrained Optimization**    | No constraints; only an objective function to minimize       |
| **Derivative-based Minimization** | Solve $f'(x) = 0$ for stationary points (practical only when this equation is easy to solve) |
| **Gradient Descent (1D)**         | Use update rule $x_{t+1} = x_t - \eta f'(x_t)$               |
| **Direction**                     | Negative of the derivative, so each step moves downhill      |
| **Problem Without Step Size**     | On $(x - 5)^2$, iterates oscillate between two values and never converge |
| **Solution**                      | Add step size $\eta$ to control how far each update moves    |

---

## 🎯 Learning Outcomes Recap

* ✅ Understood how to model an unconstrained optimization problem.
* ✅ Recognized the limitation of analytic methods for complex functions.
* ✅ Learned the principle of gradient descent.
* ✅ Saw the necessity of adding a step size to prevent oscillation.