# Gradients and Linear Approximations

We consider a scalar function:

$$
f: \mathbb{R}^2 \rightarrow \mathbb{R}
$$

Let $v = (v_1, v_2) \in \mathbb{R}^2$ be a point where we evaluate the gradient, and $y = (y_1, y_2) \in \mathbb{R}^2$ be a nearby point.

### 🔹 Step-by-step Linear Approximation

#### Step 1: Approximate along $x_1$-direction (keeping $x_2 = v_2$)

$$
f(y_1, v_2) \approx f(v_1, v_2) + \frac{\partial f}{\partial x_1}(v) \cdot (y_1 - v_1)
$$

#### Step 2: Rearranged Form

$$
f(y_1, v_2) - f(v_1, v_2) \approx \frac{\partial f}{\partial x_1}(v) \cdot (y_1 - v_1)
$$

#### Step 3: Approximate along $x_2$-direction (keeping $x_1 = v_1$)

$$
f(v_1, y_2) - f(v_1, v_2) \approx \frac{\partial f}{\partial x_2}(v) \cdot (y_2 - v_2)
$$

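#### Step 4: Combine both directions

Moving from $v$ to $y$ in two legs, first along $x_1$ and then along $x_2$, and adding the two first-order changes gives the total change. This assumes the partial derivatives are continuous near $v$, so that $\frac{\partial f}{\partial x_2}$ at $(y_1, v_2)$ is close to its value at $v$:
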
$$
f(y_1, y_2) - f(v_1, v_2) \approx \frac{\partial f}{\partial x_1}(v) \cdot (y_1 - v_1) + \frac{\partial f}{\partial x_2}(v) \cdot (y_2 - v_2)
$$

#### ✅ Final Compact Form (Using Gradient Vector)

$$
f(y) \approx f(v) + \nabla f(v)^T (y - v)
$$

Where:

* $\nabla f(v) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(v) \\ \frac{\partial f}{\partial x_2}(v) \end{bmatrix}$
* $y - v = \begin{bmatrix} y_1 - v_1 \\ y_2 - v_2 \end{bmatrix}$

So the dot product:

$$
\nabla f(v)^T (y - v) = \frac{\partial f}{\partial x_1}(v)(y_1 - v_1) + \frac{\partial f}{\partial x_2}(v)(y_2 - v_2)
$$

This is the **first-order Taylor approximation** of a multivariable function near the point $v$.

As a worked example, let:

$$
f(x_1, x_2) = x_1^2 + x_2^2, \quad \nabla f(x) = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}
$$

### i) Approximate $f$ around $(6, 2)$

Evaluating $f$ and its gradient at $v = (6, 2)$:

$$
f(v) = 40, \quad \nabla f(v) = \begin{bmatrix} 12 \\ 4 \end{bmatrix}
$$

Then:

$$
f(x) \approx 40 + \begin{bmatrix} 12 & 4 \end{bmatrix} \begin{bmatrix} x_1 - 6 \\ x_2 - 2 \end{bmatrix}
$$

Expanding:

$$
= 40 + 12(x_1 - 6) + 4(x_2 - 2)
$$

$$
= 40 + 12x_1 + 4x_2 - 72 - 8
$$

$$
= 12x_1 + 4x_2 - 40
$$

where $(x_1, x_2) \in \mathbb{R}^2$.
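The worked example above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `f`, `grad_f`, and `linear_approx` are illustrative choices, not part of the original notes. It evaluates the exact function and its first-order approximation around $(6, 2)$ at a few sample points.

```python
def f(x1, x2):
    """Example function from the text: f(x1, x2) = x1^2 + x2^2."""
    return x1 ** 2 + x2 ** 2


def grad_f(x1, x2):
    """Analytic gradient of f: (2*x1, 2*x2)."""
    return (2 * x1, 2 * x2)


def linear_approx(v, x):
    """First-order Taylor approximation of f around v, evaluated at x."""
    g = grad_f(*v)
    return f(*v) + g[0] * (x[0] - v[0]) + g[1] * (x[1] - v[1])


v = (6.0, 2.0)
# Sample points chosen for illustration: v itself, a nearby point, a farther point.
for x in [(6.0, 2.0), (6.1, 2.1), (7.0, 3.0)]:
    exact = f(*x)
    approx = linear_approx(v, x)
    print(f"x={x}  exact={exact:.4f}  approx={approx:.4f}  error={exact - approx:.4f}")
```

For this particular $f$, the error $f(x) - (12x_1 + 4x_2 - 40)$ simplifies to $(x_1 - 6)^2 + (x_2 - 2)^2$, so the approximation degrades quadratically with the distance from $v$.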


## Gradients and Tangent Planes
- The graph of the linear approximation $L_v[f](x) = f(v) + \nabla f(v)^T (x - v)$ is a **plane** that is **tangent** to the graph of $f$ at the point $(v, f(v))$.

## Gradients and Contours

For the same $f(x_1, x_2) = x_1^2 + x_2^2$, let:

$$
v = \begin{bmatrix} -6 \\ 2 \end{bmatrix}, \quad \nabla f(v) = \begin{bmatrix} -12 \\ 4 \end{bmatrix}
$$

The gradient at $v$ is perpendicular to the level set (contour line) of $f$ through that point:

$$
\nabla f(v) \perp \left\{ x \in \mathbb{R}^d : f(x) = f(v) \right\}
$$

The same holds for the level set of the linear approximation $L_v[f]$ through $v$:

$$
\nabla f(v) \perp \left\{ x \in \mathbb{R}^d : L_v[f](x) = f(v) \right\}
$$

Expanding $L_v[f]$ and cancelling $f(v)$, this level set can be written as:

$$
\left\{ x \in \mathbb{R}^d : f(v) + \nabla f(v)^T (x - v) = f(v) \right\}
$$

$$
\left\{ x \in \mathbb{R}^d : \nabla f(v)^T (x - v) = 0 \right\}
$$

$$
\left\{ x \in \mathbb{R}^d : w^T x = b \right\}, \quad w = \nabla f(v), \quad b = \nabla f(v)^T v
$$

This set is a **hyperplane** that is tangent to the level set at $v$, and the gradient is orthogonal (normal) to this hyperplane.
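As a rough numerical check of the orthogonality claim, the sketch below uses $v = (-6, 2)$ and the example function $f(x_1, x_2) = x_1^2 + x_2^2$. The level set through $v$ is then a circle centred at the origin, so a tangent direction at $v$ is $v$ rotated by 90 degrees. The helper names are illustrative assumptions.

```python
def grad_f(x):
    """Gradient of f(x1, x2) = x1^2 + x2^2."""
    return (2 * x[0], 2 * x[1])


def dot(a, b):
    """Dot product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))


v = (-6.0, 2.0)
w = grad_f(v)   # normal vector of the tangent hyperplane: (-12, 4)
b = dot(w, v)   # offset b = w^T v

# Tangent direction of the circular contour at v: v rotated by 90 degrees.
tangent = (-v[1], v[0])

print("w =", w, " b =", b)
print("w . tangent =", dot(w, tangent))

# Points of the form v + t * tangent should all satisfy w^T x = b.
for t in (-1.0, 0.5, 2.0):
    x = (v[0] + t * tangent[0], v[1] + t * tangent[1])
    print("t =", t, " w^T x - b =", dot(w, x) - b)
```

In two dimensions the "hyperplane" is simply the tangent line to the contour at $v$.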


## Directional Derivative

The directional derivative of $f$ at the point $v$ along the direction $u$ is defined as:

$$
D_u[f](v) = \lim_{\alpha \to 0} \frac{f(v + \alpha u) - f(v)}{\alpha}
$$

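Substituting the first-order Taylor expansion $f(v + \alpha u) \approx f(v) + \alpha \nabla f(v)^T u$, whose error vanishes faster than $\alpha$ when $f$ is differentiable:

$$
= \lim_{\alpha \to 0} \frac{f(v) + \alpha \nabla f(v)^T u - f(v)}{\alpha}
$$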

Simplifying:

$$
= \nabla f(v)^T u
$$
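The identity $D_u[f](v) = \nabla f(v)^T u$ can be sanity-checked by replacing the limit with a small but finite $\alpha$. This is only a sketch: the example function, the point $v = (6, 2)$, the direction $u$, and the step size `alpha=1e-6` are all assumptions chosen for illustration.

```python
import math


def f(x):
    """Example function f(x1, x2) = x1^2 + x2^2."""
    return x[0] ** 2 + x[1] ** 2


def grad_f(x):
    """Analytic gradient of f."""
    return (2 * x[0], 2 * x[1])


def directional_derivative_fd(func, v, u, alpha=1e-6):
    """Forward-difference estimate of D_u[func](v) using a small step alpha."""
    shifted = tuple(vi + alpha * ui for vi, ui in zip(v, u))
    return (func(shifted) - func(v)) / alpha


v = (6.0, 2.0)
u = (1 / math.sqrt(2), 1 / math.sqrt(2))  # unit vector along the diagonal

estimate = directional_derivative_fd(f, v, u)
exact = sum(gi * ui for gi, ui in zip(grad_f(v), u))  # gradient formula

print("finite difference:", estimate)
print("gradient formula: ", exact)
```

The two numbers should agree up to an error of roughly the size of `alpha`.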

## Cauchy–Schwarz Inequality

Let 

$$
a = (a_1, a_2, \dots, a_d), \quad b = (b_1, b_2, \dots, b_d)
$$

The norm of a vector $a$ is:

$$
\|a\| = \sqrt{a_1^2 + \cdots + a_d^2}
$$

The Cauchy–Schwarz inequality states:

$$
- \|a\| \cdot \|b\| \leq a^T b \leq \|a\| \cdot \|b\|
$$

- If $a = \alpha b$ and $\alpha < 0$, then $a^T b = -\|a\| \cdot \|b\| \leq 0$, so the lower bound is attained.
- If $a = \alpha b$ and $\alpha > 0$, then $a^T b = \|a\| \cdot \|b\| \geq 0$, so the upper bound is attained.
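A short numerical illustration of the inequality and its two equality cases follows. The vectors and scaling factors are arbitrary values picked for the sketch, and the helper names are assumptions.

```python
import math


def dot(a, b):
    """Dot product a^T b."""
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(a):
    """Euclidean norm of a."""
    return math.sqrt(dot(a, a))


def cauchy_schwarz_holds(a, b, tol=1e-12):
    """Check -||a|| ||b|| <= a^T b <= ||a|| ||b|| up to a small tolerance."""
    bound = norm(a) * norm(b)
    return -bound - tol <= dot(a, b) <= bound + tol


a = (1.0, 2.0, 3.0)
b = (-4.0, 0.5, 2.0)
print("inequality holds:", cauchy_schwarz_holds(a, b))

# Equality cases: a is a positive or negative multiple of b.
pos = tuple(2.0 * bi for bi in b)    # alpha = 2 > 0
neg = tuple(-3.0 * bi for bi in b)   # alpha = -3 < 0
print("alpha > 0:", dot(pos, b), "vs", norm(pos) * norm(b))
print("alpha < 0:", dot(neg, b), "vs", -norm(neg) * norm(b))
```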

## Direction of Steepest Ascent

Let $f$ be a differentiable function.  
We seek the unit direction $u$ that maximizes the rate of change of $f$ as we move away from $v$ along $u$.

Maximize:

$$
D_u[f](v)
$$

Subject to:

$$
u \in \mathbb{R}^d, \quad \|u\| = 1
$$

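By the directional derivative formula,

$$
D_u[f](v) = \nabla f(v)^T u
$$

By the Cauchy–Schwarz inequality, $\nabla f(v)^T u \leq \|\nabla f(v)\| \cdot \|u\| = \|\nabla f(v)\|$, with equality when $u$ is a positive multiple of the gradient. Assuming $\nabla f(v) \neq 0$, the maximizer under the constraint $\|u\| = 1$ is:

$$
u = \alpha \nabla f(v), \quad \alpha = \frac{1}{\|\nabla f(v)\|} > 0
$$

That is, the direction of steepest ascent is the normalized gradient, and the maximal rate of change is $\|\nabla f(v)\|$.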
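One way to see this result without Cauchy–Schwarz is a brute-force search over unit directions. The sketch below assumes the example $f(x_1, x_2) = x_1^2 + x_2^2$ at $v = (6, 2)$ and scans 3600 evenly spaced angles; the grid resolution and variable names are illustrative choices.

```python
import math


def grad_f(x):
    """Gradient of f(x1, x2) = x1^2 + x2^2."""
    return (2 * x[0], 2 * x[1])


v = (6.0, 2.0)
g = grad_f(v)
g_norm = math.hypot(g[0], g[1])

# Candidate maximizer from the derivation: the normalized gradient.
u_star = (g[0] / g_norm, g[1] / g_norm)

# Brute-force search over unit vectors u = (cos t, sin t).
best_u, best_val = None, -math.inf
n = 3600
for k in range(n):
    theta = 2 * math.pi * k / n
    u = (math.cos(theta), math.sin(theta))
    val = g[0] * u[0] + g[1] * u[1]  # directional derivative grad^T u
    if val > best_val:
        best_u, best_val = u, val

print("best direction on grid:", best_u)
print("normalized gradient:   ", u_star)
print("best rate on grid:", best_val, " gradient norm:", g_norm)
```

Up to the grid resolution, the best direction found should match the normalized gradient, and the best rate should match $\|\nabla f(v)\|$.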


### Descent Directions

Let  
$f: \mathbb{R}^d \rightarrow \mathbb{R}$  
$v \in \mathbb{R}^d$

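Which directions $u$ make $f$ decrease as we take a small step away from $v$?

These are the directions with a negative directional derivative:  
$D_u[f](v) < 0$  
$\Downarrow$  
$\nabla f(v)^T u < 0$

In particular, $u = -\nabla f(v)$ is a descent direction whenever $\nabla f(v) \neq 0$.

**Descent directions**:  
$\left\{ u \in \mathbb{R}^d : \nabla f(v)^T u < 0 \right\}$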
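The descent condition can be tested on a few candidate directions. This sketch again assumes the example $f(x_1, x_2) = x_1^2 + x_2^2$ at $v = (6, 2)$; the candidate directions and the step size `0.01` are arbitrary illustrative values.

```python
def f(x):
    """Example function f(x1, x2) = x1^2 + x2^2."""
    return x[0] ** 2 + x[1] ** 2


def grad_f(x):
    """Analytic gradient of f."""
    return (2 * x[0], 2 * x[1])


def is_descent_direction(v, u):
    """True if grad f(v)^T u < 0, i.e. u is a descent direction at v."""
    g = grad_f(v)
    return sum(gi * ui for gi, ui in zip(g, u)) < 0


def step(v, u, t):
    """Move from v along u by step size t."""
    return tuple(vi + t * ui for vi, ui in zip(v, u))


v = (6.0, 2.0)
candidates = {
    "negative gradient": (-12.0, -4.0),
    "gradient": (12.0, 4.0),
    "mixed": (-1.0, 2.0),
}
for name, u in candidates.items():
    change = f(step(v, u, 0.01)) - f(v)
    print(f"{name}: descent={is_descent_direction(v, u)}  change in f={change:+.4f}")
```

The condition only guarantees a decrease for sufficiently small steps; a large step along a descent direction can still increase $f$.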

## Higher-Order Approximations

Let  
$f: \mathbb{R}^d \rightarrow \mathbb{R}$

---

**First-order approximation:**

$$
f(x) \approx f(v) + \nabla f(v)^T (x - v)
\quad \text{(Valid around } x = v \text{)}
$$

---

**Second-order approximation:**

$$
f(x) \approx f(v) + \nabla f(v)^T (x - v) + \frac{1}{2} (x - v)^T \nabla^2 f(v) (x - v)
$$

Here,  
$\nabla^2 f(v)$ is the **Hessian**, the $d \times d$ matrix of second partial derivatives with entries $\frac{\partial^2 f}{\partial x_i \partial x_j}(v)$.
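To make the second-order formula concrete, the sketch below evaluates it for the example $f(x_1, x_2) = x_1^2 + x_2^2$, whose Hessian is $2I$ everywhere. The function names are illustrative assumptions, and the gradient and Hessian are supplied analytically rather than computed automatically.

```python
def f(x):
    """Example function f(x1, x2) = x1^2 + x2^2."""
    return x[0] ** 2 + x[1] ** 2


def grad_f(x):
    """Analytic gradient of f."""
    return [2 * x[0], 2 * x[1]]


def hess_f(x):
    """Analytic Hessian of f: constant 2 * identity."""
    return [[2.0, 0.0], [0.0, 2.0]]


def quadratic_approx(func, grad, hess, v, x):
    """Second-order Taylor approximation of func around v, evaluated at x."""
    d = [xi - vi for xi, vi in zip(x, v)]
    g = grad(v)
    H = hess(v)
    first = sum(gi * di for gi, di in zip(g, d))
    second = 0.5 * sum(
        d[i] * H[i][j] * d[j] for i in range(len(d)) for j in range(len(d))
    )
    return func(v) + first + second


v = (6.0, 2.0)
x = (7.0, 3.0)
print("exact:       ", f(x))
print("second-order:", quadratic_approx(f, grad_f, hess_f, v, x))
```

Because this $f$ is itself quadratic, the second-order approximation reproduces it exactly; for a general function it is only accurate near $v$.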

## Maxima, Minima, and Saddle Points

If $f$ is differentiable and attains a local minimum or maximum at $v$,  
then  
$$
\nabla f(v) = 0
$$

---

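The points in the set  
$$
\left\{ v : \nabla f(v) = 0 \right\}
$$  
are called **critical points**. A critical point can be a minimum, a maximum, or a saddle point, so $\nabla f(v) = 0$ is necessary but not sufficient for a minimum.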
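A small sketch contrasting two critical points at the origin: the example $f(x_1, x_2) = x_1^2 + x_2^2$ from these notes, and a hypothetical function $h(x_1, x_2) = x_1^2 - x_2^2$ that is not in the original and is introduced here only to illustrate a saddle point. The tolerance `tol` is likewise an assumption.

```python
def grad_bowl(x):
    """Gradient of f(x1, x2) = x1^2 + x2^2 (minimum at the origin)."""
    return (2 * x[0], 2 * x[1])


def saddle(x):
    """Hypothetical h(x1, x2) = x1^2 - x2^2, used only as an illustration."""
    return x[0] ** 2 - x[1] ** 2


def grad_saddle(x):
    """Gradient of the hypothetical h."""
    return (2 * x[0], -2 * x[1])


def is_critical(grad, v, tol=1e-12):
    """True if every gradient component at v is zero up to tol."""
    return all(abs(g) <= tol for g in grad(v))


origin = (0.0, 0.0)
print("bowl, origin critical:  ", is_critical(grad_bowl, origin))
print("saddle, origin critical:", is_critical(grad_saddle, origin))
print("bowl, (6, 2) critical:  ", is_critical(grad_bowl, (6.0, 2.0)))

# At the saddle, h rises along x1 and falls along x2,
# so the origin is neither a minimum nor a maximum of h.
print("h rises along x1:", saddle((0.1, 0.0)) > saddle(origin))
print("h falls along x2:", saddle((0.0, 0.1)) < saddle(origin))
```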