# Gradients, Jacobians, and Hessians: Mathematical Notation

---

### **Gradient**
Consider a scalar-valued function $f(\text{X})$ of $n$ variables; that is, $f: \mathbb{R}^n \to \mathbb{R}$, where $\text{X}=(x_1,x_2,\dots,x_n)^T = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}_{n \times 1}$.

The **gradient** of $f(\text{X})$ is defined as:

$$
\nabla f(\text{X}) = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix}_{n \times 1}
$$

Where $\frac{\partial f}{\partial x_i}$ is the partial derivative of $f$ with respect to $x_i$ for $i = 1, 2, \dots, n$.

#### **Example**:

For a function $f(\text{X}) = x_1^2 + x_2^2 + \dots + x_n^2$, the gradient is:

$$
\nabla f(\text{X}) = \begin{pmatrix} 2x_1 \\ 2x_2 \\ \vdots \\ 2x_n \end{pmatrix}
$$
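The analytic gradient above can be sanity-checked numerically. The sketch below approximates each partial derivative with a central difference and compares the result with $2x_i$. The helper name `numerical_gradient`, the step size `h`, and the test point are assumptions made for illustration.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with central differences."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h   # nudge only the i-th coordinate up
        backward[i] -= h  # and down
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def sum_of_squares(x):
    """f(X) = x_1^2 + x_2^2 + ... + x_n^2."""
    return sum(xi ** 2 for xi in x)


point = [1.0, -2.0, 3.0]                  # arbitrary test point (assumption)
approx = numerical_gradient(sum_of_squares, point)
exact = [2 * xi for xi in point]          # analytic gradient: 2 * x_i

print("numerical:", approx)
print("analytic: ", exact)
```

The two printed vectors should agree up to floating-point rounding, since a central difference has no truncation error for a quadratic function.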

---

### **Hessian**

The **Hessian** matrix is the square matrix of second-order partial derivatives of a scalar-valued function $f: \mathbb{R}^n \to \mathbb{R}$ of a vector $\text{X}=(x_1,x_2,\dots,x_n)^T$. The Hessian matrix $H(\text{X})$ is defined as:

$$
H(\text{X}) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \dots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}
$$

Where each entry $\frac{\partial^2 f}{\partial x_i \partial x_j}$ is the second partial derivative of $f$ with respect to $x_i$ and $x_j$.

#### **Example**:

For a function $f(\text{X}) = x_1^2 + x_2^2$, the Hessian matrix is:

$$
H(\text{X}) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}
$$

Every entry is constant because $f$ is quadratic, and the off-diagonal entries are zero because $f$ contains no $x_1 x_2$ cross term.
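As a rough check of this example, the Hessian can also be approximated with second-order central differences. This is only a sketch: the helper name `numerical_hessian`, the step size `h`, and the evaluation point are assumptions, and the result is accurate only up to rounding error.

```python
def numerical_hessian(f, x, h=1e-4):
    """Approximate the Hessian of f at x with central differences."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f with coordinate i moved by di and coordinate j by dj.
                p = list(x)
                p[i] += di
                p[j] += dj
                return f(p)

            hess[i][j] = (
                shifted(h, h) - shifted(h, -h)
                - shifted(-h, h) + shifted(-h, -h)
            ) / (4 * h * h)
    return hess


def f(x):
    """f(X) = x_1^2 + x_2^2."""
    return x[0] ** 2 + x[1] ** 2


hessian = numerical_hessian(f, [1.0, 2.0])  # evaluation point is arbitrary
for row in hessian:
    print(row)
```

The printed rows should be close to `[2, 0]` and `[0, 2]` regardless of the point chosen, because the Hessian of this function is constant.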

---

### **Jacobian**

The **Jacobian** matrix generalizes the gradient to vector-valued functions. Consider a vector function $g: \mathbb{R}^n \to \mathbb{R}^m$ whose input $\text{X} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$ is an $n$-dimensional vector and whose output $\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}$ is a vector in $\mathbb{R}^m$.

That is, with $f_i$ denoting the $i$-th component function of $g$,

$$y_1=f_1(x_1,x_2, \dots, x_n)$$

$$y_2=f_2(x_1,x_2, \dots, x_n)$$

$$\vdots$$

$$y_m=f_m(x_1,x_2, \dots, x_n)$$


Then, the Jacobian matrix $J(\text{X})=[\frac{\partial y_i}{\partial x_j}]_{i=1,2,\cdots, m; j=1,2,\cdots, n}$ is defined as:

$$
J(\text{X}) = \begin{pmatrix} 
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \dots & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \dots & \frac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \dots & \frac{\partial y_m}{\partial x_n} 
\end{pmatrix}_{m \times n}
$$

Each entry $\frac{\partial y_i}{\partial x_j}$ is the partial derivative of the $i$-th output component $y_i$ with respect to the $j$-th input component $x_j$.

#### **Example**:

For a vector function $g(\text{X}) = \begin{pmatrix} x_1^2 + x_2 \\ x_1 x_2 \end{pmatrix}$, the Jacobian matrix is:

$$
J(\text{X}) = \begin{pmatrix}
2x_1 & 1 \\
x_2 & x_1
\end{pmatrix}
$$
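The same finite-difference idea extends to the Jacobian: perturbing the $j$-th input fills in the $j$-th column. The sketch below compares a numerical Jacobian of $g$ with the analytic matrix above. The helper names, the step size `h`, and the test point are illustrative assumptions.

```python
def numerical_jacobian(g, x, h=1e-6):
    """Approximate the m x n Jacobian of g at x with central differences."""
    m = len(g(x))
    n = len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(x)
        backward = list(x)
        forward[j] += h
        backward[j] -= h
        g_plus = g(forward)
        g_minus = g(backward)
        for i in range(m):
            # Entry (i, j) is the derivative of output i with respect to input j.
            jac[i][j] = (g_plus[i] - g_minus[i]) / (2 * h)
    return jac


def g(x):
    """g(X) = (x_1^2 + x_2, x_1 * x_2)."""
    x1, x2 = x
    return [x1 ** 2 + x2, x1 * x2]


def analytic_jacobian(x):
    """The Jacobian derived in the example above."""
    x1, x2 = x
    return [[2 * x1, 1.0], [x2, x1]]


point = [3.0, -1.0]  # arbitrary test point (assumption)
approx = numerical_jacobian(g, point)
exact = analytic_jacobian(point)
print("numerical:", approx)
print("analytic: ", exact)
```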

---


### **Summary**

- The **gradient** is a vector $(n \times 1)$ of first-order partial derivatives of a scalar-valued function $f(x_1,x_2,\dots,x_n)$.

- The **Hessian** is a square matrix $(n \times n)$ of second-order partial derivatives of a scalar-valued function $f(x_1,x_2,\dots,x_n)$.

- The **Jacobian** is an $m \times n$ matrix of first-order partial derivatives of a vector-valued function $\mathbf{y}=g(\text{X})$.

---

### Gradients, Jacobians, and Hessians Example (2x2 Matrix)

### 1. Gradient

The **gradient** of a scalar function $ f(x, y) $ is a vector that contains the partial derivatives of the function with respect to each variable.

Consider the function:

$$
f(x, y) = x^2 + 3xy + y^2
$$

The **gradient** $ \nabla f(x, y) $ is given by:

$$
\nabla f(x, y) = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}
$$

Now, we compute the partial derivatives:

1. **Partial derivative with respect to $ x $:**

$$
\frac{\partial f}{\partial x} = 2x + 3y
$$

2. **Partial derivative with respect to $ y $:**

$$
\frac{\partial f}{\partial y} = 3x + 2y
$$

Thus, the gradient is:

$$
\nabla f(x, y) = \begin{bmatrix} 2x + 3y \\ 3x + 2y \end{bmatrix}
$$

If we want to evaluate the gradient at the point $ (x, y) = (1, 2) $:

$$
\nabla f(1, 2) = \begin{bmatrix} 2(1) + 3(2) \\ 3(1) + 2(2) \end{bmatrix} = \begin{bmatrix} 2 + 6 \\ 3 + 4 \end{bmatrix} = \begin{bmatrix} 8 \\ 7 \end{bmatrix}
$$

---

### 2. Jacobian

The **Jacobian matrix** is the matrix of all first-order partial derivatives of a vector-valued function. If you have a vector function $ \mathbf{F}(x, y) = \begin{bmatrix} f_1(x, y) \\ f_2(x, y) \end{bmatrix} $, the Jacobian matrix $ \mathbf{J}(x, y) $ is defined as:

$$
\mathbf{J}(x, y) = \begin{bmatrix} \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} \end{bmatrix}
$$

Consider the vector function:

$$
\mathbf{F}(x, y) = \begin{bmatrix} f_1(x, y) \\ f_2(x, y) \end{bmatrix} = \begin{bmatrix} x^2 + 3xy \\ 2xy + y^2 \end{bmatrix}
$$

Now, we compute the partial derivatives:

1. For $ f_1(x, y) = x^2 + 3xy $:

   - $\frac{\partial f_1}{\partial x} = 2x + 3y$
   - $\frac{\partial f_1}{\partial y} = 3x$

2. For $ f_2(x, y) = 2xy + y^2 $:

   - $\frac{\partial f_2}{\partial x} = 2y$
   - $\frac{\partial f_2}{\partial y} = 2x + 2y$

Thus, the **Jacobian matrix** is:

$$
\mathbf{J}(x, y) = \begin{bmatrix} 2x + 3y & 3x \\ 2y & 2x + 2y \end{bmatrix}
$$

If we evaluate the Jacobian at $ (x, y) = (1, 2) $:

$$
\mathbf{J}(1, 2) = \begin{bmatrix} 2(1) + 3(2) & 3(1) \\ 2(2) & 2(1) + 2(2) \end{bmatrix} = \begin{bmatrix} 2 + 6 & 3 \\ 4 & 2 + 4 \end{bmatrix} = \begin{bmatrix} 8 & 3 \\ 4 & 6 \end{bmatrix}
$$
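One way to see what this matrix means is to use it as a local linear model: for a small displacement $(dx, dy)$, $\mathbf{F}(1 + dx, 2 + dy) \approx \mathbf{F}(1, 2) + \mathbf{J}(1, 2) \begin{bmatrix} dx \\ dy \end{bmatrix}$. The sketch below evaluates the analytic Jacobian and compares the linear model with the true value. The displacement `(0.01, 0.02)` and the function names are assumptions chosen for illustration.

```python
def F(x, y):
    """F(x, y) = (x^2 + 3xy, 2xy + y^2)."""
    return [x ** 2 + 3 * x * y, 2 * x * y + y ** 2]


def jacobian_F(x, y):
    """Analytic Jacobian from the partial derivatives computed above."""
    return [[2 * x + 3 * y, 3 * x],
            [2 * y, 2 * x + 2 * y]]


J = jacobian_F(1, 2)
print(J)  # [[8, 3], [4, 6]]

dx, dy = 0.01, 0.02          # small displacement (assumption)
base = F(1, 2)
linear = [base[i] + J[i][0] * dx + J[i][1] * dy for i in range(2)]
exact = F(1 + dx, 2 + dy)

print("linear model:", linear)
print("true value:  ", exact)
```

The two vectors differ only by terms of second order in the displacement, so the gap shrinks quickly as `dx` and `dy` get smaller.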

---

### 3. Hessian

The **Hessian matrix** is the square matrix of second-order partial derivatives of a scalar function; its off-diagonal entries are the mixed partial derivatives. If $ f(x, y) $ is a scalar function, the Hessian matrix $ H(x, y) $ is defined as:

$$
H(x, y) = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix}
$$

For the function:

$$
f(x, y) = x^2 + 3xy + y^2
$$

We compute the second-order partial derivatives:

1. **Second derivative with respect to $ x $:**

$$
\frac{\partial^2 f}{\partial x^2} = 2
$$

2. **Mixed partial derivative $ \frac{\partial^2 f}{\partial x \partial y} $:**

$$
\frac{\partial^2 f}{\partial x \partial y} = 3
$$

3. **Mixed partial derivative $ \frac{\partial^2 f}{\partial y \partial x} $:**

$$
\frac{\partial^2 f}{\partial y \partial x} = 3
$$

4. **Second derivative with respect to $ y $:**

$$
\frac{\partial^2 f}{\partial y^2} = 2
$$

Thus, the **Hessian matrix** is:

$$
H(x, y) = \begin{bmatrix} 2 & 3 \\ 3 & 2 \end{bmatrix}
$$

---

### Conclusion

We computed the **gradient**, **Jacobian**, and **Hessian** for the scalar and vector-valued functions.

- The **gradient** for $ f(x, y) = x^2 + 3xy + y^2 $ at $ (x, y) = (1, 2) $ is:

 $$
  \nabla f(1, 2) = \begin{bmatrix} 8 \\ 7 \end{bmatrix}
 $$

- The **Jacobian** for $ \mathbf{F}(x, y) = \begin{bmatrix} x^2 + 3xy \\ 2xy + y^2 \end{bmatrix} $ at $ (x, y) = (1, 2) $ is:

 $$
  \mathbf{J}(1, 2) = \begin{bmatrix} 8 & 3 \\ 4 & 6 \end{bmatrix}
 $$

- The **Hessian** for $ f(x, y) = x^2 + 3xy + y^2 $ is:

 $$
  H(x, y) = \begin{bmatrix} 2 & 3 \\ 3 & 2 \end{bmatrix}
 $$
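The three objects are connected: the Hessian of $f$ is the Jacobian of its gradient. The sketch below uses that fact to recover the Hessian by differentiating the analytic gradient numerically. The helper name `jacobian_of` and the step size `h` are assumptions made for this illustration.

```python
def grad_f(p):
    """Analytic gradient of f(x, y) = x^2 + 3xy + y^2."""
    x, y = p
    return [2 * x + 3 * y, 3 * x + 2 * y]


def jacobian_of(func, p, h=1e-6):
    """Central-difference Jacobian of a vector-valued function at p."""
    m = len(func(p))
    n = len(p)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(p)
        minus = list(p)
        plus[j] += h
        minus[j] -= h
        f_plus = func(plus)
        f_minus = func(minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


gradient_at_point = grad_f([1.0, 2.0])
hessian = jacobian_of(grad_f, [1.0, 2.0])  # Jacobian of the gradient

print("gradient:", gradient_at_point)
for row in hessian:
    print(row)
```

The rows printed for `hessian` should be close to `[2, 3]` and `[3, 2]`, matching the matrix computed by hand.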

---

# First and Second-Order Approximation (Newton's Method)

---

### **First-Order Approximation**

A first-order approximation of a function $f(\mathbf{x})$ at a point $\mathbf{x_0}$ uses the **tangent line** to the function at $\mathbf{x_0}$ (the tangent plane or hyperplane when $f$ has several variables). This is the **linear approximation** of the function, built from the first derivative (the gradient for multivariable functions).

For a scalar function $f: \mathbb{R}^n \to \mathbb{R}$, the first-order approximation around a point $\mathbf{x_0}$ is:

$$
f(\mathbf{x}) \approx f(\mathbf{x_0}) + \nabla f(\mathbf{x_0})^T (\mathbf{x} - \mathbf{x_0})
$$

Where:
- $f(\mathbf{x_0})$ is the value of the function at $\mathbf{x_0}$.
- $\nabla f(\mathbf{x_0})$ is the gradient (first derivative) of the function at $\mathbf{x_0}$.
- $(\mathbf{x} - \mathbf{x_0})$ is the displacement vector from $\mathbf{x_0}$ to $\mathbf{x}$.
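The formula translates almost directly into code. The sketch below is one possible implementation, with the hypothetical helper name `linear_approximation`. It reuses $f(x, y) = x^2 + 3xy + y^2$ and its gradient $(8, 7)$ at $(1, 2)$ from the earlier worked example; the query point $(1.1, 1.9)$ is an assumption.

```python
def linear_approximation(f_x0, grad_x0, x0, x):
    """Evaluate f(x0) + grad f(x0)^T (x - x0)."""
    displacement = [xi - x0i for xi, x0i in zip(x, x0)]
    return f_x0 + sum(g * d for g, d in zip(grad_x0, displacement))


def f(x, y):
    return x ** 2 + 3 * x * y + y ** 2


x0 = [1.0, 2.0]
f_x0 = f(*x0)            # value of f at the expansion point
grad_x0 = [8.0, 7.0]     # gradient at (1, 2) from the earlier example

query = [1.1, 1.9]       # nearby point (assumption)
approx = linear_approximation(f_x0, grad_x0, x0, query)
true_value = f(*query)

print("linear approximation:", approx)
print("true value:          ", true_value)
```

The approximation is only expected to be good near $\mathbf{x_0}$; moving the query point further away makes the gap grow.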

---

#### **Example**:

Consider the simple polynomial function $f(x) = x^2 + 2x + 1$. 

To find the first-order approximation around $\mathbf{x_0} = 1$:

1. Compute $f(\mathbf{x_0})$:

$$
f(1) = 1^2 + 2(1) + 1 = 4
$$

2. Compute the gradient (first derivative):

$$
\nabla f(x) = 2x + 2 \quad \text{so} \quad \nabla f(1) = 2(1) + 2 = 4
$$

3. The first-order approximation of $f(x)$ around $\mathbf{x_0} = 1$ is:

$$
f(x) \approx 4 + 4(x - 1)
$$

Thus, the first-order approximation is a linear function: $f(x) \approx 4 + 4x - 4 = 4x$.
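To get a feel for how good this linear approximation is, the sketch below measures the error $f(x) - 4x$ at a few points near $x_0 = 1$. For this function the error works out to $(x - 1)^2$, so it shrinks quadratically with the distance from $x_0$. The step values are arbitrary choices for illustration.

```python
def f(x):
    """f(x) = x^2 + 2x + 1."""
    return x ** 2 + 2 * x + 1


def tangent(x):
    """First-order approximation around x0 = 1."""
    return 4 + 4 * (x - 1)


steps = [0.5, 0.1, 0.01]   # distances from x0 (assumption)
errors = []
for step in steps:
    x = 1 + step
    error = f(x) - tangent(x)
    errors.append(error)
    print(f"x = {x:<5} f(x) = {f(x):.6f}  approx = {tangent(x):.6f}  error = {error:.6f}")
```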

---

### **Second-Order Approximation (Newton's Method)**

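The second-order approximation provides a better local approximation by using both the first and second derivatives of the function. In the multivariable case it involves the **Hessian matrix** (the matrix of second-order partial derivatives). **Newton's method** for optimization is built on this approximation: each iteration moves to the stationary point of the quadratic model around the current point.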

For a scalar function $f: \mathbb{R}^n \to \mathbb{R}$, the second-order approximation around a point $\mathbf{x_0}$ is:

$$
f(\mathbf{x}) \approx f(\mathbf{x_0}) + \nabla f(\mathbf{x_0})^T (\mathbf{x} - \mathbf{x_0}) + \frac{1}{2} (\mathbf{x} - \mathbf{x_0})^T H(\mathbf{x_0}) (\mathbf{x} - \mathbf{x_0})
$$

Where:
- $H(\mathbf{x_0})$ is the Hessian matrix (matrix of second derivatives) evaluated at $\mathbf{x_0}$.
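A minimal sketch of this formula in code is shown below, using the hypothetical helper name `quadratic_approximation`. It reuses $f(x, y) = x^2 + 3xy + y^2$ with the gradient $(8, 7)$ and Hessian $\begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix}$ at $(1, 2)$ from the earlier worked example; the query point is an assumption.

```python
def quadratic_approximation(f_x0, grad_x0, hess_x0, x0, x):
    """Evaluate f(x0) + g^T d + 0.5 * d^T H d with d = x - x0."""
    d = [xi - x0i for xi, x0i in zip(x, x0)]
    n = len(d)
    linear_term = sum(grad_x0[i] * d[i] for i in range(n))
    quadratic_term = sum(
        d[i] * hess_x0[i][j] * d[j] for i in range(n) for j in range(n)
    )
    return f_x0 + linear_term + 0.5 * quadratic_term


def f(x, y):
    return x ** 2 + 3 * x * y + y ** 2


x0 = [1.0, 2.0]
grad_x0 = [8.0, 7.0]                 # from the earlier gradient example
hess_x0 = [[2.0, 3.0], [3.0, 2.0]]   # from the earlier Hessian example

query = [1.1, 1.9]                   # nearby point (assumption)
approx = quadratic_approximation(f(*x0), grad_x0, hess_x0, x0, query)
true_value = f(*query)

print("second-order approximation:", approx)
print("true value:                ", true_value)
```

Because this particular $f$ is a quadratic polynomial, the two values should match up to rounding. For a general function the second-order model is only a local approximation.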

---

#### **Example**:

Consider the same polynomial function $f(x) = x^2 + 2x + 1$.

To find the second-order approximation around $\mathbf{x_0} = 1$:

1. Compute $f(\mathbf{x_0})$ (as before):

$$
f(1) = 4
$$

2. Compute the gradient (first derivative) $\nabla f(x)$ and evaluate at $x_0 = 1$:

$$
\nabla f(x) = 2x + 2 \quad \text{so} \quad \nabla f(1) = 4
$$

3. Compute the second derivative (Hessian) and evaluate at $x_0 = 1$:

$$
H(x) = 2 \quad \text{so} \quad H(1) = 2
$$

4. The second-order approximation of $f(x)$ around $\mathbf{x_0} = 1$ is:

$$
f(x) \approx 4 + 4(x - 1) + \frac{1}{2} (x - 1)^2 \cdot 2
$$

Simplifying:

$$
f(x) \approx 4 + 4(x - 1) + (x - 1)^2
$$

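This is the second-order approximation of the function. Because $f$ is itself a quadratic polynomial, the approximation is exact: expanding gives $4 + 4x - 4 + x^2 - 2x + 1 = x^2 + 2x + 1$.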

---



# Second-Order Approximation with Newton's Method: Examples

---

### **Example 1: Second-Order Approximation for a Two-Variable Function**

Consider the function:

$$
f(x_1, x_2) = x_1^2 + x_2^2 + 2x_1x_2 + 3x_1 + 4x_2
$$

We will compute the **second-order approximation** around the point $\mathbf{x_0} = (1, 1)$.

#### **Step 1: Compute the Gradient**

The gradient of $f(x_1, x_2)$ is the vector of partial derivatives with respect to $x_1$ and $x_2$:

$$
\nabla f(x_1, x_2) = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \end{pmatrix}
$$

Compute each partial derivative:

$$
\frac{\partial f}{\partial x_1} = 2x_1 + 2x_2 + 3, \quad \frac{\partial f}{\partial x_2} = 2x_2 + 2x_1 + 4
$$

Thus, the gradient is:

$$
\nabla f(x_1, x_2) = \begin{pmatrix} 2x_1 + 2x_2 + 3 \\ 2x_2 + 2x_1 + 4 \end{pmatrix}
$$

Evaluate the gradient at $\mathbf{x_0} = (1, 1)$:

$$
\nabla f(1, 1) = \begin{pmatrix} 2(1) + 2(1) + 3 \\ 2(1) + 2(1) + 4 \end{pmatrix} = \begin{pmatrix} 7 \\ 8 \end{pmatrix}
$$

#### **Step 2: Compute the Hessian Matrix**

The Hessian matrix $H(x_1, x_2)$ is the matrix of second-order partial derivatives:

$$
H(x_1, x_2) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} \end{pmatrix}
$$

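Compute each second-order partial derivative (the two mixed partials are equal, so $\frac{\partial^2 f}{\partial x_2 \partial x_1}$ is not listed separately):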

$$
\frac{\partial^2 f}{\partial x_1^2} = 2, \quad \frac{\partial^2 f}{\partial x_1 \partial x_2} = 2, \quad \frac{\partial^2 f}{\partial x_2^2} = 2
$$

Thus, the Hessian matrix is:

$$
H(x_1, x_2) = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}
$$

Evaluate the Hessian matrix at $\mathbf{x_0} = (1, 1)$ (it is constant, so the value remains the same):

$$
H(1, 1) = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}
$$

#### **Step 3: Second-Order Approximation**

The second-order approximation of the function $f(x_1, x_2)$ around $\mathbf{x_0} = (1, 1)$ is given by:

$$
f(x_1, x_2) \approx f(1, 1) + \nabla f(1, 1)^T \begin{pmatrix} x_1 - 1 \\ x_2 - 1 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} x_1 - 1 & x_2 - 1 \end{pmatrix} H(1, 1) \begin{pmatrix} x_1 - 1 \\ x_2 - 1 \end{pmatrix}
$$

First, compute $f(1, 1)$:

$$
f(1, 1) = 1^2 + 1^2 + 2(1)(1) + 3(1) + 4(1) = 11
$$

The second-order approximation becomes:

$$
f(x_1, x_2) \approx 11 + \begin{pmatrix} 7 & 8 \end{pmatrix} \begin{pmatrix} x_1 - 1 \\ x_2 - 1 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} x_1 - 1 & x_2 - 1 \end{pmatrix} \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix} \begin{pmatrix} x_1 - 1 \\ x_2 - 1 \end{pmatrix}
$$

Simplifying:

$$
f(x_1, x_2) \approx 11 + 7(x_1 - 1) + 8(x_2 - 1) + (x_1 - 1)^2 + 2(x_1 - 1)(x_2 - 1) + (x_2 - 1)^2
$$
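Since $f$ is a quadratic polynomial, this second-order approximation should reproduce $f$ exactly, not just near $(1, 1)$. The sketch below checks that claim at a few points. The test points and function names are arbitrary choices for illustration.

```python
def f(x1, x2):
    """Original function from Example 1."""
    return x1 ** 2 + x2 ** 2 + 2 * x1 * x2 + 3 * x1 + 4 * x2


def second_order_model(x1, x2):
    """Simplified second-order approximation around (1, 1)."""
    a = x1 - 1
    b = x2 - 1
    return 11 + 7 * a + 8 * b + a ** 2 + 2 * a * b + b ** 2


test_points = [(1, 1), (0, 0), (2, -3), (0.5, 1.5)]  # arbitrary (assumption)
differences = []
for x1, x2 in test_points:
    diff = f(x1, x2) - second_order_model(x1, x2)
    differences.append(diff)
    print(f"({x1}, {x2}): f = {f(x1, x2)}, model = {second_order_model(x1, x2)}")
```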

---

### **Example 2: Second-Order Approximation for a Three-Variable Function**

Consider the function:

$$
f(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 + 2x_1x_2 + 3x_2x_3 + 4x_1 + 5x_2 + 6x_3
$$

We will compute the **second-order approximation** around the point $\mathbf{x_0} = (1, 1, 1)$.

#### **Step 1: Compute the Gradient**

The gradient of $f(x_1, x_2, x_3)$ is the vector of partial derivatives with respect to $x_1$, $x_2$, and $x_3$:

$$
\nabla f(x_1, x_2, x_3) = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \frac{\partial f}{\partial x_3} \end{pmatrix}
$$

Compute each partial derivative:

$$
\frac{\partial f}{\partial x_1} = 2x_1 + 2x_2 + 4, \quad \frac{\partial f}{\partial x_2} = 2x_2 + 2x_1 + 3x_3 + 5, \quad \frac{\partial f}{\partial x_3} = 2x_3 + 3x_2 + 6
$$

Thus, the gradient is:

$$
\nabla f(x_1, x_2, x_3) = \begin{pmatrix} 2x_1 + 2x_2 + 4 \\ 2x_2 + 2x_1 + 3x_3 + 5 \\ 2x_3 + 3x_2 + 6 \end{pmatrix}
$$

Evaluate the gradient at $\mathbf{x_0} = (1, 1, 1)$:

$$
\nabla f(1, 1, 1) = \begin{pmatrix} 2(1) + 2(1) + 4 \\ 2(1) + 2(1) + 3(1) + 5 \\ 2(1) + 3(1) + 6 \end{pmatrix} = \begin{pmatrix} 8 \\ 12 \\ 11 \end{pmatrix}
$$

#### **Step 2: Compute the Hessian Matrix**

The Hessian matrix $H(x_1, x_2, x_3)$ is the matrix of second-order partial derivatives:

$$
H(x_1, x_2, x_3) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \frac{\partial^2 f}{\partial x_1 \partial x_3} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \frac{\partial^2 f}{\partial x_2 \partial x_3} \\ \frac{\partial^2 f}{\partial x_3 \partial x_1} & \frac{\partial^2 f}{\partial x_3 \partial x_2} & \frac{\partial^2 f}{\partial x_3^2} \end{pmatrix}
$$

Compute each second-order partial derivative (the Hessian is symmetric, so only the entries on and above the diagonal are listed):

$$
\frac{\partial^2 f}{\partial x_1^2} = 2, \quad \frac{\partial^2 f}{\partial x_1 \partial x_2} = 2, \quad \frac{\partial^2 f}{\partial x_1 \partial x_3} = 0
$$

$$
\frac{\partial^2 f}{\partial x_2^2} = 2, \quad \frac{\partial^2 f}{\partial x_2 \partial x_3} = 3, \quad \frac{\partial^2 f}{\partial x_3^2} = 2
$$

Thus, the Hessian matrix is:

$$
H(x_1, x_2, x_3) = \begin{pmatrix} 2 & 2 & 0 \\ 2 & 2 & 3 \\ 0 & 3 & 2 \end{pmatrix}
$$

Evaluate the Hessian matrix at $\mathbf{x_0} = (1, 1, 1)$ (it is constant, so the value remains the same):

$$
H(1, 1, 1) = \begin{pmatrix} 2 & 2 & 0 \\ 2 & 2 & 3 \\ 0 & 3 & 2 \end{pmatrix}
$$

#### **Step 3: Second-Order Approximation**

The second-order approximation of the function $f(x_1, x_2, x_3)$ around $\mathbf{x_0} = (1, 1, 1)$ is given by:

$$
f(x_1, x_2, x_3) \approx f(1, 1, 1) + \nabla f(1, 1, 1)^T \begin{pmatrix} x_1 - 1 \\ x_2 - 1 \\ x_3 - 1 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} x_1 - 1 & x_2 - 1 & x_3 - 1 \end{pmatrix} H(1, 1, 1) \begin{pmatrix} x_1 - 1 \\ x_2 - 1 \\ x_3 - 1 \end{pmatrix}
$$

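First, compute $f(1, 1, 1)$:

$$
f(1, 1, 1) = 1 + 1 + 1 + 2 + 3 + 4 + 5 + 6 = 23
$$

Substituting the gradient and Hessian and simplifying as in the previous example gives:

$$
f(x_1, x_2, x_3) \approx 23 + 8(x_1 - 1) + 12(x_2 - 1) + 11(x_3 - 1) + (x_1 - 1)^2 + (x_2 - 1)^2 + (x_3 - 1)^2 + 2(x_1 - 1)(x_2 - 1) + 3(x_2 - 1)(x_3 - 1)
$$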
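The matrix form in Step 3 can also be evaluated directly, without expanding it by hand. The sketch below builds the second-order model from the gradient and Hessian found in Steps 1 and 2 and compares it with $f$ at a few points. The test points and helper names are assumptions for illustration.

```python
def f(x):
    """Original function from Example 2."""
    x1, x2, x3 = x
    return (x1 ** 2 + x2 ** 2 + x3 ** 2 + 2 * x1 * x2 + 3 * x2 * x3
            + 4 * x1 + 5 * x2 + 6 * x3)


X0 = [1.0, 1.0, 1.0]
GRAD = [8.0, 12.0, 11.0]           # gradient at (1, 1, 1) from Step 1
HESS = [[2.0, 2.0, 0.0],
        [2.0, 2.0, 3.0],
        [0.0, 3.0, 2.0]]           # Hessian from Step 2


def second_order_model(x):
    """f(x0) + g^T d + 0.5 * d^T H d with d = x - x0."""
    d = [xi - x0i for xi, x0i in zip(x, X0)]
    linear = sum(GRAD[i] * d[i] for i in range(3))
    quadratic = sum(d[i] * HESS[i][j] * d[j] for i in range(3) for j in range(3))
    return f(X0) + linear + 0.5 * quadratic


test_points = [[0.0, 0.0, 0.0], [2.0, -1.0, 0.5], [1.0, 1.0, 1.0]]  # arbitrary
differences = [f(p) - second_order_model(p) for p in test_points]
for p, diff in zip(test_points, differences):
    print(p, "f =", f(p), "model =", second_order_model(p), "difference =", diff)
```

As in Example 1, the differences should be zero up to rounding because $f$ is quadratic.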

---