# 1. What a Partial Derivative Really Is

A **partial derivative** measures the rate of change of a multivariable function **along a single coordinate direction**, while **all other variables are held constant**.

For a function

$$
f:\mathbb{R}^n \rightarrow \mathbb{R},
$$

the partial derivative with respect to $$x_i$$ at a point $$a$$ answers:

**“If I move infinitesimally only along the $$x_i$$-axis, what is the local slope of $$f$$?”**

It is a **coordinate-restricted derivative**, not a complete description of change.

---

# 2. Formal Definition (Axis-Aligned Limit)

The partial derivative of $$f$$ at $$a$$ with respect to $$x_i$$ is defined as

$$
\frac{\partial f}{\partial x_i}(a)
=
\lim_{h \to 0}
\frac{f(a + h e_i) - f(a)}{h},
$$

where $$e_i$$ is the $$i$$-th standard basis vector (1 in position $$i$$, 0 elsewhere). The partial derivative exists exactly when this limit exists.

Key implications:

- only one coordinate is perturbed  
- all other coordinates are frozen  
- the derivative probes **one specific direction**  
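The limit definition translates directly into a finite-difference approximation. This is a minimal sketch that assumes a small fixed step `h` can stand in for the limit. `partial` is a hypothetical helper name, and the test function `f` was chosen only because its exact partials are easy to check by hand.

```python
from typing import Callable, Sequence


def partial(f: Callable[[Sequence[float]], float], a: Sequence[float], i: int, h: float = 1e-6) -> float:
    """Approximate ∂f/∂x_i at a with the forward difference quotient from the definition."""
    shifted = list(a)
    shifted[i] += h  # a + h * e_i: only coordinate i moves, all others stay frozen
    return (f(shifted) - f(a)) / h


def f(p: Sequence[float]) -> float:
    x, y = p
    return x**2 * y + 3 * y  # exact partials: ∂f/∂x = 2xy, ∂f/∂y = x^2 + 3


a = [1.0, 2.0]
print(partial(f, a, 0))  # close to 2 * 1 * 2 = 4
print(partial(f, a, 1))  # close to 1**2 + 3 = 4
```

A forward difference has error roughly proportional to `h`. A central difference, `(f(a + h e_i) - f(a - h e_i)) / (2h)`, is usually more accurate at the cost of one extra evaluation.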

---

# 3. Geometric Interpretation

Geometrically, a partial derivative is:

- the slope of the tangent line to the curve obtained by **slicing** the graph with a plane that holds every other variable fixed  
- the directional derivative along a **coordinate axis**  
- one tangent direction among infinitely many  

For a surface $$z = f(x,y)$$:

- $$\partial f/\partial x$$ → slope in the $$x$$-direction (parallel to the $$xz$$-plane)  
- $$\partial f/\partial y$$ → slope in the $$y$$-direction (parallel to the $$yz$$-plane)  

Partial derivatives describe how the surface tilts **along coordinate grid lines**.
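You can make slicing concrete by freezing one variable. That leaves an ordinary one-variable function, and its ordinary derivative is the partial derivative. This is a minimal sketch that assumes the example surface `f(x, y) = x**2 + x*y`, chosen for illustration, and uses a central-difference step in place of the limit.

```python
def f(x: float, y: float) -> float:
    return x**2 + x * y  # example surface z = f(x, y); exact ∂f/∂x = 2x + y, ∂f/∂y = x


def derivative(g, t: float, h: float = 1e-5) -> float:
    """Central-difference derivative of a one-variable function g at t."""
    return (g(t + h) - g(t - h)) / (2 * h)


a, b = 1.0, 3.0


def slice_x(x: float) -> float:
    return f(x, b)  # curve cut out by the plane y = b (parallel to the xz-plane)


def slice_y(y: float) -> float:
    return f(a, y)  # curve cut out by the plane x = a (parallel to the yz-plane)


print(derivative(slice_x, a))  # ≈ 5, which is ∂f/∂x(1, 3) = 2*1 + 3
print(derivative(slice_y, b))  # ≈ 1, which is ∂f/∂y(1, 3) = 1
```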

---

# 4. Partial Derivative vs. Directional Derivative

A partial derivative is a **special case** of the directional derivative:

$$
\frac{\partial f}{\partial x_i}(a) = D_{e_i} f(a).
$$

Hierarchy:

- directional derivative → any direction  
- partial derivative → coordinate-axis direction only  

Thus:

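- partial derivatives are **directionally incomplete**: they probe only the $$n$$ axis directions and, on their own, do not determine the slope in any other direction  
- they depend on the chosen coordinate system  

When $$f$$ is differentiable at $$a$$, the partials do determine every directional derivative through $$D_u f(a) = \nabla f(a) \cdot u$$. Without differentiability, this formula can fail.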
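Numerically, a partial derivative is simply a directional derivative with $$u = e_i$$. This sketch assumes the differentiable example `f(x, y) = x**2 * y`, for which $$D_u f(a) = \nabla f(a) \cdot u$$ holds. It uses a central difference as a stand-in for the limit, and `directional` is a hypothetical helper.

```python
from typing import Callable, Sequence


def f(p: Sequence[float]) -> float:
    x, y = p
    return x**2 * y  # differentiable everywhere; gradient is (2xy, x^2)


def directional(func: Callable[[Sequence[float]], float], a: Sequence[float],
                u: Sequence[float], h: float = 1e-6) -> float:
    """Central-difference estimate of D_u func(a) for a unit vector u."""
    plus = [ai + h * ui for ai, ui in zip(a, u)]
    minus = [ai - h * ui for ai, ui in zip(a, u)]
    return (func(plus) - func(minus)) / (2 * h)


a = [1.0, 2.0]
grad = [2 * a[0] * a[1], a[0] ** 2]  # exact gradient at a: (4, 1)
u = [0.6, 0.8]                       # unit vector not aligned with either axis

print(directional(f, a, [1.0, 0.0]))          # ≈ 4: D_{e_1} f is the partial ∂f/∂x
print(directional(f, a, u))                   # ≈ 3.2
print(sum(g * ui for g, ui in zip(grad, u)))  # ∇f · u = 4*0.6 + 1*0.8 ≈ 3.2
```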

---

# 5. Partial Derivatives vs. Total Derivative (Critical Distinction)

| Concept | What varies | What it captures |
|---|---|---|
| Partial derivative | One variable | Axis-local slope |
| Total derivative | All variables | True local change |

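A partial derivative treats the other variables as frozen and independent, so it **ignores dependencies** among them.

This is why $$\partial f/\partial x$$ is **not** the true rate of change of $$f$$ when $$y = y(x)$$. The chain rule adds the contribution that flows through $$y$$:

$$
\frac{d}{dx} f\big(x, y(x)\big)
=
\frac{\partial f}{\partial x}
+
\frac{\partial f}{\partial y}\,\frac{dy}{dx}.
$$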

Total derivatives are required for **constrained or coupled motion**.
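Here is a minimal numerical sketch of the difference. It assumes the illustrative choices `f(x, y) = x*y` and the dependency `y(x) = x**2`. Holding `y` fixed gives the partial derivative. Letting `y` move with `x` gives the total rate of change, which matches the chain rule $$\frac{d}{dx} f(x, y(x)) = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx}$$.

```python
def f(x: float, y: float) -> float:
    return x * y


def y_of(x: float) -> float:
    return x**2  # the dependency y = y(x)


def deriv(g, t: float, h: float = 1e-6) -> float:
    """Central-difference derivative of a one-variable function g at t."""
    return (g(t + h) - g(t - h)) / (2 * h)


x0 = 2.0
y0 = y_of(x0)  # 4.0

partial_x = deriv(lambda x: f(x, y0), x0)   # y frozen at y0: ≈ y0 = 4
partial_y = deriv(lambda y: f(x0, y), y0)   # x frozen at x0: ≈ x0 = 2
dy_dx = deriv(y_of, x0)                     # ≈ 2 * x0 = 4
total = deriv(lambda x: f(x, y_of(x)), x0)  # y moves with x: f = x**3, so ≈ 3 * x0**2 = 12

print(partial_x)                      # ≈ 4: misses the dependency through y
print(total)                          # ≈ 12: the true rate of change
print(partial_x + partial_y * dy_dx)  # ≈ 12: the chain rule reproduces it
```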

---

# 6. Partial Derivatives Do Not Guarantee Differentiability

A crucial fact:

> A function may have **all partial derivatives** at a point and still **not be differentiable** there.

Why?

- partial derivatives test slopes only along axes  
- differentiability requires a **single linear approximation** valid in **all directions**  

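A sufficient (but not necessary) condition for differentiability at $$a$$:

- all partial derivatives exist in a neighborhood of $$a$$  
- they are continuous at $$a$$ (in particular, every $$C^1$$ function is differentiable)  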
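The standard counterexample is $$f(x,y) = \frac{xy}{x^2+y^2}$$ with $$f(0,0) = 0$$. Both partials at the origin exist and equal 0. Along the diagonal, however, the function is constantly $$\tfrac12$$, so $$f$$ is not even continuous at the origin. The sketch below is a numerical illustration, not a proof. It evaluates the axis difference quotients and the diagonal values for a few step sizes.

```python
def f(x: float, y: float) -> float:
    """xy / (x^2 + y^2), extended by f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)


for h in (1e-1, 1e-3, 1e-6):
    q_x = (f(h, 0.0) - f(0.0, 0.0)) / h  # difference quotient along the x-axis
    q_y = (f(0.0, h) - f(0.0, 0.0)) / h  # difference quotient along the y-axis
    print(h, q_x, q_y, f(h, h))          # f(h, h) is the value on the diagonal
# Both axis quotients are 0 for every h, so both partials at (0, 0) exist and equal 0.
# Yet f(h, h) = 1/2 for every h != 0, so f is not even continuous at (0, 0),
# and no single linear approximation can be valid there.
```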

---

# 7. Partial Derivatives as Building Blocks

Partial derivatives are **atomic components**, not complete geometric objects.

They assemble into higher-level structures:

## 7.1 Gradient (scalar output)

$$
\nabla f
=
\left(
\frac{\partial f}{\partial x_1},
\ldots,
\frac{\partial f}{\partial x_n}
\right).
$$

## 7.2 Jacobian (vector output)

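For $$f = (f_1, \ldots, f_m) : \mathbb{R}^n \rightarrow \mathbb{R}^m$$:

$$
J_f
=
\left[
\frac{\partial f_i}{\partial x_j}
\right]_{1 \le i \le m,\; 1 \le j \le n}
\in \mathbb{R}^{m \times n}.
$$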

## 7.3 Hessian (second order)

$$
H_f
=
\left[
\frac{\partial^2 f}{\partial x_i \partial x_j}
\right].
$$

Partial derivatives are the **components**. Gradients and Jacobians assemble them into the linear map that encodes the local **geometry**.
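To see partial derivatives as building blocks, the sketch below assembles a Jacobian one column at a time. Each column comes from a single-coordinate perturbation. It assumes central differences can stand in for the exact partials. `jacobian` and the example maps `F` and `f` are hypothetical, chosen so the exact answers are easy to check. For a scalar function, the Jacobian has a single row, and that row is the gradient.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def jacobian(F: Callable[[Sequence[float]], Vector], a: Sequence[float], h: float = 1e-6) -> List[Vector]:
    """Finite-difference Jacobian: entry [i][j] approximates ∂F_i/∂x_j at a."""
    n = len(a)
    m = len(F(a))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(a), list(a)
        plus[j] += h   # perturb only coordinate j
        minus[j] -= h
        Fp, Fm = F(plus), F(minus)
        for i in range(m):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)  # column j holds every output's partial w.r.t. x_j
    return J


def F(p: Sequence[float]) -> Vector:
    x, y = p
    return [x * y, x + y**2]  # exact Jacobian: [[y, x], [1, 2y]]


def f(p: Sequence[float]) -> float:
    x, y = p
    return x**2 + 3 * y  # exact gradient: (2x, 3)


print(jacobian(F, [2.0, 3.0]))                    # ≈ [[3, 2], [1, 6]]
print(jacobian(lambda p: [f(p)], [2.0, 3.0])[0])  # ≈ [4, 3], the gradient of f at (2, 3)
```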

---

# 8. Higher-Order and Mixed Partial Derivatives

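Pure second-order partials $$\partial^2 f/\partial x_i^2$$ measure how the slope along $$x_i$$ itself changes as you move along $$x_i$$, which is the concavity of each axis slice.

Mixed partials measure how the slope along one coordinate changes as you move along another: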

$$
\frac{\partial^2 f}{\partial x_i \partial x_j}.
$$

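If the second-order partials exist and are continuous in a neighborhood of the point (for example, if $$f$$ is of class $$C^2$$), Schwarz's theorem (also called Clairaut's theorem) gives: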

$$
\frac{\partial^2 f}{\partial x_i \partial x_j}
=
\frac{\partial^2 f}{\partial x_j \partial x_i}.
$$

This symmetry:

- makes the Hessian symmetric, so it has real eigenvalues and an orthonormal basis of eigenvectors  
- underpins convexity tests  
- is essential for second-order optimization  
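Both the symmetry and its failure can be checked numerically with nested central differences, using an inner step `h` much smaller than the outer step `k`. The sketch assumes two illustrative functions. The first is a smooth polynomial, where the two orders agree. The second is the classic non-smooth example $$g(x,y) = \frac{xy(x^2-y^2)}{x^2+y^2}$$ with $$g(0,0)=0$$. For $$g$$ at the origin, differentiating in $$x$$ first and then in $$y$$ gives $$-1$$, while the reverse order gives $$1$$. The helper names are hypothetical.

```python
def x_then_y(f, x: float, y: float, h: float = 1e-6, k: float = 1e-3) -> float:
    """Estimate ∂/∂y (∂f/∂x) at (x, y): differentiate in x (step h), then in y (step k)."""
    def fx(yy: float) -> float:
        return (f(x + h, yy) - f(x - h, yy)) / (2 * h)
    return (fx(y + k) - fx(y - k)) / (2 * k)


def y_then_x(f, x: float, y: float, h: float = 1e-6, k: float = 1e-3) -> float:
    """Estimate ∂/∂x (∂f/∂y) at (x, y): differentiate in y (step h), then in x (step k)."""
    def fy(xx: float) -> float:
        return (f(xx, y + h) - f(xx, y - h)) / (2 * h)
    return (fy(x + k) - fy(x - k)) / (2 * k)


def smooth(x: float, y: float) -> float:
    return x**2 * y**3  # C^2 everywhere; exact mixed partial is 6xy^2


def g(x: float, y: float) -> float:
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)  # second partials are not continuous at the origin


print(x_then_y(smooth, 1.0, 2.0), y_then_x(smooth, 1.0, 2.0))  # both ≈ 24
print(x_then_y(g, 0.0, 0.0), y_then_x(g, 0.0, 0.0))            # ≈ -1 and ≈ 1
```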

---

# 9. Partial Integration (Inverse Perspective)

Integrating a partial derivative with respect to $$x$$ recovers the function only **up to an arbitrary function** $$g$$ of the remaining variables:

$$
\int \frac{\partial f}{\partial x}(x, y, \ldots) \, dx
=
f(x, y, \ldots) + g(y, \ldots).
$$

This reflects:

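- partial differentiation with respect to $$x$$ annihilates every additive term that does not depend on $$x$$, so that information is lost  
- reconstructing $$f$$ from all of its partials requires compatibility conditions: for a planar field $$(P, Q)$$ to be a gradient (a conservative field) on a simply connected domain, the condition is $$\partial P/\partial y = \partial Q/\partial x$$  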
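Here is a short worked example, using an illustrative choice of partials. Suppose

$$
\frac{\partial f}{\partial x} = 2xy + 1.
$$

Integrating in $$x$$ with $$y$$ held fixed gives

$$
f(x,y) = x^2 y + x + g(y),
$$

where $$g$$ is unknown. If we are also told that $$\partial f/\partial y = x^2 + 3y^2$$, then $$x^2 + g'(y) = x^2 + 3y^2$$. So $$g(y) = y^3 + C$$, and

$$
f(x,y) = x^2 y + x + y^3 + C.
$$

This works because the pair $$(P, Q) = (2xy + 1,\; x^2 + 3y^2)$$ passes the compatibility test $$\partial P/\partial y = 2x = \partial Q/\partial x$$. Had the test failed, no $$C^2$$ function $$f$$ could have these partials.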

---

# 10. Partial Derivatives in Optimization and AI

## 10.1 Coordinate Sensitivity

$$
\frac{\partial L}{\partial \theta_i}
$$

measures sensitivity of the loss to parameter $$\theta_i$$.

Used in:

- coordinate descent  
- feature attribution  
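This is a minimal sketch of sensitivity and coordinate descent on a small hypothetical loss with coupled parameters, $$L(\theta_1, \theta_2) = (\theta_1 - 3)^2 + (\theta_2 + 1)^2 + \theta_1 \theta_2$$. Its exact minimizer is $$(14/3, -10/3)$$. The partials at a point give the per-parameter sensitivities. Coordinate descent then updates one parameter at a time, using only that parameter's partial. The learning rate, sweep count, and finite-difference step are illustrative assumptions, not tuned values.

```python
from typing import Callable, List, Sequence


def loss(theta: Sequence[float]) -> float:
    t1, t2 = theta
    return (t1 - 3) ** 2 + (t2 + 1) ** 2 + t1 * t2  # hypothetical loss; parameters are coupled


def partial(f: Callable[[Sequence[float]], float], theta: Sequence[float], i: int, h: float = 1e-6) -> float:
    """Central-difference ∂f/∂θ_i: sensitivity of f to parameter i with all others frozen."""
    plus, minus = list(theta), list(theta)
    plus[i] += h
    minus[i] -= h
    return (f(plus) - f(minus)) / (2 * h)


def coordinate_descent(f: Callable[[Sequence[float]], float], theta: Sequence[float],
                       lr: float = 0.1, sweeps: int = 200) -> List[float]:
    """Cycle through parameters, updating each with only its own partial derivative."""
    params = list(theta)
    for _ in range(sweeps):
        for i in range(len(params)):
            params[i] -= lr * partial(f, params, i)  # move along the i-th axis only
    return params


theta0 = [0.0, 0.0]
print([partial(loss, theta0, i) for i in range(2)])  # sensitivities at theta0: ≈ [-6, 2]
print(coordinate_descent(loss, theta0))               # ≈ [14/3, -10/3]
```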

## 10.2 Why Partial Derivatives Alone Are Insufficient

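- gradient-based learning does not move along one coordinate axis at a time; it updates all parameters simultaneously  
- standard optimizers step along the **negative gradient** (or a preconditioned version of it), which is generally not aligned with any axis  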

Partial derivatives are **inputs** to gradients, not optimization paths.
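For contrast, here is a sketch of plain gradient descent on a small hypothetical loss, $$L(\theta_1, \theta_2) = (\theta_1 - 3)^2 + (\theta_2 + 1)^2 + \theta_1 \theta_2$$. Every step evaluates all partials at the current point and moves along $$-\nabla L$$. At the starting point $$(0, 0)$$, that direction is $$(6, -2)$$, which is not aligned with either axis. The learning rate and step count are illustrative assumptions.

```python
from typing import List, Sequence


def loss(theta: Sequence[float]) -> float:
    t1, t2 = theta
    return (t1 - 3) ** 2 + (t2 + 1) ** 2 + t1 * t2  # hypothetical loss with coupled parameters


def grad(theta: Sequence[float]) -> List[float]:
    """Exact gradient: the vector of partials (∂L/∂θ1, ∂L/∂θ2)."""
    t1, t2 = theta
    return [2 * (t1 - 3) + t2, 2 * (t2 + 1) + t1]


def gradient_descent(theta: Sequence[float], lr: float = 0.1, steps: int = 300) -> List[float]:
    params = list(theta)
    for _ in range(steps):
        g = grad(params)  # every partial, evaluated at the same point
        params = [p - lr * gi for p, gi in zip(params, g)]  # one step along -∇L, all coordinates at once
    return params


theta0 = [0.0, 0.0]
print([-gi for gi in grad(theta0)])    # first step direction: [6.0, -2.0], not along either axis
theta_star = gradient_descent(theta0)
print(theta_star)                      # ≈ [14/3, -10/3]
print(loss(theta0), loss(theta_star))  # 10.0, then about -7.33
```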

---

# 11. Partial Derivatives and Coordinate Dependence

A deep insight:

> Partial derivatives depend on the chosen coordinate system.

Changing coordinates:

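- alters the individual partial derivatives  
- does **not** alter the total derivative as a linear map; only its matrix representation changes, according to the chain rule  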

This is why modern ML theory emphasizes:

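- coordinate-free objects (the total derivative, which gradients and Jacobians represent in a chosen coordinate system)  
- natural gradient methods, which precondition the gradient with the Fisher information matrix to correct for coordinate distortion  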
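This is a minimal sketch of coordinate dependence. It assumes the illustrative function $$f(x, y) = x + y$$ and a hypothetical second coordinate system $$(x, w)$$ with $$w = x + y$$. The variable $$x$$ is literally the same in both systems. Yet $$\partial f/\partial x$$ changes, because a different companion variable is held fixed. The predicted change $$df$$ for a given displacement, which is what the total derivative encodes, is the same in both systems.

```python
def f(x: float, y: float) -> float:
    return x + y  # illustrative function


def f_in_xw(x: float, w: float) -> float:
    """The same f expressed in hypothetical coordinates (x, w), where w = x + y."""
    return f(x, w - x)


def d_dx_holding_y(x: float, y: float, h: float = 1e-6) -> float:
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def d_dx_holding_w(x: float, w: float, h: float = 1e-6) -> float:
    return (f_in_xw(x + h, w) - f_in_xw(x - h, w)) / (2 * h)


x0, y0 = 1.0, 2.0
w0 = x0 + y0

print(d_dx_holding_y(x0, y0))  # ≈ 1: moving x with y frozen changes f
print(d_dx_holding_w(x0, w0))  # ≈ 0: moving x with w frozen leaves f unchanged

# The same physical displacement, described in each system.
dx, dy = 0.3, -0.5
dw = dx + dy
df_xy = 1.0 * dx + 1.0 * dy  # (∂f/∂x at fixed y) dx + (∂f/∂y at fixed x) dy
df_xw = 0.0 * dx + 1.0 * dw  # (∂f/∂x at fixed w) dx + (∂f/∂w at fixed x) dw
print(df_xy, df_xw)          # equal: the predicted change does not depend on the coordinates
```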

---

# 12. One-Sentence Expert Mental Model

A partial derivative is the slope you see when you force the world to move along one coordinate axis and pretend all other directions do not exist.

---

# 13. Final Conceptual Hierarchy (Unified)

- Partial derivative → axis-restricted slope  
- Directional derivative → slope along any direction  
- Gradient → steepest direction (scalar outputs)  
- Jacobian → full linear map (vector outputs)  
- Total derivative → best local linear approximation; the fundamental object that all of the above describe when $$f$$ is differentiable  

---

# 14. Ultimate Takeaway

- Partial derivatives are **local and coordinate-bound probes**.  
- They are **necessary but not sufficient** for understanding change.  
- True learning, geometry, and dynamics emerge only when partial derivatives are assembled into **total derivatives**.
