## 2.4.1. Derivatives and Differentiation

#### Problem Statement

In science, engineering and beyond, we are often faced with functions that describe how one quantity depends on another—distance as a function of time, concentration as a function of volume, cost as a function of units produced, and so on.  The **derivative** of a function provides a precise way to measure the **instantaneous rate of change** of that quantity: it tells us, at any given point, how fast the output is increasing or decreasing with respect to its input.

For example, if $s(t)$ denotes the position of a car at time $t$, then the derivative $s'(t)$ is the car’s **velocity**.  If $P(t)$ models the size of a bacterial population, then $P'(t)$ gives the **growth rate**, which might accelerate or slow over time depending on nutrients or competition.  In economics, if $R(q)$ is the revenue from selling $q$ units of a product, then the derivative $R'(q)$ is the **marginal revenue**, the additional income earned by selling one more unit.

Derivatives also appear in many less obvious contexts:

- **Chemical kinetics**: if $C(v)$ describes the concentration of a reactant as a function of volume $v$, then $dC/dv$ measures how dilution changes concentration.  
- **Computer networks**: if $T(t)$ is the cumulative data transferred by time $t$, then $T'(t)$ is the **throughput** or instantaneous data rate.  
- **Geometry and graphics**: the slope of a curve $y=f(x)$ at a point, given by $f'(x)$, defines the direction of its tangent line and underlies algorithms for rendering smooth shapes.

Because so many practical problems reduce to “how fast is this changing?”, mastering the basic rules of differentiation—the constant, power, exponential, logarithm, sum, product and quotient rules, and so on—allows us to turn a wide variety of real‑world questions into straightforward mechanical calculations.

Put simply, a *derivative* is the rate of change in a function with respect to changes in its arguments. Derivatives can tell us how rapidly a loss function would increase or decrease were we to *increase* or *decrease* each parameter by an infinitesimally small amount. 

Formally, for functions $f : \mathbb{R} \rightarrow \mathbb{R}$ that map scalars to scalars, the *derivative* of $f$ at a point $x$ is defined as

$$
f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}.                
\tag{2.4.1}
$$




The expression on the right-hand side is called a *limit*: it tells us what happens to the value of an expression as a specified variable approaches a particular value. Here, the limit tells us what the ratio of the change in the function value $f(x + h) - f(x)$ to the perturbation $h$ converges to as we shrink $h$ to zero.

When $f'(x)$ exists, $f$ is said to be differentiable at $x$; and when $f'(x)$ exists for all $x$ in a set, e.g., the interval $[a, b]$, we say that $f$ is differentiable on this set. Not all functions are differentiable, including many that we wish to optimize, such as accuracy and the area under the receiver operating characteristic curve (AUC). However, because computing the derivative of the loss is a crucial step in nearly all algorithms for training deep neural networks, we often optimize a differentiable *surrogate* instead.
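
To make differentiability concrete, the sketch below evaluates one-sided difference quotients of the absolute value function at $x = 0$. The names `difference_quotient` and `abs_fn` are illustrative and not part of the original text. The right-hand quotients stay at $1$ and the left-hand quotients at $-1$, so the two-sided limit in (2.4.1) does not exist and $|x|$ is not differentiable at $0$.

```python
def difference_quotient(f, x, h):
    """Return (f(x + h) - f(x)) / h, the ratio inside the limit (2.4.1)."""
    return (f(x + h) - f(x)) / h

def abs_fn(x):
    """Absolute value, a standard example of a function with a kink at 0."""
    return abs(x)

for h in [1e-1, 1e-3, 1e-5]:
    right = difference_quotient(abs_fn, 0.0, h)     # approach 0 from the right
    left = difference_quotient(abs_fn, 0.0, -h)     # approach 0 from the left
    print(f'h={h:.5f}, right={right:.1f}, left={left:.1f}')
```

Because the two one-sided values never agree, no single number can serve as the derivative at that point.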

We can interpret the derivative $f'(x)$ as the instantaneous rate of change of $f(x)$ with respect to $x$. Let’s develop some intuition with an example. Define $u = f(x) = 3x^2 - 4x$.

In [1]:
def f(x):
    return 3 * x ** 2 - 4 * x

Setting $x = 1$, we see that $\frac{f(x+h) - f(x)}{h}$ approaches $2$ as $h$ approaches $0$. While this experiment lacks the rigor of a mathematical proof, we can quickly see that indeed $f'(1) = 2$.

In [2]:
import numpy as np
for h in 10.0**np.arange(-1, -6, -1):
    print(f'h={h:.5f}, numerical limit={(f(1+h)-f(1))/h:.5f}')

h=0.10000, numerical limit=2.30000
h=0.01000, numerical limit=2.03000
h=0.00100, numerical limit=2.00300
h=0.00010, numerical limit=2.00030
h=0.00001, numerical limit=2.00003
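
The pattern in the printed values is no accident: for this $f$, the difference quotient at $x = 1$ simplifies to exactly $2 + 3h$. As a cross-check, the following sketch uses SymPy (an assumption here, since the cell above only uses NumPy) to take the limit symbolically. The names `f_expr`, `quotient`, and `derivative` are illustrative.

```python
import sympy as sp

x, h = sp.symbols('x h')
f_expr = 3 * x**2 - 4 * x

# Difference quotient (f(x + h) - f(x)) / h from definition (2.4.1)
quotient = (f_expr.subs(x, x + h) - f_expr) / h

# At x = 1 the quotient reduces to 2 + 3h, matching the printed values
print(sp.simplify(quotient.subs(x, 1)))

# Taking h -> 0 symbolically recovers the derivative 6x - 4, which equals 2 at x = 1
derivative = sp.limit(quotient, h, 0)
print(derivative, derivative.subs(x, 1))
```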


There are several equivalent notational conventions for derivatives. Given $y = f(x)$, the following expressions are equivalent:
$$
f'(x) = y' = \frac{dy}{dx} = \frac{df}{dx} = \frac{d}{dx} f(x) = Df(x) = D_x f(x),
\tag{2.4.2}
$$
where the symbols $\frac{d}{dx}$ and  $D$  are *differentiation operators*.  
Below, we present the derivatives of some common functions:



#### 1. Constant Rule
$$
\frac{d}{dx} C = 0 \quad \text{for any constant } C, \tag{2.4.3} 
$$ 
**Proof:**
By definition,
$$
\frac{d}{dx} C = \lim_{h\to0}\frac{C - C}{h}
= \lim_{h\to0}\frac{0}{h}
= 0.
$$
Since the numerator is identically zero, the limit is zero.
**Example.**  
Let $f(x)=7$. Then $f'(x)=0.$

In [1]:
import sympy as sp

x = sp.symbols('x')
f = 7
sp.diff(f, x)
# → 0


0

#### 2. Power Rule
$$
\frac{d}{dx} x^n = nx^{n-1} \quad \text{for } n \neq 0 \tag{2.4.4}
$$
**Proof:**

**a)** $n$ a positive integer

Use the binomial expansion:
$$
(x+h)^n
= \sum_{k=0}^n \binom nk x^{\,n-k}h^k
= x^n + n x^{\,n-1}h + \sum_{k=2}^n\binom nk x^{\,n-k}h^k.
$$
Subtracting $x^n$ and dividing by $h$ gives
$$
\frac{(x+h)^n - x^n}{h}
= n\,x^{\,n-1} + O(h),
$$
so taking $h\to0$ gives
$$
\frac{d}{dx}x^n = n\,x^{\,n-1}.
$$
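
The expansion step can be checked for a specific exponent. This sketch, using $n = 4$ as an arbitrary illustrative choice, expands the difference quotient with SymPy; every term other than $4x^3$ carries a factor of $h$ and vanishes in the limit.

```python
import sympy as sp

x, h = sp.symbols('x h')
n = 4  # arbitrary positive integer chosen for illustration

# Expand ((x + h)^n - x^n) / h; the x^n terms cancel and one factor of h divides out
quotient = sp.expand(((x + h)**n - x**n) / h)
print(quotient)

# Setting h = 0 in the expanded polynomial leaves only the leading term n * x^(n - 1)
print(quotient.subs(h, 0))
```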

**b)** $n$ a negative integer

Write $n=-m$ with $m>0$, so that $x^n = 1/x^m$ for $x \neq 0$. By the quotient (or chain) rule,
$$
\frac{d}{dx}x^{-m}
= -\,x^{-2m}\,(m\,x^{\,m-1})
= -m\,x^{-(m+1)}
= n\,x^{\,n-1}.
$$
**Example.**  
Let $f(x)=x^5$. Then $f'(x)=5\,x^4.$

In [2]:
import sympy as sp

x = sp.symbols('x')
f = x**5
sp.diff(f, x)
# → 5*x**4


5*x**4
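
Part b) of the proof can be spot-checked in the same way. Assuming the illustrative exponent $n = -3$, SymPy's result should agree with $n\,x^{n-1} = -3x^{-4}$.

```python
import sympy as sp

x = sp.symbols('x')
n = -3  # arbitrary negative integer chosen for illustration

derivative = sp.diff(x**n, x)
expected = n * x**(n - 1)   # power rule (2.4.4)

# The difference simplifies to zero when the two expressions agree
print(sp.simplify(derivative - expected) == 0)
```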

#### 3. Exponential Rule
$$
\frac{d}{dx} e^x = e^x \tag{2.4.5}
$$
**Proof:**
Via power series, differentiating term by term:

$$
e^x = \sum_{k=0}^\infty \frac{x^k}{k!}
\;\implies\;
\frac{d}{dx}e^x
= \sum_{k=1}^\infty \frac{k x^{k-1}}{k!}
= \sum_{j=0}^\infty \frac{x^j}{j!}
= e^x.
$$

Via the limit definition of the derivative, using $\lim_{h\to0}\frac{e^h-1}{h} = 1$ (a defining property of $e$):

$$
\frac{d}{dx}e^x
=\lim_{h\to0}\frac{e^{x+h}-e^x}{h}
=e^x\lim_{h\to0}\frac{e^h-1}{h}
=e^x\cdot1
= e^x.
$$
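
The second argument relies on the fact that $(e^h - 1)/h \to 1$ as $h \to 0$. A quick numerical experiment, in the same spirit as the one for $f(x) = 3x^2 - 4x$ earlier, makes this plausible; it is a sanity check rather than a proof.

```python
import math

# Evaluate (e^h - 1) / h for shrinking h; the values should approach 1
for h in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    ratio = (math.exp(h) - 1) / h
    print(f'h={h:.5f}, (e^h - 1)/h={ratio:.5f}')
```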

**Example.**  
Let $f(x)=e^x$. Then $f'(x)=e^x.$

In [None]:
import sympy as sp

x = sp.symbols('x')
f = sp.exp(x)
sp.diff(f, x)
# → exp(x)


#### 4. Logarithm Rule
$$
\frac{d}{dx} \ln x = x^{-1} \tag{2.4.6}
$$
**Proof:**
For $x > 0$, $y = \ln x$ is the inverse of $x = e^y$. Differentiating both sides of $x = e^y$ with respect to $x$ gives
$$
1 = \frac{d}{dx}(e^y)
  = e^y\frac{dy}{dx}
  = x\frac{dy}{dx}
\quad\Longrightarrow\quad
\frac{dy}{dx} = \frac1x.
$$
Hence
$$
\frac{d}{dx}\ln x = \frac1x.
$$
**Example.**  
Let $f(x)=\ln(x)$. Then $f'(x)=\frac1x.$

In [3]:
import sympy as sp

x = sp.symbols('x')
f = sp.log(x)
sp.diff(f, x)
# → 1/x


1/x
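
The result can also be checked numerically. The sketch below compares a difference quotient of $\ln$ at the illustrative point $x = 2$ with the predicted value $1/x = 0.5$; the helper name `log_quotient` is hypothetical.

```python
import math

def log_quotient(x, h):
    """Difference quotient of ln at x with step h, as in definition (2.4.1)."""
    return (math.log(x + h) - math.log(x)) / h

x0 = 2.0  # illustrative evaluation point; any x0 > 0 would do
for h in [1e-1, 1e-3, 1e-5]:
    print(f'h={h:.5f}, quotient={log_quotient(x0, h):.5f}, 1/x={1 / x0:.5f}')
```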


Functions built from differentiable functions are often themselves differentiable.
The following rules come in handy for working with combinations of any differentiable functions $f$ and $g$, and constant $C$:







#### 1. Constant Multiple Rule
$$
\frac{d}{dx} [Cf(x)] = C \frac{d}{dx} f(x) \quad \tag{2.4.7}
$$
**Proof:**
For any constant $C$ and differentiable function $f$:
$$
\frac{d}{dx}\bigl[C\,f(x)\bigr]
=\lim_{h\to0}\frac{C\,f(x+h)-C\,f(x)}{h}
=\lim_{h\to0}\frac{C\bigl[f(x+h)-f(x)\bigr]}{h}
=C\;\lim_{h\to0}\frac{f(x+h)-f(x)}{h}
=C\,f'(x).
$$
**Example.**  
Let $f(x)=5\,x^3$.  Then by the power rule,  
$$
f'(x)=5\cdot3\,x^2 = 15\,x^2.
$$

In [4]:
import sympy as sp

x = sp.symbols('x')
f = 5*x**3
sp.diff(f, x)
# → 15*x**2


15*x**2

#### 2. Sum Rule
$$
\frac{d}{dx} [f(x) + g(x)] = \frac{d}{dx} f(x) + \frac{d}{dx} g(x) \quad  \tag{2.4.8}
$$
**Proof:**
For any two differentiable functions $f$ and $g$:
$$
\frac{d}{dx}\bigl[f(x)+g(x)\bigr]
=\lim_{h\to0}\frac{[f(x+h)+g(x+h)]-[f(x)+g(x)]}{h}
=\lim_{h\to0}\frac{f(x+h)-f(x)}{h}
  +\lim_{h\to0}\frac{g(x+h)-g(x)}{h}
=f'(x)+g'(x).
$$
**Example.**  
Let $f(x)=x^2 + 3x$.  Then $f'(x) = 2x + 3.$


In [None]:
import sympy as sp

x = sp.symbols('x')
f = x**2 + 3*x
sp.diff(f, x)
# → 2*x + 3


#### 3. Product Rule
$$
\frac{d}{dx} [f(x)g(x)] = f(x) \frac{d}{dx} g(x) + g(x) \frac{d}{dx} f(x) \quad \tag{2.4.9}
$$
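**Proof:**
Add and subtract $f(x)\,g(x+h)$ in the numerator, and use $g(x+h)\to g(x)$ as $h\to0$, which holds because a differentiable function is continuous: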
$$
\begin{aligned}
\frac{d}{dx}\bigl[f(x)g(x)\bigr]
&=\lim_{h\to0}\frac{f(x+h)g(x+h)-f(x)g(x)}{h}\\
&=\lim_{h\to0}\frac{f(x+h)g(x+h)-f(x)g(x+h)+f(x)g(x+h)-f(x)g(x)}{h}\\
&=\lim_{h\to0}\frac{\bigl[f(x+h)-f(x)\bigr]\,g(x+h)}{h}
  +\lim_{h\to0}\frac{f(x)\,\bigl[g(x+h)-g(x)\bigr]}{h}\\
&=\Bigl(\lim_{h\to0}\frac{f(x+h)-f(x)}{h}\Bigr)\,g(x)
  +f(x)\,\Bigl(\lim_{h\to0}\frac{g(x+h)-g(x)}{h}\Bigr)\\
&=f'(x)\,g(x)+f(x)\,g'(x).
\end{aligned}
$$
**Example.**  
Let $h(x)=x^2\sin x$.  Then $h'(x)=x^2\cos x + 2x\sin x.$

In [5]:
import sympy as sp

x = sp.symbols('x')
h = x**2 * sp.sin(x)
sp.diff(h, x)
# → x**2*cos(x) + 2*x*sin(x)


x**2*cos(x) + 2*x*sin(x)
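
SymPy can also apply the product rule to unspecified functions. In this sketch, `f` and `g` are declared as undefined SymPy functions (illustrative names chosen to mirror (2.4.9)), and the derivative of their product is compared with the right-hand side of the rule.

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')(x)   # arbitrary differentiable function f(x)
g = sp.Function('g')(x)   # arbitrary differentiable function g(x)

lhs = sp.diff(f * g, x)
rhs = f * sp.diff(g, x) + g * sp.diff(f, x)   # right-hand side of (2.4.9)

print(lhs)
# The difference simplifies to zero when both sides agree
print(sp.simplify(lhs - rhs) == 0)
```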

#### 4. Quotient Rule
$$
\frac{d}{dx} \left( \frac{f(x)}{g(x)} \right) = \frac{g(x) \frac{d}{dx} f(x) - f(x) \frac{d}{dx} g(x)}{g^2(x)} \quad \tag{2.4.10}
$$
**Proof:**
Assume $g(x)\neq0$.  Write
$$
\frac{f(x)}{g(x)} = f(x)\,\bigl[g(x)\bigr]^{-1}.
$$
Then by the product rule and the chain rule (derivative of $u^{-1}$ is $-u^{-2}u'$):
$$
\frac{d}{dx}\frac{f}{g}
=\frac{d}{dx}\bigl(f\cdot g^{-1}\bigr)
=f'\,g^{-1}+f\;\bigl(-g^{-2}g'\bigr)
=\frac{f'}{g}-\frac{f\,g'}{g^2}
=\frac{g\,f' - f\,g'}{g^2}.
$$

**Example.**  
Let $q(x)=\frac{x^2}{1+x}$.  Then  
$$
q'(x)
= \frac{(1+x)\cdot2x - x^2\cdot1}{(1+x)^2}
= \frac{2x+2x^2 - x^2}{(1+x)^2}
= \frac{x(2+x)}{(1+x)^2}.
$$

In [6]:
import sympy as sp

x = sp.symbols('x')
q = x**2/(1+x)
sp.diff(q, x)
# → -x**2/(x + 1)**2 + 2*x/(x + 1)   (unsimplified; equal to x*(x + 2)/(x + 1)**2)


-x**2/(x + 1)**2 + 2*x/(x + 1)
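
SymPy returns the derivative without combining the two fractions, so the printed expression looks different from the hand-derived result. A minimal check, assuming `sp.simplify` and `sp.factor` behave as documented, confirms that both forms describe the same function.

```python
import sympy as sp

x = sp.symbols('x')
q = x**2 / (1 + x)

raw = sp.diff(q, x)                    # unsimplified output of diff
by_hand = x * (2 + x) / (1 + x)**2     # result of the quotient rule (2.4.10)

# The difference simplifies to zero, so the two expressions are equal
print(sp.simplify(raw - by_hand) == 0)

# factor combines the two fractions into a single factored quotient
print(sp.factor(raw))
```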