# Forward mode automatic differentiation

---
<div style="text-align: justify;">

Automatic differentiation (AD) is a set of techniques for computing derivatives of functions in a computer program efficiently and accurately. Given a differentiable function $f: \mathbb{R}^n \to \mathbb{R}^m$, AD seeks to compute the $m\times n$ Jacobian matrix $J_{ij} = \partial_{j}f_i$. With this convention, column $j$ is the vector $\partial_{j} f$ of partial derivatives of all $m$ components of $f$ with respect to the $j$-th input, while row $i$ is the gradient of the single component $f_i$. If the input dimension $n$ is small and $f$ itself is computationally cheap, it may be efficient to compute each column separately. Denoting the computational cost of $f$ by $[f]$, this results in $\mathcal{O}(n\times[f])$ operations. Methods of this sort fall under the umbrella of forward-mode AD (FWAD). When $n$ becomes large compared to $m$, other approaches may be more efficient. A popular one is backpropagation, where each row of the Jacobian (that is, all $n$ derivatives of a single component of the function) is computed in a single reverse pass. Methods such as these are called reverse-mode AD (RSAD), and carry a complexity of $\mathcal{O}(m\times[f])$.

We focus here on FWAD, and in particular its implementation through dual numbers. A dual number is an ordered pair of real numbers $(a, b)$, with $a$ the real component and $b$ the dual component. Its arithmetic is defined as
\begin{align*}
&(a, b) + (c, d) = (a + c, b + d) \\
&(a, b) \times (c, d) = (ac, ad + bc)
\end{align*}
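As a quick illustration, the two rules above can be written directly as functions on `(real, dual)` tuples. This is a minimal sketch; the names `dual_add` and `dual_mul` are hypothetical and chosen only for this example.

```python
# Dual numbers as plain (real, dual) tuples, following the two rules above.
Pair = tuple[float, float]


def dual_add(x: Pair, y: Pair) -> Pair:
    a, b = x
    c, d = y
    return (a + c, b + d)


def dual_mul(x: Pair, y: Pair) -> Pair:
    a, b = x
    c, d = y
    return (a * c, a * d + b * c)


print(dual_add((2.0, 3.0), (4.0, 5.0)))  # (6.0, 8.0)
print(dual_mul((2.0, 3.0), (4.0, 5.0)))  # (8.0, 22.0), i.e. (2*4, 2*5 + 3*4)
```

Squaring $(x, 1)$ with `dual_mul` gives $(x^2, 2x)$, which already hints at how derivatives appear in the dual component.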
Note the similarity with complex numbers $z$, which can also be represented by an ordered pair of reals, $z = (a, b)$, and have an arithmetic of 
\begin{align*}
&(a, b) + (c, d) = (a + c, b + d) \\
&(a, b) \times (c, d) = (ac - bd, ad + bc)
\end{align*}
The only difference is in the first component of the product. By introducing the imaginary unit $i$, which satisfies $i^2 = -1$, complex numbers can be written as $z = a + bi$. A similar construction applies to dual numbers, with $i$ replaced by the Grassmann number $\epsilon$. It is nilpotent: $\epsilon \neq 0$, but $\epsilon^2 = 0$. Showing that this is consistent with the arithmetic rules above is left to the reader as an exercise. We will denote by $\mathbb{D}$ the space of dual numbers.
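For a concrete feel for nilpotency, one standard model (not something this notebook relies on) represents $a + b\epsilon$ by the upper-triangular matrix $\begin{pmatrix} a & b \\ 0 & a \end{pmatrix}$, so that $\epsilon$ is a nonzero matrix whose square is zero. The sketch below, with a hypothetical helper `as_matrix`, checks this numerically and shows that matrix multiplication reproduces the product rule $(ac, ad + bc)$.

```python
import numpy as np


def as_matrix(a: float, b: float) -> np.ndarray:
    """Represent a + b*eps as the 2x2 matrix [[a, b], [0, a]]."""
    return np.array([[a, b], [0.0, a]])


eps = as_matrix(0.0, 1.0)
print(eps @ eps)  # the zero matrix: eps is nonzero but squares to zero

x = as_matrix(2.0, 3.0)
y = as_matrix(4.0, 5.0)
prod = x @ y
# the top row holds (real, dual) = (ac, ad + bc) = (8, 22)
print(prod[0, 0], prod[0, 1])
```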

Dual numbers allow us to compute exact derivatives of functions. For simplicity, we first focus on the case $n = m = 1$ and consider the general case afterwards. Let $f:\mathbb{R}\to\mathbb{R}$ be a smooth function. We extend it to a function of dual numbers through its Taylor series,
\begin{equation*}
f(a + b\epsilon) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(b\epsilon)^k \,,
\end{equation*}
where now $f: \mathbb{D}\to\mathbb{D}$. Since $\epsilon^2 = 0$, every term of order two or higher vanishes and the series truncates to
\begin{equation*}
f(a + b\epsilon) = f(a) + f^{(1)}(a)b\epsilon \,.
\end{equation*}
Setting $b = 1$, the dual part is exactly the derivative $f^{(1)}(a)$. By evaluating the dual extensions of functions, we can therefore extract their derivatives automatically, exact up to floating-point rounding and free of the truncation error of finite differences.
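A minimal sketch of this extension rule, again using `(real, dual)` tuples: each primitive is lifted with its known derivative via $f(a + b\epsilon) = f(a) + f'(a)\,b\,\epsilon$, and composing lifted primitives then produces the chain rule without any extra work. The helper `lift` and the names `dual_sin` and `dual_exp` are hypothetical.

```python
import math
from typing import Callable

Pair = tuple[float, float]


def lift(f: Callable[[float], float], df: Callable[[float], float]) -> Callable[[Pair], Pair]:
    """Extend f to duals: f(a + b eps) = f(a) + f'(a) * b * eps."""
    def f_dual(x: Pair) -> Pair:
        a, b = x
        return (f(a), df(a) * b)
    return f_dual


dual_sin = lift(math.sin, math.cos)
dual_exp = lift(math.exp, math.exp)

# seed b = 1 and compose: the dual part is d/dx exp(sin(x)) = exp(sin(x)) * cos(x)
value, deriv = dual_exp(dual_sin((0.7, 1.0)))
print(value, deriv)
print(math.exp(math.sin(0.7)) * math.cos(0.7))  # matches deriv
```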

To generalise, define the $n$-dimensional dual space as $\mathbb{D}^n = \{a + b\epsilon: a, b \in \mathbb{R}^n\}$. We extend the smooth function $f: \mathbb{R}^n \to \mathbb{R}^m$ to the duals through its Taylor expansion as before, and find
\begin{equation*}
f(a + b\epsilon) = f(a) + \epsilon J(a)\cdot b \,,
\end{equation*}
where $J(a)$ is the Jacobian evaluated at $a$. The dual part of this expression is the directional derivative of $f$ at $a$ along $b$, i.e. the Jacobian-vector product $J(a)\cdot b$. Running this calculation $n$ times, taking $b$ to be each standard basis vector $e_j$ of $\mathbb{R}^n$ in turn, yields the full Jacobian matrix one column at a time.
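The column-by-column construction can be sketched for a small example map $f(x, y) = (xy, \sin x + y)$ from $\mathbb{R}^2$ to $\mathbb{R}^2$. This is an illustration under simplifying assumptions: each input carries its own scalar dual part (equivalent to $a + b\epsilon$ with $b \in \mathbb{R}^n$), only `+`, `*` and `sin` are supported, and the names `DualScalar`, `dsin` and `jacobian` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class DualScalar:
    """a + b*eps with scalar a (re) and b (du)."""
    re: float
    du: float

    def __add__(self, other: "DualScalar") -> "DualScalar":
        return DualScalar(self.re + other.re, self.du + other.du)

    def __mul__(self, other: "DualScalar") -> "DualScalar":
        return DualScalar(self.re * other.re, self.re * other.du + self.du * other.re)


def dsin(x: DualScalar) -> DualScalar:
    # sin(a + b eps) = sin(a) + cos(a) b eps
    return DualScalar(math.sin(x.re), math.cos(x.re) * x.du)


def f(x: DualScalar, y: DualScalar) -> tuple[DualScalar, ...]:
    # example map R^2 -> R^2: f(x, y) = (x*y, sin(x) + y)
    return (x * y, dsin(x) + y)


def jacobian(func: Callable[..., tuple[DualScalar, ...]], point: Sequence[float]) -> list[list[float]]:
    n = len(point)
    columns = []
    for j in range(n):
        # seed b = e_j: dual part 1 on input j, 0 elsewhere
        args = [DualScalar(point[k], 1.0 if k == j else 0.0) for k in range(n)]
        # one forward pass gives J(a) e_j, the j-th column
        columns.append([out.du for out in func(*args)])
    # transpose the columns into rows, so that J[i][j] = d f_i / d x_j
    return [list(row) for row in zip(*columns)]


J = jacobian(f, (0.5, 2.0))
print(J)  # [[y, x], [cos(x), 1]] evaluated at (x, y) = (0.5, 2.0)
```

The loop makes the $\mathcal{O}(n\times[f])$ cost of FWAD visible: one evaluation of `func` per input dimension.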

It remains to implement dual numbers computationally. We will do this for the $n,m=1$ case for clarity. The steps are as follows:
1) Define a dual number class that provides overrides of the standard arithmetic operators.
2) Provide a wrapper that extends functions defined over the real numbers to dual numbers.
3) Implement the derivative operator, returning the dual component of a function $f$ evaluated at $d = a + \epsilon$.
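The three steps can be sketched in a few dozen lines. This is not the `Dual` class or `autodiff` module imported in the examples below (those live in an external notebook); it is a hypothetical, minimal stand-in named `MiniDual`, supporting only `+`, `-`, `*`, with `lift` and `derivative` as illustrative names for steps 2 and 3.

```python
import math
from typing import Callable, Union

Number = Union[int, float]


class MiniDual:
    """Step 1: a minimal dual number a + b*eps with overloaded + - *."""

    def __init__(self, real: float, dual: float = 0.0) -> None:
        self.real = real
        self.dual = dual

    @staticmethod
    def coerce(value: "MiniDual | Number") -> "MiniDual":
        # a plain real number r is embedded as r + 0*eps
        return value if isinstance(value, MiniDual) else MiniDual(float(value))

    def __add__(self, other: "MiniDual | Number") -> "MiniDual":
        o = MiniDual.coerce(other)
        return MiniDual(self.real + o.real, self.dual + o.dual)

    __radd__ = __add__

    def __sub__(self, other: "MiniDual | Number") -> "MiniDual":
        o = MiniDual.coerce(other)
        return MiniDual(self.real - o.real, self.dual - o.dual)

    def __rsub__(self, other: "MiniDual | Number") -> "MiniDual":
        return MiniDual.coerce(other) - self

    def __mul__(self, other: "MiniDual | Number") -> "MiniDual":
        o = MiniDual.coerce(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps
        return MiniDual(self.real * o.real, self.real * o.dual + self.dual * o.real)

    __rmul__ = __mul__


def lift(f: Callable[[float], float], df: Callable[[float], float]) -> Callable[[MiniDual], MiniDual]:
    """Step 2: extend a real function via f(a + b eps) = f(a) + f'(a) b eps."""
    def wrapped(x: MiniDual) -> MiniDual:
        x = MiniDual.coerce(x)
        return MiniDual(f(x.real), df(x.real) * x.dual)
    return wrapped


dual_sin = lift(math.sin, math.cos)
dual_cos = lift(math.cos, lambda t: -math.sin(t))


def derivative(f: Callable[[MiniDual], MiniDual]) -> Callable[[float], float]:
    """Step 3: f'(a) is the dual part of f(a + eps)."""
    return lambda a: MiniDual.coerce(f(MiniDual(a, 1.0))).dual


# usage: d/dx [x sin(x) - 3x] = sin(x) + x cos(x) - 3
def g(x: MiniDual) -> MiniDual:
    return x * dual_sin(x) - 3 * x


print(derivative(g)(1.0), math.sin(1.0) + math.cos(1.0) - 3.0)  # the two values agree
```

The reflected operators (`__radd__`, `__rsub__`, `__rmul__`) are what let expressions such as `3 * x` work when the left operand is a plain number.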

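For 1), we present here the remaining arithmetic operations on dual numbers: division and exponentiation. First, division. Let $a,b,c,d$ be real numbers. Multiplying numerator and denominator by $c - d\epsilon$, and using $(c + d\epsilon)(c - d\epsilon) = c^2 - d^2\epsilon^2 = c^2$, we find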
\begin{equation*}
\frac{a+b\epsilon}{c+d\epsilon} = \frac{(a+b\epsilon)(c-d\epsilon)}{c^2} = \frac{ac - (ad - bc)\epsilon}{c^2} = \frac{a}{c} + \frac{bc - ad}{c^2}\epsilon \,.
\end{equation*}
Note that this is only valid for $c \neq 0$. If $c = 0$, we look for a dual number $u$ with $u \times d\epsilon = a + b\epsilon$. Writing $u = u_0 + u_1\epsilon$, the product is $u \times d\epsilon = u_0 d\epsilon$, so a solution exists only if $a = 0$ and $d \neq 0$, in which case $u = b/d + v\epsilon$ for any real number $v$. Division by a dual number with zero real part is therefore not well-defined: it has either no solution or infinitely many. In our implementation, we take $v = 0$.
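A sketch of division on `(real, dual)` tuples that follows the two cases above, including the choice $v = 0$ when the divisor has zero real part. The function name `dual_div` is hypothetical, and raising `ZeroDivisionError` in the unsolvable case is an assumption of this sketch rather than the notebook's documented behaviour.

```python
Pair = tuple[float, float]


def dual_div(x: Pair, y: Pair) -> Pair:
    """Divide (a + b eps) by (c + d eps)."""
    a, b = x
    c, d = y
    if c != 0:
        # (a + b eps)(c - d eps) / c^2 = a/c + (bc - ad)/c^2 eps
        return (a / c, (b * c - a * d) / (c * c))
    if a == 0 and d != 0:
        # b eps / (d eps): every b/d + v eps is a solution; pick v = 0
        return (b / d, 0.0)
    raise ZeroDivisionError("no solution: divisor has zero real part")


# quotient rule check: u = x^2, v = x + 1 at x = 2, seeded as (value, derivative)
print(dual_div((4.0, 4.0), (3.0, 1.0)))  # (u/v, (u'v - uv')/v^2) = (4/3, 8/9)
```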

For exponentiation, let $c$ be a real number. We have, by Taylor expansion,
\begin{equation*}
(a + b\epsilon)^c = a^c + c b a^{c-1}\epsilon \,.
\end{equation*}
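The real-exponent rule can be checked against the familiar power rule. A minimal sketch with a hypothetical `dual_pow_real`; for non-integer $c$ it assumes $a > 0$, as `a ** c` is otherwise not a real number.

```python
Pair = tuple[float, float]


def dual_pow_real(x: Pair, c: float) -> Pair:
    """(a + b eps)^c = a^c + c * b * a^(c - 1) eps for a real exponent c."""
    a, b = x
    return (a ** c, c * b * a ** (c - 1))


# x^2.5 and its derivative 2.5 * x^1.5 at x = 4, i.e. 32 and 20
print(dual_pow_real((4.0, 1.0), 2.5))
```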
We can extend this to exponentiation by dual numbers using the trick
\begin{equation*}
(a + b\epsilon)^{c + d\epsilon} = \exp((c + d\epsilon)\log (a+b\epsilon)) \,.
\end{equation*}
By Taylor, 
\begin{equation*}
\log(a + b\epsilon) = \log(a) + \frac{b}{a}\epsilon \,,
\end{equation*}
which requires $a > 0$. Using the analogous expansion $\exp(x + y\epsilon) = e^{x}(1 + y\epsilon)$ for real $x, y$, we have
\begin{equation*}
(a + b\epsilon)^{c+d\epsilon} = \exp\left(c\log(a) +\left(\frac{bc}{a} + d\log(a)\right)\epsilon\right) = a^c\left(1 +\left(\frac{bc}{a} + d\log(a)\right)\epsilon \right) \,.
\end{equation*}
This could be extended to $a < 0$ by considering complexified dual numbers, but for brevity we do not do so here.
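The final formula for $(a + b\epsilon)^{c+d\epsilon}$ can be sketched as follows; it recovers the derivative $x^x(1 + \ln x)$ used in the first example below, and reduces to the real-exponent rule when $d = 0$. The name `dual_pow` is hypothetical, and rejecting $a \leq 0$ mirrors the restriction stated above.

```python
import math

Pair = tuple[float, float]


def dual_pow(x: Pair, y: Pair) -> Pair:
    """(a + b eps)^(c + d eps) = a^c * (1 + (b c / a + d log a) eps), for a > 0."""
    a, b = x
    c, d = y
    if a <= 0:
        raise ValueError("real part of the base must be positive")
    base = a ** c
    return (base, base * (b * c / a + d * math.log(a)))


# x^x at x = 1.2: seed both base and exponent with dual part 1
value, deriv = dual_pow((1.2, 1.0), (1.2, 1.0))
print(deriv, 1.2 ** 1.2 * (1 + math.log(1.2)))  # the two numbers agree
```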

</div>

## Examples

In [None]:
import numpy as np

from importnb import Notebook

# the dual number implementation (Dual, NUMERIC), the autodiff module and the numpy wrapper dp ("dumpy") are defined in external notebooks
with Notebook():
    from notebooks.mathematics.automatic_differentiation.__basic__dual_numbers import (
        Dual,
        NUMERIC,
        autodiff,
    )
    from notebooks.mathematics.automatic_differentiation.__basic__helper__dumpy import (
        dp,
    )
from theoria.validor import TestCase, Validor

In [None]:
# 1)
# let's do a nested horror: f(x) = exp(1/x) + exp(-2 * sin(x)) - x^5 + x^x
# this has: f'(x) = -exp(1/x)/x^2 - 2 * cos(x) * exp(-2 * sin(x)) - 5x^4 + x^x (1 + ln(x))

def f1(x: Dual) -> Dual:
    return dp.exp(1/x) + dp.exp(-2 * dp.sin(x)) - x ** 5 + x ** x

def f1_expected_derivative(x: NUMERIC) -> NUMERIC:
    return -np.exp(1/x) / x / x - 2 * np.cos(x) * np.exp(-2 * np.sin(x)) - 5 * x ** 4 + (x ** x) * (1 + np.log(x))

def f1_auto_derivative(x: NUMERIC) -> NUMERIC:
    return autodiff.diff(f1)(x)


In [3]:
# 2)
# since we defined all arithmetic operations, this also works for functions defined with loops
# for simplicity, let's present f(x) = 2x^4 as a while loop, with f'(x) = 8x^3
# we'll also look at its 2nd and 3rd derivatives, f''(x) = 24x^2 and f'''(x) = 48x

def f2(x: Dual) -> Dual:
    i = 0
    while i < 2:
        x *= x
        i += 1
    return 2 * x

def f2_expected_derivative(x: NUMERIC) -> NUMERIC:
    return 8 * x ** 3

def f2_expected_second_derivative(x: NUMERIC) -> NUMERIC:
    return 24 * x ** 2

def f2_expected_third_derivative(x: NUMERIC) -> NUMERIC:
    return 48 * x

def f2_auto_derivative(x: NUMERIC) -> NUMERIC:
    return autodiff.diff(f2)(x)

def f2_auto_second_derivative(x: NUMERIC) -> NUMERIC:
    return autodiff.diff(autodiff.diff(f2))(x)

def f2_auto_third_derivative(x: NUMERIC) -> NUMERIC:
    return autodiff.diff(autodiff.diff(autodiff.diff(f2)))(x)


In [4]:
# generate tests

x_vals = [0.5, 1.2, 1.23, 6.2, 10.0]
expected_f1 = [f1_expected_derivative(x) for x in x_vals]
expected_f2 = [f2_expected_derivative(x) for x in x_vals]
expected_f2_second = [f2_expected_second_derivative(x) for x in x_vals]
expected_f2_third = [f2_expected_third_derivative(x) for x in x_vals]

test_cases_f1 = [
    TestCase(
        input_data={"x": x},
        expected_output=expected,
        description="f(x) = exp(1/x) + exp(-2 * sin(x)) - x^5 + x^x: f'",
    )
    for x, expected in zip(x_vals, expected_f1, strict=True)
]

test_cases_f2 = [
    TestCase(
        input_data={"x": x},
        expected_output=expected,
        description="f(x) = 2 * x^4 with loops: f'",
    )
    for x, expected in zip(x_vals, expected_f2, strict=True)
]

test_cases_f2_second = [
    TestCase(
        input_data={"x": x},
        expected_output=expected,
        description="f(x) = 2 * x^4 with loops: f''",
    )
    for x, expected in zip(x_vals, expected_f2_second, strict=True)
]

test_cases_f2_third = [
    TestCase(
        input_data={"x": x},
        expected_output=expected,
        description="f(x) = 2 * x^4 with loops: f'''",
    )
    for x, expected in zip(x_vals, expected_f2_third, strict=True)
]

# forward-mode AD is exact up to floating-point rounding,
# so tight absolute and relative tolerances of 1e-8 suffice
def comparison(x: float, y: float) -> bool:
    return bool(np.isclose(x, y, atol=1e-8, rtol=1e-8))

Validor(f1_auto_derivative).add_cases(test_cases_f1).run(comparison)
Validor(f2_auto_derivative).add_cases(test_cases_f2).run(comparison)
Validor(f2_auto_second_derivative).add_cases(test_cases_f2_second).run(comparison)
Validor(f2_auto_third_derivative).add_cases(test_cases_f2_third).run(comparison)

[2026-01-04 23:16:55,462] [INFO] All 5 tests passed for f1_auto_derivative.
[2026-01-04 23:16:55,478] [INFO] All 5 tests passed for f2_auto_derivative.
[2026-01-04 23:16:55,479] [INFO] All 5 tests passed for f2_auto_second_derivative.
[2026-01-04 23:16:55,481] [INFO] All 5 tests passed for f2_auto_third_derivative.
