In [2]:
using Pkg; Pkg.activate("."); Pkg.instantiate()

  Activating project at `~/julia_ws/PACMAN_ADworkshop`
Precompiling project...
  ✓ MacroTools
  ✓ CommonSubexpressions
  ✓ ForwardDiff
  3 dependencies successfully precompiled in 9 seconds. 16 already precompiled.


In [3]:
include("utils.jl");

Chain rules (Univariate)
------------

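For a composition $f(g(x))$ of differentiable univariate functions, the chain rule gives $(f \circ g)'(x) = f'(g(x)) \, g'(x)$.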

Dual Numbers
------------

The idea of dual numbers is very simple and somewhat like that of imaginary numbers. With imaginary numbers, we have

$$
i^2 = -1. 
$$

But with dual numbers, we have

$$
\epsilon^2 = 0 .
$$
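
As a small worked example of what this rule implies, consider the product of two dual numbers $a + b\epsilon$ and $c + d\epsilon$, where the real coefficients $a, b, c, d$ are symbols introduced here only for illustration:

$$
(a + b\epsilon)(c + d\epsilon) = ac + (ad + bc)\epsilon + bd\,\epsilon^2 = ac + (ad + bc)\epsilon .
$$

The coefficient of $\epsilon$ follows the same pattern as the product rule, which hints at why it can carry a derivative.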

To understand better, we first consider a univariate function $f : \mathbb{R} \rightarrow \mathbb{R}$. If $f$ is analytic, then:

$$
f(x + \epsilon) = f(x) + f'(x) \epsilon + O(\epsilon^2).
$$

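But $\epsilon^2 = 0$, so the expansion truncates exactly: $f(x + \epsilon) = f(x) + f'(x) \epsilon$. This means a single forward evaluation of $f$ on a dual number also gives us the derivative, as the coefficient of the order-$\epsilon$ term.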
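
To make this concrete, here is a minimal hand-rolled sketch of a dual number type. The names `MyDual`, `val`, `der` and `ftrial` are hypothetical and chosen only for illustration. Only the few operations needed by the example are defined, so this is not a general implementation.

```julia
# Hypothetical minimal dual number: `val` is the real part,
# `der` is the coefficient of ϵ.
struct MyDual
    val::Float64
    der::Float64
end

# Arithmetic rules that follow from ϵ^2 = 0
Base.:+(a::Real, b::MyDual) = MyDual(a + b.val, b.der)
Base.:*(a::MyDual, b::MyDual) = MyDual(a.val * b.val, a.val * b.der + a.der * b.val)
Base.:/(a::Real, b::MyDual) = MyDual(a / b.val, -a * b.der / b.val^2)
Base.sin(a::MyDual) = MyDual(sin(a.val), cos(a.val) * a.der)

# Illustrative function; x * x is used instead of x^2 because
# no power method is defined for MyDual in this sketch.
ftrial(x) = sin(1 / (1 + x * x))

x0 = 1.0
out = ftrial(MyDual(x0, 1.0))   # seed the ϵ coefficient with 1

# The ϵ coefficient matches the derivative computed by hand
@assert out.val ≈ sin(1 / (1 + x0^2))
@assert out.der ≈ -2x0 * cos(1 / (1 + x0^2)) / (1 + x0^2)^2
```

Evaluating the function once on `MyDual(x0, 1.0)` returns both the value and the derivative at `x0`, without any symbolic manipulation or finite differences.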

In Julia, the package ``ForwardDiff.jl`` implements this for us.

Ref: https://github.com/JuliaDiff/ForwardDiff.jl

In [22]:
using ForwardDiff

# trial function
function f(x)
    y = 1 / (1 + x^2)
    z = sin(y)
    return z
end

# its derivative calculated by hand
function df(x)
    return -2x * cos(1 / (1 + x^2)) / (1 + x^2)^2
end

@assert ForwardDiff.derivative(f, 1.0) ≈ df(1.0)

Likewise, one can use this package to work with vector functions.

For instance, the gradient of $f: \mathbb{R}^N \rightarrow \mathbb{R}$:

In [18]:
# trial function
function fvec(x)
    y = 1 ./ (1 .+ x.^2)
    z = sin.(y)
    return sum(z)
end

# derivative by hand
function dfvec(x)
    return -2 .* x .* cos.(1 ./ (1 .+ x.^2)) ./ (1 .+ x.^2).^2
end

N = 3

# generate a random vector of size N
X = randn(N)

@assert ForwardDiff.gradient(fvec, X) ≈ dfvec(X)


And also the Jacobian of a function $f: \mathbb{R}^N \rightarrow \mathbb{R}^M$:

In [21]:
# trial function
function fmat(x)
    return [4 * x[1] ^ 2 * x[2], x[1] - x[2]^2]
end

# derivative by hand
function dfmat(x)
    return [8*x[1]*x[2] 4*x[1]^2 ; 1 -2*x[2]]
end

N = M = 2

# generate a random vector of size N
X = randn(N)
@assert ForwardDiff.jacobian(fmat, X) ≈ dfmat(X)


Univariate forward and backward mode
------------


In this section we demonstrate the difference between two ways of differentiating a "chain" of functions.

We first consider a chain of functions:

$$
f(g(h(x))) = f(g(y)) = f(z) = F,
$$

with 

$$
\begin{cases}
h(x) = y \\
 g(y) = z \\ 
 f(z) = F
\end{cases}.
$$

Then the chain rule gives

$$
\frac{\partial F}{\partial x} = \frac{\partial F}{\partial z} \frac{\partial z}{\partial y} \frac{\partial y}{\partial x}
$$


- ***Forward mode*** means that we compute the derivative in the <b>SAME</b> direction as the evaluation, i.e.

$$
\frac{\partial y}{\partial x} \rightarrow \frac{\partial z}{\partial x} \rightarrow \frac{\partial F}{\partial x}
$$

where we compute $\frac{\partial z}{\partial y}$ and $\frac{\partial F}{\partial z}$ in the first and second arrow respectively. Or, equivalently, 

$$
\frac{\partial F}{\partial x} = \frac{\partial F}{\partial z} \left(\frac{\partial z}{\partial y} \frac{\partial y}{\partial x}\right)
$$


- ***Backward mode*** means that we compute the derivative in the <b>REVERSE</b> direction of the evaluation, i.e.

$$
\frac{\partial F}{\partial z} \rightarrow \frac{\partial F}{\partial y} \rightarrow \frac{\partial F}{\partial x}
$$

where we compute $\frac{\partial z}{\partial y}$ and $\frac{\partial y}{\partial x}$ in the first and second arrow respectively. Or, equivalently, 

$$
\frac{\partial F}{\partial x} = \left(\frac{\partial F}{\partial z} \frac{\partial z}{\partial y}\right) \frac{\partial y}{\partial x}
$$
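
The two accumulation orders above can be sketched on a concrete chain. The functions `h_ex`, `g_ex`, `f_ex` and their hand-written derivatives are hypothetical choices made only for illustration. Any differentiable univariate functions would do.

```julia
# Hypothetical chain, chosen only for illustration
h_ex(x) = x^2       # y = h(x)
g_ex(y) = sin(y)    # z = g(y)
f_ex(z) = exp(z)    # F = f(z)

# Local derivatives, written by hand
dh_ex(x) = 2x       # ∂y/∂x
dg_ex(y) = cos(y)   # ∂z/∂y
df_ex(z) = exp(z)   # ∂F/∂z

x0 = 0.7
y0 = h_ex(x0)       # the intermediate values are needed by both modes
z0 = g_ex(y0)

# Forward mode: start from ∂y/∂x and follow the evaluation order
dydx = dh_ex(x0)
dzdx = dg_ex(y0) * dydx           # first arrow: multiply by ∂z/∂y
dFdx_forward = df_ex(z0) * dzdx   # second arrow: multiply by ∂F/∂z

# Backward mode: start from ∂F/∂z and walk the chain in reverse
dFdz = df_ex(z0)
dFdy = dFdz * dg_ex(y0)           # first arrow: multiply by ∂z/∂y
dFdx_backward = dFdy * dh_ex(x0)  # second arrow: multiply by ∂y/∂x

@assert dFdx_forward ≈ dFdx_backward
```

Both modes multiply the same three scalar factors and differ only in the order of association. The backward pass cannot start until the forward evaluation has produced `y0` and `z0`.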


In the univariate case, there is not much computational difference between the two modes, but we will see how the choice matters when we have multivariate functions.