In [1]:
using Pkg; Pkg.activate("."); Pkg.instantiate()

[32m[1m  Activating[22m[39m project at `~/julia_ws/PACMAN_ADworkshop/Day1`


[32m[1mPrecompiling[22m[39m project...


[32m  ✓ [39mPluto
  1 dependency successfully precompiled in 74 seconds. 53 already precompiled.


Forward and backward mode in multivariate functions
------------

Consider again the chain of functions:

$$
f(g(h(x))) = f(g(y)) = f(z) = F,
$$

with 

$$
\begin{cases}
h(x) = y \\
 g(y) = z \\ 
 f(z) = F
\end{cases}.
$$

Then the chain rule gives

$$
\underbrace{\frac{\partial F}{\partial x}}_{|F| \times |x|} = \underbrace{\frac{\partial F}{\partial z}}_{|F| \times |z|} \underbrace{\frac{\partial z}{\partial y}}_{|z| \times |y|} \underbrace{\frac{\partial y}{\partial x}}_{|y| \times |x|}
$$

where the underbraces indicate the dimensions of each Jacobian.
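
The dimension bookkeeping can be checked on a small example. The sketch below assumes a hypothetical chain of linear maps with $|x| = 4$, $|y| = 3$, $|z| = 2$, $|F| = 1$, so that each Jacobian is simply the corresponding matrix; the matrices `A`, `B`, `C` are made up for illustration.

```julia
# Hypothetical chain x ∈ R^4 -> y ∈ R^3 -> z ∈ R^2 -> F ∈ R^1 built from linear maps,
# so each Jacobian is just the matrix of the map.
A = [1.0 2.0 3.0 4.0;
     0.0 1.0 0.0 1.0;
     2.0 0.0 1.0 0.0]        # ∂y/∂x, size |y| × |x| = 3 × 4
B = [1.0 0.0 2.0;
     0.0 3.0 1.0]            # ∂z/∂y, size |z| × |y| = 2 × 3
C = [1.0 -1.0]               # ∂F/∂z, size |F| × |z| = 1 × 2

h(x) = A * x
g(y) = B * y
f(z) = C * z

# Chain rule: the full Jacobian is the product of the three Jacobians.
dFdx = C * B * A
size(dFdx)                   # (1, 4), i.e. |F| × |x|
```

The inner dimensions $|z|$ and $|y|$ cancel in the product, leaving a Jacobian of size $|F| \times |x|$.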

Pushforward and pullback
------------
(Ref: https://juliadiff.org/ChainRulesCore.jl/v0.9/#The-propagators:-pushforward-and-pullback)

"Pushforward" and "pullback" are two terms that the autodiff community adopted from differential geometry. Familiarity with their use there helps, but their meaning in autodiff can be grasped without it.

Consider the above example with the chain of derivatives:

$$
\frac{\partial F}{\partial x} = \frac{\partial F}{\partial z} \frac{\partial z}{\partial y} \frac{\partial y}{\partial x}
$$


The ***pushforward*** of $g$ incorporates the knowledge of $\frac{\partial z}{\partial y}$ and takes $\frac{\partial y}{\partial x}$ to $\frac{\partial z}{\partial x}$, i.e.

$$
\frac{\partial y}{\partial x} \xrightarrow{\frac{\partial z}{\partial y}} \frac{\partial z}{\partial x}
$$


Similarly, the ***pullback*** of $g$ incorporates the knowledge of $\frac{\partial z}{\partial y}$ and takes $\frac{\partial F}{\partial z}$ to $\frac{\partial F}{\partial y}$, i.e.

$$
\frac{\partial F}{\partial z} \xrightarrow{\frac{\partial z}{\partial y}} \frac{\partial F}{\partial y}.
$$
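
As a rough illustration, the two propagators can be written as plain Julia functions. The sketch below assumes a hypothetical $g(y) = (y_1 y_2, \sin y_1)$ and builds its Jacobian by hand; the names `jac_g`, `pushforward_g` and `pullback_g` are illustrative and are not the ChainRulesCore API.

```julia
# Hypothetical example: g(y) = [y1 * y2, sin(y1)], so |y| = 2 and |z| = 2.
g(y) = [y[1] * y[2], sin(y[1])]

# Jacobian ∂z/∂y at y, written out by hand.
jac_g(y) = [y[2] y[1];
            cos(y[1]) 0.0]

# Pushforward of g: takes ∂y/∂x and returns ∂z/∂x = (∂z/∂y) * (∂y/∂x).
pushforward_g(y, dydx) = jac_g(y) * dydx

# Pullback of g: takes ∂F/∂z (a row) and returns ∂F/∂y = (∂F/∂z) * (∂z/∂y).
pullback_g(y, dFdz) = dFdz * jac_g(y)

y    = [2.0, 3.0]
dydx = [1.0, 0.5]        # assumed ∂y/∂x for a scalar x (column of length |y|)
dFdz = [1.0 -2.0]        # assumed ∂F/∂z for a scalar F (row, 1 × |z|)

dzdx = pushforward_g(y, dydx)   # column of length |z|
dFdy = pullback_g(y, dFdz)      # row, 1 × |y|
```

Both routes lead to the same $\frac{\partial F}{\partial x}$: `dFdz * dzdx` and `dFdy * dydx` agree. The explicit Jacobian is used here only for clarity; the propagators can also be written so that it is never formed.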

Multivariate reverse mode - simple example
------------

$$
f(x,y,z) = x z \sin(xy)
$$
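
One way to work through this example by hand is sketched below. The function is split into elementary steps, the forward pass stores the intermediates, and the reverse pass propagates the derivative of $F$ back through each step. The decomposition into `a`, `b`, `c` and the names `f_with_pullback` and `*_bar` are illustrative choices, not taken from any library.

```julia
# Forward pass: evaluate f(x, y, z) = x * z * sin(x * y) and keep the intermediates.
function f_with_pullback(x, y, z)
    a = x * y          # a = xy
    b = sin(a)         # b = sin(xy)
    c = x * z          # c = xz
    F = c * b          # F = xz sin(xy)

    # Reverse pass: start from F_bar = ∂F/∂F and walk the steps backwards.
    function pullback(F_bar)
        c_bar = F_bar * b                # ∂F/∂c = b
        b_bar = F_bar * c                # ∂F/∂b = c
        a_bar = b_bar * cos(a)           # ∂b/∂a = cos(a)
        x_bar = c_bar * z + a_bar * y    # x enters through both c and a
        y_bar = a_bar * x                # y enters only through a
        z_bar = c_bar * x                # z enters only through c
        return (x_bar, y_bar, z_bar)
    end
    return F, pullback
end

F, back = f_with_pullback(1.0, 2.0, 3.0)
grad = back(1.0)       # (∂F/∂x, ∂F/∂y, ∂F/∂z) from a single reverse pass
```

A single reverse pass yields all three partial derivatives, which match the analytic expressions $\frac{\partial f}{\partial x} = z\sin(xy) + xyz\cos(xy)$, $\frac{\partial f}{\partial y} = x^2 z\cos(xy)$ and $\frac{\partial f}{\partial z} = x\sin(xy)$.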

Computational cost analysis
------------

- Recall: ***Forward mode*** means that we compute the derivative in the <b>SAME</b> direction as the evaluation, i.e.

$$
\frac{\partial y}{\partial x} \rightarrow \frac{\partial z}{\partial x} \rightarrow \frac{\partial F}{\partial x}
$$

where we compute $\frac{\partial z}{\partial y}$ and $\frac{\partial F}{\partial z}$ at the first and second arrows, respectively. Or, equivalently,

$$
\frac{\partial F}{\partial x} = \frac{\partial F}{\partial z} \left(\frac{\partial z}{\partial y} \frac{\partial y}{\partial x}\right).
$$

What is the computational cost of evaluating $\frac{\partial F}{\partial x}$ as above? This involves:


- evaluating $\frac{\partial y}{\partial x}$, $\frac{\partial z}{\partial y}$ and $\frac{\partial F}{\partial z}$. (Note: we do not have to store the full Jacobians in practice!)
- $|x| \times |y| \times |z| + |x| \times |z| \times |F|$ multiplications

How about backward mode?

- ***Backward mode*** means that we compute the derivative in the <b>REVERSE</b> direction of the evaluation, i.e.

$$
\frac{\partial F}{\partial z} \rightarrow \frac{\partial F}{\partial y} \rightarrow \frac{\partial F}{\partial x}
$$

where we compute $\frac{\partial z}{\partial y}$ and $\frac{\partial y}{\partial x}$ at the first and second arrows, respectively. Or, equivalently,

$$
\frac{\partial F}{\partial x} = \left(\frac{\partial F}{\partial z} \frac{\partial z}{\partial y}\right) \frac{\partial y}{\partial x}
$$

What is the computational cost of evaluating $\frac{\partial F}{\partial x}$ as above? This involves:


- evaluating $\frac{\partial y}{\partial x}$, $\frac{\partial z}{\partial y}$ and $\frac{\partial F}{\partial z}$
- $|F| \times |z| \times |y| + |F| \times |y| \times |x|$ multiplications


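Forward mode thus needs $|x| \times |z| \times (|y| + |F|)$ multiplications, while backward mode needs $|F| \times |y| \times (|z| + |x|)$. Hence, when $|x| \gg |F|$ (many inputs, few outputs), <b>BACKWARD</b> mode is more efficient, and when $|F| \gg |x|$ (many outputs, few inputs), <b>FORWARD</b> mode is preferred.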
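
The two multiplication counts can be compared for concrete sizes. The sketch below assumes dense Jacobians and the naive cost of $m k n$ multiplications for an $(m \times k)(k \times n)$ product; the helper names and the sizes are made up for illustration.

```julia
# Multiplications in a dense (m × k) * (k × n) matrix product.
matmul_cost(m, k, n) = m * k * n

# Forward mode: ∂F/∂z * (∂z/∂y * ∂y/∂x)
forward_cost(nx, ny, nz, nF) = matmul_cost(nz, ny, nx) + matmul_cost(nF, nz, nx)

# Backward mode: (∂F/∂z * ∂z/∂y) * ∂y/∂x
backward_cost(nx, ny, nz, nF) = matmul_cost(nF, nz, ny) + matmul_cost(nF, ny, nx)

# Many inputs, one output: |x| = 1000, |y| = |z| = 100, |F| = 1.
forward_cost(1000, 100, 100, 1)    # 10_100_000
backward_cost(1000, 100, 100, 1)   # 110_000

# One input, many outputs: |x| = 1, |y| = |z| = 100, |F| = 1000.
forward_cost(1, 100, 100, 1000)    # 110_000
backward_cost(1, 100, 100, 1000)   # 10_100_000
```

With these sizes, the backward ordering is roughly 90 times cheaper when there are many inputs and a single output, and the forward ordering is cheaper by the same factor in the opposite case.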


***Note:*** In practice, the ***optimal*** scheme is usually a mixed-mode differentiation.
