# Reverse mode automatic differentiation.

* Any function, no matter how complicated, is evaluated by performing a sequence of simple elementary operations (ops), each of which takes one or two arguments.
* Ops with a single argument: trigonometric functions, exp, log
* Ops with two arguments: *, +, -, /


**Example:** Let $f:\mathbb{R}^3 \rightarrow \mathbb{R}$ be given by

$$ f(x_1,x_2,x_3) = (x_1x_2 \sin(x_3) + e^{x_1x_2})/x_3$$

**Input Variables:** $x_1$, $x_2$, and $x_3$

**Intermediate Variables:** $x_4 = x_1x_2$, $x_5 = \sin(x_3)$, $x_6 = e^{x_4}$, $x_7 = x_4x_5$, and $x_8 = x_6 + x_7$.

**Output Variable:** $x_9 = x_8 / x_3$.
!["Reverse mode"](./images/ReverseMode2.png)
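The decomposition above can be traced by hand in plain Python. The sketch below has two parts:

* It evaluates the same chain of elementary ops at the point $(x_1, x_2, x_3) = (1, 2, \pi/2)$, which is the point used in the TensorFlow example later in this section.
* It stores each local partial $\dfrac{\partial x_j}{\partial x_i}$ in a dictionary keyed by the edge $(i, j)$.

The `edges` dictionary is a hypothetical, illustrative data structure, not part of any library.

```python
import math

# Input values (the same point as in the TensorFlow example)
x1, x2, x3 = 1.0, 2.0, math.pi / 2

# Forward sweep: each intermediate variable comes from one elementary op
x4 = x1 * x2          # binary op: *
x5 = math.sin(x3)     # unary op: sin
x6 = math.exp(x4)     # unary op: exp
x7 = x4 * x5          # binary op: *
x8 = x6 + x7          # binary op: +
x9 = x8 / x3          # binary op: / (output)

# Hypothetical edge table: local partial dx_j/dx_i for each edge (i -> j)
edges = {
    (1, 4): x2,               # d(x1*x2)/dx1
    (2, 4): x1,               # d(x1*x2)/dx2
    (3, 5): math.cos(x3),     # d(sin x3)/dx3
    (4, 6): x6,               # d(exp x4)/dx4 = exp(x4)
    (4, 7): x5,               # d(x4*x5)/dx4
    (5, 7): x4,               # d(x4*x5)/dx5
    (6, 8): 1.0,              # d(x6+x7)/dx6
    (7, 8): 1.0,              # d(x6+x7)/dx7
    (8, 9): 1.0 / x3,         # d(x8/x3)/dx8
    (3, 9): -x8 / x3**2,      # d(x8/x3)/dx3
}

print(x9)  # approximately 5.9773
```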

* In reverse mode automatic differentiation, each node $x_i$ in the computational graph is associated with a scalar $\bar{x_i} = \dfrac{\partial f}{\partial x_i}$, called its adjoint.
* The value of $\dfrac{\partial f}{\partial x_i}$ is accumulated in $\bar{x_i}$ during the reverse (backward) sweep.
* In the forward sweep, the values of the nodes $x_i$ are computed from the input values. The local partials $\dfrac{\partial x_j}{\partial x_i}$, where $x_j$ is a direct child of $x_i$, are assigned to the edges.
* Initialization: $\bar{x_i} = 0$ for $i = 1, \cdots, 8$ and set $\bar{x_9} = \dfrac{\partial f}{\partial x_9} = \dfrac{\partial x_9}{\partial x_9} = 1$.
* Chain Rule: $\bar{x_i} = \dfrac{\partial f}{\partial x_i} = \sum_{j \,\in\, \text{children}(i)} \dfrac{\partial f}{\partial x_j}\dfrac{\partial x_j}{\partial x_i}$.
* Accumulation: for each child $j$ of node $i$, $\bar{x_i} \mathrel{+}= \dfrac{\partial f}{\partial x_j}\dfrac{\partial x_j}{\partial x_i}$.
* This update is performed as soon as $\dfrac{\partial f}{\partial x_j}$ becomes available.
* Once contributions have been received from all children of node $i$, $\bar{x_i}$ is complete and node $i$ is declared finalized.
* At this stage, node $i$ is ready to contribute to the terms in the summation for each of its parent nodes.
* The process is continued until all nodes are finalized.
* Note that the flow of the computation in the graph is from children to parents during the backward sweep.
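Applied to the example, the procedure above amounts to the hand-written reverse sweep below. It is a minimal sketch in plain Python with no AD library, evaluated at $(x_1, x_2, x_3) = (1, 2, \pi/2)$.

* The dictionary `xb` is an illustrative name for the adjoints $\bar{x_i}$.
* Each update line pushes one contribution from a finalized node to one of its parents.
* The lines are ordered so that every node receives all contributions from its children before it contributes to its own parents.

```python
import math

x1, x2, x3 = 1.0, 2.0, math.pi / 2

# Forward sweep: primal values of every node
x4 = x1 * x2
x5 = math.sin(x3)
x6 = math.exp(x4)
x7 = x4 * x5
x8 = x6 + x7
x9 = x8 / x3

# Initialization: adjoints of x1..x8 are zero, adjoint of the output is 1
xb = {i: 0.0 for i in range(1, 9)}
xb[9] = 1.0

# Reverse sweep: xb[i] += xb[j] * dx_j/dx_i for each edge i -> j
xb[8] += xb[9] * (1.0 / x3)        # x9 = x8 / x3
xb[3] += xb[9] * (-x8 / x3**2)
xb[6] += xb[8] * 1.0               # x8 = x6 + x7 (node 8 finalized)
xb[7] += xb[8] * 1.0
xb[4] += xb[7] * x5                # x7 = x4 * x5 (node 7 finalized)
xb[5] += xb[7] * x4
xb[4] += xb[6] * x6                # x6 = exp(x4), derivative is exp(x4) = x6
xb[3] += xb[5] * math.cos(x3)      # x5 = sin(x3) (node 5 finalized)
xb[1] += xb[4] * x2                # x4 = x1 * x2 (node 4 finalized)
xb[2] += xb[4] * x1

grad = [xb[1], xb[2], xb[3]]
print(grad)
```

The resulting `grad` should agree with the hand-derived gradient $\left(\tfrac{4}{\pi}(1+e^2),\ \tfrac{2}{\pi}(1+e^2),\ -\tfrac{8+4e^2}{\pi^2}\right)$ up to floating-point rounding. Note that `math.cos(math.pi / 2)` is about `6e-17`, not exactly zero.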

## Automatic differentiation in TensorFlow
To differentiate automatically, TensorFlow needs to remember what operations happen in what order during the forward pass. Then, during the backward pass, TensorFlow traverses this list of operations in reverse order to compute gradients.
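The record-then-replay idea can be sketched without TensorFlow. The `Tape` class below and its methods (`variable`, `mul`, `add`, `div`, `sin`, `exp`, `gradient`) are a hypothetical, minimal illustration. They are not the `tf.GradientTape` API, and they do not describe how TensorFlow is implemented.

* Each elementary op appends a record of its output node together with the local partials with respect to its inputs.
* `gradient` walks that list in reverse order, accumulating adjoints as in the reverse sweep described above.

Every op is recorded after its inputs exist. Reversing the recording order therefore finalizes a node's adjoint before that adjoint is propagated to its inputs.

```python
import math


class Tape:
    """Hypothetical minimal tape: records each op's inputs and local partials."""

    def __init__(self):
        self.values = []   # primal value of every node, indexed by node id
        self.records = []  # (output_node, [(input_node, d_output/d_input), ...])

    def variable(self, value):
        self.values.append(value)
        return len(self.values) - 1

    def _record(self, value, partials):
        out = self.variable(value)
        self.records.append((out, partials))
        return out

    # Elementary ops, each taking one or two arguments
    def mul(self, a, b):
        va, vb = self.values[a], self.values[b]
        return self._record(va * vb, [(a, vb), (b, va)])

    def add(self, a, b):
        return self._record(self.values[a] + self.values[b], [(a, 1.0), (b, 1.0)])

    def div(self, a, b):
        va, vb = self.values[a], self.values[b]
        return self._record(va / vb, [(a, 1.0 / vb), (b, -va / vb**2)])

    def sin(self, a):
        va = self.values[a]
        return self._record(math.sin(va), [(a, math.cos(va))])

    def exp(self, a):
        e = math.exp(self.values[a])
        return self._record(e, [(a, e)])

    def gradient(self, output, inputs):
        adjoint = [0.0] * len(self.values)
        adjoint[output] = 1.0
        # Replay the tape backwards; each output's adjoint is complete here
        for out, partials in reversed(self.records):
            for node, local in partials:
                adjoint[node] += adjoint[out] * local
        return [adjoint[i] for i in inputs]


tape = Tape()
x1, x2, x3 = tape.variable(1.0), tape.variable(2.0), tape.variable(math.pi / 2)
x4 = tape.mul(x1, x2)
# y = (exp(x1*x2) + x1*x2*sin(x3)) / x3, built from the same elementary ops
y = tape.div(tape.add(tape.exp(x4), tape.mul(x4, tape.sin(x3))), x3)
print(tape.values[y], tape.gradient(y, [x1, x2, x3]))
```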

### Gradient tapes
TensorFlow provides the `tf.GradientTape` API for automatic differentiation, that is, computing the gradient of a computation with respect to some inputs, usually `tf.Variable`s. TensorFlow "records" relevant operations executed inside the context of a `tf.GradientTape` onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse mode automatic differentiation.

Here is a simple example:

```python
import tensorflow as tf

x = tf.Variable(3.0)

# Operations involving x inside this context are recorded on the tape
with tf.GradientTape() as tape:
    y = x**2

# dy = 2x * dx, so dy/dx = 6 at x = 3
dy_dx = tape.gradient(y, x)
dy_dx.numpy()
```

```
6.0
```

```python
import tensorflow as tf
import numpy as np

x = tf.Variable(np.array([1.0, 2.0, np.pi / 2.0]), dtype=tf.float64)

# Note: use tf.sin, not np.sin. NumPy functions are not recorded on the tape,
# so TF treats their results as constants and no gradient flows through them.
with tf.GradientTape() as t:
    y = (x[0] * x[1] * tf.sin(x[2]) + tf.exp(x[0] * x[1])) / x[2]
yp = t.gradient(y, x)

print("y = ", y.numpy())
print("yp = ", yp.numpy())

# Gradient derived by hand at (x1, x2, x3) = (1, 2, pi/2), for comparison
yp_manual = [(4. / np.pi) * (1 + np.exp(2)), (2. / np.pi) * (1 + np.exp(2)), (-8. - 4. * np.exp(2)) / (np.pi ** 2)]
print('yp_manual =', ['{:0.4f}'.format(i) for i in yp_manual])
```

```
y =  5.977258756447682
yp =  [10.68127797  5.34063898 -3.80524111]
yp_manual = ['10.6813', '5.3406', '-3.8052']
```
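As an independent sanity check that uses no automatic differentiation at all, a central finite-difference approximation can be compared with the values above. Finite differencing is a numerical technique swapped in here purely for verification. The helper names `f` and `central_diff_grad` and the step size `h = 1e-6` are illustrative choices.

```python
import math


def f(x1, x2, x3):
    # Same function as in the example: (x1*x2*sin(x3) + exp(x1*x2)) / x3
    return (x1 * x2 * math.sin(x3) + math.exp(x1 * x2)) / x3


def central_diff_grad(func, point, h=1e-6):
    """Approximate each partial as (f(x + h*e_i) - f(x - h*e_i)) / (2h)."""
    grad = []
    for i in range(len(point)):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        grad.append((func(*plus) - func(*minus)) / (2 * h))
    return grad


point = [1.0, 2.0, math.pi / 2]
print(['{:0.4f}'.format(g) for g in central_diff_grad(f, point)])
# prints ['10.6813', '5.3406', '-3.8052']
```

Finite differences carry truncation and rounding error, so they agree with the exact gradient only to a limited number of digits. Reverse mode AD evaluates the derivative of the computed expression exactly, up to floating-point rounding.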
