# Introduction


Differentiation is used throughout mathematics, science and engineering, so it is useful for computer programs to carry out differentiation automatically across a wide variety of cases. For computationally heavy projects, the ability to compute derivatives automatically becomes even more critical, because working out derivatives by hand in such projects is impractical. Methods such as *numerical differentiation* and *symbolic differentiation* already exist for determining derivatives computationally, but both have limitations. In the following, we briefly review *numerical differentiation* and *symbolic differentiation* to highlight some of their difficulties before describing *automatic differentiation* and the advantages it brings over the other two methods.

### Numerical Differentiation
In *numerical differentiation*, the derivative is approximated by a finite difference with a small step size $h$:

$$
\frac{\partial{f(x)}}{\partial{x}} \approx \frac{f(x+h)-f(x)}{h}
$$

However, the choice of $h$ is delicate. When $h$ is too small, the numerical approximation fluctuates about the analytical answer: the difference $f(x+h)-f(x)$ shrinks to the level of floating-point precision, so round-off error dominates. When $h$ is too large, the approximation becomes inaccurate because the finite difference no longer tracks the local slope of the function, an error known as truncation error.
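
The trade-off between the two kinds of error can be seen in a minimal sketch. The test function ($\sin$ at $x = 1$), the three step sizes and the helper name `forward_difference` are our own choices for illustration; the error is expected to be smallest for the intermediate step size.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with the forward-difference formula (f(x+h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

x0 = 1.0
exact = math.cos(x0)  # analytical derivative of sin at x0

# Absolute error of the approximation for a large, a moderate and a tiny step size.
errors = {h: abs(forward_difference(math.sin, x0, h) - exact)
          for h in (1e-1, 1e-8, 1e-15)}

for h, err in errors.items():
    print(f"h = {h:.0e}  absolute error = {err:.3e}")
```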

### Symbolic Differentiation
In *symbolic differentiation*, expressions are manipulated automatically to obtain the required derivatives. At its heart, *symbolic differentiation* applies transformations that capture the various rules of differentiation in order to manipulate the expressions. However, *symbolic differentiation* requires careful and sometimes ingenious design, because careless manipulation can easily produce very large expressions that consume a lot of computational power and time, a problem known as expression swell.

### Automatic Differentiation
As seen above, both *numerical differentiation* and *symbolic differentiation* have their respective issues when it comes to computing derivatives. These issues are further exacerbated when calculating higher order derivatives, where both errors and complexity increase. *Automatic differentiation* overcomes these issues by recognizing that every function evaluation, no matter how complicated, can be executed in a stepwise fashion, with each step being either an elementary arithmetic operation (addition, subtraction, multiplication, division) or an elementary function (sin, sqrt, exp, log, etc.). To track the evaluation of each step, *automatic differentiation* produces computational graphs and evaluation traces. To compute the derivatives, *automatic differentiation* applies the chain rule repeatedly at every step. By taking a stepwise approach and using the chain rule, *automatic differentiation* circumvents the issues encountered by both *numerical differentiation* and *symbolic differentiation* and automatically computes derivatives that are accurate to machine precision. To further understand *automatic differentiation*, we present its mathematical background and essential ideas in the next section.

Note - In our research of automatic differentiation, we referred to the following resources:

Baydin, A.G., Pearlmutter, B.A., Radul, A. A. & Siskind, J.M. (2018). Automatic differentiation in machine learning: A survey. *Journal of Machine Learning Research, 18*, 1-42.

Geeraert, S., Lehalle, C.A., Pearlmutter, B., Pironneau, O. & Reghai, A. (2017). Mini-symposium on automatic differentiation and its applications in the financial industry. *ESAIM: Proceedings and Surveys* (pp. 1-10).

Berland, H. (2006). *Automatic differentiation* [PowerPoint Slides]. Retrieved from http://www.robots.ox.ac.uk/~tvg/publications/talks/autodiff.pdf

# Background


As mentioned before, *automatic differentiation* employs a stepwise approach and the chain rule to compute derivatives automatically. We first state the chain rule from calculus and then show an example evaluation trace and computational graph. Next, we discuss one mode of *automatic differentiation*, the forward mode, and demonstrate how the chain rule is applied at each step to determine derivatives. Finally, we touch on the use of dual numbers in *automatic differentiation*.

### Chain Rule 
For a function $f(u(t),v(t))$, the chain rule is given by

$$
\begin{align}
 \frac{\partial f}{\partial t} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial t} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial t}
\end{align}
$$
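
As a quick worked example (the particular functions are chosen purely for illustration), take $f(u,v) = uv$ with $u(t) = \sin(t)$ and $v(t) = t^2$. Applying the chain rule gives

$$
\begin{align}
 \frac{\partial f}{\partial t} &= v\cos(t) + u \cdot 2t \\
 &= t^2\cos(t) + 2t\sin(t)
\end{align}
$$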

### Example Production of Evaluation Trace & Computational Graph
The most straightforward way to show the generation of an evaluation trace and computational graph is to consider an example. For this purpose, we study the following function 

$$
f(x,y) = \sin(x) + 4y
$$

#### Evaluation Trace
The evaluation trace breaks the function into individual steps and builds up the function starting from the input variables. At each step, exactly one elementary arithmetic operation (addition, subtraction, multiplication, division) or one elementary function (sin, sqrt, exp, log, etc.) is used to build the function for the next step. The evaluation trace for our function of interest is shown in the table below.

| Trace | Elementary Function | Current Value | Comment               | 
| :---: | :-----------------: | :-----------: | :-------------------: | 
| $x_{1}$ | $x_{1}$           | $x$           | Input x               |
| $x_{2}$ | $x_{2}$           | $y$           | Input y               |
| $x_{3}$ | $sin(x_{1})$      | $sin(x)$      | Elementary function   |
| $x_{4}$ | $4*x_{2}$         | $4y$          | Elementary arithmetic |
| $x_{5}$ | $x_{3}+x_{4}$     | $sin(x) + 4y$ | Elementary arithmetic |
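
The trace in the table can be mirrored line by line in code. The sketch below is only an illustration of the idea, with a hypothetical function name `evaluation_trace` and arbitrarily chosen input values; it is not part of the package.

```python
import math

def evaluation_trace(x, y):
    """Evaluate f(x, y) = sin(x) + 4y one elementary step at a time."""
    x1 = x               # input x
    x2 = y               # input y
    x3 = math.sin(x1)    # elementary function
    x4 = 4 * x2          # elementary arithmetic
    x5 = x3 + x4         # elementary arithmetic
    return [x1, x2, x3, x4, x5]

trace = evaluation_trace(math.pi / 2, 3.0)
for i, value in enumerate(trace, start=1):
    print(f"x{i} = {value}")
```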


#### Computational Graph 
The computational graph translates the essence of the evaluation trace into a graph and captures the relationship between each step. Refer to the figure below for the computational graph of our function of interest.  

![computational-graph](Computational_Graph.png)
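
One possible way to hold such a graph in code is a dictionary that maps each node to the nodes it depends on. This representation is an assumption made for illustration, not a description of our package. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to recover a valid evaluation order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each node maps to the nodes it depends on (its parents in the graph).
graph = {
    "x1": [],            # input x
    "x2": [],            # input y
    "x3": ["x1"],        # sin(x1)
    "x4": ["x2"],        # 4 * x2
    "x5": ["x3", "x4"],  # x3 + x4
}

# A topological order lists every node after all of its parents, which is
# an order in which the steps of the evaluation trace may be carried out.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

The order is not unique (for example, $x_{2}$ may come before or after $x_{3}$), but every valid order places $x_{5}$ last.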

### Forward Mode
Armed with the knowledge of the chain rule, evaluation trace and computational graph, we can now consider the forward mode of *automatic differentiation*. The table below shows the earlier evaluation trace table that has now been expanded to include columns that store derivatives. At each step, the chain rule is applied to determine the elementary function derivative. 

| Trace | Elementary Function | Current Value | Elementary Function Derivative | $\nabla_{x}$ Value  | $\nabla_{y}$ Value  | 
| :---: | :-----------------: | :-----------: | :--------------------------: | :---------------------: | :---------------------: | 
| $x_{1}$ | $x_{1}$       | $x$           | $\dot{x}_{1}$             | $1$      | $0$ |
| $x_{2}$ | $x_{2}$       | $y$           | $\dot{x}_{2}$             | $0$      | $1$ |
| $x_{3}$ | $sin(x_{1})$  | $sin(x)$      | $cos(x_{1})\dot{x}_{1}$   | $cos(x)$ | $0$ |
| $x_{4}$ | $4*x_{2}$     | $4y$          | $4\dot{x}_{2}$            | $0$      | $4$ |
| $x_{5}$ | $x_{3}+x_{4}$ | $sin(x) + 4y$ | $\dot{x}_{3}+\dot{x}_{4}$ | $cos(x)$ | $4$ |

As seen from the table above, the derivative of each elementary function such as $\sin$ must be supplied by hand, and this has implications for the design of our *automatic differentiation* package. Specifically, we would need to define separate classes for each elementary function. For more details, refer to the Implementation section below.

In addition, the first and second rows have initial values for ($\nabla_{x}$, $\nabla_{y}$) of (1,0) and (0,1) respectively. These are seed values for the stepwise propagation of the derivatives. The forward mode computes the dot product of the gradient of our function with a seed vector, i.e. a directional derivative. Here we have a scalar function of two variables; for a vector function of vectors, the forward mode computes the product of the Jacobian matrix ($J$) and the seed vector ($p$), i.e. $Jp$.
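
The role of the seed vector can be sketched by writing out the forward-mode trace for $f(x,y) = \sin(x) + 4y$ by hand. The function name `forward_mode`, the evaluation point and the seed $(2, 3)$ are hypothetical choices for illustration only.

```python
import math

def forward_mode(x, y, seed):
    """Propagate values and derivatives of f(x, y) = sin(x) + 4y along `seed`."""
    px, py = seed
    x1, dx1 = x, px                              # input x with its seed
    x2, dx2 = y, py                              # input y with its seed
    x3, dx3 = math.sin(x1), math.cos(x1) * dx1   # chain rule for sin
    x4, dx4 = 4 * x2, 4 * dx2                    # constant multiple
    x5, dx5 = x3 + x4, dx3 + dx4                 # sum rule
    return x5, dx5

x, y = 0.5, 2.0
value, df_dx = forward_mode(x, y, seed=(1.0, 0.0))     # partial derivative w.r.t. x
_, df_dy = forward_mode(x, y, seed=(0.0, 1.0))         # partial derivative w.r.t. y
_, directional = forward_mode(x, y, seed=(2.0, 3.0))   # gradient dotted with (2, 3)
```

Seeding with $(1,0)$ or $(0,1)$ picks out a single partial derivative, while a general seed returns the corresponding directional derivative in one pass.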

### Dual Numbers
Dual numbers extend the real numbers by adding a second component. This extension is analogous to the way complex numbers extend the real numbers with an imaginary part. The general form of a dual number is given by

$$ x = a + \epsilon b, $$

where $\epsilon$ satisfies $\epsilon^2 = 0$ with $\epsilon \neq 0$, $a$ is the real part and $b$ is the dual part of the dual number.

In our *automatic differentiation* package, we can define a dual class that has two attributes. One of these attributes stores the value of the function while the other stores the value of the derivatives. This is similar to having a dual number with the value of a function as the real part and the value of derivatives as the dual part. Having such a dual number structure allows us to carry out the expected arithmetic operations between two dual instances.

#### Addition

$$ 
\begin{align}
(x +\epsilon \dot{x}) + (y +\epsilon \dot{y}) &= (x+y) + \epsilon(\dot{x}+\dot{y})
\end{align}
$$ 

#### Subtraction

$$ 
\begin{align}
(x +\epsilon \dot{x}) - (y +\epsilon \dot{y}) &= (x-y) + \epsilon(\dot{x}-\dot{y})
\end{align}
$$ 

#### Multiplication

$$ 
\begin{align}
(x +\epsilon \dot{x})*(y +\epsilon \dot{y}) &= xy+\epsilon x\dot{y}+\epsilon \dot{x}y+\epsilon^2\dot{x}\dot{y}\\
&= xy + \epsilon(x\dot{y} + \dot{x}y)
\end{align}
$$ 

#### Division

$$ 
\begin{align}
(x +\epsilon \dot{x}) / (y +\epsilon \dot{y}) &= \frac{(x +\epsilon \dot{x})(y -\epsilon \dot{y})}{(y +\epsilon \dot{y})(y - \epsilon \dot{y})} \\
&= \frac{xy-\epsilon x\dot{y}+\epsilon \dot{x}y-\epsilon^2\dot{x}\dot{y}}{y^2-\epsilon^2\dot{y}^2} \\
&= \frac{xy + \epsilon(-x\dot{y} + \dot{x}y)}{y^2} \\
&= \frac{x}{y} + \frac{\epsilon(y\dot{x}-x\dot{y})}{y^2} 
\end{align}
$$ 
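
The four rules above translate directly into operator overloading. The following is a minimal sketch under our own assumptions: the class name `Dual`, the attribute names `val` and `der`, and the free function `sin` are hypothetical and do not describe the final design of the package.

```python
import math

class Dual:
    """Hypothetical dual number: `val` is the real part, `der` the dual part."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants, i.e. duals with zero dual part.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return Dual._lift(other) - self

    def __mul__(self, other):
        other = Dual._lift(other)
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = Dual._lift(other)
        return Dual(self.val / other.val,
                    (other.val * self.der - self.val * other.der) / other.val ** 2)

    def __rtruediv__(self, other):
        return Dual._lift(other) / self


def sin(d):
    """Sine of a dual number: the chain rule gives cos(val) * der."""
    return Dual(math.sin(d.val), math.cos(d.val) * d.der)


# f(x, y) = sin(x) + 4y at (0.5, 2.0), seeded to differentiate with respect to x.
x = Dual(0.5, 1.0)
y = Dual(2.0, 0.0)
f = sin(x) + 4 * y
print(f.val, f.der)
```

With this seeding, `f.val` holds the function value and `f.der` holds $\partial f / \partial x$; swapping the dual parts of `x` and `y` would give $\partial f / \partial y$ instead.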

In sum, this section covers the mathematical background and essential ideas of *automatic differentiation* for a scalar function with two variables. These basic concepts can be extended easily to higher dimensions if needed. In fact, our *automatic differentiation* package will not only handle scalar functions of scalar and vector values, but also vector functions of vectors.