# Introduction

Our software implements automatic differentiation (AD). Briefly, AD is a method of finding the derivatives of functions quickly and to machine precision. We first describe areas of computing that rely on differentiation, and then discuss why AD is often preferable to symbolic or numerical differentiation.

## Uses for fast differentiation 
There is a broad range of applications for software that can perform fast differentiation. We list only a few below.
* **Physics and chemistry:** \
    Any simulation of a system with evolving vector fields or particles with individual trajectories likely needs to differentiate large sets of equations. These equations could govern velocities, forces, or other properties that are assigned to a single region of the vector field or a single particle. For example, chemists and physicists use molecular dynamics simulations to study chemical solutions or to conduct biophysical modelling. These simulations model each molecule in a solution, let the molecules evolve over time, and track how their interactions settle into equilibrium. They aid understanding in areas like protein conformation, DNA packing, or crystal deformations.


* **Deep neural networks:** \
    When deep neural networks are trained to compute some function, the training process requires a backpropagation of errors to move weights in directions that will minimize those errors. The backpropagation step involves calculating derivatives of errors with respect to weights, and in multi-layer networks, many calculations must be done iteratively to update the entire network. Therefore, deep neural networks, found extensively in machine learning, benefit greatly from well-implemented differentiation packages.
    
    
* **Optimization:** \
    While neural networks have been given a lot of attention recently, they can be considered a subset of more general optimization problems. These include optimizing geometries for engineering projects, developing protocols for resource usage at large companies, or improving a policy in reinforcement learning. Optimization is often done iteratively: small steps are taken along the gradient of the performance measure (or against it, when minimizing a cost). The scale of differentiation necessary depends on the situation (reflected in input dimensionality or number of functions), but automatic differentiation can easily be applied to many of these cases.
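
The iterative stepping described in the optimization item can be sketched in a few lines of Python. This is only an illustration: the function name `gradient_descent`, the step size, the iteration count, and the quadratic objective are assumptions chosen for the example, and the derivative is written by hand where an AD package would supply it.

```python
def gradient_descent(grad, x0, step=0.1, iters=100):
    """Minimize a function by repeatedly stepping against its gradient."""
    x = x0
    for _ in range(iters):
        # Move a small distance in the direction that decreases the objective.
        x = x - step * grad(x)
    return x


# Hypothetical objective f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

Every iteration needs one gradient evaluation, which is why the cost and accuracy of differentiation matter for this kind of loop.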

## Why automatic differentiation?
The major alternatives to automatic differentiation are symbolic and numeric differentiation. Symbolic differentiation maintains full symbolic representations of function derivatives. These expressions can grow convoluted and unwieldy, especially when we need to store and evaluate large numbers of them, so speed is the main issue for symbolic differentiation. Numeric differentiation, on the other hand, suffers from limited accuracy. Its implementations usually approximate the derivative with a finite difference, which amounts to linearizing the function over a small step around the point of interest. A step that is too large introduces truncation error from the neglected curvature, while a step that is too small introduces rounding error from subtracting nearly equal floating-point values. Both symbolic and numeric differentiation perform worse in their respective weaknesses as the order of the derivative increases.
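
The accuracy trade-off in numeric differentiation can be seen with a small experiment. The sketch below is illustrative only: `forward_difference` is a hypothetical helper, and the test function ($\sin$ at $x=1$) and the step sizes are arbitrary choices.

```python
import math


def forward_difference(f, x, h):
    """Approximate f'(x) with a one-sided finite difference of step h."""
    return (f(x + h) - f(x)) / h


exact = math.cos(1.0)  # the true derivative of sin at x = 1
steps = (1e-1, 1e-4, 1e-8, 1e-12)
errors = {h: abs(forward_difference(math.sin, 1.0, h) - exact) for h in steps}

for h, err in errors.items():
    print(f"h = {h:.0e}   absolute error = {err:.2e}")
```

The error first shrinks as the step decreases (less truncation error) and then grows again (more rounding error), so no single step size recovers the derivative to machine precision.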

For the examples in the previous section, we often need derivatives for thousands or millions of functions as quickly as possible. Thus, symbolic differentiation is at a disadvantage. At the same time, small systematic errors can easily propagate and impact the performance of the system. In the case of a molecular dynamics simulation, for instance, the output of the simulation will be a set of averaged quantities over all particles in the system. These outputs could be sensitive to the precision in derivatives for force or velocity equations, especially if calculations are consistently off for every particle. Numeric differentiation, then, may interfere with simulation results.

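Automatic differentiation is both fast and precise: it applies exact derivative rules to numerical values, so it avoids both the expression growth of symbolic differentiation and the step-size error of numeric differentiation. Consequently, it is often the best choice in our given scenarios to carry out differentiation on extensive sets of functions.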

# Background
Automatic differentiation decomposes a function into its elementary constituents. It strings them together into a graph, which can be traced through to calculate derivatives using the chain rule. Each node in the graph only depends on its associated edges and the values of adjacent nodes, obviating the need to store symbolic representations of the entire function's derivative. We show an example below.
### Example: Automatic differentiation using a table
We can demonstrate the automatic differentiation process on the function 
$$
f(\mathbf{x}) = e^{\sin(x_1+x_2)}-e^{\cos(x_1-x_2)}
$$

First, we construct a graph composed only of elementary functions (e.g., $+,\; -,\; *,\; /,\; \sin(x),\; \cos(x),\; e^x$).

![image](ADGraph.png)

Next, we evaluate each node for its value, elementary symbolic derivative, and derivative value. At every step, the elementary derivative is multiplied by the derivative value already computed for the node's input; this is how the chain rule is implemented. We evaluate the derivative at an arbitrarily chosen point, $\mathbf{x}=(\pi/2,\pi/2)$, using two seed vectors, $\mathbf{\dot{x}_a}=(1,0)$ and $\mathbf{\dot{x}_b}=(0,1)$. The seed vector indicates the direction in which we want the derivative: a seed vector of $\mathbf{\dot{x}}=(1,0)$ returns the function's partial derivative with respect to the first dimension; $\mathbf{\dot{x}}=(0,1)$ returns the partial derivative with respect to the second. A seed vector of $\mathbf{\dot{x}}=(1,1)$ would return the directional derivative along the vector $(1,1)$, which is the sum of the two partial derivatives.
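
In symbols, the role of the seed vector can be summarized as follows. This is the standard directional-derivative identity, stated here for the two-input case used in this example; note that the seed is not normalized.

$$
D_{\mathbf{\dot{x}}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{\dot{x}} = \frac{\partial f}{\partial x_1}\,\dot{x}_1 + \frac{\partial f}{\partial x_2}\,\dot{x}_2
$$

Choosing $\mathbf{\dot{x}}=(1,0)$ or $\mathbf{\dot{x}}=(0,1)$ therefore selects a single partial derivative.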

<img src="ADTable.png" alt="Drawing" style="width: 500px;"/>

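From the last row, the derivative evaluates to $-1$ for both $\mathbf{\dot{x}_a}=(1,0)$ and $\mathbf{\dot{x}_b}=(0,1)$. At this point $x_1-x_2=0$, so the $\sin(x_1-x_2)$ factor produced by differentiating the cosine vanishes, and only the first exponential contributes: $e^{\sin(\pi)}\cos(\pi)=-1$. In this case, our seed vectors were chosen to return partial derivatives with respect to each input variable. Hence,
$$
\begin{aligned}
\frac{\partial f}{\partial x_1} &= e^{\sin(x_1+x_2)}\cos(x_1+x_2) + e^{\cos(x_1-x_2)}\sin(x_1-x_2) = -1 \\
\frac{\partial f}{\partial x_2} &= e^{\sin(x_1+x_2)}\cos(x_1+x_2) - e^{\cos(x_1-x_2)}\sin(x_1-x_2) = -1
\end{aligned}
$$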
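
The same forward trace can be reproduced in code. The sketch below uses dual numbers, one common way to implement forward-mode AD; it is not the implementation of this package. The class `Dual` and the helper `directional_derivative` are hypothetical names introduced for illustration, and only the operations needed by $f$ are defined.

```python
import math


class Dual:
    """A value paired with its derivative along the chosen seed direction."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        return Dual(self.val + other.val, self.der + other.der)

    def __sub__(self, other):
        return Dual(self.val - other.val, self.der - other.der)


# Each elementary function returns its value and applies the chain rule.
def sin(u):
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)


def cos(u):
    return Dual(math.cos(u.val), -math.sin(u.val) * u.der)


def exp(u):
    e = math.exp(u.val)
    return Dual(e, e * u.der)


def f(x1, x2):
    return exp(sin(x1 + x2)) - exp(cos(x1 - x2))


def directional_derivative(point, seed):
    """Seed each input with one component of the seed vector, then evaluate f."""
    x1 = Dual(point[0], seed[0])
    x2 = Dual(point[1], seed[1])
    return f(x1, x2).der


point = (math.pi / 2, math.pi / 2)
df_dx1 = directional_derivative(point, (1.0, 0.0))
df_dx2 = directional_derivative(point, (0.0, 1.0))
print(df_dx1, df_dx2)
```

Each call performs one forward pass through the graph, so running it with the two seed vectors is a quick way to check the derivative values in the table.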

# How to use {autodiff package name}

# Software Organization

# Implementation