## Introduction

Differentiation is important to computation, optimization, and engineering, showing up everywhere from neural networks to physics equations and in any field that requires computing rates of change, extrema, or zeros of a function. Automatic differentiation (autodiff) computes precise derivatives of a function at given values by repeatedly applying the chain rule to a sequence of elementary arithmetic operations and functions.

### Automatic differentiation differs from both the finite difference method and symbolic differentiation.

**Symbolic differentiation** finds the derivative of a given formula, returning a new formula as output; a value is then plugged into that formula to find the derivative at that point.

For example, to find the derivative of $f$ where
$$f\left(x\right) = x^3 + 3,$$

we first apply the sum rule:

$$\dfrac{d}{dx} f\left(x\right)= \dfrac{d}{dx}x^3 + \dfrac{d}{dx}3.$$

Combining the power rule and the constant rule gives

$$\dfrac{d}{dx} f\left(x\right)= 3x^2 + 0 = 3x^2.$$
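
To make the idea concrete, here is a minimal sketch of symbolic differentiation restricted to polynomials, an assumption that keeps the example short (a computer algebra system handles general expressions). The derivative is returned as a new formula, here a coefficient list, which is then evaluated at a point. The helper names `diff_poly` and `eval_poly` are hypothetical.

```python
# A polynomial is stored as a list of coefficients: coeffs[k] multiplies x**k.
# f(x) = x^3 + 3  ->  [3, 0, 0, 1]

def diff_poly(coeffs):
    """Return the coefficient list of the derivative (power rule, term by term)."""
    # d/dx (c * x**k) = k * c * x**(k - 1); the constant term (k = 0) drops out.
    return [k * c for k, c in enumerate(coeffs)][1:]

def eval_poly(coeffs, x):
    """Evaluate the polynomial at x using Horner's method."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

f = [3, 0, 0, 1]           # x^3 + 3
df = diff_poly(f)          # new formula: 3x^2
print(df)                  # [0, 0, 3]
print(eval_poly(df, 2.0))  # 12.0
```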

This allows evaluation at machine precision and provides a solution to a whole class of problems, not just a single one. However, the derivative expression can grow much larger than the original formula, which can lead to inefficient code that is costly to evaluate.

**Finite difference method** estimates a derivative by computing the slope of a secant line through the points (x, f(x)) and (x + h, f(x + h)), choosing a small number for h. The slope of this secant line approaches the slope of the tangent line as h approaches zero.
<img src="/home/jovyan/work/image-20201018-153111.png" width="400">

Formally, the derivative of $f$ at $x$ is the limit of this slope:
$$ f'(x) = \lim_{h \to 0}\dfrac{f\left(x+h\right) - f\left(x\right)}{h}$$

This approach is quick and easy, but its accuracy is limited by truncation error (from using a finite $h$) and rounding error (from subtracting nearly equal floating-point numbers when $h$ is very small).
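
The two error sources can be seen numerically. This sketch (function names are illustrative) approximates $f'(2)$ for $f(x) = x^3 + 3$, whose exact value is 12, with a forward difference at several step sizes.

```python
def f(x):
    return x**3 + 3

def forward_difference(func, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (func(x + h) - func(x)) / h

exact = 3 * 2.0**2  # f'(x) = 3x^2, so f'(2) = 12
for h in [1e-1, 1e-4, 1e-8, 1e-12]:
    approx = forward_difference(f, 2.0, h)
    print(f"h={h:.0e}  approx={approx:.10f}  error={abs(approx - exact):.2e}")
```

For the forward difference, truncation error shrinks in proportion to $h$, while rounding error grows roughly like machine epsilon divided by $h$, so the error typically falls as $h$ decreases and then rises again once rounding dominates.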

**Automatic differentiation** is more precise than the finite difference method and more efficient than symbolic differentiation. It computes the derivative to machine precision without ever forming a formula for the derivative, using the chain rule to decompose the derivative into the derivatives of elementary steps. For a composition of three functions:

$$y = f\left(g\left(h\left(x\right)\right)\right) = f\left(g\left(h\left(w_{0}\right)\right)\right) = f\left(g\left(w_{1}\right)\right) = f\left(w_{2}\right) = w_{3},$$

where

$$w_{0}= x,$$

$$w_{1}= h\left(w_{0}\right),$$

$$w_{2}= g\left(w_{1}\right),$$

$$w_{3}= f\left(w_{2}\right)=y.$$

The chain rule gives

$$\frac{\partial y}{\partial x} = \frac{\partial y}{\partial w_{2}}\frac{\partial w_{2}}{\partial w_{1}}\frac{\partial w_{1}}{\partial x}=\frac{\partial f(w_{2})}{\partial w_{2}}\frac{\partial g(w_{1})}{\partial w_{1}}\frac{\partial h(w_{0})}{\partial x}.$$
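
The decomposition can be traced numerically. In this sketch the three functions are an assumption chosen for illustration, $h(w) = w^2$, $g(w) = \sin w$, and $f(w) = e^w$; each intermediate $w_i$ is computed once, and the derivative is the product of the local derivatives along the chain.

```python
import math

# Illustrative choice for y = f(g(h(x))):  h(w) = w**2, g(w) = sin(w), f(w) = exp(w)
x = 1.5

# Evaluate each intermediate variable w_i in order.
w0 = x
w1 = w0**2
w2 = math.sin(w1)
w3 = math.exp(w2)  # = y

# Local derivative of each step with respect to its own input.
dw1_dw0 = 2 * w0        # h'(w0)
dw2_dw1 = math.cos(w1)  # g'(w1)
dw3_dw2 = math.exp(w2)  # f'(w2)

# Chain rule: multiply the local derivatives along the chain.
dy_dx = dw3_dw2 * dw2_dw1 * dw1_dw0

# Compare with the closed-form derivative exp(sin(x^2)) * cos(x^2) * 2x.
closed_form = math.exp(math.sin(x**2)) * math.cos(x**2) * 2 * x
print(dy_dx, closed_form)
```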

## Background
### Automatic Differentiation: the forward mode
#### Review of the Chain Rule
The chain rule allows us to compute the derivative of a composite function. If $f$ and $g$ are both differentiable, the chain rule gives the derivative of $f(g(x))$ in terms of the derivatives of $f$ and $g$:

$$\dfrac{d}{dx} f(g(x)) = f'(g(x))\,g'(x).$$

In Leibniz notation, this is $$\dfrac{\partial f}{\partial x} = \dfrac{\partial f}{\partial g}\dfrac{\partial g}{\partial x}.$$

##### Example: $f\left(g\left(x\right)\right) = \ln\left(x\right)^7$
$$\dfrac{\partial f}{\partial g} = 7\left(g\right)^6, \quad \dfrac{\partial g}{\partial x} = \dfrac{1}{x},\quad \Rightarrow \quad \dfrac{\partial f}{\partial x} = 7\ln\left(x\right)^6 \cdot \dfrac{1}{x}.$$
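
The example can be checked numerically by comparing the chain-rule result against a central difference (a sketch; the helper names are illustrative and $x = 3$ is an arbitrary test point).

```python
import math

def f(x):
    return math.log(x) ** 7

def df_chain_rule(x):
    # Outer derivative 7 g^6 evaluated at g = ln(x), times inner derivative 1/x.
    g = math.log(x)
    return 7 * g**6 * (1 / x)

x, h = 3.0, 1e-5
central = (f(x + h) - f(x - h)) / (2 * h)
print(df_chain_rule(x), central)
```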

#### Elementary Functions
Complex functions can be broken down into a sequence of simpler elementary functions, to which the chain rule can be applied. These elementary operations include the arithmetic operations (addition, subtraction, multiplication, and division) as well as exponential and trigonometric functions, whose derivatives we know. By combining these elementary functions we can build more complex functions and find their derivatives with the chain rule.
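
One way to picture this is a lookup table that pairs each elementary function with its known derivative. The sketch below (the names `ELEMENTARY` and `compose` are hypothetical, and it covers only chains of one-argument functions, omitting the arithmetic operations for brevity) differentiates a composition by multiplying the elementary derivatives together with the chain rule.

```python
import math

# Each elementary function is paired with its known derivative.
ELEMENTARY = {
    "sin": (math.sin, math.cos),
    "cos": (math.cos, lambda t: -math.sin(t)),
    "exp": (math.exp, math.exp),
    "log": (math.log, lambda t: 1.0 / t),
    "sqrt": (math.sqrt, lambda t: 0.5 / math.sqrt(t)),
}

def compose(names, x):
    """Apply the named functions innermost-first and return (value, derivative).

    At each step the running derivative is multiplied by the elementary
    derivative evaluated at that step's input (the chain rule).
    """
    value, derivative = x, 1.0
    for name in names:
        func, dfunc = ELEMENTARY[name]
        derivative *= dfunc(value)  # derivative at the input of this step
        value = func(value)
    return value, derivative

# d/dx sin(exp(x)) at x = 0 is cos(exp(0)) * exp(0) = cos(1)
print(compose(["exp", "sin"], 0.0))
```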

#### The Gradient
When a function depends on its input through several intermediate variables, the chain rule sums the contribution from each path. The derivative of $f(x) = g(u(x), v(x))$ is

$$\frac{\partial f}{\partial x} = \frac{\partial g}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial g}{\partial v}\frac{\partial v}{\partial x}.$$

More generally, if $g$ depends on intermediate variables $y_{1}, \dots, y_{n}$, each a function of $x$ (which may itself be a vector of inputs), we can write

$$\nabla_{x}g = \sum_{i=1}^{n}{\frac{\partial g}{\partial y_{i}}\nabla y_{i}\left(x\right)}.$$

With this formula we can compute the partial derivative of $g$ with respect to each input.
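
As a concrete instance of the two-path formula, take $u(x) = \sin x$, $v(x) = x^2$, and $g(u, v) = uv$ (choices made only for illustration). Summing the two path contributions should reproduce the product rule.

```python
import math

x = 0.7

# Intermediate variables u(x) = sin(x) and v(x) = x**2, with their derivatives.
u, du_dx = math.sin(x), math.cos(x)
v, dv_dx = x**2, 2 * x

# g(u, v) = u * v has partials dg/du = v and dg/dv = u.
dg_du, dg_dv = v, u

# Sum the contribution of each path from x to g.
df_dx = dg_du * du_dx + dg_dv * dv_dx

# Product-rule check: d/dx [x^2 sin(x)] = 2x sin(x) + x^2 cos(x)
print(df_dx, 2 * x * math.sin(x) + x**2 * math.cos(x))
```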

#### Computational Trace
A computational trace breaks a function into a sequence of elementary operations and records, for each step, its value and its derivative. Combining these steps with the chain rule gives the derivative of the full function.

Consider the function $$ f\left(x,y\right) = \exp\left(-\left(\sin\left(x\right) - \cos\left(y\right)\right)^{2}\right).$$ We would like to evaluate $f$ and its partial derivatives at the point $\left(x, y\right) = \left(\dfrac{\pi}{2}, \dfrac{\pi}{3}\right)$.


| Trace | Elementary Operation | Numerical Value | Elementary Derivative | $\partial/\partial x$ value | $\partial/\partial y$ value |
| :------: | :----------------------: | :------------------------------: | :------: | :------: | :------: |
| $x_{1}$ | $x$ | $\pi/2$ | $\dot{x}_{1}$ | 1 | 0 |
| $x_{2}$ | $y$ | $\pi/3$ | $\dot{x}_{2}$ | 0 | 1 |
| $v_{1}$ | $\sin(x_{1})$ | 1 | $\cos(x_{1})\,\dot{x}_{1}$ | 0 | 0 |
| $v_{2}$ | $\cos(x_{2})$ | 0.5 | $-\sin(x_{2})\,\dot{x}_{2}$ | 0 | $-\sqrt{3}/2$ |
| $v_{3}$ | $v_{1} - v_{2}$ | 0.5 | $\dot{v}_{1} - \dot{v}_{2}$ | 0 | $\sqrt{3}/2$ |
| $v_{4}$ | $v_{3}^2$ | 1/4 | $2v_{3}\,\dot{v}_{3}$ | 0 | $\sqrt{3}/2$ |
| $v_{5}$ | $-v_{4}$ | $-1/4$ | $-\dot{v}_{4}$ | 0 | $-\sqrt{3}/2$ |
| $v_{6}$ | $\exp(v_{5})$ | $\exp(-1/4)$ | $\exp(v_{5})\,\dot{v}_{5}$ | 0 | $-\exp(-1/4)\,\sqrt{3}/2$ |
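
As a check on the trace, the sketch below evaluates it in plain Python (the function name `trace` and its seed arguments are illustrative, not part of the package). Each line computes one elementary value together with its derivative in the seed direction.

```python
import math

def trace(x1, x2, dx1, dx2):
    """Evaluate f(x, y) = exp(-(sin x - cos y)^2) one elementary step at a time.

    (dx1, dx2) is the seed: (1, 0) gives df/dx, (0, 1) gives df/dy.
    Returns (value, derivative in the seed direction).
    """
    v1, dv1 = math.sin(x1), math.cos(x1) * dx1
    v2, dv2 = math.cos(x2), -math.sin(x2) * dx2
    v3, dv3 = v1 - v2, dv1 - dv2
    v4, dv4 = v3**2, 2 * v3 * dv3
    v5, dv5 = -v4, -dv4
    v6, dv6 = math.exp(v5), math.exp(v5) * dv5
    return v6, dv6

point = (math.pi / 2, math.pi / 3)
val, df_dx = trace(*point, 1.0, 0.0)
_, df_dy = trace(*point, 0.0, 1.0)
print(val, df_dx, df_dy)
```

Each seed yields one partial derivative per pass. The results should match the last row of the table: $\partial f/\partial x = 0$ (up to floating-point error, since `math.cos(math.pi / 2)` is not exactly zero) and $\partial f/\partial y = -\exp(-1/4)\sqrt{3}/2 \approx -0.674$.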

#### The Forward mode
In forward mode we work from the inside out. We start at the point where we want to evaluate, $(x, y) = \left(\dfrac{\pi}{2}, \dfrac{\pi}{3}\right)$, and build up the function one step at a time, substituting the derivatives of the inner functions into the chain rule. Because $f(x,y)$ is composed of elementary functions whose derivatives we know, we compute the derivative of each elementary function and then use the chain rule to build up to the full function. Seeding $(\dot{x}_1, \dot{x}_2) = (1, 0)$ yields $\partial f/\partial x$, and seeding $(0, 1)$ yields $\partial f/\partial y$.




# How to Use DeriveMeCrazy Auto_diff:
## Importing: 

#### Direct import:
Users will be able to clone the project from GitHub and run it locally.
Once the code is in their local environment, they will be able to install it with pip and import the implemented classes.



```shell
pip install .
```

## Demo

We include a basic demo below, showing how to import the package, create an object, and use `AutoDiff` to calculate derivatives for an arithmetic operation and a trigonometric operation. A more detailed demo covering more types of operations is included in `docs/demo.ipynb`.

#### Import

```python
import auto_diff_pkg.AutoDiff as AD
```

#### Creating an `AutoDiff` object

```python
ad1 = AD.AutoDiff(5.0)
ad2 = AD.AutoDiff(3.0)
```

#### Arithmetic operations

```python
ad3 = ad1 * ad2

print('value: {}'.format(ad3.val))
print('derivative: {}'.format(ad3.der))
```

#### Trigonometric operations

```python
ad4 = AD.sin(ad1)

print('value: {}'.format(ad4.val))
print('derivative: {}'.format(ad4.der))
```

# Software Organization
Other than the setup-related files, the repository is organized as follows:

- `auto_diff_pkg/` - all source code 
    - `__init__.py`
    - `AutoDiff.py`
    - `ReverseAutoDiff.py`
- `docs/` - documentation and usage examples
    - `demo.ipynb`
    - `milestone1.ipynb`
    - `milestone2_progress.ipynb`
    - `milestone2.ipynb`
    - `documentation.ipynb`
- `tests/` - test suite files
    - `test_AutoDiff.py`
- `.travis.yml`
- `README.md`
- `requirements.txt`
- `setup.py`
- `LICENSE`
 
What modules do you plan on including? What is their basic functionality?
- Ideally we want our module to be as independent as possible, so we try to rely on only one dependency, `numpy`, for elementary operations.
This lets us use the basic data structures and math operations that we don't need to overload, including trigonometric functions, exponentiation, and square roots.
- To run the test suite and measure coverage, we also use the `pytest` and `codecov` packages.

Where will your test suite live? Will you use TravisCI? CodeCov?
- Our tests will live under the `tests/` subdirectory of our project. 
- The test suite is integrated with Travis CI, so all tests run on every push to the repository.
- Codecov is also integrated, so test coverage is measured on every Travis build.
- The repository README contains testing and coverage status badges that pull their information from Travis CI and Codecov.

How will you distribute your package?
- We will use GitHub to distribute our package. The repository is set up so that users who clone it can install the package directly with pip, which also keeps our package in line with current Python standards for software distribution.
- The code will also be available on GitHub for direct cloning by those who wish to extend it or simply prefer not to use pip.

How will you package your software? Will you use a framework? If so, which one and why? If not, why not?
- After reviewing several Python frameworks, we decided to follow only the requirements for PyPI packaging and distribution.
- This is a well-established and well-supported process.
- It places no constraints on the testing or source control tools we use.
- It is simple enough that we are confident it will not add overhead to our project.

Other considerations:
- We also took into account that most of our team members are relatively new to Python packaging, so we wanted to make sure our choices were appropriate for that level of experience.

# Implementation


Core data structures:
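- Our core data structure is an `AutoDiff` object with two float attributes, one storing the value and one storing the derivative. Each elementary operation returns a new `AutoDiff` object holding the value and derivative of its result, so values and derivatives are built up step by step through the trace.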

Core classes we will implement:
- A class `AutoDiff` for elementary functions that takes advantage of dunder methods to recursively calculate the intermediate values of the trace, storing the value and the derivative of each operation.
- Within the same file, `AutoDiff.py`, we define package-level functions for operations such as trigonometric functions, exponentiation, and square root.
- A class `AD_Trace` for the trace table and graph.

Methods and attributes of our classes:
- Class `AutoDiff`:
    - `__init__`
        - `self.val` (value of an AutoDiff object that will be updated per elementary operation)
        - `self.der` (derivative of an AutoDiff object that will be updated per elementary operation)
    - `__neg__`
    - operator overloads for the arithmetic operations: `__add__`, `__radd__`, `__sub__`, `__rsub__`, `__mul__`, `__rmul__`, `__truediv__`, `__rtruediv__`, `__pow__`, `__rpow__`
    - `__str__`
    - Trigonometric, exponential, logarithmic, and square root functions (defined in `AutoDiff.py`, outside the class) that take an AutoDiff object as an argument: `log`, `exp`, `sin`, `cos`, `tan`, `sqrt`
- Class `ReverseAutoDiff`:
    - `__init__`
        - `self.graph`
        - `self.variables` (variables for multivariable differentiations)
        - `self.functions` (corresponding functions for multifunction)
- Class `ReverseADNode`:
    - `__init__`
        - `self.value` (current value of the node)
        - `self.children` (children of this node in the reverse-mode graph)
        - `self.grad_value` (current value of the gradient)
        - `self.op` (associated elementary operation)
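
To illustrate how these dunder methods fit together, here is a minimal sketch of a forward-mode value/derivative class. It is not the package's `AutoDiff` source: the class name `ForwardAD` and the helper `_lift` are hypothetical, the class only mirrors the `val`/`der` attributes and a subset of the operators listed above, and `__pow__` assumes a constant real exponent (`__rpow__` is omitted).

```python
class ForwardAD:
    """Minimal forward-mode value/derivative pair (illustrative sketch only)."""

    def __init__(self, val, der=1.0):
        self.val = val  # value of the expression at the evaluation point
        self.der = der  # derivative of the expression at that point

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants: their derivative is zero.
        return other if isinstance(other, ForwardAD) else ForwardAD(other, 0.0)

    def __neg__(self):
        return ForwardAD(-self.val, -self.der)

    def __add__(self, other):
        other = self._lift(other)
        return ForwardAD(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        return self + (-self._lift(other))

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule
        return ForwardAD(self.val * other.val,
                         self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = self._lift(other)
        # Quotient rule
        return ForwardAD(self.val / other.val,
                         (self.der * other.val - self.val * other.der) / other.val**2)

    def __rtruediv__(self, other):
        return self._lift(other) / self

    def __pow__(self, n):
        # Power rule for a constant real exponent n only.
        return ForwardAD(self.val**n, n * self.val**(n - 1) * self.der)

    def __str__(self):
        return f"ForwardAD(val={self.val}, der={self.der})"

x = ForwardAD(2.0)        # seed derivative 1.0: differentiate with respect to x
y = 3 * x**2 - 1 / x + 4  # f(x) = 3x^2 - 1/x + 4, f'(x) = 6x + 1/x^2
print(y)                  # ForwardAD(val=15.5, der=12.25)
```

Plain numbers are lifted to constants with derivative zero, which is what lets the reflected methods such as `__radd__` and `__rmul__` handle expressions like `3 * x`.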

External dependencies we rely on:
- `numpy` for evaluating the elementary operations.
- `pytest` for evaluating our test suite.
- `notebook` for enabling the interactive demo.


How will you deal with elementary functions like sin, sqrt, log, and exp (and all the others)?
- Arithmetic operations such as `add`, `sub`, `mul`, `truediv`, and `pow` are implemented as dunder methods in the `AutoDiff` class, along with their reflected counterparts (`__radd__`, `__rsub__`, and so on).
- Elementary functions like sin, sqrt, log, and exp are defined outside the `AutoDiff` class but still within our package module.
This is because these math functions are not built into Python (they need to be pulled from NumPy, `math`, etc.),
and because we wanted to rely on Python's built-in order of operations so that we never have to parse function strings.
These functions operate on `AutoDiff` objects as well as on other types such as `int` and `float`.
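
The dispatch described above can be sketched as follows. This is an illustrative stand-in rather than the package's code: `ADValue` is a hypothetical minimal value/derivative holder, and the standard-library `math` module is used in place of `numpy` to keep the example dependency-free. Each function applies the chain rule when given an AutoDiff-like object and falls back to the ordinary function for plain numbers.

```python
import math

class ADValue:
    """Hypothetical stand-in for an AutoDiff object: a value and a derivative."""
    def __init__(self, val, der=1.0):
        self.val, self.der = val, der

def sin(x):
    # AutoDiff input: chain rule, d sin(u) = cos(u) * u'
    if isinstance(x, ADValue):
        return ADValue(math.sin(x.val), math.cos(x.val) * x.der)
    # Plain int/float input: behave like math.sin
    return math.sin(x)

def exp(x):
    if isinstance(x, ADValue):
        return ADValue(math.exp(x.val), math.exp(x.val) * x.der)
    return math.exp(x)

def log(x):
    if isinstance(x, ADValue):
        return ADValue(math.log(x.val), x.der / x.val)
    return math.log(x)

def sqrt(x):
    if isinstance(x, ADValue):
        root = math.sqrt(x.val)
        return ADValue(root, x.der / (2 * root))
    return math.sqrt(x)

print(sin(0.0))               # plain float in, plain float out: 0.0
out = exp(sin(ADValue(0.0)))  # d/dx exp(sin x) at 0 = exp(0) * cos(0) = 1
print(out.val, out.der)       # 1.0 1.0
```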


# Extension: Reverse Mode
Our initial plan for the first two milestones was to implement backpropagation as our extension. However, as a team we decided that it would be more beneficial to implement the reverse mode instead. In the forward mode, each pass produces a Jacobian-vector product $Jp$; in the reverse mode, each pass produces $J^Tp$. By implementing both modes, users of our package can choose the more efficient one for their problem, depending on the number of inputs $m$ (one seed vector per input in the forward mode) and the number of functions $n$: when the user encounters a system where $n \gg m$, they can use the forward mode implementation, and when they have a system where $m \gg n$, they can use the reverse mode implementation.

The reverse mode starts with a forward pass through the elementary functions, storing the partial derivatives of each step along the way (without yet evaluating the chain rule). We then iterate backward from the end of the trace, multiplying each adjoint by the stored partial derivative: $\bar{v}_N = \frac{\partial f}{\partial v_N} = 1$, $\bar{v}_{N-1} = \bar{v}_N\frac{\partial v_N}{\partial v_{N-1}}$, and so on. When a node has multiple children in the trace, the contributions from all of its children are summed: $\bar{v}_i = \sum_{j\,\in\,\text{children}(i)} \bar{v}_j \frac{\partial v_j}{\partial v_i}$. The structure and implementation code for doing so is given above in conjunction with the forward mode specifics.
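
The backward sweep can be sketched with a small node class. This is an illustration under stated assumptions, not the package's `ReverseADNode`: the class `Node` and the helper functions are hypothetical, although they borrow the `value`, `children`, and `grad_value` attribute names listed in the Implementation section. Each operation records its local partial derivative on its inputs during the forward pass; the adjoint of a node is then the sum, over its children, of the child's adjoint times the stored partial. The example reuses $f(x, y) = \exp(-(\sin x - \cos y)^2)$ from the trace table.

```python
import math

class Node:
    """Reverse-mode node: a value plus the nodes that consume it (illustrative sketch)."""

    def __init__(self, value):
        self.value = value
        self.children = []      # list of (child_node, d child / d self)
        self.grad_value = None  # adjoint df/d(self), computed lazily

    def grad(self):
        # Adjoint = sum over children of (child adjoint) * (local partial).
        if self.grad_value is None:
            self.grad_value = sum(child.grad() * local for child, local in self.children)
        return self.grad_value

def sin(a):
    out = Node(math.sin(a.value))
    a.children.append((out, math.cos(a.value)))
    return out

def cos(a):
    out = Node(math.cos(a.value))
    a.children.append((out, -math.sin(a.value)))
    return out

def sub(a, b):
    out = Node(a.value - b.value)
    a.children.append((out, 1.0))
    b.children.append((out, -1.0))
    return out

def square(a):
    out = Node(a.value ** 2)
    a.children.append((out, 2 * a.value))
    return out

def neg(a):
    out = Node(-a.value)
    a.children.append((out, -1.0))
    return out

def exp(a):
    out = Node(math.exp(a.value))
    a.children.append((out, out.value))
    return out

# Forward pass: compute values and store local partials only.
x, y = Node(math.pi / 2), Node(math.pi / 3)
f = exp(neg(square(sub(sin(x), cos(y)))))

# Reverse pass: seed the output adjoint df/df = 1, then pull gradients back.
f.grad_value = 1.0
print(x.grad(), y.grad())
```

Both partial derivatives come out of a single reverse sweep and should match the forward-mode results from the trace table. The recursive, memoized `grad` keeps the sketch short; very deep graphs would call for an explicit topological ordering instead, to stay clear of Python's recursion limit.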

# Future Features
#### AutoDiff Class Extensions
In order to make the AutoDiff class more useful overall, we plan to implement a few more dunder methods into the main class. As of now, our plan is to add methods for:
- `__repr__`
- comparators `__eq__`, `__lt__`, `__gt__`, etc.
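
A sketch of what these methods could look like, using a hypothetical `ADPair` class with `val` and `der` attributes. Two design choices here are assumptions rather than decisions recorded in this document: `__eq__` requires both the value and the derivative to match, while `__lt__` and `__gt__` compare values only.

```python
class ADPair:
    """Hypothetical value/derivative pair, used only to illustrate the planned dunders."""

    def __init__(self, val, der):
        self.val = val
        self.der = der

    def __repr__(self):
        # Unambiguous representation that can be pasted back into Python.
        return f"ADPair(val={self.val!r}, der={self.der!r})"

    def __eq__(self, other):
        # Assumption: equal only if both the value and the derivative match.
        if not isinstance(other, ADPair):
            return NotImplemented
        return self.val == other.val and self.der == other.der

    def __lt__(self, other):
        # Assumption: ordering compares values only (derivatives are ignored).
        if not isinstance(other, ADPair):
            return NotImplemented
        return self.val < other.val

    def __gt__(self, other):
        if not isinstance(other, ADPair):
            return NotImplemented
        return self.val > other.val

a, b = ADPair(2.0, 1.0), ADPair(3.0, 0.0)
print(repr(a))                       # ADPair(val=2.0, der=1.0)
print(a < b, a == ADPair(2.0, 1.0))  # True True
```

In Python 3, a class that defines `__eq__` without `__hash__` becomes unhashable, which is reasonable for mutable objects like these.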

#### Additional functions:
We would like to include additional functions that are not yet implemented, such as `sinh`, `cosh`, and `tanh`.
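
The derivative rules these functions would need are standard: $\frac{d}{dx}\sinh u = \cosh u \, u'$, $\frac{d}{dx}\cosh u = \sinh u \, u'$, and $\frac{d}{dx}\tanh u = (1 - \tanh^2 u)\, u'$. A minimal sketch, with hypothetical function names that act on plain `(val, der)` pairs so that they do not depend on any class:

```python
import math

# Each function takes the inner value u and its derivative u',
# and returns (value, derivative) using the chain rule.

def sinh_ad(val, der):
    return math.sinh(val), math.cosh(val) * der

def cosh_ad(val, der):
    return math.cosh(val), math.sinh(val) * der

def tanh_ad(val, der):
    t = math.tanh(val)
    return t, (1.0 - t * t) * der

print(tanh_ad(0.0, 1.0))  # (0.0, 1.0)
```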

#### Supporting Graph representation 
We would like to be able to display the graph of operations for the forward mode.


#### Supporting Display of Evaluation Table 

#### Implementation of Root Finder 
We can, if time allows, add this as a feature of our module, similar to what was demonstrated in our demo notebook. 
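
A root finder is a natural use of forward mode, since Newton's method needs $f'(x)$ at every iterate: $x_{k+1} = x_k - f(x_k)/f'(x_k)$. The sketch below is not the demo notebook's implementation; `Dual` is a hypothetical, deliberately tiny forward-mode class (addition, subtraction, and multiplication only) and `newton` is a hypothetical helper.

```python
class Dual:
    """Tiny forward-mode number: value plus derivative (just enough for this example)."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val - o.val, self.der - o.der)

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

def newton(f, x0, tol=1e-12, max_iter=50):
    """Find a root of f with Newton's method, using forward mode for f'(x)."""
    x = x0
    for _ in range(max_iter):
        out = f(Dual(x, 1.0))     # seed dx/dx = 1
        step = out.val / out.der  # f(x) / f'(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

# Root of f(x) = x^3 - 2x - 5 near x = 2
root = newton(lambda x: x * x * x - 2 * x - 5, 2.0)
print(root)
```

Seeding the derivative with 1.0 on each iterate gives both $f(x_k)$ and $f'(x_k)$ from a single evaluation of `f`.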