# Milestone 1: *AutoDiffAll*

## Table of Contents
1. [Introduction](#introduction)
2. [Background](#background)
3. [User API](#API)
4. [Software Organization](#SoftwareOrganization)
5. [Implementation](#implementation)
    - [Core Data Structure](#p1)
    - [Major Class](#p2)
    - [Method and Name Attributes in AutoDiff Class](#p3)
    - [Other Functions](#p4)
    - [External Dependencies](#p5)

<a id="introduction"></a>
## Introduction 
>Todo: Describe problem the software solves and why it's important to solve that problem

<a name="background"></a>
## Background

Taking derivatives is an essential operation in numerical methods, optimization, and science. From a computational perspective, however, calculating a derivative can be difficult.

If one uses **finite differences** (i.e. $f'(x) \approx (f(x+\epsilon) - f(x))/\epsilon$), one needs to choose $\epsilon$ appropriately. If $\epsilon$ is too large, the approximation is poor. If $\epsilon$ is too small, one introduces round-off errors.
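
A small sketch of this trade-off, using `math.sin` as an arbitrary test function. The helper name `forward_difference` and the three step sizes are illustrative choices, not part of the package:

```python
import math


def forward_difference(f, x, eps):
    """Approximate f'(x) with the forward-difference formula."""
    return (f(x + eps) - f(x)) / eps


# Differentiate sin at x = 1.0; the exact derivative is cos(1.0).
exact = math.cos(1.0)
errors = {}
for eps in (1e-1, 1e-8, 1e-15):
    approx = forward_difference(math.sin, 1.0, eps)
    errors[eps] = abs(approx - exact)
    print(f"eps={eps:.0e}  absolute error={errors[eps]:.2e}")
```

On typical double-precision hardware the middle step size should give the smallest error: the large step suffers from truncation error and the tiny step from round-off.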

Alternatively, if one uses **symbolic differentiation** (i.e. an algorithm that produces the derivative as a symbolic function), the problem becomes computationally infeasible when you either have functions with many inputs or want to take high-order derivatives. These two scenarios occur often in applications.

**Automatic differentiation** overcomes these challenges by providing both quick and accurate derivatives. It does so by computing derivatives recursively using the chain rule. All functions are either an **"elementary" function**, for which we know the derivative, or a composition of elementary functions. To calculate the derivative of a composition $f(g(x))$, we apply the chain rule as follows:

$$
\frac{df}{dx} = \frac{df}{dg}\frac{dg}{dx}
$$

The chain rule can be applied recursively, which we exploit in automatic differentiation. For example, if we have a more complex composition $f(g(h(x)))$, we can compute $f'(x)$ by first calculating

$$
\frac{dg}{dx} = \frac{dg}{dh}\frac{dh}{dx}
$$

and then plugging this derivative into 

$$
\frac{df}{dx} = \frac{df}{dg}\frac{dg}{dx}
$$

This is in fact a simple example of **forward-mode** automatic differentiation. In general, to conduct forward-mode automatic differentiation, we represent the function to differentiate as a **computational graph**. The computational graph captures the inputs and outputs of our elementary functions. In an example taken from [here](http://www.columbia.edu/~ahd2125/post/2015/12/5/), we can represent $f(x,y)=\cos(x)\sin(y)+\frac{x}{y}$ as

![comp-graph](figs/comp_graph_background.png)

By computing derivatives recursively using the chain rule from the inputs $x$ and $y$ to the output $f$, we can calculate the derivative over the entire graph.
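
As a rough illustration, the function $f(x,y)=\cos(x)\sin(y)+\frac{x}{y}$ can be traced by hand in forward mode, carrying a (value, derivative) pair through each node. The node names `v1` to `v5`, the helper `forward_trace`, and the evaluation point are assumptions made for this sketch and may not match the labels in the figure:

```python
import math


def forward_trace(x, y, dx, dy):
    """Evaluate f(x, y) = cos(x) * sin(y) + x / y together with its
    directional derivative along the seed (dx, dy), one node at a time."""
    v1, d1 = math.cos(x), -math.sin(x) * dx        # node: cos(x)
    v2, d2 = math.sin(y), math.cos(y) * dy         # node: sin(y)
    v3, d3 = v1 * v2, d1 * v2 + v1 * d2            # product rule
    v4, d4 = x / y, (dx * y - x * dy) / (y * y)    # quotient rule
    v5, d5 = v3 + v4, d3 + d4                      # sum rule
    return v5, d5


x0, y0 = 1.0, 2.0
value, df_dx = forward_trace(x0, y0, 1.0, 0.0)  # seed (1, 0) gives df/dx
_, df_dy = forward_trace(x0, y0, 0.0, 1.0)      # seed (0, 1) gives df/dy
print(value, df_dx, df_dy)
```

Each pass through the graph yields the derivative along one seed direction, so two passes are needed here to recover both partial derivatives.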

This project will implement only forward-mode automatic differentiation. As an aside, **reverse-mode automatic differentiation** begins at the output(s) of the computational graph and calculates derivatives using the chain rule by traversing the graph backwards.


<a id="API"></a>
## User API 

### Initial Setting
The user imports the main `AutoDiff` class, as well as the `numpy` elementary functions that we have overridden. Our elementary functions are stored in `AutoDiffFun`.

```python
from AutoDiff import AutoDiff as AD # Josh: can we come up with a better name? These are supposed to be the function inputs, right?
from AutoDiff import AutoDiffFun as ADF
import AutoDiff.numpy as np
```

### User Case
The user can write the function directly, declaring the variables as AD nodes and using the ADF functions:

```python
x=AD(2,n_dim=1)
f=ADF.sin(x)+ADF.exp(x) # Josh: can we have the functions live in our overwritten version of numpy?
```
Alternatively, the user can write a lambda function and pass in the AD nodes later:

```python
f=lambda x: ADF.sin(x)+ADF.exp(x)
result=f(AD(2,n_dim=1))
```

Our `AutoDiff` class can handle scalar or vector functions evaluated at scalar or vector values, and can even handle points where the derivative has no explicit expression (that is, points where the function is non-differentiable).

More examples are as follows:

#### Scalar Function
- scalar values.
    
For a scalar function, usage is simple. The user initializes an AD instance for the independent variable x at a given value. The user can then apply their function to the AD instance just as they would to a float or integer. They can then call the `der` function, which simply returns the `der` instance attribute.


```python
# single-variable case
x = AD(0, n_dim=1) 
test_fn1 = x - ADF.exp(-2*ADF.sin(4*x))
print(test_fn1.val, test_fn1.der)
```

>Output:
>-1, 9

```python
# multi-variable case
x=AD(1.0, n_dim=2)
y=AD(2.0, n_dim=2)
test_fn2 = x*y+(x+y)*6
print(test_fn2.val)
print(test_fn2.der)
```

>Output
>20
>[8, 7]

- vector values
    
For a scalar function of d independent variables, each a vector of length n, the usage is largely the same. The user initializes a separate AD instance for each independent variable and indicates the number of independent variables d (via `n_dim`) at initialization. The rest of the usage is exactly the same as before. However, the `der` function now returns an n×d array holding the derivative with respect to each independent variable at each entry of the vectors.

```python
test_fn2 = lambda x, y: x*y + np.sin(x)
x = AD([2.0, 5.0, 7.0], n_dim=2) 
y = AD([1.0, 2.0, 3.0], n_dim=2) 
ad_res = test_fn2(x, y) 
der(ad_res) # returns a 3x2 array of derivatives with respect to both x and y at each value
```

#### Vector Function
Even for a vector function with m components, taking d independent variables that are each vectors of length n, the usage remains the same. However, when the user calls the vector function on the AD instances, a tuple of AD instances is returned, one per component. As such, the user has to call the `der` function separately on each AD instance.

```python
test_fn4 = lambda x, y: (x*y + np.sin(x), x+y+np.sin(x*y))
ad_res = test_fn4(x,y) 
der(ad_res[0]) 
der(ad_res[1])
```

- scalar value
    
The nodes are initialized with scalar values in the same way as for a scalar function.

- vector value

The nodes are initialized with vector values in the same way as for a scalar function.


<a name="SoftwareOrganization"></a>
## Software Organization 

### Directory Structure

The directory structure will be as follows:

```
AutoDiff
|-README.md
|-LICENSE
|-setup.py
|-requirements.txt
|-AutoDiff
  |-__init__.py
  |-variables.py
  |-numpy.py
  |-user_func.py
  |-derivatives.py
|-docs
  |-documentation.md
|-tests
  |-__init__.py
  |-test_basic.py
  |-test_numpy.py
  |-test_user_func.py
  |-test_derivatives.py
```

### Modules

The `variables` module contains the functionality to define variables that are compatible with automatic differentiation. These variables will be passed to functions in `AutoDiff.numpy` or to user-defined functions.



### Test Suite

We will store our tests in the `tests` directory and run them using `TravisCI`.

### Distribution

We will distribute our package on `PyPI`.



<a name="implementation"></a>
## Implementation 
<a id="p1"></a>
### Core Data Structure
We follow the computational graph and use a "node" class, the `AutoDiff` class, as our core data structure. This lets us traverse the computational graph and update the value and derivatives at each node along the way.

![comp-graph](figs/Computational-Graph.png)

<a id="p2"></a>
### Major Class
The main class that we will implement is the `AutoDiff` class, which takes as input the values of the "independent variables" (either a scalar or a vector) at which we are calculating the derivative of the function. Each "independent variable" can be seen as a node in the computational graph.

<a id="p3"></a>
### Method and Name Attributes in AutoDiff Class
* Name Attributes

The `AutoDiff` class will have two main instance variables: the value of the AutoDiff instance and the derivatives of the instance. The derivatives are initialized with the relevant seed vector of length `n_dim`, where `n_dim` is the number of independent variables.

1. `AutoDiff.n_dim`: the number of independent variables in the target function. It determines the dimension of the derivatives.
2. `AutoDiff.val`: the value at this node. Its shape is the same as that of the input variable: a scalar input gives a scalar, and a vector input gives a vector.
3. `AutoDiff.der`: the derivatives at this node. It is an `m*n_dim` array that stores the derivative of each of the `m` values in the vector with respect to each of the `n_dim` variables.
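
One possible way to lay out these attributes is sketched below. The `var_index` parameter is hypothetical: the text above does not say how an instance learns which of the `n_dim` variables it represents, so this sketch assumes the user passes the column explicitly.

```python
import numpy as np


class AutoDiff:
    """Illustrative node holding a value and its derivatives."""

    def __init__(self, val, n_dim=1, var_index=0):
        # var_index (hypothetical): which of the n_dim independent
        # variables this instance represents.
        self.n_dim = n_dim
        self.val = np.asarray(val, dtype=float)  # keeps the input's shape
        m = self.val.size                        # number of values in the vector
        # Seed: derivative 1 with respect to itself, 0 for the other variables.
        self.der = np.zeros((m, n_dim))
        self.der[:, var_index] = 1.0


x = AutoDiff([2.0, 5.0, 7.0], n_dim=2, var_index=0)
y = AutoDiff([1.0, 2.0, 3.0], n_dim=2, var_index=1)
print(x.der.shape)  # (3, 2)
```

Here each row of `der` corresponds to one entry of the input vector and each column to one independent variable.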

* Override the four basic operations. 

In order to override the four basic operations of elementary arithmetic (addition, subtraction, multiplication, and division), we use dunder methods within our `AutoDiff` class. The dunder methods return new `AutoDiff` instances with the updated value and derivatives.
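
A minimal sketch of what these dunder methods could look like, restricted to scalar values to keep it short. It is not the final design: the constructor here takes the list of partial derivatives directly instead of `n_dim`, and the `_lift` helper is a hypothetical name for wrapping plain numbers as constants.

```python
class AutoDiff:
    """Scalar sketch: val is a float, der a list of partial derivatives."""

    def __init__(self, val, der):
        self.val = float(val)
        self.der = [float(d) for d in der]

    @staticmethod
    def _lift(other, n_dim):
        # Treat plain numbers as constants with zero derivative.
        if isinstance(other, AutoDiff):
            return other
        return AutoDiff(other, [0.0] * n_dim)

    def __add__(self, other):
        o = self._lift(other, len(self.der))
        return AutoDiff(self.val + o.val,
                        [a + b for a, b in zip(self.der, o.der)])
    __radd__ = __add__

    def __neg__(self):
        return AutoDiff(-self.val, [-a for a in self.der])

    def __sub__(self, other):
        return self + (-self._lift(other, len(self.der)))

    def __rsub__(self, other):
        return (-self) + other

    def __mul__(self, other):
        o = self._lift(other, len(self.der))
        # Product rule: (uv)' = u'v + uv'
        return AutoDiff(self.val * o.val,
                        [a * o.val + self.val * b
                         for a, b in zip(self.der, o.der)])
    __rmul__ = __mul__

    def __truediv__(self, other):
        o = self._lift(other, len(self.der))
        # Quotient rule: (u/v)' = (u'v - uv') / v^2
        return AutoDiff(self.val / o.val,
                        [(a * o.val - self.val * b) / o.val ** 2
                         for a, b in zip(self.der, o.der)])

    def __rtruediv__(self, other):
        return self._lift(other, len(self.der)) / self


# Two independent variables, seeded by hand with unit vectors.
x = AutoDiff(1.0, [1.0, 0.0])
y = AutoDiff(2.0, [0.0, 1.0])
f = x * y + (x + y) * 6
print(f.val, f.der)  # prints 20.0 [8.0, 7.0]
```

Every operation returns a new instance rather than mutating its operands, so intermediate nodes of the computational graph stay valid.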

<a id="p4"></a>
### Other Functions

* Define elementary differentiation function. 

In order to deal with the other elementary functions (exponential, logarithm, powers, roots, trigonometric functions, inverse trigonometric functions, hyperbolic functions, etc.), we will override the numpy elementary functions so that they can be used with our `AutoDiff` class. 
>For example, we will override the `np.sin` function such that if you use it on an `AutoDiff` instance `x` at a given value, it will return another `AutoDiff` instance with the value $\sin(x)$ and the derivative $\dot{x}\cos(x)$ at the given value. Similarly, we will override the `np.exp` function such that if you use it on an `AutoDiff` instance `x` at a given value, it will return another `AutoDiff` instance with the value $\exp(x)$ and the derivative $\dot{x}\exp(x)$ at the given value.
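
The behaviour described for `sin` and `exp` could be sketched as follows. This uses a stripped-down single-variable stand-in for `AutoDiff` and the standard `math` module rather than numpy, purely to keep the example self-contained; the real overrides may be organized differently.

```python
import math


class AutoDiff:
    """Minimal stand-in: a scalar value and its derivative (x-dot)."""

    def __init__(self, val, der=1.0):
        self.val = val
        self.der = der


def sin(x):
    """sin that propagates the derivative for AutoDiff inputs."""
    if isinstance(x, AutoDiff):
        return AutoDiff(math.sin(x.val), x.der * math.cos(x.val))
    return math.sin(x)  # plain numbers fall through unchanged


def exp(x):
    """exp that propagates the derivative for AutoDiff inputs."""
    if isinstance(x, AutoDiff):
        e = math.exp(x.val)
        return AutoDiff(e, x.der * e)
    return math.exp(x)


x = AutoDiff(0.5)
y = exp(sin(x))  # chain rule gives y' = cos(x) * exp(sin(x))
print(y.val, y.der)
```

Because each function multiplies by the incoming `x.der`, composing them applies the chain rule automatically.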

* Define non-differentiable functions.

We also plan to handle functions that are non-differentiable at certain points, such as zigzag functions resembling Brownian motion paths, or $f(x)=\frac{1}{x}$ at $x=0$. This is our extension for the `AutoDiff` class. At such points we will employ the finite-difference approximation $$f'(x)\approx\frac{f(x+\Delta x)-f(x)}{\Delta x}$$ to estimate the derivative. <font color = 'red'> **JOSH: ARE WE SURE THAT THIS WILL WORK? DO WE WANT TO ENSURE CONTINUITY AT LEAST?** </font>

<a id="p5"></a>
### External Dependencies 

In order to implement this, we will rely on the external `numpy` library and on Python's standard `math` module. `numpy` will be specified as an external dependency in our `setup.py` file.


As such, after the user initializes the `AutoDiff` class on the independent variables, they will be able to use the usual elementary functions on the `AutoDiff` instance in order to calculate both the value of the function and the value of the derivative.