# Milestone 1: *AutoDiffAll*

## Table of Contents
1. [Introduction](#introduction)
2. [Background](#background)
3. [User API](#API)
4. [Software Organization](#SoftwareOrganization)
5. [Implementation](#implementation)
    - [Core Data Structure](#p1)
    - [Major Class](#p2)
    - [Method and Name Attributes in Variable Class](#p3)
    - [Other Functions](#p4)
    - [External Dependencies](#p5)

<a id="introduction"></a>
## Introduction 

It goes without saying that taking derivatives is an essential operation in numerical methods, optimization, and science. From a computational perspective, however, calculating a derivative can be difficult.

If one uses **finite differences** (i.e. $f'(x) \approx (f(x+\epsilon) - f(x))/\epsilon$), one needs to choose $\epsilon$ appropriately. If $\epsilon$ is too large, the approximation is poor. If $\epsilon$ is too small, one introduces round-off errors.
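
The trade-off is easy to see numerically. The sketch below uses only the standard library and applies the forward-difference formula to $\sin(x)$ at $x = 1$ for three step sizes. The helper name `forward_difference` and the chosen step sizes are illustrative assumptions, not part of the package.

```python
import math


def forward_difference(f, x, eps):
    """Approximate f'(x) with the forward-difference formula."""
    return (f(x + eps) - f(x)) / eps


exact = math.cos(1.0)  # d/dx sin(x) evaluated at x = 1

errors = {}
for eps in (1e-1, 1e-8, 1e-15):
    approx = forward_difference(math.sin, 1.0, eps)
    errors[eps] = abs(approx - exact)
    print(f"eps={eps:.0e}  abs error={errors[eps]:.2e}")
```

In double precision the middle step size should give the smallest error: the largest step suffers from truncation error and the smallest from round-off.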

Alternatively, if one uses **symbolic differentiation** (i.e. an algorithm that produces the derivative as a symbolic expression), the computation becomes infeasible for functions with many inputs or for high-order derivatives. Both scenarios occur often in applications.

**Automatic differentiation** overcomes these challenges: it evaluates derivatives quickly and to machine precision, without choosing a step size or building large symbolic expressions.

<a name="background"></a>
## Background

Automatic differentiation computes derivatives recursively using the chain rule. All functions are either an **"elementary" function**, for which we know the derivative, or a composition of elementary functions. To calculate the derivative of a composite function $f(g(x))$, we apply the chain rule as follows:

$$
\frac{df}{dx} = \frac{df}{dg}\frac{dg}{dx}
$$

The chain rule can be applied recursively, which we exploit in automatic differentiation. For example, given a nested composite function $f(g(h(x)))$, we can compute $f'(x)$ by first calculating

$$
\frac{dg}{dx} = \frac{dg}{dh}\frac{dh}{dx}
$$

and then plugging this derivative into 

$$
\frac{df}{dx} = \frac{df}{dg}\frac{dg}{dx}
$$

This is in fact a simple example of **forward-mode** automatic differentiation. In general, to perform forward-mode automatic differentiation, we represent the function to differentiate as a **computational graph**, which captures the inputs and outputs of our elementary functions. In an example that can be found [here](http://www.columbia.edu/~ahd2125/post/2015/12/5/), we can represent $f(x,y)=\cos(x)\sin(y)+\frac{x}{y}$ as

![comp-graph](figs/comp_graph_background.png)

By computing derivatives recursively using the chain rule from the inputs $x$ and $y$ to the output $f$, we can calculate the derivative over the entire graph.
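
To make the traversal concrete, the sketch below walks through this graph by hand for $f(x,y)=\cos(x)\sin(y)+\frac{x}{y}$, carrying a (value, derivative) pair through each node. The function name `forward_trace`, the intermediate names `v1` to `v4`, and the evaluation point are assumptions made for illustration only.

```python
import math


def forward_trace(x, y, dx, dy):
    """Evaluate f(x, y) = cos(x) sin(y) + x / y and its derivative in the
    direction given by the seed (dx, dy), one graph node at a time."""
    v1, d1 = math.cos(x), -math.sin(x) * dx       # node cos(x)
    v2, d2 = math.sin(y), math.cos(y) * dy        # node sin(y)
    v3, d3 = v1 * v2, d1 * v2 + v1 * d2           # product rule
    v4, d4 = x / y, (dx * y - x * dy) / y ** 2    # quotient rule
    return v3 + v4, d3 + d4                       # sum rule


# Seeding (1, 0) yields df/dx; seeding (0, 1) yields df/dy.
value, df_dx = forward_trace(1.0, 2.0, dx=1.0, dy=0.0)
_, df_dy = forward_trace(1.0, 2.0, dx=0.0, dy=1.0)
print(value, df_dx, df_dy)
```

Each forward pass yields the derivative with respect to one seed direction, so a function of two inputs needs two passes to recover the full gradient.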

This project will implement only forward-mode automatic differentiation, but as an aside, **reverse-mode automatic differentiation** begins at the output(s) of the computational graph and calculates derivatives using the chain rule by traversing the graph backwards.

<a id="API"></a>
## User API 

### Initial Setting
The user imports the `variable` class from the `AutoDiff` package, along with our `AutoDiff.numpy` module, which provides the `numpy` elementary functions that we have overridden.

```python
from AutoDiff import variable 
import AutoDiff.numpy as anp
```

### User Case
The user can write a function directly in terms of variables declared as AD nodes, using the overridden elementary functions:

```python
x = variable(2) # set variable x to 2
y = variable(3) # set variable y to 3
f = anp.sin(x)+anp.exp(y)
```
Alternatively, the user can write a lambda function and pass in the AD nodes later:

```python
f = lambda x, y: anp.sin(x)+anp.exp(y)
result = f(x, y)
```

Our `AutoDiff` package can handle scalar or vector functions with scalar or vector inputs.

More examples are as follows:

#### Scalar Function
- scalar values.
    
For a scalar function of scalar inputs, usage is simple. The user initializes an independent variable `x` at a given value and then applies their function to the variable instance just as they would to an ordinary float or integer. They can then use `der`, which returns the derivative of the function with respect to the input variable.


```python
# single-variable case
>>> x = variable(3)
>>> f = x**2
>>> print(f.val, f.der)
9 6
```

```python
# multi variable case
>>> x = variable(3)
>>> y = variable(2)
>>> f = x**2+y**2
>>> print(f.val)
>>> print(f.der(x))
>>> print(f.der(y))
>>> print(f.grad())
13
6
4
[6, 4]
```
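
One way such per-variable derivatives could be tracked is to store a dictionary of partial derivatives on every node. The sketch below is a hypothetical illustration, not the package's implementation: the class name `Node`, its `partials` attribute, and the restriction to `+` and `*` between nodes are all assumptions.

```python
class Node:
    """Hypothetical AD node: a value plus partial derivatives keyed by input node."""

    def __init__(self, val, partials=None):
        self.val = val
        # An independent variable has derivative 1 with respect to itself.
        self.partials = {self: 1.0} if partials is None else partials

    def der(self, wrt):
        """Partial derivative with respect to the input node `wrt` (0 if unused)."""
        return self.partials.get(wrt, 0.0)

    def __add__(self, other):
        inputs = self.partials.keys() | other.partials.keys()
        # Sum rule applied to each input separately.
        return Node(self.val + other.val,
                    {v: self.der(v) + other.der(v) for v in inputs})

    def __mul__(self, other):
        inputs = self.partials.keys() | other.partials.keys()
        # Product rule applied to each input separately.
        return Node(self.val * other.val,
                    {v: self.der(v) * other.val + self.val * other.der(v)
                     for v in inputs})


x, y = Node(3.0), Node(2.0)
f = x * x + y * y
print(f.val, f.der(x), f.der(y))
```

With these inputs the sketch reproduces the value 13 and the partial derivatives 6 and 4 from the multi-variable example.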

- vector values
    
For a scalar function of $d$ input vectors, each of length $n$, usage is largely the same. The user initializes a separate `variable` instance for each independent variable, and the rest proceeds exactly as before. Note, however, that `der` now returns an array of length $n$: the derivative with respect to the chosen independent variable, calculated element-wise at each entry of the input vectors.

```python
>>> x = variable([2.0, 5.0]) 
>>> y = variable([1.0, 2.0]) 
>>> f = x**2+y**2
>>> print(f.val)
>>> print(f.der(x))
>>> print(f.der(y))
>>> print(f.grad())
[5, 29]
[4, 10]
[2, 4]
[[4, 2], [10, 4]]

```
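
Because everything is element-wise, the numbers in a vector example like this can be checked by hand. The short sketch below does so with plain Python lists for $f = x^2 + y^2$. It is a verification aid that hard-codes the analytic derivatives, not a description of how the package computes them.

```python
x = [2.0, 5.0]
y = [1.0, 2.0]

# f = x**2 + y**2 evaluated element-wise.
val = [xi ** 2 + yi ** 2 for xi, yi in zip(x, y)]

# Analytic partial derivatives: df/dx = 2x and df/dy = 2y.
der_x = [2 * xi for xi in x]
der_y = [2 * yi for yi in y]

# One gradient row per element of the input vectors.
grad = [[dxi, dyi] for dxi, dyi in zip(der_x, der_y)]
print(val, der_x, der_y, grad)
```

Each row of `grad` pairs the partial derivatives with respect to `x` and `y` for one element of the input vectors.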

#### Vector Function
Even for a vector function with inputs of dimension $n$, outputs of dimension $m$, and $d$ independent variables, usage remains the same. However, calling a vector function on `variable` instances returns a vector of `variable` instances. As such, the user will have to call `der` separately on each AD instance.

```python
>>> x = variable([2.0, 5.0]) 
>>> y = variable([1.0, 2.0]) 
>>> f = [x**2+y**2, x+y]
>>> print(f.val)
>>> print(f.der(x))
>>> print(f.der(y))
>>> print(f.jacobian())
[[5,3], [29,7]]
[[4,1], [10,1]]
[[2,1], [4,1]]
[[[4, 2], [1, 1]], [[10, 4], [1, 1]]] # These are Jacobians for the first (x,y) pair, second (x,y) pair, and so on...

```
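
The per-pair Jacobians can be written out by hand in the same way. In this sketch the helper `jacobian_at` is hypothetical and hard-codes the analytic partial derivatives of the two component functions $x^2 + y^2$ and $x + y$; rows correspond to functions and columns to variables.

```python
def jacobian_at(x, y):
    """Jacobian of [x**2 + y**2, x + y] at a single (x, y) point."""
    return [[2 * x, 2 * y],   # partials of x**2 + y**2
            [1.0, 1.0]]       # partials of x + y


xs = [2.0, 5.0]
ys = [1.0, 2.0]

# One 2x2 Jacobian per (x, y) pair.
jacobians = [jacobian_at(xi, yi) for xi, yi in zip(xs, ys)]
print(jacobians)
```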

### Project extension: User defined functions

The user can define their own functions as follows. For example, suppose a user wanted to implement the natural logarithm (although this is already included in our numpy module):

```python
>>> import numpy as np
>>> log = user_function(np.log, lambda x: 1/x)
>>> x = variable(1)
>>> f = log(x)
>>> print(f.val)
>>> print(f.der)
0
1
```

<a name="SoftwareOrganization"></a>
## Software Organization 

### Directory Structure

The directory structure will be as follows:

```
AutoDiff
|-README.md
|-LICENSE
|-setup.py
|-requirements.txt
|-AutoDiff
  |-__init__.py
  |-variables.py
  |-numpy.py
  |-user_func.py
  |-derivatives.py
|-docs
  |-documentation.md
|-tests
  |-__init__.py
  |-test_variables.py
  |-test_numpy.py
  |-test_user_func.py
  |-test_derivatives.py
```
### Modules

The `variables` module contains the functionality to define variables that are compatible with automatic differentiation. These variables will be passed to functions in `AutoDiff.numpy` or to functions specified by the user with `user_func`.

The `derivatives` module stores our derivative functionality that will be accessed by the variables outputted by our functions.

### Test Suite

We will store our tests in the `tests` directory and run them using Travis CI.

### Distribution

We will distribute our package on PyPI.


<a name="implementation"></a>
## Implementation 
<a id="p1"></a>

### Major data structure: Variables and the Computational Graph

Our variables will be the nodes in our computational graph. Each variable will keep track of the node's value and its derivative.

<a id="p2"></a>
### Classes

The main class that we will implement is the `variable` class. All auto-differentiable functions will have inputs and outputs consisting of instances of the `variable` class. The `variable` class will extend ordinary Python numbers by storing a derivative in addition to its value.

<a id="p3"></a>
### Method and Name Attributes in Variable Class
* Name Attributes

The `variable` class will have two main instance attributes: the value of the variable instance and the derivatives of the instance.

`variable.val`: the value of the variable. Its shape matches the input: a scalar input gives a scalar, and a vector input gives a vector.

`variable.der`: the value of the derivatives. Its shape matches the input: a scalar input gives a scalar, and a vector input gives a vector.

* Methods

In order to override the four basic operations of elementary arithmetic (addition, subtraction, multiplication, and division), we use dunder methods within our `variable` class. The dunder methods return new `variable` instances with the updated value and derivatives.

We will also implement methods that will allow the user to access the derivatives of the variable.
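
A minimal sketch of this idea for a single scalar input is given below. It is an assumption-laden illustration rather than the package's code: the capitalized `Variable` class is a hypothetical stand-in for `variable`, the `_lift` helper is invented for the example, and tracking several independent variables is left out.

```python
class Variable:
    """Hypothetical single-input forward-mode node: a value plus a derivative."""

    def __init__(self, val, der=1.0):
        self.val = val
        self.der = der  # defaults to 1: d(x)/d(x) for an independent variable

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants, whose derivative is 0.
        return other if isinstance(other, Variable) else Variable(other, 0.0)

    def __add__(self, other):
        other = self._lift(other)
        return Variable(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = self._lift(other)
        return Variable(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule.
        return Variable(self.val * other.val,
                        self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = self._lift(other)
        # Quotient rule.
        return Variable(self.val / other.val,
                        (self.der * other.val - self.val * other.der)
                        / other.val ** 2)

    def __rtruediv__(self, other):
        return self._lift(other) / self


x = Variable(3.0)
f = (x * x + 2 * x) / (x - 1)
print(f.val, f.der)
```

Every operator returns a fresh instance, so the inputs are never mutated and intermediate results can be reused freely.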

<a id="p4"></a>
### Other Functions

* Elementary differentiable functions

In order to deal with the other elementary functions (exponential, logarithm, powers, roots, trigonometric functions, inverse trigonometric functions, hyperbolic functions, etc.), we will override the numpy elementary functions so that they can be used with our `variable` class.

>For example, we will override the `np.sin` function such that if you use it on a `variable` instance `x` at a given value, it will return another `variable` instance with the value of $\sin(x)$, and the calculated derivative of $\dot{x}\cos(x)$ at the given value. Similarly, we will override the `np.exp` function such that if you use it on a `variable` instance `x` at a given value, it will return another `variable` instance with the value of $\exp(x)$, and the calculated derivative of $\dot{x}\exp(x)$ at the given value.
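
A rough sketch of such overrides follows. It uses the standard-library `math` module instead of NumPy so that it runs without dependencies, and the minimal `Variable` class is a hypothetical stand-in for the package's `variable` class. Plain numbers are passed through unchanged.

```python
import math


class Variable:
    """Hypothetical stand-in for the package's variable class."""

    def __init__(self, val, der=1.0):
        self.val = val
        self.der = der


def sin(x):
    """sin that also propagates derivatives through Variable inputs."""
    if isinstance(x, Variable):
        # Chain rule: d/dt sin(x) = x' * cos(x)
        return Variable(math.sin(x.val), x.der * math.cos(x.val))
    return math.sin(x)


def exp(x):
    """exp that also propagates derivatives through Variable inputs."""
    if isinstance(x, Variable):
        # Chain rule: d/dt exp(x) = x' * exp(x)
        return Variable(math.exp(x.val), x.der * math.exp(x.val))
    return math.exp(x)


x = Variable(0.0)
f = exp(sin(x))
print(f.val, f.der)
```

Because each function returns a new `Variable`, calls compose naturally: the derivative stored on the result of `sin` becomes the incoming derivative for `exp`.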

To define an auto-differentiable function, the user will pass the expressions for the value and the derivative to the `user_function` function, which will return a function compatible with `variable` inputs.
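
One possible shape for such a wrapper is sketched below. The two-argument signature mirrors the usage shown earlier in this document, but the body, the inner name `wrapped`, and the minimal `Variable` class are assumptions made for illustration. `math.log` stands in for `np.log` to keep the example dependency-free.

```python
import math


class Variable:
    """Hypothetical stand-in for the package's variable class."""

    def __init__(self, val, der=1.0):
        self.val = val
        self.der = der


def user_function(value_fn, derivative_fn):
    """Build an AD-compatible function from a value function and its derivative."""
    def wrapped(x):
        if isinstance(x, Variable):
            # Chain rule: outer derivative at x.val times the incoming derivative.
            return Variable(value_fn(x.val), derivative_fn(x.val) * x.der)
        return value_fn(x)
    return wrapped


log = user_function(math.log, lambda v: 1 / v)
f = log(Variable(2.0))
print(f.val, f.der)
```

Keeping the chain-rule multiplication inside the wrapper means the user only has to supply the derivative of the outer function itself.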

<a id="p5"></a>
### External dependencies 

In order to implement this, we will rely on NumPy, which will be specified as an external dependency in our `setup.py` file, and on Python's `math` module, which is part of the standard library and needs no separate installation.


As such, after the user initializes `variable` instances for the independent variables, they will be able to use the usual elementary functions on those instances in order to calculate both the value of the function and the value of the derivative.