# Milestone 1

First milestone for CS207 Fall 2018 Project Group 1.



## Introduction

Differentiation is used throughout computer science, mathematics, and physics. It underlies numeric root-finding methods such as Newton's method and optimization methods such as the many variants of gradient descent.
However, deriving analytic derivatives symbolically is difficult and can produce expression trees that grow exponentially, which makes finding the derivative infeasible in many cases.
Similarly, approximating the derivative numerically with the limit definition (finite differences) suffers from truncation error when the step is large and from round-off error, due to limited machine precision, when the step is small.
Automatic differentiation addresses both of these issues: because a computer evaluates any function as a sequence of elementary operations, applying the chain rule to that sequence yields the derivative.
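
The precision problem with the limit definition can be seen in a few lines. The following is a minimal sketch, not part of the planned library: the helper name `forward_difference`, the test function `sin`, and the chosen step sizes are all assumptions made for illustration.

```python
import math


def forward_difference(f, x, h):
    """Approximate f'(x) with the one-sided limit definition (hypothetical helper)."""
    return (f(x + h) - f(x)) / h


x0 = 1.0
exact = math.cos(x0)  # the true derivative of sin at x0

# Absolute error of the approximation for progressively smaller step sizes.
errors = {
    h: abs(forward_difference(math.sin, x0, h) - exact)
    for h in (1e-1, 1e-4, 1e-8, 1e-12, 1e-15)
}

for h, err in errors.items():
    print(f"h = {h:.0e}  error = {err:.3e}")
```

The error first shrinks as `h` decreases and then grows again once round-off dominates, so no single step size gives a result accurate to machine precision.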

## Background

Automatic differentiation relies heavily on the principles of chain rule differentiation. A graph of elementary functions is built to calculate the values of more complex functions. Using the chain rule on the graph of elementary functions, the value of the derivative at each node can also be calculated. This gives us the ability to calculate the values of functions and their derivatives, no matter how complex, to near machine precision (a significant advantage compared to alternatives such as finite differences). 

The chain rule tells us that:
\begin{align}
\frac{d}{dx}(f(g(x))) &= f'(g(x))g'(x)
\end{align}
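
When the outer function depends on several intermediate quantities, as in the worked example later in this section, the same rule is applied to each of them and the contributions are summed. In the notation assumed here, $y_1, \dots, y_n$ are intermediate values that each depend on $x$:

\begin{align}
\frac{\partial}{\partial x} f(y_1(x), \dots, y_n(x)) &= \sum_{i=1}^{n} \frac{\partial f}{\partial y_i}\frac{\partial y_i}{\partial x}
\end{align}

For instance, with $f = y_1 y_2$ this gives $\dot{f} = \dot{y}_1 y_2 + y_1 \dot{y}_2$, which is the familiar product rule.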

Since each step in our graph is a single elementary operation with a known derivative, we can find the derivative at a node from the values and derivatives of the nodes that feed into it. By starting with an initial 'seed' vector for the derivative (often a unit vector, so that one input has derivative 1 and the others 0), we can find the directional derivative in any desired direction.
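
The role of the seed vector can be sketched as follows. This is an illustrative example under stated assumptions rather than the library's implementation: the function $f(x, y) = xy + \sin(x)$ and the name `f_with_tangent` are made up for this sketch.

```python
import math


def f_with_tangent(x, y, seed):
    """Evaluate f(x, y) = x*y + sin(x) and its derivative along `seed`.

    Each intermediate quantity is carried as a (value, tangent) pair, and
    the tangent is updated with the chain rule at every step.
    """
    dx, dy = seed                            # tangents of the two inputs
    v1, d1 = x * y, dx * y + x * dy          # product rule
    v2, d2 = math.sin(x), math.cos(x) * dx   # chain rule through sin
    return v1 + v2, d1 + d2                  # sum rule


# Unit seeds pick out one partial derivative at a time.
value, df_dx = f_with_tangent(2.0, 3.0, seed=(1.0, 0.0))
_, df_dy = f_with_tangent(2.0, 3.0, seed=(0.0, 1.0))

# Any other seed gives the directional derivative along that vector.
_, directional = f_with_tangent(2.0, 3.0, seed=(0.6, 0.8))
```

Here `directional` equals `0.6 * df_dx + 0.8 * df_dy` up to rounding, which is the gradient projected onto the seed direction.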

Below is an example of constructing a graph to find the exact values of a function and its derivative. The function we used was:

$$f\left(x, y, z\right) = \dfrac{1}{xyz} + \sin\left(\dfrac{1}{x} + \dfrac{1}{y} + \dfrac{1}{z}\right)$$

We work through this starting with trace elements $x_1$ for $x$, $x_2$ for $y$, and $x_3$ for $z$, and evaluate the function and its gradient at $(x, y, z) = (1, 2, 3)$.

| Trace | Elementary Function | Current Value | Elementary Function Derivative | $\nabla_{x}$ Value  | $\nabla_{y}$ Value  | $\nabla_{z}$ Value  |
| :---: | :-----------------: | :-----------: | :----------------------------: | :-----------------: | :-----------------: | :-----------------: |
| $x_{1}$ | $x$ | 1 | $\dot{x}$ | 1 | 0 | 0 | 
| $x_{2}$ | $y$ | 2 | $\dot{y}$ | 0 | 1 | 0 | 
| $x_{3}$ | $z$ | 3 | $\dot{z}$ | 0 | 0 | 1 | 
| $x_{4}$ | $1/x_{1}$ | 1 | $-\dot{x}_{1}/x_{1}^{2}$ | $-1$ | $0$ | $0$ | 
| $x_{5}$ | $1/x_{2}$ | 1/2 | $-\dot{x}_{2}/x_{2}^{2}$ | $0$ | $-1/4$ | $0$ | 
| $x_{6}$ | $1/x_{3}$ | 1/3 |  $-\dot{x}_{3}/x_{3}^{2}$ | $0$ | $0$ | $-1/9$ | 
| $x_{7}$ | $x_{4} + x_{5} + x_{6}$ | 11/6 |$\dot{x}_{4} + \dot{x}_{5} + \dot{x}_{6}$ | -1 | -0.25 | -0.111 |
| $x_{8}$ | $\sin(x_{7})$ | 0.966 |$\dot{x}_{7}\cos(x_{7})$ | 0.260 | 0.065 | 0.029 | 
| $x_{9}$ | $x_{4}x_{5}x_{6}$| 1/6 |$\dot{x}_{4}x_{5}x_{6} + \dot{x}_{5}x_{4}x_{6} + \dot{x}_{6}x_{4}x_{5} $ |-0.167 | -0.083  | -0.056 | 
| $x_{10}$ | $x_{8} + x_{9}$ | 1.132 |$\dot{x}_{8} + \dot{x}_{9}$ | 0.093| -0.018  | -0.027 | 

This isn't a very complicated function, but it shows how a graph built from the most basic functions lets us compute values and gradients to machine precision.
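
The table can be reproduced row by row in plain Python. The sketch below is only a check of the arithmetic, not the planned library; the helper names `scale`, `add`, and `trace` are assumptions introduced here, and gradients are stored as plain tuples ordered as $(\nabla_x, \nabla_y, \nabla_z)$.

```python
import math


def scale(c, grad):
    """Multiply every component of a gradient tuple by the scalar c."""
    return tuple(c * g for g in grad)


def add(*grads):
    """Add gradient tuples component by component."""
    return tuple(sum(parts) for parts in zip(*grads))


def trace(x, y, z):
    """Return (value, gradient) of f at (x, y, z), one table row per line."""
    x1, d1 = x, (1.0, 0.0, 0.0)
    x2, d2 = y, (0.0, 1.0, 0.0)
    x3, d3 = z, (0.0, 0.0, 1.0)
    x4, d4 = 1 / x1, scale(-1 / x1**2, d1)
    x5, d5 = 1 / x2, scale(-1 / x2**2, d2)
    x6, d6 = 1 / x3, scale(-1 / x3**2, d3)
    x7, d7 = x4 + x5 + x6, add(d4, d5, d6)
    x8, d8 = math.sin(x7), scale(math.cos(x7), d7)
    x9, d9 = x4 * x5 * x6, add(
        scale(x5 * x6, d4), scale(x4 * x6, d5), scale(x4 * x5, d6)
    )
    x10, d10 = x8 + x9, add(d8, d9)
    return x10, d10


value, grad = trace(1.0, 2.0, 3.0)
# Rounded to three decimals, these match the last row of the table.
print(round(value, 3), [round(g, 3) for g in grad])
```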



## Usage

Since most expressions and applications that require automatic differentiation are not constructed dynamically, we will start by building the computational graph statically. Then, to perform computations and get the derivative, we feed our computational graph some inputs. A sample usage would look like:

```python
>>> import ad
>>> x = ad.Variable('x')
>>> f = 10.0 + 5.0 * (x ** 2.0) - 3.0 * x
>>> f.eval({'x': 5.0})
120.0
>>> f.d
47.0
```

Multiple inputs can also be provided, in which case `f.d` holds one partial derivative per variable. For example:

```python
>>> x = ad.Variable('x')
>>> y = ad.Variable('y')
>>> f = x * y
>>> f.eval({'x': 5.0, 'y': 2.0})
10.0
>>> f.d
{'x': 2.0, 'y': 5.0}
```

## Software Organization

### Directory Structure

Our library currently only does automatic differentiation. Therefore, we only have one package, which we will call `ad`. We will add additional modules for our extension once we decide on its scope.

```text
cs207project
├── LICENSE
├── README
├── ad
│   ├── __init__.py
│   ├── ad.py
│   ├── mat_ops.py
│   ├── plots.py
│   ├── simple_ops.py
│   └── tests
│       ├── test_eval.py
│       ├── test_forward.py
│       ├── test_mat_ops.py
│       └── test_simple_ops.py
└── setup.py
```

Our test suite will use Travis CI for continuous integration and Coveralls for measuring code coverage. We will also distribute our package through PyPI.

## Implementation

In our implementation, `Expression` is the core data structure. Everything in the computational graph is an expression.

The attributes of an expression include its gradient and links to its children. Its methods include `eval` (compute the value of the expression) and `d` (get the gradient of the expression; the usage examples above access it as `f.d`). Dunder methods such as `__add__()`, `__sub__()`, `__mul__()`, and `__truediv__()` will be implemented so that expressions can be combined with ordinary arithmetic operators.

Classes that we are going to implement for constructing the computational graph include: 

* Expression(object)
* Variable(Expression)
* Constant(Expression)

Each operation class takes one or two `Expression` instances and returns a new `Expression` instance representing the operation applied to them. These classes store their children so that we can traverse down the computational graph. The initial classes will be:

* Unop(Expression)
* Binop(Expression)
* Addition(Binop)
* Subtraction(Binop)
* Multiplication(Binop)
* Division(Binop)
* Sin(Unop)
* Cos(Unop)
* Exp(Unop)
* Log(Unop)

`Unop` is a base class that stores one child, and `Binop` is a base class that stores two children. This implementation should be able to handle vector functions of vectors by using multiple `Variable` objects in our function.
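
One way the class hierarchy above could fit together is sketched below. This is a rough illustration under several assumptions, not the final design: the internal names `_forward`, `_op`, `_wrap`, and the `feed` parameter are hypothetical, only a few of the listed operations are included, and for simplicity `d` is always a dict keyed by variable name, even for a single variable.

```python
import math


class Expression:
    """Base class: every node in the computational graph is an Expression."""

    def __init__(self):
        self.d = None  # gradient from the most recent eval() call

    def eval(self, feed):
        """Return the value at `feed` and store the gradient in `self.d`."""
        value, self.d = self._forward(feed)
        return value

    def _forward(self, feed):
        """Return (value, {variable name: partial derivative})."""
        raise NotImplementedError

    def __add__(self, other):
        return Addition(self, _wrap(other))

    def __radd__(self, other):
        return Addition(_wrap(other), self)

    def __mul__(self, other):
        return Multiplication(self, _wrap(other))

    def __rmul__(self, other):
        return Multiplication(_wrap(other), self)


class Variable(Expression):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def _forward(self, feed):
        return feed[self.name], {self.name: 1.0}


class Constant(Expression):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def _forward(self, feed):
        return self.value, {}


def _wrap(obj):
    """Promote plain numbers to Constant nodes."""
    return obj if isinstance(obj, Expression) else Constant(obj)


class Unop(Expression):
    """Base class for operations with one child."""

    def __init__(self, child):
        super().__init__()
        self.child = _wrap(child)

    def _forward(self, feed):
        a, grad_a = self.child._forward(feed)
        value, da = self._op(a)  # value and local derivative
        return value, {name: da * g for name, g in grad_a.items()}


class Binop(Expression):
    """Base class for operations with two children."""

    def __init__(self, left, right):
        super().__init__()
        self.left, self.right = left, right

    def _forward(self, feed):
        a, grad_a = self.left._forward(feed)
        b, grad_b = self.right._forward(feed)
        value, da, db = self._op(a, b)  # value and both local partials
        names = grad_a.keys() | grad_b.keys()
        grad = {n: da * grad_a.get(n, 0.0) + db * grad_b.get(n, 0.0) for n in names}
        return value, grad


class Addition(Binop):
    def _op(self, a, b):
        return a + b, 1.0, 1.0


class Multiplication(Binop):
    def _op(self, a, b):
        return a * b, b, a


class Sin(Unop):
    def _op(self, a):
        return math.sin(a), math.cos(a)


x, y = Variable('x'), Variable('y')
f = x * y + Sin(x)
value = f.eval({'x': 5.0, 'y': 2.0})
```

After the call to `eval`, `f.d` maps `'x'` to `2.0 + cos(5.0)` and `'y'` to `5.0`. Each subclass only supplies its value and local partial derivatives through `_op`, while `Unop` and `Binop` apply the chain rule once, which keeps adding operations such as `Cos`, `Exp`, or `Division` to a few lines each.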

## External Dependencies

We will mainly be using `numpy` as an external dependency for mathematical and vector operations.