# Milestone 2

**Due: Thursday, Nov 8th at 11:59 PM**

## Introduction

This software implements automatic differentiation (AD). More specifically, it can automatically differentiate a Python function up to machine precision, and it can take derivatives of derivatives. Obtaining derivatives accurately is important because it is the key part of gradient-based optimization, which is the foundation of many machine learning algorithms. As a matter of fact, most machine learning problems can be divided into the following steps:

1. define a function connecting some input $X$ with some output $Y$ through a set of parameters $\beta$ as $Y = f(X,\beta)$;
2. define a loss function $L(X,Y,\beta)$ to check how good the model is;
3. find the parameter set $\beta$ that minimizes the loss function: $\arg\min_\beta L(X,Y,\beta)$.

Generally, the power or performance of a machine learning algorithm is limited by the third step, which can be handled by gradient-based optimization. Therefore, our software can be used in various machine learning packages to improve their performance.
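The three steps can be sketched on a toy problem. The example below is a minimal illustration and not part of the package: the model $Y = \beta X$, the data, the learning rate, and the hand-derived gradient are all assumptions chosen for brevity. In practice, AD would supply the gradient that is written out by hand here.

```python
# Step 1: a model Y = f(X, beta) = beta * X (toy data generated with beta = 3).
X = [0.5, 1.0, 1.5, 2.0]
Y = [3.0 * x for x in X]

# Step 2: mean squared error loss L(X, Y, beta).
def loss(beta):
    return sum((y - beta * x) ** 2 for x, y in zip(X, Y)) / len(X)

# Gradient of the loss with respect to beta, derived by hand for this sketch.
def loss_gradient(beta):
    return sum(-2.0 * x * (y - beta * x) for x, y in zip(X, Y)) / len(X)

# Step 3: minimize the loss with plain gradient descent.
beta = 0.0
learning_rate = 0.1  # assumed value, small enough to converge on this data
for _ in range(100):
    beta -= learning_rate * loss_gradient(beta)

print(beta, loss(beta))
```

The only problem-specific ingredient of the loop is `loss_gradient`, which is the piece an AD package is meant to provide automatically.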

Furthermore, AD can also be applied to solve differential equations in various physical systems, such as diffusion equations, wave equations, Navier–Stokes equations, and other non-linear equations that cannot be solved analytically. Traditional numerical methods based on finite differences carry errors much larger than machine precision. Therefore, applying AD can potentially increase the accuracy of solvers for those differential equations.

## How to Use *AutoDiff*
### Installation

### Introduction to basic usage of the package

After successful installation, the user will first import our package.
```python
import autodiff as ad
```
Then depending on the type of expressions they have, they will employ one of the following methods.

#### Scalar functions of scalar values
Say the user wants to get the gradient of the expression $f(x) = \alpha x + 3$ with $\alpha = 2$, evaluated at $x = 2$.
The user will first create a variable `x` and then define the symbolic expression for `f`.
```python
a = 2.0
x = ad.Variable(a, name='x')
f = 2 * x + 3
```
Note: If the user wants to include special functions like sin and exp, they need to do the following:
```python
f = 2 * ad.Sin(x) + 3
```
Then when they want to evaluate the gradients of f with respect to x, they will do
```python
print(f.val, f.der)
```
`f.val` and `f.der` will then contain the value and gradient of `f` with respect to `x`.

#### Scalar functions of vectors
Say the user wants to get the gradient of the expression $f(x_1,x_2) = x_1 x_2 + x_1$.

The user will first create two variables `x1` and `x2` and then define the symbolic expression for `f`.
```python
a1 = 2.0
a2 = 3.0
x1 = ad.Variable(a1,name='x1')
x2 = ad.Variable(a2,name='x2')
f = x1 * x2 + x1
```
Then when they want to get the values and gradients of f with respect to x1 and x2, they will do
```python
print(f.val, f.der)
```
`f.val` will then contain the value of `f`, and `f.der` a dictionary of the partial derivatives of `f` with respect to `x1` and `x2`.
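As a quick check of what to expect, the value and partial derivatives at $x_1 = 2$, $x_2 = 3$ can be worked out by hand (this is a manual calculation, not output copied from the package):

$$f(2,3) = 2 \cdot 3 + 2 = 8$$
$$\frac{\partial f}{\partial x_1} = x_2 + 1 = 4, \qquad \frac{\partial f}{\partial x_2} = x_1 = 2$$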

#### Vector functions of vectors
Say the user wants to get the gradients of the system of functions 
$$f_1 = x_1 x_2 + x_1$$
$$f_2 = \frac{x_1}{x_2}$$

i.e.
$$\mathbf{f}(x_1,x_2)=(f_1(x_1,x_2),f_2(x_1,x_2))$$
The user will first create two variables `x1` and `x2` and then define the symbolic expressions for `f1` and `f2`.
```python
x1 = ad.Variable(2.0, name='x1')
x2 = ad.Variable(3.0, name='x2')
f1 = x1 * x2 + x1
f2 = x1 / x2
```
Then when they want to evaluate the gradients of f with respect to x1 and x2, they will do
```python
print(f1.val, f2.val, f1.der, f2.der)
```
The Jacobian is $\mathbf{J}(\mathbf{f}) = (f_1', f_2')$, which corresponds to `(f1.der, f2.der)`.
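For reference, the Jacobian of this system can be written out by hand (a manual derivation for checking results, not output from the package):

$$\mathbf{J}(\mathbf{f}) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} \end{pmatrix} = \begin{pmatrix} x_2 + 1 & x_1 \\ \frac{1}{x_2} & -\frac{x_1}{x_2^2} \end{pmatrix}$$

For example, at $x_1 = 2$, $x_2 = 3$ (the values used in the previous example), this gives

$$\mathbf{J}(\mathbf{f}) = \begin{pmatrix} 4 & 2 \\ \frac{1}{3} & -\frac{2}{9} \end{pmatrix}$$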


### Demo

The demo follows four steps:

1. import AutoDiff
2. define initial variables
3. calculate the values and derivatives
4. print the values and derivatives

## Background




#### What is AD?

AD is a set of techniques to numerically evaluate the derivative of a function specified by a computer program, based on the fact that every computer program executes a sequence of elementary arithmetic operations and elementary functions. Using the chain rule, the derivative of each sub-expression can be calculated recursively to obtain the final derivatives. Depending on the order in which those sub-expressions are evaluated, there are two major methods of doing AD: **forward accumulation** and **reverse accumulation**.

#### Why AD?

Traditionally, there are two ways of doing differentiation: symbolic differentiation (SD) and numerical differentiation (ND). SD gives exact expressions for the derivatives and is accurate up to machine precision, but it is very inefficient since the expressions can become very large during differentiation. ND, on the other hand, suffers from round-off and truncation errors, which lead to poor precision. Moreover, both ND and SD have problems with calculating higher derivatives, and they are slow for large vector inputs. AD addresses all of these problems.
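The precision problem of ND can be seen in a small experiment. The sketch below is an illustration only: the test function $\sin(x)$, the evaluation point, and the step sizes are assumptions, and `forward_difference` is a hypothetical helper rather than part of the package. It compares a forward finite difference against the exact derivative $\cos(x)$.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with the forward difference (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

x = 1.0
exact = math.cos(x)  # analytical derivative of sin at x

# Absolute error of the finite-difference estimate for several step sizes.
errors = {
    h: abs(forward_difference(math.sin, x, h) - exact)
    for h in (1e-1, 1e-4, 1e-8, 1e-12)
}

for h, err in errors.items():
    print(f"h = {h:.0e}  error = {err:.2e}")
```

Large steps are dominated by truncation error and very small steps by round-off error, so no step size reaches machine precision. AD avoids this trade-off because it never forms a difference quotient.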

#### How to do AD?

Consider a simple function:
$$z = \cos(x)\sin(y) + \frac{x}{y}$$
In AD, its computational graph for the forward accumulation method looks like:
<img src="figs/Fig1.png" width="400">
According to the graph, the function can be rewritten as

\begin{align}
z = \cos(x)\sin(y) + \frac{x}{y}=\cos(w_1) \sin(w_2) + \frac{w_1}{w_2}=w_3 w_4+w_6=w_5 + w_6=w_7
\end{align}
The derivatives with respect to $x$ and $y$ can be calculated according to the chain rule as:

\begin{align}
\frac{\partial z}{\partial x}&=\frac{\partial z}{\partial w_7}\left(\frac{\partial w_7}{\partial w_5}\frac{\partial w_5}{\partial w_3}\frac{\partial w_3}{\partial w_1}+\frac{\partial w_7}{\partial w_6}\frac{\partial w_6}{\partial w_1}\right)\frac{\partial w_1}{\partial x}\\
\frac{\partial z}{\partial y}&=\frac{\partial z}{\partial w_7}\left(\frac{\partial w_7}{\partial w_5}\frac{\partial w_5}{\partial w_4}\frac{\partial w_4}{\partial w_2}+\frac{\partial w_7}{\partial w_6}\frac{\partial w_6}{\partial w_2}\right)\frac{\partial w_2}{\partial y}\\
\end{align}

Therefore, $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$ are just combinations of derivatives of elementary functions, which can be calculated analytically. In forward accumulation, the chain rule is applied from the inside out. Computationally, the values of $w_i$ and their derivatives are stored and accumulated along the chain.
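The forward accumulation for this function can be traced by hand in a few lines of Python. This is a minimal sketch, not the package's implementation: the function name `forward_trace`, the seed arguments, and the evaluation point are assumptions, while the intermediate names $w_1, \dots, w_7$ follow the rewritten expression above.

```python
import math

def forward_trace(x, y, seed_x, seed_y):
    """Evaluate z = cos(x) sin(y) + x / y together with its derivative.

    (seed_x, seed_y) selects the direction: (1, 0) gives dz/dx and
    (0, 1) gives dz/dy.
    """
    # Each node stores its value and its derivative, built from the inside out.
    w1, dw1 = x, seed_x
    w2, dw2 = y, seed_y
    w3, dw3 = math.cos(w1), -math.sin(w1) * dw1
    w4, dw4 = math.sin(w2), math.cos(w2) * dw2
    w5, dw5 = w3 * w4, dw3 * w4 + w3 * dw4                # product rule
    w6, dw6 = w1 / w2, (dw1 * w2 - w1 * dw2) / w2 ** 2    # quotient rule
    w7, dw7 = w5 + w6, dw5 + dw6
    return w7, dw7

x, y = 1.0, 2.0
z, dz_dx = forward_trace(x, y, 1.0, 0.0)
_, dz_dy = forward_trace(x, y, 0.0, 1.0)
print(z, dz_dx, dz_dy)
```

One forward pass is needed per input direction, which is why the function is called twice to obtain both partial derivatives.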

## Software Organization

#### Directory structure 
```
/cs207-FinalProject
    /docs
        milestone1.ipynb
        milestone2.ipynb
    /AutoDiff
        __init__.py
        AutoDiff.py
    /tests
        __init__.py
        test_operator.py
    README.md
    requirements.txt
    LICENSE.md
```
#### Modules

- `__init__.py`:  initializes the package by importing the necessary functions from other modules

- `AutoDiff.py`:  main module of the package, which implements the basic data structures and algorithms of forward automatic differentiation, including overloaded operators and elementary functions such as sin and exp.

#### Test

The test suite lives in the `test_operator.py` file in the `tests` folder. We automate our testing using continuous integration: every time we commit and push to GitHub, our code is automatically tested by `Travis CI`, and `Coveralls` reports code coverage.

#### Package Installation

Eventually we will use PyPI to distribute our package. At this point, the user needs to download and install the package manually.

## Implementation

### Current Implementation
#### Data structures

* dictionary: we use dictionaries to keep track of the partial derivatives. The keys are the variables we differentiate with respect to and the values are the actual derivatives.
* overloaded operators such as \__add\__ and \__mul\__ to add or multiply two auto-differentiation objects.
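The combination of a derivative dictionary and overloaded operators can be sketched as follows. This is a simplified stand-in written for illustration, not the code in `AutoDiff.py`: the constructor signature, the use of variable names as dictionary keys, and the handling of plain numbers are assumptions.

```python
class Variable:
    """Hypothetical minimal value/derivative node (not the package's class)."""

    def __init__(self, val, name=None, der=None):
        self.val = val
        self.name = name
        # An input variable has derivative 1 with respect to itself.
        self.der = der if der is not None else {name: 1.0}

    def __add__(self, other):
        if isinstance(other, Variable):
            # Sum rule: add partial derivatives key by key.
            keys = set(self.der) | set(other.der)
            der = {k: self.der.get(k, 0.0) + other.der.get(k, 0.0) for k in keys}
            return Variable(self.val + other.val, der=der)
        return Variable(self.val + other, der=dict(self.der))

    def __mul__(self, other):
        if isinstance(other, Variable):
            # Product rule: d(uv) = du * v + u * dv for every variable.
            keys = set(self.der) | set(other.der)
            der = {
                k: self.der.get(k, 0.0) * other.val + self.val * other.der.get(k, 0.0)
                for k in keys
            }
            return Variable(self.val * other.val, der=der)
        return Variable(self.val * other, der={k: d * other for k, d in self.der.items()})


x1 = Variable(2.0, name='x1')
x2 = Variable(3.0, name='x2')
f = x1 * x2 + x1
print(f.val, f.der)
```

Merging the key sets lets two expressions that depend on different variables be combined without the caller declaring the full set of inputs up front.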

#### Classes

* class Variable - an auto-differentiation class with the overloaded operators 
    * attributes
        * val: scalar value of current node
        * name: name of variable
        * der: dict of partial derivatives of current node
    * Methods
        * \__pos\__
        * \__neg\__
        * \__add\__
        * \__radd\__
        * \__sub\__
        * \__rsub\__
        * \__mul\__
        * \__rmul\__
        * \__truediv\__
        * \__rtruediv\__
        * \__pow\__
        * \__rpow\__


* method exp()
    * input 
        * Variable object
    * output 
        * Variable object after taking exponential


* method log()
    * input 
        * Variable object
    * output 
        * Variable object after taking log


* method sin()
    * input 
        * Variable object
    * output 
        * Variable object after taking sine


* method cos()
    * input 
        * Variable
    * output 
        * Variable object after taking cosine


* method tan()
    * input 
        * Variable
    * output 
        * Variable object after taking tangent
        
* method sinh()
    * input 
        * Variable
    * output 
        * Variable object after taking sinh
      
      
* method cosh()
    * input 
        * Variable
    * output 
        * Variable object after taking cosh


* method tanh()
    * input 
        * Variable
    * output 
        * Variable object after taking tanh
        
        
* method arcsin()
    * input 
        * Variable
    * output 
        * Variable object after taking arcsin


* method arccos()
    * input 
        * Variable
    * output 
        * Variable object after taking arccos


* method arctan()
    * input 
        * Variable
    * output 
        * Variable object after taking arctan

#### External dependencies

* NumPy
* Math

#### Elementary functions

Our elementary functions include the following: 
* exp
* log
* sin
* cos
* tan
* sinh
* cosh
* tanh
* arcsin
* arccos
* arctan
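Each elementary function applies the chain rule in the same way: compute the new value, then scale every stored partial derivative by the function's local derivative. The sketch below shows this pattern for `exp`, `log`, and `sin`. The stand-in `Variable` class and the helper `_chain` are hypothetical and simplified, so the package's actual functions may differ.

```python
import math

class Variable:
    """Hypothetical minimal value/derivative node (not the package's class)."""

    def __init__(self, val, name=None, der=None):
        self.val = val
        self.name = name
        self.der = der if der is not None else {name: 1.0}


def _chain(x, val, local_derivative):
    """Build g(x) from its value and g'(x.val) using the chain rule."""
    return Variable(val, der={k: local_derivative * d for k, d in x.der.items()})

def exp(x):
    return _chain(x, math.exp(x.val), math.exp(x.val))

def log(x):
    return _chain(x, math.log(x.val), 1.0 / x.val)

def sin(x):
    return _chain(x, math.sin(x.val), math.cos(x.val))


x = Variable(0.5, name='x')
f = exp(sin(x))   # d/dx exp(sin(x)) = exp(sin(x)) * cos(x)
print(f.val, f.der['x'])
```

Because every function returns a new `Variable`, calls can be nested freely and the derivative is accumulated one function at a time.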

### Future Implementations

### Future Steps