<h1 style="padding-top: 25px;padding-bottom: 25px;text-align: left; padding-left: 10px; background-color: #DDDDDD; 
    color: black;"> <img style="float: left; padding-right: 10px; width: 45px" src="https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/iacs.png"> CS207 Project Milestone 2  - @Software-Samurais</h1> 
    
### Group 3: 
#### Erick Ruiz, Jingyuan Liu, Kailas Amin, Simon (Xin) Dong



<hr style='height:2px'>

In [2]:
from IPython.display import Latex


# 1 Introduction

The increasing importance of computational models in science and business, alongside the slowing pace of advances in computing hardware, has increased the need for efficient and accurate evaluation of derivatives. Many important applications such as simulation, optimization, and neural networks rely on repeated differentiation of complex functions.

Before the advent of automatic differentiation (AD), the primary method for derivative evaluation was the method of finite differences (FD), where the function to be evaluated is effectively treated as a black-box oracle. As the FD method is effectively sampling, the granularity (i.e. step size) of the algorithm introduces error if it is either too large (truncation error) or too small (round-off error), and even at the best intermediate step size, $f'(x)$ evaluations cannot reach machine precision. The alternative approach, fully symbolic differentiation (SD), is cumbersome and inefficient in many cases. For a complex computer program, the symbolic expression can grow to an unmanageable size and cause significant inefficiency.
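
The step-size trade-off can be seen in a few lines of plain Python. The sketch below is illustrative only: the helper name `forward_difference` and the choice of $\sin x$ at $x = 1$ as a test case are assumptions made for this example and are not part of the package.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with a one-sided finite difference of step h."""
    return (f(x + h) - f(x)) / h

exact = math.cos(1.0)  # exact derivative of sin(x) at x = 1
errors = {}
for h in (1e-1, 1e-4, 1e-8, 1e-12, 1e-15):
    errors[h] = abs(forward_difference(math.sin, 1.0, h) - exact)
    print(f"h = {h:.0e}   absolute error = {errors[h]:.3e}")
```

The error first shrinks as the step size decreases and then grows again once floating-point cancellation dominates, so no single step size recovers the derivative to machine precision.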

The approach of algorithmic differentiation seeks to find the best of both worlds, with machine precision and easy evaluation. This is done by repeated application of the chain rule at a point, with the intermediate results stored in a table called the computational trace. Thus, rather than storing the full symbolic expression, an AD code only needs to apply the chain rule to a specific evaluation, representable by a single variable. This approach allows us to achieve the accuracy of symbolic approaches while drastically reducing the cost of evaluation.

Within the umbrella of automatic differentiation, we seek to implement the forward mode, which evaluates the intermediate results directly in an inside-out manner. Other approaches such as reverse mode also have specific advantages, especially in the fields of machine learning and artificial intelligence, or in any context in which the number of inputs dominates the number of outputs.
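
To make the idea of a computational trace concrete, the sketch below evaluates $\sin(x^2)$ and its derivative at $x = 3$ by hand, one elementary step at a time. The variable names (`v0`, `d0`, and so on) are chosen for this illustration and do not come from the package.

```python
import math

x = 3.0

# Each row of the trace holds an intermediate value and its derivative
# with respect to x, obtained by applying the chain rule to the row above.
v0, d0 = x, 1.0                            # input, seeded with dx/dx = 1
v1, d1 = v0 ** 2, 2.0 * v0 * d0            # v1 = v0^2
v2, d2 = math.sin(v1), math.cos(v1) * d1   # v2 = sin(v1)

print(v2, d2)
```

Only numbers are propagated from row to row; no symbolic expression for the derivative is ever built.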

The method of automatic differentiation, sometimes also referred to as algorithmic differentiation, addresses the weaknesses of the finite difference method by providing a systematic way to calculate derivatives numerically to machine precision. The goal of AutoDiff is to implement the forward mode of automatic differentiation, as it is a relevant feature that, at the time of writing, even some mainstream machine learning libraries, such as PyTorch, lack.

# 2 Background



# 3 How to use

## 3.1 <span id="jump">Installation</span>

The GitHub URL for the project is https://github.com/Software-Samurais/cs207-FinalProject.

The steps below assume that the user already has a recent version of `Python` 3, `Git`, and a package manager of choice (such as `pip`) installed.
####  - Download the package from GitHub to your folder via this command (in the terminal)



```shell
# Clone the repo
git clone https://github.com/Software-Samurais/cs207-FinalProject.git

```

#### - Create a virtual environment and activate it (for Mac and Ubuntu)

```shell
# If you don't have virtualenv, install it
pip install virtualenv
# Create virtual environment
virtualenv env
# Activate your virtual environment
source env/bin/activate
```

#### - Install the necessary dependencies:

```shell
pip install -r requirements.txt
```

#### - Run module tests (in the root directory)
```shell
# Run module tests if you like
python -m pytest ./test/test_forward.py
```

## 3.2 Basic Demo

This section gives a basic demo as a quick guide for the user. 

Once AutoDiff is installed, the user must import it to be able to use it. The user has the option to either import the entire library or to choose only a subset of modules, classes, or methods to import.

In this demo, we import the `AD` module from the `autodiff` directory and give it the alias `AD`.

In [1]:
# Import package from parent directory 
import sys 
sys.path.append("..") 
import autodiff.AD as AD

### Declare Variables

- Denote constants

In [16]:
# Constant (scalar or vector): The user does not need to initialize the derivative; it defaults to 1.0
x1 = AD.AutoDiff(3.0)
# Each object has two attributes: val and der 
print(x1.val, x1.der)
x1

3.0 1.0


Function value: 3.0, Derivative value: 1.0

- Denote scalar variables

In [5]:
# The user can pass both the value and the derivative of a variable
x2 = AD.AutoDiff(2.5, 2.0)
x2

Function value: 2.5, Derivative value: 2.0

### Elementary Operations

We can perform basic elementary operations on `AutoDiff` objects, just as we would with numbers. For example, addition, subtraction, multiplication, division, and power operations are all supported. Simply define an instance of the `AutoDiff` class to get started. 

In [10]:
x = AD.AutoDiff(3.0)
y = AD.AutoDiff(2.0, 0.5)

In [12]:
# Negative
print(-x)
print(-y)

Function value: -3.0, Derivative value: -1.0
Function value: -2.0, Derivative value: -0.5


In [13]:
# Addition
print(x + 3.)
print(x + y)

Function value: 6.0, Derivative value: 1.0
Function value: 5.0, Derivative value: 1.5


In [17]:
# Subtraction
print(x - 1.)
print(x - y)

Function value: 2.0, Derivative value: 1.0
Function value: 1.0, Derivative value: 0.5


In [18]:
# Multiplication
print(3.* x)
print(x * y)

Function value: 9.0, Derivative value: 3.0
Function value: 6.0, Derivative value: 3.5


In [20]:
# Division
print(x / 3.)
print(1 / x)
print(x / y)

Function value: 1.0, Derivative value: 0.3333333333333333
Function value: 0.3333333333333333, Derivative value: -0.1111111111111111
Function value: 1.5, Derivative value: 0.125


In [22]:
# Power
print(x ** 2)
print(y ** 3)

Function value: 9.0, Derivative value: 6.0
Function value: 8.0, Derivative value: 6.0


### Trigonometric Functions, Exponentials, and Logarithms

Other basic elementary operations, such as trigonometric functions, exponentials, and natural logarithms are also supported. Example uses of each of these are shown in the following cells. 

In [28]:
# Sine
print(AD.sin(x))
print(AD.sin(x*y))

Function value: 0.1411200080598672, Derivative value: -0.9899924966004454
Function value: -0.27941549819892586, Derivative value: 3.360596003276281


In [29]:
# Cosine
print(AD.cos(x))
print(AD.cos(x-y))

Function value: -0.9899924966004454, Derivative value: -0.1411200080598672
Function value: 0.5403023058681398, Derivative value: -0.42073549240394825


In [30]:
# Tangent
print(AD.tan(x))
print(AD.tan(x/y))

Function value: -0.1425465430742778, Derivative value: 1.020319516942427
Function value: 14.10141994717172, Derivative value: 24.98125556581156


In [34]:
# Exponential
print(AD.exp(x))
print(AD.exp(y)**3)

Function value: 20.085536923187668, Derivative value: 20.085536923187668
Function value: 403.42879349273517, Derivative value: 605.1431902391028


In [36]:
# Logarithm
print(AD.log(x))

Function value: 1.0986122886681098, Derivative value: 0.3333333333333333


### Customized Functions

Customized functions can be defined using these elementary operations. For example, suppose we wish to work with the function
\begin{equation}
    f(x) = \exp\left(\sin\left(x^2\right)\right) - x^4 + \log(x). 
\end{equation}
We may define $f(x)$ using the standard Python conventions. However, instead of using the `exp`, `sin`, and `log` functions from NumPy, we must use the `exp`, `sin`, and `log` functions defined in the `AD` module because only the latter accept instances of the `AutoDiff` class as inputs. 

In [37]:
def f(x):
    return AD.exp(AD.sin(x**2)) - x**4 + AD.log(x)

f(x)

Function value: -78.39137437130643, Derivative value: -115.9215797663472
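
As a cross-check that does not depend on the `AD` module, the derivative can also be worked out by hand with the chain rule, $f'(x) = 2x\cos(x^2)\exp(\sin(x^2)) - 4x^3 + 1/x$, and evaluated with the standard library. The name `f_prime_by_hand` is introduced here purely for illustration.

```python
import math

def f_prime_by_hand(x):
    """Hand-derived derivative of exp(sin(x^2)) - x^4 + log(x)."""
    return (2.0 * x * math.cos(x ** 2) * math.exp(math.sin(x ** 2))
            - 4.0 * x ** 3
            + 1.0 / x)

print(f_prime_by_hand(3.0))
```

The printed value should agree with the derivative reported above (about -115.92) up to floating-point rounding.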

## 3.3 Application: Newton's Method

Given a function, $f(x)$, Newton's method allows us to find solutions to $f(x)=0$ iteratively. Starting with an initial guess, $x_0$, we evaluate the function and its derivative at $x_0$. If $|f(x_0)| < \epsilon$, where $\epsilon$ is some tolerance value much less than one, then the iteration stops. Otherwise, the value of the next guess, $x_1$, is obtained as follows.
\begin{equation}
    x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}
\end{equation}
Following the same scheme, the $n$-th iteration can be written as 
\begin{equation}
    x_n = x_{n-1} - \frac{f(x_{n-1})}{f'(x_{n-1})},
\end{equation}
where $x_{n-1}$ is the guess at the previous iteration.

Since Newton's method makes use of the derivative of the function of interest, we may use automatic differentiation to implement Newton's method! Traditionally, we might consider using the finite difference method to approximate $f'(x)$ or define it explicitly in our routine. There are drawbacks to each of these approaches. The finite difference method is easy to implement, but its accuracy is strongly dependent on choosing the right step size. If $f(x)$ is a simple function, then explicitly defining $f'(x)$ is not a problem. However, this is not always possible. With the `AD` module, there is no need to approximate or explicitly define $f'(x)$. As we operate on our initial guess, $x_0$, the value of the derivative is automatically calculated as well, making it easy to implement Newton's method.

In [6]:
def newton(f, x0, tol=1e-16, max_iter=100):
    """Solves f(x) = 0 using Newton's method.
    
    Args:
    =========
    f (function): Function of interest
    x0 (AutoDiff): Initial guess
    tol (float): Tolerance value
    max_iter (int): Maximum number of iterations
    
    Returns:
    =========
    xn.val (float): Solution to f(x) = 0 if one is found
                    None if the maximum number of iterations is reached 
                    without satisfying the stopping criterion
    
    Raises:
    =========
    ValueError: If the derivative is zero at the current guess
    """
    
    # Initial guess
    xn = x0
    
    for n in range(max_iter):
        
        # Calculate f(xn) and f'(xn) using the AutoDiff class
        fn = f(xn)
        
        # Stop iterating if |f(xn)| is less than the tolerance value and return 
        # the solution, xn
        if abs(fn.val) < tol:
            print(f"Found a solution after {n} iterations.")
            return xn.val
        
        # Check if the derivative is zero
        if fn.der == 0:
            raise ValueError("Encountered zero derivative. No solution.")
            
        # Update guess
        xn = xn - fn.val/fn.der
        
    # Stop iterating if no solution is found within the allowed number of 
    # iterations
    print("Exceeded maximum number of iterations.")
    return None

In [7]:
# Function of interest
def f(x):
    return x**2-x-1

# Initial guess   
x0 = AD.AutoDiff(1.0)

print(f"Solution: {newton(f, x0)}")

Found a solution after 6 iterations.
Solution: 1.618033988749895
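
For this particular function the result can be checked analytically: the positive root of $x^2 - x - 1 = 0$ is the golden ratio, $(1+\sqrt{5})/2$. The short standalone check below uses only the standard library and does not import `AD`.

```python
import math

golden_ratio = (1.0 + math.sqrt(5.0)) / 2.0
residual = golden_ratio ** 2 - golden_ratio - 1.0

print(golden_ratio)   # compare with the Newton solution above
print(abs(residual))  # expected to be at the level of floating-point round-off
```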


# 4 Software Organization

## 4.1 Directory Structure
The package's directory will be structured as follows:
``` py
cs207-FinalProject
    __init__.py                                         #Initialization
    /autodiff                                           #Back-end source code
        /config                                         #Configuration for the project
        AD.py
    /gui                                                #Front-end source code
        /dist                                           #Static css, js etc.
        /template                                       #Web html files
        /img                                            #Images used for front-end
    /utils                                              #Preprocessing scripts
    /test                                               #Test cases
        test_vector.py
        test_forward.py
        test_reverse.py
    /docs                                               #Documentation and records
        milestone1.pdf
        milestone2.ipynb
    /demo
    requirements.txt                                    #Packages on which the program depends
    README.md                                           #Introduction for the project
```
##  4.2 Basic Modules and functionality

So far, the `AD` module provides a simple forward mode for automatic differentiation.

- `AD`: This module contains our custom library for automatic differentiation. 
    - It includes an `AutoDiff` class that stores values and derivatives. In the class, we overload special methods such as `__repr__`, `__neg__`, `__add__`, `__radd__`, `__sub__`, `__rsub__`, `__mul__`, `__rmul__`, `__truediv__`, `__rtruediv__`, `__pow__`.
    - In addition, we define module-level elementary functions for sine, cosine, tangent, exponential, and logarithm (`AD.sin`, `AD.cos`, `AD.tan`, `AD.exp`, `AD.log`); powers are available through the `**` operator. The user can therefore call our math functions much as they would call NumPy's.
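
The way operator overloading carries derivatives along can be sketched with a small stand-in class. This is a minimal illustration under stated assumptions, not the package's `AutoDiff` implementation: the class name `Dual` is hypothetical, and only addition and multiplication are shown.

```python
class Dual:
    """Illustrative stand-in for an AutoDiff-like object (not the package's class)."""

    def __init__(self, val, der=1.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        # Sum rule; plain numbers are treated as constants with zero derivative.
        if isinstance(other, Dual):
            return Dual(self.val + other.val, self.der + other.der)
        return Dual(self.val + other, self.der)

    __radd__ = __add__  # addition commutes, so 3 + x reuses x + 3

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'.
        if isinstance(other, Dual):
            return Dual(self.val * other.val,
                        self.der * other.val + self.val * other.der)
        return Dual(self.val * other, self.der * other)

    __rmul__ = __mul__  # multiplication commutes, so 3 * x reuses x * 3

    def __repr__(self):
        return f"Function value: {self.val}, Derivative value: {self.der}"


x = Dual(3.0)
y = Dual(2.0, 0.5)
print(3.0 * x)  # Function value: 9.0, Derivative value: 3.0
print(x * y)    # Function value: 6.0, Derivative value: 3.5
```

The reflected methods (`__radd__`, `__rmul__`) are what allow a plain number to appear on the left-hand side of an expression.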

We plan to include reverse mode and additional application modules in later work:

- `optimize`: This module is designed to perform optimization. Users can define a custom objective function to optimize. For example, given a data set in which the first $n-1$ columns are the features and the final column holds the labels, the user could specify the objective as "mse", and the module would then search for a local minimum of the mean squared error. The module will also provide static and animated plots for visualization.

- `rootfinding`: This module is designed to find roots of a given function. It includes Newton’s method. It also allows the user to visualize the results as static or animated plots.

- `utils`: This module is designed for parsing the input, preprocessing, and starting the main program.
    
## 4.3 Test Suite
Writing code is only one part of software development; building and testing are equally important. 
- We use Travis CI and CodeCov to make the development process more reliable and professional. 

     `Travis CI` is used as a distributed CI (Continuous Integration) tool to build the project and run the tests automatically.
     
     `CodeCov` is used to analyze test results (e.g., measuring test code coverage) and to visualize them.

We have already set up these integrations, and the badges in the README.md show a passing build and 100% coverage.
- Users can run the test suite by executing the following command in the root folder:
```shell
python -m pytest ./test/test_forward.py
```
- All test files will be placed in the test folder.

    `test_forward`: Tests for scalar functions, ensuring that the AD module properly calculates the values of scalar functions and their derivatives with respect to scalar inputs.
    
    We plan to add more test files in future work:

    `test_rootfinding`: This is a test suite for rootfinding.
    
    `test_optimize`: This is a test suite for optimization.



## 4.4 Software Package and Distribution
### Package distribution
We will package our software for release on `PyPI` (Python Package Index). We will write and run `setup.py` to package the software and upload it to the distribution server, so that people in the community can easily install our package with `pip install`.
### Version Control
We will follow the version numbering standard for Python packages in Python Enhancement Proposal (PEP) 440, which superseded PEP 386. With a consistent versioning scheme, we can tell the user what changes we made and set clear boundaries for where those changes occurred.
### Framework
- For web development, we would use `Flask`, a micro web framework that is suitable for a small team building a small but feature-rich website and that makes it easy to add customized functions.
- For the GUI (Graphical User Interface), we may choose `Vue.js`, a JavaScript framework for building user interfaces and single-page applications, because it offers many APIs (Application Programming Interfaces) for integrating with existing projects and is easy to get started with. It also supports code reuse better than libraries like `jQuery`.

## 4.5 How to install the package

At this point, our package is not yet on `PyPI` (we will distribute it later). Users can download and install our package manually as described in [3.1 Installation](#jump).



# 5 Implementation

## 5.1 Forward Mode

### Core Data Structures
- `AutoDiff`: For the scalar implementation of forward mode automatic differentiation, `AutoDiff` objects are the core data structures. When an instance of the `AutoDiff` class is defined, the user is able to easily store and update function and derivative values. These function and derivative values are attributes of the `AutoDiff` class.

### Core Classes
- `AutoDiff`: The core class for the scalar implementation of forward mode automatic differentiation is the `AutoDiff` class. Defining an instance of the class allows the user to store and update function and derivative values. 

### Important Attributes
**Note:** The following two attributes are considered private. The user should not access these directly.
- `_val`: Stores the function value
- `_der`: Stores the derivative value; defaults to `1.0` if the user does not pass in a second argument to `AutoDiff` when defining a new instance

There exist getter and setter methods to access current values and set new values for `_val` and `_der`.
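
One common way to pair private attributes with public accessors in Python is the `property` decorator. The sketch below assumes that pattern; whether the package's getters and setters are written this way is not stated here, and the class name `ValueHolder` is hypothetical.

```python
class ValueHolder:
    """Hypothetical sketch of private storage with public accessors."""

    def __init__(self, val, der=1.0):
        self._val = val
        self._der = der  # defaults to 1.0 when no second argument is given

    @property
    def val(self):
        return self._val

    @val.setter
    def val(self, new_val):
        self._val = new_val

    @property
    def der(self):
        return self._der

    @der.setter
    def der(self, new_der):
        self._der = new_der


x = ValueHolder(3.0)
x.der = 2.0          # goes through the setter
print(x.val, x.der)  # 3.0 2.0
```

With this pattern, user code reads `x.val` and `x.der` without touching `_val` and `_der` directly.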

### External Dependencies
- NumPy is used for mathematical calculations. 

### Elementary Functions
The following elementary functions are supported: sine, cosine, tangent, exponential, and logarithm. Recall from elementary calculus that the derivatives of these functions are the following.
\begin{align}
    \frac{d}{dx} \sin x &= \cos x\\
    \frac{d}{dx} \cos x &= -\sin x\\
    \frac{d}{dx} \tan x &= \sec^2 x = \frac{1}{\cos^2 x}, \quad x \neq \frac{\pi}{2} + k\pi,\ k \in \mathbb{Z}\\
    \frac{d}{dx} \exp x &= \exp x\\
    \frac{d}{dx} \ln x &= \frac{1}{x}, \quad x > 0
\end{align}
Note that $\tan x$ and $\ln x$ will raise a `ValueError` if the user passes in a function value that is outside the valid domain.

The methods that execute these elementary functions are defined in the `AD` module and work as follows. Each method takes in a single argument that is an instance of the `AutoDiff` class and returns a new instance of the `AutoDiff` class with the appropriate function and derivative values. To avoid rounding error, the `check_tol` method in the `AD` module compares the calculated values with their rounded counterparts. For example, suppose we wish to calculate $\tan x$ and its derivative at $x=\pi/4$. The `check_tol` method ensures that `AD.tan(AD.AutoDiff(np.pi/4))` returns `Function value: 1.0, Derivative value: 2.0` rather than `Function value: 1.0, Derivative value: 1.999999...`.
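
The domain checks and the rounding step described above can be sketched on plain (value, derivative) pairs. Everything in this block is an assumption made for illustration: the helper names `snap`, `tan_pair`, and `log_pair` are hypothetical, and the rule of snapping to the nearest integer within a tolerance is only a guess at the idea behind `check_tol`, not its actual implementation.

```python
import math

def snap(value, tol=1e-12):
    """Return the nearest integer (as a float) if value lies within tol of it."""
    nearest = round(value)
    return float(nearest) if abs(value - nearest) < tol else value

def tan_pair(val, der=1.0):
    """Value and derivative of tan at val, given the incoming derivative der."""
    c = math.cos(val)
    if abs(c) < 1e-12:
        raise ValueError("tan is undefined at odd multiples of pi/2")
    return snap(math.tan(val)), snap(der / c ** 2)

def log_pair(val, der=1.0):
    """Value and derivative of the natural logarithm at val."""
    if val <= 0:
        raise ValueError("log is only defined for positive inputs")
    return snap(math.log(val)), snap(der / val)

print(tan_pair(math.pi / 4))  # (1.0, 2.0)
```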

## 5.2 Planned Work

### Not implemented - Reverse mode

For the reverse mode, we need to build the computational graph and determine the correct order of computation to obtain the right gradient.

There are two strategies for building the computational graph: static and dynamic. The former computes values and gradients after the graph is complete, while the latter performs the computation as the graph is built. Although a dynamic graph is easier to debug and more user-friendly, a static graph makes reverse mode easier to understand and is more suitable for this course. We therefore choose static-graph-based reverse mode as our main framework.

A computational graph consists of multiple nodes. Each node has two main attributes, `self.inputs` and `self.op`, which indicate the inputs of the current node and the operation applied to those inputs. Note that each node is created by the operation that produces it.

Several operations are implemented, such as `add`, `add_by_const`, `mul`, `mul_by_const`, `tan`, `sin` and so on. To better support higher-order gradient computation, the gradients of leaf and intermediate variables are also represented by nodes. As a result, our computational graph has not only nodes representing the values of variables during forward propagation, but also nodes representing the gradients of variables during back propagation. If users want to compute a higher-order gradient, they can add new nodes representing the gradients of the lower-order gradient nodes. With this mechanism, we can compute gradients of any order.

The order in which gradient nodes are created is also important for obtaining correct results efficiently. Given a list of nodes, we use a post-order depth-first search (DFS) to obtain a topologically sorted list of nodes ending in our target nodes. The reverse of this list is the correct order in which to add gradient nodes.
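
The ordering step can be sketched as follows. This is a simplified illustration under stated assumptions rather than the planned implementation: the `Node` class, the string operation names, and the helper functions are hypothetical, and gradients are accumulated as plain numbers instead of being added to the graph as new nodes, so higher-order gradients are not covered.

```python
import math

class Node:
    """Hypothetical graph node: `op` names the operation, `inputs` its operands."""
    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = list(inputs)
        self.value = value   # filled in during the forward pass
        self.grad = 0.0      # filled in during the backward pass

def topo_sort(target):
    """Post-order DFS: every node appears after all of its inputs."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.inputs:
            visit(parent)
        order.append(node)
    visit(target)
    return order

def forward(order):
    for n in order:
        if n.op == "add":
            n.value = n.inputs[0].value + n.inputs[1].value
        elif n.op == "mul":
            n.value = n.inputs[0].value * n.inputs[1].value
        elif n.op == "sin":
            n.value = math.sin(n.inputs[0].value)
        # "var" nodes keep the value they were created with

def backward(order):
    for n in order:
        n.grad = 0.0
    order[-1].grad = 1.0           # seed: d(target)/d(target) = 1
    for n in reversed(order):      # reverse topological order
        if n.op == "add":
            n.inputs[0].grad += n.grad
            n.inputs[1].grad += n.grad
        elif n.op == "mul":
            a, b = n.inputs
            a.grad += n.grad * b.value
            b.grad += n.grad * a.value
        elif n.op == "sin":
            n.inputs[0].grad += n.grad * math.cos(n.inputs[0].value)

# out(x, y) = x * y + sin(x) at (x, y) = (3, 2)
x = Node("var", value=3.0)
y = Node("var", value=2.0)
out = Node("add", [Node("mul", [x, y]), Node("sin", [x])])

order = topo_sort(out)
forward(order)
backward(order)
print(out.value, x.grad, y.grad)
```

Because `x` feeds two operations, its gradient is the sum of two contributions; processing nodes in reverse topological order guarantees that a node's gradient is complete before it is passed on to its inputs.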

### Plan on implementing - Vectorization, Optimization, Root Finding, etc.
More details could be found at [6 Future Features](#6).


# 6 <span id='6'>Future Features </span>

These are some proposals for future features in the AD package. Some of these will be implemented in the final version!
### Vectorization
In the current release, our automatic differentiation package can only handle scalar to scalar functions. In the future, we will extend this work to include the more general category of functions with vector inputs and outputs. This extension will largely preserve the structure of the code but will require rewriting the main functions so that they can deal with vector inputs.
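
One possible shape for this extension is to let each object carry a list of partial derivatives, one per input, instead of a single derivative. The sketch below is a hypothetical illustration of that idea using only the standard library; the class name `GradDual` and the free function `sin` are not part of the current package.

```python
import math

class GradDual:
    """Hypothetical value-plus-gradient pair for functions of several inputs."""

    def __init__(self, val, grad):
        self.val = val
        self.grad = list(grad)   # one partial derivative per input

    def __add__(self, other):
        return GradDual(self.val + other.val,
                        [a + b for a, b in zip(self.grad, other.grad)])

    def __mul__(self, other):
        # Product rule applied to each partial derivative.
        return GradDual(self.val * other.val,
                        [a * other.val + self.val * b
                         for a, b in zip(self.grad, other.grad)])

def sin(u):
    return GradDual(math.sin(u.val), [math.cos(u.val) * g for g in u.grad])

# Seed each input with a unit vector so the full gradient comes out of one pass.
x = GradDual(3.0, [1.0, 0.0])
y = GradDual(2.0, [0.0, 1.0])
f = x * y + sin(x)
print(f.val, f.grad)
```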

### Intermediate Results
As written, the code only outputs the final result of any multistage calculation. In the final version, one possible feature would be to store and/or output intermediate results to aid debugging and to assess numerical stability. To implement this, we would likely need to change many of the basic functions or set up a higher-level function or decorator that saves the inputs of a function.
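
The decorator option could look roughly like the sketch below. It is a hypothetical illustration on plain floats: the names `trace_log` and `traced`, and the two example functions, are assumptions made for this example.

```python
import functools
import math

trace_log = []  # hypothetical store for intermediate results

def traced(func):
    """Record the inputs and output of every call to func."""
    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        trace_log.append((func.__name__, args, result))
        return result
    return wrapper

@traced
def square(x):
    return x * x

@traced
def sine(x):
    return math.sin(x)

value = sine(square(3.0))
for name, args, result in trace_log:
    print(name, args, result)
```

A decorator of this kind leaves the elementary functions themselves unchanged, at the cost of one extra function call and one stored tuple per operation.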

### Optimization
We could write an application to solve (potentially constrained) continuous optimization problems. In contrast with the above two extensions, this would most probably result in a largely separate codebase that builds on and calls the top-level code from our AD library.

### Basic Neural Networks
In a similar vein to optimization, we could implement basic neural networks. This approach may need to be coupled with reverse mode as both high dimensional optimization and neural networks are much faster when implemented using reverse mode.

### Root Finding
We could write an application to find the roots of continuous functions. This would most probably result in a largely separate codebase that builds on and calls the top-level code from our AD library.

### Reverse Mode
Another option is to implement the reverse mode. Although it would not fundamentally increase the capability of the code, a reverse mode would allow the user to efficiently differentiate functions with a large number of inputs. For applications such as neural networks, root finding, and optimization, adding reverse mode could yield a large speed-up.

### Non Differentiable Functions
One final potential option is support for non-differentiable functions and control flow, especially loops and if statements. To implement this, we would likely add to the library of elementary functions and return a null value at points of non-differentiability. This approach would likely be somewhat complicated, as simple operator overloading would not suffice; we would need to implement some sort of parser to translate the code into piecewise functions.