# Milestone 1
#### CS207 Final Project
#### Group 1: _Team Gillet_
#### Lucie Gillet, Sakari Jukarainen, Jovin Leong, Huahua Zheng


---

# Introduction

<br>

Derivatives play a critical role in the natural and applied sciences, with optimization being one of the core applications involving derivatives. Traditionally, derivatives have been approached either symbolically or through numerical analysis (_e.g._ finite differences). Although numerical approximations of derivatives are simple to compute, they are prone to stability issues and round-off errors. Meanwhile, although symbolic differentiation enables the evaluation of derivatives to machine precision, the process is limited by its computational intensity. Recently, the size and complexity of the functions involving derivatives have grown; these demands necessitate an alternative to symbolic and numerical methods that is able to compute derivatives with higher accuracy at a lower cost. Automatic Differentiation (AD) addresses these issues by breaking a function down into a sequence of elementary operations whose derivatives are known and combining them through the chain rule to compute accurate derivatives.

<br>

Our team aims to develop a Python package, ```superautodiff```, that implements forward-mode AD. We also extend the package to solve simple optimization problems through gradient descent. This document will review some of the mathematical foundations behind our approach and provide relevant information on documentation and usage of ```superautodiff```.

---

# Background
<br>

## Mathematical Foundations

AD relies heavily on the chain rule and several other key mathematical concepts in order to compute derivatives. We now consider some background mathematical foundations that form the theoretical basis of our approach to AD.

<br>

**Differential calculus**

Differential calculus is concerned with the evaluation and study of gradients and rates of change. Formally, the derivative of a function $f$ evaluated at $a$ is defined as:

$$f'(a)=\lim _{h\to 0}{\frac {f(a+h)-f(a)}{h}}$$
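
To illustrate why a finite-difference approximation of this limit is sensitive to the step size, here is a minimal sketch that approximates the derivative of $\sin$ at $a = 1$ for several values of $h$. The helper name `forward_difference` and the chosen step sizes are illustrative assumptions and not part of ```superautodiff```.

```python
import math

def forward_difference(f, a, h):
    """Approximate f'(a) with the difference quotient (f(a + h) - f(a)) / h."""
    return (f(a + h) - f(a)) / h

a = 1.0
exact = math.cos(a)  # d/dx sin(x) = cos(x)

# A large h suffers from truncation error; a very small h suffers from round-off error.
errors = {}
for h in (1e-1, 1e-8, 1e-15):
    errors[h] = abs(forward_difference(math.sin, a, h) - exact)
    print(f"h = {h:.0e}: absolute error = {errors[h]:.2e}")
```

The error is smallest for the intermediate step size, which is the trade-off that AD avoids by never forming a difference quotient.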


**Elementary functions and their derivatives**

  Here are some examples of elementary functions used by AD and their corresponding derivatives:

  <br>

  **<center> Table 1. Elementary functions and their derivatives  </center>**
  <br>

| $$f(x)$$     | $$f'(x)$$    | 
| ------------- |:-------------:| 
| $$c$$        |         $0$   | 
| $$x$$        |         $1$   | 
| $$x^n$$      | $$nx^{n-1}$$ | 
| $$\frac{1}{x}$$ | $$-\frac{1}{x^2}$$ |
| $$e^x$$      | $$e^x$$ | 
| $$\log_a x$$      | $$\frac{1}{x \ln a}$$ | 
| $$\ln x$$      | $$\frac{1}{x}$$ | 
| $$\sin(x)$$      | $$\cos(x)$$ | 
| $$\cos(x)$$      | $$-\sin(x)$$ | 
| $$\tan(x)$$      | $$\frac{1}{\cos^2x}$$ |
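
As a quick sanity check on Table 1, the sketch below pairs a few of the tabulated functions with their derivatives and compares each against a central-difference estimate. The dictionary `RULES`, the helper `central_difference`, and the test point are assumptions made for illustration only.

```python
import numpy as np

# (function, derivative) pairs for several rows of Table 1
RULES = {
    "x^3":   (lambda x: x**3, lambda x: 3 * x**2),
    "1/x":   (lambda x: 1 / x, lambda x: -1 / x**2),
    "e^x":   (np.exp, np.exp),
    "ln x":  (np.log, lambda x: 1 / x),
    "sin x": (np.sin, np.cos),
    "cos x": (np.cos, lambda x: -np.sin(x)),
    "tan x": (np.tan, lambda x: 1 / np.cos(x)**2),
}

def central_difference(f, x, h=1e-6):
    """Numerical estimate of f'(x), used here only as a cross-check."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.7
max_gap = max(abs(df(x0) - central_difference(f, x0)) for f, df in RULES.values())
print(f"largest gap between table derivative and numerical estimate: {max_gap:.1e}")
```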


<br>

**Chain rule for composite functions**

  The chain rule is a formula used to compute the derivatives of composite functions. For instance, if we have a variable $z$ depending on $y$, which itself depends on $x$, the chain rule expresses the derivative of $z$ with respect to $x$ as:

<br>

$${\frac  {dz}{dx}}={\frac  {dz}{dy}}\cdot {\frac  {dy}{dx}}$$

<br>

**<center> The chain rule </center>**
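
As a brief worked example (the functions here are chosen purely for illustration), take $z = \sin(y)$ with $y = x^2$. Applying the chain rule gives:

$$\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = \cos(y)\cdot 2x = 2x\cos(x^2)$$

so that at $x = 1$ the derivative is $2\cos(1) \approx 1.0806$.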

<br>

**Forward and reverse mode**

  For functions where we have intermediate components in our derivatives, we can keep track of the derivatives of each component using either of the following two modes: the forward mode and the reverse mode.
  -        The forward mode starts with the input and computes the derivative with respect to the input using the chain rule at each subcomponent. The process involves storing the intermediate values of the derivatives of variables with respect to the input in order to evaluate the overall derivative: <br> <br> 
  $$\frac{dw_i}{dx} = \frac{dw_i}{dw_{i-1}}\frac{dw_{i-1}}{dx}$$<br>
   **<center> Forward mode </center>**  
   
<br>
  
  -        The reverse mode, meanwhile, involves a forward pass that evaluates the intermediate values of the function, followed by a backward pass that accumulates the derivatives of the output with respect to each intermediate variable: <br> <br> $$\frac{dy}{dw_i} = \frac{dy}{dw_{i+1}}\frac{dw_{i+1}}{dw_i}$$ <br>    **<center> Reverse mode </center>**
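
The two recurrences can be compared on a small chain of operations. The sketch below uses an illustrative chain $w_1 = x^2$, $w_2 = \sin(w_1)$, $y = e^{w_2}$ (our choice for this example, not a function from the package) and shows that both modes multiply the same local derivatives, only in opposite orders.

```python
import math

x = 0.5

# Forward pass: evaluate the intermediates w1 = x^2, w2 = sin(w1), y = exp(w2)
w1 = x**2
w2 = math.sin(w1)
y = math.exp(w2)

# Local derivative of each step with respect to its own input
dw1_dx = 2 * x
dw2_dw1 = math.cos(w1)
dy_dw2 = math.exp(w2)

# Forward mode: carry dw_i/dx from the input towards the output
fwd = 1.0            # dx/dx
fwd = dw1_dx * fwd   # dw1/dx
fwd = dw2_dw1 * fwd  # dw2/dx
fwd = dy_dw2 * fwd   # dy/dx

# Reverse mode: carry dy/dw_i from the output back towards the input
rev = 1.0            # dy/dy
rev = rev * dy_dw2   # dy/dw2
rev = rev * dw2_dw1  # dy/dw1
rev = rev * dw1_dx   # dy/dx

print(fwd, rev)
```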

<br>

**Computational graph representation**

  The elementary operations performed during forward accumulation can be represented visually through a computational graph. For instance, the computational graph of the function $f(x)=x-\exp\{-2\sin^2(4x)\}$ [1] is illustrated in Figure 1; Figure 2 presents a more complex computational graph.

<br>

  The graph breaks the function down into a sequence of elementary operations. It operates similarly to a flowchart and illustrates how each elementary operation transforms the inputs, step by step, until the full function is recovered.


<br>

<img src="fig/graph_1.png" style="height:300px;">

**<center> Figure 1. A computational graph for $f(x)=x-\exp\{-2\sin^2(4x)\}$ [1]</center>**
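
The same decomposition can be written out as an evaluation trace in code. The following is a minimal sketch of one possible decomposition of $f(x)=x-\exp\{-2\sin^2(4x)\}$; the node numbering and the helper name `f_with_trace` are our own assumptions and may not match the labels in Figure 1 exactly.

```python
import math

def f_with_trace(x):
    """Forward-mode trace of f(x) = x - exp(-2 sin^2(4x)); returns (value, derivative)."""
    x1, d1 = x, 1.0                            # input, seeded with dx/dx = 1
    x2, d2 = 4 * x1, 4 * d1                    # 4x
    x3, d3 = math.sin(x2), math.cos(x2) * d2   # sin(4x)
    x4, d4 = x3**2, 2 * x3 * d3                # sin^2(4x)
    x5, d5 = -2 * x4, -2 * d4                  # -2 sin^2(4x)
    x6, d6 = math.exp(x5), math.exp(x5) * d5   # exp(-2 sin^2(4x))
    x7, d7 = x1 - x6, d1 - d6                  # x - exp(-2 sin^2(4x))
    return x7, d7

value, derivative = f_with_trace(0.25)
print(value, derivative)
```

Each line carries a value together with its derivative with respect to $x$, so the derivative of the output is available as soon as the last node has been evaluated.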

<br>

<img src="fig/graph_2.png" style="height:450px;">

**<center> Figure 2. A more complex computational graph</center>**

<br>

[1] D. Sondak, lecture 10, CS207 Fall '19

<br>

## What our package is doing
Essentially, our package uses the aforementioned mathematical concepts to implement forward-mode AD. A primary function in our package, ```autodiff()```, takes in mathematical functions and the points at which to evaluate them, and obtains an evaluation trace (similar to the graph structure above). This trace is then used to differentiate the function, applying the chain rule to compute the current value and the derivative value at each component of the trace.

Under the hood, we can think of the function's calculations as equivalent to populating Table 2. This is the core of forward-mode AD; the functionality and operation of our package are discussed in greater detail in the subsequent section.

<br>

**<center>Table 2. An evaluation table for a forward-mode neural network</center>**

| Trace | Elementary Function | Current Value | Elementary Function Derivative | $\nabla_{x}$ Value  | $\nabla_{y}$ Value  | 
| :---: | :-----------------: | :-----------: | :----------------------------: | :-----------------: | :-----------------: | 
| $x_{1}$ | $x$ | $x$ | $\dot{ x}_{1}$ | $1$ | $0$ |
| $x_{2}$ | $y$ | $y$ | $\dot{x}_{2}$ | $0$ | $1$ |
| $x_{3}$ | $w_{21}x_1$ | $w_{21}x$ | $w_{21}\dot{x}_{1}$ | $w_{21}$ | $0$ |
| $x_{4}$ | $w_{12}x_2$ | $w_{12}y$ | $w_{12}\dot{x}_{2}$ | $0$ | $w_{12}$ |
| $x_{5}$ | $w_{11}x_1$ | $w_{11}x$ | $w_{11}\dot{x}_{1}$ | $w_{11}$ | $0$ |
| $x_{6}$ | $w_{22}x_2$ | $w_{22}y$ | $w_{22}\dot{x}_{2}$ | $0$ | $w_{22}$ |
| $x_{7}$ | $x_4 + x_5$ | $w_{11}x + w_{12}y$ | $$\dot{x}_{4} + \dot{x}_{5}$$ | $w_{11}$ | $w_{12}$ |
| $x_{8}$ | $x_3 + x_6$ | $w_{21}x + w_{22}y$ | $$\dot{x}_{3} + \dot{x}_{6}$$ | $w_{21}$ | $w_{22}$ |
| $x_{9}$ | $z(x_7)$ | $z(w_{11}x + w_{12}y)$ | $$\dot{x}_{7}z'(x_7)$$ | $w_{11}z'(w_{11}x + w_{12}y)$ | $w_{12}z'(w_{11}x + w_{12}y)$ |
| $x_{10}$ | $z(x_8)$ | $z(w_{21}x + w_{22}y)$ | $$\dot{x}_{8}z'(x_8)$$ | $w_{21}z'(w_{21}x + w_{22}y)$ | $w_{22}z'(w_{21}x + w_{22}y)$ |
| $x_{11}$ | $w_{out,1}x_9$ | $$w_{out,1}z(w_{11}x + w_{12}y) $$  | $$w_{out,1}\dot{x}_9$$ | $w_{out,1}w_{11}z'(w_{11}x + w_{12}y)$ | $w_{out,1}w_{12}z'(w_{11}x + w_{12}y)$ |
| $x_{12}$ | $w_{out,2}x_{10}$ | $$w_{out,2}z(w_{21}x + w_{22}y) $$ | $$w_{out,2}\dot{x}_{10}$$ | $w_{out,2}w_{21}z'(w_{21}x + w_{22}y)$ | $w_{out,2}w_{22}z'(w_{21}x + w_{22}y)$ |
| $x_{13}$ | $x_{11} + x_{12}$ | $$w_{out,1}z(w_{11}x + w_{12}y) + w_{out,2}z(w_{21}x + w_{22}y) $$ | $$\dot{x}_{11} + \dot{x}_{12}$$ | $$w_{out,1}w_{11}z'(w_{11}x + w_{12}y) + w_{out,2}w_{21}z'(w_{21}x + w_{22}y)$$ | $$w_{out,1}w_{12}z'(w_{11}x + w_{12}y) + w_{out,2}w_{22}z'(w_{21}x + w_{22}y)$$ |
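
The rows of Table 2 can be reproduced numerically by carrying a value and a gradient through each trace step. The sketch below does this with plain tuples; the weights, the inputs, and the choice of $\tanh$ for the activation $z$ are assumed values picked only to make the example concrete.

```python
import numpy as np

# Assumed example values; the weights, inputs, and tanh activation are illustrative only
w11, w12, w21, w22 = 0.5, -1.0, 2.0, 0.25
w_out1, w_out2 = 1.5, -0.5
x, y = 0.3, 0.8
z = np.tanh
dz = lambda t: 1.0 - np.tanh(t) ** 2  # derivative of tanh

# Each trace entry is (value, gradient), with gradient = [d/dx, d/dy]
x1 = (x, np.array([1.0, 0.0]))
x2 = (y, np.array([0.0, 1.0]))
x3 = (w21 * x1[0], w21 * x1[1])
x4 = (w12 * x2[0], w12 * x2[1])
x5 = (w11 * x1[0], w11 * x1[1])
x6 = (w22 * x2[0], w22 * x2[1])
x7 = (x4[0] + x5[0], x4[1] + x5[1])
x8 = (x3[0] + x6[0], x3[1] + x6[1])
x9 = (z(x7[0]), dz(x7[0]) * x7[1])
x10 = (z(x8[0]), dz(x8[0]) * x8[1])
x11 = (w_out1 * x9[0], w_out1 * x9[1])
x12 = (w_out2 * x10[0], w_out2 * x10[1])
x13 = (x11[0] + x12[0], x11[1] + x12[1])

print("value:", x13[0])
print("gradient [d/dx, d/dy]:", x13[1])
```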





---

# How to use ```superautodiff```

## User interaction with the package
### Installation
Our package will be distributed through PyPI (detailed in the Package Distribution section below). Users will first install the package by running:

```pip install superautodiff -r requirements.txt```

Users who already have the required Python dependencies installed and do not wish to reinstall them can instead run:

```pip install superautodiff```

Users will then need to import the package, as described below, and call ```autodiff()``` to instantiate AD objects. Users then simply supply their functions and points in order to perform AD.

### Importing
After installing the package, users need to subsequently import it into their Python environment using the following import command:

```python
import superautodiff
```

Alternatively, we recommend importing the package under a shorter alias for concision:
```python
import superautodiff as sad
```

Dependencies such as ```NumPy``` will be imported with the package. 

## Instantiating AD objects

```superautodiff``` is a Python package and its core class is ```autodiff```. When initialized, ```autodiff()``` accepts an input $x \in \mathbb{R}$ (stored as the ```val``` attribute) and initializes the derivative (```der``` attribute) at $1$. The ```autodiff``` object then supports basic arithmetic operations (_e.g._ addition, multiplication) with integers, floats, and other ```autodiff``` objects. These operations will be implemented commutatively through dunder methods as appropriate. With an ```autodiff``` object, the user can evaluate the derivatives of a vector of functions at a specified vector of points.
<br><br>



## Use case

```python
# Import the package
import superautodiff as sad

# Instantiate AutoDiff object with a scalar point
a = 3.0
f1 = sad.autodiff(a)

print(f1.der["a"])

>>> 1
```
```python
print(f1.val)

>>> 3
```
```python
# Define f2 as 4*f1
f2 = 4*f1

# Returns d/dx 4x
print(f2.der["a"])

>>> 4
```
```python
print(f2.val)

>>> 12
```

```python
# Define f3 as (f2)^2
f3 = f2**2

# Returns d/dx (4x)^2 = 32x evaluated at x = a = 3:
print(f3.der["a"])

>>> 96
```
```python
# Returns (4x)^2 evaluated at x = a = 3
print(f3.val)

>>> 144
```

---



# Software Organization

## Directory structure



        cs207-FinalProject/
                    LICENSE
                    README.md
                    requirements.txt
                    setup.py
                    setup.cfg
                    travis.yml
                    docs/
                          milestone_1.md
                          milestone_2.md
                          milestone_1.ipynb
                          milestone_2.ipynb
                          fig/
                              graph_1.png
                              graph_2.png
                    superautodiff/
                          __init__.py
                          autodiff.py
                          functions.py
                          optimize.py
                          graddesc.py
                    tests/
                          __init__.py
                          tests_autodiff.py      
                    
                
<br>

## Modules

```superautodiff``` contains four modules corresponding to our package's four main competencies. The modules are summarily described here and explained in detail in the subsequent sections.
- ```autodiff```: This module contains the core functionality of the package: a forward-mode AD library.
- ```autodiffmulti```: Contains a user-friendly interface for simultaneously evaluating partial derivatives with respect to a vector of input variables for a vector of functions.
- ```optimize```: This module extends the base functionality of our forward mode AD library by providing functions to solve simple constrained and unconstrained optimization problems.
- ```graddesc```: This module provides functions to perform gradient descent and stochastic gradient descent.
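
To give a sense of what the ```graddesc``` module is meant to do, here is a minimal sketch of plain gradient descent. The function name `gradient_descent`, its parameters, and the hand-written gradient are assumptions for illustration; in the package, the gradient would be supplied by forward-mode AD rather than written out by hand.

```python
import numpy as np

def gradient_descent(grad, x0, learning_rate=0.1, n_steps=200):
    """Hypothetical helper: repeatedly step against the gradient from the start point x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)
    return x

# f(x, y) = (x - 1)^2 + (y + 2)^2 has its minimum at (1, -2)
def grad_f(p):
    return np.array([2 * (p[0] - 1), 2 * (p[1] + 2)])

minimum = gradient_descent(grad_f, x0=[0.0, 0.0])
print(minimum)
```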

<br>

## Testing
Testing is mainly relevant to developers looking to edit or build upon our package; general users need not read this section. Our test suite, ```testsuite.py```, is stored in our ```tests/``` folder, and our testing will be monitored through both Travis CI and CodeCov. Our GitHub repository will be fully integrated with both services, with badges on our ```README.md``` reflecting the build status on Travis CI and the code coverage on CodeCov.

```superautodiff``` also supports ```pytest```. To run our tests, users will need to have ```pytest``` installed in their environment and navigate to the repository. They should then run the following command:

```
python -m pytest
```

This will run all our tests and provide summary statistics on the outcome of said tests.

<br>

## Package Distribution
Our package is distributed using PyPI. We use _setuptools_ and _wheel_ to generate our distribution archives and we use _twine_ to upload our package to PyPI.

The reason for this choice of tools is that they are simple, easy-to-use, and reliable. Our package does not have many complicated dependencies; we, therefore, want to employ simple packaging and distribution tools to ensure that our package is easily distributed to users with minimal hassle.

As mentioned above, users will simply have to call ```pip install superautodiff``` in order to install our package. The installation instructions and troubleshooting will be available on our GitHub repository.

---

# Implementation 


The primary data structures used in ```superautodiff``` are vectors, lists, arrays, and dictionaries. Our package also relies on the external package ```NumPy```.

Our package defines a class ```autodiff``` that takes a variable ```x``` as input. An ```autodiff``` object has two important attributes: 
- ```val``` - a scalar that contains the value of the function 
- ```der``` - a dictionary that stores the derivatives. For example:

 ```{"a":1, "b":1}```

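```python
class autodiff:
    # Initialization
    def __init__(self, x, val, der):
        # Attributes
        self.val = val  # scalar value of the function
        self.der = der  # dictionary of derivatives, e.g. {"a": 1}

    # Override dunder methods, e.g.
    def __add__(self, other):
        raise NotImplementedError  # To be implemented

    def __radd__(self, other):
        raise NotImplementedError  # To be implemented
```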

```python
# Univariate functions
a = 3.0
f1 = sad.autodiff(a)

print(f1.der["a"])
>>> 1
```
```python
print(f1.val)
>>> 3
```
```python
# Multivariate functions
b = 4.0
f2 = sad.autodiff(b)
f3 = f1+f2

print(f3.der["a"])
print(f3.val)

>>> 1
>>> 7
```

Common operators such as `__add__`, `__radd__`, `__mul__`, `__rmul__`, `__pow__`, and `__rpow__` will be overloaded to process values and derivatives correctly and to return ```autodiff``` objects.
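
A minimal sketch of how such overloading might look is given below. The class name `DualSketch` and its constructor are hypothetical stand-ins rather than the package's actual ```autodiff``` class, and only addition, multiplication, and constant powers are covered.

```python
class DualSketch:
    """Hypothetical stand-in for an autodiff object: a value plus a dict of partial derivatives."""

    def __init__(self, val, der):
        self.val = val
        self.der = dict(der)

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants with no derivatives
        return other if isinstance(other, DualSketch) else DualSketch(other, {})

    def __add__(self, other):
        other = self._lift(other)
        names = set(self.der) | set(other.der)
        der = {n: self.der.get(n, 0) + other.der.get(n, 0) for n in names}
        return DualSketch(self.val + other.val, der)

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        other = self._lift(other)
        names = set(self.der) | set(other.der)
        # Product rule: (uv)' = u'v + uv'
        der = {n: self.der.get(n, 0) * other.val + self.val * other.der.get(n, 0)
               for n in names}
        return DualSketch(self.val * other.val, der)

    __rmul__ = __mul__  # multiplication is commutative

    def __pow__(self, n):
        # Power rule for a constant numeric exponent n
        der = {k: n * self.val ** (n - 1) * d for k, d in self.der.items()}
        return DualSketch(self.val ** n, der)


a = DualSketch(3.0, {"a": 1.0})
b = DualSketch(4.0, {"b": 1.0})
f = (4 * a) ** 2 + a * b  # f = 16a^2 + ab
print(f.val, f.der)
```

Keying the derivatives by variable name lets objects that depend on different variables be combined, with any missing partial derivative treated as zero.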

We implement specific trigonometric functions such as ```sin(x)```, ```cos(x)```, ```tan(x)```, ```sec(x)```, ```csc(x)```, ```cot(x)```, ```arcsin(x)```, ```arccos(x)```, ```arctan(x)```, ```arcsec(x)```, ```arccsc(x)```, ```arccot(x)```.

Additionally, we implement logarithmic and exponential functions such as ```log(x)``` and ```exp(x)```.

Elementary functions will be defined through ```NumPy``` such that they take ```autodiff``` objects as inputs and return ```autodiff``` objects with updated values and derivatives. 
Elementary functions will non-exhaustively include those we list in Table 1.


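```python
# An example
import copy
import numpy as np

def cos(x):
    """
    Takes the cosine of the original object and returns a new autodiff object
    """
    # Initialize the output as a copy so that the input object is left unchanged
    output = copy.deepcopy(x)
    output.val = np.cos(x.val)
    # For all i, apply the chain rule: d/di cos(u) = -sin(u) * du/di
    for i in x.der:
        output.der[i] = -np.sin(x.val) * x.der[i]
    return output
```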

The module ```autodiffmulti``` extends the base functionality of our package such that values and derivatives can be computed at the same time for multiple variables in multiple functions. 


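```python
### Draft code
import numpy as np

class autodiffmulti():
    def __init__(self, variables):
        # Points at which the functions are evaluated
        self.p = variables

    def fit(self, functions):
        # Functions to differentiate
        self.f = functions

    def derivative(self):
        # Jacobian: one row per function, one column per variable
        nvar = len(self.p)
        nfun = len(self.f)
        mat = np.zeros((nfun, nvar))
        for x in range(nfun):
            for y in range(nvar):
                # eval() is a placeholder for evaluating function x; to be implemented
                mat[x, y] = self.f[x].eval(self).der[self.p[y]]
        return mat

    def value(self):
        nfun = len(self.f)
        vec = [0] * nfun
        for x in range(nfun):
            vec[x] = self.f[x].eval(self).val
        return vec

# Define list of functions (here two functions)
p = [3, 2, 1]
f = ["x1*x2", "x1 + x2 + x3"]
diff = autodiffmulti(p)
diff.fit(f)

print(diff.value())
print(diff.derivative())

>>> [6,6]
>>> [[2,3,0],[1,1,1]]
```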
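
For comparison, the following self-contained sketch computes the same values and Jacobian for the two example functions. It is not the planned ```autodiffmulti``` interface: the `Dual` class and the `values_and_jacobian` helper are hypothetical, and the functions are passed as Python callables rather than strings.

```python
import numpy as np

class Dual:
    """Hypothetical minimal dual number: a value and a gradient vector."""
    def __init__(self, val, grad):
        self.val, self.grad = val, np.asarray(grad, dtype=float)

    def _lift(self, other):
        # Plain numbers are constants, so their gradient is zero
        return other if isinstance(other, Dual) else Dual(other, np.zeros_like(self.grad))

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.grad + other.grad)
    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule applied to the whole gradient vector
        return Dual(self.val * other.val, self.grad * other.val + self.val * other.grad)
    __rmul__ = __mul__

def values_and_jacobian(functions, point):
    """Evaluate each function at `point`; row i of the Jacobian is the gradient of function i."""
    n = len(point)
    # Seed variable j with the j-th unit vector so each partial derivative is tracked
    variables = [Dual(v, np.eye(n)[j]) for j, v in enumerate(point)]
    results = [f(*variables) for f in functions]
    return [r.val for r in results], np.array([r.grad for r in results])

functions = [lambda x1, x2, x3: x1 * x2,
             lambda x1, x2, x3: x1 + x2 + x3]
values, jacobian = values_and_jacobian(functions, [3, 2, 1])
print(values)
print(jacobian)
```

Seeding each input with a unit vector means a single forward pass per function yields a full row of the Jacobian.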

---

# Additional Features
