# Milestone 1
## CS207 Final Project, Group 28
#### Team Members:
Josh Bodner  
Theo Guenais  
Daiki Ina  
Junzhi Gong

# Introduction

Ever since the notion of a derivative was first defined in the days of Newton and Leibniz, differentiation has been a crucial component of quantitative analysis. Today, differentiation is applied within a wide range of scientific disciplines, from cell biology to electrical engineering and astrophysics. Differentiation is also central to the domain of optimization, where it is applied to find maximum and minimum values of functions.

Because of the widespread use and applicability of differentiation, it would be highly beneficial if scientists could efficiently calculate derivatives of functions using a Python package. Such a package could save scientists the time and energy of computing derivatives symbolically, which can be especially complicated if the function of interest has vector inputs and outputs.

Our Python package, autodiff, will address this need by allowing the user to implement both the forward and reverse modes of automatic differentiation (AD). Using AD, we are able to calculate derivatives to machine precision in a manner that is less costly than symbolic differentiation.

# Background
In this section we provide a brief overview of the mathematical concepts relevant to our implementation:

##### 1. Multivariable Functions:

A multivariable function $f(x)$ can have $m$ inputs and $n$ outputs, each collected into a vector. Through this lens, $f$ can be understood as a mapping from $\mathbb{R}^{m}$ to $\mathbb{R}^{n}$. In other words, $f: \mathbb{R}^{m} \to \mathbb{R}^{n}$.  

*Example:*  
$f(x_1, x_2) = 
  \begin{bmatrix}
    \sin(x_1)\,x_2^2
  \end{bmatrix}$

Here we have...  
$x = 
  \begin{bmatrix}
    x_1 \\
    x_2
  \end{bmatrix} \in\mathbb{R}^{2}$  
    
$f(x_1, x_2) \in\mathbb{R}^{1}$
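
As a rough illustration, the example function can be written as a plain NumPy function that takes a length-2 vector and returns a length-1 vector. The name `f` and the evaluation point are chosen here only for illustration; this is not part of the planned package API.

```python
import numpy as np


def f(x):
    """Map a length-2 input vector to a length-1 output vector."""
    x1, x2 = x
    return np.array([np.sin(x1) * x2 ** 2])


x = np.array([np.pi / 2, 3.0])  # m = 2 inputs
y = f(x)                        # n = 1 output
print(x.shape, y.shape)
```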

##### 2. Gradient:

The gradient of a scalar-valued multivariable function is a vector of its partial derivatives with respect to each input:

$\nabla f = 
  \begin{bmatrix}
    \frac{\partial f}{\partial x_1} \\
    \frac{\partial f}{\partial x_2} \\
    \frac{\partial f}{\partial x_3} \\
    \vdots
  \end{bmatrix}$
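
For the example $f(x_1, x_2) = \sin(x_1)\,x_2^2$ from the previous subsection, the gradient is $[\cos(x_1)\,x_2^2,\; 2\sin(x_1)\,x_2]$. The sketch below writes that gradient by hand and compares it against central finite differences as a sanity check. The helper names `grad_f` and `numeric_grad`, the step size, and the evaluation point are all illustrative assumptions.

```python
import numpy as np


def f(x1, x2):
    return np.sin(x1) * x2 ** 2


def grad_f(x1, x2):
    """Hand-derived gradient: [df/dx1, df/dx2]."""
    return np.array([np.cos(x1) * x2 ** 2, 2 * np.sin(x1) * x2])


def numeric_grad(x1, x2, h=1e-6):
    """Central finite differences, used only as a sanity check."""
    return np.array([
        (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h),
        (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h),
    ])


analytic = grad_f(1.0, 2.0)
approx = numeric_grad(1.0, 2.0)
print(np.allclose(analytic, approx, atol=1e-6))
```

Finite differences only approximate the derivative, which is one reason automatic differentiation is attractive: it reproduces the hand-derived values to machine precision without the manual derivation.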


##### 3. Chain Rule
*FILL IN*

##### 4. Graph Structure
*FILL IN*

##### 5. Elementary Functions
*FILL IN*

# How to Use autodiff

The user could import our package as follows:

In [None]:
import autodiff as AD

Next, we have two ideas for how the user might use our package.

We will work out which option makes more sense as development begins.

In [None]:
# --- Option 1 ---

# The user could instantiate a function by typing out the expression
# of that function. They could then call get_der to get the derivative.

function = AD.ADFunction("4 * x + sqrt(x) + x * y")
function.get_der(target='x', x=2, y=3)
# Expected output: 7.3535533

In [1]:
# --- Option 2 ---

# The user could instantiate autodiff variables and create functions by
# performing operations on those autodiff variables.
# They could then call get_der to get the derivative.

x = AD.ADVariable(1)  # argument: length of the input vector
y = AD.ADVariable(1)  # argument: length of the input vector

# AD.sqrt stands for the package's own sqrt; the exact name is not final.
function = 4 * x + AD.sqrt(x) + x * y

function.get_der(target=x, x=2, y=3)
# Expected output: 7.3535533
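
To make Option 2 more concrete, here is a minimal, self-contained sketch of how operator overloading could support it for scalar inputs. Everything in it is an assumption made for illustration: the `val` and `der` attributes, the `_lift` helper, and the choice to pick the differentiation target by seeding `der=1.0` instead of calling `get_der(target=...)`. It is not the package's actual design.

```python
import math


class ADVariable:
    """Hypothetical scalar forward-mode variable: a value plus a derivative."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        other = _lift(other)
        # Sum rule.
        return ADVariable(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule.
        return ADVariable(self.val * other.val,
                          self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def _lift(obj):
    """Treat plain numbers as constants, i.e. variables with derivative 0."""
    return obj if isinstance(obj, ADVariable) else ADVariable(obj)


def sqrt(v):
    """Square root with derivative 1 / (2 * sqrt(v)) times the inner derivative."""
    root = math.sqrt(v.val)
    return ADVariable(root, v.der / (2 * root))


# Differentiate with respect to x: seed x with 1 and y with 0.
x = ADVariable(2.0, der=1.0)
y = ADVariable(3.0, der=0.0)
function = 4 * x + sqrt(x) + x * y
print(function.der)
```

The printed derivative agrees with the 7.3535533 quoted in both options, since $\partial f / \partial x = 4 + \frac{1}{2\sqrt{x}} + y$ at $x = 2$, $y = 3$.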

# Software Organization

##### Directory Structure:

We will have our main implementation stored in the autodiff directory. This will have a subdirectory for our implementation of forward mode, which in turn will have a subdirectory containing tests.

```
cs207-FinalProject/
    docs/
        milestone1.ipynb
    autodiff/
        ForwardMode/
            __init__.py
            forwardmode.py
            Tests/
                __init__.py
                test_forwardmode.py
    README.md
    Requirements.txt
    
```
##### Modules to Include:
We will include a forwardmode module containing our basic functionality. We may end up breaking this out into separate modules for better readability, including an adfunctions module and an advariables module.

We may also end up implementing the reverse mode, in which case this would also be stored as a separate module within a directory structure matching that of ForwardMode above.

##### Testing:

Our test suite will live in the Tests subdirectory of our ForwardMode directory (see above). For continuous integration and code coverage we will use Travis CI and Codecov, and we have already set up basic functionality for both in our repository.

##### Distribution and Packaging:

We will distribute our package using PyPI. We will use Twine to upload our distributions to PyPI.

The user can then install our package with pip:  
`pip install autodiff`
 
##### Other Considerations:

We may choose to build a GUI in order to make our package more accessible to end users with limited Python coding abilities.

# Implementation

##### Core Data Structures:

We will use a directed acyclic graph (DAG) as the core data structure for each ADFunction. Each DAG node will hold a list of values and derivatives, possibly along with some other metadata.
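
As a rough sketch of what such a node might look like, the example below builds the graph for $f(x, y) = xy + x$ by hand. The `Node` class and its fields (`op`, `parents`, `val`, `der`) are hypothetical names, and scalars are used in place of the lists of values and derivatives to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical DAG node: an operation, its input nodes, and cached results."""
    op: str
    parents: List["Node"] = field(default_factory=list)
    val: float = 0.0
    der: float = 0.0  # derivative with respect to the chosen input


# f(x, y) = x * y + x at x = 2, y = 3, differentiated with respect to x.
x = Node("input", val=2.0, der=1.0)
y = Node("input", val=3.0, der=0.0)
prod = Node("mul", [x, y],
            val=x.val * y.val,
            der=x.der * y.val + x.val * y.der)  # product rule
out = Node("add", [prod, x],
           val=prod.val + x.val,
           der=prod.der + x.der)                # sum rule
print(out.val, out.der)
```

Note that `x` feeds both `prod` and `out`. Shared inputs like this are why the structure is a DAG and not a tree.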

##### Classes:

The classes we will implement include ADFunction (possibly both a forward and reverse mode version), ADVariable, and possibly some other subclasses of ADFunction such as elementary functions.

##### Methods and name attributes:

ADFunction will have the methods get_val() and get_der() to get its value and derivative. 


* FILL IN MORE *

##### External dependencies:
Required:
- NumPy (for vector operations)

Additional possibilities:  
- math, from the Python standard library (for scientific math functions)
- Matplotlib (for graphing if we end up building a GUI)

##### Elementary functions:

We will likely implement the elementary functions such as sin, sqrt, log, exp, etc. as subclasses of ADFunction. Each would then have two major methods: get_val() and get_der() to calculate their values and derivatives.

##### Components that must be taken into account for implementation:
Two concepts work together to make information flow through the computational graph: the "Variable" and the function.
1. The concept of a "Variable"
    - A variable (which could be a vector) carries the information through the graph (cf. the older PyTorch implementation and its idea of a computational graph). A variable has value and gradient attributes (x.val and x.grad) that hold this information.

2. The idea of a function
    - How do we make information flow using a "Variable"? This is the job of a function: we feed a function a variable and it returns another variable. If x is a variable with x.val and x.grad attributes, we could write y = exp(x), or something less trivial such as y = AD.fn.tan(x). The function class allows us to break any function into computational units. A function has two main methods, AD.fn.get_grad() and AD.fn.get_val(); get_grad() is where the chain rule is actually implemented.

For the "functional component", we will write our own implementation of all the elementary functions.
Below is a sketch showing how a function moves information from one variable to the next:

In [None]:
import numpy as np

# Sketch of the internals; in the package these names would live under autodiff.


class Variable:
    """Carries a value and a gradient through the computational graph."""

    def __init__(self, val, grad=1.0):
        self.val = val
        self.grad = grad


class Fn:
    """Base class: the chain rule is implemented once, here."""

    def __call__(self, x):
        # Chain rule: the output gradient is f'(x.val) * x.grad.
        # For multidimensional arrays this product would become np.dot(...).
        return Variable(self.eval(x.val), self.grad_fn(x.val) * x.grad)


class Exp(Fn):
    """Each elementary function only supplies eval and grad_fn."""

    def eval(self, val):
        return np.exp(val)

    def grad_fn(self, val):
        return np.exp(val)


exp = Exp()  # would be exposed as AD.fn.exp

x = Variable(0)
y = exp(x)   # y.val = exp(0), y.grad = exp(0) * x.grad
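
The `np.dot` remark above can be made concrete with a small sketch for vector inputs. It assumes that a variable's `grad` holds a Jacobian with respect to the original inputs and that each function's `grad_fn` returns its own Jacobian. These conventions, and the class names, are illustrative assumptions and not a settled design.

```python
import numpy as np


class Variable:
    """Vector variable; grad is the Jacobian with respect to the original inputs."""

    def __init__(self, val, grad):
        self.val = np.asarray(val, dtype=float)
        self.grad = np.asarray(grad, dtype=float)


class Fn:
    """Base class implementing the multivariate chain rule once."""

    def __call__(self, x):
        # Chain rule for vectors: J_out = J_fn(x.val) . J_x
        return Variable(self.eval(x.val), np.dot(self.grad_fn(x.val), x.grad))


class Exp(Fn):
    """Elementwise exp; its Jacobian is diagonal."""

    def eval(self, val):
        return np.exp(val)

    def grad_fn(self, val):
        return np.diag(np.exp(val))


# Seed the input with the identity Jacobian (d x / d x = I).
x = Variable([0.0, 1.0], grad=np.eye(2))
y = Exp()(x)
print(y.val)
print(y.grad)
```

Building a full diagonal Jacobian is wasteful for elementwise functions, so a real implementation would likely special-case them; the dense form is used here only to keep the chain rule visible.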