# Documentation -- Milestone 2 

## Introduction

Differentiation is great. It is a necessity in a vast range of applications, such as atomic simulations, economic analysis, and machine learning. 

There are three approaches to differentiation: numerical, symbolic, and automatic.

Numerical differentiation finds the derivative using finite difference approximations $\Delta f / \Delta x$. Even with higher-order methods, its error is far greater than machine precision.
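To make the size of that error concrete, here is a minimal sketch that compares a first-order and a second-order finite difference against the exact derivative of $\sin(x)$. The helper names, the test point, and the step size `h` are illustrative choices, not part of `superdiff`.

```python
import math


def forward_difference(f, x, h):
    """First-order approximation: (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


def central_difference(f, x, h):
    """Second-order approximation: (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0, h = 1.0, 1e-4
exact = math.cos(x0)  # d/dx sin(x) = cos(x)

forward_error = abs(forward_difference(math.sin, x0, h) - exact)
central_error = abs(central_difference(math.sin, x0, h) - exact)

print(f"forward difference error: {forward_error:.2e}")
print(f"central difference error: {central_error:.2e}")
```

With this step size, both errors are many orders of magnitude larger than double-precision machine epsilon (about $2.2 \times 10^{-16}$), even though the central difference is the more accurate of the two.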

Symbolic differentiation finds the symbolic expression of the derivative. In computational science, as functions and programs get complicated, the derivative expressions grow rapidly and become inefficient and messy to work with. This is called expression swell.

Automatic differentiation can find the derivative of expressions to the accuracy of machine precision. It does not have the problem of expression swell because it works with numerical values rather than symbolic expressions. That is why we need automatic differentiation!

Our `superdiff` package performs automatic differentiation on single- or multi-variable functions using the _forward mode_ as well as the _reverse mode_. The function is stored as an `Expression` object that can output values and derivatives at any given point that is allowed.


## Background

Differentiation is the process of finding the derivative, which is the rate of change of a function's output with respect to its variables. Take $f(x,y) =3*x^2+\exp(y)$ as an example. Symbolic differentiation gives $\dfrac{\partial f}{\partial x}=6x$ and $\dfrac{\partial f}{\partial y}=\exp(y)$.

Automatic differentiation treats a function as a chain of elementary functions and performs differentiation on each elementary function.
Here the elementary functions include: (1) A single arithmetic operation, such as $3*x$ and $x_1+x_2$. (2) A single trigonometric operation, such as $\sin(x)$. (3) A single exponential or logarithmic operation, such as $\log(x)$.

The chain rule dictates that

$$\frac{df(g(x))}{dx}=\frac{df(x_1)}{dx_1}*\frac{dg(x)}{dx},$$

where $x_1 = g(x)$ is the intermediate value passed from $g$ to $f$.

Therefore, a function that is made up of elementary functions can be extended into a computational graph. For $f(x,y) =3*x^2+\exp(y)$, the graph is shown below. Each $x_i$ is the output of an elementary function.

<img src="https://i.imgur.com/hBQvv4n.jpg" alt="drawing" width="600"/>
  
To calculate the derivative of $f$ at $[x,\ y]$, we pass the value of the previous $x_i$ and $x_i^\prime$ into the next elementary function to evaluate the derivative of that elementary function. The forward mode automatic differentiation table (traceplot) is shown below.

<img src="https://i.imgur.com/1AIngxT.png" alt="drawing" width="600"/>

The derivative is computed using the chain rule. To get $\dfrac{\partial f}{\partial x}$, forward mode starts from $\dfrac{\partial x_1}{\partial x}$, while the reverse mode starts from $\dfrac{\partial x_6}{\partial x_4}$. The result is 

$$\dfrac{\partial f}{\partial x} = \dfrac{\partial x_6}{\partial x_4}\dfrac{\partial x_4}{\partial x_3}\dfrac{\partial x_3}{\partial x_1}\dfrac{\partial x_1}{\partial x}=1*3*2x*1=6x.$$
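The two traversal orders can be sketched by hand for $f(x,y) = 3*x^2+\exp(y)$. This is a minimal illustration, not `superdiff` code. The intermediate names follow the chain-rule product above ($x_1$, $x_3$, $x_4$, $x_6$); treating $x_2$ as the input $y$ and $x_5$ as $\exp(x_2)$ is an assumption, since the graph itself is only available as an image.

```python
import math


def forward_mode(x, y, seed_x, seed_y):
    """Carry (value, derivative) pairs from the inputs to the output.

    The seed selects the direction: (1, 0) yields df/dx, (0, 1) yields df/dy.
    """
    x1, d1 = x, seed_x
    x2, d2 = y, seed_y
    x3, d3 = x1 ** 2, 2 * x1 * d1              # square
    x4, d4 = 3 * x3, 3 * d3                    # multiply by a constant
    x5, d5 = math.exp(x2), math.exp(x2) * d2   # exponential
    x6, d6 = x4 + x5, d4 + d5                  # add
    return x6, d6


def reverse_mode(x, y):
    """Evaluate once, then propagate d(x6)/d(x_i) from the output backwards."""
    # Forward pass: compute the intermediate values.
    x1, x2 = x, y
    x3 = x1 ** 2
    x4 = 3 * x3
    x5 = math.exp(x2)
    x6 = x4 + x5

    # Reverse pass: bar_i holds d(x6)/d(x_i).
    bar6 = 1.0
    bar4 = bar6 * 1.0           # x6 = x4 + x5
    bar5 = bar6 * 1.0
    bar3 = bar4 * 3.0           # x4 = 3 * x3
    bar1 = bar3 * 2 * x1        # x3 = x1 ** 2
    bar2 = bar5 * math.exp(x2)  # x5 = exp(x2)
    return x6, bar1, bar2


value, df_dx = forward_mode(2.0, 0.0, 1.0, 0.0)
_, df_dy = forward_mode(2.0, 0.0, 0.0, 1.0)
print(value, df_dx, df_dy)     # 13.0 12.0 1.0
print(reverse_mode(2.0, 0.0))  # (13.0, 12.0, 1.0)
```

Forward mode needs one pass per input variable, while a single reverse pass yields both partial derivatives.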

## How to use superdiff

Our goal is for the syntax of `superdiff` to be as natural as possible, not requiring the user to learn any new paradigms and thereby minimizing the chances of hard-to-debug errors. Therefore, we take inspiration from the kind of notation one might use when writing out mathematical expressions and functions by hand. 

The core functionality of `superdiff` involves three main kinds of objects: `Variable`, `Expression`, and subclasses of `Operation`. These mean exactly what you might expect from a mathematical context. If a user wants to define an expression, they first define one or more `Variable`s. Then they make an `Expression` using basic math operators such as `+ - * /` or special operators such as `superdiff.log`. The expression can be evaluated and differentiated at any given point.

### 1. How to install `superdiff`

```bash
pip install superdiff
```

### 2. Demo

```python
import superdiff as sd

# Define the base variables
x = sd.Var('x')
y = sd.Var('y')

# Build an `Expression` using ordinary math notation. `vars` fixes the order
# of the arguments, so the user can evaluate this `Expression` just as they
# might evaluate a function.
f = sd.make_expression((x * 0.2 + sd.log(x) ** 3) / y, vars=[x, y])
```



## Software Organization
### 1. Directory structure
```
cs207-FinalProject/
|
|-- superdiff/
|   |-- __init__.py
|   |-- superdiff.py
|   |-- expression.py
|   |-- operations.py
|
|-- tests/
|   |-- conftest.py
|   |-- operations_test.py
|   |-- test_dummy.py
|   |-- test_expression.py
|
|-- docs/
|   |-- milestone1.md
|   |-- milestone2.ipynb
|
|-- README.md
|-- requirements.txt
|-- setup.py
|-- LICENSE
```

The `superdiff/` subdirectory hosts our code.

The `tests/` subdirectory hosts the tests for the code.

The `docs/` subdirectory hosts our milestone documents. The documents provide an introduction to automatic differentiation, as well as a guide to using our package.

### 2. Basic modules and their functionality

Our modules are `__init__.py`, `superdiff.py`, `expression.py`, and `operations.py`. 

1. `__init__.py` marks `superdiff/` as a Python package.

2. `superdiff.py` imports everything from `expression.py` and `operations.py`. Additionally, it contains functions that allow users to build expressions and call all our elementary functions such as `log` and `sin`.

3. `expression.py` contains the code for our `Variable` and `Expression` classes. These objects build the computational graph in a tree-like structure. Users can call the `eval()` and `deriv()` methods of an expression to get values and derivatives. 

4. `operations.py` contains elementary function classes. Each elementary function has methods to build new expressions, to evaluate at given points, and to compute the derivative at given points.

### 3. Where do tests live? How are they run? How are they integrated?

As shown in the directory structure, our test suite lives inside the `tests/` subdirectory. 

They include many unit tests to ensure the differentiator modules run correctly and handle edge cases appropriately for a variety of complex functions. They will also more closely test many of the operations abstracted away under the hood to ensure quality is maintained beneath all the layers. Additionally, we will also apply stress tests to see how our package handles overloading. `pytest` will be key in handling this test suite efficiently.
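As a rough sketch of what one such unit test could look like, the file below checks a derivative against a central finite difference. `derivative_under_test` is a hypothetical stand-in for a value returned by the package, and the tolerances are illustrative. `pytest` collects any function whose name starts with `test_`, so no `pytest` import is needed for plain assertions.

```python
import math


def derivative_under_test(x):
    """Hypothetical stand-in for a derivative computed by the package:
    d/dx [3*x**2 + exp(x)] = 6*x + exp(x)."""
    return 6 * x + math.exp(x)


def central_difference(f, x, h=1e-6):
    """Independent numerical reference for the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def test_derivative_matches_finite_difference():
    def f(x):
        return 3 * x ** 2 + math.exp(x)

    for x in (-1.0, 0.0, 0.5, 2.0):
        expected = central_difference(f, x)
        assert math.isclose(derivative_under_test(x), expected,
                            rel_tol=1e-6, abs_tol=1e-6)
```

Running `pytest` from the repository root would discover a file like this under `tests/` as long as its name matches `test_*.py` or `*_test.py`.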

We will utilize TravisCI to perform integration testing as we build the package, which will help us flag defects as they arise and maintain quality control among the various components in our software. In addition, CodeCov will help us analyze ways to improve our test suite so that we maintain high coverage of our code.

### 4. How to install our package

For now, please refer to the "How to use superdiff" section for the installation guide.

In the next step, we will distribute our package through PyPI so that users can install it with `pip`. In `setup.py`, we will include a brief description of the package, author and licensing information, and the package version so that we can host our package on PyPI.


## Implementation
### 1. Core data structures

The function to be differentiated (henceforth referred to as ƒ) will be parsed and converted into a directed graph, containing node-like objects for each step in the traceplot (i.e. each node represents an elementary operation in ƒ). The edges of the graph represent steps from one part of the traceplot to the next. This is similar to the tree structure that we learned in class, but it is built from the leaf nodes to the root.

Each node is an `Expression` object that contains
- The type of elementary operation being performed. It could be add, mul, pow, log, etc. 
- References to the mathematical objects ('parents') that go into this operation. A binary operation has two parents, while a unary operation has only one parent. The parent could be an `Expression` object, a `Variable` object, or just a number.
- Notice that a node does not reference the next operation to be performed. 
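The node layout described above can be sketched as follows. `Node` and `Leaf` are hypothetical stand-ins for `Expression` and `Variable`; the real classes carry more state than this.

```python
class Leaf:
    """An input variable (hypothetical stand-in for `Variable`)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class Node:
    """One elementary operation (hypothetical stand-in for `Expression`)."""

    def __init__(self, op, *parents):
        self.op = op            # name of the operation, e.g. 'add'
        self.parents = parents  # inputs: nodes, leaves, or plain numbers
        # Deliberately no reference to the nodes that consume this one.

    def __repr__(self):
        return f"{self.op}({', '.join(map(repr, self.parents))})"


x, y = Leaf('x'), Leaf('y')
square = Node('pow', x, 2)                   # binary: a leaf and a number
scaled = Node('mul', 3, square)              # binary: a number and a node
total = Node('add', scaled, Node('exp', y))  # exp is unary: one parent
print(total)  # add(mul(3, pow(x, 2)), exp(y))
```

Because the graph is built from the leaves towards the root, each new node only needs the objects that already exist.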

Say we have the following situation:

![](https://i.imgur.com/p2gMe9B.png)


In this case, C knows about A and B, but not about F. This may seem counterintuitive, since in the forward mode of autodiff, we need to go from A to C to F. However, we want to allow for situations like the following:

![](https://i.imgur.com/eWljQhb.png)


Here, the function `f3` is composed of two inputs (`f1` and `f2`) and combines them in an operation in node G. Rather than copying functions `f1` and `f2` into brand new graphs, we believe that it would be more memory- and time-efficient to simply refer to the same graph objects that `f1` and `f2` refer to. However, this creates a potential issue if we were to implement the graphs as bidirectional, rather than unidirectional: if we add a connection from C to G, then if the user tries to run the forward mode of autodiff on `f1`, the algorithm will continue past node C onto node G. However, node G is not part of function `f1`!

This design choice has a tradeoff, namely that each time forward mode auto differentiation is performed, Python must step from the end of the function all the way to the beginning leaf nodes. This is done by recursively calling the `eval()` and `deriv()` methods.
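A minimal sketch of this recursion, assuming hypothetical `Leaf`, `Add`, and `Mul` classes rather than the actual `superdiff` ones, shows both the sharing of subgraphs and the walk from the root back to the leaves:

```python
class Leaf:
    def __init__(self, name):
        self.name = name

    def eval(self, point):
        return point[self.name]

    def deriv(self, point, wrt):
        return 1.0 if wrt == self.name else 0.0


class Add:
    def __init__(self, left, right):
        self.parents = (left, right)

    def eval(self, point):
        left, right = self.parents
        return left.eval(point) + right.eval(point)

    def deriv(self, point, wrt):
        left, right = self.parents
        return left.deriv(point, wrt) + right.deriv(point, wrt)


class Mul:
    def __init__(self, left, right):
        self.parents = (left, right)

    def eval(self, point):
        left, right = self.parents
        return left.eval(point) * right.eval(point)

    def deriv(self, point, wrt):
        # Product rule, recursing into both parents.
        left, right = self.parents
        return (left.deriv(point, wrt) * right.eval(point)
                + left.eval(point) * right.deriv(point, wrt))


x, y = Leaf('x'), Leaf('y')
f1 = Mul(x, y)
f2 = Mul(y, y)
f3 = Add(f1, f2)  # reuses the f1 and f2 objects instead of copying them

point = {'x': 2.0, 'y': 3.0}
print(f3.parents[0] is f1)                   # True
print(f1.eval(point), f1.deriv(point, 'x'))  # 6.0 3.0
print(f3.eval(point), f3.deriv(point, 'y'))  # 15.0 8.0
```

Evaluating `f1` never touches the node created for `f3`, because `f1` holds no reference to it. The cost is that every call starts at the root and recurses down to the leaves again.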

### 2. Core classes

<img src="https://i.imgur.com/ST3mu2D.png" alt="drawing" width="800"/>


`Var` class

- Is the variable in the eventual function
- Can be combined with other variables or scalars to create more complex expressions. Overloads math operators such as `__add__` and `__pow__`
- Does not store values. When the `eval` method is called, it returns whatever value is passed into it
- When `deriv` method is called, if we are differentiating with respect to this particular variable, it returns 1. Otherwise, it returns 0
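The behaviour listed above could be sketched like this. The method signatures and the `Combined` placeholder are assumptions made for illustration; they are not the package's actual interface.

```python
class Combined:
    """Placeholder for the node produced by combining operands (hypothetical)."""

    def __init__(self, op, left, right):
        self.op = op
        self.parents = (left, right)


class Var:
    def __init__(self, name):
        self.name = name

    # Operator overloading: arithmetic builds a new node instead of a number.
    def __add__(self, other):
        return Combined('add', self, other)

    def __radd__(self, other):
        return Combined('add', other, self)

    def __pow__(self, other):
        return Combined('pow', self, other)

    def eval(self, value):
        # The variable stores nothing; it hands back what it is given.
        return value

    def deriv(self, value, wrt):
        # d(self)/d(wrt) is 1 for the same variable and 0 otherwise.
        return 1.0 if wrt is self else 0.0


x, y = Var('x'), Var('y')
node = 2 + x  # falls back to Var.__radd__
print(node.op, node.parents[0], node.parents[1].name)  # add 2 x
print(x.eval(5.0), x.deriv(5.0, wrt=x), x.deriv(5.0, wrt=y))  # 5.0 1.0 0.0
```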

`Expression` class

- Subclass of `Var`
- Stores `self.parents`. The parents can be an `Expression`, a `Var`, or a number
- Stores the name of the operation that is being performed
- Stores the list of variables but does not store their values. When `eval` or `deriv` is called, a helper method passes the appropriate values to each parent expression
- `eval` and `deriv` use helper methods to recursively find the value and derivative at any given point. Currently `deriv` only works with the forward mode. In the forward mode, differentiation starts from the leaf nodes.

`BaseOperation` and its subclasses

- Is divided into unary and binary operations classes
- Contains `expr` method to make new nodes 
- Arithmetic operations and functions such as $\sin$ and $\exp$



### 3. Important attributes

`Var` class

- `curr_val()`
- `__repr__()`
- `__str__()`
- `__eq__()`

`Expression` class

- All methods defined in `Variable`
- Stores a list of parent nodes (typically one or two parent nodes)
- An `Expression` knows which parameters go with each parent node
    - E.g. say an `Expression`, `expr1`, has two parent nodes `par1` and `par2`. `expr1` is a function of three parameters: $x$, $y$, $z$. Then, when `expr1` is initialized, in the constructor, Python will "ask" each of `par1` and `par2` whether their leaf nodes (i.e. the `Variable` objects at the end of their computational graphs) contain the `Variable` objects that correspond to $x$, $y$, or $z$. Then, `expr1` will store which variables correspond to which Parent expressions.
- `_varlist` is a list containing the arguments to the `Expression` in the order specified by the user
- The constructor is used to create an individual node from at most two `Expression`s and an `Operation`:
```python
e = Expression(expr1, expr2, sd.mul)
```
The user will rarely call this constructor themselves. Rather, they will call a wrapper around the `Expression` constructor that allows them to specify the order of the function arguments:
```python
e = sd.make_expression(x * 0.2 + y, vars=[x,y])
```
where `x` and `y` are `Variable` or `Expression` objects. The second argument, `vars=[x,y]`, sets the `_varlist` attribute of the `Expression` object, which remembers the correct order of the arguments.
- `next_parent()` to get the next parent node of the `Expression`
- Arithmetic operations that call `BaseOperation`
- `__str__()` and `__repr__()`
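To illustrate why the argument order needs to be remembered, here is a small sketch in which the same graph is wrapped with two different orderings. `OrderedExpression`, `Sub`, and this version of `Var` are hypothetical and exist only for the example.

```python
class Var:
    """Leaf node; looks itself up in the mapping passed to `eval`."""

    def __init__(self, name):
        self.name = name

    def eval(self, values):
        return values[self]


class Sub:
    """left - right, standing in for any operation where order matters."""

    def __init__(self, left, right):
        self.parents = (left, right)

    def eval(self, values):
        left, right = self.parents
        return left.eval(values) - right.eval(values)


class OrderedExpression:
    """Pairs a graph with the user's argument order (the role of `_varlist`)."""

    def __init__(self, root, varlist):
        self.root = root
        self._varlist = list(varlist)

    def eval(self, *args):
        if len(args) != len(self._varlist):
            raise ValueError(
                f"expected {len(self._varlist)} values, got {len(args)}")
        # Match positional values to variables, then let the graph recurse.
        return self.root.eval(dict(zip(self._varlist, args)))


x, y = Var('x'), Var('y')
f = OrderedExpression(Sub(x, y), varlist=[x, y])
g = OrderedExpression(Sub(x, y), varlist=[y, x])
print(f.eval(5.0, 2.0))  # 3.0
print(g.eval(5.0, 2.0))  # -3.0
```

The graph `x - y` is identical in both cases; only the stored order decides which positional value reaches which variable.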

`BaseOperation` and other `Operation` subclasses
- Used to implement elementary operations at each step in the computational graph
- `eval()` and `deriv()` methods
- `__str__()` and `__repr__()`
- We are using inheritance here rather than simply defining these operations ourselves so that all operations have a unified interface
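A unified interface of this kind might look like the sketch below. The class names `Sin` and `Mul` and the convention that `deriv` returns one partial derivative per input are assumptions for illustration only.

```python
import math


class BaseOperation:
    """Interface shared by every elementary operation (hypothetical)."""

    def eval(self, *values):
        raise NotImplementedError

    def deriv(self, *values):
        """Return the partial derivative with respect to each input."""
        raise NotImplementedError

    def __repr__(self):
        return type(self).__name__


class Sin(BaseOperation):
    """Unary operation."""

    def eval(self, a):
        return math.sin(a)

    def deriv(self, a):
        return (math.cos(a),)


class Mul(BaseOperation):
    """Binary operation."""

    def eval(self, a, b):
        return a * b

    def deriv(self, a, b):
        return (b, a)


# Calling code can treat every operation the same way.
for op, inputs in [(Sin(), (0.0,)), (Mul(), (2.0, 3.0))]:
    print(op, op.eval(*inputs), op.deriv(*inputs))
```

Because every subclass answers the same two calls, the graph-walking code does not need to know which operation a node holds.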


`BaseDifferentiator`
- Unimplemented version of `deriv()`

- Has `.diff()` method
- Also a nice `__str__` method
- Ensures a consistent interface across different differentiator subclasses
- Function signature: `BaseDifferentiator`

`ForwardDifferentiator`
- Same `.diff()` method as `BaseDifferentiator` but implementing forward mode under the hood

`ReverseDifferentiator`
- Same `.diff()` method as `BaseDifferentiator` but implementing reverse mode under the hood

### 4. External dependencies

* `numpy`
* `pytest` (only for the test suite, not for the actual functionality)

### 5. Elementary functions

The elementary functions are coded in the module `operations.py` and then loaded into `superdiff.py`. Each operation has its own separate class. Each class has the following methods:

- `expr` takes in parent nodes and makes an expression
- `eval` takes in numbers to evaluate this operation at a given point.
- `deriv` takes in 
- As well, when the user defines ƒ, they will have to use either the built-in Python operations (add, subtract, multiply, divide, power) or our custom functions, which will have the exact same interface as analogous `numpy` or `math` functions. E.g.
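As one hedged illustration of that idea, the sketch below defines `log` and `sin` with the same single-argument call signature as `math.log` and `math.sin`. To stay self-contained it uses a small dual-number class (`Dual`, hypothetical) in place of `superdiff`'s expression graph, so it shows the interface rather than the package's implementation.

```python
import math


class Dual:
    """A value and its derivative carried together (hypothetical helper)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def log(x):
    """Same call signature as `math.log(x)` with one argument."""
    if isinstance(x, Dual):
        return Dual(math.log(x.value), x.deriv / x.value)
    return math.log(x)


def sin(x):
    """Same call signature as `math.sin(x)`."""
    if isinstance(x, Dual):
        return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)
    return math.sin(x)


def f(x):
    # Written exactly as it would be with `math.log` and `math.sin`.
    return 0.2 * x + log(x) * sin(x)


result = f(Dual(1.0, 1.0))  # seed dx/dx = 1
print(f(1.0))               # plain floats still work
print(result.value, result.deriv)
```

At $x = 1$ the derivative $0.2 + \sin(x)/x + \log(x)\cos(x)$ reduces to $0.2 + \sin(1)$, which is what `result.deriv` holds.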

 

## Future plan