## Automatic Differentiation Documentation

### Introduction

---

Automatic differentiation (AD) is a family of techniques for evaluating a function and its derivatives at specified values. When deriving the derivative of a complicated function by hand is not feasible, AD returns the exact derivative in theory. In practice, rounding errors may compound because AD performs a long series of elementary operations.




### Background

---


Automatic differentiation works by breaking a possibly complex function down into a sequence of elementary functions 
(e.g. addition, multiplication, cosine), where the outputs of earlier elementary functions are fed into the inputs of later ones. 
The sequence starts by assigning values to the input variables, then works from the inside of the function outward, 
applying one elementary function at a time until the whole function has been built. 
This sequence can be expressed as a graph.

![Graph Image](https://blog.paperspace.com/content/images/2019/03/computation_graph_forward.png)

Given this sequence, automatic differentiation passes both the value of each elementary function and the value 
of its derivative along the sequence, so that the whole function and its derivative are evaluated at the chosen values. 
Evaluating the value at a node is direct: take the values of the nodes that feed into it and apply the node's elementary function. 
Passing the derivative along is more subtle. Each elementary function depends on the outputs of earlier nodes, which in turn depend 
on other variables, so we need the chain rule $$\frac{\partial f}{\partial x}=\frac{\partial f}{\partial y}\frac{\partial y}{\partial x}$$. 
In other words, at each node we multiply the derivative of that node's elementary function by the already evaluated derivative(s) of the node(s) feeding into it. 
The initial node still needs a derivative value, which is supplied by a seed vector of the user's choosing. Intuitively, the seed vector turns the derivative 
into a directional derivative: it is the direction along which the derivative (or the Jacobian, in the case of multiple functions) is projected.
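
To make the node-by-node procedure concrete, here is a minimal hand-written trace, assuming the illustrative function $f(x, y) = xy + \sin(x)$ (chosen for this sketch, not taken from the package). The name `forward_trace` is hypothetical. Each node carries a value and a derivative, and the seed decides which directional derivative comes out at the end.

```python
import math

def forward_trace(x, y, seed):
    """Forward-mode trace of f(x, y) = x*y + sin(x).

    `seed` is the pair (dx, dy) assigned to the input nodes.
    Returns (value of f, derivative of f along the seed direction).
    """
    dx, dy = seed
    # Node v1 = x * y (product rule)
    v1 = x * y
    dv1 = dx * y + x * dy
    # Node v2 = sin(x) (chain rule: cos(x) times the incoming derivative)
    v2 = math.sin(x)
    dv2 = math.cos(x) * dx
    # Node v3 = v1 + v2 (sum rule)
    v3 = v1 + v2
    dv3 = dv1 + dv2
    return v3, dv3

# Seed (1, 0) projects onto the x direction, seed (0, 1) onto the y direction
value, df_dx = forward_trace(1.0, 2.0, seed=(1.0, 0.0))
_, df_dy = forward_trace(1.0, 2.0, seed=(0.0, 1.0))
print(value, df_dx, df_dy)
```

With seed `(1, 0)` the trace yields $\partial f/\partial x = y + \cos(x)$, and with seed `(0, 1)` it yields $\partial f/\partial y = x$.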

For example, a user may want to evaluate the derivative $f'$ of a complicated function $f$ at a given point. Let us define a function $f$:
$$
f(x) = \sin^3(x^2 + \cos(\sqrt{x}))
$$
The derivative $f'$, given below, is messy and tedious to derive.
$$
f'(x) = 3 \left(2x - \frac{\sin(\sqrt{x})}{2\sqrt{x}}\right) \cos(x^2 + \cos(\sqrt{x})) \sin^2(x^2 + \cos(\sqrt{x})) 
$$
However, the user does not have to analytically derive the derivative of the given function when using automatic differentiation. 
Provided the user supplies a function of interest and point(s) of interest, the derivative of the given function will be evaluated
at the given point(s) of interest. 
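
As a sanity check on the hand-derived expression, the sketch below compares it with a central finite-difference approximation. The helper `central_difference`, the step size, and the evaluation point `x0 = 1.5` are assumptions made for illustration only.

```python
import math

def f(x):
    # f(x) = sin^3(x^2 + cos(sqrt(x)))
    return math.sin(x**2 + math.cos(math.sqrt(x))) ** 3

def f_prime(x):
    # Hand-derived derivative from the formula above
    u = x**2 + math.cos(math.sqrt(x))
    du = 2 * x - math.sin(math.sqrt(x)) / (2 * math.sqrt(x))
    return 3 * du * math.cos(u) * math.sin(u) ** 2

def central_difference(func, x, h=1e-6):
    # Numerical approximation; it has truncation error, unlike AD
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 1.5
analytic = f_prime(x0)
numeric = central_difference(f, x0)
print(analytic, numeric)
```

The two numbers should agree to several decimal places. Automatic differentiation aims to reproduce the analytic value without requiring the hand derivation.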

### Software Organization

---

#### Directory Structure


    Auto_diff/

        __init__.py  
        ad/
            __init__.py
            AD.py  # create AD objects
                     ...
        utils/ 
            __init__.py 
            jacobian.py # helps create jacobian matrix
                           ...
        tests/
            __init__.py 
            test_basic.py  # test basic operations
            test_jacobian.py  # test jacobian helper
                   ...
                   
#### Modules

We have four modules within our package `Auto_diff`.
*  `AD`: a module that contains the following class:
    * The `AD` class used for instantiating an AD object, which performs the forward mode of automatic differentiation and produces the numerical output.
*  `jacobian`: a module that contains the following function:
    * The function `Jacobian`, used for handling functions of multiple inputs. This function takes as an argument an integer defining the number of inputs for the given function and returns a list of AD objects.
*  `test_basic`: a module that tests all the elementary functions (addition, multiplication, power, etc.) and functions like `get_value` and `get_derivative`. 
*  `test_jacobian`: a module that tests the function `Jacobian` on a single function and on multiple functions.


#### Testing

Module tests can be found in the files `test_basic.py` and `test_jacobian.py`. 
* **test_basic:** Each elementary function is tested individually inside `test_basic.py`.
* **test_jacobian:** The function `Jacobian` is tested on a single function and on multiple functions to check that it works in both scenarios.

Our test suite lives in the `tests` subdirectory and runs automatically with pytest on TravisCI, with CodeCov tracking test coverage. 

### Installation

##### Installing Python
You will need an up-to-date version of Python that is compatible with your system. Downloads can be found [here](https://www.python.org/downloads/).
Installing a Python version $\geq$ 3.4 will also install pip, the package manager for Python.

##### Installing Git
Git is version control software that is used to pull all relevant package data from the GitHub repository. This step is not
strictly necessary, but it greatly simplifies the process of downloading all relevant data. The steps to install Git on your machine
can be found [here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).

The automatic differentiation package can be installed by cloning the GitHub repository with the following command.

    git clone https://github.com/AutoDiff-Dream-Team/cs107-FinalProject.git


The dependencies needed to use this package can be installed by running the following command. You must first be in the directory that contains the `requirements.txt` file.

    pip install -r requirements.txt


### Implementation
---
Classes: 
There is one main class that users interact with in order to perform AD. The `AD` class performs the forward mode of automatic differentiation. The basic workflow is as follows: the user creates an instance of the `AD` class and passes
the newly created `AD` object as the input to a user-defined function. The resulting `AD` object stores the value of the function and the value of the derivative
in the attributes `val` and `der`. 

Inputs:
* **val**
    * Type: Numeric (default is 1)
    * The value of the function evaluated at the specified user input
* **der** 
    * Type: Numeric (default is 1)
    * The value of the derivative of the function evaluated at the specified user input
    * This input is likely to be changed from 1 when computing partial derivatives for functions of multiple variables

Attributes:
* **val**
    * Type: Numeric (default is initially 1)
    * The value of the function evaluated at the specified user input
* **der** 
    * Type: Numeric (default is initially 1)
    * The value of the derivative of the function evaluated at the specified user input

Methods:
* The `AD` class does not contain methods that will be commonly accessed by the user.
* Basic operations including addition, subtraction, multiplication, division, power, exponential, negation, and the trigonometric functions sine, cosine, and tangent
  are overloaded in the definition of the class.
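
To illustrate what operator overloading means for forward mode, here is a minimal sketch. `DualSketch` is a hypothetical stand-in written for this document, not the package's `AD` class, and it covers only addition, multiplication, constant powers, and sine. Each overloaded operation updates `val` with the usual arithmetic and `der` with the matching derivative rule.

```python
import math

class DualSketch:
    """Hypothetical stand-in for a forward-mode AD object (not the package's AD class)."""

    def __init__(self, val=1, der=1):
        self.val = val
        self.der = der

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants, whose derivative is 0
        return other if isinstance(other, DualSketch) else DualSketch(other, 0)

    def __add__(self, other):
        other = self._lift(other)
        # Sum rule
        return DualSketch(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule
        return DualSketch(self.val * other.val,
                          self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __pow__(self, n):
        # Power rule for a constant numeric exponent n
        return DualSketch(self.val ** n, n * self.val ** (n - 1) * self.der)

    def sin(self):
        # Chain rule for sine
        return DualSketch(math.sin(self.val), math.cos(self.val) * self.der)


x = DualSketch(3, 1)
result = (x ** 2 + 3).sin()
print(result.val, result.der)
```

NumPy ufuncs applied to an arbitrary Python object look for a method of the same name, so defining `sin` as a method is one way a call such as `np.sin(x)` could dispatch to it. Whether the package relies on this mechanism is an assumption here.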

Necessary Dependencies:

`NumPy` is the only dependency required in order to use our package. 
`NumPy` is used to handle elementary functions such as $\sin(x)$ and $\exp(x)$.


### How to Use
---

### Instantiating and using an AD object

Import the AD module and numpy using the following commands. 

    
    from Auto_diff import AD
    import numpy as np

Instantiate an AD object. You can change the first argument, but leave the second argument as 1 for this example.
    
    # Instantiates the AD object
    val = 3
    x = AD(val,1)

Define a function that takes one argument as input. This will be your AD object.

    
    # User defined function (you can change the body of this function)
    def f(x):
        output = np.sin(x**2 + 3)
        return output

Once the function is defined, you can call the function by passing the AD object as the argument.

    
    # The returned output is an AD object whose val and der attributes hold the function value and the derivative
    output = f(x)

    # If you don't want to define a function, you can compute the result on the fly
    output = np.sin(x**2 + 3)

    print(f"The value of the function evaluated at {val} is {output.val}.")
    print(f"The value of the derivative of the function evaluated at {val} is {output.der}.")
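
As a cross-check that does not depend on the package, the values this example should print can be derived by hand: $f(x)=\sin(x^2+3)$ has derivative $f'(x)=2x\cos(x^2+3)$. The short sketch below evaluates both at $x=3$ using only the standard library. The variable names are illustrative.

```python
import math

x0 = 3
# Closed-form value and derivative of f(x) = sin(x**2 + 3)
expected_val = math.sin(x0**2 + 3)            # sin(12)
expected_der = 2 * x0 * math.cos(x0**2 + 3)   # 6 * cos(12)
print(expected_val, expected_der)
```

If the package follows the standard forward-mode rules, `output.val` and `output.der` should match these numbers up to floating-point rounding.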

### Using multiple AD objects (functions of multiple inputs)

You can define functions of multiple variables where each variable is an AD object. An example is given below. When using multiple variables,
the second arguments passed when instantiating the AD objects together define the seed vector and therefore determine which derivative 
is computed. For example, let us define a function $f=x^2+y$ and set the values of $x$ and $y$ to 2 and 3 respectively.
We can instantiate the AD objects and define our function as follows.

    # Instantiates the AD objects
    x = AD(2,1) 
    y = AD(3,1) 

    # Define the function
    def f(x,y):
        return x**2 + y



We will first manually derive the value of the function and the values of the partial derivatives of the function.

We can evaluate the function at $x=2$ and $y=3$. 
$$
f(2,3) = 2^2 + 3 = 7
$$
We can also determine $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$.
$$ 
\frac{\partial f}{\partial x} = 2x
$$
$$ 
\frac{\partial f}{\partial y} = 1  
$$
Using our values for $x$ and $y$, $\frac{\partial f}{\partial x}=4$ and $\frac{\partial f}{\partial y}=1$.
We can compute $\frac{\partial f}{\partial x}$ by setting the second argument for `x` to 1 and the second
argument for `y` to 0. Similarly, we can compute $\frac{\partial f}{\partial y}$ by setting the second argument 
for `x` to 0 and the second argument for `y` to 1. This is done as follows.

    # Evaluate the function and the partial derivative with respect to x for x=2 and y=3
    x = AD(2,1) 
    y = AD(3,0) 
    output1 = f(x,y)

    # Evaluate the function and the partial derivative with respect to y for x=2 and y=3
    x = AD(2,0) 
    y = AD(3,1) 
    output2 = f(x,y)

    # The value of the function should be the same for both evaluations
    function_value1 = output1.val
    function_value2 = output2.val
    assert function_value1 == function_value2

    der_x = output1.der
    der_y = output2.der

    print(f"The value of the function f at x=2 and y=3 is {function_value1}.")
    print(f"The partial derivative with respect to x of f at x=2 and y=3 is {der_x}.")
    print(f"The partial derivative with respect to y of f at x=2 and y=3 is {der_y}.")
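
The role of the seed can be checked by hand for $f=x^2+y$: the derivative that forward mode reports is the gradient dotted with the seed vector. The sketch below uses hypothetical helper names and a hand-derived gradient rather than the package itself. It also shows what a seed of `(1, 1)` would produce under the standard forward-mode rules.

```python
def grad_f(x, y):
    """Gradient of f(x, y) = x**2 + y, derived by hand."""
    return (2 * x, 1)

def directional_derivative(x, y, seed):
    # Forward mode reports the gradient projected onto the seed direction
    gx, gy = grad_f(x, y)
    return gx * seed[0] + gy * seed[1]

der_x = directional_derivative(2, 3, seed=(1, 0))     # 4, picks out df/dx
der_y = directional_derivative(2, 3, seed=(0, 1))     # 1, picks out df/dy
der_both = directional_derivative(2, 3, seed=(1, 1))  # 5, the sum df/dx + df/dy
print(der_x, der_y, der_both)
```

Seeding both variables with 1 therefore gives the sum of the two partial derivatives rather than either one on its own, which is why each partial derivative needs its own seed.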

### Computing the Jacobian

We also give the user the option of directly computing the Jacobian matrix of a function or set of functions of one or more variables. The user 
imports `Jacobian` and calls it with just the values of the variables of interest. It returns a numpy ndarray 
of AD objects ready to be used in a single function or in a list of functions. The result is an n x n matrix of AD objects evaluated at the 
correct seed vectors for the Jacobian. You can extract only the derivatives by using the `get_derivative` method.
    
    from Auto_diff import Jacobian, AD
    import numpy as np

    # Pass in a list of values, one per variable
    x = Jacobian([2, 7, 10])

    # This assignment will return a 3x3 matrix of AD objects
    # (note that ** is the power operator in Python, while ^ is bitwise XOR)
    jacobian_results = [np.exp(x[0]/x[1] + x[2]), np.tan(np.sin(3**x[0])*x[2]), x[0]+x[1]+x[2]]

    # Getting only the derivatives of the 3x3 matrix, i.e., the Jacobian matrix
    jacobian_matrix = AD.get_derivatives(jacobian_results) 
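
One way to sanity-check a Jacobian is to compare it with a central finite-difference approximation. The sketch below is independent of the package: `F` and `finite_difference_jacobian` are hypothetical names, and the three functions are rewritten with plain floats and the `math` module.

```python
import math

def F(x):
    """The three example functions, written with plain floats."""
    return [
        math.exp(x[0] / x[1] + x[2]),
        math.tan(math.sin(3 ** x[0]) * x[2]),
        x[0] + x[1] + x[2],
    ]

def finite_difference_jacobian(func, x, h=1e-6):
    """Approximate J[i][j] = d func_i / d x_j with central differences."""
    n_out = len(func(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        # Perturb one input at a time, like using one seed vector per column
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac

J = finite_difference_jacobian(F, [2.0, 7.0, 10.0])
for row in J:
    print(row)
```

Row `i` holds the partial derivatives of the `i`-th function and column `j` corresponds to the `j`-th variable. The finite-difference entries carry truncation error, so agreement with an AD result is expected only to a few significant digits.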

### Future Features



We plan to implement the reverse mode of automatic differentiation. This implementation is meant to overcome one of the main pitfalls of 
the forward mode, namely that computing multiple partial derivatives, as is required for the Jacobian, takes multiple passes. For example, if 
we want to compute $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ for some function $f$, we need to perform the forward
mode twice: once with the seed for the `AD` object associated with $x$ set to 1 and the seed for the `AD` object associated with $y$ set to 0, and a second
time with the seed for $x$ set to 0 and the seed for $y$ set to 1. 

We can think of the reverse mode as applying the chain rule in the opposite order, from the output back toward the inputs. However, this
will not necessarily be straightforward to implement. We envision recursively propagating the gradient of the given function down the nodes
of a tree where the nodes represent each step in the evaluation trace. 
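
The idea can be sketched as follows. This is a rough illustration under assumptions, not the planned design: `Node`, `sin`, and `backward` are hypothetical names, only addition, multiplication, and sine are supported, and the example function is chosen for this sketch. Each node records its parents together with the local partial derivatives, and a single backward sweep accumulates the gradient for every input.

```python
import math

class Node:
    """Hypothetical reverse-mode node: a value, its parents, and local partials."""

    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # tuple of (parent_node, d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.val * other.val, ((self, other.val), (other, self.val)))


def sin(node):
    return Node(math.sin(node.val), ((node, math.cos(node.val)),))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        # Post-order traversal: parents are appended before the nodes that use them
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    # Reversed post-order processes each node only after all of its consumers
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


x, y = Node(2.0), Node(3.0)
f = x * x + y + sin(x * y)
backward(f)
print(x.grad, y.grad)
```

Here one backward pass yields both $\partial f/\partial x$ and $\partial f/\partial y$, whereas the forward mode would need one pass per input.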

Implementing the reverse mode may change how we structure our module. As of now, the `AD` class is strictly used for implementing the forward
mode. We may define an argument `mode` upon instantiation of an `AD` object that can be set to either "forward" or "reverse". Setting this
argument to "reverse" would instantiate an object that can be used to perform the reverse mode. 