# Milestone 2: CS207 Group 21

# Introduction

Differentiation is ubiquitous in mathematics, the sciences and engineering, so it is useful for computer programs to carry out differentiation automatically. For computationally heavy projects, this ability becomes critical, as working out derivatives by hand quickly becomes impractical. Although methods such as *numerical differentiation* and *symbolic differentiation* can determine derivatives computationally, both have limitations. In the following, we briefly review *numerical differentiation* and *symbolic differentiation* to highlight some of their difficulties before describing *automatic differentiation* and the advantages it brings over the other two methods.

### Numerical Differentiation
In *numerical differentiation*, the value of derivatives is approximated using the following formula:

$$
\frac{\partial{f(x)}}{\partial{x}} \approx \frac{f(x+h)-f(x)}{h}
$$

However, when the h values are too small, the numerical approximation fluctuates about the analytical answer. This is because the step size is too small, leading to a round-off error of the floating points caused by the limited precision of computations. On the other hand, when the h values are too large, the numerical approximation becomes inaccurate. This is because the step size is too big, leading to an error of approximation known as truncation error.
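
The trade-off between the two error sources can be seen in a minimal sketch, assuming $f(x) = \sin(x)$ at $x = 1$ as an illustrative test function. The helper name `forward_difference` is ours and is not part of the package.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with the forward-difference formula above."""
    return (f(x + h) - f(x)) / h

x = 1.0
exact = math.cos(x)  # analytical derivative of sin at x

# Absolute error for a large, a moderate and a very small step size
errors = {h: abs(forward_difference(math.sin, x, h) - exact)
          for h in (1e-1, 1e-8, 1e-15)}

for h, err in errors.items():
    print(f"h = {h:.0e}  absolute error = {err:.2e}")
```

In double precision, the moderate step size should give the smallest error of the three: the large step suffers from truncation error and the very small step from round-off error.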

### Symbolic Differentiation
In *symbolic differentiation*, expressions are manipulated automatically to obtain the required derivatives. At its heart, *symbolic differentiation* applies transformations that capture the various rules of differentiation in order to manipulate the expressions. However, *symbolic differentiation* requires careful and sometimes ingenious design, as careless manipulation can easily produce large expressions that consume a lot of computational power and time, a problem known as expression swell.

### Automatic Differentiation
As seen above, both *numerical differentiation* and *symbolic differentiation* have their respective issues when it comes to computing derivatives. These issues are exacerbated when calculating higher-order derivatives, where both errors and complexity increase. *Automatic differentiation* overcomes these issues by recognizing that the evaluation of any function, no matter how complicated, can be broken into a sequence of steps, each of which is either an elementary arithmetic operation (addition, subtraction, multiplication, division) or an elementary function (sin, sqrt, exp, log, etc.). To track the evaluation of each step, *automatic differentiation* produces computational graphs and evaluation traces. To compute the derivatives, *automatic differentiation* applies the chain rule repeatedly at every step. By taking a stepwise approach and using the chain rule, *automatic differentiation* circumvents the issues encountered by both *numerical differentiation* and *symbolic differentiation* and computes derivatives accurately, to machine precision. To further understand *automatic differentiation*, we present its mathematical background and essential ideas in the next section.

Note - In our research of automatic differentiation, we referred to the following resources:

Baydin, A.G., Pearlmutter, B.A., Radul, A. A. & Siskind, J.M. (2018). Automatic differentiation in machine learning: A survey. *Journal of Machine Learning Research, 18*, 1-42.

Geeraert, S., Lehalle, C.A., Pearlmutter, B., Pironneau, O. & Reghai, A. (2017). Mini-symposium on automatic differentiation and its applications in the financial industry. *ESAIM: Proceedings and Surveys* (pp. 1-10).

Berland, H. (2006). *Automatic differentiation* [PowerPoint Slides]. Retrieved from http://www.robots.ox.ac.uk/~tvg/publications/talks/autodiff.pdf

Rufflewind (2016). Reverse-mode automatic differentiation: a tutorial. Retrieved Nov 19, 2019, from https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation

# Background

As mentioned before, *automatic differentiation* employs a stepwise approach and chain rule to automatically compute derivatives. We shall first state the chain rule in calculus before showing an example production of an evaluation trace and computational graph. Next, we discuss one mode of *automatic differentiation*, namely the forward mode. In particular, the demonstration of the use of chain rule at each step to determine derivatives will be shown here. Finally, we touch on the use of dual numbers in *automatic differentiation*. 

### Chain Rule 
For a function $f(u(t),v(t))$, the chain rule is given by

$$
\begin{align}
 \frac{\partial f}{\partial t} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial t} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial t}
\end{align}
$$

The chain rule is essential for automatic differentiation as the forward mode applies the chain rule repeatedly at each step of the evaluation trace in order to determine the derivatives at each step (see below).
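
As a short worked illustration (the function here is our own choice, not one used elsewhere in this document), take $f(u,v) = uv$ with $u(t) = \sin(t)$ and $v(t) = t^2$. Applying the chain rule term by term gives

$$
\begin{align}
 \frac{\partial f}{\partial t} &= \frac{\partial f}{\partial u}\frac{\partial u}{\partial t} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial t} \\
 &= v\cos(t) + u \cdot 2t \\
 &= t^2\cos(t) + 2t\sin(t)
\end{align}
$$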

### Example Production of Evaluation Trace & Computational Graph
The most straightforward way to show the generation of an evaluation trace and computational graph is to consider an example. For this purpose, we study the following function 

$$
f(x,y) = sin(x) + 4y
$$

#### Evaluation Trace
The evaluation trace breaks the function into individual steps and builds up the function starting from the input variables. At each step, exactly one elementary arithmetic operation (addition, subtraction, multiplication, division) or one elementary function (sin, sqrt, exp, log, etc.) is used to build the function for the next step. The evaluation trace for our function of interest is shown in the table below.

| Trace | Elementary Function | Current Value | Comment               | 
| :---: | :-----------------: | :-----------: | :-------------------: | 
| $x_{1}$ | $x_{1}$           | $x$           | Input x               |
| $x_{2}$ | $x_{2}$           | $y$           | Input y               |
| $x_{3}$ | $sin(x_{1})$      | $sin(x)$      | Elementary function   |
| $x_{4}$ | $4*x_{2}$         | $4y$          | Elementary arithmetic |
| $x_{5}$ | $x_{3}+x_{4}$     | $sin(x) + 4y$ | Elementary arithmetic |


#### Computational Graph 
The computational graph translates the essence of the evaluation trace into a graph and captures the relationship between each step. Refer to the figure below for the computational graph of our function of interest.  

![computational-graph](Computational_Graph.png)

### Forward Mode
Armed with the knowledge of the chain rule, evaluation trace and computational graph, we can now consider the forward mode of *automatic differentiation*. The table below shows the earlier evaluation trace table that has now been expanded to include columns that store derivatives. At each step, the chain rule is applied to the elementary function to determine the elementary function derivative.

For instance, 

Trace $x_{3}$

$$
\begin{align}
\dot{x_{3}} &= \frac{\partial{sin(x_{1})}}{\partial{x_{1}}} \dot{x}_{1} \\
&= cos(x_{1})\dot{x}_{1}
\end{align} 
$$

Trace $x_{5}$
$$
\begin{align}
\dot{x_{5}} &= \frac{\partial{(x_{3}+x_{4})}}{\partial{x_{3}}} \dot{x}_{3} +  \frac{\partial{(x_{3}+x_{4})}}{\partial{x_{4}}} \dot{x}_{4} \\
&= \dot{x}_{3}+\dot{x}_{4}
\end{align} 
$$

| Trace | Elementary Function | Current Value | Elementary Function Derivative | $\nabla_{x}$ Value  | $\nabla_{y}$ Value  | 
| :---: | :-----------------: | :-----------: | :--------------------------: | :---------------------: | :---------------------: | 
| $x_{1}$ | $x_{1}$       | $x$           | $\dot{x}_{1}$             | $1$      | $0$ |
| $x_{2}$ | $x_{2}$       | $y$           | $\dot{x}_{2}$             | $0$      | $1$ |
| $x_{3}$ | $sin(x_{1})$  | $sin(x)$      | $cos(x_{1})\dot{x}_{1}$   | $cos(x)$ | $0$ |
| $x_{4}$ | $4*x_{2}$     | $4y$          | $4\dot{x}_{2}$            | $0$      | $4$ |
| $x_{5}$ | $x_{3}+x_{4}$ | $sin(x) + 4y$ | $\dot{x}_{3}+\dot{x}_{4}$ | $cos(x)$ | $4$ |

As seen from the table above, the derivatives of elementary functions such as $sin$ must be supplied manually, which has implications for the design of our *automatic differentiation* package. Specifically, we need a separate definition for each elementary function together with its derivative. For more details, refer to the Implementation section below.

In addition, the first and second rows have initial values for ($\nabla_{x}$, $\nabla_{y}$) of (1,0) and (0,1) respectively. These are the seed values for the stepwise propagation of the derivatives. The forward mode calculates the dot product between the gradient of our function and the seed vector (i.e., the directional derivative). In this case, we have a scalar function of two variables; for a vector function of vectors, the forward mode calculates the product of the Jacobian matrix ($J$) and the seed vector ($p$) (i.e., $J \cdot p$).
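
The trace in the table above can be written out step by step in plain Python. This is only a sketch for the example function $f(x,y) = sin(x) + 4y$; the function name `forward_trace` and the evaluation point are our own illustrative choices, not part of the package.

```python
import math

def forward_trace(x, y, seed):
    """Evaluate f(x, y) = sin(x) + 4y and its directional derivative.

    `seed` is the pair (dx, dy) that initializes the derivatives of the inputs.
    """
    x1, dx1 = x, seed[0]                         # input x
    x2, dx2 = y, seed[1]                         # input y
    x3, dx3 = math.sin(x1), math.cos(x1) * dx1   # elementary function
    x4, dx4 = 4 * x2, 4 * dx2                    # elementary arithmetic
    x5, dx5 = x3 + x4, dx3 + dx4                 # elementary arithmetic
    return x5, dx5

value, df_dx = forward_trace(0.0, 2.0, seed=(1, 0))  # seed selects d/dx
_, df_dy = forward_trace(0.0, 2.0, seed=(0, 1))      # seed selects d/dy
print(value, df_dx, df_dy)
```

Each seed vector picks out one directional derivative, so recovering the full gradient of a function of two variables takes two forward passes.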

### Dual Numbers
Dual numbers extend the real number line in another direction by adding a second component. This extension is analogous to the extension of real numbers by imaginary numbers. The general form of a dual number is given by

$$ x = a + \epsilon b, $$

where $\epsilon$ is defined as $\epsilon^2 = 0$, $a$ is the real part and $b$ is the dual part of the dual number.

In our *automatic differentiation* package, we can define a dual class that has two attributes. One of these attributes stores the value of the function while the other stores the value of the derivatives. This is similar to having a dual number with the value of a function as the real part and the value of derivatives as the dual part. Having such a dual number structure allows us to carry out the expected arithmetic operations between two dual instances.

#### Addition

$$ 
\begin{align}
(x +\epsilon \dot{x}) + (y +\epsilon \dot{y}) &= (x+y) + \epsilon(\dot{x}+\dot{y})
\end{align}
$$ 

#### Subtraction

$$ 
\begin{align}
(x +\epsilon \dot{x}) - (y +\epsilon \dot{y}) &= (x-y) + \epsilon(\dot{x}-\dot{y})
\end{align}
$$ 

#### Multiplication

$$ 
\begin{align}
(x +\epsilon \dot{x})*(y +\epsilon \dot{y}) &= xy+\epsilon x\dot{y}+\epsilon \dot{x}y+\epsilon^2\dot{x}\dot{y}\\
&= xy + \epsilon(x\dot{y} + \dot{x}y)
\end{align}
$$ 

#### Division

$$ 
\begin{align}
(x +\epsilon \dot{x}) / (y +\epsilon \dot{y}) &= \frac{(x +\epsilon \dot{x})(y -\epsilon \dot{y})}{(y +\epsilon \dot{y})(y - \epsilon \dot{y})} \\
&= \frac{xy-\epsilon x\dot{y}+\epsilon \dot{x}y-\epsilon^2\dot{x}\dot{y}}{y^2-\epsilon^2\dot{y}^2} \\
&= \frac{xy + \epsilon(-x\dot{y} + \dot{x}y)}{y^2} \\
&= \frac{x}{y} + \frac{\epsilon(y\dot{x}-x\dot{y})}{y^2} 
\end{align}
$$ 
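
The four rules above translate directly into operator overloads. The following is a minimal sketch with a hypothetical class `Dual`; it is not the package's `DualNumber` class and it only supports operations between two `Dual` instances.

```python
class Dual:
    """Minimal dual number a + eps*b (illustrative; not the package's DualNumber)."""

    def __init__(self, real, dual=0.0):
        self.real = real
        self.dual = dual

    def __add__(self, other):
        return Dual(self.real + other.real, self.dual + other.dual)

    def __sub__(self, other):
        return Dual(self.real - other.real, self.dual - other.dual)

    def __mul__(self, other):
        # Product rule: the eps^2 term vanishes
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)

    def __truediv__(self, other):
        # Quotient rule, matching the last line of the division derivation
        return Dual(self.real / other.real,
                    (other.real * self.dual - self.real * other.dual)
                    / other.real ** 2)

x = Dual(3.0, 1.0)   # dual part 1 marks x as the variable of differentiation
y = Dual(2.0, 0.0)   # dual part 0 treats y as a constant
p = x * x + y        # x^2 + 2
q = x / (x * x)      # 1/x
print(p.real, p.dual, q.real, q.dual)
```

The dual part of each result carries the derivative with respect to `x`, so `p.dual` corresponds to $2x$ and `q.dual` to $-1/x^2$ evaluated at $x = 3$.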

In sum, this section covers the mathematical background and essential ideas of *automatic differentiation* for a scalar function with two variables. These basic concepts can be extended easily to higher dimensions if needed. In fact, our *automatic differentiation* package will not only handle scalar functions of scalar and vector values, but also vector functions of vectors.

# How to Use Package

## Installation

To begin, the user has to work in a `python` environment (preferably version >= 3.7). It is advisable for the user to create a new virtual environment for interacting with our package. To create a new virtual environment, enter the following command in the terminal:

`conda create -n env_autodiff python=3.7`

After which, activate the environment with the following command:

`conda activate env_autodiff`

Since we have used PyPI to host our package, users can download our Automatic Differentiation package with the following command in the terminal:

`pip install autodiffing`

As we have set up the pip package in a way such that the required dependencies will be installed during `pip install autodiffing`, users need not worry about not having the required dependencies when using the Automatic Differentiation package. 

If the required dependencies are not installed during `pip install`, users can refer to the contents of `requirements.txt` below and install the main dependencies manually. Alternatively, users can visit https://pypi.org/project/autodiffing/#files to download the latest gzipped tar file and extract its contents to get the `requirements.txt` file. In the extracted folder containing `requirements.txt`, run the following command in the terminal to install the required dependencies:

`pip install -r requirements.txt`

Our `requirements.txt` lists a number of packages that come with the installation of `python` version 3.7 alongside our main packages. The main packages that our Automatic Differentiation package requires are:

`numpy==1.17.4`\
`matplotlib==3.1.1`\
`scipy==1.3.2`

`numpy` is essential for our Automatic Differentiation package as we require it for the calculation of our elementary functions, and for dealing with arrays and matrices when there are vector functions and vector inputs.

`matplotlib` is needed for any potential visualization of our outputs.

`scipy` is a good package to have for its optimization and linear algebra abilities.

## Using the Package

Once users have installed all the dependencies and the package itself, they may begin to use our package to quickly find derivatives of functions.  For this section, we walk through three different examples of how users can interact with the package for their purposes.  Users should start by importing the DualNumber and ElementaryFunctions modules.  

```python
from AD.DualNumber import DualNumber
from AD import ElementaryFunctions as EF
```

As specified in Milestone 1 and in this document, users should start by initializing a DualNumber object:

```python
# DualNumber is a class in module AD, user must initialize the value of the variable in the initialization.
x = DualNumber(5)
```

Note that when the user initializes a variable, he or she _must_ provide the initial value.  For this milestone, the user may only pass in scalar-valued functions of scalars, but in the future we will implement methods and classes for the user to input vector-valued functions of vectors.  The initial value of the derivative is set as a default of 1, but the user may overwrite this. To find the derivative of a specific elementary function, users should pass this initialized object into our custom-designed elementary functions as follows:
```python
# ElementaryFunctions (imported as EF) is a module that defines the elementary functions and their derivatives.
func = EF.Sin(x)

# We can get the value and derivative from the attributes "val" and "der". If we did not assign the value and
# derivative direction in the first step, we can do it here.
print(func.val)
print(func.der)
```



However, it is more likely that users will want to use our class for more complicated uses like finding the roots of a function, so our demo code will show this below (note that this code is also included in our test suite).  Here we wish to find a root of the function

$$y = \sin(x)\cos(x)\tan(x) - 2\exp(x)\log(x)\sqrt{x}$$


```python
# Note: the sys.path manipulation is only needed for the Jupyter notebook hosted on GitHub;
# users who installed the package should import ElementaryFunctions and DualNumber directly.
# DualNumber.py contains a class called DualNumber, which the user should import.
import sys
from scipy.optimize import fsolve
import numpy as np
sys.path.insert(0, '../AutoDiff/AD/')

import ElementaryFunctions as EF
from DualNumber import DualNumber as DN

def given_function(x):
    x = DN(x)
    y = EF.Sin(x)*EF.Cos(x)*EF.Tan(x) - EF.Exp(x)*EF.Log(x)*EF.Sqrt(x)*2
    return y

x0 = 1.2
max_iter = int(1e4)  # maximum number of iterations before stopping
threshold = 1e-12
x = x0

# Newton's method: x <- x - f(x)/f'(x), with f'(x) supplied by the package
for i in range(max_iter):
    result = given_function(x)
    delta_x = -result.val / result.der
    if abs(delta_x) < threshold:
        break
    x = x + delta_x
print('The found root of 0=sin(x)cos(x)tan(x)-2exp(x)log(x)sqrt(x) is', x)

# Cross-check against scipy's numerical solver
def f(x):
    return np.sin(x)*np.cos(x)*np.tan(x) - np.exp(x)*np.log(x)*np.sqrt(x)*2
scipy_sol = fsolve(f, [10])
print('The numerical solution given by scipy is {}'.format(scipy_sol))
```

```
The found root of 0=sin(x)cos(x)tan(x)-2exp(x)log(x)sqrt(x) is 1.1321821619751826
The numerical solution given by scipy is [1.13218216]
```


Similarly, if the user simply wishes to find the derivative of a function, he or she may interact with the class as follows:

```python
# initialize a value for the variable
x = DN(5)
function = EF.Sin(x)/x

print(function.der)
```

```
0.0950894080791708
```


Users have access to a wide range of elementary functions, including, but not limited to, `Sin`, `Cos`, `Tan`, `Exp`, `Pow`, `ArcSin`, `Log`, `Sqrt`, `ArcCos`, and `ArcTan`.  `Pow` enables users to write polynomial functions of variables, such as $x^5$.

# Software Organization
This section gives a high-level overview of our software organization, including our modules, test suites, and directory structure.  

## Directory Structure
At the moment, our directory structure is as follows:

```
AutoDiff/
│   README.md
│   LICENSE
│   setup.py
│   driver.py
│
└───AD/
    │   DualNumber.py
    │   ElementaryFunctions.py
    │   tests.py
```

## Modules

`DualNumber.py` contains a class to hold the basic DualNumber object, which the user interacts with to initialize a variable.  This class also overloads basic operators, such as addition, subtraction, division, and multiplication, among others.

`ElementaryFunctions.py` is a module with functions defined for each of the elementary functions.  The user can import this module and interact with it by using the functions defined in the module for root-finding, setting derivatives, and other applications.

The `tests.py` module contains our unit tests for `DualNumber.py` and `ElementaryFunctions.py`, which uses the `pytest` module to test for functionality (since `ElementaryFunctions.py` depends on `DualNumber.py`, we combined the test suites).  We also implemented doctests, but we will go into further detail on this in the next section.

Lastly, we have a file `driver.py` which contains an example use of the package for finding derivatives.


## Testing
Users who wish to run the test suites may do so by typing ```pytest AutoDiff/AD/tests.py``` into the terminal to run our test suites of the DualNumber class, for example.  We have integrated our tests with Travis and codecov, and these badges live within the README.md.  We have also provided doctests in our elementary functions and DualNumber() class based on [doctest](https://docs.python.org/3/library/doctest.html) as described in lecture.  To run these and test for code coverage, just use ```pytest --doctest-modules --cov --cov-report term-missing tests.py```, for example.

## Demo 
Lastly, our `driver.py` file contains an example use of the package for root finding using Newton's method.  Users may use this demo to run their own root-finding algorithms for any function $f: \mathbb{R} \rightarrow \mathbb{R}$!

# Implementation details

## Classes and Data Structures


### DualNumber()
The DualNumber() class within the DualNumber module makes up the core of our setup. The user interacts with the class to set up scalar-valued variables (and, in the future, vector-valued variables) that will later be passed into functions. When constructing a DualNumber object, the user should always specify a value for the object as follows:

    x = DualNumber(5)

DualNumber will then store the variable value, 5, in the DualNumber data structure. This initialization will also store the derivative as `x._der = 1`.  The user may override this by initializing x as:

    x = DualNumber(5,10)
    
This would initialize the derivative of the variable to 10.  Note that while we give users access to this functionality, we primarily use it internally, when we need to return a new DualNumber whose derivative is not 1.  For a more concrete case, consider the following example:

    x = DualNumber(5)
    y = x + x
    
Then, when we initialize y internally, we will set `y = DualNumber(x._val + x._val, x._der + x._der)`.  To generally protect the user from manipulating the values and derivatives of DualNumber objects, we have made these attributes private with the leading underscore (although this is not actually private, just Python 'private').  We have used the `@property` decorator around the `val()` and `der()` methods, however, which enable the user to access the value and derivative of their variable with `y.val` and `y.der`, while still keeping the attributes themselves private.
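
The read-only access pattern described above can be sketched as follows. The class name `ReadOnlyDual` is a hypothetical stand-in used only for illustration; the actual `DualNumber` class may differ in its details.

```python
class ReadOnlyDual:
    """Hypothetical stand-in showing the read-only property pattern."""

    def __init__(self, val, der=1):
        self._val = val   # 'private' by convention only
        self._der = der

    @property
    def val(self):
        return self._val

    @property
    def der(self):
        return self._der

y = ReadOnlyDual(10, 2)
print(y.val, y.der)

try:
    y.val = 3          # no setter is defined, so assignment fails
    assigned = True
except AttributeError:
    assigned = False
print(assigned)
```

Because no setter is defined, `y.val` and `y.der` can be read but not reassigned, although `y._val` remains reachable for anyone who deliberately bypasses the convention.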

Aside from initialization, the DualNumber class overloads many of the basic arithmetic operators, including, but not limited to, addition, subtraction, division, multiplication, and negation (technically a unary operator).

To demonstrate the functionality of this class, let's focus on the overloaded `__add__` operator. Note that there are two cases to consider: (1) the case in which both operands are DualNumbers and (2) the case in which one is a DualNumber and the other is a float or an int.  To account for these cases, we have used the following design:

```python
    def __add__(self, other):
        try:
            # Case 1: both operands are DualNumbers
            val2 = self.val + other.val
            der2 = self.der + other.der
            return DualNumber(val2, der2)
        except AttributeError:
            # Case 2: other is a real constant, whose derivative is zero
            assert isinstance(other, (float, int)), "Check the type of objects in function!"
            val2 = self.val + other
            der2 = self.der
            return DualNumber(val2, der2)
```
            
The case in which both are DualNumbers is relatively straightforward: we simply add the values and the derivatives.  In the case in which the other operand is not a DualNumber, however, we must check that it is a constant (either int or float) and then add the values only, since the derivative of a constant is zero.  We implement this check through a try-except design.

We used this as a simple but instructive example of the principles guiding our implementation for the DualNumber() class.  With the other basic operators, we use the same idea but have different rules for updating the value and derivative (use product rule for multiplication, for example).

### ElementaryFunctions
We also implement a module ElementaryFunctions in which we give users the ability to define functions of their variables. Note that we should be careful about how the function is called when we have multiple variables. For example, suppose the user defines:
    
```python
import AD.ElementaryFunctions as EF
from AD.DualNumber import DualNumber

a = DualNumber(5)
b = EF.Exp(a)
c = EF.Cos(EF.Sqrt(a)) + b**2
```

If we were to trace how our package is calculating derivatives and values, note that in the first step, `a` is initialized to a value of 5.  The user then creates a new variable `b` which is `Exp(a)`.  If we were to write this mathematically, we'd have that the user has set

$$a=5, b=e^a$$

Then, `c` is defined as a more complicated combination of these two variables.  The user can access the value and derivative of this variable using `c.val` and `c.der` (and again, the actual attributes are hidden, the user accesses these values from a getter method in the DualNumber class with the property decorator).
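
For reference, the derivative that `c.der` should reproduce can be worked out by hand, assuming the default seed `a.der = 1`:

$$
\begin{align}
c &= \cos(\sqrt{a}) + b^2 = \cos(\sqrt{a}) + e^{2a} \\
\frac{dc}{da} &= -\frac{\sin(\sqrt{a})}{2\sqrt{a}} + 2e^{2a}
\end{align}
$$

Evaluating the second line at $a = 5$ gives the value that forward mode accumulates step by step through `Sqrt`, `Cos`, `Exp` and the power operator.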

Note that `c` is returned as a DualNumber object, as any of the elementary functions should return an object with the value and the derivative of the new variable.  Let's look at one of the elementary functions in ElementaryFunctions:

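```python
def Sin(x):
    if data_type_check(x) == 0:
        # DualNumber input: chain rule, d/dx sin(u) = cos(u) * u'
        return DualNumber(np.sin(x._val), np.cos(x._val) * x._der)
    else:
        # Real-number input: a constant, so the derivative is 0
        return DualNumber(np.sin(x), 0)
```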

This is fairly straightforward: the function returns a DualNumber() whose value is the sine function evaluated at the input and whose derivative is the derivative of the sine function at that point, multiplied by the derivative of the input.  The function `data_type_check(x)` plays a role similar to the `assert` condition in the DualNumber class: it checks that the input x is either a DualNumber (returning 0) or a float/int (returning 1).  If a non-numeric string is entered, for example, `data_type_check(x)` raises an error.

```python
def data_type_check(x):
    try:
        float(x._val) + float(x._der)
        return 0  # returns 0 if x is a DualNumber
    except AttributeError:
        try:
            float(x)
            return 1  # returns 1 if x is real
        except (TypeError, ValueError):
            # float() rejects the input, so it is neither dual nor real
            raise AttributeError('Input must be dual number or real number!')
```

In summary, these two modules, ElementaryFunctions and DualNumber, form the crux of our implementation.  Our core data structure is the DualNumber object, which overloads basic arithmetic operators and stores the private attributes `_val` and `_der`.  While these attributes are private, the user may access them with a getter method which is decorated with `@property`.  ElementaryFunctions is a module which defines a set of functions for the user to interact with.  It depends on DualNumber() and returns DualNumber objects.

### External Dependencies
All the dependencies are listed in `setup.py` (to be moved to `requirements.txt` in Milestone 3, since this was not required for the current milestone).  We use `numpy` to calculate the values and derivatives of functions in the ElementaryFunctions module.  We also use `pytest` for our test suite, which has further external dependencies.  We do include `requirements.txt`, but we do not encourage users to rely on it.

# Future Features

Future features for our Automatic Differentiation package include taking in vector inputs and vector functions and implementing reverse mode for automatic differentiation. For each of these future features, their required software changes and primary challenges are elaborated below.


### Vector Inputs 

To deal with vector inputs and vector functions, we build on our current package, which handles only scalar functions of scalar inputs. Specifically, we first create scalar functions that can deal with vector inputs before considering vector functions. A new class `scalar_func` is created that inherits from the `DualNumber` class. In this class, we define new methods that have their equivalents in the `DualNumber` class and determine the values of the scalar function and its derivatives by looping over the array of vector inputs.

```python
import numpy as np
from AD.DualNumber import DualNumber

class scalar_func(DualNumber):
    # Planned design: deals with scalar functions of vector inputs
    def __init__(self, vector_inputs, seed_vector):
        # check dimension: one seed entry per input variable
        assert len(seed_vector) == len(vector_inputs)
        self._inputs = vector_inputs
        self._seed = np.asarray(seed_vector, dtype=float)

    @property
    def val(self):
        # collect the value of every input DualNumber
        return np.array([i.val for i in self._inputs])

    def Jacobian(self):
        # calculate the Jacobian vector given the vector_inputs
        return np.array([i.der for i in self._inputs])

    @property
    def der(self):
        # directional derivative: Jacobian dotted with the seed vector
        return np.dot(self.Jacobian(), self._seed)
```

### Vector Functions

Vector functions build on the work that is done for scalar and vector inputs. A new class `vector_func` is created that inherits from the `scalar_func` class. In this class, we define new methods that have their equivalents in the `scalar_func` class, and the general approach is to loop over the list of scalar functions while applying the equivalent methods in the `scalar_func` class to determine the outputs for the vector functions. For instance, in the `Jacobian` method, we calculate the Jacobian matrix by simply looping over the list of scalar functions and applying the `Jacobian` method to each scalar function. The result for the derivatives is determined using the dot product between the Jacobian matrix and the seed vector.
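
The row-by-row construction of the Jacobian can be sketched in plain Python. In this sketch the gradients are hand-written stand-ins for what each scalar function's `Jacobian` method would return, and the helper names `jacobian` and `jacobian_vector_product` are hypothetical.

```python
import math

def jacobian(gradient_functions, point):
    """Build the Jacobian row by row from each component's gradient."""
    return [grad(*point) for grad in gradient_functions]

def jacobian_vector_product(J, p):
    """Dot each Jacobian row with the seed vector p."""
    return [sum(j * s for j, s in zip(row, p)) for row in J]

# Vector function F(x, y) = (sin(x) + 4y, x * y), one gradient per component
gradients = [
    lambda x, y: [math.cos(x), 4.0],  # gradient of sin(x) + 4y
    lambda x, y: [y, x],              # gradient of x * y
]

J = jacobian(gradients, (0.0, 2.0))
Jp = jacobian_vector_product(J, [1.0, 0.0])  # seed selects d/dx
print(J, Jp)
```

With the seed vector $(1, 0)$, the product picks out the first column of the Jacobian, that is, the derivative of every component with respect to $x$.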

The primary challenge for the implementation of both vector inputs and vector functions is designing the code so that users can interact with our package in the most straightforward and easily understood manner. For example, the team is considering collapsing both `scalar_func` and `vector_func` into a single `func` class so that the user only has to call one class when defining functions. In addition, we have to handle the case in which users create a function using `DualNumber` directly (i.e., without using our classes for functions). In that case, perhaps we should make `DualNumber` a private class and let users use `func` directly to create variables (i.e., treat variables as scalar functions of scalar inputs).

### Reverse Mode

The reverse mode is fundamentally different from the forward mode in its approach to automatic differentiation. In particular, the reverse mode consists of a forward pass and a reverse pass, with no chain rule applied in the forward pass (only partial derivatives are stored). The result of the reverse mode is only determined after the reverse pass is done, and the derivative value of each variable (parent node) depends on the values of its child nodes. This has three important implications for the design of our package for reverse mode.

Firstly, the reverse mode cannot be interpreted in the context of dual numbers like the forward mode, so we need a different class for its implementation. Since the result cannot be calculated until the reverse pass is done and each variable in the reverse mode depends on the values of its children, we need to instantiate a reverse mode object/variable with an empty list that will temporarily hold the partial derivative values of its children during the forward pass. Note that we need a list here because a parent node can have more than one child node.

```python
class ReverseVar:
    def __init__(self, value):
        self._value = value  # value of variable at which the derivative is determined
        self._children = []  # empty list to contain partial derivatives of children during forward pass
        self._der = None     # value not determined until reverse pass is done

    def val(self):
        return self._value
```

Secondly, as the forward pass only computes partial derivatives and does not apply the chain rule, we need to redefine the overloading of operators for our reverse mode objects/variables. As an example, the overloading of the multiplication operator is shown below. Note that overloading the operators is in essence equivalent to carrying out the forward pass, and the partial derivatives are stored as temporary items within `self._children` for evaluation later during the reverse pass.

```python
class ReverseVar:
    def __init__(self, value):
        self._value = value  # value of variable at which the derivative is determined
        self._children = []  # empty list to contain partial derivatives of children during forward pass
        self._der = None     # value not determined until reverse pass is done

    def val(self):
        return self._value

    def __mul__(self, other):
        z = ReverseVar(self._value * other._value)
        # store (partial derivative of z with respect to this operand, child node z)
        self._children.append((other._value, z))
        other._children.append((self._value, z))
        return z
```
    
Lastly, we will define a method `der` to carry out the reverse pass recursively in order to calculate the value of the derivatives.
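
One possible shape for such a recursive `der` method is sketched below, following the approach in the Rufflewind tutorial cited in the Introduction. The class name `ReverseVarSketch`, the added `__add__` overload and the manual seeding of the output node are illustrative assumptions, not the final design.

```python
class ReverseVarSketch:
    """Hypothetical sketch of a reverse-mode variable with a recursive `der`."""

    def __init__(self, value):
        self._value = value
        self._children = []   # (partial derivative, child node) pairs
        self._der = None      # filled in during the reverse pass

    def __add__(self, other):
        z = ReverseVarSketch(self._value + other._value)
        self._children.append((1.0, z))
        other._children.append((1.0, z))
        return z

    def __mul__(self, other):
        z = ReverseVarSketch(self._value * other._value)
        self._children.append((other._value, z))
        other._children.append((self._value, z))
        return z

    def der(self):
        # Chain rule applied backwards: sum over children of
        # (d child / d self) * (d output / d child), cached after first use
        if self._der is None:
            self._der = sum(weight * child.der()
                            for weight, child in self._children)
        return self._der

x = ReverseVarSketch(2.0)
y = ReverseVarSketch(3.0)
f = x * y + x          # forward pass records the partial derivatives
f._der = 1.0           # seed the output node before the reverse pass
print(x.der(), y.der())
```

A single reverse pass from the seeded output yields the derivatives with respect to both inputs, which is the main attraction of reverse mode for functions with many inputs.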

The primary challenge for reverse mode is to ensure that its classes, methods and attributes are kept separate from those of forward mode, even though we want the two modes to share certain similarities.

### Visualization

When users interact with our package, it would be useful for them to have a way of visualizing the ongoing calculations or final results. We hope to include code that prints the status of ongoing calculations and meaningful results. In addition, we hope to define a new method called `post_process`, which will be available in those classes of our package where visual outputs of either key calculations or important results are possible. For instance, `post_process` within the reverse mode class might produce tables of the forward pass and reverse pass. The method `post_process` will primarily use the `matplotlib` library and take `directory_out` as an argument for users to indicate the directory in which they wish to save the visualization outputs.