# Introduction: 

Our software implements forward-mode automatic differentiation, which gives accurate and easy gradient calculation for a given function. Automatic differentiation is used in many areas, such as backpropagation in neural networks (which relies on reverse mode), Jacobian calculation, gradient calculation, finding the local maxima and minima of functions (Newton's method), and optimization. With automatic differentiation, it is possible to calculate a derivative to machine precision in a computationally efficient manner. This avoids the complexity of symbolic differentiation while giving better accuracy than numerical differentiation.

# Background:

### Chain Rule

The chain rule lets us express the derivative of a composite function in terms of the derivatives of its parts. For a composite function $f(g(x))$, it reads:
$\frac{df}{dx}=\frac{df}{dg}\frac{dg}{dx}$
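
As a small worked instance of the rule (the functions here are an illustrative choice), let $f(g) = \sin(g)$ and $g(x) = x^2$. Then $\frac{df}{dg} = \cos(g)$ and $\frac{dg}{dx} = 2x$, so:

$\frac{df}{dx} = \cos(x^2) \cdot 2x$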

### Elementary operations
In forward automatic differentiation, a given function is first broken down into elementary functions. Elementary functions include multiplication, addition, subtraction, division, and other basic functions. Once a function is decomposed into the elementary functions evaluated at each step, a graph can be generated where each node is a specific stage in the calculation and each edge is an elementary operation applied to a given node. Applying the chain rule along this graph makes it possible to calculate the gradients and the "contributions" of a given node.

### Primal and tangent traces
When examining this decomposed graph, the primal trace and tangent trace give us the intermediate results at each step of the differentiation calculation. The forward primal trace of a function computes the value of each intermediate variable $v_j$ at step $j$, while the tangent trace computes the derivative of that intermediate value, $D_pv_j$, at the same step.
### Seed Vector
To calculate a specific derivative, we use a seed vector. A seed vector $p$ is a "direction" vector: the tangent trace evaluates the directional derivative along $p$, which is the combination of partial derivatives weighted by the entries of $p$. For example, for a function $f(x_1, x_2)$, we could calculate $\frac{df}{dx_1}$ with a seed vector of $p = [1, 0]$ or $\frac{df}{dx_2}$ with a seed vector of $p = [0, 1]$.
### Example 
The following table shows a simple example, from a lab completed in class, of calculating the forward primal trace and the tangent trace, along with the tangent values for two seed vectors. The function is $f(x_1, x_2) = e^{-(\sin(x_1) - \cos(x_2))^2}$, and we evaluate it at $(\frac{\pi}{2}, \frac{\pi}{3})$:


  Forward Primal Trace   | Forward Tangent Trace |  p = $[1,0]^T$    | p = $[0,1]^T$
  -----------------------|-----------------------|-------------------|---------------
  v0 = x_1 = pi / 2      | Dpv0 = p1             | 1                 | 0
  v1 = x_2 = pi / 3      | Dpv1 = p2             | 0                 | 1
  v2 = sin(v0) = 1       | Dpv2 = cos(v0)*Dpv0   | 0                 | 0
  v3 = cos(v1) = 1/2     | Dpv3 = -sin(v1)*Dpv1  | 0                 | -sqrt(3) / 2
  v4 = v2 - v3 = 1/2     | Dpv4 = Dpv2 - Dpv3    | 0                 | sqrt(3) / 2
  v5 = v4 * v4 = 1/4     | Dpv5 = 2*v4*Dpv4      | 0                 | sqrt(3) / 2
  v6 = -1 * v5 = -1/4    | Dpv6 = -Dpv5          | 0                 | -sqrt(3) / 2
  v7 = e^v6 = e^(-1/4)   | Dpv7 = Dpv6 * e^v6    | 0                 | (-sqrt(3) / 2) * e^(-1/4)
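
The trace above can be reproduced with a few lines of plain Python. This is a minimal sketch that carries each primal value and its tangent side by side; the function name `trace` is a hypothetical helper for illustration and is not part of our package.

```python
import math

def trace(x1, x2, p):
    """Evaluate f(x1, x2) = exp(-(sin(x1) - cos(x2))**2) and its derivative along p."""
    # Each pair holds (primal value v_j, tangent value D_p v_j).
    v0, d0 = x1, p[0]
    v1, d1 = x2, p[1]
    v2, d2 = math.sin(v0), math.cos(v0) * d0
    v3, d3 = math.cos(v1), -math.sin(v1) * d1
    v4, d4 = v2 - v3, d2 - d3
    v5, d5 = v4 * v4, 2 * v4 * d4
    v6, d6 = -v5, -d5
    v7, d7 = math.exp(v6), math.exp(v6) * d6
    return v7, d7

# One forward pass per seed vector gives one partial derivative each.
value, df_dx1 = trace(math.pi / 2, math.pi / 3, (1, 0))
_, df_dx2 = trace(math.pi / 2, math.pi / 3, (0, 1))
print(value, df_dx1, df_dx2)
```

Because of floating-point rounding in `cos(pi / 2)`, the first partial derivative comes out as a value extremely close to zero rather than exactly zero.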

### Dual Numbers
Dual numbers are another important concept that is extremely useful in forward automatic differentiation. A dual number $z = a + b\epsilon$ consists of a real part $a$ and a dual part $b$ (much like a complex number consists of a real and an imaginary part), where $\epsilon \neq 0$ and $\epsilon^2 = 0$. Dual numbers have notable addition and multiplication properties: if $z_1 = a_1 + b_1\epsilon$ and $z_2 = a_2 + b_2\epsilon$, then $z_1 + z_2 = (a_1 + a_2) + (b_1 + b_2)\epsilon$ and $z_1 * z_2 = (a_1 * a_2) + (a_1b_2 + a_2b_1)\epsilon$. With respect to automatic differentiation, the real part can represent the function value, $a = f(x)$, and the dual part the derivative, $b = f'(x)$. With respect to traces, the real part represents the primal trace and the dual part the tangent trace. This makes it easy to calculate the function value and its derivative together at a given step, because the addition and multiplication properties above reproduce the sum and product rules, and evaluating a function at a dual number upholds the chain rule, as proven in lecture with a Taylor series expansion.
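
To make the arithmetic concrete, here is a minimal sketch of a dual number that supports only addition and multiplication, using $\epsilon^2 = 0$ so that the product's dual part is $a_1b_2 + a_2b_1$. The class name `Dual` is a toy stand-in chosen for this example; it is not the DualNumber class from our package, which overloads many more operations.

```python
class Dual:
    """Toy dual number a + b*eps with eps**2 == 0 (illustration only)."""

    def __init__(self, real, dual=0.0):
        self.real = real
        self.dual = dual

    def __add__(self, other):
        # Promote plain numbers to dual numbers with a zero dual part.
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.dual + other.dual)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a1 + b1*eps)(a2 + b2*eps) = a1*a2 + (a1*b2 + a2*b1)*eps
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)

    __rmul__ = __mul__


def f(x):
    return 3 * x * x + 2 * x + 1

# Seed the dual part with 1 to differentiate with respect to x.
z = f(Dual(2.0, 1.0))
print(z.real, z.dual)  # f(2) and f'(2)
```

The real part of the result carries $f(2)$ and the dual part carries $f'(2)$, both obtained from a single evaluation of `f`.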

### Jacobian 

Determining the Jacobian (defined below) requires computing and evaluating derivatives of a function at a given point; thus, forward automatic differentiation can be crucial to such calculations.

The Jacobian is the matrix of first-order partial derivatives of a given vector-valued function with respect to its input (independent) variables. It is often essential to algorithms such as Newton's method. As an example, take the two functions $x = u^2 - v^2$ and $y = u^2 + v^2$. The Jacobian of this system is:

$\left[\begin{array}{cc}
\frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\
\frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \\
\end{array}\right]$  

which yields:

$\left[\begin{array}{cc}
2u & -2v \\
2u & 2v \\
\end{array}\right]$
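
Forward mode builds this matrix one column at a time: each seed vector selects one input variable. The sketch below propagates tangents by hand for the example system; the names `system_with_tangent` and `jacobian` are hypothetical helpers for illustration, not functions from our package.

```python
def system_with_tangent(u, v, p):
    """Evaluate x = u^2 - v^2 and y = u^2 + v^2 with derivatives along seed p."""
    du, dv = p
    u2, du2 = u * u, 2 * u * du   # value and tangent of u^2
    v2, dv2 = v * v, 2 * v * dv   # value and tangent of v^2
    x, dx = u2 - v2, du2 - dv2
    y, dy = u2 + v2, du2 + dv2
    return (x, y), (dx, dy)


def jacobian(u, v):
    # One forward pass per input variable; each pass yields one column.
    _, col_u = system_with_tangent(u, v, (1.0, 0.0))
    _, col_v = system_with_tangent(u, v, (0.0, 1.0))
    # Arrange the columns as rows [dx/du, dx/dv] and [dy/du, dy/dv].
    return [[col_u[0], col_v[0]],
            [col_u[1], col_v[1]]]


J = jacobian(3.0, 2.0)
print(J)
```

At $(u, v) = (3, 2)$ the entries match $2u$, $-2v$, $2u$, and $2v$ from the matrix above.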

### Additional Notes

Our group thinks that 3Blue1Brown gives a great visual explanation of the calculus behind backpropagation, which is closely related to automatic differentiation: https://www.youtube.com/watch?v=tIeHLnjs5U8

Here is an image from the video showing a computation graph that represents the operations in a trivial neural network: 
<div>
<img src="https://www.3blue1brown.com/content/lessons/2017/backpropagation-calculus/tree-extended.png" width="400">
</div>

# How to Use Our Package:

At this time, you will need to clone our repository to use the package. Then use `pip install` within the repository to install the requirements and the package itself. The commands below show the full installation from the command line.

```
git clone git@code.harvard.edu:CS107/team48.git
cd team48
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
```
After this, you should be able to use the package as shown below. The example evaluates the derivative (the Jacobian of a scalar function with a single input) at the specified `x`. The result is returned as an integer or float, depending on the value. We currently support the Jacobian and derivative calculation only for the single-input, single-function case; in the future, we will return an array.
```
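from team48_autodiff_package.AutoDiff import AutoDiff

# Function to differentiate: f(x) = 2x^2 + 1
def func(x):
    return 2*x**2 + 1

# Point at which to evaluate the derivative
x = 2

# Wrap the function, then call the wrapper at x to get the derivative
forward_diff_outs = AutoDiff(func)
print(forward_diff_outs(x))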
```

# Software Organization:

## Directory Structure and Basic Modules:
Our directory follows the structure below. Because we have started integrating the package with PyPI, the repository contains some additional files to make this possible. 

```
team48/
├── docs
│   └── milestone1
├── LICENSE
├── README.md
├── requirements.txt
├── pyproject.toml
├── tests
└── src 
    ├── team48_autodiff_package
    └── team48_autodiff_package.egg-info
```

Inside the src folder, the team48_autodiff_package folder holds most of our central modules:
  - dual.py: contains the DualNumber class and its overloaded operators and functions
  - AutoDiff.py: contains the AutoDiff class, which returns the derivative of a function at a given value, as shown in the example in the 'How to Use' section above
  - ad.py: helper functions used by the tests and by the AutoDiff class

## Tests and Running
Our tests are in the tests folder and are integrated into our workflow using GitHub workflows. We have 3 files, test_dual.py, test_forward_ad.py, and test_ad.py, that test the corresponding files in src/team48_autodiff_package. To run the tests, make sure you are in the root directory and have installed the necessary dependencies and packages (see the How to Use section), then run `pytest`. Our tests cover 94% of the code in src/team48_autodiff_package. 

Our package can be installed as specified in the `How to Use Our Package` section.

# Implementation:
Below are the two classes that we implemented in our package. The DualNumber class serves not only as a core class, but also as the core data structure necessary for effective forward automatic differentiation. Details about the classes are specified below.
- **DualNumber class**:
  - Attributes: 
    - real: a real number part of type float or int
    - dual: a dual number part of type float or int
  - Methods:
    - We overloaded the following elementary operations so that they work correctly on dual numbers, as explained in the Background section. The derivative is stored in the dual component of the DualNumber.
    - Overloaded methods:
      - `+`, `-`, `/`, `*`, `**`, `==`, `__neg__`
      - sin()
      - cos()
      - tan()
      - exp()
      - log()
      - sqrt()

- **AutoDiff class**: 
  - Attributes: 
    - func: the input function associated with the object
  - Methods:
    - `__call__`: an instance of the class can be called with a value (at this time a float or int) and returns the derivative of func evaluated at that value.

Our code also makes use of helper functions that can be found in src/team48_autodiff_package/ad.py. These 3 functions serve various testing and implementation purposes:

- **single_derivative_approximation(fx, x)**:
  - This function accepts a function and a value x (integer or float) and closely approximates the derivative at x using the backward difference $\frac{f(x) - f(x - h)}{h}$ where $h = 10^{-8}$.
  - This is used for testing purposes.
- **check_error(test_value, approx)**:
  - This function checks whether `test_value` (integer or float) is very close to `approx` (integer or float). It returns True if the difference between the two values is smaller than `1e-6*(1 + abs(approx))`, and False otherwise.
  - This is used in tests to compare single_derivative_approximation() against the derivative computed by our AutoDiff class to ensure correctness.
- **single_derivative(fx, x)**:
  - This function calculates the precise derivative of a function; x is a DualNumber and fx is a function. The derivative is returned.
  - This function is called by the `__call__` method of the AutoDiff class when an AutoDiff object is passed a value.
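
The two testing helpers can be sketched from their descriptions alone. The version below is an illustration, not a copy of the code in ad.py: the keyword argument `h` and the use of an absolute difference in `check_error` are assumptions made for this example.

```python
import math

def single_derivative_approximation(fx, x, h=1.0e-8):
    # Backward difference (f(x) - f(x - h)) / h; `h` as a keyword is an assumption.
    return (fx(x) - fx(x - h)) / h

def check_error(test_value, approx):
    # Tolerance grows with the magnitude of the reference value.
    return abs(test_value - approx) < 1e-6 * (1 + abs(approx))

# Compare the known derivative of sin at 1.0 against the approximation.
approx = single_derivative_approximation(math.sin, 1.0)
print(check_error(math.cos(1.0), approx))
```

Scaling the tolerance by `1 + abs(approx)` keeps the check meaningful for both small and large derivative values.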

We also have a multiVarForward.py module, currently in progress, that will implement multi-function and multi-value forward automatic differentiation.

Finally, there is a hack_solution.py script that enables the coverage check in our CI through GitHub workflows. It ensures that the tests cover more than 90% of the code. 

All requirements are listed in requirements.txt. The implementation of these classes requires numpy, which is installed when you follow the installation directions in the earlier section and run `pip install -r requirements.txt`.
    
# Upcoming features:
- **R^m -> R^n mapping**
  - We plan to modify the AutoDiff class to work with functions which accept and return vectors (Numpy array inputs and outputs)
- **Parsing a string into a function**
  - We plan to allow users to submit a function to create an AutoDiff object by passing a string if they so choose, using the eval() function to parse said string.
- **Higher order derivatives**
  - We plan to implement evaluation of higher order derivatives through an AutoDiff instance by evaluating the derivative multiple times.
## Reverse mode:
  - We plan to implement reverse mode evaluation of the derivative. For this, we will add the following classes:
- **Operation class**:
  - Attributes: 
    - Enum class where all elementary operations are assigned a number 
- **Node class**:
  - Attributes:
    - Optional Parameter Dict
    - operation: an Operation type variable representing the operation at the given node 
    - Value: Dual number value at a given "step" in the calculation
  - Methods:
    - getValue: get the DualNumber value at a given node
    - getOperation: get the given operation at a given node
- **Graph class**:
  - Attributes: 
    - Nodes: nodes in the graph of type node
    - Edges: adjacency list representing edges between different nodes
  - Methods:
    - reverseAutoDiff: backwards traversal of the graph to calculate the derivative with respect to the given values

The core data structures are the classes discussed above. Dual numbers consist of a real part and a dual (derivative) part. Each node consists of the operation that occurs at that node and parameters of dual number type that track the value and derivative during backpropagation (this is how we account for reverse mode). The Graph class holds the nodes discussed above and an adjacency list of edges between them to record their relationships; together these represent our computational graph and are necessary to conduct reverse automatic differentiation.
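
As a rough picture of how these planned classes could fit together, here is a minimal sketch of a reverse pass over a small graph. The class names and `reverseAutoDiff` follow the plan above; everything else is an assumption made for this example, including the `adjoint` field, the `_add` helper, the builder methods `input`, `add`, `mul`, and `sin`, and the use of plain floats instead of DualNumber values. The final design may differ.

```python
from enum import Enum
import math


class Operation(Enum):
    INPUT = 0
    ADD = 1
    MUL = 2
    SIN = 3


class Node:
    def __init__(self, operation, value):
        self.operation = operation
        self.value = value    # primal value from the forward pass
        self.adjoint = 0.0    # d(output)/d(node), filled in by the reverse pass


class Graph:
    def __init__(self):
        self.nodes = []
        self.edges = {}       # adjacency list: node index -> parent indices

    def _add(self, operation, value, parents=()):
        self.nodes.append(Node(operation, value))
        index = len(self.nodes) - 1
        self.edges[index] = list(parents)
        return index

    def input(self, value):
        return self._add(Operation.INPUT, value)

    def add(self, a, b):
        return self._add(Operation.ADD, self.nodes[a].value + self.nodes[b].value, (a, b))

    def mul(self, a, b):
        return self._add(Operation.MUL, self.nodes[a].value * self.nodes[b].value, (a, b))

    def sin(self, a):
        return self._add(Operation.SIN, math.sin(self.nodes[a].value), (a,))

    def reverseAutoDiff(self, output):
        for node in self.nodes:
            node.adjoint = 0.0
        self.nodes[output].adjoint = 1.0
        # Nodes are stored in evaluation order, so descending indices
        # visit every node after all of the nodes that depend on it.
        for index in range(output, -1, -1):
            node, parents = self.nodes[index], self.edges[index]
            if node.operation is Operation.ADD:
                for parent in parents:
                    self.nodes[parent].adjoint += node.adjoint
            elif node.operation is Operation.MUL:
                a, b = parents
                self.nodes[a].adjoint += node.adjoint * self.nodes[b].value
                self.nodes[b].adjoint += node.adjoint * self.nodes[a].value
            elif node.operation is Operation.SIN:
                (a,) = parents
                self.nodes[a].adjoint += node.adjoint * math.cos(self.nodes[a].value)


g = Graph()
x1, x2 = g.input(2.0), g.input(3.0)
out = g.add(g.mul(x1, x2), g.sin(x1))   # f = x1*x2 + sin(x1)
g.reverseAutoDiff(out)
print(g.nodes[x1].adjoint, g.nodes[x2].adjoint)
```

A single backward traversal yields the derivative with respect to every input, which is the main appeal of reverse mode for functions with many inputs and few outputs.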

We will also overload more operations for dual numbers by declaring custom versions of these functions in the DualNumber class, defaulting to the numpy versions of the functions for other inputs. We will use the numpy array and numpy matrix as fundamental data structures and extend numpy functionality, as it allows us to conveniently handle vector inputs and vector functions. Because handling vector input and numpy data structures is critical to our project, we will include the external package numpy. In addition to the functions we have already overloaded, we plan on overloading the following functions using numpy:
- Inverse trig functions
- Exponentials (any base)
- Hyperbolic functions
- Logarithms (any base)
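
Each of these functions needs a value rule and a derivative rule. The sketch below shows one representative from each family, with a `(real, dual)` tuple standing in for a DualNumber. The function names are hypothetical, and the standard `math` module is used in place of numpy only to keep the example standalone.

```python
import math

def arcsin(z):
    a, b = z
    # d/dx arcsin(x) = 1 / sqrt(1 - x^2)
    return math.asin(a), b / math.sqrt(1 - a * a)

def sinh(z):
    a, b = z
    # d/dx sinh(x) = cosh(x)
    return math.sinh(a), math.cosh(a) * b

def exp_base(z, base):
    a, b = z
    value = base ** a
    # d/dx base^x = base^x * ln(base)
    return value, value * math.log(base) * b

def log_base(z, base):
    a, b = z
    # d/dx log_base(x) = 1 / (x * ln(base))
    return math.log(a, base), b / (a * math.log(base))

# Seed the dual part with 1 to get the derivative at the given point.
print(arcsin((0.5, 1.0)))
print(exp_base((3.0, 1.0), 2.0))
```

In every rule the incoming dual part `b` multiplies the local derivative, which is the chain rule applied at that step.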

NetworkX is an external package that allows us to visualize graphs. We will use this package to visualize large graph structures. We intend to include this package out of convenience for the user, but it is more of a stretch goal.

To handle functions whose input dimensions differ from their output dimensions, we will include checks within nodes to ensure that the number of variables and the dimensions stay consistent with the defined function. We have discussed a reverseAutoDiff function above and will include it in our library.

If time permits, we may also implement a layer class for our graph to help visualize neural net propagation more effectively. 

# Licensing:
We will use the MIT License for our project. Because we are using numpy and possibly NetworkX, we shouldn't have to deal with any issues of patents. We are okay with people making and distributing closed source versions of our code.
    
You can find a copy of this license here: https://choosealicense.com/licenses/mit/