## Introduction

This package implements automatic differentiation (AD).

Automatic differentiation is useful in many fields, including but not limited to:

- Computing derivatives in iterative methods (e.g. Newton's method) for solving nonlinear systems
- Computing the gradient of an objective function in optimization
- Computing derivatives/gradients required by numerical methods for solving systems of differential equations

Automatic differentiation compares favorably with finite differences. Finite differences only approximate derivatives: they incur truncation error when the step size is large and round-off error when it is small, and they need an extra function evaluation for every input variable. AD instead applies exact derivative rules to each elementary operation, so its results are accurate to machine precision, and its cost is a small constant multiple of the cost of evaluating the function itself. Symbolic differentiation, another traditional approach, tends to produce rapidly growing expressions and does not scale well to vector-valued functions of many variables, which are common in real-world problems.
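The step-size problem of finite differences can be seen with a few lines of plain Python (this sketch does not use the package). It approximates $\frac{d}{dx}\sin(x)$ at $x = 1$ with a forward difference for three step sizes $h$ and compares each result with the exact value $\cos(1)$; the helper `forward_difference` is a hypothetical name used only for this illustration.

```python
import math


def forward_difference(f, x, h):
    """Approximate f'(x) with the one-sided difference (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


x0 = 1.0
exact = math.cos(x0)  # d/dx sin(x) = cos(x)

# Large h: truncation error dominates. Tiny h: floating-point round-off dominates.
errors = {}
for h in (1e-1, 1e-8, 1e-15):
    approx = forward_difference(math.sin, x0, h)
    errors[h] = abs(approx - exact)
    print(f"h = {h:.0e}  approx = {approx:.12f}  error = {errors[h]:.2e}")
```

The error is smallest for an intermediate $h$ and grows in both directions. AD sidesteps the choice of $h$ entirely, because it never differences function values.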

The functions in this package evaluate derivatives/gradients of user-specified expressions, freeing users from calculating them by hand.

## Background

Even for a complicated function, a computer can compute derivatives by breaking the function down into elementary operations, applying the chain rule to each one, and recording the intermediate results at every step.

In the computational graph of such a calculation, each node is an intermediate result and each arrow is an elementary operation, such as addition, subtraction, multiplication, division, exponentiation, logarithm, sine, or cosine. In short, AD represents a function as a composition of elementary functions, evaluated through a sequence of intermediate values.

An example is provided below.

$$
\begin{aligned}
&f(x,y) = \sin(x) - y^2, \quad v_{-1} = x, \quad v_0 = y \\
&v_1 = \sin(v_{-1}) = \sin(x), \quad v_2 = v_0^2 = y^2, \quad v_3 = -v_2 = -y^2, \quad v_4 = v_1 + v_3 = \sin(x) - y^2 = f(x,y)
\end{aligned}
$$

[AD structure can be visualized as nodes connected by arrows]

AD has two modes: forward mode and reverse mode.



In forward mode, AD starts from the inputs and works toward the outputs, computing each intermediate value together with its derivative with respect to a fixed input variable $x_i$ by the chain rule:

$$\dot{v}_k = \frac{\partial{v_k}}{\partial{x_i}} = \sum_{v_m \in \text{parent}(v_k)} \frac{\partial{v_k}}{\partial{v_m}} \frac{\partial{v_m}}{\partial{x_i}} = \sum_{v_m \in \text{parent}(v_k)} \frac{\partial{v_k}}{\partial{v_m}} \dot{v}_m$$

For the example above, the forward-mode trace table below records each intermediate value and its derivative, seeded once for $x$ and once for $y$:

| Trace | Elementary Function | Current Value | Elementary Function Derivative | $\partial v_k / \partial x$ | $\partial v_k / \partial y$ |
| :------------------: | :-------------------: | :------------: | :------------: | :------------: | :------------: |
| $v_{-1}$ | $x$ | $x$ | $\dot v_{-1} = \dot x$ | $1$ | $0$ |
| $v_0$ | $y$ | $y$ | $\dot v_0 = \dot y$ | $0$ | $1$ |
| $v_1$ | $\sin(v_{-1})$ | $\sin(x)$ | $\dot v_1 = \cos(v_{-1}) \cdot \dot v_{-1}$ | $\cos(x)$ | $0$ |
| $v_2$ | $v_0^2$ | $y^2$ | $\dot v_2 = 2v_0 \dot v_0$ | $0$ | $2y$ |
| $v_3$ | $-v_2$ | $-y^2$ | $\dot v_3 = -\dot v_2$ | $0$ | $-2y$ |
| $v_4$ | $v_1+v_3$ | $\sin(x)-y^2$ | $\dot v_4 = \dot v_1 + \dot v_3$ | $\cos(x)$ | $-2y$ |
|<img width=50/>|<img width=80/>|<img width=80/>|<img width=150/>|<img width=50/>|<img width=50/>|
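The trace table can be transcribed line by line into plain Python. The standalone sketch below uses only the `math` module; the function name `forward_trace` and the evaluation point $(x, y) = (0.5, 2)$ are illustrative choices. It carries each intermediate value $v_k$ together with its tangent $\dot v_k$ and runs once per seed to obtain both partial derivatives.

```python
import math


def forward_trace(x, y, dx, dy):
    """Evaluate f(x, y) = sin(x) - y**2 together with its directional derivative.

    (dx, dy) is the seed: (1, 0) yields df/dx and (0, 1) yields df/dy.
    Each line mirrors one row of the trace table.
    """
    v_m1, dv_m1 = x, dx                                # v_{-1} = x
    v0, dv0 = y, dy                                    # v_0 = y
    v1, dv1 = math.sin(v_m1), math.cos(v_m1) * dv_m1   # v_1 = sin(v_{-1})
    v2, dv2 = v0 ** 2, 2 * v0 * dv0                    # v_2 = v_0^2
    v3, dv3 = -v2, -dv2                                # v_3 = -v_2
    v4, dv4 = v1 + v3, dv1 + dv3                       # v_4 = v_1 + v_3 = f(x, y)
    return v4, dv4


x, y = 0.5, 2.0
value, df_dx = forward_trace(x, y, dx=1.0, dy=0.0)  # one forward pass per input variable
_, df_dy = forward_trace(x, y, dx=0.0, dy=1.0)
print(value, df_dx, df_dy)
```

A full gradient needs one forward pass per input variable; this is the cost that reverse mode avoids for functions with many inputs.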



In reverse mode, AD first performs a forward pass from the inputs to compute and store all intermediate values (and their local partial derivatives). It then performs a reverse pass from the output back to the inputs, using the chain rule to compute the derivative of the function with respect to each intermediate value (its adjoint):

$$\bar{v}_k = \frac{\partial{f}}{\partial{v_k}} = \sum_{v_n \in \text{child}(v_k)} \frac{\partial{f}}{\partial{v_n}} \frac{\partial{v_n}}{\partial{v_k}} = \sum_{v_n \in \text{child}(v_k)} \bar{v}_n \frac{\partial{v_n}}{\partial{v_k}}$$

| Forward pass: | Intermediate | Partial Derivatives | | Reverse pass:  | Adjoint |   |   |
| :------------------: | :------------------- |:------------ | :------------ | :------------: | :------------ | :------------ |:------------ |
| $\downarrow$ | $v_{-1}=x$ | | | $\uparrow$ | $\bar{v}_{-1} = \frac{\partial f}{\partial v_{-1}} = \frac{\partial f}{\partial v_1}\frac{\partial v_1}{\partial v_{-1}} = \bar{v}_1 \frac{\partial v_1}{\partial v_{-1}} = \cos(x)$ |
| $\downarrow$ | $v_0=y$ | | | $\uparrow$ | $\bar{v}_0 = \frac{\partial f}{\partial v_0} = \frac{\partial f}{\partial v_2}\frac{\partial v_2}{\partial v_0} = \bar{v}_2 \frac{\partial v_2}{\partial v_0} = -2y$ |
| $\downarrow$ | $v_1=\sin(v_{-1})$ | $\frac{\partial v_1}{\partial v_{-1}} = \cos(v_{-1}) = \cos(x)$ | | $\uparrow$ | $\bar{v}_1 = \frac{\partial f}{\partial v_1} = \frac{\partial f}{\partial v_4}\frac{\partial v_4}{\partial v_1} = \bar{v}_4 \frac{\partial v_4}{\partial v_1} = 1$ |
| $\downarrow$ | $v_2=v_0^2$ | $\frac{\partial v_2}{\partial v_0} = 2v_0 = 2y$ | | $\uparrow$ | $\bar{v}_2 = \frac{\partial f}{\partial v_2} = \frac{\partial f}{\partial v_3}\frac{\partial v_3}{\partial v_2} = \bar{v}_3 \frac{\partial v_3}{\partial v_2} = -1$ |
| $\downarrow$ | $v_3=-v_2$ | $\frac{\partial v_3}{\partial v_2} = -1$ | | $\uparrow$ | $\bar{v}_3 = \frac{\partial f}{\partial v_3} = \frac{\partial f}{\partial v_4}\frac{\partial v_4}{\partial v_3} = \bar{v}_4 \frac{\partial v_4}{\partial v_3} = 1$ |
| $\downarrow$ | $v_4=v_1+v_3$ | $\frac{\partial v_4}{\partial v_1} = 1$, $\frac{\partial v_4}{\partial v_3} = 1$ | | $\uparrow$ | $\bar{v}_4 = \frac{\partial f}{\partial v_4} = \frac{\partial v_4}{\partial v_4} = 1$ |
|<img width=10/>|<img width=120/>|<img width=200/>|<img width=10/>|<img width=10/>|<img width=260/>|<img width=5/>|<img width=5/>|
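The same example can also be worked in reverse mode as a standalone sketch, using the hypothetical function name `reverse_trace`. The forward pass stores the intermediate values and local partial derivatives from the left half of the table. The reverse pass then accumulates the adjoints from the right half, starting from $\bar v_4 = 1$.

```python
import math


def reverse_trace(x, y):
    """Gradient of f(x, y) = sin(x) - y**2 via one forward pass and one reverse pass."""
    # Forward pass: intermediate values and local partial derivatives.
    v_m1, v0 = x, y
    v1 = math.sin(v_m1)
    v2 = v0 ** 2
    v3 = -v2
    v4 = v1 + v3
    dv1_dvm1 = math.cos(v_m1)
    dv2_dv0 = 2 * v0
    dv3_dv2 = -1.0
    dv4_dv1 = dv4_dv3 = 1.0

    # Reverse pass: propagate adjoints from the output back to the inputs.
    bar_v4 = 1.0                  # df/dv4 = 1
    bar_v3 = bar_v4 * dv4_dv3
    bar_v2 = bar_v3 * dv3_dv2
    bar_v1 = bar_v4 * dv4_dv1
    bar_v0 = bar_v2 * dv2_dv0     # = df/dy
    bar_vm1 = bar_v1 * dv1_dvm1   # = df/dx
    return v4, (bar_vm1, bar_v0)


value, (df_dx, df_dy) = reverse_trace(0.5, 2.0)
print(value, df_dx, df_dy)
```

Both partial derivatives come out of a single reverse pass, which is why reverse mode is preferred when a function has many inputs and few outputs.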


## How to use

### Installation

```shell
python -m pip install -i https://test.pypi.org/simple/ cs107_ADpackage
```

We recommend using the package with Python 3.6.2 or later.

### Demo

Import the package, along with NumPy for the evaluation point:

```python
import numpy as np
import cs107_ADpackage as ad
```

Specify the function as a string. (Visualization of the AD graph structure will be added later.)

```python
f_expression = '3*sin(x)+4'
```

Compute the value and first derivative of the function at $x = \pi$ in forward mode:

```python
value, derivative = ad.derive(func_expr=f_expression, varname='x', value=np.pi, mode='forward')
print(value, derivative)
```

## Software Organization

### Directory Structure

```
cs107project/
    LICENSE
    README.md
    src/
        cs107_package
            __init__.py
            __main__.py
            fowardNode.py
            reverseNode.py
            utils.py    
    docs/
        milestone1
        milestone2_progress
        milestone2
    tests/
        __init__.py
        test.py
        test_forwardNode.py
        test_reverseNode.py
        test_rootFind.py
    .travis.yml
```

### Included Modules and their Basic Functionality

We plan to use NumPy, Matplotlib, pytest, and PyTorch: NumPy to create matrices and perform elementary calculations, Matplotlib to visualize the computational graphs of functions built from elementary operations, pytest to run tests on our code, and PyTorch to benchmark these tests.

### Test Suite
Our test suite will live in the `tests/` directory and will be run by Travis CI.

### Package Distribution
We will distribute our package by uploading it to PyPI so everyone can use it.

### Notes
We will not package our software in any other form: the code will be available on GitHub and PyPI, so it will be accessible to everyone.

This project is still in progress, so the software may change later.

## Implementation

### Core Data Structures

The core data structure is a node that represents an intermediate function expression. Each node instance stores the value of the expression, a derivative-related attribute (a trace in forward mode, an adjoint in reverse mode), and, in reverse mode, the node's children.

### Classes

1. `class ForwardNode`: the general node class that accommodates the different nodes of the AD graph in forward mode.

2. `class ReverseNode`: the general node class that accommodates the different nodes of the AD graph in reverse mode.



### Methods and Name Attributes

The `ForwardNode` class has 2 attributes:
- `self.value`: the value of the function expression $v_k$ represented by a `ForwardNode` instance
- `self.trace`: the derivative $\frac{\partial v_k}{\partial x_i}$ of this intermediate function expression with respect to the seeded input variable $x_i$

The elementary operations are overloaded in this class. Each operation returns a new `ForwardNode` instance that represents the resulting intermediate function expression and carries the attributes described above.
```python
v1 = ForwardNode(value1, trace1)
v2 = ForwardNode(value2, trace2)
```
- Addition: $v_k = v_1 + v_2, \ \Rightarrow \ \dot{v}_k = 1 \cdot \dot{v}_1 + 1 \cdot \dot{v}_2$  
  The new `ForwardNode` instance for $v_k = v_1 + v_2$ has as its value the sum of the two values and as its trace the sum of the two traces.
```python
vk = ForwardNode(value1+value2, trace1+trace2)
```
- Subtraction: $v_k = v_1 - v_2, \ \Rightarrow \ \dot{v}_k = 1 \cdot \dot{v}_1 - 1 \cdot \dot{v}_2$  
  The new `ForwardNode` instance for $v_k = v_1 - v_2$ has as its value the difference of the two values and as its trace the difference of the two traces.
```python
vk = ForwardNode(value1-value2, trace1-trace2)
```
- Multiplication: $v_k = v_1 \cdot v_2, \ \Rightarrow \ \dot{v}_k = v_1 \cdot \dot{v}_2 + v_2 \cdot \dot{v}_1$  
  The new `ForwardNode` instance for $v_k = v_1 \cdot v_2$ has as its value the product of the two values and as its trace the result of the product rule.
```python
vk = ForwardNode(value1*value2, value1*trace2+value2*trace1)
```
- Division: $v_k = \frac{v_1}{v_2}, \ \Rightarrow \ \dot{v}_k = \frac{v_2 \cdot \dot{v}_1 - v_1 \cdot \dot{v}_2}{v_2^2}$  
  The new `ForwardNode` instance for $v_k = v_1 / v_2$ has as its value the quotient of the two values and as its trace the result of the quotient rule.
```python
vk = ForwardNode(value1/value2, (value2*trace1-value1*trace2) / (value2**2))
```
- Power: $v_k = v_1^{v_2}, \ \Rightarrow \ \dot{v}_k = v_2 \cdot v_1^{v_2-1} \cdot \dot{v}_1 + v_1^{v_2} \ln(v_1) \cdot \dot{v}_2$  
  The new `ForwardNode` instance for $v_k = v_1^{v_2}$ has as its value $v_1$ raised to the power $v_2$ and as its trace the result of the general power rule (the $\ln(v_1)$ term requires $v_1 > 0$ whenever the exponent varies).
```python
value = value1**value2
# general power rule: v2 * v1^(v2 - 1) * dv1 + v1^v2 * ln(v1) * dv2
trace = value2*(value1**(value2 - 1))*trace1 + value*np.log(value1)*trace2
vk = ForwardNode(value, trace)
```
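The product, quotient, and power rules can be sanity-checked numerically. The standalone sketch below picks two smooth paths $v_1(t) = 2 + t^2$ and $v_2(t) = 1 + \sin(t)$ (hypothetical choices made only for this check). It applies each rule to their values and traces at $t = 0.7$ and compares the result with a central difference of the composite function. The names `central_difference` and `rules` are illustrative, not part of the package.

```python
import math


# Hypothetical smooth test paths and their exact derivatives, chosen only for this check.
def v1(t):
    return 2.0 + t ** 2


def dv1(t):
    return 2.0 * t


def v2(t):
    return 1.0 + math.sin(t)


def dv2(t):
    return math.cos(t)


def central_difference(g, t, h=1e-6):
    """Symmetric finite-difference approximation of g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)


t = 0.7
a, da, b, db = v1(t), dv1(t), v2(t), dv2(t)  # values and traces of v1 and v2 at t

rules = {
    # name: (trace predicted by the rule, composite function used for the numerical check)
    "product": (a * db + b * da, lambda s: v1(s) * v2(s)),
    "quotient": ((b * da - a * db) / b ** 2, lambda s: v1(s) / v2(s)),
    "power": (b * a ** (b - 1) * da + a ** b * math.log(a) * db, lambda s: v1(s) ** v2(s)),
}

for name, (trace, composite) in rules.items():
    print(f"{name:8s} rule: {trace:.8f}   central difference: {central_difference(composite, t):.8f}")
```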



```python
import numpy as np


class ForwardNode:
    """Forward-mode AD node holding a value and its derivative (trace)."""

    def __init__(self, value, trace=1.0):
        self.value = value  # value of the intermediate expression v_k
        self.trace = trace  # derivative dv_k/dx_i; the default 1.0 seeds the input variable

    def __add__(self, other):
        try:
            new = ForwardNode(self.value + other.value, self.trace + other.trace)
        except AttributeError:  # other is a plain number
            new = ForwardNode(self.value + other, self.trace)
        return new

    def __radd__(self, other):
        return self.__add__(other)

    def __neg__(self):
        return ForwardNode(-self.value, -self.trace)

    def __sub__(self, other):
        return self.__add__(-other)

    def __rsub__(self, other):
        return (-self).__add__(other)

    def __mul__(self, other):
        try:
            # product rule: (u * w)' = u * w' + u' * w
            new = ForwardNode(self.value * other.value,
                              self.value * other.trace + self.trace * other.value)
        except AttributeError:
            new = ForwardNode(self.value * other, self.trace * other)
        return new

    def __rmul__(self, other):
        return self.__mul__(other)

    def __truediv__(self, other):
        try:
            # quotient rule: (u / w)' = (u' * w - u * w') / w^2
            new = ForwardNode(self.value / other.value,
                              (self.trace * other.value - self.value * other.trace) / other.value ** 2)
        except AttributeError:
            new = ForwardNode(self.value / other, self.trace / other)
        return new

    def __rtruediv__(self, other):
        # only reached when the numerator `other` is a plain number: (c / u)' = -c * u' / u^2
        return ForwardNode(other / self.value, -other * self.trace / self.value ** 2)

    def __pow__(self, other):
        try:
            # general power rule: (u^w)' = w * u^(w-1) * u' + u^w * ln(u) * w'
            value = self.value ** other.value
            trace = other.value * self.value ** (other.value - 1) * self.trace
            if other.trace != 0:  # the log term is only needed when the exponent varies
                trace = trace + value * np.log(self.value) * other.trace
            new = ForwardNode(value, trace)
        except AttributeError:
            # constant exponent: (u^c)' = c * u^(c-1) * u'
            new = ForwardNode(self.value ** other, other * self.value ** (other - 1) * self.trace)
        return new

    def __rpow__(self, other):
        # only reached when the base `other` is a plain number: (c^u)' = c^u * ln(c) * u'
        value = other ** self.value
        return ForwardNode(value, value * np.log(other) * self.trace)

    def __repr__(self):
        return f'Value: {self.value}, Derivative: {self.trace}'

    __str__ = __repr__
```


The `ReverseNode` class has 3 attributes:
- `self.value`: the actual value of the function expression $v_k$ represented by a ReverseNode instance
- `self.adjoint`: the adjoint $\frac{\partial f}{\partial v_k}$, i.e. the derivative of the target function expression $f$ with respect to the intermediate variable $v_k$
- `self.children`: a list recording the node's children as tuples ($\frac{\partial v_n}{\partial v_k}$, child node $v_n$)

The method `gradient()` computes the adjoint $\frac{\partial f}{\partial v_k}$ recursively, implementing
$$\bar{v}_k = \frac{\partial{f}}{\partial{v_k}} = \sum_{v_n \in \text{child}(v_k)} \frac{\partial{f}}{\partial{v_n}} \frac{\partial{v_n}}{\partial{v_k}} = \sum_{v_n \in \text{child}(v_k)}  \bar{v}_n \frac{\partial{v_n}}{\partial{v_k}}$$

The elementary operations are overloaded in this class:
- Addition, subtraction, multiplication, division, and power are overloaded to build the node graph.
- Each operation returns a new `ReverseNode` instance that represents the resulting intermediate function expression, and records it as a child of each operand together with the corresponding local partial derivative.

Further explanations and more overloaded operations will be added in the future.

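```python
class ReverseNode:
    """Reverse-mode AD node holding a value, its adjoint, and its children."""

    def __init__(self, value):
        self.value = value
        self.children = []  # list of (d child / d self, child node) tuples
        self.adjoint = 1.0  # a node without children is treated as the output: df/df = 1

    def gradient(self):
        # adjoint of v_k = sum over children v_n of (dv_n/dv_k) * (adjoint of v_n)
        if len(self.children) > 0:
            self.adjoint = sum(coef * child.gradient() for coef, child in self.children)
        return self.adjoint

    def __add__(self, other):
        new = ReverseNode(self.value + other.value)
        self.children.append((1.0, new))
        other.children.append((1.0, new))
        return new

    def __sub__(self, other):
        new = ReverseNode(self.value - other.value)
        self.children.append((1.0, new))
        other.children.append((-1.0, new))
        return new

    def __mul__(self, other):
        new = ReverseNode(self.value * other.value)
        self.children.append((other.value, new))  # d(u*w)/du = w
        other.children.append((self.value, new))  # d(u*w)/dw = u
        return new

    def __truediv__(self, other):
        new = ReverseNode(self.value / other.value)
        self.children.append((1.0 / other.value, new))                # d(u/w)/du = 1/w
        other.children.append((-self.value / other.value ** 2, new))  # d(u/w)/dw = -u/w^2
        return new

    def __pow__(self, other):
        new = ReverseNode(self.value ** other.value)
        self.children.append((other.value * self.value ** (other.value - 1), new))   # d(u^w)/du
        other.children.append((np.log(self.value) * self.value ** other.value, new))  # d(u^w)/dw
        return new
```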
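To see why `gradient()` sums over children, consider $f(x) = x \cdot x + x$. The node for $x$ feeds three edges (twice into the product, once into the sum), so its adjoint must collect three contributions. So that it runs on its own, the sketch below re-declares a stripped-down copy of the `ReverseNode` design under the hypothetical name `MiniReverseNode`, supporting only `+` and `*`.

```python
class MiniReverseNode:
    """Stripped-down copy of the ReverseNode design: value, adjoint, (coef, child) pairs."""

    def __init__(self, value):
        self.value = value
        self.children = []  # list of (d child / d self, child node)
        self.adjoint = 1.0  # a node without children is treated as the output

    def gradient(self):
        # adjoint = sum over children of (local partial) * (child's adjoint)
        if self.children:
            self.adjoint = sum(coef * child.gradient() for coef, child in self.children)
        return self.adjoint

    def __add__(self, other):
        new = MiniReverseNode(self.value + other.value)
        self.children.append((1.0, new))
        other.children.append((1.0, new))
        return new

    def __mul__(self, other):
        new = MiniReverseNode(self.value * other.value)
        self.children.append((other.value, new))
        other.children.append((self.value, new))
        return new


x = MiniReverseNode(3.0)
f = x * x + x  # f(x) = x^2 + x, so f'(x) = 2x + 1
print(f.value, x.gradient())
```

The result, $7 = 2 \cdot 3 + 1$, matches $f'(x) = 2x + 1$ at $x = 3$. Because this recursion revisits a shared subexpression once per path through it, its cost can grow quickly on deep graphs with heavy sharing; caching each node's adjoint is one possible refinement.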


The remaining elementary functions are defined as module-level functions rather than as operator overloads. Each accepts a `ForwardNode` or a `ReverseNode` and returns a new node of the same type that represents the resulting intermediate function expression, with the attributes described above.
```python
v = ForwardNode(value, trace)
```
- Constant: $v_k = c, \ \Rightarrow \ \dot{v}_k = 0$  
For a node $v_k$ that represents a constant, the trace $\frac{\partial v_k}{\partial x_i}$ is $0$ in forward mode, and the adjoint is set to $0$ in reverse mode.
```python
value = 12
vk_F = ForwardNode(value, 0)  # forward mode: zero trace
vk_R = ReverseNode(value)     # reverse mode: ReverseNode takes only a value,
vk_R.adjoint = 0              # so the adjoint is set to zero afterwards
```

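- Sine: $v_k = \sin(v), \ \Rightarrow \ \dot{v}_k = \cos(v) \cdot \dot{v}$. In reverse mode, $v_k$ is recorded as a child of $v$ with local partial derivative $\frac{\partial v_k}{\partial v} = \cos(v)$, so it contributes $\bar{v}_k \cos(v)$ to $\bar{v}$; this contribution can only be evaluated after the whole expression graph has been built.
```python
vk_F = ForwardNode(np.sin(value), trace*np.cos(value))
vk_R = ReverseNode(np.sin(value))
v.children.append((np.cos(value), vk_R))  # here v is the ReverseNode for the input
```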

- Cosine: $v_k = \cos(v), \ \Rightarrow \ \dot{v}_k = -\sin(v) \cdot \dot{v}$
```python
vk_F = ForwardNode(np.cos(value), -trace*np.sin(value))
```

- Natural log: $v_k = \log(v), \ \Rightarrow \ \dot{v}_k = \frac{1}{v} \cdot \dot{v}$
```python
vk_F = ForwardNode(np.log(value), trace/value)
```

- Exponential: $v_k = \exp(v), \ \Rightarrow \ \dot{v}_k = \exp(v) \cdot \dot{v}$
```python
vk_F = ForwardNode(np.exp(value), trace*np.exp(value))
```

- Square root: $v_k = \sqrt{v}, \ \Rightarrow \ \dot{v}_k = \frac{1}{2} v^{-\frac{1}{2}} \cdot \dot{v}$
```python
vk_F = ForwardNode(value**0.5, 0.5*trace*value**(-0.5))
```

- Tangent: $v_k = \tan(v), \ \Rightarrow \ \dot{v}_k = \frac{1}{\cos^2(v)} \cdot \dot{v}$
```python
vk_F = ForwardNode(np.tan(value), trace/np.cos(value)**2)
```
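Each rule above pairs an elementary function with its derivative, which the trace multiplies by the incoming $\dot v$. As a quick sanity check, the standalone sketch below uses Python's `math` module instead of NumPy at an arbitrary point $v = 0.8$ and compares each rule against a central finite difference. The names `unary_rules` and `central_difference` are illustrative, not part of the package.

```python
import math


def central_difference(g, x, h=1e-6):
    """Symmetric finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


# Each entry: (elementary function, derivative used by the forward-mode rule)
unary_rules = {
    "sin": (math.sin, math.cos),
    "cos": (math.cos, lambda v: -math.sin(v)),
    "log": (math.log, lambda v: 1.0 / v),
    "exp": (math.exp, math.exp),
    "sqrt": (math.sqrt, lambda v: 0.5 * v ** -0.5),
    "tan": (math.tan, lambda v: 1.0 / math.cos(v) ** 2),
}

v, trace = 0.8, 1.0  # evaluation point and incoming trace (seed)
for name, (func, deriv) in unary_rules.items():
    rule = deriv(v) * trace  # chain rule: f'(v) * v_dot
    numeric = central_difference(func, v) * trace
    print(f"{name:5s} rule: {rule:.8f}   central difference: {numeric:.8f}")
```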

More explanations of the other overloaded functions, and further details on their use with the `ReverseNode` class, will be added in future documentation.



```python
def constant(val, mode='forward'):
    """Wrap a number as a node with zero trace (forward) or zero adjoint (reverse)."""
    if mode == 'forward':
        new = ForwardNode(val, 0)
    elif mode == 'reverse':
        new = ReverseNode(val)
        new.adjoint = 0
    else:
        raise ValueError(f"Unknown mode {mode!r}: use 'forward' or 'reverse'.")
    return new


def sin(node):
    if type(node) is ForwardNode:
        new = ForwardNode(np.sin(node.value), node.trace * np.cos(node.value))
    elif type(node) is ReverseNode:
        new = ReverseNode(np.sin(node.value))
        node.children.append((np.cos(node.value), new))  # d/dx sin(x) = cos(x)
    return new


def cos(node):
    if type(node) is ForwardNode:
        new = ForwardNode(np.cos(node.value), -1 * node.trace * np.sin(node.value))
    elif type(node) is ReverseNode:
        new = ReverseNode(np.cos(node.value))
        node.children.append((-np.sin(node.value), new))  # d/dx cos(x) = -sin(x)
    return new


def log(node):
    if type(node) is ForwardNode:
        new = ForwardNode(np.log(node.value), node.trace / node.value)
    elif type(node) is ReverseNode:
        new = ReverseNode(np.log(node.value))
        node.children.append((1.0 / node.value, new))  # d/dx log(x) = 1/x
    return new


def exp(node):
    if type(node) is ForwardNode:
        new = ForwardNode(np.exp(node.value), node.trace * np.exp(node.value))
    elif type(node) is ReverseNode:
        new = ReverseNode(np.exp(node.value))
        node.children.append((np.exp(node.value), new))  # d/dx e^x = e^x
    return new


def sqrt(node):
    if node.value < 0:
        raise ValueError(f"Invalid value: cannot calculate the square root for {node.value}.")
    elif type(node) is ForwardNode:
        new = ForwardNode(node.value ** 0.5, 0.5 * node.trace * node.value ** -0.5)
    elif type(node) is ReverseNode:
        new = ReverseNode(node.value ** 0.5)
        node.children.append((0.5 * node.value ** -0.5, new))  # d/dx sqrt(x) = 1/2 * x^(-1/2)
    return new


def tan(node):
    if node.value % (np.pi / 2) == 0 and node.value % np.pi > 0:
        raise ValueError(f"Invalid value: the tangent for {node.value} doesn't exist.")
    elif type(node) is ForwardNode:
        new = ForwardNode(np.tan(node.value), node.trace / np.cos(node.value) ** 2)
    elif type(node) is ReverseNode:
        new = ReverseNode(np.tan(node.value))
        node.children.append((1 / np.cos(node.value) ** 2, new))  # d/dx tan(x) = 1 / cos(x)^2
    return new


def arctan(node):
    if type(node) is ForwardNode:
        new = ForwardNode(np.arctan(node.value), node.trace / (1 + node.value ** 2))
    elif type(node) is ReverseNode:
        new = ReverseNode(np.arctan(node.value))
        node.children.append((1 / (1 + node.value ** 2), new))  # d/dx arctan(x) = 1 / (1 + x^2)
    return new


def arcsin(node):
    if np.abs(node.value) >= 1:
        raise ValueError(f"Invalid value: derivative of arcsin for {node.value} doesn't exist.")
    elif type(node) is ForwardNode:
        new = ForwardNode(np.arcsin(node.value), node.trace / np.sqrt(1 - node.value ** 2))
    elif type(node) is ReverseNode:
        new = ReverseNode(np.arcsin(node.value))
        node.children.append((1 / np.sqrt(1 - node.value ** 2), new))  # d/dx arcsin(x) = 1 / sqrt(1 - x^2)
    return new


def arccos(node):
    if np.abs(node.value) >= 1:
        raise ValueError(f"Invalid value: derivative of arccos for {node.value} doesn't exist.")
    elif type(node) is ForwardNode:
        new = ForwardNode(np.arccos(node.value), -1 * node.trace / np.sqrt(1 - node.value ** 2))
    elif type(node) is ReverseNode:
        new = ReverseNode(np.arccos(node.value))
        node.children.append((-1 / np.sqrt(1 - node.value ** 2), new))  # d/dx arccos(x) = -1 / sqrt(1 - x^2)
    return new


def tanh(node):
    if type(node) is ForwardNode:
        new = ForwardNode(np.tanh(node.value), node.trace / np.cosh(node.value) ** 2)
    elif type(node) is ReverseNode:
        new = ReverseNode(np.tanh(node.value))
        node.children.append((1 / np.cosh(node.value) ** 2, new))  # d/dx tanh(x) = 1 / cosh(x)^2
    return new


def sinh(node):
    if type(node) is ForwardNode:
        new = ForwardNode(np.sinh(node.value), np.cosh(node.value) * node.trace)
    elif type(node) is ReverseNode:
        new = ReverseNode(np.sinh(node.value))
        node.children.append((np.cosh(node.value), new))  # d/dx sinh(x) = cosh(x)
    return new


def cosh(node):
    if type(node) is ForwardNode:
        new = ForwardNode(np.cosh(node.value), np.sinh(node.value) * node.trace)
    elif type(node) is ReverseNode:
        new = ReverseNode(np.cosh(node.value))
        node.children.append((np.sinh(node.value), new))  # d/dx cosh(x) = sinh(x)
    return new


def log_base(node, base=10):
    if node.value <= 0:
        raise ValueError(f"Invalid value: the log operation for {node.value} doesn't exist.")
    elif type(node) is ForwardNode:
        new = ForwardNode(np.log(node.value) / np.log(base), node.trace / (node.value * np.log(base)))
    elif type(node) is ReverseNode:
        new = ReverseNode(np.log(node.value) / np.log(base))
        node.children.append((1 / (node.value * np.log(base)), new))  # d/dx log_b(x) = 1 / (x ln b)
    return new


def cot(node):
    if node.value % np.pi == 0:
        raise ValueError(f"Invalid value: the cotangent operation for {node.value} doesn't exist.")
    elif type(node) is ForwardNode:
        new = ForwardNode(1 / np.tan(node.value), -node.trace / np.sin(node.value) ** 2)
    elif type(node) is ReverseNode:
        new = ReverseNode(1 / np.tan(node.value))
        node.children.append((-1 / np.sin(node.value) ** 2, new))  # d/dx cot(x) = -1 / sin(x)^2
    return new


def sec(node):
    if node.value % (np.pi / 2) == 0 and node.value % np.pi > 0:
        raise ValueError(f"Invalid value: the secant operation for {node.value} doesn't exist.")
    elif type(node) is ForwardNode:
        new = 1 / cos(node)  # reuses cos() and ForwardNode.__rtruediv__
    elif type(node) is ReverseNode:
        new = ReverseNode(1 / np.cos(node.value))
        node.children.append((np.sin(node.value) / np.cos(node.value) ** 2, new))  # d/dx sec(x) = sec(x) tan(x)
    return new
```
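Several of the derivative rules in this block are easy to get wrong, notably the negative signs for `arccos` and `cot` and the secant rule $\sec'(v) = \sec(v)\tan(v)$. The standalone sketch below is a quick numerical cross-check using only Python's `math` module, not the package's node classes. It evaluates at $v = 0.4$, a point chosen to lie inside every domain, and uses base 2 as an example base for `log_base`; the names `extra_rules` and `central_difference` are illustrative.

```python
import math


def central_difference(g, x, h=1e-6):
    """Symmetric finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


base = 2.0  # example base for log_base
extra_rules = {
    # name: (elementary function, derivative rule)
    "arcsin": (math.asin, lambda v: 1.0 / math.sqrt(1 - v ** 2)),
    "arccos": (math.acos, lambda v: -1.0 / math.sqrt(1 - v ** 2)),
    "arctan": (math.atan, lambda v: 1.0 / (1 + v ** 2)),
    "sinh": (math.sinh, math.cosh),
    "cosh": (math.cosh, math.sinh),
    "tanh": (math.tanh, lambda v: 1.0 / math.cosh(v) ** 2),
    "log_base": (lambda v: math.log(v, base), lambda v: 1.0 / (v * math.log(base))),
    "cot": (lambda v: 1.0 / math.tan(v), lambda v: -1.0 / math.sin(v) ** 2),
    "sec": (lambda v: 1.0 / math.cos(v), lambda v: math.sin(v) / math.cos(v) ** 2),
}

v = 0.4  # |v| < 1 for arcsin/arccos, v > 0 for log, away from poles of cot/sec
for name, (func, deriv) in extra_rules.items():
    print(f"{name:8s} rule: {deriv(v):+.8f}   central difference: {central_difference(func, v):+.8f}")
```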


Finally, everything is wrapped into a single function, so that one call returns the value and derivative of the target function expression. (For now, AD only supports scalar functions of a single variable.)

The function must be passed as a string. Accepting a Python function as input is planned for a future version.

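```python
import re


def derive(func_expr, varname, value, mode='reverse'):
    """Return (value, derivative) of a one-variable expression given as a string."""
    # Wrap every numeric literal as a constant node, e.g. '3*sin(x)+4' becomes
    # "constant(3, mode='forward')*sin(x)+constant(4, mode='forward')".
    expr = re.sub(r'(\d+\.*\d*)', f"constant(\\1, mode='{mode}')", func_expr)
    if mode == 'forward':
        x = ForwardNode(value)  # default trace 1.0 seeds dx/dx = 1
        y = eval(expr, globals(), {varname: x})  # bind the variable under the name `varname`
        val, grad = y.value, y.trace
    elif mode == 'reverse':
        x = ReverseNode(value)
        y = eval(expr, globals(), {varname: x})
        val, grad = y.value, x.gradient()
    else:
        raise ValueError(f"Unknown mode {mode!r}: use 'forward' or 'reverse'.")
    return val, grad
```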

```python
derive(func_expr='3*sin(x)+4', varname='x', value=np.pi, mode='forward')
# (4.0, -3.0)

derive(func_expr='3*sin(x)+4', varname='x', value=np.pi, mode='reverse')
# (4.0, -3.0)
```
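The key step in `derive` is the regular-expression rewrite that turns every numeric literal into a `constant(...)` call, so that the whole expression can be evaluated with node arithmetic. The standalone snippet below, which uses only Python's `re` module, shows what that rewrite produces for the demo expression. It also shows one limitation of the pattern.

```python
import re

func_expr = '3*sin(x)+4'
mode = 'forward'

# Same pattern and replacement as in derive(): wrap each numeric literal as a constant node.
rewritten = re.sub(r'(\d+\.*\d*)', f"constant(\\1, mode='{mode}')", func_expr)
print(rewritten)

# Caveat: the pattern has no word boundaries, so digits inside a name are rewritten too.
caveat = re.sub(r'(\d+\.*\d*)', f"constant(\\1, mode='{mode}')", 'x2 + 1')
print(caveat)
```

With the current pattern, variable names should therefore not contain digits.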

### External Dependencies

We will rely on the latest version of NumPy. The other required dependency is the latest version of Matplotlib, used specifically to output visual representations of our data structures.

### How will we deal with elementary functions?

Most elementary functions are available through NumPy. For more niche functions that NumPy does not include (or defines differently from the standard definitions), we will likely write our own implementations.

## License

We chose the MIT License. Our research showed that developers commonly choose this license when they want their software to be easily accessible and quickly distributed to other developers and the wider community. We ultimately settled on it because we believe other developers should be free to use our software for their own purposes.

## Future Features

1. Overload more functions for the `ReverseNode` class.

2. Allow the derivation function to take a Python function as input, in addition to a string expression.

3. Drawing the graph structure of AD, with nodes (intermediate functions/variables) connected by arrows (elementary operations)

4. A more general version of AD that accepts multivariate expressions, or vectors of multivariate expressions, to compute gradients or Jacobians.


## Broader Impact and Inclusivity Statement

Over the past few years, people have made an increased effort to close the gap in representation and inclusion of underrepresented groups in STEM. Even so, much more can and should be done. While creating our software, we kept in mind that people from different backgrounds and experience levels would use it. We therefore tried to add docstrings and proper documentation to make the software as accessible and understandable as possible, especially for those who are less familiar with automatic differentiation, Python, or this type of software in general. We also recognize that more work is needed to make our software accessible and user-friendly. Currently, it is aimed at users who are familiar with English and with English mathematical terms and symbols. Moving forward, we would like to make it more accessible to those who are less familiar with the English language.

We understand that our software has both positive and negative implications. However, we believe the positive implications outweigh the negative ones. Our team has simply found one way to tackle the problem using Automatic Differentiation and believe that we are adding to the diversity of technology in the community by contributing our software.

Furthermore, Harvard's diversity statement says, "[their] commitment to diversity in all forms is rooted in [the] fundamental belief that engaging with unfamiliar ideas, perspectives, cultures, and people creates the conditions for dramatic and meaningful growth." Our team believes that by engaging with the software we have created, users share in our ideas and perspectives on one way to solve a problem. We are open to suggestions and any feedback from our users, and we are always looking for ways to improve our implementation.