# Overview and Notes
The purpose of this section is to give a sketch of the mathematical background of automatic differentiation to motivate our implementation of the forward mode.

# Software Organization

We intend to 

# Implementation

## Classes and Data Structures
Recall that automatic differentiation is a method for computing the Jacobian of a function $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$ by accumulating and combining partial derivatives of elementary functions.  In the _forward mode_ of automatic differentiation, we observe that $g(\alpha + \beta \epsilon) = g(\alpha) + g'(\alpha)\beta\epsilon$, where $\alpha + \beta\epsilon$ is the dual representation of the real $\alpha$ (with nilpotent $\epsilon$, i.e. $\epsilon \neq 0$ and $\epsilon^2 = 0$).  This observation, along with the chain rule, allows us to compute the Jacobian of $f$ as:
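$$\frac{\partial f_j(x)}{\partial x_i} \bigg|_{x=\alpha} = \epsilon \text{-coefficient of } f_j(\alpha + e_i\epsilon), \quad i = 1, \ldots, n, \ \ j = 1, \ldots, m$$
where $e_i$ is the $i$-th standard basis vector of $\mathbb{R}^n$.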
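
One way to see why the dual-number identity holds, assuming $g$ is smooth enough to have a Taylor expansion around $\alpha$ (an assumption we add here for illustration), is to expand $g$ and use the nilpotency of $\epsilon$, which makes every term of order $\epsilon^2$ or higher vanish:

$$g(\alpha + \beta\epsilon) = g(\alpha) + g'(\alpha)\beta\epsilon + \frac{g''(\alpha)}{2}\beta^2\epsilon^2 + \cdots = g(\alpha) + g'(\alpha)\beta\epsilon$$

As a small worked case, take $g(x) = x^2$:

$$(\alpha + \beta\epsilon)^2 = \alpha^2 + 2\alpha\beta\epsilon + \beta^2\epsilon^2 = \alpha^2 + 2\alpha\beta\epsilon$$

With $\beta = 1$ the $\epsilon$-coefficient is $2\alpha = g'(\alpha)$, as expected.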

Note that this technique requires one forward pass per input variable $x_i$, so the cost of computing the full Jacobian scales as $O(n)$ passes, where $n$ is the number of input variables.  Hence, forward mode is preferred when there are more output variables than input variables ($m \gg n$).  More generally, this observation motivates a software implementation of the dual numbers, which we describe in greater depth below.
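
The one-pass-per-input cost can be sketched in a few lines of Python. This is only an illustration and not our final design: the names `Dual`, `sin`, `jacobian`, and the test function `f` are hypothetical, and only addition, multiplication, and sine are supported. Each pass seeds one input with dual part 1 and reads one column of the Jacobian from the dual parts of the outputs.

```python
from dataclasses import dataclass
import math


@dataclass
class Dual:
    """Hypothetical minimal dual number a + b*eps with eps**2 == 0."""
    real: float
    dual: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.dual + other.dual)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)

    __rmul__ = __mul__


def sin(z):
    # Chain rule: the dual part is cos(a) * b
    return Dual(math.sin(z.real), math.cos(z.real) * z.dual)


def jacobian(f, point):
    """Run one forward pass per input variable (n passes in total)."""
    columns = []
    for i in range(len(point)):
        # Seed x_i with dual part 1 and every other input with dual part 0
        seeded = [Dual(p, 1.0 if k == i else 0.0) for k, p in enumerate(point)]
        columns.append([out.dual for out in f(seeded)])
    # Transpose so that J[j][i] = d f_j / d x_i
    return [list(row) for row in zip(*columns)]


def f(x):
    # Example function f : R^2 -> R^3
    return [x[0] * x[1], x[0] + x[1], sin(x[0])]


J = jacobian(f, [2.0, 3.0])
print(J)
```

Here $n = 2$ and $m = 3$, so two passes recover all six partial derivatives.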

After reviewing the literature ([1](https://arxiv.org/pdf/1811.05031.pdf), [2](http://www.jmlr.org/papers/volume18/17-468/17-468.pdf), and [3](https://www.mcs.anl.gov/papers/P1152.pdf)), we decided to use _operator overloading_ to implement dual numbers for the forward mode of automatic differentiation.

Our classes will be called \_DualNumber(), AD\_fun(DualNumber), Parallelized(AD\_fun), and PostProcess().  We break down the functionality, method names, and core data structures of each class separately below.

### DualNumber()
DualNumber() will give us the dual representation of a scalar. The user may interact with the class to set up scalar- or vector-valued variables that will later be passed to functions.  For example, the user may specify `x = AD.DualNumber()` and later use this `x` as a variable in a function.

Pseudocode for this class is included below:

```python
import numpy as np


class DualNumber():
    '''
    Description: a class to hold dual number representations of vectors/scalars.
    '''

    def __init__(self, real_part, imag_part=1):
        if isinstance(real_part, str) or isinstance(imag_part, str):
            # Ensure that the input into constructor is valid
            raise ValueError('The input cannot be string')

        # can be scalar or np.array (i.e. component-wise representation of dual number)
        real = np.asarray(real_part, dtype=float)
        imag = np.asarray(imag_part, dtype=float)

        if imag.ndim > 0 and imag.shape != real.shape:
            # a non-scalar dual part must match the real part component by component
            raise ValueError('real_part and imag_part must have the same shape')

        self.real = real
        # the dual part defaults to 1, since this represents the derivative of x
        self.imag = imag
```

### AutoDiff()

This class will form the crux of our automatic differentiation implementation.  In particular, here we will make use of _operator overloading_ to return the function values and derivatives.  Note that the user should

So, for example, the user should write 
1. x = AD.AutoDiff(x)
2. function = AD.AutoDiff.sin(x)
3. function.get_value(5)

to get the value of $\sin(x)$ at $x=5$.

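```python
import numpy as np

class AutoDiff():
    # Automatic differentiation function class (intended to build on the dual number class).
    # The value and the gradient are both stored as functions of the evaluation point.
    def __init__(self, x, dim=1):
        self.x = x
        self.dim = dim
        # Identity function: an array of size dim, initialized with the evaluation point
        self._value = lambda v: np.full(dim, v, dtype=float)
        # Define the function gradient; intended use is private, note that this is a function
        self._gradient = lambda v: np.ones(dim)

    def _new(self, value, gradient):
        # Helper: wrap a pair of (value, gradient) functions in a new AutoDiff
        out = AutoDiff(self.x, self.dim)
        out._value, out._gradient = value, gradient
        return out

    def _lift(self, other):
        # Helper: treat a plain number as a constant with zero gradient
        if isinstance(other, AutoDiff):
            return other
        return self._new(lambda v: np.full(self.dim, other, dtype=float),
                         lambda v: np.zeros(self.dim))

    def exp(self):
        return self._new(lambda v: np.exp(self._value(v)),
                         lambda v: np.exp(self._value(v)) * self._gradient(v))

    def __add__(self, second_var):
        # Override default adding with dunder method
        o = self._lift(second_var)
        return self._new(lambda v: self._value(v) + o._value(v),
                         lambda v: self._gradient(v) + o._gradient(v))
    def __radd__(self, second_var):
        # Addition commutes, so right add reuses left add
        return self + second_var

    def __mul__(self, second_var):
        # Similarly, we can override multiply (product rule)
        o = self._lift(second_var)
        return self._new(lambda v: self._value(v) * o._value(v),
                         lambda v: self._gradient(v) * o._value(v)
                         + self._value(v) * o._gradient(v))
    def __rmul__(self, second_var):
        # Similarly, we can override right multiply
        return self * second_var

    def __sub__(self, second_var):
        # Similarly, we can override subtraction
        return self + (-1) * second_var
    def __rsub__(self, second_var):
        # Similarly, we can override right subtraction
        return (-1) * self + second_var

    def __truediv__(self, second_var):
        # Python 3 uses __truediv__ rather than __div__ (quotient rule)
        o = self._lift(second_var)
        return self._new(lambda v: self._value(v) / o._value(v),
                         lambda v: (self._gradient(v) * o._value(v)
                                    - self._value(v) * o._gradient(v)) / o._value(v) ** 2)
    def __rtruediv__(self, second_var):
        return self._lift(second_var) / self

    def sin(self):
        # Similarly, we can give user defined sin function
        return self._new(lambda v: np.sin(self._value(v)),
                         lambda v: np.cos(self._value(v)) * self._gradient(v))

    def cos(self):
        # Similarly, we can give user defined cos function
        return self._new(lambda v: np.cos(self._value(v)),
                         lambda v: -np.sin(self._value(v)) * self._gradient(v))
    def log(self):
        # Similarly, we can give user defined log function
        return self._new(lambda v: np.log(self._value(v)),
                         lambda v: self._gradient(v) / self._value(v))

    def get_value(self, value):
        return self._value(value)

    def get_gradient(self, value, direc):
        # Convert direc to a unit vector, then project the gradient onto it
        direc = np.asarray(direc, dtype=float)
        direc_unit_vector = direc / np.linalg.norm(direc)
        return np.dot(self._gradient(value), direc_unit_vector)
```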
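
The derivative rules used by methods such as `exp` and `sin`, and the projection performed in `get_gradient`, can be sanity-checked independently of the class. The sketch below is only a suggestion: `central_difference` and `directional_derivative` are hypothetical helper names, and the step size `h` is an arbitrary choice. It compares the chain-rule derivative of $\exp(\sin(x))$ at $x = 5$ against a central finite difference, then projects a sample gradient onto a direction.

```python
import numpy as np


def central_difference(f, x, h=1e-6):
    # Numerical derivative estimate: (f(x + h) - f(x - h)) / (2h)
    return (f(x + h) - f(x - h)) / (2.0 * h)


def directional_derivative(gradient, direc):
    # Normalize the direction, then take the dot product with the gradient
    direc = np.asarray(direc, dtype=float)
    unit = direc / np.linalg.norm(direc)
    return float(np.dot(gradient, unit))


x = 5.0
# Chain rule for exp(sin(x)): exp(sin(x)) * cos(x)
analytic = np.exp(np.sin(x)) * np.cos(x)
numeric = central_difference(lambda t: np.exp(np.sin(t)), x)
error = abs(analytic - numeric)

# Projecting the gradient (3, 4) onto the x-axis keeps only its first component
dd = directional_derivative(np.array([3.0, 4.0]), [2.0, 0.0])
print(error, dd)
```

A check of this kind could be reused in unit tests for each overloaded operator and elementary function.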

Note that this function