### Full Example Usage Including Imports for a Scalar Function of a Scalar Variable

Here we show several examples of scalar functions, beginning with a function of two input variables and followed by functions of a scalar variable, including an example of using our package to implement Newton's Method.

In [1]:
#commands to change to the AD20 directory
%pwd 
%cd ..
%cd AD20

/Users/jiwhanyoon/Desktop/cs207/cs207-FinalProject
/Users/jiwhanyoon/Desktop/cs207/cs207-FinalProject/AD20


In [2]:
#necessary imports
import AD20
import numpy as np
from AD20.ADnum_multivar import ADnum
from AD20 import ADmath_multivar as ADmath

x = ADnum(3, ins = 2, ind = 0) # Step 1: initialize each input to a specific value
y = ADnum(4, ins = 2, ind = 1)
f = 2 * y + 2*x**2 # Step 2: write the function whose derivative we would like to take

# Steps 3 and 4: Use the class attributes to access the value and derivative of the function at the given input values

print(f.val) #should equal 26
print(f.der) #should equal [12, 2]
print(x.val) #should equal 3
print(x.der) #should equal [1, 0]
print(y.val) #should equal 4
print(y.der) #should equal [0, 1]

26.0
[12.  2.]
3.0
[1. 0.]
4.0
[0. 1.]


In [3]:
#another example with a trigonometric function
x = ADnum(np.pi, der = 1) # Step 1: initialize x, this time at pi
f = ADmath.sin(x) # Step 2: create a function, using elementary functions from the ADmath module

#Steps 3 and 4: Use the class attributes to access the value and derivative
print(f.val) # should print 1.22e-16 due to floating point error in numpy implementation (should be 0)
print(f.der) # should print -1.0
print(x.val) # should print 3.14
print(x.der) # should print 1

1.2246467991473532e-16
-1.0
3.141592653589793
1.0


Suppose we wanted to easily access the value and derivative of a function at many different points.  As an alternative to the method for defining `f` in the previous two examples, we could define `f` as a Python function:

In [7]:
#example to easily access value and derivative at multiple points by defining f as a function
def f(x):
    return x+ADmath.exp(x)

#get the value and derivative at 1
y = ADnum(1, der = 1)
print(f(y).val, f(y).der)

#an alternate approach to get the value and derivative at 1
print(f(ADnum(1, der = 1)).val, f(ADnum(1, der = 1)).der)

3.718281828459045 3.718281828459045
3.718281828459045 3.718281828459045


Notice that in the above example, the natural exponential, an elementary function, must be taken from the `ADmath` module, so that `f` can take an `ADnum` object as input and return an `ADnum` object.

## 3.5 Newton's Method for a Scalar Valued Function
One basic application of differentiation is Newton's method for finding roots of a function.  For demonstration of using our package for such an application, we will consider the function
$$f(x) = x^2 + \sin(x)$$
which we know has a root at $x=0$.  The plot below also shows that the function has an additional root near -1.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

x = np.linspace(-2.5, 2.55, 1000)
f = x**2+np.sin(x)

plt.plot(x, f, linewidth = 2)
plt.plot(x, np.zeros((1000,)), '--')
plt.xlabel('x', fontsize = 16)
plt.ylabel('f(x)', fontsize = 16)
plt.xticks(fontsize = 14)
plt.yticks(fontsize =14)
plt.title('Plot of f(x) Showing Two Roots', fontsize = 18)

In [None]:
#implementation of Newton's method using AD20, without hardcoding the derivative

#function that we wish to find the roots of
def f(x):
    return x**2+ADmath.sin(x)

#Newton's method
x = ADnum(1) #set an initial guess for the root

for i in range(1000):
    dx = -f(x).val/f(x).der #get change using ADnum attributes
    if np.abs(dx) < .000001: #check if within some tolerance
        print('Root found at:' + str(x.val))
        break
    x = x+dx #update the guess


In the above, we found the root at zero.  Using a different initialization point, we can find the other root of the function.

In [None]:
y = ADnum(-1) #set an initial guess for the root

for i in range(1000):
    dy = -f(y).val/f(y).der #get change using ADnum attributes
    if np.abs(dy) < .000001: #check if within some tolerance
        print('Root found at:' + str(y.val))
        break
    y = y+dy #update the guess
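
The two loops above differ only in their starting point, so the iteration can be wrapped in a reusable function. The following is a minimal sketch and not part of AD20: `Dual` is a hypothetical stand-in for a scalar `ADnum` so that the block runs on its own, and `dsin` and `newton` are illustrative names.

```python
import math

class Dual:
    """Hypothetical stand-in for a scalar ADnum: a value and a derivative."""
    def __init__(self, val, der=1.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        if not isinstance(other, Dual):
            other = Dual(other, 0.0)   # constants have zero derivative
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        if not isinstance(other, Dual):
            other = Dual(other, 0.0)
        # product rule
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)
    __rmul__ = __mul__

def dsin(x):
    """Sine of a Dual, with the chain rule applied to the derivative."""
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)

def newton(f, x0, tol=1e-6, max_iter=1000):
    """Return an approximate root of f starting from x0, or None if none is found."""
    x = x0
    for _ in range(max_iter):
        fx = f(Dual(x))      # value and derivative from a single evaluation
        if fx.der == 0:
            return None      # the Newton step is undefined here
        dx = -fx.val / fx.der
        x += dx
        if abs(dx) < tol:
            return x
    return None

def f(x):
    return x * x + dsin(x)   # x**2 + sin(x), written with * to keep Dual small

root_near_zero = newton(f, 1.0)
root_near_minus_one = newton(f, -1.0)
```

Evaluating `f` once per iteration and reading both attributes from the result avoids the duplicate call to `f` in each step of the loops above.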

## 3.6 Future Extensions of the Basic Scalar Implementation
In the future, we will extend our basic package to find the gradient of scalar valued functions of multiple variables and the Jacobian of vector valued functions of vector valued input.  The following gives an outline of the usage of this future work.


### Functions of Multiple Variables
In the case of a function of more than one variable, the sequence is similar, except that when creating ADnum objects the user must specify the total number of input variables (`ins`) and the index of each variable in the gradient (`ind`), so that the constructor of the ADnum class can correctly assign the derivative of the input variable:
    1. initialize each variable to a specific value where the function should be evaluated
    2. access the gradient, which is returned as a numpy array, through `f.der`
    
```python
    # scalar function, multi variable
    >>> x = ADnum(2, ins = 2, ind = 0)
    >>> y = ADnum(3, ins = 2, ind = 1)
    >>> f = 3 * x**3 + 2 * y**3
    >>> print(f.val)
    >>> print(f.der)
    >>> print(x.val)
    >>> print(x.der)
    >>> print(y.val)
    >>> print(y.der)
    78
    np.array([36, 54])
    2
    np.array([1, 0])
    3
    np.array([0, 1])
```
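
One way the constructor could assign the derivative of each input from `ins` and `ind` is to seed a one-hot gradient. The following is a minimal sketch under that assumption; the class name `GradNum` is hypothetical and only the operations needed for the example above are defined.

```python
import numpy as np

class GradNum:
    """Hypothetical stand-in for a multivariable ADnum."""
    def __init__(self, val, ins=1, ind=0, der=None):
        self.val = float(val)
        if der is None:
            # seed: derivative of input number `ind` with respect to all `ins` inputs
            der = np.zeros(ins)
            der[ind] = 1.0
        self.der = np.asarray(der, dtype=float)

    def __add__(self, other):
        if not isinstance(other, GradNum):
            return GradNum(self.val + other, der=self.der)
        return GradNum(self.val + other.val, der=self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        if not isinstance(other, GradNum):
            return GradNum(self.val * other, der=self.der * other)
        # product rule, applied elementwise to the gradient arrays
        return GradNum(self.val * other.val,
                       der=self.val * other.der + self.der * other.val)
    __rmul__ = __mul__

    def __pow__(self, n):
        # power rule for a constant exponent n
        return GradNum(self.val ** n, der=n * self.val ** (n - 1) * self.der)

x = GradNum(2, ins=2, ind=0)
y = GradNum(3, ins=2, ind=1)
f = 3 * x**3 + 2 * y**3
```

Because every input carries a gradient of the same length, the usual differentiation rules apply unchanged to each entry of the array.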

### Vector-valued Functions
Each component of a vector valued function is just a scalar valued function of one or more input variables.  Thus, we can easily combine the previous results to get the Jacobian of a vector valued function.  By updating our methods to broadcast appropriately for an array, we can easily access these attributes,

```python
    >>> x = ADnum(2, ins = 2, ind = 0)
    >>> y = ADnum(3, ins = 2, ind = 1)
    >>> F = [x**2, x+y, 4*y]
    >>> F[1].der   # [1, 1]
    >>> F.der      # [[4, 0], [1, 1], [0, 4]]
    >>> F[0].val   # 4
    >>> F.val      # [4, 5, 12]
```
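
Since each component carries its own gradient, the Jacobian is those gradients stacked as rows. A minimal sketch, assuming each component exposes `val` and `der`; `Comp`, `values`, and `jacobian` are hypothetical names, and the gradients are entered by hand to keep the example short.

```python
from collections import namedtuple
import numpy as np

# Hypothetical stand-in for an ADnum-like result: a value and a gradient.
Comp = namedtuple("Comp", ["val", "der"])

def values(F):
    """Collect the component values of a vector valued function into an array."""
    return np.array([f.val for f in F])

def jacobian(F):
    """Stack the gradients of the components as the rows of the Jacobian."""
    return np.vstack([f.der for f in F])

# Components of F = [x**2, x + y, 4*y] at (x, y) = (2, 3)
F = [Comp(4.0, np.array([4.0, 0.0])),
     Comp(5.0, np.array([1.0, 1.0])),
     Comp(12.0, np.array([0.0, 4.0]))]

J = jacobian(F)   # shape (3, 2): one row per output, one column per input
```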

# 4. Software Organization
We would like to let the user use all numerical operations defined in our AD20 package. The AD20 package contains the `ADnum` module, the `ADmath` module, and the `ADgraph` module.

For either a scalar or vector input (either as a numpy array or a list), we will convert the input into an `ADnum` object, which can interact with the other modules. `ADnum` will also contain an overloaded version of basic operations, including addition, subtraction, multiplication, division, and exponentiation, so that the value and derivative are correctly updated after combining ADnum objects through each of these operations.

For special functions, we will use `ADmath` to compute the numerical values and the corresponding derivatives. In particular, `ADmath` will contain functions abs, exp, log, sin, cos, and tan.

To show a calculation graph, we use `ADgraph` (and `ADtable`) to show the forward mode calculation process.

###  4.1 Directory Structure
    AD20/
        AD20/
            __init__.py
            ADnum.py
            ADmath.py

        Tests/
            __init__.py
            test_AD20.py
    docs/
        Milestone 1.ipynb
        Milestone 2.ipynb
        figs/
    README.md
    setup.cfg
    requirements.txt
    LICENSE

###  4.2 Modules and Functionality
Our package consists of three main modules:

- **ADnum:** Contains the `ADnum` class (fully described below).  Create `ADnum` objects, which (inspired by the dual numbers) are defined by the attributes of a value and a derivative, from numbers or tuples.  Define all of the numerical operations for `ADnum` objects, so that they correctly track all derivatives.

- **ADmath:** Define elementary functions for `ADnum` objects, correctly tracking all of the derivatives.

and in future implementation,

- **ADgraph:** Create `ADgraph` objects, which can be used to show the computation process in either a graph (ADgraph.py) or table (ADtable.py)

###  4.3 Testing and Coverage
All tests are contained in the `test_AD20.py` file in the `Tests` directory (see the repo structure above).  We will use pytest to perform our testing, with `TravisCI` for continuous integration and `Coveralls` for verifying code coverage.  The test suite contains unit tests for all of the class methods implemented in ADnum and all the elementary functions implemented in ADmath.  The suite also contains several functions composed of multiple operations and elementary functions for more advanced testing.

###  4.4 Package Distribution
For the final project submission, we will distribute our package through `PyPI` so that it can be installed with `pip`. This will allow the user to install the package by using the command

    pip install AD20
    
Note that the current method for installing `AD20` through git is outlined in the user interaction section.  numpy is currently the only external dependency.

# 5. Implementation
Automatic differentiation is implemented by building the functions we want to differentiate from `ADnum` objects, combined through basic operations and the special elementary functions defined for `ADnum` objects in the `ADmath` module.  Each such function is itself an `ADnum` object, so it has an associated value and derivative, both of which are updated as the function is constructed.

### 5.1 Core Data Structures
`ADnum` objects can be thought of as a tuple, where the first entry is the value and the second entry is the derivative.  Each of these attributes is either a scalar or a numpy array for ease of computation.  In the case of scalar input, the derivative is also a scalar.  For vector valued input, the derivative is the gradient of the function, stored as a numpy array.

In order to build and store computational graphs in the ADgraph module, we will use a dictionary to represent the graph, where the keys are the nodes of the graph, stored as `ADnum` objects, and the values associated with each key are the children of that node, stored as lists of ADnum objects.
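
As a rough illustration of this dictionary layout, the sketch below builds the graph for $f = xy + \sin(x)$ by hand. Plain strings stand in for the `ADnum` nodes, and the node labels and the `parents` helper are made up for the example.

```python
# Keys are nodes; values are lists of that node's children
# (the nodes computed directly from it).
graph = {
    "x": ["x*y", "sin(x)"],
    "y": ["x*y"],
    "x*y": ["f"],
    "sin(x)": ["f"],
    "f": [],          # the output has no children
}

def parents(graph, node):
    """Return the nodes that list `node` among their children."""
    return [key for key, children in graph.items() if node in children]

f_parents = parents(graph, "f")
```

Storing children makes forward traversal a direct lookup, while parents have to be recovered by a search such as the one in `parents`.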

### 5.2 Implemented Classes, Methods, and Attributes
The main class is `ADnum`, defined in the `ADnum` module, which is used to create `ADnum` objects.  It takes as input a single scalar input or a vector input (as a numpy array) and outputs an `ADnum` object.  The `ADnum` objects store the current value of the function and its derivative as attributes.

These two attributes represent the two major functionalities desired of the class.  The `val` attribute is the ADnum object evaluated at the given value and the `der` attribute is its derivative at the given value. The constructor for this class sets the value of the object and optionally also sets the value of its derivative,

```python
#ADnum.py
class ADnum():
    def __init__(self, a, d = 1):
        self.val = a
        self.der = d
        self.graph = {} #for future implementation
```

The `ADnum` class also includes methods to overload basic operations: `__add__()`, `__radd__()`, `__mul__()`, `__rmul__()`, `__sub__()`, `__rsub__()`, `__truediv__()`, `__rtruediv__()`, `__pow__()`, and `__rpow__()`.  As a result, adding, subtracting, multiplying, dividing, or exponentiating two `ADnum` objects returns an `ADnum` object, as does adding or multiplying an `ADnum` object by a constant.  For example, Y1, Y2, and Y3 are all recognized as `ADnum` objects:

```python
    X1= ADnum(7)
    X2 = ADnum(15)
    Y1 = X1+X2
    Y2 = X1*X2+X1
    Y3 = 5*X1+X2+100
```

The resulting ADnum objects have both a value and derivative.  An example overloaded function is the following:


```python
#ADnum.py
def __mul__(self,other):
        try:
            return ADnum(self.val*other.val, self.val*other.der+self.der*other.val)
        except AttributeError:
            other = ADnum(other, 0)
            return self*other
```
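
Division can follow the same try/except pattern. The sketch below is illustrative rather than the package's code: it uses a pared-down `ADnum` with only the constructor shown earlier and applies the quotient rule.

```python
class ADnum:
    """Pared-down illustration holding only a value and a derivative."""
    def __init__(self, a, d=1):
        self.val = a
        self.der = d

    def __truediv__(self, other):
        try:
            # quotient rule: (u/v)' = (u'v - uv') / v**2
            return ADnum(self.val / other.val,
                         (self.der * other.val - self.val * other.der) / other.val**2)
        except AttributeError:
            other = ADnum(other, 0)   # promote a constant to an ADnum
            return self / other

    def __rtruediv__(self, other):
        # handles constant / ADnum
        return ADnum(other, 0) / self

x = ADnum(4)
g = 1 / x        # g(x) = 1/x
h = x / 2        # h(x) = x/2
```

Promoting a constant to an `ADnum` with derivative 0 lets one code path handle both `ADnum / ADnum` and `ADnum / constant`.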

By combining simple `ADnum` objects with basic operations and simple functions, we can construct any function we like.

```python
    X = ADnum(4)
    F = X + ADmath.sin(4-X)
```    
Here `F` is now an `ADnum` object, and `ADmath.sin()` is a specially defined sine function which takes as input an `ADnum` object and returns an `ADnum` object, which allows us to evaluate `F` and its derivative,

```python
    F.val = 4
    F.der = 0
    X.val = 4
    X.der = 1
```

In addition to the sine function used in the example above, the `ADmath` module implements the trigonometric functions `sin()`, `cos()`, `tan()`, `csc()`, `sec()`, `cot()`, the inverse trigonometric functions `arcsin()`, `arccos()`, `arctan()`, the hyperbolic trig functions `sinh()`, `cosh()`, `tanh()`, and the natural exponential `exp()` and natural logarithm `log()`.  All of the functions defined in the `ADmath` module are elementary functions of `ADnum` objects, so that the output is also an `ADnum` object with the `val` and `der` attributes updated appropriately.  For example,

```python
#ADmath.py
def sin(X):
    try:
        return adn.ADnum(np.sin(X.val), np.cos(X.val)*X.der)
    except AttributeError:
        X = adn.ADnum(X, 0)
        return sin(X)
    
def log(X):
    try:
        return adn.ADnum(np.log(X.val), 1/X.val*X.der)
    except AttributeError:
        X = adn.ADnum(X, 0)
        return log(X)
```
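
The remaining functions can follow the same template. As a sketch, here are `exp` and `arctan` written in that style, together with a finite-difference check of the chain rule. The minimal `ADnum` class is a stand-in so that the block runs on its own, not the package's class.

```python
import numpy as np

class ADnum:
    """Minimal stand-in with the same two attributes as the real class."""
    def __init__(self, a, d=1):
        self.val = a
        self.der = d

def exp(X):
    try:
        return ADnum(np.exp(X.val), np.exp(X.val) * X.der)
    except AttributeError:
        return exp(ADnum(X, 0))

def arctan(X):
    try:
        return ADnum(np.arctan(X.val), X.der / (1 + X.val**2))
    except AttributeError:
        return arctan(ADnum(X, 0))

# Composite function arctan(exp(x)) evaluated at x = 0.5
x0 = 0.5
F = arctan(exp(ADnum(x0)))

# Central finite difference as an independent check of F.der
h = 1e-6
fd = (np.arctan(np.exp(x0 + h)) - np.arctan(np.exp(x0 - h))) / (2 * h)
```

The finite difference only approximates the derivative, so `F.der` and `fd` should agree to several digits rather than exactly.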

We will also implement a class, `ADgraph`, for computational graphs.  The constructor takes as input a dictionary, as described above, where the keys are nodes and the values are the children of the key node; this dictionary is stored in the attribute `dict`.  The class will also have an attribute `inputs`, which stores the nodes that have no parents, and will implement methods to display the computational graphs and tables used to compute the derivatives of the `ADnum` objects.
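
A minimal sketch of what such a class might look like, assuming only the two attributes named in the paragraph above (`dict` and `inputs`). The `show` method and the string node labels are placeholders for illustration.

```python
class ADgraph:
    """Illustrative computational graph built from a node -> children dictionary."""
    def __init__(self, graph_dict):
        self.dict = graph_dict
        # a node is an input if it never appears as a child of another node
        all_children = {c for children in graph_dict.values() for c in children}
        self.inputs = [node for node in graph_dict if node not in all_children]

    def show(self):
        """Return one 'node -> child' line per edge as a plain-text view of the graph."""
        return [f"{node} -> {child}"
                for node, children in self.dict.items()
                for child in children]

g = ADgraph({"x": ["x**2"], "y": ["x**2 + y"], "x**2": ["x**2 + y"], "x**2 + y": []})
edges = g.show()
```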

### 5.3 External Dependencies
In order to implement the elementary functions, our ADmath relies on numpy’s implementation of the trigonometric functions, exponential functions, and natural logarithms for evaluation of these special functions, as demonstrated in the definition of the sine function for `ADnum` objects above.

We will also use numpy to implement matrix and vector multiplication in cases where the function is either vector valued or takes a vector as an input.

### 5.4 Elementary Functions
As outlined above, all elementary operations are defined for `ADnum` objects within the `ADnum` class, and a special `ADmath` module defines the trigonometric, exponential, and logarithmic functions to be used on `ADnum` objects, so that they both take as input and return an `ADnum` object.  Together these complete the set of definitions of all elementary operations and functions that can be composed to construct more complex functions.

### 5.5 Future Implementation
Our current implementation performs automatic differentiation for scalar functions of scalar variables.  Our constructor needs to be modified for functions of multiple inputs so that the `der` attribute is represented by a numpy array in these cases.  Minor revisions will also need to be applied to the methods defined in the `ADnum` class and the `ADmath` module to ensure that these multidimensional `der` attributes are correctly updated, using elementwise multiplication of the numpy arrays.

We will also need to implement the functionality described in section 6, which includes an implementation of reverse mode for backpropagation and an implementation of the ADgraph class for building computational graphs and tables.

# 6. Project Extension

In order to expand our project from the basic forward mode automatic differentiation, we will make two additional developments for pedagogical and application purposes.

### 6.1 Computational Graphs and Tables
We will implement the class `ADgraph` which stores the computational graph used to compute the derivatives in forward mode.  For every operation we create an additional node which represents another trace in the program.  This graph will be stored as an attribute of the `ADnum` objects.  For pedagogical purposes, we want to be able to visualize this process so we will use external packages for displaying graphs where the edge labels display the corresponding operation.  Correspondingly, we will also develop the functionality to display a table showing the trace, elementary operation, value, and derivative at each step.  Such a tool could be useful in the classroom for teaching students how automatic differentiation works.

This will require modifying all of the methods we have previously overloaded so that each operation correctly adds to the dictionary which contains the computational graph information.  This will also involve the added challenge of ensuring compatibility between our program and an external program for visualizing graphs and tables.
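
One way the overloaded operations could record such a table is to append a row for every elementary operation. A rough sketch, in which the `Traced` class, the shared `TABLE` list, and the trace names are all assumptions made for illustration:

```python
import math

TABLE = []   # rows of (trace, operation, value, derivative)

class Traced:
    """Illustrative dual number that logs each operation to TABLE."""
    def __init__(self, val, der=1.0, op="input"):
        self.val, self.der = val, der
        self.name = f"x{len(TABLE) + 1}"   # traces are numbered in creation order
        TABLE.append((self.name, op, val, der))

    def __mul__(self, other):
        # product rule, recorded as a new trace
        return Traced(self.val * other.val,
                      self.val * other.der + self.der * other.val,
                      op=f"{self.name}*{other.name}")

def sin(x):
    return Traced(math.sin(x.val), math.cos(x.val) * x.der, op=f"sin({x.name})")

x = Traced(2.0)
f = sin(x * x)       # f(x) = sin(x**2)

for row in TABLE:
    print(row)
```

A module-level list keeps the sketch short; storing the rows on the objects themselves, as the text proposes for the graph, would avoid shared state between unrelated computations.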

### 6.2 Backpropagation
The computational graph is also necessary to implement the reverse mode of automatic differentiation, which is important to many applications.  For example, backpropagation is the underlying method to fit the parameters of neural networks.  We will use the computational graph to store the structure needed to backpropagate derivatives.  Thus, we will have expanded our project to perform both forward and reverse automatic differentiation, making the package more suitable to a variety of applications. 

Performing backpropagation requires methods that correctly traverse the computational graph and propagate derivatives from the output through the intermediates to the inputs, which means developing new methods for the `ADgraph` class.
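
To make the traversal concrete, here is a small sketch of reverse mode on a hand-built graph for $f = xy + \sin(x)$. The `Node` class and the `backward` function are hypothetical, and each node stores its parents together with the local partial derivatives rather than the children dictionary described earlier.

```python
import math

class Node:
    """Graph node holding a value, its parents with local partials, and an adjoint."""
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents    # tuples of (parent_node, d(self)/d(parent))
        self.grad = 0.0

def mul(a, b):
    return Node(a.val * b.val, ((a, b.val), (b, a.val)))

def add(a, b):
    return Node(a.val + b.val, ((a, 1.0), (b, 1.0)))

def sin(a):
    return Node(math.sin(a.val), ((a, math.cos(a.val)),))

def backward(output):
    """Propagate derivatives from the output back to the inputs."""
    # depth-first search gives an order in which parents precede the nodes
    # built from them; walking it in reverse finishes each adjoint before
    # it is passed on
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

x, y = Node(2.0), Node(3.0)
f = add(mul(x, y), sin(x))
backward(f)
```

After the call, `x.grad` and `y.grad` hold the partial derivatives of `f`, obtained in a single backward pass regardless of the number of inputs.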