# AD20 Milestone 1
Group 20: Lindsey Brown, Xinyue Wang, Kevin Yoon

# Table of Contents

**1. Introduction**

    1.1 Automatic Differentiation as a Solution to the Problem of Computing Derivatives
    1.2 Application of AD Techniques
    
**2. Background**

    2.1 Chain Rule
    2.2 Computational Graph Structure
    2.3 Dual Numbers
    2.4 Elementary Functions
    
**3. Package Usage**

    3.1 User Interaction
    3.2 Importing AD20
    3.3 Instantiating AD20 Objects
    
**4. Software Organization**

    4.1 Directory Structure
    4.2 Modules and Functionality
    4.3 Testing and Coverage
    4.4 Package Distribution
    
**5. Implementation**

    5.1 Core Data Structures
    5.2 Classes
    5.3 Class Methods and Attributes
    5.4 External Dependencies
    5.5 Elementary Functions

# 1. Introduction
The AD20 package performs forward-mode automatic differentiation of user-defined functions, evaluating both the function and its derivatives to machine precision.

## 1.1 Automatic Differentiation as a Solution to the Problem of Computing Derivatives

Differentiation is a fundamental operation for computational science. Used in a variety of applications from optimization to sensitivity analysis, differentiation is most useful when two conditions are met: it must be exact (up to machine precision) and computationally efficient.

Automatic differentiation (AD), also known as algorithmic differentiation or computational differentiation, computes the derivative of a function and is distinguished by its ability to handle complex combinations of functions without sacrificing accuracy. Regardless of how complex the function may be, AD takes advantage of the fact that it can be decomposed into a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, etc.).

Through computing the derivatives of these basic elementary functions and repeatedly applying the chain rule, AD meets the two aforementioned conditions and distinguishes itself from other modes of differentiation, namely numerical differentiation and symbolic differentiation. 

- Numerical differentiation is easy to implement and can handle any type of function, but it sacrifices accuracy to truncation and rounding errors; it is an estimation technique based on a small perturbation of the input. Automatic differentiation does not approximate the derivative through a choice of step size. It instead computes derivatives exactly to machine precision, avoiding these accuracy and stability problems.


- Symbolic differentiation yields exact expressions, so its results are accurate to machine precision once evaluated, but computational efficiency is sacrificed because it builds complex expression trees. For complicated functions, these trees can quickly become very large. Automatic differentiation instead views functions as compositions of basic operations, remains accurate to machine precision, and stays efficient because it never builds or evaluates such expression trees.
 
Thus, it is clear that automatic differentiation has advantages over other commonly used techniques for computing derivatives. These advantages make the use of AD attractive to many scientific applications. 
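The accuracy trade-off of numerical differentiation described above can be illustrated with a short sketch. It is not part of AD20; the helper name `forward_difference` and the chosen step sizes are assumptions made for illustration. A large step suffers from truncation error, while a very small step suffers from rounding error.

```python
import math


def forward_difference(f, x, h):
    """Approximate f'(x) with a one-sided finite difference of step h."""
    return (f(x + h) - f(x)) / h


x = 1.0
exact = math.cos(x)  # analytic derivative of sin at x

# Absolute error for a large, a moderate, and a very small step size.
errors = {
    h: abs(forward_difference(math.sin, x, h) - exact)
    for h in (1e-1, 1e-8, 1e-15)
}

for h, err in errors.items():
    print(f"h = {h:g}: absolute error = {err:.3e}")
```

The moderate step gives the smallest error of the three, yet it is still far from machine precision, which is the gap that AD closes.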

## 1.2 Application of AD Techniques

Because of its accuracy and efficiency, AD is used in many applications where accurate and efficient computation of derivatives is crucial. These include:

- Machine learning (ability to understand data and make models/predictions)
- Parameter optimization (ability to choose best parameter values under given conditions)
- Sensitivity analysis (ability to understand different factors and their impact)
- Physical modeling (ability to visualize and depict data through models)
- Probabilistic inference (e.g. Hamiltonian Monte Carlo)

# 2. Background

## 2.1 Chain Rule

The chain rule lies at the heart of AD. A complex function is decomposed into a composition of elementary functions and operations, the derivative of each elementary piece is computed, and the chain rule pieces these together to give the overall derivative.

Suppose we have a function $f\left(g\left(t\right)\right)$ and we want the derivative of $f$ with respect to $t$.  The derivative is $$\dfrac{\partial f}{\partial t} = \dfrac{\partial f}{\partial g}\dfrac{\partial g}{\partial t}.$$
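As a concrete check of the formula above, the following sketch picks $g(t) = t^{2}$ and $f(g) = \sin(g)$. These particular functions and all helper names are assumptions chosen for illustration. The product of the two elementary derivatives is compared against a central finite difference of the composed function.

```python
import math


def g(t):
    return t ** 2


def f(u):
    return math.sin(u)


def dg_dt(t):
    return 2 * t  # derivative of t**2


def df_dg(u):
    return math.cos(u)  # derivative of sin(u)


t = 1.5

# Chain rule: (df/dg evaluated at g(t)) times (dg/dt evaluated at t).
chain_rule_value = df_dg(g(t)) * dg_dt(t)

# Independent numerical check using a central difference.
h = 1e-6
central_difference = (f(g(t + h)) - f(g(t - h))) / (2 * h)

print(chain_rule_value, central_difference)
```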

## 2.2 Computational Graph Structure

### The Computational Graph
Consider the example function $$f\left(x,y\right) = x^{3} + \sin(5y)$$

 The evaluation trace looks like:

| Trace | Elementary Function | Elementary Function Derivative | $\nabla_{x}$ Value | $\nabla_{y}$ Value |
| :---: | :-----------------: | :-----------: | :-------------: | :----------------------: |
| $x_{1}$ | $x$ |$\dot{x_{1}}$|$1$|$0$|
| $x_{2}$ | $y$ |$\dot{x_{2}}$|$0$|$1$|
| $x_{3}$ | $x_{1}^3$ |$3x_{1}^2\dot{x_{1}}$|$3x^2$|$0$|
| $x_{4}$ | $5x_{2}$ |$5\dot{x_{2}}$|$0$|$5$|
| $x_{5}$ | $\sin(x_{4})$ |$\cos(x_{4})\dot{x_{4}}$|$0$|$5\cos(5y)$|
| $x_{6}$ | $x_{3}+x_{5}$ | $\dot{x_{3}}+\dot{x_{5}}$ |$3x^2$|$5\cos(5y)$|

Then, in the end, the gradient of $f$, read from the last row of the trace, comes out to be $$\nabla f = \left[3x^2, \; 5\cos(5y)\right]$$
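The trace above can be reproduced numerically with a few lines of plain Python. This is a minimal sketch, not AD20 code: the function name `trace_f` and the `(value, d/dx, d/dy)` tuple layout are assumptions made for illustration.

```python
import math


def trace_f(x, y):
    """Forward-mode trace of f(x, y) = x**3 + sin(5*y).

    Each entry is a tuple (value, derivative w.r.t. x, derivative w.r.t. y).
    """
    x1 = (x, 1.0, 0.0)  # input x, seeded with dx/dx = 1
    x2 = (y, 0.0, 1.0)  # input y, seeded with dy/dy = 1
    x3 = (x1[0] ** 3, 3 * x1[0] ** 2 * x1[1], 3 * x1[0] ** 2 * x1[2])
    x4 = (5 * x2[0], 5 * x2[1], 5 * x2[2])
    x5 = (math.sin(x4[0]), math.cos(x4[0]) * x4[1], math.cos(x4[0]) * x4[2])
    x6 = (x3[0] + x5[0], x3[1] + x5[1], x3[2] + x5[2])
    return [x1, x2, x3, x4, x5, x6]


trace = trace_f(2.0, 0.0)
value, df_dx, df_dy = trace[-1]
print(value, df_dx, df_dy)  # 8.0 12.0 5.0
```

At $x = 2$, $y = 0$ the last row gives $3x^2 = 12$ and $5\cos(5y) = 5$, matching the symbolic entries in the table.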

One way to visualize what is going on is to represent the evaluation trace with a graph.

![comp-graph](figs/graph.jpg)


## 2.3 Dual Numbers
A dual number has a real part and a dual part.  We write $$f = y + \epsilon y^{\prime}$$ and refer to $y^{\prime}$ as the dual part.  We *define* the number $\epsilon$ so that $\epsilon^{2} = 0$.  **This does not mean that $\epsilon$ is zero!**  $\epsilon$ is not a real number.

#### Some properties of dual numbers:
* Conjugate:  $f^{*} = y - \epsilon y^{\prime}$.
* Magnitude: $\left|f\right|^{2} = ff^{*} = \left(y+\epsilon y^{\prime}\right)\left(y-\epsilon y^{\prime}\right) = y^{2}$.
* Polar form: $f = y\left(1 + \epsilon\dfrac{y^{\prime}}{y}\right)$.

### Example (from lecture)
Recall that the derivative of $f=z^{2}$ is $f^{\prime} = 2zz^{\prime}$, which reduces to $2z$ when $z^{\prime} = 1$.

Now if we extend $z$ so that it has a real part and a dual part ($z\leftarrow z + \epsilon z^{\prime}$) and evaluate $f$ we have
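\begin{align}
  f &= \left(z + \epsilon z^{\prime}\right)^{2} \\
    &= z^{2} + 2zz^{\prime}\epsilon + \underbrace{\left(z^{\prime}\right)^{2}\epsilon^{2}}_{=0} \\
    &= z^{2} + 2zz^{\prime}\epsilon.
\end{align}

The real part, $z^{2}$, is the value of $f$, and the dual part, $2zz^{\prime}$, is exactly its derivative, so evaluating $f$ on a dual number yields both at once.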
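The same calculation can be carried out in code. The following is a minimal sketch of a dual number type supporting only addition and multiplication; the class name `Dual` and its attribute names are assumptions for illustration and are not part of AD20.

```python
class Dual:
    """Minimal dual number: real + eps * dual, with eps**2 == 0."""

    def __init__(self, real, dual=0.0):
        self.real = real
        self.dual = dual

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.dual + other.dual)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + eps a')(b + eps b') = ab + eps (a b' + a' b); the eps**2 term vanishes.
        return Dual(
            self.real * other.real,
            self.real * other.dual + self.dual * other.real,
        )

    __rmul__ = __mul__


z = Dual(3.0, 1.0)  # seed the dual part with z' = 1
f = z * z           # f = z**2
print(f.real, f.dual)  # 9.0 6.0
```

The real part is $z^{2} = 9$ and the dual part is $2zz^{\prime} = 6$, as in the derivation.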

## 2.4 Elementary Functions
Any complicated function can be broken into a combination of elementary pieces. These include the elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, tan, sqrt, etc.). We will not go into detail here about how to calculate the derivatives of those functions, but more information can be found at the following link.

http://www.nabla.hr/FU-DerivativeA5.htm


# 3. Package Usage

## 3.1 User Interaction
Users wrap every numerical value and formula in an ADnum object, and all operations are carried out on ADnum objects. Users need to create an ADnum object for each input variable and use the mathematical functions defined in the ADmath module whenever they need a special function.

## 3.2 Importing AD20
	import AD20
or 

	from AD20 import ADnum
    
	from AD20 import ADmath
    
	from AD20 import ADgraph


## 3.3 Instantiating AD20 Objects
	from AD20 import ADnum
	from AD20 import ADmath
	a = ADnum(2)
	b = ADmath.sin(a)
	
Both a and b are ADnum objects, which have the attributes described in the class implementation below.


# 4. Software Organization
Users access all numerical operations through the AD20 package, which contains three modules: ADnum, ADmath, and ADgraph.

For either a scalar or vector input (either as a numpy array or a list), we will convert the input into an ADnum object, which can interact with the other modules. ADnum will also contain an overloaded version of basic operations, including addition, subtraction, multiplication, division, and exponentiation, so that the value and derivative are correctly updated.

For special functions, we will use ADmath to compute the numerical values and the corresponding derivatives. In particular, ADmath will contain functions abs, exp, log, sin, cos, and tan.

To show a calculation graph, we use ADgraph (and ADtable) to show the forward mode calculation process.

## 4.1 Directory Structure
    AD20/
        AD20/
            __init__.py
            ADnum/
                __init__.py
                ADnum.py
            ADmath/
                __init__.py
                ADmath.py
            ADgraph/
                __init__.py
                ADgraph.py
                ADtable.py
        Tests/
            __init__.py
            test_AD20.py
        README.md
        setup.py
        LICENSE

## 4.2 Modules and Functionality
- ADnum: wraps numbers or tuples as ADnum objects, performs all of the numerical operations, and keeps track of all derivatives.
- ADmath: defines special mathematical functions on ADnum objects and keeps track of their derivatives.
- ADgraph: traces the calculation process and generates a table or graph.

In particular, ADnum.py contains the ADnum class, which is fully described below. It takes a single scalar or a vector (as either a numpy array or a list) as input and produces an ADnum object. Within this class, we will overload basic operations as outlined below.

## 4.3 Testing and Coverage
The tests will be stored in the Tests directory (see the directory structure above). We will use pytest to perform our testing, with Travis CI for continuous integration and Coveralls for verifying code coverage.

## 4.4 Package Distribution
We will distribute our package through PyPI so that it can be installed with pip.

# 5. Implementation
Automatic differentiation will be implemented through ADnum objects. The functions we want to differentiate are built from these ADnum objects using basic operations and the special functions defined for ADnum objects in the ADmath module. Each such function is itself an ADnum object, so it has an associated value and derivative, which are updated as the object is constructed through basic operations and special functions.

## 5.1 Core Data Structures
The main data structure used to represent the functions on which we perform automatic differentiation will be a tuple whose first entry is the value of the ADnum object and whose second entry is its derivative. For scalar input, the derivative is a float, like the value. For vector input, the derivative is the gradient of the function, stored as a numpy array.

To build and store computational graphs, we will use a dictionary whose keys are the nodes of the graph, stored as ADnum objects, and whose values are the children of each node, stored as lists of ADnum objects.
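The dictionary layout described above might look like the following sketch for $f(x, y) = x^{3} + \sin(5y)$. The `Node` class is a hypothetical stand-in for ADnum objects, used only so that the example is self-contained; the real keys would be ADnum instances.

```python
class Node:
    """Hypothetical stand-in for an ADnum object; hashable by identity."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


x, y = Node("x"), Node("y")
x3, y5 = Node("x**3"), Node("5*y")
s = Node("sin(5*y)")
f = Node("x**3 + sin(5*y)")

# Keys are nodes; each value lists the children computed from that node.
graph = {x: [x3], y: [y5], x3: [f], y5: [s], s: [f], f: []}

# Inputs are the nodes that never appear as a child of another node.
all_children = {child for children in graph.values() for child in children}
inputs = [node for node in graph if node not in all_children]
print(inputs)  # [x, y]
```

Storing children rather than parents makes it natural to walk the graph from the inputs toward the output, which is the direction forward mode needs.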

## 5.2 Classes
The main class will be implemented in the ADnum module, which will create ADnum objects.  The ADnum objects will store the current value of the function and its derivative as attributes.  By combining simple ADnum objects with basic operations and simple functions, we can construct any function we like.  For example,

    import AD20
    from AD20 import ADmath
    X = AD20.ADnum(4)
    Y = AD20.ADnum(0)
    F = X + ADmath.sin(Y)
    
Here F is now an ADnum object, and ADmath.sin() is a specially defined sine function which takes an ADnum object as input and returns an ADnum object. This allows us to evaluate F and its derivative:

    F.val = 4
    F.der = [1, 1]
    X.val = 4
    X.der = 1

In addition to the sine function, the ADmath module will also implement the other trigonometric functions, the natural exponential, and the natural logarithm.

We will also implement a class, ADgraph, for computational graphs. The constructor takes as input a dictionary, as described above, where the keys are nodes and the values are the children of the key node. This can then be used to perform forward propagation and could later be extended to include back propagation.
 
## 5.3 Class Methods and Attributes
Each ADnum object will have two attributes for the two major functions desired of the class. The val attribute will be the value of the ADnum object at the given input and the der attribute will be its derivative. In addition, each ADnum object will have a graph attribute, which stores the dictionary that can be used to build a computational graph in the ADgraph class. The ADnum class will also include methods to overload basic operations: `__add__()`, `__radd__()`, `__mul__()`, `__rmul__()`, `__sub__()`, `__truediv__()`, and `__pow__()`. As a result, adding, subtracting, multiplying, dividing, or exponentiating two ADnum objects returns an ADnum object, as does adding a constant to an ADnum object or multiplying it by a constant. For example, Y1, Y2, and Y3 would all be recognized as ADnum objects:

    from AD20 import ADnum
    X1 = ADnum(7)
    X2 = ADnum(15)
    Y1 = X1 + X2
    Y2 = X1*X2 + X1
    Y3 = 5*X1 + X2 + 100

The resulting ADnum objects have both a value and derivative.
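The overloading described above can be sketched as follows. `MiniADnum` is a hypothetical, scalar-only stand-in written for illustration, not the ADnum class itself: it assumes a single input variable, treats plain numbers as constants with derivative zero, and supports only constant exponents in `__pow__`.

```python
class MiniADnum:
    """Hypothetical scalar-only stand-in for ADnum (single input variable)."""

    def __init__(self, val, der=1.0):
        self.val = val
        self.der = der

    @staticmethod
    def _wrap(other):
        # Plain numbers are constants, so their derivative is zero.
        return other if isinstance(other, MiniADnum) else MiniADnum(other, 0.0)

    def __add__(self, other):
        other = self._wrap(other)
        return MiniADnum(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = self._wrap(other)
        return MiniADnum(self.val - other.val, self.der - other.der)

    def __mul__(self, other):
        other = self._wrap(other)
        # Product rule.
        return MiniADnum(self.val * other.val,
                         self.val * other.der + self.der * other.val)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = self._wrap(other)
        # Quotient rule.
        return MiniADnum(
            self.val / other.val,
            (self.der * other.val - self.val * other.der) / other.val ** 2,
        )

    def __pow__(self, n):
        # Power rule for a constant exponent n only.
        return MiniADnum(self.val ** n, n * self.val ** (n - 1) * self.der)


x = MiniADnum(7.0)            # seed with dx/dx = 1
y = 5 * x * x + x / 2 + 100   # y = 5x^2 + x/2 + 100
print(y.val, y.der)           # 348.5 70.5
```

Because `5 * x` has a plain number on the left, Python falls back to `__rmul__`, which is why the reflected methods are needed for expressions such as `Y3` above.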

The ADgraph class will be constructed from a dictionary, stored in the attribute dict.  This class will also have an attribute inputs, which stores the nodes which have no parents.  This class will implement a deriv method which returns the derivative from the computational graph.

## 5.4 External Dependencies
To implement the elementary functions, our package will rely on numpy's implementations of the trigonometric functions, exponential functions, and natural logarithm to evaluate these special functions.

We will also use numpy to implement matrix and vector multiplication in cases where the function is either vector valued or takes a vector as an input.

## 5.5 Elementary Functions
As outlined above, we will have a special ADmath module which defines the trigonometric, exponential, and logarithmic functions to be used on ADnum objects, so that they both take as input and return an ADnum object.
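A minimal sketch of how such functions could pair each value with its derivative is shown below. It is an illustration under stated assumptions, not the ADmath implementation: `Pair` is a hypothetical stand-in for an ADnum object, only scalars are handled, and the standard-library `math` module is used in place of numpy to keep the example dependency-free.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for an ADnum object: a value and a scalar derivative.
Pair = namedtuple("Pair", ["val", "der"])


def sin(x):
    """Sine of a value/derivative pair: (sin u)' = cos(u) * u'."""
    return Pair(math.sin(x.val), math.cos(x.val) * x.der)


def exp(x):
    """Natural exponential: (exp u)' = exp(u) * u'."""
    return Pair(math.exp(x.val), math.exp(x.val) * x.der)


def log(x):
    """Natural logarithm: (log u)' = u' / u."""
    return Pair(math.log(x.val), x.der / x.val)


a = Pair(0.0, 1.0)       # input variable seeded with derivative 1
b = sin(a)               # value sin(0) = 0, derivative cos(0) = 1
c = exp(sin(a))          # chain rule applied through both functions
d = log(Pair(2.0, 1.0))  # derivative 1/2
print(b, c, d)
```

Because every function takes a pair and returns a pair, the functions compose freely, and the chain rule is applied automatically at each step.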