# Milestone2


## Introduction

The ability to compute rates of change is fundamental to nearly all areas of science, from basic research to machine learning. To gain insight into the inner workings of a system, we must understand how that system changes over time and how different perturbations affect those changes. Often we can model the phenomena underlying a system with equations, such as those describing temperature, motion, and force. However, many real-world phenomena are too complex to be reduced to simple equations, and computing the derivatives of the resulting expressions by hand, which a number of applications require (described below), is onerous. Numerical differentiation techniques such as the method of finite differences offer an alternative to symbolic differentiation for functions whose derivatives cannot be computed analytically, but discretizing the problem introduces truncation and round-off errors that can leave the final answer too inaccurate for the scientific application at hand. Our software provides a streamlined, automated tool that quickly calculates the derivatives of functions by decomposing them into elementary mathematical operations. Automatic differentiation (AD) avoids both the expression swell of symbolic differentiation and the imprecision of numerical differentiation. Its overarching goal is to compute the rate of change of a function of arbitrary complexity from point values and elementary derivatives, so that AD returns the numerical value of a function's derivative to machine precision rather than an approximation.
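
To make the accuracy issue with finite differences concrete, here is a minimal sketch. The helper name `forward_difference`, the test function `sin`, and the step sizes are illustrative choices, not part of our package. It compares the approximation with the known derivative for a large, a moderate, and a very small step size.

```python
import math


def forward_difference(f, x, h):
    """Approximate f'(x) with the forward finite-difference quotient."""
    return (f(x + h) - f(x)) / h


x = 1.0
exact = math.cos(x)  # d/dx sin(x) = cos(x)

# Absolute error for a large, a moderate, and a very small step size.
errors = {h: abs(forward_difference(math.sin, x, h) - exact)
          for h in (1e-1, 1e-8, 1e-15)}

for h, err in errors.items():
    print(f"h = {h:.0e}   absolute error = {err:.2e}")
```

The error shrinks as the step gets smaller only up to a point; for the tiniest step, floating-point round-off dominates and the estimate gets worse again. No single step size gives a result that is accurate to machine precision.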

The equations that describe these sorts of situations are often complex and messy to work with, but there are a number of ways we can gain insight from them. If we want to understand not just the current state of a system but also how it changes over time, we need to take the derivative of these equations. Moreover, it is often useful to model the local behavior of a system within an extremely small window, and a derivative provides exactly this local description. Lastly, if we want to identify extreme cases, we need derivatives to locate extrema or inflection points. Our automatic differentiation package allows us to accomplish all of this to machine precision.


## Background


**Automatic differentiation** is a series of processes automated by a computer program to calculate the derivative of a given function. 

A function is composed of **elementary functions**, which are the building blocks of more complex functions. They are functions of a single variable (real or complex). Examples include polynomial, exponential, and inverse functions, which are combined through operations such as sums and products.

The decomposition of derivatives is enabled by the **chain rule**. 
For a given function $h(u(t), v(t))$:

$\frac{dh}{dt} = \frac{\partial h}{\partial u}\frac{du}{dt} + \frac{\partial h}{\partial v}\frac{dv}{dt}$
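
As a quick check of the chain rule above, consider the illustrative choice $h(u, v) = uv$ with $u(t) = t^{2}$ and $v(t) = \sin t$ (these functions are picked here only as an example). Then $\frac{\partial h}{\partial u} = v$, $\frac{\partial h}{\partial v} = u$, $\frac{du}{dt} = 2t$, and $\frac{dv}{dt} = \cos t$, so

$\frac{dh}{dt} = v \cdot 2t + u \cdot \cos t = 2t \sin t + t^{2} \cos t$

which agrees with differentiating $h(t) = t^{2} \sin t$ directly using the product rule.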

AD has two modes: forward mode and reverse mode. In this milestone, we apply the **chain rule** in forward mode, working from the innermost function outward. The corresponding **graph structure of the forward mode** first computes the forward primal trace and forward tangent trace of the independent variables, and then those of the dependent variables.

Forward mode can be implemented using **dual numbers**. A **dual number** is similar to a **complex number** ($z = x + iy$, where $x$ is the real part, $y$ is the imaginary part, and $i^{2} = -1$). A dual number can be expressed in the form $z = a + b\epsilon$, where $a$ and $b$ are real numbers, $\epsilon^{2} = 0$, and $\epsilon \neq 0$. In the context of automatic differentiation, the real part of the dual number corresponds to the primal trace, whereas the dual part corresponds to the tangent trace.

We need to apply **operator overloading** to the Dual class. **Operator overloading** means redefining the behavior of an operator for particular argument types: depending on the types of the arguments it acts on, the same operator may be implemented differently. Operator overloading is therefore a form of polymorphism.
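
The following is a minimal sketch of how dual numbers and operator overloading fit together for a single variable. The class name `MiniDual` and its attributes are hypothetical and are not the package's actual Dual class. Only addition and multiplication are overloaded, which is enough to differentiate a small polynomial.

```python
class MiniDual:
    """Hypothetical single-variable dual number a + b*epsilon (illustration only)."""

    def __init__(self, real, dual=0.0):
        self.real = real  # primal trace: value of the function
        self.dual = dual  # tangent trace: value of the derivative

    def __add__(self, other):
        # Plain numbers are treated as constants (dual part 0).
        other = other if isinstance(other, MiniDual) else MiniDual(other)
        return MiniDual(self.real + other.real, self.dual + other.dual)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, MiniDual) else MiniDual(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, because eps^2 = 0
        return MiniDual(self.real * other.real,
                        self.real * other.dual + self.dual * other.real)

    __rmul__ = __mul__


x = MiniDual(3.0, 1.0)      # seed the derivative dx/dx = 1
y = x * x + 2 * x + 1       # f(x) = x^2 + 2x + 1
print(y.real, y.dual)       # f(3) = 16.0, f'(3) = 8.0
```

Because `+` and `*` dispatch to the overloaded methods, the function is written as ordinary Python arithmetic and the derivative is carried along automatically.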

When we have multiple variables, we can collect the partial derivatives in the **Jacobian**. For a function $f: \mathbb{R}^{m} \to \mathbb{R}^{n}$, the **Jacobian** is an $n \times m$ matrix consisting of the first-order partial derivatives of the mapping.
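
As an illustration (the function below is chosen here only as an example), take $f(x, y) = (xy,\; x + y,\; \sin x)$, which maps $\mathbb{R}^{2}$ to $\mathbb{R}^{3}$. Each row of the Jacobian holds the partial derivatives of one output with respect to $x$ and $y$, giving a $3 \times 2$ matrix:

$J = \begin{bmatrix} y & x \\ 1 & 1 \\ \cos x & 0 \end{bmatrix}$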

**Newton’s method** is one of a class of algorithms that can make use of AD. It is a root-finding algorithm that, for a non-linear function $f(x)$, seeks a value of $x$ satisfying $f(x) = 0$. Starting from an initial guess, Newton’s method iterates, using the derivative of $f$ at each step, toward a root of the function. Convergence depends on a good initial guess and is not guaranteed.
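
A minimal sketch of Newton's method driven by forward-mode AD is shown below. The class `Dual1D`, the function `newton`, and the tolerance and iteration defaults are assumptions made for this illustration; they are not part of our package.

```python
class Dual1D:
    """Hypothetical minimal dual number (value, derivative), for illustration only."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __mul__(self, other):
        other = other if isinstance(other, Dual1D) else Dual1D(other)
        return Dual1D(self.val * other.val,
                      self.val * other.der + self.der * other.val)

    def __sub__(self, other):
        other = other if isinstance(other, Dual1D) else Dual1D(other)
        return Dual1D(self.val - other.val, self.der - other.der)


def newton(f, x0, tol=1e-12, max_iter=50):
    """Find a root of f with Newton's method, taking f'(x) from forward-mode AD."""
    x = x0
    for _ in range(max_iter):
        fx = f(Dual1D(x, 1.0))        # one evaluation yields f(x) and f'(x)
        if abs(fx.val) < tol:
            return x
        if fx.der == 0.0:
            raise ZeroDivisionError("zero derivative; try another initial guess")
        x -= fx.val / fx.der          # Newton update: x <- x - f(x) / f'(x)
    raise RuntimeError("Newton's method did not converge")


root = newton(lambda x: x * x - 2, x0=1.0)
print(root)  # close to sqrt(2)
```

The explicit iteration cap and the zero-derivative check reflect the caveat above: a poor initial guess can prevent convergence.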

### Graph structure of calculations
For the forward mode, the graph structure first computes the forward primal trace and forward tangent trace of the independent variables, and then those of the dependent variables.
For the reverse mode, the graph is traversed backward from the outputs to the inputs. Since the last node has no children, its adjoint is initialized to 1. The adjoints of the preceding nodes are then evaluated one by one, in reverse order, until the adjoints of the input nodes are reached.




## How to Use 

Using our package, users can compute the Jacobian of a user-defined function of multiple inputs.

### Step 1 - Installation: 
Currently, our package can be git cloned with the following command:

```bash
git clone https://github.com/cs107-4thPrime/cs107-FinalProject.git
```

### Step 2 - Install Dependencies:
Navigate into the cloned folder `cs107-FinalProject` and install the dependencies with the following command:


```bash
pip install -r requirements.txt
```

### Step 3 - Run Tests:
Run all the given tests in the tests folder.
This will report how many tests are passing and failing and the coverage report of the tests. 
We have two files (test_dual.py and test_derivatives.py) to test each of the two modules and an integration test (test_integrate.py). 

Picture of our code coverage result: 
![coverage.jpeg](attachment:coverage.jpeg)

The following command can be run locally to reproduce the same result:

```bash
pytest --cov=src tests/
```

### Demo with 3 functions


![Screen%20Shot%202021-11-18%20at%208.53.21%20PM.png](attachment:Screen%20Shot%202021-11-18%20at%208.53.21%20PM.png)


![Screen%20Shot%202021-11-18%20at%208.53.36%20PM.png](attachment:Screen%20Shot%202021-11-18%20at%208.53.36%20PM.png)

![Screen%20Shot%202021-11-18%20at%208.53.42%20PM.png](attachment:Screen%20Shot%202021-11-18%20at%208.53.42%20PM.png)

![Screen%20Shot%202021-11-18%20at%208.53.50%20PM.png](attachment:Screen%20Shot%202021-11-18%20at%208.53.50%20PM.png)

## Software Organization

![Screen%20Shot%202021-11-18%20at%208.36.09%20PM.png](attachment:Screen%20Shot%202021-11-18%20at%208.36.09%20PM.png)

What modules do you plan on including? What is their basic functionality?
- We have Dual_class.py for the Dual number data structure and Derivatives.py for the additional elementary functions that can be applied to a Dual number. The Dual class in Dual_class.py includes an initialization function, a string function, a representation function, a partial function, a gradient function, and multiple overloaded operators. Elementary functions that are not included in the Dual class, such as the exponential, natural log, and square root, are written in Derivatives.py.

Where will your test suite live? Will you use TravisCI? CodeCov?
- We will test our code with CodeCov, which gives insight into code coverage.
- We have written tests with a current code coverage of 93%. All the tests are stored in the tests directory. There are three test files in total:
    - test_derivatives.py: tests all the elementary functions
    - test_dual.py: tests all the Dual number functions
    - test_integrate.py: integration test on both the elementary functions and the Dual number overloaded operators


How will you distribute your package (e.g. PyPI)?
- Our package will be distributed through PyPI


How will you package your software? Will you use a framework? If so, which one and why? If not, why not?
- We will package our software using a framework named Kivy, which is free to use under the MIT license. It lets us create cross-platform apps that run on desktop computers, iOS devices, and Android devices, which gives us a lot of flexibility in choosing where to deploy our software. A complete tutorial also covers the whole process, including the design language that supports an easy and scalable GUI.



## Implementation

### **Dual_class.py**

We have a **Dual** class and a function (**createVariable**) in this file. 

Dual numbers are the core data structure in our implementation of forward mode automatic differentiation. By defining a dual number class and taking advantage of Python’s operator overloading capabilities, we are able to simplify otherwise messy derivative calculations into a series of simple steps. Our Dual class stores both the current value of the function under evaluation and the values of its partial derivatives at that point. We chose a dictionary to store the dual component because it enables easy lookups of our function’s partial derivatives. Additionally, the derivatives dictionary as a whole represents our function’s gradient at a particular point. By overloading the elementary binary operations of addition, subtraction, multiplication, and division to work properly on instances of the Dual class, we are able to track both our function’s value and its partial derivatives as we trace through its computational graph.

The function **createVariable(variable_name, value)** initializes a Dual number whenever we want to add a variable. It takes a string variable_name (the unique name of the variable) and a number value (the current value stored in the variable), and returns a Dual number with the variable information stored.

We have implemented 16 functions in the Dual class. The specifics are as follows:
- **\__init\__(self,value, ders)** takes in a number (value) and a dictionary (ders) to initialize the Dual number. Value is the current value of the Dual number, and the dictionary stores the Jacobian
- **partial(self,variable_name) -> float** : Get the partial derivative in float type  with respect to the given variable_name
- **gradient(self) -> dict**: Get the Jacobian dictionary that contains the current partial derivatives with respect to each variable
- **getvalue(self) -> float**: Get the current value of the Dual number
- **\__str\__(self) -> str**: a string that shows the current value and Jacobian
- **\__repr\__(self) -> str**: a string that can reconstruct the Dual with eval()
- **\__add\__(self, other) -> Dual**: Dual(self) + other where other can be Dual or a constant
- **\__radd\__(self,other) -> Dual**: other + Dual(self) where other is a constant
- **\__mul\__(self,other) -> Dual**: Dual(self) * other where other can be Dual or a constant
- **\__rmul\__(self,other) -> Dual**: other * Dual(self) where other is a constant
- **\__sub\__(self,other) -> Dual**: Dual(self) - other where other can be Dual or a constant
- **\__rsub\__(self,other) -> Dual**: other - Dual(self) 
- **\__truediv\__(self,other)  -> Dual**: Dual(self) / other where other can be Dual or a constant
- **\__rtruediv\__(self,other)  -> Dual**: other / Dual(self) 
- **\__pow\__(self,other)  -> Dual**: Dual(self) ** other where other can only be a constant
- **\__neg\__(self) -> Dual**: -Dual(self) which is negative of Dual
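
To illustrate the design described above, here is a simplified, self-contained sketch of a dual number that stores its partial derivatives in a dictionary. The names `SketchDual` and `create_variable_sketch` are hypothetical stand-ins that mirror the interface listed above; this is not the code in Dual_class.py, and only addition and multiplication are shown.

```python
class SketchDual:
    """Illustrative stand-in for the Dual class described above (not the package's code)."""

    def __init__(self, value, ders):
        self.value = value
        self.ders = dict(ders)  # variable name -> partial derivative

    def partial(self, variable_name):
        # Variables the expression does not depend on have derivative 0.
        return float(self.ders.get(variable_name, 0.0))

    def gradient(self):
        return dict(self.ders)

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants with no partial derivatives.
        return other if isinstance(other, SketchDual) else SketchDual(other, {})

    def __add__(self, other):
        other = self._lift(other)
        names = set(self.ders) | set(other.ders)
        return SketchDual(self.value + other.value,
                          {n: self.partial(n) + other.partial(n) for n in names})

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        names = set(self.ders) | set(other.ders)
        # Product rule applied to each partial derivative.
        return SketchDual(self.value * other.value,
                          {n: self.partial(n) * other.value
                              + self.value * other.partial(n) for n in names})

    __rmul__ = __mul__


def create_variable_sketch(variable_name, value):
    """Seed a new independent variable: d(variable)/d(variable) = 1."""
    return SketchDual(value, {variable_name: 1.0})


x = create_variable_sketch("x", 2.0)
y = create_variable_sketch("y", 5.0)
f = x * y + 3 * x                       # f(x, y) = xy + 3x
print(f.value)                          # 16.0
print(f.partial("x"), f.partial("y"))   # 8.0 2.0
```

Keying the derivatives by variable name lets two expressions that depend on different variables be combined without any bookkeeping about variable order.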


### **Derivatives.py**

The implementation of derivatives relies on the properties of dual numbers. In the context of automatic differentiation, the real part of a dual number corresponds to the primal trace and its dual part corresponds to the tangent trace. Derivatives.py takes instances of the Dual class as input and includes the derivative definitions of a comprehensive array of elementary functions. The value of a given elementary function is computed by evaluating it at the primal trace (a.value). The derivative of a given elementary function is computed using the chain rule: the derivative of the function evaluated at the primal trace is multiplied by the tangent trace (the values in a.ders).
 
We have implemented 15 elementary functions in Derivatives.py. The specifics are as follows:
- Trig functions
    - **sin(a)** function takes in Dual number a as input and calculates the sine of dual number a. 
    - **cos(a)** function takes in Dual number a as input and calculates the cosine of dual number a.
    - **tan(a)** function takes in Dual number a as input and calculates the tangent of dual number a.
- Inverse trig functions
    - **arcsin(a)** takes in Dual number a as input and calculates the inverse of sine of dual number a. 
    - **arccos(a)** takes in Dual number a as input and calculates the inverse of cosine of dual number a. 
    - **arctangent(a)** takes in Dual number a as input and calculates the inverse of tangent of dual number a. 
- Exponential functions
    - **exp(a)** function takes in Dual number a as input and calculates e raised to the power a.
    - **power(a, p)** function takes in two arguments, Dual number a and base p (integer or float), and calculates p raised to the power a.
- Hyperbolic functions
    - **sinh(a)** takes in Dual number a as input and calculates sinh of dual number a.
    - **cosh(a)** takes in Dual number a as input and calculates the cosh of dual number a.
    - **tanh(a)** takes in Dual number a as input and calculates the tanh of dual number a. 
- Logistic functions
    - **logistic(a)** takes in Dual number a as input and calculates the logistic value of dual number a. 
- Logarithms
    - **ln(a)** function takes in Dual number a as input and calculates its natural log.
    - **log(a, base)** function takes in two arguments, Dual number a and base (integer), and calculates the log of dual number a with the given base.
- Square root
    - **sqrt(a)** takes in Dual number a as input and calculates the square root of dual number a. 
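
The chain-rule pattern described above can be sketched as follows. `DualStub`, `sin_sketch`, and `exp_sketch` are hypothetical names used only for this illustration; the stub simply exposes the `.value` and `.ders` attributes mentioned above and is not the code in Derivatives.py.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DualStub:
    """Hypothetical stand-in exposing .value and .ders, as described above."""
    value: float
    ders: dict = field(default_factory=dict)


def sin_sketch(a):
    # Value: sin at the primal trace. Derivative: cos(value) times the tangent trace.
    return DualStub(math.sin(a.value),
                    {name: math.cos(a.value) * d for name, d in a.ders.items()})


def exp_sketch(a):
    # d/dx exp(u) = exp(u) * du/dx
    val = math.exp(a.value)
    return DualStub(val, {name: val * d for name, d in a.ders.items()})


x = DualStub(0.0, {"x": 1.0})
y = exp_sketch(sin_sketch(x))       # f(x) = exp(sin(x))
print(y.value, y.ders["x"])         # 1.0 1.0
```

Each elementary function only needs to know its own derivative; composing the functions applies the chain rule across the whole expression.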


## Future Features
### Reverse AD 
We will extend our software package to support reverse mode automatic differentiation. One of the drawbacks of our forward mode implementation is that its cost scales linearly with the number of inputs with respect to which we wish to find derivatives. This makes forward mode very costly and inefficient for calculating the gradient of large functions of many inputs (as is often the case in machine learning applications). By taking advantage of the symmetry of the chain rule and storing the parent-child relationships on our first pass, we can compute complex gradients in two passes (one forward and one reverse pass). This is the functionality and flexibility that we will enable in reverse mode.
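
As a rough sketch of the two-pass idea, the following toy example records parent-child relationships during the forward pass and then propagates adjoints in a single reverse pass. The `Node` class and `backward` function are hypothetical and only support addition and multiplication; they are not a design commitment for the planned feature.

```python
class Node:
    """Hypothetical reverse-mode node; records parents and local derivatives."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # tuples of (parent_node, d(self)/d(parent))
        self.adjoint = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def backward(output):
    """Reverse pass: propagate adjoints from the output back to the inputs."""
    # Order the graph so that each node is processed after all of its children.
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.adjoint = 1.0             # d(output)/d(output) = 1
    for node in reversed(order):
        for parent, local_der in node.parents:
            parent.adjoint += node.adjoint * local_der


x, y = Node(2.0), Node(5.0)
f = x * y + x                        # forward pass builds the graph
backward(f)                          # one reverse pass yields every partial
print(x.adjoint, y.adjoint)          # 6.0 2.0
```

A single call to `backward` fills in the partial derivative for every input, which is why the cost no longer grows with the number of inputs.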

### Extension to support vectorized inputs 
Instead of differentiating on one variable we can extend our framework to handle many variables at once by supporting vectorized inputs. This would allow our 


## Licensing

We would like to use the MIT license because automatic differentiation has already been implemented by many people, and Python's `math` module, which we use, is part of the standard library. Therefore, we are happy to make our code open source and free to use for anyone who is also interested in automatic differentiation.

[The link to MIT's licensing website](https://opensource.org/licenses/MIT)


## Feedback

Introduction (1.75/2): Your introduction should motivate the need for automatic differentiation (AD). What methods can we use to calculate derivatives numerically? What are the strengths/weaknesses of each of these methods, and how does AD address these problems?

Background (1.5/2): It feels as though the background is just a collection of things that have been stitched together without much thought. How do each of these topics fit together in the context of AD?

Example usage (3/3): Very thorough!

Software organization (2/2)

Implementation (4/4): The math module is fine; you might also consider numpy for better performance.

Licensing (2/2)

One last comment: going forward, you should always work on a feature branch.