# CS207 Project Group 9
# Milestone 1

*****

# I. Introduction

The software implements **‘Automatic Differentiation’ (AD)**, a technique for computationally evaluating the derivative of a specified function. AD is distinct from both symbolic and numerical differentiation, and holds important advantages over each. Symbolic differentiation manipulates the mathematical expression itself to produce an exact expression for the derivative, but the resulting expressions can grow very quickly (‘expression swell’), making the approach computationally expensive and, for very large functions, infeasible. Numerical differentiation, which uses finite-difference approximations, is simple to implement, but is only approximate: it is subject to both rounding error and discretisation (truncation) error, and it requires additional function evaluations for every input. Both of these ‘traditional’ methods run into problems and inefficiencies when calculating higher derivatives and partial derivatives with respect to many inputs (an important component of gradient-based optimisation methods).
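
The trade-off between discretisation error and rounding error in numerical differentiation can be illustrated with a short sketch. The helper name `forward_difference`, the test function $\sin(x)$ and the step sizes are illustrative choices, not part of the package. On typical double-precision hardware the error first shrinks as the step gets smaller and then grows again once rounding dominates.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with the one-sided finite-difference quotient."""
    return (f(x + h) - f(x)) / h

# The exact derivative of sin(x) at x = 1 is cos(1).
exact = math.cos(1.0)

errors = {}
for h in (1e-1, 1e-5, 1e-8, 1e-13):
    errors[h] = abs(forward_difference(math.sin, 1.0, h) - exact)
    print(f"h = {h:.0e}   absolute error = {errors[h]:.3e}")
```

No single step size removes both sources of error, which is the limitation that AD avoids.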


Automatic differentiation addresses these problems: it evaluates derivatives to machine precision at a computational cost comparable to that of evaluating the function itself. As a result, it has important applications. In its ‘reverse mode’ (discussed below), it is the basis of back-propagation, a fundamental process in training neural networks, and as such it is leveraged by open-source machine learning libraries such as TensorFlow. AD is also known as algorithmic differentiation: because a numerical computer program can be expressed as a sequence and combination of arithmetic operations and elementary functions, the derivative of such a program can be found using automatic differentiation wherever those operations are differentiable.

# II. Background

Automatic differentiation is essentially the repeated application of the chain rule. As mentioned above, any function can be considered a sequence of basic arithmetic operations and elementary functions (addition, subtraction, multiplication, division, trigonometric functions, exponentials, logarithms etc.), and so any function $F$ can be interpreted in the following way (albeit often less simply):
$$y = F(x) = f(g(h(x)))$$

Writing $x_0 = x$ for the input, this can be rewritten as:
$$y = f(g(h(x_0))) = f(g(x_1)) = f(x_2) = x_3$$
	
Often, this decomposition is represented as a directed acyclic computational graph that traces the route from the input $x_0$ to the output $y$, as in the example below:

$ x_0 \xrightarrow{h} x_1 \xrightarrow{g} x_2 \xrightarrow{f} x_3 = y $



In forward mode, automatic differentiation works by decomposing the function into this structure and working through each component function in turn, applying the chain rule ‘inside out’. That is to say, $dx_0/dx$ is found first, followed by $dx_1/dx$, and so on until $dy/dx$ itself is found. This requires initial values (seeds) to be set for $x_0$ and its derivative $x_0'$; the seed $x_0' = 1$ gives the derivative with respect to $x$.
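
As a minimal sketch of forward mode on the three-step chain above, assume the hypothetical choices $h(x) = x^2$, $g(x) = \sin(x)$ and $f(x) = e^x$ (these are illustrative only and are not taken from the package). Each step carries a value together with its derivative with respect to $x$.

```python
import math

def forward_mode(x0, dx0=1.0):
    """Propagate (value, derivative) pairs through y = exp(sin(x0**2)).

    dx0 is the seed derivative; dx0 = 1 yields dy/dx.
    """
    # x1 = h(x0) = x0**2, so dx1/dx = 2 * x0 * dx0/dx
    x1, dx1 = x0 ** 2, 2.0 * x0 * dx0
    # x2 = g(x1) = sin(x1), so dx2/dx = cos(x1) * dx1/dx
    x2, dx2 = math.sin(x1), math.cos(x1) * dx1
    # x3 = f(x2) = exp(x2), so dx3/dx = exp(x2) * dx2/dx
    x3, dx3 = math.exp(x2), math.exp(x2) * dx2
    return x3, dx3

y, dy_dx = forward_mode(1.5)
print(y, dy_dx)
```

The value and the derivative are computed in a single sweep from the input to the output, with no symbolic expression ever being built.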


Reverse mode works in the opposite direction. Rather than starting from the innermost component and propagating derivatives outwards towards the final result, it first evaluates the function, storing every intermediate value, and then traverses the graph backwards from the output. At each step it finds the derivative of the output with respect to the current intermediate value (for the example above: $dy/dx_3$, then $dy/dx_2$, and so on) until the derivative with respect to the input, $dy/dx_0$, is reached.
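
A minimal sketch of reverse mode on a three-step chain, assuming the hypothetical choices $h(x) = x^2$, $g(x) = \sin(x)$ and $f(x) = e^x$ (illustrative only, not the package's implementation). The variables ending in `_bar` hold the derivative of the output with respect to the corresponding intermediate value.

```python
import math

def reverse_mode(x0):
    """Return (y, dy/dx0) for y = exp(sin(x0**2)) using a reverse sweep."""
    # Forward pass: evaluate the function and keep every intermediate value.
    x1 = x0 ** 2
    x2 = math.sin(x1)
    x3 = math.exp(x2)

    # Reverse pass: start at the output and move back towards the input.
    x3_bar = 1.0                      # dy/dx3, since y = x3
    x2_bar = x3_bar * math.exp(x2)    # dy/dx2 = dy/dx3 * dx3/dx2
    x1_bar = x2_bar * math.cos(x1)    # dy/dx1 = dy/dx2 * dx2/dx1
    x0_bar = x1_bar * 2.0 * x0        # dy/dx0 = dy/dx1 * dx1/dx0
    return x3, x0_bar

y, dy_dx0 = reverse_mode(1.5)
print(y, dy_dx0)
```

For a function with a single input the result matches forward mode; the benefit of the reverse sweep appears with many inputs, where one backward pass yields all partial derivatives of the output.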


One way of achieving forward mode AD is to use dual numbers. These are an extension of the real numbers, somewhat analogous to complex numbers: every number carries an additional dual component, a multiple of $\epsilon$, where $\epsilon^2 = 0$ but $\epsilon \neq 0$. Given any polynomial function (or, in fact, any analytic real function via its Taylor series), if we replace $x$ with $x + x'\epsilon$, the function becomes $f(x) + f'(x)x'\epsilon$. Setting $x' = 1$, the dual component of the result is exactly $f'(x)$. This provides a routine to automatically compute the derivative of the function $f(x)$, and so is used in forward mode AD.
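
The dual-number rule can be sketched in a few lines of Python. The class name `Dual` and its attributes `real` and `dual` are hypothetical and chosen only for illustration; this is not the package's `AutoDiff` class, and only addition, subtraction and multiplication are covered.

```python
class Dual:
    """Dual number real + dual*eps with eps**2 == 0 (illustrative sketch)."""

    def __init__(self, real, dual=0.0):
        self.real = real
        self.dual = dual

    @staticmethod
    def _lift(other):
        # Treat plain numbers as dual numbers with a zero dual component.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.real + other.real, self.dual + other.dual)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._lift(other)
        return Dual(self.real - other.real, self.dual - other.dual)

    def __mul__(self, other):
        other = Dual._lift(other)
        # (a + b*eps)(c + d*eps) = a*c + (a*d + b*c)*eps, because eps**2 = 0
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)

    __rmul__ = __mul__

# Seed the dual component with 1 to differentiate with respect to x.
x = Dual(3.0, 1.0)
y = 3 * x * x - 4 * x
print(y.real, y.dual)  # 15.0 14.0
```

The real component is the function value $3x^2 - 4x = 15$ at $x = 3$, and the dual component is the derivative $6x - 4 = 14$.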

Sources: https://en.wikipedia.org/wiki/Automatic_differentiation,
	   http://www.columbia.edu/~ahd2125/post/2015/12/5/


# III. How to Use

## _a) Installation_ 

PyPI

## _b) Usage_

The package can be imported simply through:
```python
from AutoDiff import AD
```

An alternative option that improves readability in the long run is:

```python
import numbers
import numpy as np
from AutoDiff.AD import AutoDiff, AD_create, AD_stack
```


For a simple, univariate example, let's find the value of the derivative of $y=3x^2-4x$ at $x=3$.

We create an instance of the AutoDiff object as the basic building block for the equation; in other words, a single value of the independent variable. This object can then be combined with binary and unary mathematical operators to construct the function being evaluated. For each operation, a new function value (AutoDiff.val) and derivative value (AutoDiff.der) are calculated, so that once all operations are complete, the resulting object's der attribute holds the function's derivative at the point specified when the AutoDiff object was created.

```python
a = 3.0
x = AutoDiff(a)
y = 3*x**2 - 4*x
y.der
```

```
array([[14.]])
```

More complex cases can also be handled. N.B.: elementary functions that cannot be expressed through overloaded operators, such as sine and cosine, use a different notation and must be called as methods of the AutoDiff object:

$$y = \frac{x^2(1-x^3)}{\sin(x) - 2\cos^2(x)}$$

```python
x = AutoDiff(a)

y = (x**2*(1-x**3))/(x.sin()-2*x.cos()**2)

y.der
```

```
array([[109.81641666]])
```
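
As an independent sanity check, the two derivative values reported above (14 and 109.81641666) can be compared against a central finite-difference estimate using only the standard library. The helper names below are hypothetical and not part of the package, and the finite-difference result is itself approximate, so agreement is only expected to several decimal places.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric finite-difference quotient."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def simple_case(x):
    # y = 3x^2 - 4x
    return 3 * x**2 - 4 * x

def complex_case(x):
    # y = x^2 (1 - x^3) / (sin(x) - 2 cos^2(x))
    return (x**2 * (1 - x**3)) / (math.sin(x) - 2 * math.cos(x)**2)

approx_simple = central_difference(simple_case, 3.0)
approx_complex = central_difference(complex_case, 3.0)
print(approx_simple, approx_complex)
```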

# IV. Software Organisation

## _a) Directory Structure_

## _b) Modules_

All code is contained within the **AutoDiff** package. Within it are two Python modules:

1. AD.py: This contains all functional code for automatic differentiation. Within this file there are (currently) three components:
    * AD_create

```python
import numpy as np

# A plain Python integer has the built-in type int.
type(1) == int  # True

# Indexing a NumPy array returns a NumPy scalar, not a built-in int.
type(np.array([1]).reshape(-1)[0])
```

```
numpy.int64
```

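```python
import numpy as np

der = 1
der = np.array(der)

# Check each flattened element rather than der[0]: indexing der[0] raises
# an IndexError when der is 0-dimensional, as it is for a scalar input.
# This assumes the intent is to reject nested lists of unequal length that
# NumPy stored as list objects.
for i in der.reshape(-1):
    if isinstance(i, list):
        raise ValueError('Input dimensions do not match!')
```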