**Introduction**

In the late 17th century, Isaac Newton and Gottfried Leibniz independently developed calculus. These groundbreaking scientific advances unfortunately led to a bitter priority dispute between the two that spanned the rest of their lives. [1] While the origins of calculus may be disputed, its applications are not. Differentiation allows us to identify the maxima and minima of a function and, through gradient-based root-finding methods, its zeros. The ability to do each of these things is crucial in optimization and modern machine learning.

The classical approaches to computing the derivative of a function fall into two categories: approximate numerical methods (such as finite differences) and symbolic methods. Each suffers from its own pitfalls: numerical methods can be unstable and inaccurate, while symbolic methods can require long computation times. These pitfalls are magnified as functions increase in complexity. [2] Automatic Differentiation (AD) suffers from neither instability nor long computation time, and it computes derivatives to machine precision. AD is readily implemented in code, and our package will allow the user to apply the forward mode of Automatic Differentiation in Python. [2]
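
The instability of numerical differentiation can be seen in a small experiment. The sketch below is an illustration only and is not part of the package; the helper name `forward_difference` is hypothetical. It approximates the derivative of sin at x = 1 with a one-sided finite difference and reports the error for a large, a moderate and a very small step size.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with a one-sided finite difference of step h."""
    return (f(x + h) - f(x)) / h

x0 = 1.0
exact = math.cos(x0)  # the exact derivative of sin at x0

# Absolute error for a large, a moderate, and a very small step size.
errors = {h: abs(forward_difference(math.sin, x0, h) - exact)
          for h in (1e-1, 1e-8, 1e-15)}

for h, err in errors.items():
    print(f"h = {h:.0e}  absolute error = {err:.2e}")
```

A large step suffers from truncation error and a very small step suffers from floating-point round-off, so the moderate step gives the smallest error of the three. AD avoids having to choose a step size at all.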


**Background**


***What is automatic differentiation (AD)?***


Automatic differentiation (AD) is also known as algorithmic differentiation or computational differentiation [1].

AD is a set of techniques for numerically evaluating derivatives (gradients) by executing a sequence of arithmetic operations and elementary functions. The derivatives can therefore be computed automatically by applying the chain rule repeatedly to that sequence of operations [2-3]. There are two major modes of AD: forward and reverse [1-3].

***Why is AD important?***

AD and symbolic differentiation both produce more accurate results than numerical finite-difference estimates. However, unlike the symbolic approach, AD evaluates expressions at particular numeric values and does not construct symbolic expressions [1].

***How does AD work?***

The core of AD is the chain rule from calculus. The chain rule gives the derivative of a composition of two or more functions, where the derivative of a function measures the change in the output value relative to the change in the input.
For a composition of f and h, the derivative is:
$$\frac{d}{dx}[f(h(x))] = f'(h(x))h'(x)$$
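
As a concrete instance (chosen here purely for illustration), take $f(u) = \sin(u)$ and $h(x) = x^{2}$. Then $f'(u) = \cos(u)$ and $h'(x) = 2x$, so the chain rule gives:
$$\frac{d}{dx}[\sin(x^{2})] = \cos(x^{2}) \cdot 2x$$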

With the help of a computational graph and its evaluation trace, the partial derivatives with respect to x and y are expressed as combinations of derivatives of elementary functions, which can be calculated analytically.
Let’s consider a simple example: $$f(x, y) = x\cos(y) + xy$$
In AD, its computational graph and evaluation trace for forward mode look like the following:

**Computational Graph:**

![Computational Graph](BitterDispute_ADGraph.png)

**Evaluation Trace:**

| Trace   | Elementary Function      | Current Value                   | Elementary Function Derivative       | $\nabla_{x}$ Value  | $\nabla_{y}$ Value  |
| :---: | :-----------------: | :----------------------: | :----------------------------: | :-----------------:  | :-----------------: |
| $x_{1}$ | $x_{1}$                  | $x$                     | $\dot{x}_{1}$                     | $1$ | $0$ |
| $x_{2}$ | $x_{2}$                  | $y$                     | $\dot{x}_{2}$                     | $0$ | $1$ |
| $x_{3}$ | $\cos(x_{2})$            | $\cos(y)$               | $-\sin(x_{2})\dot{x}_{2}$         | $0$ | $-\sin(y)$ |
| $x_{4}$ | $x_{1}x_{2}$             | $xy$                    | $\dot{x}_{1}x_{2}+x_{1}\dot{x}_{2}$ | $y$ | $x$ |
| $x_{5}$ | $x_{1}x_{3}$             | $x\cos(y)$              | $\dot{x}_{1}x_{3}+x_{1}\dot{x}_{3}$ | $\cos(y)$ | $-x\sin(y)$ |
| $x_{6}$ | $x_{5}+x_{4}$            | $x\cos(y)+xy$           | $\dot{x}_{5}+\dot{x}_{4}$         | $\cos(y)+y$ | $-x\sin(y)+x$ |


We find that $$\frac{\partial f}{\partial x}=\cos(y)+y$$ and $$\frac{\partial f}{\partial y}=-x\sin(y)+x$$
In forward mode, each value and its derivative are computed together and carried forward step by step along the trace.
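
The forward-mode evaluation of $f = x\cos(y) + xy$ can also be written out by hand in a few lines of Python. This is a minimal sketch rather than the package's implementation; the function name `forward_trace` and the test point are assumptions made for illustration. Each line stores a (value, derivative) pair, and the seed chooses which partial derivative is propagated.

```python
import math

def forward_trace(x, y, seed_x, seed_y):
    """Evaluate f = x*cos(y) + x*y together with one directional derivative.

    The seed (seed_x, seed_y) selects the direction:
    (1, 0) yields df/dx and (0, 1) yields df/dy.
    """
    x1, dx1 = x, seed_x                          # input x
    x2, dx2 = y, seed_y                          # input y
    x3, dx3 = math.cos(x2), -math.sin(x2) * dx2  # chain rule for cos
    x4, dx4 = x1 * x2, dx1 * x2 + x1 * dx2       # product rule
    x5, dx5 = x1 * x3, dx1 * x3 + x1 * dx3       # product rule
    x6, dx6 = x5 + x4, dx5 + dx4                 # sum rule
    return x6, dx6

x, y = 2.0, 0.5
value, df_dx = forward_trace(x, y, 1.0, 0.0)
_, df_dy = forward_trace(x, y, 0.0, 1.0)

# Compare with the closed-form partial derivatives.
print(df_dx, math.cos(y) + y)
print(df_dy, x - x * math.sin(y))
```

One forward pass is needed per input variable, which is why forward mode is best suited to functions with few inputs.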

***Applications of AD:***

AD has been used in many applications, including optimization (solving nonlinear equations utilizing gradients/Hessians), inverse problems/data assimilation, neural networks, etc. [4]


**How to Use BitterDispute**

*How do we envision that a user will interact with our package? What should they import? How can they instantiate AD objects?*

First, a user should be able to install our package using pip or conda. Our instructions will include steps for creating a custom environment to ensure our package runs reliably. We will encourage the use of conda for creating and activating this environment but will aim to support other tools such as pipenv or virtualenv. We will list the required package versions in a file called AD_requirements.txt. These dependencies will include open-source packages such as numpy, scipy and pandas (alongside Python's built-in math module), both to give users more flexibility in defining their functions and for use by our software during differentiation.

    pip install BitterDispute
    # or
    conda install BitterDispute

After installation in a user’s environment, the user will be able to import our AD package into their current Python, Jupyter or other session using the simple command:

    from BitterDispute import AD
  
We intend to bundle all functionality into a single module for the user to import. Within that module, the AD class will expose several properties that the user can read to retrieve specific information about the input formula and the resulting derivative value or formula.

A user will initiate automatic differentiation by instantiating the class with a single command, ***AD( )***. The package will then print a series of prompts for the user to follow: the number of variables in the formula, the formula itself written in terms of those variables, and the numeric value of each variable at which the derivative should be evaluated. These inputs will be saved as properties on the object for later reference. The object will print a statement summarizing the formula and variable values and return the derivative evaluated at the given values.
To reference these values later, the user will need to assign the instance to a local variable, for example ***X = AD( )***. After that, the user will be able to reference saved properties such as:

    X.derivative   (Outputted value based on variable value inputs)

    X.values       (Variable value inputs)

    X.formula      (String representing formula input)

    X.trace_count  (Count of trace steps necessary during derivation)


We want to give advanced users the ability to skip the interactive steps once they understand our package’s functionality. To do this, we intend to allow a user to pass an arbitrary number of arguments during instantiation, where the first is the formula and the remainder are values of type int or float, using a command like ***AD(formula, 4, 8, 3)***. The package will then infer the number of variables in the formula and assign the input values to those variables in the order they appear in the formula from left to right.

Lastly, we are considering providing the user with the ability to re-use an instance of our package with a saved formula. If a user has saved an instance to a local variable (like X from earlier), the user will be able to call X(x, y) where x and y are two values of type int or float and X is an instantiation of AD with a saved formula of just two variables. Calling an existing instance will update the X.derivative value with the new derived value, keep the saved formula and update the input values saved in X.values. 
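
One way the re-usable instance could behave is sketched below. This is a hypothetical stand-in, not the planned AD class: the names `Dual` and `ReusableAD` are invented for illustration, the formula is assumed to be a Python callable rather than a string, and `derivative` is assumed to hold one partial derivative per input.

```python
class Dual:
    """Minimal dual number: a value plus a derivative (illustrative only)."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


class ReusableAD:
    """Hypothetical AD-like object that keeps its formula between calls."""

    def __init__(self, formula, *values):
        self.formula = formula   # assumed here to be a Python callable
        self(*values)            # evaluate once at the initial point

    def __call__(self, *values):
        self.values = values
        # One forward pass per input, seeding that input's derivative with 1.
        self.derivative = tuple(
            self.formula(*[Dual(v, 1.0 if i == j else 0.0)
                           for j, v in enumerate(values)]).der
            for i in range(len(values))
        )
        return self.derivative


X = ReusableAD(lambda x, y: x * y + 3 * x, 2.0, 5.0)
print(X.values, X.derivative)
X(1.0, 4.0)                      # re-use the saved formula at a new point
print(X.values, X.derivative)
```

Implementing `__call__` keeps the formula fixed while refreshing `values` and `derivative`, which matches the re-use behavior described above.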
 


**Software Organization**

*What will the directory structure look like?*

    /bitterdispute
        __init__.py
        README.md
        LICENSE
        requirements.txt
        forward_mode/
            __init__.py
            AD.py
            AD_scalars.py
            AD_functions.py
            AD_vectors.py
        optimization/
            __init__.py
            AD_optimum.py
        tests/
            __init__.py
            tests_forward.py
            tests_optimization.py
 
*What modules do you plan on including? What is their basic functionality?*

Within our package */bitterdispute*, we will have separate modules for our forward mode implementation, our optimization extension and our test cases. The forward mode module will contain all classes and functions needed to execute forward mode automatic differentiation. The optimization module will contain the code that applies forward mode to an optimization use case. Lastly, the tests module will contain all tests for both modules.

*Where will your test suite live? Will you use TravisCI? CodeCov?*

Our tests will live in a dedicated module to assist with robust test creation. We will be using both Travis-CI and CodeCov to monitor our commits and to ensure that keeping our tests in a separate module doesn’t prevent us from maintaining sufficient code coverage.

*How will you distribute your package (e.g. PyPI)?*

We intend to use the PyPI package distribution mechanism.

*How will you package your software? Will you use a framework? If so, which one and why? If not, why not?*

At this time, we do not intend to use a framework but do wish to research Flask as a potential option as we further develop our software organization and implementation. Flask would increase our ability to build interactivity, is simpler to get started with than its primary alternative Django and is more explicit, which will help us define our automatic differentiation steps in a clear manner.



**Implementation**

Discuss how you plan on implementing the forward mode of automatic differentiation.
What are the core data structures?

Parent class **AD()**: calls one of the following three classes



> scalar values: **AD_scalars**

First create a *variable object* **x** and then define the expression for *function* **f**.

```
>>> a = 2.0
>>> x = autodiff.Variable(a)
>>> alpha = 2.0
>>> beta = 3.0
>>> f = alpha * x + beta
```
Here **f.val** contains the function value and **f.der** contains the derivative with respect to **x**:


```
>>> print("alpha * x + beta = ", f.val, "; Derived value for f = ", f.der)
alpha * x + beta =  7.0 ; Derived value for f =  2.0

```

If the function involves an elementary function such as sine:


```
>>> a = 4.0
>>> x = autodiff.Variable(a)
>>> f = 7 * autodiff.Sin(x) + 3 

```



> scalar functions of vectors: **AD_functions**

First create two *variable objects* **x1, x2** and then define the expression for *function* **f**.
```
a1 = 4.0
a2 = 3.0
x1 = autodiff.Variable(a1, name='x1')
x2 = autodiff.Variable(a2, name='x2')
f = x1 - x2 * x1

```

Here f.der is a dictionary that contains the derivative with respect to 'x1' and the derivative with respect to 'x2':


```
>>> print(f.der)
{'x1': -2.0, 'x2': -4.0}
>>> print(f.der['x1'])
-2.0

```



> vector functions of vectors, **AD_vectors**

For example: 
$$f_1(x_1, x_2) = x_1 - x_2*x_1$$
$$f_2(x_1, x_2) = \frac{x_1}{x_2}$$

$$f(x_1, x_2) = (f_1(x_1, x_2), f_2(x_1, x_2))$$

```
a1 = 4.0
a2 = 3.0
x1 = autodiff.Variable(a1, name='x1')
x2 = autodiff.Variable(a2, name='x2')
f1 = x1 - x2 * x1
f2 = x1 / x2

# Jacobian of (f1, f2): one row per function, one column per variable
jacobian = [[f1.der['x1'], f1.der['x2']],
            [f2.der['x1'], f2.der['x2']]]

```
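
To show how dictionary-valued derivatives lead to a Jacobian, here is a self-contained sketch. The `Variable` class below is a simplified stand-in written for this example, and its `_combine` helper is a hypothetical name; the package's actual classes may be organized differently.

```python
class Variable:
    """Illustrative variable with a value and a dict of partial derivatives."""

    def __init__(self, val, der=None, name='x'):
        self.val = val
        # A fresh input variable has derivative 1 with respect to itself.
        self.der = {name: 1.0} if der is None else der

    def _combine(self, other, val, d_self, d_other):
        """Build the result from the local partials d_self and d_other."""
        names = set(self.der) | set(other.der)
        der = {n: d_self * self.der.get(n, 0.0) + d_other * other.der.get(n, 0.0)
               for n in names}
        return Variable(val, der)

    def __sub__(self, other):
        return self._combine(other, self.val - other.val, 1.0, -1.0)

    def __mul__(self, other):
        return self._combine(other, self.val * other.val, other.val, self.val)

    def __truediv__(self, other):
        return self._combine(other, self.val / other.val,
                             1.0 / other.val, -self.val / other.val ** 2)


x1 = Variable(4.0, name='x1')
x2 = Variable(3.0, name='x2')
f1 = x1 - x2 * x1
f2 = x1 / x2

# One row per function, one column per input variable.
jacobian = [[f.der[name] for name in ('x1', 'x2')] for f in (f1, f2)]
print(jacobian)
```

Because every intermediate result carries partials for all input names, a single evaluation of each function fills one row of the Jacobian.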

What classes will you implement?



```
class Variable:
  
  def __init__(self, value, der=1, name='x'):
    """Declare a variable
    
    INPUTS
    =======
    value: float, 
           value of the declared variable
    der  : float, optional, default value is 1
           derivative of the declared variable
    name : string, optional, default value is 'x'
           name of the declared variable
    
    RETURNS
    ========
    Variable: Variable object
              contains attributes val, der, and name
    
    EXAMPLES
    =========
    >>> a1 = 2.0
    >>> x = autodiff.Variable(a1, name='x1')
    >>> print(x.val, x.der, x.name)
    2.0 {'x1': 1} x1
    """
        
  def der(self, on="x"):
    """
    Return the derivative on given variable name
    """
    
  def __add__(self, other):
    """
    Add two Variable objects : self and other,
    Return a new Variable object
    """
  
class Sin:

  def __init__(self, x):
    """Declare an elementary function of a variable
    
    INPUTS
    =======
    x    : Variable object, 
           A previously declared variable
    
    RETURNS
    ========
    element: Elementary function object
             contains attributes val, der, and name
    
    EXAMPLES
    =========
    >>> a1 = 2.0
    >>> x = autodiff.Variable(a1, name='x1')
    >>> s = autodiff.Sin(x)
    >>> print(round(s.val, 4), round(s.der['x1'], 4), s.name)
    0.9093 -0.4161 x1
    """
  
```





What method and name attributes will your classes have?

* Methods

```
__add__
__radd__
__sub__
__rsub__
__mul__
__rmul__

``` 
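
The operator methods listed above could be implemented along the following lines. This is a minimal sketch for the scalar case only, assuming a single input variable and a float-valued `der`; the `_lift` helper is a hypothetical name introduced here.

```python
class Variable:
    """Illustrative scalar variable carrying a value and a derivative."""

    def __init__(self, val, der=1.0):
        self.val = val
        self.der = der

    @staticmethod
    def _lift(other):
        """Treat plain numbers as constants, whose derivative is 0."""
        return other if isinstance(other, Variable) else Variable(other, 0.0)

    def __add__(self, other):
        other = self._lift(other)
        return Variable(self.val + other.val, self.der + other.der)

    def __radd__(self, other):          # number + Variable
        return self.__add__(other)

    def __sub__(self, other):
        other = self._lift(other)
        return Variable(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):          # number - Variable
        return self._lift(other).__sub__(self)

    def __mul__(self, other):
        other = self._lift(other)
        return Variable(self.val * other.val,
                        self.der * other.val + self.val * other.der)

    def __rmul__(self, other):          # number * Variable
        return self.__mul__(other)


x = Variable(2.0)
f = 2.0 * x + 3.0
g = 10.0 - x * x
print(f.val, f.der)
print(g.val, g.der)
```

The reflected methods (`__radd__`, `__rsub__`, `__rmul__`) are what allow a plain number to appear on the left-hand side of an expression.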
What external dependencies will you rely on?


```
Numpy

Math

scipy*

```


How will you deal with elementary functions like ***sin, sqrt, log, and exp*** (and all the others)?

> Elementary function class: **autodiff.Sin(x)**


```
f = 2 * autodiff.Sin(x) + 3

print(f.val, f.der)
```
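
One possible way to cover many elementary functions without writing each by hand is to pair every function with its analytic derivative. The sketch below uses a small factory function instead of one class per function, which is a design alternative and not necessarily what the package will do; the `_elementary` helper and the simplified `Variable` are assumptions made for illustration.

```python
import math

class Variable:
    """Illustrative scalar variable carrying a value and a derivative."""

    def __init__(self, val, der=1.0):
        self.val = val
        self.der = der

    def __add__(self, other):           # Variable + constant only
        return Variable(self.val + other, self.der)

    def __rmul__(self, other):          # constant * Variable only
        return Variable(other * self.val, other * self.der)


def _elementary(func, dfunc):
    """Wrap func, with known derivative dfunc, so it applies the chain rule."""
    def wrapped(x):
        return Variable(func(x.val), dfunc(x.val) * x.der)
    return wrapped

# Each entry pairs an elementary function with its analytic derivative.
Sin = _elementary(math.sin, math.cos)
Exp = _elementary(math.exp, math.exp)
Log = _elementary(math.log, lambda v: 1.0 / v)
Sqrt = _elementary(math.sqrt, lambda v: 0.5 / math.sqrt(v))

x = Variable(4.0)
f = 2 * Sin(x) + 3
print(f.val, f.der)
```

Adding a new elementary function then requires only one line that supplies the function and its derivative.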





**References**

[1] Automatic differentiation. Wikipedia. Available at https://en.wikipedia.org/wiki/Automatic_differentiation.

[2] Griewank, A., in Complexity in Nonlinear Optimization (ed. Pardalos, P.), World Scientific, Singapore, 1993, pp. 128–161.

[3] Coleman, T. F. and Verma, A., in Computational Differentiation: Techniques, Applications and Tools (eds Berz, M., Bischof, C., Corliss, G. and Griewank, A.), SIAM, Philadelphia, 1996, pp. 149–159.

[4] Andreas Griewank: Evaluating Derivatives. SIAM 2000.

[5] Derivative. Wikipedia. Available at: https://en.wikipedia.org/wiki/Derivative#History

[6] Hoffmann, Philipp H. W. “A Hitchhiker’s Guide to Automatic Differentiation.” Numerical Algorithms, 72, 24 October 2015, 775-811, Springer Link, DOI 10.1007/s11075-015-0067-6.

