# Introduction: 

&nbsp;&nbsp;&nbsp;&nbsp;Our project will implement the core parts of gradient descent for neural networks (NNs).

To achieve this goal, we will need to implement automatic differentiation in both the forward and backward directions.

This will allow us to make an inference and then backpropagate the effect of the loss onto individual weights. Our suite of tools should also be able to evaluate and differentiate equations that don't fit into a NN. Gradient descent underlies much of modern machine learning, and automatic differentiation is what makes gradient descent practical. Our project will allow us to explore and implement these foundations, hopefully giving us a greater understanding of the domain.

# Background:

&nbsp;&nbsp;&nbsp;&nbsp;In forward automatic differentiation, a given function is first broken down into elementary functions. Elementary functions include multiplication, addition, subtraction, division, and other basic functions such as sin. Once a given function is decomposed into elementary functions, a graph can be generated where each node is a specific stage in the calculation and each edge is an elementary operation applied to a given node. Applying the chain rule along this graph yields the gradients and the "contribution" of each node to the result.
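
To make the decomposition concrete, here is a small hand-written sketch of a forward trace. It evaluates f(x1, x2) = x1 * x2 + sin(x1) one elementary operation at a time and carries the derivative with respect to x1 alongside each intermediate value. The function and the helper name `forward_trace` are chosen purely for illustration and are not part of our package.

```python
import math

def forward_trace(x1, x2):
    """Evaluate f(x1, x2) = x1 * x2 + sin(x1) and df/dx1 step by step."""
    # Seed the inputs: we differentiate with respect to x1,
    # so dx1/dx1 = 1 and dx2/dx1 = 0.
    v1, d1 = x1, 1.0
    v2, d2 = x2, 0.0
    # Node v3 = v1 * v2 (product rule)
    v3, d3 = v1 * v2, d1 * v2 + v1 * d2
    # Node v4 = sin(v1) (chain rule)
    v4, d4 = math.sin(v1), math.cos(v1) * d1
    # Node v5 = v3 + v4 (sum rule); this is the output node
    v5, d5 = v3 + v4, d3 + d4
    return v5, d5

value, dfdx1 = forward_trace(2.0, 3.0)
# Analytically, df/dx1 = x2 + cos(x1), which should match dfdx1.
```

Each line of the trace corresponds to one node of the computational graph, and the derivative column is what forward mode propagates.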

Our group thinks that 3Blue1Brown does a great exploration of this topic: https://www.youtube.com/watch?v=tIeHLnjs5U8

Here is an image from the video that shows a computation graph representing the operations in a trivial NN:
<div>
<img src="https://www.3blue1brown.com/content/lessons/2017/backpropagation-calculus/tree-extended.png" width="400">
</div>

Our package should be able to implement this graph and eventually be able to use it to train and make inferences.

# How to Use AutoDiffPackage:

&nbsp;&nbsp;&nbsp;&nbsp;We are calling our package AutoDiffPackage. Our goal is for this package to be accessible from at least a test server. Users will need to have NumPy and NetworkX installed in order to use AutoDiffPackage. NumPy is fundamental to the calculations, while NetworkX is for graph visualization.

&nbsp;&nbsp;&nbsp;&nbsp;A user should be able to either parse an equation from a string or represent that equation/NN via objects and activation functions created and composed using our package. Users should also be able to execute this graph structure given input data, and to calculate gradients directly by passing a function and inputs to the forward AD class. Users should be able to choose which part of the package best fits their use case: forward AD, which requires less space, or backward passes on a graph data structure, which allow for gradient descent.


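The planned interface looks roughly like this (names and signatures may still change):

```python
from AutoDiffPackage import Graph, AutoDiff

# `function` is the user's function; `input_vector` holds the values to evaluate at.
equation = Graph(function)          # build a graph from the function
graph_diff = equation.autodiff()    # differentiate using the graph

# Forward AD: pass a symbolic representation of the function (or a Graph)
# together with the input vector.
forward_diff = AutoDiff(function_or_graph, input_vector)

# Alternatively, build the differentiator first and call it on inputs later.
diffs = AutoDiff(function_or_graph)
outs = diffs(input_vector)
```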


# Software Organization:

We think that the design philosophy of PyTorch is excellent, and we will try our best to emulate its clean and thoughtful structuring of code as the package grows.

For now our directory will have the following structure:

```
team48/
├── docs
│   └── milestone1
├── LICENSE
├── README.md
├── tests
└── src 
```
&nbsp;&nbsp;&nbsp;&nbsp;Inside of the src folder we plan on including a module called Autodiff for calculating the gradient of a given input function. 

&nbsp;&nbsp;&nbsp;&nbsp;We also plan on having a graph-oriented module called Graph that will build and hold a graph representation of the computation. Our test suite will live in the tests directory, and our package will be distributed via Pytools and PyPI.

# Implementation:

At this time nothing is set in stone, and we are doing our best to plan out how we want to implement the assignment. We think we will need the following classes for our implementation:
    
- **Dual Numbers**:
  - Attributes: 
    - real: a real number part of type float or int
    - dual: a dual number part of type float or int
  - Methods:
    - We will overload elementary operations (addition, subtraction, multiplication, division, sin, etc.) to work correctly for dual numbers.
- **Operation class**:
  - Attributes: 
    - Enum class where all elementary operations are assigned a number 
- **Node class**:
  - Attributes:
    - parameters: an optional parameter dict
    - operation: an Operation type variable representing the operation at the given node
    - value: dual number value at a given "step" in the calculation
- **Graph class**:
  - Attributes: 
    - Nodes: nodes in the graph of type node
    - Edges: adjacency list representing edges between different nodes
- **AD class**:
  - Attributes:
    - This class will accept a function and, optionally, values at which to evaluate it.
  - Methods:
    - This class will have a function grad() to calculate the gradient of the given function.
    - This class will also account for the modified arithmetic of dual numbers, such as the extra cross term that appears in multiplication (this will mainly be addressed by the dual number class but will be considered here too).


&nbsp;&nbsp;&nbsp;&nbsp;The core data structures are the classes discussed above. A dual number will consist of a real part and a dual (derivative) part. Each node will consist of the operation that occurs at that node and parameters of dual number type that track the value and derivative during backward propagation (this is how we account for reverse mode). The graph class will hold the nodes discussed above and an adjacency list of edges that records the relationships between them; together these represent our computational graph and are necessary for reverse automatic differentiation.
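
As a rough sketch of how these pieces could fit together, the code below builds a tiny computational graph and runs a reverse pass over it. This is only one possible layout and not our final design: the method names (`add_node`, `input`, `add`, `mul`, `sin`, `backward`) are hypothetical, the adjacency list here maps each node to its parents, and for brevity each node stores a plain float value plus a separate `grad` field instead of a dual number.

```python
from enum import Enum, auto
import math

class Operation(Enum):
    """Elementary operations, each assigned a number."""
    INPUT = auto()
    ADD = auto()
    MUL = auto()
    SIN = auto()

class Node:
    def __init__(self, operation, value):
        self.operation = operation
        self.value = value   # result of the forward pass at this node
        self.grad = 0.0      # d(output)/d(this node), filled in by backward()

class Graph:
    def __init__(self):
        self.nodes = []      # nodes in creation order (parents come first)
        self.edges = {}      # adjacency list: node index -> parent indices

    def add_node(self, operation, value, parents=()):
        self.nodes.append(Node(operation, value))
        idx = len(self.nodes) - 1
        self.edges[idx] = list(parents)
        return idx

    def input(self, value):
        return self.add_node(Operation.INPUT, value)

    def add(self, a, b):
        value = self.nodes[a].value + self.nodes[b].value
        return self.add_node(Operation.ADD, value, (a, b))

    def mul(self, a, b):
        value = self.nodes[a].value * self.nodes[b].value
        return self.add_node(Operation.MUL, value, (a, b))

    def sin(self, a):
        return self.add_node(Operation.SIN, math.sin(self.nodes[a].value), (a,))

    def backward(self, output):
        """Accumulate d(output)/d(node) for every node feeding the output."""
        for node in self.nodes:
            node.grad = 0.0
        self.nodes[output].grad = 1.0
        # Visit nodes in reverse creation order so that every node is
        # processed after all of the nodes that consume it.
        for idx in range(output, -1, -1):
            node, parents = self.nodes[idx], self.edges[idx]
            if node.operation is Operation.ADD:
                for p in parents:
                    self.nodes[p].grad += node.grad
            elif node.operation is Operation.MUL:
                a, b = parents
                self.nodes[a].grad += node.grad * self.nodes[b].value
                self.nodes[b].grad += node.grad * self.nodes[a].value
            elif node.operation is Operation.SIN:
                (a,) = parents
                self.nodes[a].grad += node.grad * math.cos(self.nodes[a].value)

# f(x1, x2) = x1 * x2 + sin(x1) evaluated at (2, 3)
g = Graph()
x1, x2 = g.input(2.0), g.input(3.0)
out = g.add(g.mul(x1, x2), g.sin(x1))
g.backward(out)
grad_x1, grad_x2 = g.nodes[x1].grad, g.nodes[x2].grad
```

A single backward pass fills in the gradient for every input at once, which is why the graph representation suits gradient descent over many weights.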

&nbsp;&nbsp;&nbsp;&nbsp;We will overload basic operations for dual numbers by declaring custom versions of these functions in the DualNumber class. Our versions will handle dual numbers and default to the NumPy versions of the functions otherwise. We will use the NumPy array and NumPy matrix as fundamental data structures and extend NumPy functionality, as it allows us to conveniently handle vector input and vector functions. Because handling vector input and NumPy data structures is critical to our project, we will include NumPy as an external dependency. NetworkX is an external package that allows us to visualize graphs, and we will use it to visualize large graph structures. We intend to include it as a convenience for the user, but it is more of a stretch goal.
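
A minimal sketch of what this overloading might look like is shown below. It assumes scalar real and dual parts; the helper `_lift` and the module-level `sin` wrapper are hypothetical names used for illustration, not a settled API.

```python
import numpy as np

class DualNumber:
    def __init__(self, real, dual=0.0):
        self.real = real   # function value
        self.dual = dual   # derivative part

    @staticmethod
    def _lift(other):
        # Treat plain numbers as dual numbers with zero derivative.
        return other if isinstance(other, DualNumber) else DualNumber(other, 0.0)

    def __add__(self, other):
        other = self._lift(other)
        return DualNumber(self.real + other.real, self.dual + other.dual)
    __radd__ = __add__

    def __sub__(self, other):
        other = self._lift(other)
        return DualNumber(self.real - other.real, self.dual - other.dual)

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps**2 = 0
        other = self._lift(other)
        return DualNumber(self.real * other.real,
                          self.real * other.dual + self.dual * other.real)
    __rmul__ = __mul__

    def __truediv__(self, other):
        # Quotient rule in the dual part
        other = self._lift(other)
        dual = (self.dual * other.real - self.real * other.dual) / other.real ** 2
        return DualNumber(self.real / other.real, dual)

    def __rtruediv__(self, other):
        return self._lift(other) / self

def sin(x):
    """Dual-aware sin that defaults to NumPy for ordinary input."""
    if isinstance(x, DualNumber):
        return DualNumber(np.sin(x.real), np.cos(x.real) * x.dual)
    return np.sin(x)

# Differentiate f(x) = x * sin(x) + 3 / x at x = 2 by seeding dual = 1.
x = DualNumber(2.0, 1.0)
y = x * sin(x) + 3 / x
# y.real holds f(2); y.dual holds f'(2) = sin(2) + 2*cos(2) - 3/4.
```

Seeding the dual part with 1 selects the variable to differentiate with respect to, so a function of several inputs needs one forward pass per input.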

&nbsp;&nbsp;&nbsp;&nbsp;To handle functions whose input dimensions differ from their output dimensions, we will include checks within nodes to ensure that the number of variables and the dimensions stay consistent with the defined function. The gradient function discussed above will also be part of our library.

We may also implement a layer class, should we have enough time.
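
One way such a check could look is sketched below. It is written as a standalone helper for illustration, whereas our plan is to place the checks inside nodes; the name `check_dimensions` and its parameters are hypothetical.

```python
import numpy as np

def check_dimensions(func, inputs, expected_outputs):
    """Evaluate func once and return the Jacobian shape (outputs, inputs).

    Raises ValueError if the function does not return the expected
    number of outputs.
    """
    x = np.atleast_1d(np.asarray(inputs, dtype=float))
    y = np.atleast_1d(np.asarray(func(x), dtype=float))
    if x.ndim != 1 or y.ndim != 1:
        raise ValueError("inputs and outputs must be scalars or 1-D vectors")
    if y.size != expected_outputs:
        raise ValueError(
            f"expected {expected_outputs} outputs, got {y.size}"
        )
    return (y.size, x.size)

def f(x):
    # Example function mapping 3 inputs to 2 outputs
    return np.array([x[0] * x[1], np.sin(x[2])])

shape = check_dimensions(f, [1.0, 2.0, 3.0], expected_outputs=2)
```

For a function with n inputs and m outputs the derivative is an m-by-n Jacobian, so knowing this shape up front lets later stages allocate and validate their results.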

# Licensing:

&nbsp;&nbsp;&nbsp;&nbsp;We will use the MIT License for our project. Because we are using NumPy and possibly NetworkX, we shouldn't have to deal with any patent issues. We are okay with people making and distributing closed-source versions of our code.
    
You can find a copy of this license here: https://choosealicense.com/licenses/mit/