# Deep Neural Network Using Object Oriented Programming

I'm creating this notebook to better understand how neural networks work.  One way to understand anything better is to build it from scratch, and that's what I will do here.  I will try to provide a clear explanation and associated code with a concrete example.  By the end of this notebook, there will be a clear explanation of the feedforward and backpropagation processes and a full neural network written in Python using object-oriented programming.  I chose object-oriented programming instead of functional programming with matrices to reduce ambiguity.

## Overview

An artificial neural network (ANN) is a mathematical function: it takes some inputs and computes one or more outputs.  In fact, neural networks are said to be able to approximate any function (strictly, any continuous function on a bounded domain, to any desired accuracy).  Suppose we are given the function below:

<figure>
  <img src="Images/function.png" style="width:20%" />
  <figcaption><b>figure:&nbsp;&nbsp;</b>Random Function</figcaption>
</figure> 

No matter what the function is, there is guaranteed to be a neural network such that for every possible input, x, the network outputs f(x) or a close approximation of it.  Neural networks are _universal_ in this sense: they can approximate a function whether it is 2-dimensional, 3-dimensional, or N-dimensional.  The function above is a 2-D function since it is represented only by x and y points.  This property is described by the [Universal Approximation Theorem](https://en.wikipedia.org/wiki/Universal_approximation_theorem), which is why a neural network can be called a universal function approximator.  When a neural network approximates a function well, it is said that the neural network _models_ the function or the data.  In principle a neural network can model the above function to any desired accuracy, but higher accuracy needs a larger network and more computation, and a good approximation is usually sufficient.  For more information, see the e-book chapter [A visual proof that neural nets can compute any function, by Michael Nielsen](http://neuralnetworksanddeeplearning.com/chap4.html).

> _A feedforward network with a single layer is sufficient to represent any function, but the layer may be infeasibly large and may fail to learn and generalize correctly.
— Ian Goodfellow, DLB_

> _Introducing non-linearity via an activation function allows us to approximate any function. It’s quite simple, really. — Elon Musk_
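
To make the approximation idea concrete, here is a minimal sketch of a single-hidden-layer ReLU network whose weights are constructed by hand (not learned) so that it interpolates a target function at evenly spaced points.  The function names `build_relu_approximator` and `max_error`, and the choice of $f(x) = x^2$ on $[0, 1]$, are assumptions made for illustration only.  The point is simply that adding hidden units shrinks the approximation error.

```python
def relu(x):
    return max(0.0, x)


def build_relu_approximator(f, lower, upper, hidden_units):
    """Return a one-hidden-layer ReLU network that interpolates f at evenly spaced knots."""
    step = (upper - lower) / hidden_units
    knots = [lower + k * step for k in range(hidden_units + 1)]

    # Slope of the straight line joining each pair of consecutive knots
    slopes = [(f(knots[k + 1]) - f(knots[k])) / step for k in range(hidden_units)]

    # The output weight of hidden unit k is the change in slope at knot k
    output_weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, hidden_units)]
    output_bias = f(lower)

    def network(x):
        # Hidden unit k has input weight 1 and bias -knots[k]
        hidden = [relu(x - knots[k]) for k in range(hidden_units)]
        return output_bias + sum(w * h for w, h in zip(output_weights, hidden))

    return network


def max_error(f, g, lower, upper, samples=1000):
    """Largest absolute difference between f and g on an evenly spaced grid."""
    grid = (lower + (upper - lower) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - g(x)) for x in grid)


def square(x):
    return x * x


for units in (2, 8, 32):
    approximator = build_relu_approximator(square, 0.0, 1.0, units)
    print(units, "hidden units, max error:", max_error(square, approximator, 0.0, 1.0))
```

The printed error drops as the number of hidden units grows, which is the trade-off the Goodfellow quote points at: one layer is enough in principle, but it may need to be very wide.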

For a neural network to model the function in the above figure, it would first need to be _trained_.  To train a neural network means to show it training data and adjust its properties until it gives the desired output.  More concretely, to train a neural network to model the above function, the set of $(x,y)$ points (called the _training set_) from the function above is passed through the network and the properties of the network are updated accordingly.

## Neural Network Structure

Now that we know roughly what a neural network does, let's look at how it's represented graphically.  Neural networks are structured as a series of layers, and the layers are composed of neurons.  Each neuron takes one or more inputs and returns an output.  In the graphical representation below, the circles represent the neurons and the lines represent inputs and outputs.  As you can see, each neuron in a layer is connected to every neuron of the next layer.  Each layer is also annotated with its type, e.g. Hidden Layer 1.  Each connection also has a weight, $w_n$, applied to it.  The weights are the neural network properties that are changed during training.

We're going to use a neural network with 1 input layer, 2 hidden layers, and 1 output layer to make up a 3-layered network.  Remember, the input layer isn't usually counted when counting the number of layers in a neural network.  Below is the graphical representation of our neural network.

<figure>
  <img src="Images/deep_neural_network_1.png" style="width:50%" />
  <figcaption><b>figure:&nbsp;&nbsp;</b>3-Layer Neural Network</figcaption>
</figure> 

Each neuron also has a structure.  A neuron takes a set of inputs and returns an output.  The output is computed by first taking the weighted sum of the inputs and adding the bias (we'll talk about bias later).  For now, think of the bias as another weight.  The result of this calculation is then used as the input to the activation function.  There are many different activation functions that can be used. I wrote about many of the main ones in [Activation Functions](ActivationFunctions.ipynb).  The computation is below:

> $z = (\sum\limits_{i=1}^n x_i \cdot w_i) + b$  
$output = activation(z)$

The figure below shows each neuron (circle) from the diagram above in more detail.

<figure>
  <img src="Images/neuron.png" style="width:50%" />
  <figcaption><b>figure:&nbsp;&nbsp;</b>Neuron</figcaption>
</figure> 
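
As a quick check of the neuron computation above, here is a minimal sketch that evaluates $z$ and $activation(z)$ for one neuron.  The function name `neuron_output` and the input, weight, and bias values are made up for illustration.  Sigmoid is used as the activation only because it is one of the activation functions used later in this notebook.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation):
    # z = (sum of x_i * w_i) + b
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # output = activation(z)
    return activation(z)


# Hypothetical values: z = 3 * 0.5 + 8 * (-0.25) + 0.5 = 0
print(neuron_output([3, 8], [0.5, -0.25], 0.5, sigmoid))  # sigmoid(0) = 0.5
```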

## Algorithm

Before we start writing code, let's decompose the learning process into steps.

1. **Initialize the model**  
To initialize the model is to set the weights $w_n$ and bias $b$ values.  The initialization is typically random, meaning these parameters are set to random values.  There are some rules and techniques for initializing the model so that it learns faster, which can be viewed in the [Initialization Notebook](NeuralNetworkInitialization.ipynb).  

Repeat
> 2. **Feedforward**  
As the name suggests, this step moves the data from the input layer, through the hidden layers, and produces one or more outputs.  At each node, the weighted sum of the inputs (plus the bias) is calculated, and that value is passed to the activation function.  Refer to the _**figure:** Neuron_ above.
> 3. **Compute loss**  
After the outputs are calculated by the feedforward step, the total error between the outputs and the targets is calculated.
> 4. **Backpropagation**  
This is the step where the magic happens: the error is propagated backward through the network to work out how much each weight contributed to it. I wrote about how backpropagation works in [Backpropagation](Backpropagation.ipynb).
> 5. **Update weights**  
Each weight and bias is adjusted in the direction that reduces the error, using the gradients computed during backpropagation.

while ((number of iterations < the specified maximum) AND  
          (error > the specified threshold))

These are the general steps used to train a neural network to approximate a function/model for a given dataset.
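
The steps above can be sketched as a loop.  The following is a minimal illustration on a single neuron with an identity activation, not the network built later in this notebook.  The training points, `learning_rate`, `max_iterations`, and `error_threshold` values are assumptions chosen so that the loop converges quickly.

```python
import random


def train_single_neuron(training_set, learning_rate=0.05, max_iterations=1000, error_threshold=1e-6):
    # 1. Initialize the model
    weight = random.random()
    bias = 0.0

    iteration = 0
    while True:
        total_error = 0.0
        for x, y in training_set:
            # 2. Feedforward (identity activation, so output = z)
            output = weight * x + bias

            # 3. Compute loss: (1/2)(output - y)^2
            total_error += 0.5 * (output - y) ** 2

            # 4. Backpropagation: the chain rule gives dC/dw and dC/db
            pd_c_wrt_output = output - y
            pd_c_wrt_w = pd_c_wrt_output * x
            pd_c_wrt_b = pd_c_wrt_output

            # 5. Update weights: step against the gradient
            weight -= learning_rate * pd_c_wrt_w
            bias -= learning_rate * pd_c_wrt_b

        iteration += 1
        # Stop once either condition of the "while" test above fails
        if iteration >= max_iterations or total_error <= error_threshold:
            break

    return weight, bias, iteration, total_error


random.seed(1)
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hypothetical training set on y = 2x
weight, bias, iterations, error = train_single_neuron(points)
print("weight:", weight, "bias:", bias, "iterations:", iterations, "error:", error)
```

Python has no do-while statement, so the sketch uses `while True` with a `break` to run the body at least once before testing the stopping conditions.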

## Development

Let's start by developing a neuron.  The code is not optimized; it is written to be easy to understand.

#### Import Packages and Set Defaults

In [17]:
import math
import random
import matplotlib.pyplot as plt

# Set the matplotlib backend
%matplotlib inline

# Set default plot parameters
plt.rcParams['figure.figsize'] = (5.0, 4.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Ensure we get the same random numbers each time by using a constant seed value. (for debugging purposes only)
random.seed(1)

#
# ReLU function
#
def relu_forward(x):
    return max(0, x)

#
# Derivative of the ReLU function with respect to its input x
# (the derivative is undefined at x = 0; 0 is used there by convention)
#
def relu_backward(x):
    return 1 if x > 0 else 0

#
# Sigmoid function
#
def sigmoid_forward(x):
    return 1 / (1 + math.exp(-x))

#
# Derivative of the Sigmoid function with respect to its input x
#
def sigmoid_backward(x):
    # Call sigmoid_forward directly: the name `sigmoid` is bound to a list below
    s = sigmoid_forward(x)
    return s * (1 - s)

# Group the activation functions and their derivatives
# so they can be referenced during the feed forward and backpropagation processes
sigmoid = [sigmoid_forward, sigmoid_backward]
relu = [relu_forward, relu_backward]


#
# Neuron
#
class Neuron:
    def __init__(self, activation_function, activation_function_derivative):
        self.input_neurons = []
        self.weighted_sum = 0   # z, cached by calculate_output for the backward pass
        self.output = 0
        self.weights = []
        self.bias = 0
        self.activation = activation_function
        self.activation_derivative = activation_function_derivative
        
    
    def _calculate_weighted_sum(self):
        weighted_sum = 0
        
        # z = Σ(xi * wi)
        for i in range(len(self.input_neurons)):
            weighted_sum += self.input_neurons[i].output * self.weights[i]
            
        # z = z + b
        weighted_sum += self.bias
        
        return weighted_sum
    
    def calculate_output(self):

        # z (cached because the activation derivative is evaluated at z)
        self.weighted_sum = self._calculate_weighted_sum()
        
        # activation(z)
        self.output = self.activation(self.weighted_sum)
        return self.output
    
    # Calculate the partial derivative of the cost with respect to the output, ∂𝐶/∂𝑎
    # Since our cost method, _calculate_cost, is defined as (1/2)(𝑎(𝐿) − 𝑦)^2,
    # the derivative is (𝑎(𝐿) − 𝑦)
    def calculate_pd_c_wrt_a(self, target):
        return (self.output - target)
    
    # Calculate the partial derivative of the output with respect to z, ∂𝑎/∂𝑧
    # Made the activation_derivative dynamic because we need to be able to use different 
    # activation functions.  Activation functions and their derivatives are defined above
    # and take z (not the output) as their argument
    def calculate_pd_a_wrt_z(self):
        return self.activation_derivative(self.weighted_sum)
    
    # Calculate the partial derivative of the cost with respect to z, ∂𝐶/∂𝑧
    # The term ∂𝐶/∂𝑎 * ∂𝑎/∂𝑧 is common to the set of Chain Rule equations
    def calculate_pd_c_wrt_z(self, target):
        return self.calculate_pd_c_wrt_a(target) * self.calculate_pd_a_wrt_z()

    # Calculate the partial derivative of z with respect to the weight at the given index, ∂𝑧/∂𝑤
    # Since z = Σ(xi * wi) + b, the derivative is the output of the matching input neuron
    def calculate_pd_z_wrt_w(self, index):
        return self.input_neurons[index].output

    # The backward pass over all layers lives in NeuralNetwork.backpropagation,
    # because it needs access to every layer and a single neuron only knows its inputs

#
# Layer of Neurons
#
class Layer:
    def __init__(self, neurons):
        self.neurons = neurons
    
#
# NeuralNetwork
#
class NeuralNetwork:
    def __init__(self, layers_dimensions):
        # Each entry is [number_of_neurons, [activation, activation_derivative]]
        self.layers_dimensions = layers_dimensions
        
        # 1. Initialize the model
        self.initialize()
        
    def initialize(self):
        self._build_neural_network()
                    
    def _build_neural_network(self):
        # Create layers with the specified number of neurons
        self.layers = [Layer([Neuron(layer[1][0], layer[1][1]) for n in range(layer[0])]) for layer in self.layers_dimensions]
        
        # Iterate through the layers and connect the neurons from the previous layer to the current layer
        for layer_index in range(1, len(self.layers)):
            previous_layer = self.layers[layer_index - 1]
            for node in self.layers[layer_index].neurons:
                node.input_neurons = list(previous_layer.neurons)
                node.weights = [random.random() for neuron in previous_layer.neurons]
                node.bias = 0
                
    def print_self(self):
        # Print out neural network
        for l, layer in enumerate(self.layers):
            
            if(l == 0):
                print("INPUT LAYER " + str(l))
            elif(l == len(self.layers) - 1):
                print("OUTPUT LAYER " + str(l))
            else:
                print("LAYER " + str(l))
                
            for n, neuron in enumerate(layer.neurons):
                print("   neuron: " + str(n))
                # Input neurons have no activation function, so an empty line is printed for them
                print("      activation: " + neuron.activation.__name__ if neuron.activation is not None else "") 
                print("      bias: " + str(neuron.bias))
                print("      weights: " + str(neuron.weights))
                print("      output: " + str(neuron.output))
    
    def feed_forward(self, input_data):
        
        # Set inputs into the neural network input layer
        for index, neuron in enumerate(self.layers[0].neurons):
            neuron.output = input_data[index]
        
        # Compute the output of each node in each layer
        for layer_index, layer in enumerate(self.layers):
            if layer_index == 0:
                continue
                
            for node in layer.neurons:
                node.calculate_output()
    
    def calculate_total_cost(self, training_output):
        total_cost = 0

        for o in range(len(training_output)):
            output = self.layers[len(self.layers) - 1].neurons[o].output
            target_output = training_output[o]
            total_cost += self._calculate_cost(output, target_output)
            
        return total_cost
    
    # Calculates (1/2)(𝑎(𝐿) − 𝑦)^2
    # The derivative of this function is Neuron.calculate_pd_c_wrt_a
    def _calculate_cost(self, output, target_output):
        return 0.5 * ((output - target_output) ** 2)
    
    
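    # Computes ∂C/∂w and ∂C/∂b for every neuron, moving from the output layer back to
    # the first hidden layer.  The gradients are cached on the network, indexed as
    # [layer][neuron][weight] and [layer][neuron]; the weights are not modified here.
    def backpropagation(self, training_output):
        # The input layer (index 0) has no weights, so its entries stay empty
        self.weight_gradients = [[[0.0] * len(neuron.weights) for neuron in layer.neurons] for layer in self.layers]
        self.bias_gradients = [[0.0] * len(layer.neurons) for layer in self.layers]
        
        # ∂C/∂z of the layer handled in the previous iteration (one step closer to the output)
        next_layer_pd_c_wrt_z = []
        last_layer_index = len(self.layers) - 1
        
        for layer_index in range(last_layer_index, 0, -1):
            layer = self.layers[layer_index]
            
            # Cache ∂C/∂z = ∂C/∂a * ∂a/∂z for each neuron in this layer
            pd_c_wrt_z = [0.0] * len(layer.neurons)
            
            for n, neuron in enumerate(layer.neurons):
                if layer_index == last_layer_index:
                    # Output layer: ∂C/∂a comes directly from the cost function
                    pd_c_wrt_a = neuron.calculate_pd_c_wrt_a(training_output[n])
                else:
                    # Hidden layer: sum the error flowing back through each outgoing connection
                    next_layer = self.layers[layer_index + 1]
                    pd_c_wrt_a = sum(next_layer_pd_c_wrt_z[k] * next_neuron.weights[n]
                                     for k, next_neuron in enumerate(next_layer.neurons))
                
                pd_c_wrt_z[n] = pd_c_wrt_a * neuron.calculate_pd_a_wrt_z()
                
                # ∂z/∂w is the output of the matching input neuron, and ∂z/∂b = 1
                for w, input_neuron in enumerate(neuron.input_neurons):
                    self.weight_gradients[layer_index][n][w] = pd_c_wrt_z[n] * input_neuron.output
                self.bias_gradients[layer_index][n] = pd_c_wrt_z[n]
            
            next_layer_pd_c_wrt_z = pd_c_wrt_z
     
    # Uses online learning, i.e. each training case is processed on its own
    def train(self, training_inputs, training_outputs):
        for training_input, training_output in zip(training_inputs, training_outputs):
            # 2 Feedforward
            self.feed_forward(training_input)

            # 3 Calculate cost of outputs for graphing/monitoring
            self.calculate_total_cost(training_output)
            
            # 4 Backpropagation: caches ∂C/∂w and ∂C/∂b for this training case;
            #   the weights themselves are not modified here
            self.backpropagation(training_output)
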

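
The chain-rule terms named in the comments above (∂C/∂a, ∂a/∂z, and ∂z/∂w) can be checked numerically.  The following is a minimal, self-contained sketch for a single sigmoid neuron with the cost $(1/2)(a - y)^2$.  It compares the chain-rule gradient with a central finite-difference estimate.  The helper names and the input, weight, bias, and target values are assumptions made for illustration, and the sketch does not use the classes defined in this notebook.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def weighted_sum(weights, bias, inputs):
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def cost(weights, bias, inputs, target):
    a = sigmoid(weighted_sum(weights, bias, inputs))
    return 0.5 * (a - target) ** 2


def analytic_gradient(weights, bias, inputs, target):
    a = sigmoid(weighted_sum(weights, bias, inputs))
    pd_c_wrt_a = a - target          # derivative of (1/2)(a - y)^2
    pd_a_wrt_z = a * (1 - a)         # derivative of the sigmoid, written in terms of a
    pd_c_wrt_z = pd_c_wrt_a * pd_a_wrt_z
    # dz/dw_i = x_i, so dC/dw_i = dC/dz * x_i
    return [pd_c_wrt_z * x for x in inputs]


def numerical_gradient(weights, bias, inputs, target, epsilon=1e-6):
    gradient = []
    for i in range(len(weights)):
        plus = list(weights)
        minus = list(weights)
        plus[i] += epsilon
        minus[i] -= epsilon
        # Central difference: (C(w + eps) - C(w - eps)) / (2 * eps)
        gradient.append((cost(plus, bias, inputs, target) - cost(minus, bias, inputs, target)) / (2 * epsilon))
    return gradient


# Hypothetical values
inputs, weights, bias, target = [0.5, -1.0], [0.2, 0.4], 0.1, 0.3
print("chain rule:       ", analytic_gradient(weights, bias, inputs, target))
print("finite difference:", numerical_gradient(weights, bias, inputs, target))
```

If the two printed lists agree to several decimal places, the chain-rule terms are consistent with the cost function.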
In [16]:

layers_dimensions = [[2,[None,None]],[2,relu],[2,relu],[2,sigmoid]]
#[Layer([Neuron(sigmoid, _, _) for n in range(layer[0])]) for layer in layers]

data_in = [[3, 8]]
data_out = [[0.3, 1.2]]
neural_network = NeuralNetwork(layers_dimensions)
neural_network.train(data_in, data_out)

neural_network.print_self()




INPUT LAYER 0
   neuron: 0

      bias: 0
      weights: []
      output: 3
   neuron: 1

      bias: 0
      weights: []
      output: 8
LAYER 1
   neuron: 0
      activation: relu_forward
      bias: 0
      weights: [0.13436424411240122, 0.8474337369372327]
      output: 7.182562627835065
   neuron: 1
      activation: relu_forward
      bias: 0
      weights: [0.763774618976614, 0.2550690257394217]
      output: 4.3318760628452155
LAYER 2
   neuron: 0
      activation: relu_forward
      bias: 0
      weights: [0.49543508709194095, 0.4494910647887381]
      output: 5.505633125085929
   neuron: 1
      activation: relu_forward
      bias: 0
      weights: [0.651592972722763, 0.7887233511355132]
      output: 8.096759139429462
OUTPUT LAYER 3
   neuron: 0
      activation: sigmoid_forward
      bias: 0
      weights: [0.0938595867742349, 0.02834747652200631]
      output: 0.6783674023802385
   neuron: 1
      activation: sigmoid_forward
      bias: 0
      weights: [0.8357651039198697