Here we build a basic neural network from scratch, consisting only of fully connected (dense) layers.

We begin by defining the *Layer* class, which represents a fully connected layer in the network. Each layer has a weight matrix and a bias vector, as well as an activation function and its derivative (both default to ReLU). The weights and biases are initialized to small random numbers drawn uniformly from [0, 0.1). The layer stores two values, *X* and *delta*, which are its outputs from the forward and backward passes, respectively.

First, we handle the forward pass, which computes *X*. The layer multiplies the input by the weight matrix, adds the bias, and then applies the activation function. It stores the result as *X*, and also stores the value from before the activation function is applied (called *signal* in the code), since backpropagation needs it.

In [10]:
import numpy as np

# The rectified linear unit
def relu(input):
    return np.maximum(0, input)

def relu_derivative(input):
    # If an element is >= 0, set it to 1, otherwise 0.
    # This is the fastest method, see https://stackoverflow.com/questions/45648668/convert-numpy-array-to-0-or-1

    return (input >= 0).astype(int)

# Represents a fully connected layer in the neural network
class Layer:
    def __init__(self, output_size, input_size, activation=None, activation_derivative=None):
        self.output_size = output_size
        self.input_size = input_size
        
        self.weights = np.random.rand(output_size, input_size) * 0.1
        self.bias = np.random.rand(output_size, 1) * 0.1
        
        # Default activation function is relu
        if activation is None:
            self.activation = relu
            self.activation_derivative = relu_derivative
        else:
            self.activation = activation
            self.activation_derivative = activation_derivative
        
    def forward(self, input):
        assert input.shape == (self.input_size, 1)
        
        # Pre-activation value: weighted input plus bias
        signal = np.dot(self.weights, input) + self.bias
        
        output = self.activation(signal)
        
        assert output.shape == (self.output_size, 1)
        
        # We save the signal for use in backpropagation
        self.signal = signal
        
        self.X = output
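
To make the forward computation concrete, here is a minimal standalone sketch with hand-picked numbers that can be checked by hand. The helper `dense_forward` and the example values are hypothetical and are not part of the *Layer* class. The sketch assumes a ReLU activation and column-vector inputs, as in the class above.

```python
import numpy as np

def dense_forward(weights, bias, x):
    """Hypothetical helper: one dense-layer forward step with a ReLU activation."""
    signal = np.dot(weights, x) + bias   # pre-activation, shape (output_size, 1)
    output = np.maximum(0, signal)       # ReLU applied element-wise
    return signal, output

# Hand-picked values (assumptions for illustration only)
weights = np.array([[1.0, 2.0, 0.5],
                    [0.0, 1.0, -1.0]])   # shape (output_size=2, input_size=3)
bias = np.array([[0.5], [-1.0]])         # shape (2, 1)
x = np.array([[1.0], [2.0], [4.0]])      # shape (3, 1)

signal, output = dense_forward(weights, bias, x)

# signal holds 7.5 and -3.0; ReLU keeps 7.5 and clips -3.0 to 0.0
print(signal.ravel())
print(output.ravel())
```

The first row gives 1·1 + 2·2 + 0.5·4 + 0.5 = 7.5 and the second gives 0·1 + 1·2 − 1·4 − 1 = −3, so only the first unit is active after the ReLU.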

Next, we handle the backward pass. The backward function calculates *delta* of the current layer, given the layer ahead of it (the next layer in the forward direction).
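
In symbols, this step can be sketched as follows. The notation is an assumption introduced here, not the notebook's own: $W^{(l+1)}$ and $\delta^{(l+1)}$ are the weights and delta of the layer ahead, $s^{(l)}$ is the current layer's stored signal, and $f$ is its activation function.

$$\delta^{(l)} = \left( W^{(l+1)} \right)^{\top} \delta^{(l+1)} \odot f'\!\left( s^{(l)} \right)$$

Here $\odot$ denotes element-wise multiplication. Since $W^{(l+1)}$ has one column per output of the current layer, $\delta^{(l)}$ has the same shape as that layer's output.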

In [None]:
# Intended as a method of the Layer class above; input_layer is the layer ahead of this one
def backward(self, input_layer):
    input = input_layer.delta
    weights = input_layer.weights
    
    assert input.shape == (input_layer.output_size, 1)
    
    output = np.dot(weights.T, input)
    output *= self.activation_derivative(self.signal)
    
    assert output.shape == (self.output_size, 1)
    
    self.delta = output
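
As a quick check of the backward step, the following standalone sketch propagates a delta through one layer using small, hand-picked arrays. The helper `propagate_delta` and all numbers are hypothetical. It assumes a ReLU activation and the same derivative convention as above (1 where the signal is >= 0).

```python
import numpy as np

def relu_derivative(signal):
    # 1 where the pre-activation is non-negative, 0 elsewhere
    return (signal >= 0).astype(int)

def propagate_delta(next_weights, next_delta, signal):
    """Hypothetical helper: delta of a layer, computed from the layer ahead of it."""
    return np.dot(next_weights.T, next_delta) * relu_derivative(signal)

# The layer ahead has 2 outputs and 3 inputs, so the current layer has 3 outputs
next_weights = np.array([[1.0, 0.0, 2.0],
                         [0.5, 1.0, 1.0]])   # shape (2, 3)
next_delta = np.array([[2.0], [4.0]])        # shape (2, 1)
signal = np.array([[0.3], [-0.7], [1.2]])    # current layer's saved pre-activation

delta = propagate_delta(next_weights, next_delta, signal)

# The transposed weights times next_delta give 4, 4, 8;
# the middle entry is zeroed because its signal is negative
print(delta.ravel())
```

The result has shape (3, 1), matching the current layer's output size, which is what the final assertion in `backward` checks.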