# Building a Neural Network from Scratch Using Object-Oriented Programming (OOP)

In this notebook I develop an object-oriented sequential neural network from scratch as a personal project. The main motivator for this work is a more rigorous application of concepts learned in Andrew Ng's machine learning courses:

- [Machine Learning Specialisation](https://www.coursera.org/specializations/machine-learning-introduction)
- [Deep Learning Specialisation](https://www.coursera.org/specializations/deep-learning)


The full implementation will have two main components:


1.   A neural network layer constructor that allows the user to define layer properties similarly to the Keras API for TensorFlow.
2.   A model compiler that takes a series of layers and forms the neural network.



In [1]:
# Import packages
import numpy as np
import copy

# Neural Network Layer Object

The neural network layer object is created below. While the goal is to mimic Keras' ease of use through OOP, the features included in this build are not an exhaustive reproduction of Keras' feature list, since that is well beyond the scope of this work. The following functionality is included in the current version:
- Generalised object architecture allows the user to initialise a layer with
  - a number of units/neurons
  - an activation function
  - weight and bias parameters
  - a layer id/name
- A <u>print()</u> call produces a brief summary of the layer properties.
- The <u>initialiseWB()</u> method initialises the layer weights and biases based on an expected input.
- Once the model has been initialised for an input, the layer object validates any future inputs using the <u>is_consistent()</u> method.
- The <u>get_parameters()</u> and <u>set_parameters()</u> methods allow for the inspection and update of the layer's weights and biases to facilitate the learning mechanisms.
- The <u>forward()</u> method computes the local forward propagation step (vectorised) using one of the following activation functions:
  - Linear (i.e., no activation function)
  - Sigmoid
  - Hyperbolic Tangent Function (tanh)
  - Softmax
  - Rectified Linear Unit (ReLU)
  - Parametric ReLU (PReLU), with a fixed slope of 0.01 for negative inputs
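
The forward step listed above can be sketched in plain NumPy, independently of the class. This is a minimal illustration only: the sizes `m`, `n`, `j` and the seeded generator `rng` are assumptions chosen for the example, and the 0.01 factor mirrors the weight scaling used by the layer.

```python
import numpy as np

# Illustrative sizes (assumptions): m examples, n features, j neurons
m, n, j = 4, 3, 2
rng = np.random.default_rng(0)

X = rng.standard_normal((m, n))          # input data, shape (m, n)
W = rng.standard_normal((n, j)) * 0.01   # weights, shape (n, j), small initial values
B = np.zeros((1, j))                     # biases, shape (1, j), broadcast over the m rows

z = X @ W + B                            # linear step, shape (m, j)
a = 1.0 / (1.0 + np.exp(-z))             # sigmoid activation, applied elementwise

print(z.shape, a.shape)
```

Each row of `a` holds the `j` neuron outputs for one example, so the output of one layer has the right shape to act as the input of the next.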


In [2]:
class NN_DenseLayer():
  # class variables for scaling initial weights/biases
  WB_SCALE = 0.01

  def __init__(self,units,activation='sigmoid',name=None,W_init=None,B_init=None):
    # initialise layer data
    self.neurons = units          # number of neurons in the layer
    self.activation = activation  # activation function applied at the layer
    self.name = name              # layer name (if desired)

    # features will be updated when an input is passed to the layer
    self.features = None

    # initialise layer weights and biases (inactive by default)
    self.wb_initialised = False
    self.W = W_init
    self.B = B_init

    # input sanitation parameter
    self.expectedInputShape = None


  def __str__(self):
    '''
      Returns a brief summary of the layer properties, shown when print() is called.
    '''
    # summary string
    return f"\nNeural-network layer (dense) with {self.neurons} neurons. {self.activation.title()} activation. \n"


  def initialiseWB(self,featureCount,neuronCount):
    '''
      Produces initial values for weight and bias arrays:

      PARAMETER INITIALISATION PROTOCOL:
      --------------------------------------------------------------------------
        - initialise the weights matrix to the desired (n,j), populating with a random sample from a normal distribution
        - initialise the bias matrix to the desired (1,j), populating with zeros
        - scale both values using WB_SCALE
    '''
    # initialise weight array, and scale
    weights_matrix = np.random.randn(featureCount,neuronCount) * self.WB_SCALE

    # initialise bias array, and scale
    bias_matrix = np.zeros((1,neuronCount)) * self.WB_SCALE

    # indicate that weights/biases have been initialised
    self.wb_initialised = True

    return weights_matrix, bias_matrix


  def is_consistent(self,X_in):
    '''
    Checks if the input X_in is consistent with previous inputs to the layer.
    '''
    # returns true if the input shape is consistent with the previous iterations
    if X_in.shape == self.expectedInputShape:
      return True
    return False


  def update_input(self, X_in):
    '''
      Updates the expected input shape for the layer based on the structure of a new input X_in
    '''
    # update the expected input shape to match X_in
    self.expectedInputShape = X_in.shape



  def count_features(self,X_in):
    '''
    Counts the number of features, n, present in the input X_in
    '''
    try:
      # if X_in is a matrix, n is the number of columns, index 1
      numFeatures = X_in.shape[1]
    except IndexError:
      # raise an error if the input is a 1D array
      raise Exception('Input must be a 2D matrix.')

    return numFeatures


  def forward(self, X_in):
    '''
      Evaluates the layer output(s) for an input X_in:

      X_in -  input data    (m,n)  |  m examples with n features each
      W    -  layer weights (n,j)  |  n features per neuron for j neurons/units
      B    -  bias vector   (1,j)  |  j neurons/units
    '''
    self.current_input = copy.deepcopy(X_in)
    # check if the layer was expecting an input of this format
    if self.is_consistent(self.current_input):
      pass # do nothing if input was expected in this format
    else:
      # we're dealing with a new input, so update the expected input shape
      self.update_input(self.current_input)
      # count the number of features in the new input format
      self.features = self.count_features(self.current_input)
      # initialise weight/bias arrays for the new input
      self.W, self.B = self.initialiseWB(self.features,self.neurons)

    # if weights/biases have not yet been initialised, then do so.
    if not self.wb_initialised:
      # estimate initial values for W and B
      self.W, self.B = self.initialiseWB(self.features,self.neurons)

    # apply linear function using weights and biases on the layer input; (z = X_in*W + B)
    z = np.matmul(self.current_input, self.W) + self.B

    # library of built-in activation functions
    # (methods are stored uncalled, so only the chosen activation is evaluated)
    forward_activations = {
        'linear':self.linear,
        'sigmoid':self.sigmoid,
        'tanh':self.tanh,
        'softmax':self.softmax,
        'relu':self.relu,
        'prelu':self.prelu
    }

    # attempt to apply the chosen activation function
    try:
      # look up the relevant activation and evaluate it on z
      self.out = forward_activations[self.activation.lower()](z,'forward')
    except KeyError:
      # activation call failed
      self.out = None
      # find valid activation functions
      valid_activation_keys = list(forward_activations)
      # throw exception and recommend valid activation functions
      raise Exception(f"Invalid activation function. Pick one of the following: {valid_activation_keys}")

    # cache input values for backward propagation step
    linear_inputs = (self.current_input, self.W, self.B)
    activation_inputs = (z)
    cache = (linear_inputs, activation_inputs)

    return self.out, cache


  def get_parameters(self):
    '''
      Returns the current weights and biases for the layer
    '''
    # package parameters into a dictionary and return
    return {'W':self.W, 'B':self.B}


  def set_parameters(self,W_set,B_set):
    '''
      Overwrites the weights and biases of the current layer
    '''
    # if both setpoint values are null, then W/B are reset to None
    # ("is None" is used because "== None" is evaluated elementwise on numpy arrays)
    if (W_set is None) and (B_set is None):
      self.W = None
      self.B = None
      # indicate that the weights/biases have been nullified
      self.wb_initialised = False
      return

    # expected shapes: taken from the current W/B if they exist,
    # otherwise from the feature count of the most recent input
    if (self.W is None) or (self.B is None):
      W_shape, B_shape = (self.features,self.neurons), (1,self.neurons)
    else:
      W_shape, B_shape = self.W.shape, self.B.shape

    # if non-null, the shape of the weight and bias structures should be consistent
    if (W_set is not None) and (B_set is not None) and (W_set.shape == W_shape) and (B_set.shape == B_shape):
      self.W = copy.deepcopy(W_set)
      self.B = copy.deepcopy(B_set)
      self.wb_initialised = True
    # if not, the W/B structures chosen are incompatible
    else:
      print(f"\nINVALID COMMAND(!): \n\nLayer is only compatible with a {W_shape} weight matrix, and {B_shape} bias vector")


  #----------------------------------------------------------------------------#
  # ACTIVATION FUNCTIONS
  #----------------------------------------------------------------------------#
  # Linear function (regression problems; negative or positive)
  def linear(self,z,direction='forward'):
    '''
      Evaluates the linear function g(z) or its derivative g'(z) on the input z

      Where,    g(z) = z

      And,      g'(z) = 1
    '''
    # during forward propagation...
    if direction.lower() == 'forward':
      # return g(z)
      return copy.deepcopy(z)

    # during backward propagation...
    elif direction.lower() == 'backward':
      # return g'(z)
      return np.ones(z.shape)

    else:
      # otherwise, indicate invalid method call
      raise Exception('Invalid activation direction. Must be "forward" or "backward"')


  # Sigmoid function (binary classification problems; 1 or 0, etc.)
  def sigmoid(self,z,direction='forward'):
    '''
      Evaluates the sigmoid function g(z) or its derivative g'(z) on the input z

      Where, g(z) =      1
                     ----------
                     1 + e^(-z)

      And,   g'(z) = g(z) * (1 - g(z))
    '''

    # during forward propagation...
    if direction.lower() == 'forward':
      # return g(z)
      return 1/(1 + np.exp(-z))

    # during backward propagation...
    elif direction.lower() == 'backward':
      # return g'(z)
      g_z = 1/(1 + np.exp(-z))
      return (g_z) * (1 - (g_z))

    else:
      # otherwise, indicate invalid method call
      raise Exception('Invalid activation direction. Must be "forward" or "backward"')



  # Hyperbolic Tangent function (alternative to sigmoid, varies between -1 and 1 instead of 0 and 1)
  def tanh(self,z,direction='forward'):
    '''
      Evaluates the Hyperbolic Tangent (tanh) function g or its derivative g' on an input z

      Where, g(z) =    e^(z) - e^(-z)
                       --------------
                       e^(z) + e^(-z)


      And,   g'(z) =   1 - (g(z))^2
    '''

    # during forward propagation...
    if direction.lower() == 'forward':
      # return g(z)
      return np.tanh(z)

    # during backward propagation...
    elif direction.lower() == 'backward':
      # return g'(z)
      return 1 - np.tanh(z)**2

    else:
      # otherwise, indicate invalid method call
      raise Exception('Invalid activation direction. Must be "forward" or "backward"')



  # ReLU function (regression problems; threshold utility; non-negative values only)
  def relu(self,z,direction='forward'):
    '''
      Evaluates the Rectified Linear Unit (ReLU) function g or its derivative g' on an input z

      Where, g(z) = 0, if (z < 0)
             g(z) = z, if (z >= 0)

      And,   g'(z) = 0 if (z < 0)
             g'(z) = 1 if (z >= 0)
    '''

    # during forward propagation...
    if direction.lower() == 'forward':
      # return g(z)
      return np.maximum(0,z)

    # during backward propagation...
    elif direction.lower() == 'backward':
      # return g'(z)
      return (z >= 0) * 1

    else:
      # otherwise, indicate invalid method call
      raise Exception('Invalid activation direction. Must be "forward" or "backward"')


  # Parametric ReLU function (fixed slope for negative inputs)
  def prelu(self,z,direction='forward',scalar=0.01):
    '''
      Evaluates the Parametric Rectified Linear Unit (PReLU) function g or its derivative g' on an input z,
      where scalar is the slope applied to negative inputs

      Where, g(z) = scalar * z, if (z < 0)
             g(z) = z,          if (z >= 0)

      And,   g'(z) = scalar if (z < 0)
             g'(z) = 1      if (z >= 0)
    '''

    # during forward propagation...
    if direction.lower() == 'forward':
      # return g(z); the maximum picks scalar*z for z < 0 provided 0 <= scalar <= 1
      return np.maximum(scalar*z, z)

    # during backward propagation...
    elif direction.lower() == 'backward':
      # return g'(z): 1 for non-negative inputs, scalar for negative inputs
      return np.where(z >= 0, 1.0, scalar)

    else:
      # otherwise, indicate invalid method call
      raise Exception('Invalid activation direction. Must be "forward" or "backward"')


  # Softmax activation (multi-class classification, i.e. multiple discrete choices, one correct choice per example)
  def softmax(self,z,direction='forward'):
    '''
    Evaluates the softmax activation function g or its derivative g' on an input z. Softmax converts
    z for all possible choices into a distribution of probabilities that each
    choice is correct.

    For a problem with N choices, the probability that the kth choice is
    correct is represented by:

           g(z)_k =          e^(z_k)
                    -------------------------
                    e^(z_1) + ... + e^(z_N)
    '''

    # during forward propagation...
    if direction.lower() == 'forward':
      # return g(z), normalised per example (row); subtracting the row maximum
      # leaves the result unchanged but avoids overflow in np.exp
      exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
      return exp_z/np.sum(exp_z, axis=1, keepdims=True)

    # during backward propagation...
    elif direction.lower() == 'backward':
      # g'(z) is not yet implemented for softmax
      return None

    else:
      # otherwise, indicate invalid method call
      raise Exception('Invalid activation direction. Must be "forward" or "backward"')
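
Softmax is normally applied per example, so that each row of the output sums to 1. The sketch below is a standalone illustration of that row-wise normalisation, not part of the class: the helper name `softmax_rows` and the sample values are assumptions made for this example. Subtracting the row maximum before exponentiating does not change the result, but keeps `np.exp` from overflowing on large inputs.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax: each row of z (one example) becomes a probability distribution."""
    shifted = z - np.max(z, axis=1, keepdims=True)   # stabilise: largest entry per row becomes 0
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

# two examples with three choices each; the second row would overflow without the shift
z = np.array([[1.0, 2.0, 3.0],
              [1000.0, 1000.0, 1000.0]])
p = softmax_rows(z)

print(p.sum(axis=1))   # each row sums to 1
```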


# Function Tests - Layer Behaviour

Below are some tests to ensure that the neural network layer produces the expected behaviour.

In [None]:
# create a new dense layer
L1 = NN_DenseLayer(3,'Sigmoid','layer1')
# create another dense layer with a bogus activation function
L2 = NN_DenseLayer(3,'Bogus','layer2')

In [None]:
# print new layer
print(L1)


Neural-network layer (dense) with 3 neurons. Sigmoid activation. 



In [None]:
# check layer weights
L1.get_parameters()

{'W': None, 'B': None}

In [None]:
# Check the expected input shape
L1.expectedInputShape

In [None]:
# activate the layer L1 by passing an array input
test_input = np.array([[1,2,3]])
out, _ = L1.forward(test_input)  # forward() returns (output, cache)
out

array([[0.50542075, 0.50019414, 0.52050466]])

In [None]:
# summary of L2 layer
print(L2)


Neural-network layer (dense) with 3 neurons. Bogus activation. 



In [None]:
# attempt to activate L2 using the same input. input validation should catch bogus activation
L2.forward(test_input)

In [None]:
# The expected input shape should now match the most recent input (a single-row 2D array)
L1.expectedInputShape

(1, 3)

In [None]:
# check layer weights
L1.get_parameters()

{'W': array([[-0.00165777, -0.00157456,  0.00859604],
        [-0.00418253, -0.0061997 , -0.00519123],
        [ 0.01056889,  0.00491683,  0.02795036]]),
 'B': array([[0., 0., 0.]])}

In [None]:
# reset weights
W = None
B = None
L1.set_parameters(W, B)
L1.get_parameters()

{'W': None, 'B': None}

In [None]:
# set invalid weights
W = np.arange(1,3,1)
B = np.arange(1,10,2)
L1.set_parameters(W, B)


INVALID COMMAND(!): 

Layer is only compatible with a (3, 3) weight matrix, and (1, 3) bias vector


In [None]:
# check that layer weights were NOT changed
L1.get_parameters()

{'W': None, 'B': None}

In [None]:
# activate for a new input (two examples instead of one)
new_in = np.array([[1,2,3],[4,5,6]])
out, _ = L1.forward(new_in)  # forward() returns (output, cache)
out

array([[0.49409322, 0.50071983, 0.49162361],
       [0.49121147, 0.49958555, 0.48173364]])

In [None]:
# check layer weights
L1.get_parameters()

{'W': array([[ 0.00820837, -0.00685627,  0.00587853],
        [-0.00431803,  0.00629611, -0.01783766],
        [-0.00773351, -0.00095221, -0.0012373 ]]),
 'B': array([[0., 0., 0.]])}

In [None]:
# The expected input shape should update to match the new input (two examples, three features)
L1.expectedInputShape

(2, 3)
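
Because `is_consistent()` compares the full input shape, a change in the number of examples m also counts as a new input and triggers re-initialisation of the weights, even though the weight matrix (n,j) depends only on the feature count n. A possible alternative is to validate only the feature dimension. The sketch below is a hedged illustration of that idea; the helper `same_feature_count` and its arguments are hypothetical and not part of the layer class.

```python
import numpy as np

def same_feature_count(X_in, expected_features):
    """Return True when X_in is 2D and its column count matches expected_features."""
    return X_in.ndim == 2 and X_in.shape[1] == expected_features

expected_features = 3                        # n, as counted from the first input
single = np.array([[1, 2, 3]])               # shape (1, 3)
batch = np.array([[1, 2, 3], [4, 5, 6]])     # shape (2, 3): more examples, same features
wrong = np.array([[1, 2], [3, 4]])           # shape (2, 2): different feature count

print(same_feature_count(single, expected_features))
print(same_feature_count(batch, expected_features))
print(same_feature_count(wrong, expected_features))
```

With a check like this, existing weights would remain valid for both `single` and `batch`, and only `wrong` would require new parameters.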

# Neural Network Model Compiler

In [None]:
# enforce consistent results
np.random.seed(1)

class NN_Model:
  # fixed initial weights for testing; the shapes assume a 3-feature input with layers of 2 and 1 neurons
  W_RAND =  [np.random.randn(3,2) * 0.01, np.random.randn(2,1) * 0.01]

  def __init__(self,layers):
    # assign layer identifiers for the model
    self.layers = layers
    self.num_layers = len(self.layers)
    self.layer_activations = [layer.activation for layer in self.layers]
    self.parameters = {}
    self.current_input = None


  def summarise(self):
    '''
    Summarises the features of the Neural Network instance
    '''
    print(f"{len(self.layers)} Layer Neural-Network (Sequential):")
    print("------------------------------------\n")
    # enumerate from 1 so the printed layer numbers are human-readable
    for idx, layer in enumerate(self.layers, start=1):
      print(f"Layer {idx}")
      print(layer)


  def count_features(self,X_in):
    '''
    Counts the number of features, n, present in the input X_in
    '''
    try:
      # if X_in is a matrix, n is the number of columns, index 1
      numFeatures = X_in.shape[1]
    except IndexError:
      # raise an error if the input is a 1D array
      raise Exception('Input must be a 2D matrix.')

    return numFeatures


  def initialise_parameters(self, X_in):

    # initialise the model dimensions starting with the input X_in
    self.model_dimensions = [self.count_features(X_in)]

    # loop through all layers in the model
    for layer in range(self.num_layers):

      # get the current layer
      current_layer = self.layers[layer]

      # add current layer dimensions to list
      self.model_dimensions.append(current_layer.neurons)

      # evaluate layer properties
      W_temp, b_temp = current_layer.initialiseWB(self.model_dimensions[layer],self.model_dimensions[layer+1])
      # assign weights and biases to a dictionary
      self.parameters["W" + str(layer+1)] = copy.deepcopy(self.W_RAND[layer]) # enforced results with W_RAND for testing
      self.parameters["b" + str(layer+1)] = copy.deepcopy(b_temp)


  def get_parameters(self):
    '''
      Returns the model's current weights and biases
    '''
    return self.parameters


  def fit(self, X_in, Y_in):
    '''
      Computes a set of weight and bias matrices, W and B, for a given training dataset: X_in and Y_in, where
      X_in is a set of input features
      Y_in are the corresponding labels
    '''
    pass


  def forward_propagation(self, X_in):
    '''
      Computes one full forward propagation pass on the NN given the input X_in. Returns the NN output/prediction
      and a cache of internal layer inputs for the backwards propagation step.
    '''

    # initialise layer caches. these will store local input data for each layer
    self.layer_caches = []

    # initialise the current input
    self.current_input = copy.deepcopy(X_in)

    # loop through the entire network, and compute forward activations for each layer
    for count, layer in enumerate(self.layers):

      # extract the layer output and the cache of inputs
      layer_output, layer_cache = layer.forward(self.current_input)
      # append the input cache to a model instance
      self.layer_caches.append(layer_cache)

      # if the current layer is the last layer...
      if count == self.num_layers - 1:
        # ...then the output is our model prediction
        self.model_prediction = layer_output

      # for intermediate layers...
      else:
        # ...update current_input so that the next layer takes in the output of the current layer
        self.current_input = layer_output

    # return model output, and input logs
    return self.model_prediction, self.layer_caches


  def compute_cost(self, Y_in):
    pass


  def backward_propagation(self):
    pass


  def gradient_descent(self):
    pass


  def update_parameters(self):
    pass


  def predict(self):
    pass
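
The loop in `forward_propagation()` can be pictured with a plain-NumPy sketch of the same idea: the output of each layer becomes the input of the next, and the inputs of every step are cached for a later backward pass. This is an illustration under assumptions, not the class itself: the helper functions, the seeded generator and the 3-2-1 layer sizes are chosen for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))        # 20 examples, 3 features (illustrative)

layer_sizes = [3, 2, 1]                 # input features, then neurons per layer
activations = [relu, sigmoid]

# one (W, B) pair per layer: W is (n_in, n_out), B is (1, n_out)
params = [(rng.standard_normal((n_in, n_out)) * 0.01, np.zeros((1, n_out)))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

A = X
caches = []
for (W, B), g in zip(params, activations):
    z = A @ W + B                       # linear step for this layer
    caches.append(((A, W, B), z))       # same layout as the layer cache: (linear inputs, z)
    A = g(z)                            # activation output feeds the next layer

print(A.shape, len(caches))
```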



# Function Tests - Model Behaviour

In [None]:
# create NN layer objects
L1 = NN_DenseLayer(2,'relu','layer1')
L2 = NN_DenseLayer(1,'sigmoid','layer2')

# pass layers to NN sequential constructor model
model = NN_Model([L1,L2])

In [None]:
# print a summary of the model
model.summarise()

2 Layer Neural-Network (Sequential):
------------------------------------

Layer 1

Neural-network layer (dense) with 2 neurons. Relu activation. 

Layer 2

Neural-network layer (dense) with 1 neurons. Sigmoid activation. 



In [None]:
test_input = np.random.randn(20,3)
model.initialise_parameters(test_input)
model.get_parameters()

{'W1': array([[ 0.01624345, -0.00611756],
        [-0.00528172, -0.01072969],
        [ 0.00865408, -0.02301539]]),
 'b1': array([[0., 0.]]),
 'W2': array([[ 0.01744812],
        [-0.00761207]]),
 'b2': array([[0.]])}

In [None]:
AL, _ = model.forward_propagation(test_input)
print(AL)

[[0.4999803 ]
 [0.50007778]
 [0.49996192]
 [0.50007881]
 [0.5       ]
 [0.5       ]
 [0.49999542]
 [0.49997661]
 [0.50001552]
 [0.49995607]
 [0.50002117]
 [0.50008683]
 [0.50001695]
 [0.50002047]
 [0.50015765]
 [0.49995681]
 [0.50000205]
 [0.50004541]
 [0.5001186 ]
 [0.5000574 ]]
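
The predictions sit very close to 0.5 because the network is untrained: the weights are scaled by 0.01 and the biases are zero, so the value z reaching the output neuron is close to zero, and the sigmoid of zero is exactly 0.5. A minimal sketch of this effect, using illustrative z values rather than the model's actual ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# small pre-activation values, of the order produced by 0.01-scaled weights (illustrative)
z_small = np.array([-0.001, 0.0, 0.001])
print(sigmoid(z_small))

# the deviation from 0.5 is roughly z/4 for small z
print(np.abs(sigmoid(z_small) - 0.5))
```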


In [4]:
np.array([[1,2],[3,4],[5,6]])**2

array([[ 1,  4],
       [ 9, 16],
       [25, 36]])