# Lesson 2: Neural Net Computations

In the previous assignment we used Keras to train a neural network. In this assignment you will build your own minimal neural net library. The basic structure is given to you; you will need to fill in details such as weight updating for backpropogation. Then you will test the network on learning the XOR function.

Read through the class definitions below first to understand the basic architecture.

Then you should add code as necessary where marked "TODO" in the code below and remove the NotImplementedError exceptions.

In [849]:
import numpy as np
import random

## Define a Neural Network Class

In [850]:
class NNet():
    """Implements a basic feedforward neural network."""

    def __init__(self):
        self._layers = []  # An ordered list of layers. The first layer is the input; the final is the output.

    def _add_layer(self, layer):
        if self._layers:
            # Update pointers. We keep a doubly-linked-list of layers for convenience.
            prev_layer = self._layers[-1]
            prev_layer.set_next_layer(layer)
            layer.set_prev_layer(prev_layer)

        self._layers.append(layer)

    def add_input_layer(self, size, **kwargs):
        assert type(size).__name__ == 'int', ('Input layer requires integer size. Type was %s instead.'
                                              % type(size).__name__)
        layer = InputLayer(size=size, **kwargs)
        self._add_layer(layer)

    def add_dense_layer(self, size, **kwargs):
        assert type(size).__name__ == 'int', ('Dense layer requires integer size. Type was %s instead.'
                                              % type(size).__name__)
        # Find the previous layer's size.
        prev_size = self._layers[-1].size()
        layer = DenseLayer(shape=(prev_size, size), **kwargs)
        self._add_layer(layer)

    def summary(self, verbose=False):
        """Prints a description of the model."""
        for i, layer in enumerate(self._layers):
            print('%d: %s' % (i, str(layer)))
            if verbose:
                print('weights:', layer.get_weights())
                if layer._use_bias:
                    print('bias:', layer._bias)
                print()


    def predict(self, x):
        """Given an input vector x, run it through the neural network and return the output vector."""
        assert isinstance(x, np.ndarray)

        # Ensure the shape of the input data is good since feed forward likes to fail on assertions.
        if x.ndim == 1:
          x = x.reshape(1, -1)

        input_layer = self._layers[0]
        assert x.shape[1] == input_layer.size(), "Input shape does not match the input size"

        for layer in self._layers:
          x = layer.feed_forward(x)
        return x


    def train_single_example(self, X_data, y_data, learning_rate=0.01, verbose=False): # Added verbose
        """Train on a single example. X_data and y_data must be numpy arrays."""

        assert isinstance(X_data, np.ndarray)
        assert isinstance(y_data, np.ndarray)

        # Forward propagation. Can use predict. Need to ensure the data is the right shape just in case.
        X_data = X_data.reshape(1, -1)
        output = self.predict(X_data)

        # Do the output layer backpropagation
        error = output - y_data
        error = self._layers[-1].backpropagate(error, learning_rate)
        if verbose:
          print(f'{self._layers[-1]._name} output error: {error}')

        # Backpropagation for the rest of the layers
        #   Skip the first one because it's the output layer that we just did above
        for layer in reversed(self._layers[:-1]):
          error = layer.backpropagate(error, learning_rate)
          if verbose:
            print(f'{layer._name} error: {error}')

    def train(self, X_data, y_data, learning_rate, num_epochs, randomize=True, verbose=True, print_every_n=100):
        """Both X_data and y_data should be ndarrays. One example per row.

        This function takes the data and learning rate, and trains the network for num_epochs passes over the
        complete data set.

        If randomize==True, the X_data and y_data should be randomized at the start of each epoch. Of course,
        matching X,y pairs should have matching indices after randomization, to avoid scrambling the dataset.
        (E.g., a set of indices should be randomized once and then applied to both X and y data.)

        If verbose==True, will print a status report every print_every_n epochs with these
        results:

        * Results of running "predict" on each example in the training set
        * MSE (mean squared error) on the dataset
        * Accuracy on the dataset
        """
        assert isinstance(X_data, np.ndarray)
        assert isinstance(y_data, np.ndarray)
        assert X_data.shape[0] == y_data.shape[0]

        for epoch in range(num_epochs):
          if randomize:
            indices = np.arange(X_data.shape[0])
            np.random.shuffle(indices)
            X_data = X_data[indices]
            y_data = y_data[indices]

          for i in range(X_data.shape[0]):
            self.train_single_example(X_data[i], y_data[i], learning_rate, verbose=False)

          if verbose and (epoch % print_every_n == 0):
            pred = self.predict(X_data)
            mse = self.compute_mean_squared_error(X_data, y_data)
            acc = self.compute_accuracy(X_data, y_data)
            print(f'Epoch {epoch}: Prediction {pred}\n MSE = {mse:.4f}, Accuracy = {acc:.4f}')

    def compute_mean_squared_error(self, X_data, y_data):
        """Given input X_data and target y_data, compute and return the mean squared error."""
        assert isinstance(X_data, np.ndarray)
        assert isinstance(y_data, np.ndarray)
        assert X_data.shape[0] == y_data.shape[0]

        predictions = self.predict(X_data)
        mse = np.mean(np.square(predictions - y_data))
        return mse

    def compute_accuracy(self, X_data, y_data):
        """Given input X_data and target y_data, convert outputs to binary using a threshold of 0.5
        and return the accuracy: # examples correct / total # examples."""
        assert isinstance(X_data, np.ndarray)
        assert isinstance(y_data, np.ndarray)
        assert X_data.shape[0] == y_data.shape[0]

        correct = 0
        for i in range(len(X_data)):
            outputs = self.predict(X_data[i])
            outputs = outputs > 0.5
            if outputs == y_data[i]:
                correct += 1
        acc = float(correct) / len(X_data)
        return acc

## Define activation functions

In [851]:
class Activation():  # Do not edit; update derived classes.
    """Base class that represents an activation function and knows how to take its own derivative."""
    def __init__(self, name):
        self.name = name

    def activate(x):
        """x is a scalar or a numpy array. Returns the output y, the result of applying the function to input x."""
        raise NotImplementedError()

    def derivative_given_y(self, y):
        """y is a scalar or a numpy array.

        Returns the derivative d(f)/dx given the *activation* value y."""
        raise NotImplementedError()

In [852]:
class IdentityActivation(Activation):
    """Activation function that passes input through unchanged."""

    def __init__(self):
        super().__init__(name='Identity')

    def activate(self, x):
        """x is a scalar or a numpy array. Returns the output y, the result of applying the function to input x."""
        return x

    def derivative_given_y(self, y):
        """y is a scalar or a numpy array.

        Returns the derivative d(f)/dx given the *activation* value y."""
        return 1


class SigmoidActivation(Activation):
    """Sigmoid activation function."""

    def __init__(self):
        super().__init__(name='Sigmoid')

    def activate(self, x):
        """x is a scalar or a numpy array. Returns the output y, the result of applying the function to input x."""
        return 1 / (1 + np.exp(-x))


    def derivative_given_y(self, y):
        """y is a scalar or a numpy array.

        Returns the derivative d(f)/dx given the *activation* value y."""
        return y * (1 - y)


## Define a method to initialize neural net weights

In [853]:
def WeightInitializer():
    """Function to return a random weight. for example, return a random float from -1 to 1."""
    return round(random.uniform(-1.0,1.0), 2)

## Define a neural net Layer base class

In [854]:
class Layer():
    """Base class for NNet layers. DO NOT MODIFY THIS CLASS. Update derived classes instead.

    Conceptually, in this library a Layer consists at a high level of:
      * a collection of weights (a 2D numpy array)
      * the output nodes that come after the weights above
      * the activation function that is applied to the summed signals in these output nodes

    So a Layer isn't just nodes -- it's weights as well as nodes.

    Specifically, to send signal forward through a 3-layer network, we start with an Input Layer that does
    very little.  The outputs from the Input layer are simply the fed-in input data.

    Then, the next layer will be a Dense layer that holds the weights from the Input layer to the first hidden
    layer and stores the activation function to be used after doing a product of weights and Input-Layer
    outputs.

    Finally, another Dense layer will hold the weights from the hidden to the output layer nodes, and stores
    the activation function to be applied to the final output nodes.

    For a typical 1-hidden layer network, then, we would have 1 Input layer and 2 Dense layers.

    Each Layer also has funcitons to perform the forward-pass and backpropagation steps for the weights/nodes
    associated with the layer.

    Finally, each Layer stores pointers to the pervious and next layers, for convenience when implementing
    backprop.
    """

    def __init__(self, shape, use_bias, activation_function=IdentityActivation, weight_initializer=None, name=''):
        # These are the weights from the *previous* layer to the current layer.
        self._weights = None

        # Tuple of (# inputs, # outputs) for Dense layers or just a scalar for an input layer.
        assert type(shape).__name__ == 'int' or type(shape).__name__ == 'tuple', (
            'shape must be scalar or a 2-element tuple')
        if type(shape).__name__ == 'tuple':
            assert len(shape)==2, 'shape must be 2-dimensional. Was %d instead' % len(shape)
        self._shape = shape

        # True to use a bias node that inputs to each node in this layer; False otherwise.
        self._use_bias = use_bias

        if use_bias:
            bias_size = shape[-1] if len(shape) > 1 else shape
            self._bias = np.zeros(bias_size)
            if weight_initializer:
                for i in range(bias_size):
                    self._bias[i] = weight_initializer()

        # Activation function to be applied to each dot product of weights with inputs.
        # Instantiate an object of this class.
        self._activation_function = activation_function() if activation_function else None

        # Method used to initialize the weights in this Layer at creation time.
        self._weight_initializer = weight_initializer

        # Layer name (optional)
        self._name = name

        # Calculated output vector from the most recent feed_forward(inputs) call.
        self._outputs = None

        # Doubly linked list pointers to neighbor layers.
        self._prev_layer = None  # Previous layer is closer to (or is) the input layer.
        self._next_layer = None  # Next layer is closer to (or is) the output layer.

    def set_prev_layer(self, layer):
        """Set pointer to the previous layer."""
        self._prev_layer = layer

    def set_next_layer(self, layer):
        """Set pointer to the next layer."""
        self._next_layer = layer

    def size(self):
        """Number of nodes in this layer."""
        if type(self._shape).__name__ == 'tuple':
            return self._shape[-1]
        else:
            return self._shape

    def get_weights(self):
        """Return a numpy array of the weights for inputs to this layer."""
        return self._weights

    def get_bias(self):
        """Return a numpy array of the bias for nodes in this layer."""
        return self._bias

    def feed_forward(self, inputs):
        """Feed the given inputs through the input weights and activation function, and set the outputs vector.

        Also returns the outputs vector for convenience."""
        raise NotImplementedError()

    def backpropagate(self, error, learning_rate):
        """Adjusts the weights coming into this layer based on the given output error vector.

        For the output layer, the "error" vector should be a list of output errors, y_k - t_k.
        For a hidden layer, the "error" vector should be a list of the delta values from the following layer, such as delta_z_k

        Returns a list of the delta values for each node in this layer. These deltas can be used as the error
        values when calling backpropagate on the previous layer."""
        raise NotimplementedError()

    def __str__(self):
        activation_fxn_name = self._activation_function.name if self._activation_function else None
        return '[%s] shape %s, use_bias=%s, activation=%s' % (self._name, self._shape, self._use_bias,
                                                              activation_fxn_name)

### Define InputLayer and DenseLayer base classes

The DenseLayer class is where most of the computation happens

In [855]:
class InputLayer(Layer):
    """A neural network 1-dimensional input layer."""

    def __init__(self, size, name='Input'):
        assert type(size).__name__ == 'int', 'Input size must be integer. Was %s instead' % type(size).__name__
        super().__init__(shape=size, use_bias=False, name=name, activation_function=None)

    def feed_forward(self, inputs):
        #assert len(inputs)==self._shape, 'Inputs must be of size %d; was %d instead' % (self._shape, len(inputs)) # This throws lots of issues with tuple input
        self._outputs = inputs
        return self._outputs

    def backpropagate(self, error, learning_rate):
        return None  # Nothing to do.

In [856]:
class DenseLayer(Layer):
    """A neural network layer that is fully connected to the previous layer."""

    def __init__(self, shape, use_bias=True, name='Dense', **kwargs):
        super().__init__(shape=shape, use_bias=use_bias, name=name, **kwargs)

        self._weights = np.zeros(shape)
        if self._weight_initializer:
            for i in range(shape[0]):
                for j in range(shape[1]):
                    self._weights[i,j] = self._weight_initializer()

    def feed_forward(self, inputs):
        """Feed the given inputs through the input weights and activation function, and set the outputs vector.

        Also returns the outputs vector for convenience."""
        inputs = np.asarray(inputs)
        #assert len(inputs)==self._shape, 'Inputs must be of size %d; was %d instead' % (self._shape, len(inputs)) # This throws lots of issues with tuple input
        # Again check the dimensions and ensure 2-dimensional input in case of single input
        if inputs.ndim == 1:
            inputs = inputs.reshape(1, -1)

        neuron = np.dot(inputs, self._weights)
        if self._use_bias:
          neuron += self._bias
        output = self._activation_function.activate(neuron)

        # Update output vector for later use, and return it.
        self._outputs = output
        return self._outputs

    def backpropagate(self, error, learning_rate):
        """Adjusts the weights coming into this layer based on the given output error vector.

        For the output layer, the "error" vector should be a list of output errors, y_k - t_k.
        For a hidden layer, the "error" vector should be a list of the delta values from the following layer, such as delta_z_k

        Returns a list of the delta values for each node in this layer. These deltas can be used as the error
        values when calling backpropagate on the previous layer."""
        assert isinstance(error, np.ndarray)
        assert isinstance(self._prev_layer._outputs, np.ndarray) # Don't we want to go backwards?
        assert isinstance(self._outputs, np.ndarray)

        # Compute deltas. Note initial error is calculated in the neural network class
        deltas = error * self._activation_function.derivative_given_y(self._outputs)

        # Compute gradient and adjust weights.
        if(self._prev_layer is not None): # Are we in the output layer or not?
          if self._outputs.shape[0] == 1: # Single neuron output in this network causes lots of issues, so check shape.
            weight_update = np.dot(self._prev_layer._outputs.reshape(-1, 1), deltas.reshape(1, -1))
          else:
            weight_update = np.dot(self._prev_layer._outputs.T, deltas.reshape(-1, 1)) # np.newaxis more useful here?
          self._weights -= learning_rate * weight_update

        # Adjust bias weights. Check dimensions just in case.
        if self._use_bias:
          if deltas.ndim == 1:
            deltas = deltas.reshape(-1, 1)
          self._bias -= learning_rate * deltas.mean(axis=0)
          pass

        if self._next_layer and self._prev_layer is None:
          return np.dot(self._next_layer._weights.T, deltas) # Input layer
        return deltas # The deltas get fed in the call to backprop in the neural network class for the next layer


# Train a neural net

## Create a dataset for the XOR problem

In [857]:
X_data = np.array([[0,0],[1,0],[0,1],[1,1]])
y_data = np.array([[0,1,1,0]]).T
print(X_data)
print(y_data)

[[0 0]
 [1 0]
 [0 1]
 [1 1]]
[[0]
 [1]
 [1]
 [0]]


## Create a neural network using the library.

In [858]:
nnet = NNet()
nnet.add_input_layer(2)
nnet.add_dense_layer(2, weight_initializer=WeightInitializer, activation_function=SigmoidActivation)
nnet.add_dense_layer(1, weight_initializer=WeightInitializer, activation_function=SigmoidActivation, name='Output')
nnet.summary()

0: [Input] shape 2, use_bias=False, activation=None
1: [Dense] shape (2, 2), use_bias=True, activation=Sigmoid
2: [Output] shape (2, 1), use_bias=True, activation=Sigmoid


In [859]:
nnet.summary(verbose=True)

0: [Input] shape 2, use_bias=False, activation=None
weights: None

1: [Dense] shape (2, 2), use_bias=True, activation=Sigmoid
weights: [[ 0.85 -0.92]
 [-0.33  0.87]]
bias: [ 0.78 -0.46]

2: [Output] shape (2, 1), use_bias=True, activation=Sigmoid
weights: [[ 0.56]
 [-0.32]]
bias: [-0.77]



# Train the network

In [860]:
nnet.train(X_data, y_data, learning_rate=0.01, num_epochs=10000, verbose=True, randomize=True)

Epoch 0: Prediction [[0.35012056]
 [0.40993594]
 [0.38981214]
 [0.37565345]]
 MSE = 0.2659, Accuracy = 0.5000
Epoch 100: Prediction [[0.38755512]
 [0.41250778]
 [0.4287796 ]
 [0.44807672]]
 MSE = 0.2584, Accuracy = 0.5000
Epoch 200: Prediction [[0.45643516]
 [0.4386385 ]
 [0.4746528 ]
 [0.41470235]]
 MSE = 0.2548, Accuracy = 0.5000
Epoch 300: Prediction [[0.43357345]
 [0.45636418]
 [0.47522403]
 [0.49235059]]
 MSE = 0.2532, Accuracy = 0.5000
Epoch 400: Prediction [[0.46819189]
 [0.48779585]
 [0.50389964]
 [0.44654522]]
 MSE = 0.2524, Accuracy = 0.7500
Epoch 500: Prediction [[0.47599377]
 [0.51128734]
 [0.49612686]
 [0.45544883]]
 MSE = 0.2520, Accuracy = 0.7500
Epoch 600: Prediction [[0.48108379]
 [0.51588681]
 [0.5016014 ]
 [0.46159288]]
 MSE = 0.2518, Accuracy = 0.5000
Epoch 700: Prediction [[0.48440067]
 [0.46592304]
 [0.51866938]
 [0.50520677]]
 MSE = 0.2517, Accuracy = 0.5000
Epoch 800: Prediction [[0.46907545]
 [0.48657018]
 [0.5202773 ]
 [0.50759989]]
 MSE = 0.2516, Accuracy = 0

## Print the resuting neural net weights.

In [861]:
nnet.summary(verbose=True)

0: [Input] shape 2, use_bias=False, activation=None
weights: None

1: [Dense] shape (2, 2), use_bias=True, activation=Sigmoid
weights: [[ 1.00997277 -2.06861199]
 [ 0.52874482  1.53818065]]
bias: [ 0.7707856  -1.22763539]

2: [Output] shape (2, 1), use_bias=True, activation=Sigmoid
weights: [[0.6491146 ]
 [1.11102788]]
bias: [-0.79220917]



In [835]:
%%shell

jupyter nbconvert --to html /content/lab02.ipynb

[NbConvertApp] Converting notebook /content/lab02.ipynb to html
[NbConvertApp] Writing 654355 bytes to /content/lab02.html




Add a text block to your Notebook: Explain in 2-3 sentences what the network structure is, its purpose and how it works (overview).

Additionally, in a few bullets or additional short paragraph: Explain your results clearly in your own words, discussing the accuracy of the model and what the results mean for answering the machine learning problem being posed.

# Analysis

1. The model structure is pretty simple. It looks like 2 input neurons, 2 dense neurons and 1 output neuron. It works by taking the input and feeding it forward, multiplying weight by input adding bias, done by utilizing the predict function (which calls feed_forward), then backpropagating on the output layer, then looping through the dense layers back to the input to adjust the weights by calculating the delta/gradient and applying the error, before passing that error backwards to the next layer.
2. The accuracy slowly increases from 50% to 75% before plateauing based on the input across the epochs. If I left it to a smaller number of epochs we do not reach this number.
3. I had a lot of issues finagling with the indices and matrix conversions and at one point was unsure of which way I was iterating through the list of layers so that might have had an impact on my algorithms and code structure. I tried to correct and go back through my implementation to ensure I was iterating properly - at one point I was calculating error twice for different layers and applying it twice.
4. One thing that I was reading on supplemental materials and it isn't here (at least in my implementation) is that the error should be calculated across all layers before updating the weights? I think there is a different approach here where we calculate the error and backpropagate it to each layer with a cascading effect.