# Basics of Feed-Forward Neural Networks
----------------------------------------------------------------------
Author: Kevin Paganini / (MSOE professor)
Date: 3/14/2023
----------------------------------------------------------------------

In this lab, we will start to create a feed-forward neural network from scratch.
We begin with the very basic computational unit, a perceptron,
and then we will add more layers and increase the complexity of our network. Along the way, we will learn how a perceptron works, the benefits of adding more layers, the kind of transformations necessary for learning complex features and relationships from data, and why an object-oriented paradigm is useful for easier management of our neural network framework.

We will implement everything from scratch in Python using helpful
libraries such as NumPy and PyTorch (without using the autograd feature of PyTorch). The purpose of this lab and the following lab
series is to learn how neural networks work starting from the most basic
computational units and proceeding to deeper and more complex networks. This will help us better understand how other popular deep learning frameworks, such as PyTorch, work underneath. You should be able to easily understand and implement everything in this lab. If you are having trouble, consult with your instructors, as the next lab series will assume a perfect understanding of the basic feed-forward neural network material.

The recommended Python version for this implementation is 3.7. Recommended reading: sections 4.1 and 4.2 of the book (https://www.d2l.ai/chapter_multilayer-perceptrons/index.html).

## Perceptron

A perceptron or artificial neuron is the most basic processing unit of feed-forward neural networks. A perceptron can be modeled as a single-layer neural network with an input vector $\mathbf{x} \in \mathbb{R}^n$, a bias $b$, a vector of trainable weights $\mathbf{w} \in \mathbb{R}^n$, and an output unit $y$. Given the input $\mathbf{x}$, the output $y$ is computed by an activation function $f(\cdot)$ as follows:

\begin{equation}
y (\mathbf{x}; \Theta) = f\left(\left(\sum_{i=1}^{n} x_i w_i\right) + b \right) = f(\mathbf{w}^\intercal \mathbf{x} + b)\,,
\end{equation}
where $\Theta = \{\mathbf{w}, b\}$ represents the trainable parameter set. 

The figure below shows a schematic view of a single output perceptron. Each input value $x_i$ is multiplied by a weight factor $w_i$. The weighted sum added to the bias is then passed through an activation function to obtain the output, $y$.

![MLP example](img/perceptron.png)
 
The vector $\mathbf{x}$ represents one sample of our data and each element $x_i$ represents a feature. Thus, $\mathbf{x}$ is often referred to as a feature vector. These features can represent different measurements depending on the application. For example, if we are trying to predict if a patient is at high risk of cardiac disease then each element of $\mathbf{x}$ might contain vital signs such as diastolic and systolic blood pressure, heart rate, blood sugar levels, etc. In another application where we are trying to predict if a tissue biopsy is cancerous or not using mid-infrared imaging then each element of $\mathbf{x}$ can represent the amount of mid-infrared light absorbed at a particular wavelength. The output $y$ in the applications above could contain values of $0$ or $1$, indicating if the patient is at high risk of cardiac disease or if the tissue biopsy is cancerous or not.
 
Now, let us begin implementing our first artificial neuron.

### Implementation

Let's assume that our feature vector contains measurements of body temperature, pulse oximeter reading, and presence of cough or not. Then for a 'healthy' patient our input sample might look like $\mathbf{x} = \begin{bmatrix} 98.6 \\ 95 \\ 0 \end{bmatrix}$. Let's say that we are trying to 'predict' the probability of a patient being positive with COVID-19 based on the above measurements.

Each element of our input vector is associated with a unique weight. Let the vector of weights be $\mathbf{w} = \begin{bmatrix} 0.03 \\ 0.55 \\ 0.88 \end{bmatrix}$. Each artificial neuron is also associated with a unique bias. Let the bias be $b = 2.9$. Assuming a linear activation function, write the code to produce and print the output $y$ given the above input vector $\mathbf{x}$, weights $\mathbf{w}$, and the bias $b$ using the above perceptron model. Do not use any NumPy or PyTorch functions. Use a Python variable for each element and use Python lists for vectors.

For the activation function, use ReLU. This can be computed as ```x * (x > 0)``` in Python.

In [9]:
x = [98.6, 95, 0]
w = [0.03, 0.55, 0.88]
b = 2.9

def perceptron(x, w, b, activation_function):
    # Element-wise products, with the bias appended so that one sum covers both.
    x_w = [x_i * w_i for x_i, w_i in zip(x, w)] + [b]
    total = sum(x_w)
    return activation_function(total)


def ReLU(x):
    return  x * (x > 0)
     

print(f'The output of the neuron is {perceptron(x, w, b, ReLU)}')


The output of the neuron is 58.108000000000004


### <font color=blue> Question 1: How many parameters does our simple model contain? Be specific.</font>

Ans: Our model contains 4 parameters in this case: the three weights in the weight vector w and the single bias b.

### <font color=blue> Question 2: Recall that we were hoping to 'predict' the probability of a patient being positive with COVID-19. Does the output make sense? If not, elaborate on how you could fix it.</font>

Ans: The output, 58.108, does not make sense as a probability, because a probability must lie between 0 and 1. The sample [98.6, 95, 0] was described as a healthy patient, so a sensible model should output a low probability of COVID-19 for it. The problem is that ReLU is unbounded above: the output can be any non-negative number. Using a sigmoid (or a softmax for multiple classes) in the output layer improves interpretability because its values lie between 0 and 1, which we can read as 0% to 100%. By then choosing the threshold that separates COVID-positive from COVID-negative, we can tune precision and recall to our needs.
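As a minimal sketch of the fix described in this answer, the ReLU can be swapped for a sigmoid so that the output lands between 0 and 1. The helper name `perceptron_probability` is hypothetical and not part of the lab; the inputs, weights, and bias are the ones used above.

```python
import math

def sigmoid(z):
    """Squash a real number into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_probability(x, w, b):
    """Weighted sum plus bias, passed through a sigmoid instead of ReLU."""
    z = sum(x_i * w_i for x_i, w_i in zip(x, w)) + b
    return sigmoid(z)

x = [98.6, 95, 0]
w = [0.03, 0.55, 0.88]
b = 2.9

p = perceptron_probability(x, w, b)
print(f'Sigmoid output: {p}')
```

With these illustrative, untrained weights the weighted sum is large and positive, so the sigmoid saturates very close to 1 even for the 'healthy' sample. The sigmoid only fixes the range of the output; the weights would still have to be learned for the probability to be meaningful.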

## Perceptron with Multiple Outputs

The perceptron model above has only one output. However, in most applications, we need multiple outputs. For example, in a classification problem, we would expect the model to output a vector $\mathbf{y}$, where each $y_i$ represents the probability of a sample belonging to a particular class $i$. The figure below shows a schematic view of a multiple output feed-forward neural network. Each input value $x_i$ is multiplied by a weight factor $W_{ij}$, where $W_{ij}$ denotes a connection weight between the input node $x_i$ and the output node $y_j$. The weighted sum is added to the bias and then passed through an activation function to obtain the output, $y_j$.

![Multi output perceptron](lab2/img/multi-output-perceptron.png)

Given an input $\mathbf{x} \in \mathbb{R}^n$ this can be modeled as:

\begin{equation}
y_j (\mathbf{x}; \Theta) = f\left(\left(\sum_{i=1}^{n} x_i W_{ij}\right) + b_j\right) = f(\mathbf{w}_j^\intercal \mathbf{x} + b_j)\,,
\end{equation}
where the parameter set here is $\Theta = \{ \mathbf{W} \in \mathbb{R}^{n \times m}, \mathbf{b} \in \mathbb{R}^m \}$ and $\mathbf{w}_j$ denotes the $j^{th}$ column of $\mathbf{W}$. 


### Implementation


Let $\mathbf{x} = \begin{bmatrix} 98.6 \\ 95 \\ 0 \\ 1 \end{bmatrix}$. Let the output vector $\mathbf{y} \in \mathbb{R}^3$, i.e. consisting of $3$ outputs. Let the weights associated with each output node $y_i$ be $\mathbf{w_1} = \begin{bmatrix} 0.03 \\ 0.55 \\ 0.88 \\0.73 \end{bmatrix}$, $\mathbf{w_2} = \begin{bmatrix} 0.48 \\ 0.31 \\ 0.28 \\ -0.9 \end{bmatrix}$, $\mathbf{w_3} = \begin{bmatrix} 0.77 \\ 0.54 \\ 0.32 \\ 0.44 \end{bmatrix}$. Let the bias vector be $\mathbf{b} = \begin{bmatrix} 2.9 \\ 6.1 \\ 3.3 \end{bmatrix}$. Note that a single bias is associated with each output node $y_i$.

Given the above inputs write the code to print the output vector $\mathbf{y}$.  Use a Python variable for each scalar and use Python lists for vectors.

In [10]:
# TODO: Your code here
import math


x = [98.6, 95, 0, 1]
w_1 = [0.03, 0.55, 0.88, 0.73]
w_2 = [0.48, 0.31, 0.28, -0.9]
w_3 = [0.77, 0.54, 0.32, 0.44]
b = [2.9, 6.1, 3.3]

def sigmoid(x):
    return 1 / (1 + math.exp(-x))


y_1 = perceptron(x, w_1, b[0], ReLU)
y_2 = perceptron(x, w_2, b[1], ReLU)
y_3 = perceptron(x, w_3, b[2], ReLU)


y_vec = [y_1, y_2, y_3]

print(f'The output vector y is: {y_vec}')

The output vector y is: [58.838, 81.97799999999998, 130.96200000000002]


Now that you understand how to do basic computations with a simple perceptron model manually, we will proceed to implement the same model above using matrix-vector operations utilizing PyTorch functions. Organizing the computations in matrix-vector format notation makes it simpler to understand and implement neural network models. 
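For reference, the per-output formula above can be restated in matrix-vector form. This is only a compact rewriting of the same model, with $f$ applied element-wise and the weight vectors $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3$ stacked as the columns of $\mathbf{W}$:

\begin{equation}
\mathbf{y} = f(\mathbf{W}^\intercal \mathbf{x} + \mathbf{b})\,, \qquad \mathbf{W} = \begin{bmatrix} \mathbf{w}_1 & \mathbf{w}_2 & \mathbf{w}_3 \end{bmatrix} \in \mathbb{R}^{4 \times 3}\,.
\end{equation}

Equivalently, treating the sample as a row vector gives $\mathbf{y}^\intercal = f(\mathbf{x}^\intercal \mathbf{W} + \mathbf{b}^\intercal)$, which is the row-times-matrix order that is convenient when several samples are later stacked as rows.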

Write the code to create the same output vector $\mathbf{y}$ as above by expressing the above computations as matrix-vector multiplications and summation with a bias vector using code vectorization in PyTorch. You should get the same output as above, up to floating-point errors. Again use a Python variable for each scalar, but use PyTorch arrays for vectors and matrices.

In [11]:
import torch

weights = torch.tensor([
    [0.03, 0.48, 0.77],
    [0.55, 0.31, 0.54],
    [0.88, 0.28, 0.32],
    [0.73, -0.9, 0.44]
])

inputs = torch.tensor([98.6, 95, 0, 1])

bias = torch.tensor([2.9, 6.1, 3.3])


mat_mul = inputs@weights
mat_mul_bias = mat_mul + bias
result = mat_mul_bias.apply_(ReLU)

print(f'The resultant output vector is: {result}')


The resultant output vector is: tensor([ 58.8380,  81.9780, 130.9620])


### <font color=blue>Question 3: Explain what each of the dimensions of the matrix of weights $\mathbf{W}$ and the vector of biases $\mathbf{b}$ represent?</font>

Ans: Each column of the weight matrix holds the weights of one output neuron: the elements of the first column are multiplied with the corresponding input features and summed to produce the pre-activation value of the first neuron. Each row holds the weights attached to one input feature; the first row, for instance, contains the weights that multiply the first feature of the feature vector. So the first dimension of $\mathbf{W}$ is the number of inputs and the second is the number of output neurons. The bias vector has one element per output neuron and is added to the weighted sum before the activation function: the first element is added to the first node, the second element to the second node, and so on. Written as a column vector, the bias needs to be reshaped to a row vector so that the element-wise addition matches up shape-wise.

Shapes:    
Weights = 4X3    
Input = 4X1 (needs to be reshaped to 1X4)

INPUTS X WEIGHTS = 1X4 * 4X3 = 1X3    
Bias = 3X1 (needs to be reshaped to 1X3)

Element-wise addition of bias with 1X3 = 1X3
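The shape bookkeeping above can be checked with a small pure-Python sketch. The helper `vec_mat` is hypothetical (it is not a PyTorch function); it multiplies a length-4 row vector by the 4X3 weight matrix and asserts that the dimensions line up before the bias is added.

```python
def vec_mat(x, W):
    """Multiply a length-n row vector by an n-by-m matrix (a list of rows)."""
    n, m = len(W), len(W[0])
    assert len(x) == n, f'expected {n} inputs, got {len(x)}'
    return [sum(x[i] * W[i][j] for i in range(n)) for j in range(m)]

W = [
    [0.03, 0.48, 0.77],
    [0.55, 0.31, 0.54],
    [0.88, 0.28, 0.32],
    [0.73, -0.9, 0.44],
]
x = [98.6, 95, 0, 1]
b = [2.9, 6.1, 3.3]

z = vec_mat(x, W)          # (1X4) times (4X3) gives 3 values
assert len(z) == len(b)    # one bias per output node
y = [max(0.0, z_j + b_j) for z_j, b_j in zip(z, b)]  # ReLU on each output
print(y)
```

Up to floating-point error, this should reproduce the output vector computed earlier with lists and with PyTorch.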

### <font color=blue>Question 4: What is the total number of parameters for this model?</font>

Ans: 12 weights + 3 biases = 15.
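The counts in Questions 1 and 4 follow the same rule: one weight per input-output connection plus one bias per output. A short sketch, using a hypothetical helper `count_parameters`:

```python
def count_parameters(num_inputs, num_outputs):
    """One weight per input-output connection plus one bias per output."""
    return num_inputs * num_outputs + num_outputs

print(count_parameters(3, 1))  # single-output perceptron from the first section
print(count_parameters(4, 3))  # multi-output perceptron above
```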

## More Layers


A single-layer perceptron network still represents a linear classifier, even if we were to use nonlinear activation functions. This limitation can be overcome by multi-layer neural networks in combination with nonlinear activation functions, which introduce one or more 'hidden' layers between the input and output layers. Multi-layer neural networks are composed of several simple artificial neurons such that the output of one acts as the input of another. A multi-layer neural network can be represented by a composition function. For a two-layer network with only one output, the composition function can be written as

\begin{equation}
y_j (\mathbf{x}; \Theta) = f^{(2)}\left(\sum_{k=1}^{h}W_{kj}^{(2)}*f^{(1)}\left(\left(\sum_{i=1}^{n}W_{ik}^{(1)}*x_i \right)+b_k^{(1)}\right)+b_j^{(2)}\right)
\end{equation}
where $h$ is the number of units in the hidden layer and the set of unknown parameters is $\Theta = \{\mathbf{W}^{(1)} \in \mathbb{R}^{n \times h}, \mathbf{b}^{(1)} \in \mathbb{R}^{h}, \mathbf{W}^{(2)} \in \mathbb{R}^{h \times 1}, \mathbf{b}^{(2)} \in \mathbb{R}^{1}\}$. In general, for $L - 1$ hidden layers the composition function, omitting the bias terms, can be written as
\begin{equation}
y_j (\mathbf{x}; \Theta) = f^{(L)}\left(\sum_k W_{kj}^{(L)}*f^{(L-1)}\left(\sum_{l}W_{lk}^{(L-1)}* f^{(L-2)}\left( \cdots f^{(1)}\left(\sum_{i}W_{iz}^{(1)}*x_i \right)\right) \right)\right)
\end{equation}
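To see why the nonlinear activations in this composition matter, here is a small sketch, assuming made-up weights and omitting biases for brevity. Without an activation between them, two linear layers give exactly the same result as a single layer whose weights are the product of the two matrices; inserting a ReLU breaks that equivalence.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    assert len(A[0]) == len(B), 'inner dimensions must match'
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def relu(M):
    """Apply ReLU to every entry of a matrix."""
    return [[max(0.0, v) for v in row] for row in M]

x = [[1.0, -2.0]]                # one sample, two features
W1 = [[1.0, -1.0], [0.5, 2.0]]   # hypothetical first-layer weights
W2 = [[1.0], [1.0]]              # hypothetical second-layer weights

# Two linear layers with no activation in between ...
two_linear = matmul(matmul(x, W1), W2)
# ... equal one linear layer with weights W1 @ W2.
collapsed = matmul(x, matmul(W1, W2))

# With ReLU after the first layer the network is no longer that single layer.
with_relu = matmul(relu(matmul(x, W1)), W2)

print(two_linear, collapsed, with_relu)
```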

The figure below illustrates a feed-forward neural network composed of an input layer, a hidden layer, and an output layer. In this illustration, the multi-layer neural network has one input layer and one output unit. In most models, the number of hidden layers and output units is more than one.

![Feed forward perceptron](img/feed-forward.png)

We will now see how to add an additional layer to our model and then how to generalize to any number of layers.

In [12]:
# To add another layer we need another set of weights and biases. 

W_2 = torch.tensor([
       [-0.3, 0.66, 0.98],
       [0.58, -0.4, 0.38],
       [0.87, 0.69, -0.4]
])

b_2_old = torch.tensor([3.9, 8.2, 0.8])

mat_mul2 = result@W_2

mat_mul_bias2 = mat_mul2 + b_2_old

result2 = mat_mul_bias2.apply_(ReLU)




In [13]:
# TODO: Write the code to print the output of a 2-layer feed-forward network using the previously computed output,
# y, as input to the second layer 
print(f'The output of the two layer network is: {result2}')


The output of the two layer network is: tensor([147.7328, 104.6057,  37.2281])


### <font color=blue>Question 5: Explain the dimensions of the weight matrix for the second layer with respect to the dimensions of the previous layer and the number of artificial neurons in the second layer. In other words, why are the dimensions of the weight matrix of the second layer 3x3?</font>

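Ans: The first dimension of the weight matrix corresponds to the number of inputs (the outputs of the previous layer), while the second dimension corresponds to the number of nodes the weights feed into. Here the previous layer has 3 outputs and the second layer has 3 neurons, so the matrix is 3x3.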
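The same rule extends to any number of layers: each weight matrix must have as many rows as the previous layer has outputs. A minimal sketch of that check, using a hypothetical helper `check_layer_shapes` that works on shape tuples only:

```python
def check_layer_shapes(num_inputs, weight_shapes):
    """Check that each weight matrix has one row per output of the previous layer.

    Returns the number of outputs of the final layer.
    """
    width = num_inputs
    for index, (rows, cols) in enumerate(weight_shapes, start=1):
        assert rows == width, f'layer {index}: expected {width} rows, got {rows}'
        width = cols
    return width

# The two-layer network above: 4 inputs, then a 4x3 and a 3x3 weight matrix.
print(check_layer_shapes(4, [(4, 3), (3, 3)]))
```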

## Layers to Objects

Now, we have a feed-forward model (with an input layer, one hidden layer, and one output layer with 3 outputs) capable of processing a batch of data. It would be cumbersome and redundant if we had to keep writing the same code for hundreds of layers. So, to make our code more modular, easier to manage, and less redundant we will represent layers using an object-oriented programming paradigm. Let's define classes for representing our layers.

All layer objects should have an `output` instance attribute.  Use good object-oriented practices to avoid code duplication.  To initialize an instance attribute in Python, write `self.attribute_name = attribute_value` in the initializer (`__init__` method).  Don't mention the variable at the top of the class as we would usually do in Java -- this is how you define static attributes in Python.

Rather than each layer taking PyTorch arrays as inputs, it should take `Layer`s as inputs, with each layer having its own name. For example, if your network would take $\mathbf{x}$, $\mathbf{W}$, and $\mathbf{b}$ as inputs, you should have attributes `self.x`, `self.W`, and `self.b`.  Then, when you need the values of these inputs, go back and read the output of the previous layer.  For example, if your layer needs the value of $\mathbf{W}$, you could read `self.W.output` to get it.

Two more Python OO hints: (1) `class MyClass1(MyClass2)` is not a constructor call. It is specifying the inheritance relationship. The Java equivalent is `class MyClass1 extends MyClass2`. So you don't want to add arguments on this line. An easy mistake to make.  (2) You must use `self.` every time you access an instance variable in Python. This is how the language was designed.
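The hints above can be illustrated with a toy example. The class names `DemoLayer` and `DemoInput` are hypothetical and deliberately differ from the lab's classes so that running this sketch does not overwrite them.

```python
class DemoLayer:
    layer_count = 0              # class (static) attribute, shared by all instances

    def __init__(self, name=''):
        self.name = name         # instance attributes: one copy per object
        self.output = None
        DemoLayer.layer_count += 1


class DemoInput(DemoLayer):      # inheritance, like `extends` in Java; no arguments here
    def __init__(self, value, name=''):
        super().__init__(name)   # run the parent initializer first
        self.output = value      # `self.` is required to reach the instance attribute


a = DemoInput(1.0, name='a')
b = DemoInput(2.0, name='b')
print(a.output, b.output, DemoLayer.layer_count)
```

Each object keeps its own `output`, while `layer_count` is shared across all instances, which is why attributes declared at the top of the class behave like Java static fields.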

In [14]:
# TODO: Complete the following classes.
from functools import reduce

class Layer:
    def __init__(self, output_shape, name=''):
        """
        TODO: Initialize instance attributes here.
        
        :param output_shape (tuple): the shape of the output array.  When this is a single number, it gives the number of output neurons
            When this is an array, it gives the dimensions of the array of output neurons.
        """
        
        self.name = name
        print(f'init layer: {output_shape}')
        if isinstance(output_shape, int):
            self.nodes = torch.ones(output_shape, 1)
        else:
            self.nodes = torch.ones(output_shape)
       
       
        

class Input(Layer):
    def __init__(self, num_outputs, values=None, name=''):
        """
        TODO: Accept any arguments specific to this child class.
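        num_outputs: the number (or shape) of values this layer holds, e.g. how many inputs (features) there are.
        values: optional tensor of that shape; random values are used when it is omitted.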
        """
        Layer.__init__(self, num_outputs, name)
        if values is not None:
            print(f'Values shape in init: {values.shape}')
            print(f'Self.nodes.shape in init: {self.nodes.shape}')
            assert values.shape == self.nodes.shape, "num_outputs, has to match the shape of values"
            
            self.set(values)
        else:
            self.values = torch.randn(self.nodes.shape)
            

    def set(self,value):
        """
        TODO: set the `output` of this array to have value `value`.
        Raise an error if the size of value is unexpected. An `assert()` is fine for this.
        """
        

        print(f'Self.nodes.shape: {self.nodes.shape}')
        assert value.shape == self.nodes.shape, 'array should have same size as nodes'
        self.values = value
        print(f'Set value shape: {self.values.shape}')
        
    def forward(self):
        """This layer's values do not change during forward propagation."""
        pass


class Linear(Layer):
    def __init__(self, prev_layer, nodes, weights=None, bias=None):
        """
        TODO: Accept any arguments specific to this child class.
        
        Raise an error if any of the arguments' sizes do not match what you would expect.
        prev_layer: should be a layer object that precedes a certain layer
        nodes: an int that represents how many nodes it should have in its layer
        weights: An Input layer object
        bias: An Input layer object
        """
        Layer.__init__(self, nodes) 
        self.prev_layer = prev_layer
        if weights is None:
            # Default: weights of shape (prev_layer_nodes, curr_layer_nodes).
            # Input draws random values itself when none are passed in.
            weights_shape = (prev_layer.nodes.shape[1], self.nodes.shape[1])
            self.weights = Input(weights_shape)
        else:
            assert isinstance(weights, Input), "weights should be of type Input layer"
            print(f'Weights shape: {weights.nodes.shape}')
            print(f'Previous layers shape: {self.prev_layer.nodes.shape}')
            print(f'Self node shape: {self.nodes.shape}')
            print(f'Predicted shape: {prev_layer.nodes.shape[1]}, {self.nodes.shape[0]}')
            assert weights.nodes.shape == (prev_layer.nodes.shape[1], self.nodes.shape[1]), "Weights shape should be: prev_layer_nodes, curr_layer_nodes"
            self.weights = weights
        
        if bias is None:
            # Default: random bias with the same shape as this layer's nodes.
            self.bias = Input(tuple(self.nodes.shape))
        else:
            
            print(f'Bias shape: {bias.values.shape}')
            print(f'Nodes shape: {self.nodes.shape}')
            
            assert bias.nodes.shape == self.nodes.shape, "bias shape should equal node shape"
            self.bias = bias

    def forward(self):
        """
        TODO: Set this layer's output based on the outputs of the layers that feed into it.
        """
        temp = self.prev_layer.values
        print(f'Previous layer shape: {temp.shape}')
        print(f'Weights shape: {self.weights.values.shape}')
        print(f'Weights: {self.weights.values}')
        mat_mul = temp@self.weights.values
        print(f'Mat_mul: {mat_mul}')
        print(f'After mat mul: {mat_mul.shape}')
        print(f'Bias shape: {self.bias.values.shape}')
        print(f'Bias: {self.bias.values}')
        if mat_mul.shape[0] == self.bias.values.shape[0]:
            mat_mul_bias = mat_mul + self.bias.values
        else:
            expanded_bias = self.bias.values.repeat(mat_mul.shape[0], 1)
            mat_mul_bias = mat_mul + expanded_bias
            
        print(f'Mat mul plus bias: {mat_mul_bias}')
        print(f'After adding bias: {mat_mul_bias.shape}')
        self.values = mat_mul_bias.apply_(ReLU)
        print(f'Output shape: {self.values.shape}')
        print(f'Output: {self.values}')
        
        
        


In [15]:
input_layer = Input((1,4))
input_layer.set(torch.tensor([98.6, 95, 0, 1]).reshape(1,4))
print('Input layer done')
w_1 = Input((4,3), values=torch.tensor([
    [0.03, 0.48, 0.77],
    [0.55, 0.31, 0.54],
    [0.88, 0.28, 0.32],
    [0.73, -0.9, 0.44]]))
print('Weight 1 layer done')


b_1 = Input((1, 3), torch.tensor([2.9, 6.1, 3.3]).reshape(1, 3))
print('Bias 1 layer done')
w_2 = Input((3, 3), values=torch.tensor([
       [-0.3, 0.66, 0.98],
       [0.58, -0.4, 0.38],
       [0.87, 0.69, -0.4]]))
print('weight 2 layer done')
b_2 = Input((1, 3), values=torch.tensor([3.9, 8.2, 0.8]).reshape(1, 3))
print('bias 2 layer done')
print('\n\n\n')
w_layer_1 = Linear(input_layer, (1,3), w_1, bias=b_1)
w_layer_2 = Linear(w_layer_1, (1,3), w_2, bias=b_2) 
print(f'\nInput layer values: {input_layer.values}\n')
w_layer_1.forward()
print(f'\nFirst weight layer outputs: {w_layer_1.values}\n')
w_layer_2.forward()
print(f'\nSecond weight layer outputs: {w_layer_2.values}\n')


init layer: (1, 4)
Self.nodes.shape: torch.Size([1, 4])
Set value shape: torch.Size([1, 4])
Input layer done
init layer: (4, 3)
Values shape in init: torch.Size([4, 3])
Self.nodes.shape in init: torch.Size([4, 3])
Self.nodes.shape: torch.Size([4, 3])
Set value shape: torch.Size([4, 3])
Weight 1 layer done
init layer: (1, 3)
Values shape in init: torch.Size([1, 3])
Self.nodes.shape in init: torch.Size([1, 3])
Self.nodes.shape: torch.Size([1, 3])
Set value shape: torch.Size([1, 3])
Bias 1 layer done
init layer: (3, 3)
Values shape in init: torch.Size([3, 3])
Self.nodes.shape in init: torch.Size([3, 3])
Self.nodes.shape: torch.Size([3, 3])
Set value shape: torch.Size([3, 3])
weight 2 layer done
init layer: (1, 3)
Values shape in init: torch.Size([1, 3])
Self.nodes.shape in init: torch.Size([1, 3])
Self.nodes.shape: torch.Size([1, 3])
Set value shape: torch.Size([1, 3])
bias 2 layer done




init layer: (1, 3)
Weights shape: torch.Size([4, 3])
Previous layers shape: torch.Size([1, 4])
Self


weights = torch.tensor([
    [0.03, 0.48, 0.77],
    [0.55, 0.31, 0.54],
    [0.88, 0.28, 0.32],
    [0.73, -0.9, 0.44]
])

inputs = torch.tensor([98.6, 95, 0, 1])

bias = torch.tensor([2.9, 6.1, 3.3])

tensor([ 58.8380,  81.9780, 130.9620])



W_2 = torch.tensor([
       [-0.3, 0.66, 0.98],
       [0.58, -0.4, 0.38],
       [0.87, 0.69, -0.4]
])

b_2 = torch.tensor([3.9, 8.2, 0.8])

tensor([147.7328, 104.6057,  37.2281])

Trying a different network

In [16]:
input_layer = Input((8,20))

print('Input layer done')
w_1 = Input((20,10))
print('Weight 1 layer done')


b_1 = Input((1, 10))
print('Bias 1 layer done')
w_2 = Input((10, 100))
print('weight 2 layer done')
b_2 = Input((1, 100))
print('bias 2 layer done')
print('\n\n\n')
w_layer_1 = Linear(input_layer, (1,10), w_1, bias=b_1)
w_layer_2 = Linear(w_layer_1, (1,100), w_2, bias=b_2) 
print(f'\nInput layer values: {input_layer.values}\n')
w_layer_1.forward()
print(f'\nFirst weight layer outputs: {w_layer_1.values}\n')
w_layer_2.forward()
print(f'\nSecond weight layer outputs: {w_layer_2.values}\n')


init layer: (8, 20)
Input layer done
init layer: (20, 10)
Weight 1 layer done
init layer: (1, 10)
Bias 1 layer done
init layer: (10, 100)
weight 2 layer done
init layer: (1, 100)
bias 2 layer done




init layer: (1, 10)
Weights shape: torch.Size([20, 10])
Previous layers shape: torch.Size([8, 20])
Self node shape: torch.Size([1, 10])
Predicted shape: 20, 1
Bias shape: torch.Size([1, 10])
Nodes shape: torch.Size([1, 10])
init layer: (1, 100)
Weights shape: torch.Size([10, 100])
Previous layers shape: torch.Size([1, 10])
Self node shape: torch.Size([1, 100])
Predicted shape: 10, 1
Bias shape: torch.Size([1, 100])
Nodes shape: torch.Size([1, 100])

Input layer values: tensor([[-1.8953, -0.8045, -0.8137, -0.5477,  0.2248, -0.0142, -0.5045,  0.2368,
         -0.0717, -0.4759, -0.6889, -2.3197,  0.9233, -0.5495, -1.1696, -0.1345,
          0.8383, -3.0031, -0.5398, -1.8714],
        [-0.7326,  0.1787, -0.8810,  0.7923, -0.6722, -0.0612,  0.5217,  0.8520,
         -0.5931,  0.8966,  0.0712, 

Arbitrary network and batch sizes work :)

This concludes this lab... except for the two **required** parting questions:

### <font color=blue>Question 6: Summarize what you learned during this lab.</font>

Ans: The main thing I learned in this lab is the forward pass of a neural network. Besides this, I was also able to refresh my PyTorch skills, including creating and working with tensors. I also had to learn matrix operations in PyTorch, which will be useful in the future. Lastly, this lab allowed me to refresh my linear algebra skills.

### <font color=blue>Question 7: Describe what you liked about this lab *or* what could be improved. (Required.)</font>

Ans: One thing that would be useful to add to this lab would be to really hammer home the point that even weights and biases are supposed to be their own layers. Many students, I believe, tend to think of the weights, bias, and activation function as one complete layer. This definitely requires some adjusting and could be reinforced in the lab as well. I really enjoyed that the lab made you walk through the computation by hand and then programmatically. Having the previous test case when testing the classes was super useful.