Building a simple neural network in PyTorch, then comparing it to a neural network built from scratch.

In [1]:
import torch 
import torch.nn as nn
import torch.nn.functional as F

In [2]:
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, input):
        # Convolution layer C1: 1 input image channel, 6 output channels,
        # 5x5 square convolution, it uses RELU activation function, and
        # outputs a Tensor with size (N, 6, 28, 28), where N is the size of the batch
        c1 = F.relu(self.conv1(input))
        # Subsampling layer S2: 2x2 grid, purely functional,
        # this layer does not have any parameter, and outputs a (N, 6, 14, 14) Tensor
        s2 = F.max_pool2d(c1, (2, 2))
        # Convolution layer C3: 6 input channels, 16 output channels,
        # 5x5 square convolution, it uses RELU activation function, and
        # outputs a (N, 16, 10, 10) Tensor
        c3 = F.relu(self.conv2(s2))
        # Subsampling layer S4: 2x2 grid, purely functional,
        # this layer does not have any parameter, and outputs a (N, 16, 5, 5) Tensor
        s4 = F.max_pool2d(c3, 2)
        # Flatten operation: purely functional, outputs a (N, 400) Tensor
        s4 = torch.flatten(s4, 1)
        # Fully connected layer F5: (N, 400) Tensor input,
        # and outputs a (N, 120) Tensor, it uses RELU activation function
        f5 = F.relu(self.fc1(s4))
        # Fully connected layer F6: (N, 120) Tensor input,
        # and outputs a (N, 84) Tensor, it uses RELU activation function
        f6 = F.relu(self.fc2(f5))
        # Output layer: fully connected, (N, 84) Tensor input, and
        # outputs a (N, 10) Tensor of raw scores (no activation is applied)
        output = self.fc3(f6)
        return output

net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
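
The shapes quoted in the comments of `forward` can be checked with a little arithmetic. The sketch below assumes a 32x32 single-channel input image, which is inferred from the `(N, 6, 28, 28)` comment rather than stated anywhere in the notebook. The helper names `conv_out` and `pool_out` are hypothetical and are not part of PyTorch.

```python
# Shape arithmetic for the Net above, assuming a 1x32x32 input image
# (the input size is an assumption inferred from the layer comments).

def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window):
    """Output width/height of non-overlapping max pooling."""
    return size // window

size = 32
c1 = conv_out(size, 5)   # after conv1
s2 = pool_out(c1, 2)     # after the first 2x2 max pool
c3 = conv_out(s2, 5)     # after conv2
s4 = pool_out(c3, 2)     # after the second 2x2 max pool
flat_features = 16 * s4 * s4  # 16 channels, flattened

print(c1, s2, c3, s4, flat_features)  # 28 14 10 5 400
```

This is where the `16 * 5 * 5` in `fc1` comes from: 16 channels of 5x5 feature maps are flattened into 400 features.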


### Neural Network From Scratch

Here we will build the fundamentals of Neural Networks from scratch, starting with the perceptron.

### Perceptrons

The perceptron approximates a single neuron with n binary inputs. It computes a weighted sum of its inputs, adds a bias, and "fires" if the result is zero or greater.

In [3]:
#define the perceptron step function
def step_function(x):
    return 1 if x >= 0 else 0

In [4]:
#build dot product function that will be used
def dot(v, w):
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

In [5]:
def perceptron_output(weights, bias, x):
    """
    returns 1 if the perceptron 'fires' and 0 if not
    """
    calculation = dot(weights, x) + bias
    return step_function(calculation)
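
As a quick illustration, a single perceptron can act as a simple logic gate. The weights and biases below are hand-picked for this sketch (they are not learned, and other values would work equally well). The three helper functions are repeated so that the block runs on its own.

```python
def step_function(x):
    return 1 if x >= 0 else 0

def dot(v, w):
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def perceptron_output(weights, bias, x):
    return step_function(dot(weights, x) + bias)

# Hand-picked (weights, bias) pairs, chosen only for illustration.
and_gate = ([2, 2], -3)   # fires only when both inputs are 1
or_gate = ([2, 2], -1)    # fires when at least one input is 1
not_gate = ([-2], 1)      # fires when its single input is 0

for x in [0, 1]:
    for y in [0, 1]:
        print(x, y,
              "and:", perceptron_output(*and_gate, [x, y]),
              "or:", perceptron_output(*or_gate, [x, y]))

for x in [0, 1]:
    print(x, "not:", perceptron_output(*not_gate, [x]))
```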

### Feed-Forward Neural Networks

A simple perceptron has obvious limitations (a single perceptron cannot, for example, compute XOR), so to solve more complicated problems we use "feed-forward neural networks". These networks consist of discrete layers of neurons, where each layer feeds into the next.

These networks are built from the following layers:
- input layer: receives inputs and feeds them forward unchanged
- hidden layer (there may be several): consists of neurons that take the outputs of the previous layer, perform some calculation, and pass the result to the next layer
- output layer: produces the final outputs

Feed-forward networks replace the perceptron's step function with a smooth approximation, the sigmoid function. Training these networks requires calculus, so the activation function must be continuous and differentiable, which the step function is not. The sigmoid function used here is also known as the logistic function.

In [6]:
import math

#sigmoid function
def sigmoid(t):
    return 1 / (1 + math.exp(-t))
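
The sketch below evaluates `sigmoid` at a few points to show how it behaves like a smoothed step function. It also shows one practical caveat: `math.exp(-t)` raises `OverflowError` for large negative `t`. The function `stable_sigmoid` is a hypothetical variant added here for illustration and is not used elsewhere in this notebook.

```python
import math

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def stable_sigmoid(t):
    """Hypothetical variant that avoids overflow for large negative t."""
    if t >= 0:
        return 1 / (1 + math.exp(-t))
    z = math.exp(t)  # t < 0, so exp(t) lies in [0, 1) and cannot overflow
    return z / (1 + z)

# Far from 0 the sigmoid is close to the step function; near 0 it is smooth.
for t in [-10, -1, 0, 1, 10]:
    print(t, round(sigmoid(t), 4))

try:
    sigmoid(-1000)
except OverflowError:
    print("sigmoid(-1000) overflows; stable_sigmoid gives", stable_sigmoid(-1000))
```

The weighted sums in the examples that follow stay well inside the safe range, so the plain `sigmoid` is sufficient for them.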

To calculate the output of a neuron, we therefore use:

In [7]:
def neuron_output(weights, inputs):
    return sigmoid(dot(weights, inputs))

We can therefore represent a neuron as a list of weights whose length is one more than the number of inputs to that neuron (the extra weight is the bias term).

Then we can represent a neural "network" as a list of layers, where each layer is just a list of the neurons in that layer.

Putting it all together our neural network will be a list (layers) of lists (neurons) of lists (weights).
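
To make the nesting concrete, here is a hypothetical network with 2 inputs, 2 hidden neurons and 1 output neuron. The weight values are arbitrary placeholders chosen for illustration.

```python
# A hypothetical 2-input network: 2 hidden neurons, 1 output neuron.
# The weight values are arbitrary placeholders.
tiny_network = [
    # hidden layer: 2 neurons, each with 2 input weights + 1 bias weight
    [[0.1, -0.2, 0.3],
     [0.4, 0.5, -0.6]],
    # output layer: 1 neuron with 2 hidden weights + 1 bias weight
    [[0.7, -0.8, 0.9]],
]

for depth, layer in enumerate(tiny_network):
    print("layer", depth,
          "neurons:", len(layer),
          "weights per neuron:", [len(neuron) for neuron in layer])
```

Each neuron's last weight multiplies a constant input of 1, which is how the bias fits into the same list.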

In [8]:
def feed_forward(neural_network, input_vector):
    """
    takes in a neural network (represented as a list
    of lists of lists of weights) and returns the outputs
    of every layer from forward-propagating the input
    """
    #save the outputs
    outputs = []
    #process one layer at a time
    for layer in neural_network:
        #add a bias to the input
        input_with_bias = input_vector + [1]
        #compute the output for each neuron in layer
        output = [neuron_output(neuron, input_with_bias) for neuron in layer]
        #store the output for each neuron
        outputs.append(output)
        #then the input to the next layer is the output of this one
        input_vector = output
    return outputs

We can test the network by building an XOR gate ("or, but not and"), which cannot be built with a single perceptron.

In [9]:
#build the xor network
xor_network = [
    #hidden layer
    [[20, 20, -30],    #'and' neuron
     [20, 20, -10]],   #'or' neuron
    #output layer
    [[-60, 60, -30]]   #'second input but not first input' neuron
]

In [10]:
for x in [0, 1]:
    for y in [0, 1]:
        #feed_forward produces the outputs of every neuron
        #feed_forward[-1] is the outputs of the output-layer neurons
        print(x, y, feed_forward(xor_network, [x, y])[-1])

0 0 [9.38314668300676e-14]
0 1 [0.9999999999999059]
1 0 [0.9999999999999059]
1 1 [9.383146683006828e-14]


As we can see, because the weights are scaled up, each neuron's output is very close to either 0 or 1. The hidden layer lets us feed the output of an "and" neuron and the output of an "or" neuron into a "second input but not first input" neuron, and the result is an XOR gate.
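
To see the roles of the two hidden neurons, the sketch below prints the rounded hidden-layer outputs next to the final output. The functions and the network are repeated from the cells above so that the block runs on its own. Rounding to 0 or 1 is done only for display.

```python
import math

def dot(v, w):
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def neuron_output(weights, inputs):
    return sigmoid(dot(weights, inputs))

def feed_forward(neural_network, input_vector):
    outputs = []
    for layer in neural_network:
        input_with_bias = input_vector + [1]
        output = [neuron_output(neuron, input_with_bias) for neuron in layer]
        outputs.append(output)
        input_vector = output
    return outputs

xor_network = [
    [[20, 20, -30],    # first hidden neuron
     [20, 20, -10]],   # second hidden neuron
    [[-60, 60, -30]],  # output neuron
]

results = []
for x in [0, 1]:
    for y in [0, 1]:
        hidden, output = feed_forward(xor_network, [x, y])
        # round each sigmoid output to the nearest of 0 and 1 for display
        first, second = (round(h) for h in hidden)
        results.append((x, y, first, second, round(output[0])))
        print(x, y, "hidden:", first, second, "output:", round(output[0]))
```

The first hidden neuron fires only for `1 1` (and), the second fires whenever at least one input is 1 (or), and the output neuron fires when the second fires but the first does not.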

### Backpropagation

To be useful, a neural network must be trained on data. One popular training algorithm, which is closely related to gradient descent, is called **backpropagation**.

Backpropagation requires a training set of input vectors with corresponding target output vectors, and a network whose weights can be adjusted. The algorithm proceeds as follows:
1. Run `feed_forward` on an input vector to produce the outputs of all the neurons in the network

2. This results in an error for each output neuron (the difference between its output and its target)

3. Compute the gradient of this error as a function of the neuron's weights and adjust the weights in the direction that most decreases the error

4. "Propagate" these output errors backward to infer errors for the hidden layer

5. Compute the gradients of these errors and adjust the hidden layer's weights in the same manner
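
For a single training example, these steps can be written out explicitly. The following is a sketch under two assumptions that the text above does not state: the error is half the squared difference, $E = \tfrac{1}{2}\sum_k (o_k - t_k)^2$, and every neuron uses the sigmoid activation $\sigma$. Here $x_j$ are the inputs, $h_i$ the hidden outputs, $o_k$ the outputs, $t_k$ the targets, $v_{ij}$ the hidden-layer weights, $w_{ki}$ the output-layer weights, and $\eta$ a step size. All of this notation is introduced here for illustration.

$$
\begin{aligned}
\sigma'(z) &= \sigma(z)\,(1 - \sigma(z)) \\
\delta_k &= o_k\,(1 - o_k)\,(o_k - t_k) \\
\frac{\partial E}{\partial w_{ki}} &= \delta_k\, h_i \\
\delta^{\text{hidden}}_i &= h_i\,(1 - h_i) \sum_k \delta_k\, w_{ki} \\
\frac{\partial E}{\partial v_{ij}} &= \delta^{\text{hidden}}_i\, x_j \\
w_{ki} &\leftarrow w_{ki} - \eta\, \frac{\partial E}{\partial w_{ki}}, \qquad
v_{ij} \leftarrow v_{ij} - \eta\, \frac{\partial E}{\partial v_{ij}}
\end{aligned}
$$

The bias is handled by appending a constant input of 1 to each layer's inputs, so it is updated like any other weight.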

In [11]:
def propagate(network, input_vector, targets):
    #get the outputs
    hidden_outputs, outputs = feed_forward(network, input_vector)
    #the output * (1 - output) is from the derivative of sigmoid
    output_deltas = [output * (1 - output) * (output - target)
                     for output, target in zip(outputs, targets)]
    #back propagate errors to hidden layer; this uses the output-layer
    #weights, so compute it before those weights are adjusted
    hidden_deltas = [hidden_output * (1 - hidden_output) *
                     dot(output_deltas, [n[i] for n in network[-1]])
                     for i, hidden_output in enumerate(hidden_outputs)]
    #adjust the weights for output layer, one neuron at a time
    for i, output_neuron in enumerate(network[-1]):
        #focus on the ith output layer neuron
        for j, hidden_output in enumerate(hidden_outputs + [1]):
            #adjust the jth weight based on both this neuron's delta and its jth input
            output_neuron[j] -= output_deltas[i] * hidden_output
    #adjust the weights for hidden layer, one neuron at a time
    for i, hidden_neuron in enumerate(network[0]):
        #input_value avoids shadowing the built-in input()
        for j, input_value in enumerate(input_vector + [1]):
            hidden_neuron[j] -= hidden_deltas[i] * input_value

Note that this is essentially the same as writing out the squared error as a function of the weights and using stochastic gradient descent to minimize it. This is a fundamental concept in machine learning.
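
One way to check this claim is a gradient check: compute the gradients with the delta formulas and compare them against finite differences of the squared error. The sketch below assumes the error is half the sum of squared differences and computes all deltas before changing any weight. The names `squared_error`, `backprop_gradients` and `numeric_gradients` are hypothetical helpers written for this check, and the small network and training example are arbitrary.

```python
import math

def dot(v, w):
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def feed_forward(network, input_vector):
    outputs = []
    for layer in network:
        input_with_bias = input_vector + [1]
        output = [sigmoid(dot(neuron, input_with_bias)) for neuron in layer]
        outputs.append(output)
        input_vector = output
    return outputs

def squared_error(network, input_vector, targets):
    """Half the sum of squared differences between outputs and targets."""
    outputs = feed_forward(network, input_vector)[-1]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, targets))

def backprop_gradients(network, input_vector, targets):
    """Gradients of squared_error for a two-layer network, via the deltas."""
    hidden_outputs, outputs = feed_forward(network, input_vector)
    output_deltas = [o * (1 - o) * (o - t) for o, t in zip(outputs, targets)]
    hidden_deltas = [h * (1 - h) * dot(output_deltas, [n[i] for n in network[-1]])
                     for i, h in enumerate(hidden_outputs)]
    hidden_grads = [[d * x for x in input_vector + [1]] for d in hidden_deltas]
    output_grads = [[d * h for h in hidden_outputs + [1]] for d in output_deltas]
    return [hidden_grads, output_grads]

def numeric_gradients(network, input_vector, targets, eps=1e-6):
    """Central finite differences, perturbing one weight at a time."""
    grads = []
    for layer in network:
        layer_grads = []
        for neuron in layer:
            neuron_grads = []
            for j in range(len(neuron)):
                original = neuron[j]
                neuron[j] = original + eps
                plus = squared_error(network, input_vector, targets)
                neuron[j] = original - eps
                minus = squared_error(network, input_vector, targets)
                neuron[j] = original  # restore the weight
                neuron_grads.append((plus - minus) / (2 * eps))
            layer_grads.append(neuron_grads)
        grads.append(layer_grads)
    return grads

# Arbitrary small network and training example, chosen only for illustration.
network = [[[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]], [[0.7, -0.8, 0.9]]]
x, t = [1, 0], [1]

analytic = backprop_gradients(network, x, t)
numeric = numeric_gradients(network, x, t)
max_gap = max(abs(a - n)
              for a_layer, n_layer in zip(analytic, numeric)
              for a_neuron, n_neuron in zip(a_layer, n_layer)
              for a, n in zip(a_neuron, n_neuron))
print("largest difference:", max_gap)
```

If the two sets of gradients agree to within numerical noise, subtracting the delta-based terms from the weights is a gradient descent step on the squared error with a step size of 1.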

### Neural Network Example

In the following example we use our neural network to defeat a simple CAPTCHA in which each digit is drawn on a 5x5 grid. Each cell (i, j) of the grid is either 0 (the pixel is white in the image) or 1 (the pixel is black in the image).

For example, the digit 0 is represented by:

[
1, 1, 1, 1, 1\
1, 0, 0, 0, 1\
1, 0, 0, 0, 1\
1, 0, 0, 0, 1\
1, 1, 1, 1, 1
]

We encode the target for each digit from 0 to 9 as a list of length 10 with a 1 at the index of the digit and 0 everywhere else. For example, the correct output for 4 is: `[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]`

In [12]:
#build out our targets
targets = [[1 if i == j else 0 for i in range(10)] for j in range(10)]
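
The sketch below shows how one digit and its target line up. The helper `grid_to_vector` and the string layout (`'1'` for black, `'0'` for white) are assumptions made here for readability; the notebook does not prescribe an input format beyond a vector of 25 zeros and ones.

```python
# Hypothetical helper: the string layout ('1' = black, '0' = white) is an
# assumption made for readability.
def grid_to_vector(rows):
    """Flatten a 5x5 grid given as five strings into a 25-element list."""
    return [int(ch) for row in rows for ch in row]

zero_digit = grid_to_vector(["11111",
                             "10001",
                             "10001",
                             "10001",
                             "11111"])

targets = [[1 if i == j else 0 for i in range(10)] for j in range(10)]

print(len(zero_digit))       # 25
print(targets[4])            # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(targets[4].index(1))   # 4
```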

In [13]:
import random

#seed the random number generator for reproducibility
random.seed(0)
#each input is a vector of length 25
input_size = 25
#we'll have 5 neurons in the hidden layer
num_hidden = 5
#we'll need 10 outputs for each input
output_size = 10

In [16]:
#each hidden neuron has one weight per input plus a bias weight
hidden_layer = [[random.random() for _ in range(input_size + 1)] for _ in range(num_hidden)]

In [17]:
#each output neuron has one weight per hidden neuron, plus a bias weight
output_layer = [[random.random() for _ in range(num_hidden + 1)] for _ in range(output_size)]

In [18]:
#the network starts out with random weights
network = [hidden_layer, output_layer]
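
Before looking at the raw weights, it can help to confirm the shape of the network. The sketch below rebuilds a network with the same sizes and counts its weights. The random values it generates need not match the ones displayed in this notebook.

```python
import random

random.seed(0)
input_size, num_hidden, output_size = 25, 5, 10

# same construction as above: one weight per input plus a bias weight
hidden_layer = [[random.random() for _ in range(input_size + 1)]
                for _ in range(num_hidden)]
output_layer = [[random.random() for _ in range(num_hidden + 1)]
                for _ in range(output_size)]
network = [hidden_layer, output_layer]

for name, layer in zip(["hidden", "output"], network):
    print(name, "neurons:", len(layer), "weights per neuron:", len(layer[0]))

total_weights = sum(len(neuron) for layer in network for neuron in layer)
print("total weights:", total_weights)  # 5 * 26 + 10 * 6 = 190
```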

In [19]:
network

[[[0.5759529480315407,
   0.3912094093228269,
   0.3701399403351875,
   0.9805166506472687,
   0.036392037611485795,
   0.021636509855024078,
   0.9610312802396112,
   0.18497194139743833,
   0.12389516442443171,
   0.21057650988664645,
   0.8007465903541809,
   0.9369691586445807,
   0.022782575668658378,
   0.42561883196681716,
   0.10150021937416975,
   0.259919889792832,
   0.22082927131631735,
   0.6469257198353225,
   0.3502939673965323,
   0.18031790152968785,
   0.5036365052098872,
   0.03937870708469238,
   0.10092124118896661,
   0.9882351487225011,
   0.19935579046706298,
   0.35855530131160185],
  [0.7315983062253606,
   0.8383265651934163,
   0.9184820619953314,
   0.16942460609746768,
   0.6726405635730526,
   0.9665489030431832,
   0.05805094382649867,
   0.6762017842993783,
   0.8454245937016164,
   0.342312541078584,
   0.25068733928511167,
   0.596791393469411,
   0.44231403369907896,
   0.17481948445144113,
   0.47162541509628797,
   0.40990539565755457,
   0.5691127