## **The Forward and Backward Passes of a Simple MLP / Neural Network**

In [23]:
import pickle, gzip, math, os, time, shutil
import numpy as np
import matplotlib as mpl
import torch
from torch import tensor
from fastcore.test import test_close
from pathlib import Path

# Configs
torch.manual_seed(42)
mpl.rcParams['image.cmap'] = 'gray'
torch.set_printoptions(precision=2, linewidth=125, sci_mode=False)
np.set_printoptions(precision=2, linewidth=125)

# Path setup
path_data = Path('data')
path_gz = path_data/'mnist.pkl.gz'
with gzip.open(path_gz, 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')
# Loading MNIST data as tensors
x_train, y_train, x_valid, y_valid = map(tensor, [x_train, y_train, x_valid, y_valid])

## Building the Foundations

### Basic Architecture

In [24]:
# Here n is the number of training examples and m is the number of pixels
n, m = x_train.shape
# c is the number of possible output values, i.e. the digits 0-9
c = y_train.max() + 1
n, m, c

(50000, 784, tensor(10))

We will decide ahead of time how many hidden **activations** or **nodes** a single layer will have. Let's pick an arbitrary number...

In [25]:
# Number of hidden activations
nh = 50

As we have already established, we will be utilizing what we've learnt about matrix multiplications to get our output probabilities for each input row. We already have training data in a **`50,000x784`** matrix. For the linear function to work, we will need weights and biases.
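To make the shapes concrete, here is a minimal sketch of a linear function on toy tensors. The sizes and the names `n_rows`, `n_in`, and `n_hidden` are assumptions chosen only for illustration; they are not the MNIST sizes used in this notebook.

```python
import torch

torch.manual_seed(0)

# Toy sizes chosen only for illustration (not the MNIST sizes)
n_rows, n_in, n_hidden = 4, 6, 3

x = torch.randn(n_rows, n_in)      # inputs: one row per example
w = torch.randn(n_in, n_hidden)    # weights: one column per hidden activation
b = torch.zeros(n_hidden)          # biases: one value per hidden activation

out = x @ w + b                    # b is broadcast across every row
print(out.shape)                   # torch.Size([4, 3])
```

The inner dimensions of `x` and `w` must match, and the result has one row per example and one column per hidden activation.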

In [32]:
# The weight matrix will contain 784x50 (m x nh) random values
w1 = torch.randn(m, nh)
# Adding the biases, which are summed with the matrix product to complete the linear function.
# Creating a vector of 0s, one for each hidden activation
b1 = torch.zeros(nh)
# Layer two takes the nh hidden activations as its inputs.
# We will stick with 1 output column to simplify the loss calculation, which will be MSE instead of cross entropy.
w2 = torch.randn(nh, 1)
# The same applies to the bias, i.e. sticking with a single output for simplicity.
b2 = torch.zeros(1)


In [33]:
# Creating a simple linear function for putting X through a single layer
def lin(x, w, b): return x@w + b

In [34]:
# Verifying the shape of our validation matrix
x_valid.shape

torch.Size([10000, 784])

In [35]:
# Passing the validation data through the linear function will result in a 10000x50 matrix
t = lin(x_valid, w1, b1)
t.shape

torch.Size([10000, 50])

In [36]:
# Passing the results through a ReLU
def relu(x): return x.clamp_min(0.)

In [37]:
t = relu(t)
t

tensor([[ 0.00,  0.00,  0.00,  ...,  4.04,  0.00,  0.00],
        [ 3.56,  0.00,  0.00,  ...,  0.88,  3.10,  0.00],
        [ 6.66,  4.48,  0.00,  ...,  5.81,  8.84,  3.07],
        ...,
        [16.45,  0.00,  0.00,  ..., 11.17,  0.00,  0.00],
        [ 7.03,  0.51,  0.00,  ...,  8.39,  0.00,  3.69],
        [10.94,  0.00,  0.00,  ...,  0.00,  0.00, 11.96]])
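As a quick sanity check, the clamping approach can be compared against PyTorch's built-in `torch.relu` on a small hand-written tensor. This is only a sketch; `relu_sketch` and `sample` are illustrative names, not part of the notebook.

```python
import torch

def relu_sketch(x):
    # Replace every negative entry with zero; non-negative entries pass through unchanged
    return x.clamp_min(0.)

sample = torch.tensor([[-2.0, 0.0, 3.5], [1.0, -0.5, -4.0]])
clamped = relu_sketch(sample)
print(clamped)

# Compare with PyTorch's built-in ReLU
print(torch.equal(clamped, torch.relu(sample)))
```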

In [38]:
# A basic MLP which takes a mini-batch of data
def model(xb):
    l1 = lin(xb, w1, b1)
    l2 = relu(l1)
    return lin(l2, w2, b2)

In [40]:
# The model will now output a single column output
# Again, this is only for simplicity since we will be testing the model using MSE.
res = model(x_valid)
res.shape

torch.Size([10000, 1])
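The same two-layer structure can be traced on toy sizes to see how a mini-batch flows from inputs to a single output column. This is a rough sketch under assumed sizes; `toy_model` and the `*_toy` parameters are hypothetical names used only here.

```python
import torch

torch.manual_seed(0)
n_in, n_hidden = 6, 3                      # toy sizes, for illustration only
w1_toy, b1_toy = torch.randn(n_in, n_hidden), torch.zeros(n_hidden)
w2_toy, b2_toy = torch.randn(n_hidden, 1), torch.zeros(1)

def toy_model(xb):
    hidden = (xb @ w1_toy + b1_toy).clamp_min(0.)   # linear layer followed by ReLU
    return hidden @ w2_toy + b2_toy                 # second linear layer, one output column

xb = torch.randn(4, n_in)                  # a mini-batch of 4 rows
print(toy_model(xb).shape)                 # torch.Size([4, 1])
```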

### Simple Loss Function: Mean Square Error

MSE is not a suitable loss function for multi-class classification. The simplicity here is for demonstration purposes only.

In [41]:
res.shape, y_valid.shape

(torch.Size([10000, 1]), torch.Size([10000]))

In [46]:
(res - y_valid).shape

torch.Size([10000, 10000])

The calculation above is not correct. Under broadcasting rules, PyTorch automatically inserts a leading unit axis into `y_valid`, treating it as a `1x10000` row. Each of the 10000 rows of `res` is then subtracted against all 10000 targets, which produces a `10000x10000` matrix. This is not the output shape we want!

For correct element-wise operations, we need to get rid of the trailing unit axis in `res`.
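The effect is easier to see on tiny tensors. The sketch below uses made-up values, with `col` standing in for `res` and `vec` standing in for `y_valid`.

```python
import torch

col = torch.tensor([[1.0], [2.0], [3.0]])   # shape [3, 1], like `res`
vec = torch.tensor([1.0, 2.0, 3.0])         # shape [3], like `y_valid`

wrong = col - vec          # vec is treated as shape [1, 3], so the result is [3, 3]
right = col[:, 0] - vec    # both operands have shape [3], so this is element-wise

print(wrong.shape)   # torch.Size([3, 3])
print(right)         # tensor([0., 0., 0.])
```

Only the diagonal of `wrong` holds the differences we actually care about; every other entry pairs a prediction with the wrong target.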

In [56]:
# One option is to use indexing to remove the trailing dimension
res[:, 0].shape

torch.Size([10000])

In [57]:
# Or we can use squeeze() to drop all unit dimensions (not only trailing ones)
res.squeeze().shape

torch.Size([10000])

In [58]:
# Demo: Adding arbitrary unit dimensions
test = res[None, :, None, None]
test.shape

torch.Size([1, 10000, 1, 1, 1])

In [59]:
# Using squeeze()
test.squeeze().shape

torch.Size([10000])
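Because `squeeze()` with no argument removes every unit dimension, it can also drop the batch axis when a batch happens to contain a single row. The sketch below illustrates this with a hypothetical one-row tensor named `single`; passing a dimension to `squeeze` or indexing with `[:, 0]` avoids the issue.

```python
import torch

single = torch.zeros(1, 1)       # a hypothetical batch with one prediction, shape [1, 1]

print(single.squeeze().shape)    # torch.Size([]): the batch axis is gone too
print(single.squeeze(-1).shape)  # torch.Size([1]): only the last axis is removed
print(single[:, 0].shape)        # torch.Size([1])
```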

In [60]:
# Our results will now have the correct shape
(res.squeeze() - y_valid).shape

torch.Size([10000])

In [61]:
# Let's calculate our predictions for the training set
# Converting to float for MSE
y_train, y_valid = y_train.float(), y_valid.float()

preds = model(x_train)
preds.shape

torch.Size([50000, 1])

In [62]:
# Creating an MSE function
# Alternatively, we can use output[:, 0] instead of squeeze(), as we have already established
def mse(output, target): return (output.squeeze() - target).pow(2).mean()

In [63]:
mse(preds, y_train)

tensor(1549.37)
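One way to gain confidence in a hand-written MSE is to compare it with `torch.nn.functional.mse_loss` on random toy data. This is a sketch only; `mse_sketch`, `toy_out`, and `toy_tgt` are illustrative names and the values are random, not taken from MNIST.

```python
import torch
import torch.nn.functional as F

def mse_sketch(output, target):
    # Drop the trailing unit axis so both tensors have shape [n], then average the squared errors
    return (output[:, 0] - target).pow(2).mean()

torch.manual_seed(0)
toy_out = torch.randn(8, 1)   # hypothetical model outputs, shape [8, 1]
toy_tgt = torch.randn(8)      # hypothetical targets, shape [8]

ours = mse_sketch(toy_out, toy_tgt)
reference = F.mse_loss(toy_out[:, 0], toy_tgt)   # default reduction is the mean
print(torch.isclose(ours, reference))
```

Note that the unit axis is removed before calling `F.mse_loss` as well, so both inputs have matching shapes.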