# Implementation of Multilayer Perceptrons from Scratch

Now that we have characterized
multilayer perceptrons (MLPs) mathematically,
let us try to implement one ourselves. To compare against our previous results
achieved with softmax regression,
we will continue to work with
the Fashion-MNIST image classification dataset.


In [1]:
# This line is only necessary if you are working in Google Colab
! pip install d2l

import torch
from torch import nn
from d2l import torch as d2l





In [2]:
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

## Initializing Model Parameters

Recall that Fashion-MNIST contains 10 classes,
and that each image consists of a $28 \times 28 = 784$
grid of grayscale pixel values.
Again, we will disregard the spatial structure
among the pixels for now,
so we can think of this as simply a classification dataset
with 784 input features and 10 classes.
To begin, we will **implement an MLP
with one hidden layer and 256 hidden units.**
Note that we can regard both of these quantities
as hyperparameters.
Typically, we choose layer widths in powers of 2,
which tend to be computationally efficient because
of how memory is allocated and addressed in hardware.

Again, we will represent our parameters with several tensors.
Note that *for every layer*, we must keep track of
one weight matrix and one bias vector.
As always, we allocate memory
for the gradients of the loss with respect to these parameters.


In [3]:
num_inputs, num_outputs, num_hiddens = 784, 10, 256

W1 = nn.Parameter(
    torch.randn(num_inputs, num_hiddens, requires_grad=True) * 0.01)
b1 = nn.Parameter(torch.zeros(num_hiddens, requires_grad=True))
W2 = nn.Parameter(
    torch.randn(num_hiddens, num_outputs, requires_grad=True) * 0.01)
b2 = nn.Parameter(torch.zeros(num_outputs, requires_grad=True))

params = [W1, b1, W2, b2]
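
As a quick sanity check on the shapes above, the short sketch below counts how many scalar parameters this one-hidden-layer MLP holds. It uses only the layer sizes given in the text; the `shapes` dictionary and the `num_elements` helper are illustrative names, not part of the original code.

```python
# Layer sizes taken from the text: 784 inputs, 256 hidden units, 10 outputs.
num_inputs, num_outputs, num_hiddens = 784, 10, 256

# Each layer contributes one weight matrix and one bias vector.
shapes = {
    "W1": (num_inputs, num_hiddens),
    "b1": (num_hiddens,),
    "W2": (num_hiddens, num_outputs),
    "b2": (num_outputs,),
}


def num_elements(shape):
    """Number of scalars in a tensor of the given shape (illustrative helper)."""
    count = 1
    for dim in shape:
        count *= dim
    return count


counts = {name: num_elements(shape) for name, shape in shapes.items()}
total = sum(counts.values())
print(counts)  # per-tensor parameter counts
print(total)   # 203530
```

Almost all of the parameters sit in `W1`, which is one reason the hidden-layer width is a hyperparameter worth tuning.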

## Activation Function

To make sure we know how everything works,
we will **implement the ReLU activation** ourselves
using the maximum function rather than
invoking the built-in `relu` function directly.


In [4]:
def relu(X):
    a = torch.zeros_like(X)
    return torch.max(X, a)
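
One way to convince ourselves that this hand-written version behaves as intended is to compare it with the built-in `torch.relu` on a small tensor. The input values below are arbitrary and chosen only for illustration.

```python
import torch


def relu(X):
    a = torch.zeros_like(X)
    return torch.max(X, a)


# A small example tensor with negative, zero, and positive entries.
X = torch.tensor([[-2.0, -0.5, 0.0], [0.5, 1.0, 3.0]])
out = relu(X)

# Negative entries are clipped to zero; non-negative entries pass through.
matches_builtin = torch.equal(out, torch.relu(X))
print(out)
print(matches_builtin)
```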

## Model

Because we are disregarding spatial structure,
we `reshape` each two-dimensional image into
a flat vector of length `num_inputs`.
Finally, we **implement our model**
with just a few lines of code.


In [5]:
def net(X):
    X = X.reshape((-1, num_inputs))
    H = relu(X @ W1 + b1)  # Here '@' stands for matrix multiplication
    return (H @ W2 + b2)
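
To see the shapes involved, the following self-contained sketch pushes a random stand-in batch through the same computation. The batch of four random "images" is an assumption made purely for illustration (it is not Fashion-MNIST data), and the parameters are plain tensors because no gradients are needed here.

```python
import torch

num_inputs, num_outputs, num_hiddens = 784, 10, 256

# Plain tensors suffice for a shape check; no gradients are needed here.
W1 = torch.randn(num_inputs, num_hiddens) * 0.01
b1 = torch.zeros(num_hiddens)
W2 = torch.randn(num_hiddens, num_outputs) * 0.01
b2 = torch.zeros(num_outputs)


def relu(X):
    return torch.max(X, torch.zeros_like(X))


def net(X):
    X = X.reshape((-1, num_inputs))  # (batch, 1, 28, 28) -> (batch, 784)
    H = relu(X @ W1 + b1)            # (batch, 256); b1 is broadcast over the batch
    return H @ W2 + b2               # (batch, 10): one raw score per class


# Random stand-in for a minibatch of four 28x28 grayscale images.
X = torch.rand(4, 1, 28, 28)
out = net(X)
print(out.shape)  # torch.Size([4, 10])
```

Note that the outputs are unnormalized scores rather than probabilities, since no softmax is applied inside `net`.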

## Loss Function

To ensure numerical stability,
and because we already implemented
the softmax function from scratch,
we use the built-in loss from the high-level API,
which computes softmax and cross-entropy together
and therefore expects the raw, unnormalized outputs of `net`.
We encourage the interested reader
to examine the source code for the loss function
to deepen their knowledge of implementation details.


In [6]:
loss = nn.CrossEntropyLoss()
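
The sketch below illustrates both points on made-up logits: `nn.CrossEntropyLoss` agrees with a manual log-sum-exp computation of the cross-entropy, and it stays finite on extreme inputs where a naive softmax overflows. All numbers are arbitrary examples.

```python
import torch
from torch import nn

loss = nn.CrossEntropyLoss()

# Made-up raw scores (logits) for two examples and three classes.
logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])

builtin = loss(logits, labels)

# Manual version: -log softmax(logits)[label] = logsumexp(logits) - logits[label],
# averaged over the batch (the default reduction of nn.CrossEntropyLoss).
picked = logits[torch.arange(len(labels)), labels]
manual = (torch.logsumexp(logits, dim=1) - picked).mean()
print(builtin, manual)

# With very large logits, a naive softmax overflows: exp(1000) is inf in
# float32, and inf / inf yields nan.
big = torch.tensor([[1000.0, 0.0, -1000.0]])
naive = torch.exp(big) / torch.exp(big).sum(dim=1, keepdim=True)
stable = loss(big, torch.tensor([0]))
print(naive)   # contains nan
print(stable)  # finite
```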

## Training

Fortunately, **the training loop for MLPs
is exactly the same as for softmax regression.**
Leveraging the `d2l` package again,
we call the `train_ch3` function,
setting the number of epochs to 10
and the learning rate to 0.1.


In [7]:
num_epochs, lr = 10, 0.1
updater = torch.optim.SGD(params, lr=lr)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)

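Note that not every release of the `d2l` package provides `train_ch3`.
If the call above fails with
`AttributeError: module 'd2l.torch' has no attribute 'train_ch3'`,
the installed release does not include this helper.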
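
If `train_ch3` is unavailable in the installed `d2l` release, a plain PyTorch loop can play the same role. The following is a minimal sketch, not the `d2l` implementation: `train_epoch` is a hypothetical helper, and the random tensors stand in for Fashion-MNIST so that the example runs without downloading data (with the real dataset, `train_iter` would be passed instead of `data_iter`).

```python
import torch
from torch import nn
from torch.utils import data


def train_epoch(net, data_iter, loss, updater):
    """Run one pass over data_iter; return the average loss per example."""
    total_loss, num_examples = 0.0, 0
    for X, y in data_iter:
        l = loss(net(X), y)      # mean loss over the minibatch
        updater.zero_grad()      # clear gradients from the previous step
        l.backward()             # backpropagate
        updater.step()           # SGD parameter update
        total_loss += l.item() * y.numel()
        num_examples += y.numel()
    return total_loss / num_examples


torch.manual_seed(0)
num_inputs, num_outputs, num_hiddens = 784, 10, 256

# Synthetic stand-in data (assumption): random images with random labels.
X_all = torch.rand(512, 1, 28, 28)
y_all = torch.randint(0, num_outputs, (512,))
data_iter = data.DataLoader(data.TensorDataset(X_all, y_all),
                            batch_size=256, shuffle=True)

W1 = nn.Parameter(torch.randn(num_inputs, num_hiddens) * 0.01)
b1 = nn.Parameter(torch.zeros(num_hiddens))
W2 = nn.Parameter(torch.randn(num_hiddens, num_outputs) * 0.01)
b2 = nn.Parameter(torch.zeros(num_outputs))
params = [W1, b1, W2, b2]


def net(X):
    X = X.reshape((-1, num_inputs))
    H = torch.relu(X @ W1 + b1)
    return H @ W2 + b2


loss = nn.CrossEntropyLoss()
updater = torch.optim.SGD(params, lr=0.1)
losses = [train_epoch(net, data_iter, loss, updater) for _ in range(3)]
print(losses)
```

Because the labels here are random, the loss is expected to stay close to $\ln 10 \approx 2.3$; meaningful learning curves require the real training data.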

To evaluate the learned model,
we **apply it to some test data**.


In [None]:
d2l.predict_ch3(net, test_iter)
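
A numeric complement to inspecting individual predictions is the classification accuracy. The sketch below shows one way to compute it with plain PyTorch; `accuracy` and `toy_net` are hypothetical names, and the tiny hand-made batches replace `test_iter` so that the result can be checked by hand.

```python
import torch


def accuracy(net, data_iter):
    """Fraction of examples whose highest-scoring class equals the label."""
    correct, total = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for X, y in data_iter:
            preds = net(X).argmax(dim=1)  # predicted class per example
            correct += (preds == y).sum().item()
            total += y.numel()
    return correct / total


def toy_net(X):
    """Hypothetical stand-in model: treats its input directly as logits."""
    return X


logits = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.0, 0.1, 0.9],
                       [0.3, 0.2, 0.1]])
labels = torch.tensor([1, 0, 2, 1])

# Two minibatches of two examples each; any iterable of (X, y) pairs works.
data_iter = [(logits[:2], labels[:2]), (logits[2:], labels[2:])]
acc = accuracy(toy_net, data_iter)
print(acc)  # 0.75: three of the four predictions match their labels
```

With the trained model, the call would presumably be `accuracy(net, test_iter)`.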

## Summary

* We saw that implementing a simple MLP is easy, even when done manually.
* However, with a large number of layers, implementing MLPs from scratch can still get messy (e.g., naming and keeping track of our model's parameters).


## Exercises

1. Try adding an additional hidden layer with a different number of neurons than the previous one. Use the hyperbolic tangent (tanh) activation function after the second hidden layer and see how it affects the results.

[Discussions](https://discuss.d2l.ai/t/93)
