# Neural Networks

To overcome the limitations of linear models such as linear regression, a multilayer perceptron (MLP) inserts a hidden layer between the input and the output. The hidden layer applies an affine transformation to the input matrix followed by a non-linear activation function, and its output is fed to the output layer, which performs the classification.
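
The non-linearity is what makes the extra layer worthwhile: without it, two stacked linear layers collapse into a single linear map. The following is a minimal sketch in plain Python (no PyTorch, biases omitted for brevity); the helper names `matmul` and `relu_matrix` and the example values are illustrative assumptions, not part of the notebook.

```python
def matmul(A, B):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def relu_matrix(M):
    """Apply max(x, 0) to every entry of a nested-list matrix."""
    return [[max(x, 0.0) for x in row] for row in M]


X = [[1.0, -2.0]]                # one example with two features
W1 = [[1.0, -1.0], [0.5, 2.0]]   # input -> hidden weights (2 x 2)
W2 = [[1.0], [1.0]]              # hidden -> output weights (2 x 1)

# Two linear layers without an activation in between ...
stacked = matmul(matmul(X, W1), W2)
# ... give the same result as one linear layer with weights W1 @ W2.
collapsed = matmul(X, matmul(W1, W2))
# With ReLU between the layers the result changes.
with_relu = matmul(relu_matrix(matmul(X, W1)), W2)

print(stacked, collapsed, with_relu)  # [[-5.0]] [[-5.0]] [[0.0]]
```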

In [None]:
!pip install d2l==1.0.0a0

In [None]:
import torch
from torch import nn
from d2l import torch as d2l

Start by creating a multilayer perceptron by defining its weights and activation function.

In [None]:
class MLPScratch(d2l.Classifier):
    def __init__(self, num_inputs, num_outputs, num_hiddens, lr, sigma=0.01):
        super().__init__()
        self.save_hyperparameters()
        self.W1 = nn.Parameter(torch.randn(num_inputs, num_hiddens) * sigma)
        self.b1 = nn.Parameter(torch.zeros(num_hiddens))
        self.W2 = nn.Parameter(torch.randn(num_hiddens, num_outputs) * sigma)
        self.b2 = nn.Parameter(torch.zeros(num_outputs))

In [None]:
def relu(X):
    a = torch.zeros_like(X)
    return torch.max(X, a)

Add the forward pass to the multilayer perceptron. <br>
The backward pass does not need to be written by hand: PyTorch computes the gradients through reverse-mode automatic differentiation (autograd), not symbolic differentiation.
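
To give a feeling for what automatic differentiation saves you from writing, here is a hedged sketch in plain Python of a hand-written backward pass for a tiny network with one input, one hidden unit and one output, checked against central finite differences. The network, the squared-error loss and all function names are assumptions made for illustration; this is not how PyTorch is implemented internally.

```python
def relu(z):
    return max(z, 0.0)


def forward(params, x):
    """Return pre-activation z, hidden value h and output y."""
    w1, b1, w2, b2 = params
    z = w1 * x + b1
    h = relu(z)
    y = w2 * h + b2
    return z, h, y


def loss(params, x, t):
    """Squared error between the output and the target t."""
    return (forward(params, x)[2] - t) ** 2


def backward(params, x, t):
    """Gradients of the loss via the chain rule, from output to input."""
    w1, b1, w2, b2 = params
    z, h, y = forward(params, x)
    dy = 2.0 * (y - t)                  # d loss / d y
    dw2, db2 = dy * h, dy               # output layer
    dz = dy * w2 * (1.0 if z > 0 else 0.0)  # through ReLU
    dw1, db1 = dz * x, dz               # hidden layer
    return [dw1, db1, dw2, db2]


def numerical_grad(params, x, t, eps=1e-6):
    """Central finite differences, used only to check the result."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, t) - loss(down, x, t)) / (2 * eps))
    return grads


params = [0.5, 0.1, -1.5, 0.2]  # w1, b1, w2, b2 (arbitrary values)
x, t = 2.0, 1.0
analytic = backward(params, x, t)
numeric = numerical_grad(params, x, t)
print(all(abs(a - n) < 1e-4 for a, n in zip(analytic, numeric)))  # True
```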

In [None]:
@d2l.add_to_class(MLPScratch)
def forward(self, X):
    X = X.reshape((-1, self.num_inputs))
    H = relu(torch.matmul(X, self.W1) + self.b1)
    return torch.matmul(H, self.W2) + self.b2

Train the model on the Fashion-MNIST dataset. <br>
Generally speaking, training produces two curves: the training loss, which measures the error on the data used to fit the parameters, and the validation loss, which measures the error on held-out data and guides hyperparameter tuning. <br>
A model is said to "overfit" the data if it does very well on the training dataset but cannot generalize to unseen inputs: whenever this happens, the training loss keeps decreasing while the validation loss stalls or starts to increase. <br>
On the other hand, a model is said to "underfit" the data if it is too simple or too strongly regularized to capture the underlying pattern: whenever this happens, both the training loss and the validation loss remain high.
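
As a rough illustration of how such loss curves can be read, the sketch below labels a pair of curves with a simple heuristic: training loss falling while validation loss rises above its best value suggests overfitting, and both losses ending high suggests underfitting. The function `diagnose_fit`, its thresholds `high_loss` and `tol`, and the example curves are hypothetical and chosen only for demonstration; real diagnostics depend on the task and the loss scale.

```python
def diagnose_fit(train_losses, val_losses, high_loss=1.0, tol=0.01):
    """Label per-epoch loss curves with a crude, illustrative heuristic."""
    train_drop = train_losses[0] - train_losses[-1]
    # How far the final validation loss sits above its best value.
    val_rebound = val_losses[-1] - min(val_losses)
    if train_drop > tol and val_rebound > tol:
        return "overfitting"
    if train_losses[-1] > high_loss and val_losses[-1] > high_loss:
        return "underfitting"
    return "no clear problem"


# Made-up curves, one value per epoch.
overfit = diagnose_fit([2.0, 1.0, 0.5, 0.2, 0.1], [2.1, 1.2, 0.9, 1.0, 1.2])
underfit = diagnose_fit([2.3, 2.2, 2.2, 2.1, 2.1], [2.3, 2.25, 2.2, 2.2, 2.15])
healthy = diagnose_fit([2.0, 1.0, 0.6, 0.5, 0.45], [2.1, 1.1, 0.7, 0.6, 0.55])
print(overfit, underfit, healthy, sep=" | ")
# overfitting | underfitting | no clear problem
```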

In [None]:
model = MLPScratch(num_inputs=784, num_outputs=10, num_hiddens=256, lr=0.1)
data = d2l.FashionMNIST(batch_size=256)
trainer = d2l.Trainer(max_epochs=10)
trainer.fit(model, data)

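When working with images, convolutional neural networks are usually a better fit than multilayer perceptrons, because they preserve the spatial structure that an MLP discards when it flattens the image. <br>
Generally speaking, a convolutional neural network takes an input image and, at the convolutional layer, slides a (small) kernel over it to produce an activation map containing the extracted features. Strictly speaking, the operation implemented below is a cross-correlation, which is what deep learning frameworks call convolution.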

In [None]:
def corr2d(X, K):  #@save
    # Compute the 2D cross-correlation of the input image X with the kernel K.
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

In [None]:
X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]) # Input tensor.
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]]) # Kernel tensor.
corr2d(X, K) # Activation map obtained by cross-correlating X with K.
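
The `corr2d` function slides the kernel over the image without flipping it, which is a cross-correlation; a convolution in the strict mathematical sense flips the kernel along both axes first. The sketch below shows the difference on the same 3x3 input and 2x2 kernel, using plain Python lists so that it runs on its own. The names `cross_correlate2d`, `flip_kernel` and `convolve2d` are illustrative and not part of the notebook or of PyTorch.

```python
def cross_correlate2d(X, K):
    """Slide K over X (both nested lists) and sum the elementwise products."""
    h, w = len(K), len(K[0])
    out_h, out_w = len(X) - h + 1, len(X[0]) - w + 1
    return [
        [
            sum(X[i + a][j + b] * K[a][b] for a in range(h) for b in range(w))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def flip_kernel(K):
    """Reverse the kernel along both axes."""
    return [row[::-1] for row in K[::-1]]


def convolve2d(X, K):
    """Strict convolution: cross-correlation with the flipped kernel."""
    return cross_correlate2d(X, flip_kernel(K))


X = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
K = [[0.0, 1.0], [2.0, 3.0]]

print(cross_correlate2d(X, K))  # [[19.0, 25.0], [37.0, 43.0]]
print(convolve2d(X, K))         # [[5.0, 11.0], [23.0, 29.0]]
```

Because the kernel is learned from data, the distinction rarely matters in practice: a network trained with one operation simply learns the flipped version of the kernel it would learn with the other.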

In [None]:
class Conv2D(nn.Module):
    def __init__(self, kernel_size):
        super().__init__()
        self.weight = nn.Parameter(torch.rand(kernel_size))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return corr2d(x, self.weight) + self.bias

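A simple application of 2D convolution is edge detection: a kernel that takes the difference between neighbouring pixels approximates the first derivative of the input image, so its output is non-zero only where the intensity changes.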
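
A one-dimensional sketch makes the link with the derivative explicit. Applying the kernel `[1, -1]` along a row amounts to computing `row[j] - row[j + 1]`, a finite difference that vanishes wherever neighbouring pixels are equal. The helper name `neighbour_difference` is an assumption made for this example, and plain Python lists are used so the snippet runs without PyTorch.

```python
def neighbour_difference(values):
    """Cross-correlate a 1D signal with the kernel [1, -1]."""
    return [values[j] - values[j + 1] for j in range(len(values) - 1)]


# One row of an image that is white (1) at the borders and black (0) inside.
row = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
print(neighbour_difference(row))
# [0.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0]

# One column of the same image is constant, so no edge is found along it.
column = [1.0] * 6
print(neighbour_difference(column))
# [0.0, 0.0, 0.0, 0.0, 0.0]
```

The sign of the response tells the direction of the transition: `1` marks a change from white to black and `-1` a change from black to white.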

In [None]:
X = torch.ones((6, 8))
X[:, 2:6] = 0
X

K = torch.tensor([[1.0, -1.0]])

Y = corr2d(X, K)
Y

# Try to perform the convolution on the transposed image.
corr2d(X.t(), K) # Since K only detects vertical edges, nothing will be detected on the transposed image.

Now, try to learn the kernel from data: given the input X and the filtered output Y, gradient descent on the squared error should recover a kernel close to K.

In [None]:
# Construct a two-dimensional convolutional layer with 1 output channel and a kernel of shape 1x2 (ignore the bias).
conv2d = nn.LazyConv2d(1, kernel_size=(1, 2), bias=False)

# The two-dimensional convolutional layer uses four-dimensional input and output in the format of (example, channel, height, width), where the batch size (number of examples in the batch) and the number of channels are both 1.
X = X.reshape((1, 1, 6, 8))
Y = Y.reshape((1, 1, 6, 7))
lr = 3e-2  # Learning rate for optimization.

for i in range(10):
    Y_hat = conv2d(X)
    l = (Y_hat - Y) ** 2
    conv2d.zero_grad()
    l.sum().backward()
    # Update the kernel.
    conv2d.weight.data[:] -= lr * conv2d.weight.grad
    if (i + 1) % 2 == 0:
        print(f'epoch {i + 1}, loss {l.sum():.3f}')

In [None]:
conv2d.weight.data.reshape((1, 2)) # Inspect the learned kernel; it should be close to K = [1, -1].
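
The training loop above relies on autograd for the gradient. For a 1x2 kernel the same update can be written out by hand, which may help to see what each step does. The following plain-Python sketch rebuilds the 6x8 image and its target, then runs gradient descent on the summed squared error with the same learning rate. The function name `fit_kernel` and the zero initialization are assumptions made for reproducibility (the PyTorch layer starts from random weights), so the numbers will not match the notebook output exactly.

```python
# Image from the edge-detection example: 6 rows, columns 2..5 set to 0.
X = [[1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0] for _ in range(6)]
# Target: the output of the kernel [1, -1] on every row.
Y = [[row[j] - row[j + 1] for j in range(7)] for row in X]


def fit_kernel(X, Y, lr=3e-2, steps=10, init=(0.0, 0.0)):
    """Learn a 1x2 kernel (w0, w1) by gradient descent on squared error."""
    w0, w1 = init
    for _ in range(steps):
        g0 = g1 = 0.0
        for x_row, y_row in zip(X, Y):
            for j, y in enumerate(y_row):
                err = w0 * x_row[j] + w1 * x_row[j + 1] - y
                # Derivative of err**2 with respect to each weight.
                g0 += 2.0 * err * x_row[j]
                g1 += 2.0 * err * x_row[j + 1]
        w0 -= lr * g0
        w1 -= lr * g1
    return w0, w1


short_run = fit_kernel(X, Y, steps=10)   # already close to (1, -1)
long_run = fit_kernel(X, Y, steps=100)   # converges to (1, -1)
print(short_run, long_run)
```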

# Exercises

## Exercise 1
Create an image with diagonal edges.
1. Apply the edge detection kernel to the image.
2. What happens when the image is transposed?
3. What happens when the kernel is transposed?

In [None]:
# Start by creating an input image.
X = torch.eye(8) # Identity tensor of size 8x8.

# Now, define the edge detection kernel.
K_1 = torch.tensor([[0.0, -1.0], [1.0, 0.0]]) # Filter that responds to the edges on either side of the main diagonal.
K_2 = torch.tensor([[0.0, 1.0], [-1.0, 0.0]]) # Same filter with the opposite sign (K_2 = -K_1).

# Now, apply convolution to find the edges.
corr2d(X, K_1)
# corr2d(X, K_2)

# Now, try to transpose X.
corr2d(X.t(), K_1)
# corr2d(X.t(), K_2)

# Lastly, try to transpose K.
corr2d(X, K_1.t())
# corr2d(X, K_2.t())
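
The three answers can be checked without running PyTorch. The sketch below repeats the experiment on nested lists; `cross_correlate2d` and `transpose` are illustrative helpers, not notebook functions. It relies on two facts: the identity image is symmetric, so transposing it changes nothing, and the kernel `[[0, -1], [1, 0]]` is antisymmetric, so transposing it only flips the sign of the output.

```python
def cross_correlate2d(X, K):
    """Slide K over X (both nested lists) and sum the elementwise products."""
    h, w = len(K), len(K[0])
    return [
        [
            sum(X[i + a][j + b] * K[a][b] for a in range(h) for b in range(w))
            for j in range(len(X[0]) - w + 1)
        ]
        for i in range(len(X) - h + 1)
    ]


def transpose(M):
    return [list(col) for col in zip(*M)]


n = 8
X = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
K_1 = [[0.0, -1.0], [1.0, 0.0]]

base = cross_correlate2d(X, K_1)
# 1. Zero on the diagonal, +1 just above it, -1 just below it.
print(base[0][0], base[0][1], base[1][0])  # 0.0 1.0 -1.0
# 2. Transposing the image leaves the result unchanged.
print(cross_correlate2d(transpose(X), K_1) == base)  # True
# 3. Transposing the kernel negates the result.
negated = [[-v for v in row] for row in base]
print(cross_correlate2d(X, transpose(K_1)) == negated)  # True
```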