# Concise Implementation of Multilayer Perceptrons

Now that we have learned how multilayer perceptrons (MLPs) work in theory, let's implement them. As always, we begin by
importing the modules we need.

```python
import sys
sys.path.insert(0, '..')
import d2l
from d2l.data import load_data_fashion_mnist
from d2l.train import train_ch3

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
```

# The Model

The only difference from our softmax regression implementation is that we now use two fully connected (`nn.Linear`) layers
instead of one. The first is the hidden layer, which has 256 hidden units and is followed by the ReLU activation function.
The second is the output layer, which produces one score for each of the 10 classes.
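As a quick sanity check on the size of this model, we can count its parameters. Each `nn.Linear(n_in, n_out)` layer stores an `n_out` by `n_in` weight matrix plus `n_out` biases. The sketch below assumes the default sizes used in the model code (784 inputs for flattened 28×28 images, 256 hidden units, 10 outputs); the helper `linear_params` is hypothetical and introduced only for this illustration.

```python
import torch.nn as nn

# Layer sizes assumed from the model's defaults: 784 inputs, 256 hidden units, 10 outputs
num_inputs, num_hiddens, num_outputs = 784, 256, 10

def linear_params(n_in, n_out):
    """Hypothetical helper: weights (n_out x n_in) plus biases (n_out) of one layer."""
    return n_in * n_out + n_out

by_hand = linear_params(num_inputs, num_hiddens) + linear_params(num_hiddens, num_outputs)

# Cross-check against PyTorch's own parameter tensors (ReLU has no parameters)
layers = [nn.Linear(num_inputs, num_hiddens), nn.Linear(num_hiddens, num_outputs)]
by_torch = sum(p.numel() for layer in layers for p in layer.parameters())

print(by_hand, by_torch)  # 203530 203530
```

The hidden layer accounts for 200,960 of these 203,530 parameters, so almost all of the model's capacity sits in the first layer.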

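```python
class Net(nn.Module):
    """An MLP with one hidden layer: Linear -> ReLU -> Linear."""

    def __init__(self, num_inputs=784, num_outputs=10, num_hiddens=256):
        super().__init__()
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        self.num_hiddens = num_hiddens

        # Hidden layer: num_inputs -> num_hiddens (784 -> 256 by default)
        self.linear_1 = nn.Linear(num_inputs, num_hiddens)
        # Output layer: num_hiddens -> num_outputs (256 -> 10 by default)
        self.linear_2 = nn.Linear(num_hiddens, num_outputs)
        self.relu = nn.ReLU()

    def forward(self, X):
        # Flatten each image into a vector of length num_inputs
        X = X.reshape((-1, self.num_inputs))
        H1 = self.relu(self.linear_1(X))
        # Return raw class scores (logits); no softmax is applied here
        return self.linear_2(H1)

net = Net()
print(net)
```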

```
Net(
  (linear_1): Linear(in_features=784, out_features=256, bias=True)
  (linear_2): Linear(in_features=256, out_features=10, bias=True)
  (relu): ReLU()
)
```
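Because this network is just a stack of layers, the same architecture can also be written without a custom class. The sketch below is an alternative formulation, not the code used in this section: it uses `nn.Flatten` in place of the `reshape` call in `forward`, and checks the output shape on a dummy batch shaped like Fashion-MNIST images. The name `seq_net` is chosen only for this example.

```python
import torch
import torch.nn as nn

# Same Linear -> ReLU -> Linear MLP, expressed with nn.Sequential
seq_net = nn.Sequential(
    nn.Flatten(),          # (batch, 1, 28, 28) -> (batch, 784)
    nn.Linear(784, 256),   # hidden layer
    nn.ReLU(),
    nn.Linear(256, 10),    # output layer: one logit per class
)

# Dummy batch of 2 images: (batch, channels, height, width)
X = torch.randn(2, 1, 28, 28)
logits = seq_net(X)
print(logits.shape)  # torch.Size([2, 10])
```

A custom `nn.Module` subclass is more flexible when `forward` needs control flow, while `nn.Sequential` is shorter for a plain chain of layers.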


Training the model follows exactly the same steps as in our softmax regression implementation.
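The helper `train_ch3` comes from the local `d2l` package, and its internals are not shown here. As a rough illustration of the steps a training function like it typically performs for each minibatch (forward pass, loss, backward pass, parameter update), here is a minimal loop assuming plain minibatch SGD. The function `train_sketch`, the `toy_*` names, and the random dataset standing in for Fashion-MNIST are all hypothetical, chosen so the example runs without downloading data.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in for Fashion-MNIST: 512 random 1x28x28 "images", 10 classes
toy_features = torch.randn(512, 1, 28, 28)
toy_labels = torch.randint(0, 10, (512,))
toy_iter = DataLoader(TensorDataset(toy_features, toy_labels), batch_size=64, shuffle=True)

toy_net = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def train_sketch(net, data_iter, num_epochs, lr):
    """Minimal minibatch-SGD loop; a sketch, not the actual d2l train_ch3."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=lr)
    history = []
    for epoch in range(num_epochs):
        total_loss, correct, n = 0.0, 0, 0
        for X, y in data_iter:
            logits = net(X)                  # forward pass
            loss = criterion(logits, y)      # mean cross-entropy over the batch
            optimizer.zero_grad()            # clear gradients from the previous step
            loss.backward()                  # backpropagation
            optimizer.step()                 # SGD parameter update
            total_loss += loss.item() * y.shape[0]
            correct += (logits.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        history.append((total_loss / n, correct / n))
        print(f"epoch {epoch + 1}, loss {total_loss / n:.4f}, train acc {correct / n:.3f}")
    return history

history = train_sketch(toy_net, toy_iter, num_epochs=3, lr=0.1)
```

Since the labels here are random, the accuracies printed by this sketch say nothing about real Fashion-MNIST performance; the point is only the order of operations inside the loop.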

```python
num_epochs, lr, batch_size = 10, 0.5, 256
train_iter, test_iter = load_data_fashion_mnist(batch_size)
criterion = nn.CrossEntropyLoss()
train_ch3(net, train_iter, test_iter, criterion, num_epochs, batch_size, lr)
```

```
epoch 1, loss 0.0030, train acc 0.718, test acc 0.673
epoch 2, loss 0.0019, train acc 0.818, test acc 0.834
epoch 3, loss 0.0016, train acc 0.844, test acc 0.832
epoch 4, loss 0.0015, train acc 0.857, test acc 0.838
epoch 5, loss 0.0014, train acc 0.865, test acc 0.786
epoch 6, loss 0.0014, train acc 0.870, test acc 0.847
epoch 7, loss 0.0013, train acc 0.878, test acc 0.856
epoch 8, loss 0.0013, train acc 0.882, test acc 0.859
epoch 9, loss 0.0012, train acc 0.884, test acc 0.834
epoch 10, loss 0.0012, train acc 0.889, test acc 0.860
```
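Notice that the model's `forward` method returns raw scores (logits) without a softmax. This works because `nn.CrossEntropyLoss` combines `log_softmax` and the negative log-likelihood loss in a single, numerically stable step. The short check below uses made-up logits and labels, purely for illustration, to confirm that the built-in loss matches both a two-step and a fully explicit formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up logits for a batch of 3 examples over 10 classes, plus their labels
torch.manual_seed(0)
logits = torch.randn(3, 10)
labels = torch.tensor([2, 7, 0])

loss_builtin = nn.CrossEntropyLoss()(logits, labels)
# Two-step version: log-softmax over classes, then mean negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
# Fully explicit version: -log p(correct class), averaged over the batch
probs = torch.softmax(logits, dim=1)
loss_explicit = -torch.log(probs[torch.arange(3), labels]).mean()

print(loss_builtin.item(), loss_manual.item(), loss_explicit.item())
```

Adding a softmax at the end of `forward` would therefore be a mistake with this loss, since the softmax would effectively be applied twice.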


# Exercises

1. Try adding a few more hidden layers to see how the result changes.
2. Try out different activation functions. Which ones work best?
3. Try out different initializations of the weights.
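One possible starting point for these exercises is a small helper that builds an MLP with configurable depth, activation function, and weight initialization. The function `make_mlp` and its arguments are hypothetical, introduced only for this sketch; `nn.init.xavier_uniform_` and `nn.init.zeros_` are standard PyTorch initializers, and any other in-place initializer from `torch.nn.init` could be passed instead.

```python
import torch
import torch.nn as nn

def make_mlp(hidden_sizes, activation=nn.ReLU, init=None, num_inputs=784, num_outputs=10):
    """Hypothetical helper: Flatten -> [Linear -> activation] per hidden size -> Linear."""
    layers = [nn.Flatten()]
    prev = num_inputs
    for h in hidden_sizes:
        layers += [nn.Linear(prev, h), activation()]
        prev = h
    layers.append(nn.Linear(prev, num_outputs))
    net = nn.Sequential(*layers)
    if init is not None:
        # Re-initialize every Linear layer: custom weights, zero biases
        for m in net.modules():
            if isinstance(m, nn.Linear):
                init(m.weight)
                nn.init.zeros_(m.bias)
    return net

# Exercise 1: three hidden layers; Exercise 2: Tanh; Exercise 3: Xavier initialization
deep_net = make_mlp([256, 128, 64], activation=nn.Tanh, init=nn.init.xavier_uniform_)
print(deep_net)
out = deep_net(torch.randn(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 10])
```

The resulting network can be passed to the same training code as `net`, which makes it easy to compare variants under identical settings.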
