# PyTorch MLP example

This notebook shows how to implement a multilayer perceptron (MLP) in PyTorch. First, import PyTorch:

In [1]:
import torch

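Define the training data with `torch.tensor`. Integer literals produce an integer tensor, so you usually have to call `.float()` to convert the data to floating point; otherwise the linear layers, whose weights are floating point, raise dtype errors during training.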

In [2]:
# Data for XOR
x = torch.tensor([[0, 0],
                  [0, 1],
                  [1, 0],
                  [1, 1]]).float()
t = torch.tensor([[0],
                  [1],
                  [1],
                  [0]]).float()
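
The effect of `.float()` can be checked directly. The following is a minimal sketch; the variable names `x_int`, `x_float` and `x_direct` are only for illustration and are not used elsewhere in the notebook.

```python
import torch

# Integer literals give an integer tensor (int64 by default).
x_int = torch.tensor([[0, 1], [1, 0]])
print(x_int.dtype)    # torch.int64

# .float() converts to float32, the default dtype of torch.nn.Linear weights.
x_float = x_int.float()
print(x_float.dtype)  # torch.float32

# Equivalent alternative: request the dtype when creating the tensor.
x_direct = torch.tensor([[0, 1], [1, 0]], dtype=torch.float32)
print(torch.equal(x_float, x_direct))  # True
```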

A network structure is created by defining a class that extends `torch.nn.Module`. The layers are defined in the initializer (`__init__`). The linear layer is the one you are familiar with from previous exercises; it is created with `torch.nn.Linear(n_inputs, n_units)`. Here, we create a network with a single hidden layer. The `forward` function defines the forward propagation, that is, how to compute the network output from the input using those layers. PyTorch automatically derives the gradients for backpropagation from this forward definition.

In [4]:
# Define neural network
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super().__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)
        self.predict = torch.nn.Linear(n_hidden, n_output)

    def forward(self, x):
        x = torch.sigmoid(self.hidden(x))
        x = self.predict(x)
        return x
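
To connect `torch.nn.Linear` with the linear layer from earlier exercises, the sketch below checks that a layer computes `x @ W.T + b`. The names `layer`, `x_demo` and `manual` are illustrative, and the seed is fixed only to make the run repeatable.

```python
import torch

torch.manual_seed(0)  # fixed seed only so the sketch is repeatable

layer = torch.nn.Linear(2, 4)       # 2 inputs, 4 units
x_demo = torch.tensor([[0., 1.],
                       [1., 1.]])   # two example inputs

# layer.weight has shape (4, 2) and layer.bias has shape (4,)
manual = x_demo @ layer.weight.T + layer.bias

print(manual.shape)                           # torch.Size([2, 4])
print(torch.allclose(manual, layer(x_demo)))  # True
```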

Now we can instantiate our model using the class defined above:

In [5]:
model = Net(n_feature=2, n_hidden=4, n_output=1)

We define the loss function to use, in this case the Mean Square Error (MSE).

In [6]:
loss_func = torch.nn.MSELoss()
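
With its default settings, `torch.nn.MSELoss` returns the mean of the squared differences over all elements. The sketch below compares it with a hand-written version on the XOR targets; the constant prediction of 0.5 is a made-up value chosen for illustration, not an output of the model.

```python
import torch

loss_func = torch.nn.MSELoss()  # default reduction="mean"

t = torch.tensor([[0.], [1.], [1.], [0.]])  # XOR targets
prediction = torch.full_like(t, 0.5)         # hypothetical output: 0.5 for every input

by_hand = ((prediction - t) ** 2).mean()

print(by_hand.item())                   # 0.25
print(loss_func(prediction, t).item())  # 0.25
```

A loss of 0.25 is therefore what a network gets on XOR when it answers 0.5 for every input, which is a useful value to recognize when reading training logs for this problem.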

We also have to define the optimizer. We will use stochastic gradient descent (SGD), as in our own implementation, with a learning rate of 0.1.

In [7]:
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
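
For plain SGD without momentum, one optimizer step amounts to subtracting the learning rate times the gradient from each parameter. The sketch below checks this on a small stand-in layer; `layer_a`, `layer_b`, `x_demo` and `t_demo` are illustrative names and values, not part of the notebook.

```python
import copy
import torch

torch.manual_seed(0)
layer_a = torch.nn.Linear(2, 1)
layer_b = copy.deepcopy(layer_a)  # identical starting weights

x_demo = torch.tensor([[0., 1.], [1., 0.]])
t_demo = torch.tensor([[1.], [1.]])
lr = 0.1

# (a) Update through the optimizer.
optimizer = torch.optim.SGD(layer_a.parameters(), lr=lr)
loss_a = torch.nn.functional.mse_loss(layer_a(x_demo), t_demo)
optimizer.zero_grad()
loss_a.backward()
optimizer.step()

# (b) The same update written out by hand.
loss_b = torch.nn.functional.mse_loss(layer_b(x_demo), t_demo)
loss_b.backward()
with torch.no_grad():
    for p in layer_b.parameters():
        p -= lr * p.grad

print(torch.allclose(layer_a.weight, layer_b.weight))  # True
print(torch.allclose(layer_a.bias, layer_b.bias))      # True
```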

Next we define a loop to train the network. Each iteration consists of five steps, each explained in the comments below:

In [8]:
num_epochs = 100

for _ in range(num_epochs):
    prediction = model(x)            # Forward pass; saves the intermediate values required for the backward pass
    loss = loss_func(prediction, t)  # Computes the loss (mean over all examples) using the loss function defined above
    optimizer.zero_grad()            # Clears the gradients from the previous iteration
    loss.backward()                  # Backpropagates the error through the network
    optimizer.step()                 # Updates the weights

    #print("Prediction: ", prediction.detach().numpy())
    print("Loss: ", loss.detach().numpy())


Loss:  0.26167148
Loss:  0.25482735
Loss:  0.25218737
Loss:  0.25116548
Loss:  0.25076747
Loss:  0.25061008
Loss:  0.25054568
Loss:  0.25051713
Loss:  0.2505025
Loss:  0.25049326
Loss:  0.25048608
Loss:  0.25047976
Loss:  0.2504737
Loss:  0.25046784
Loss:  0.25046209
Loss:  0.2504563
Loss:  0.2504506
Loss:  0.25044492
Loss:  0.25043923
Loss:  0.2504336
Loss:  0.25042802
Loss:  0.25042242
Loss:  0.25041682
Loss:  0.25041133
Loss:  0.2504058
Loss:  0.25040027
Loss:  0.25039485
Loss:  0.25038937
Loss:  0.25038397
Loss:  0.25037855
Loss:  0.25037318
Loss:  0.25036782
Loss:  0.25036246
Loss:  0.25035715
Loss:  0.25035188
Loss:  0.25034657
Loss:  0.25034133
Loss:  0.2503361
Loss:  0.2503309
Loss:  0.25032568
Loss:  0.25032052
Loss:  0.2503154
Loss:  0.25031027
Loss:  0.25030518
Loss:  0.25030008
Loss:  0.250295
Loss:  0.25028995
Loss:  0.25028494
Loss:  0.25027993
Loss:  0.25027496
Loss:  0.25027
Loss:  0.250265
Loss:  0.25026006
Loss:  0.25025517
Loss:  0.25025028
Loss:  0.2502454
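
After training, the predictions can be inspected without recording a computation graph. This is a minimal sketch under a few assumptions: the `torch.nn.Sequential` network is an untrained stand-in with the same layer sizes as `Net`, and the 0.5 threshold is an assumed convention for turning outputs into 0/1 labels.

```python
import torch

# Stand-in for the trained `model` from the notebook (same layer sizes, untrained here).
model = torch.nn.Sequential(
    torch.nn.Linear(2, 4),
    torch.nn.Sigmoid(),
    torch.nn.Linear(4, 1),
)
x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

model.eval()           # good habit; it changes nothing for this architecture
with torch.no_grad():  # no graph is recorded inside this block
    prediction = model(x)

labels = (prediction > 0.5).float()  # assumed threshold for 0/1 targets

print(prediction.requires_grad)  # False
print(labels.shape)              # torch.Size([4, 1])
```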

## Additional optimizer options
For faster convergence, the Adam optimizer can be useful. It employs adaptive learning rates specific to each weight. In this example you can see that the loss decreases much faster when using Adam:

In [9]:
model = Net(n_feature=2, n_hidden=4, n_output=1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

num_epochs = 100

for _ in range(num_epochs):
    prediction = model(x)
    loss = loss_func(prediction, t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    print("Loss: ", loss.detach().numpy())

Loss:  0.41995543
Loss:  0.2522374
Loss:  0.30000538
Loss:  0.33381772
Loss:  0.3005984
Loss:  0.26097837
Loss:  0.2492712
Loss:  0.2626231
Loss:  0.27919346
Loss:  0.28276792
Loss:  0.2728445
Loss:  0.2584318
Loss:  0.24892366
Loss:  0.24847585
Loss:  0.25455806
Loss:  0.26072598
Loss:  0.2618304
Loss:  0.25737414
Loss:  0.2507068
Loss:  0.24587166
Loss:  0.24489293
Loss:  0.24693891
Loss:  0.24940056
Loss:  0.24988283
Loss:  0.24768868
Loss:  0.24394368
Loss:  0.24056448
Loss:  0.23890564
Loss:  0.23893836
Loss:  0.23943803
Loss:  0.23899789
Loss:  0.23706794
Loss:  0.23422623
Loss:  0.23158133
Loss:  0.22987819
Loss:  0.22900188
Loss:  0.22818862
Loss:  0.22669646
Loss:  0.22436437
Loss:  0.22164366
Loss:  0.21915263
Loss:  0.21716228
Loss:  0.21542373
Loss:  0.21344072
Loss:  0.21091351
Loss:  0.20796216
Loss:  0.20496595
Loss:  0.20221032
Loss:  0.19966818
Loss:  0.19708279
Loss:  0.19423817
Loss:  0.19115537
Loss:  0.1880534
Loss:  0.18514189
Loss:  0.18245135
Loss:  0.17984718
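
Because the cell above creates a new model, the two runs start from different random weights. A like-for-like comparison would start both optimizers from the same weights. The sketch below is one possible way to do that; `make_net` and `train` are hypothetical helpers, and the `torch.nn.Sequential` network is a stand-in with the same layer sizes as `Net`.

```python
import copy
import torch

def make_net():
    # Stand-in with the same layer sizes as Net(2, 4, 1).
    return torch.nn.Sequential(
        torch.nn.Linear(2, 4),
        torch.nn.Sigmoid(),
        torch.nn.Linear(4, 1),
    )

def train(model, optimizer, x, t, num_epochs=100):
    # Same five steps as the training loop above; returns the loss per epoch.
    loss_func = torch.nn.MSELoss()
    history = []
    for _ in range(num_epochs):
        loss = loss_func(model(x), t)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        history.append(loss.item())
    return history

x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = torch.tensor([[0.], [1.], [1.], [0.]])

torch.manual_seed(0)
net_sgd = make_net()
net_adam = copy.deepcopy(net_sgd)  # identical initial weights

hist_sgd = train(net_sgd, torch.optim.SGD(net_sgd.parameters(), lr=0.1), x, t)
hist_adam = train(net_adam, torch.optim.Adam(net_adam.parameters(), lr=0.1), x, t)

print("SGD  first/last loss:", hist_sgd[0], hist_sgd[-1])
print("Adam first/last loss:", hist_adam[0], hist_adam[-1])
```

Both histories start from the same first loss value, so any difference afterwards comes from the optimizer rather than from the initialization.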

## Saving and loading models
To save the model weights:

In [10]:
torch.save(model.state_dict(), 'filename.pth')

To load the saved weights into a model with the same architecture:

In [11]:
model.load_state_dict(torch.load('filename.pth'))

<All keys matched successfully>
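
A save and load round trip can be verified by copying the weights from one network into another that has the same architecture. The sketch below is illustrative: `make_net` is a hypothetical helper standing in for `Net`, the file goes into a temporary directory, and `map_location="cpu"` is an optional argument that is mainly useful when the weights were saved on a GPU.

```python
import os
import tempfile
import torch

def make_net():
    # Stand-in with the same layer sizes as Net(2, 4, 1).
    return torch.nn.Sequential(
        torch.nn.Linear(2, 4),
        torch.nn.Sigmoid(),
        torch.nn.Linear(4, 1),
    )

source = make_net()
target = make_net()  # same architecture, different random weights

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weights.pth")
    torch.save(source.state_dict(), path)
    state = torch.load(path, map_location="cpu")
    result = target.load_state_dict(state)

# After loading, every tensor in `target` equals the one in `source`.
same = all(torch.equal(source.state_dict()[k], target.state_dict()[k])
           for k in source.state_dict())
print(same)  # True
```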