### Lab 3.1: Basic Neural Network in PyTorch - Solution

Let's create a linear classifier one more time, but using PyTorch's automatic differentiation and optimization algorithms.  Then you will extend the perceptron into a multi-layer perceptron (MLP).

In [635]:
import numpy as np
import torch

When creating a tensor, we need to explicitly tell PyTorch that we will later want to compute gradients with respect to it, by passing `requires_grad=True`.

In [636]:
a = torch.tensor(5.,requires_grad=True)
a

tensor(5., requires_grad=True)

In [637]:
b = torch.tensor(6.,requires_grad=True)
c = 2*a+3*b
c

tensor(28., grad_fn=<AddBackward0>)

To compute the gradients, we first need to call `backward()` on the result `c`.

In [638]:
c.backward()

Now, to get the gradient of `c` with respect to any variable, we simply access the `grad` attribute of that variable.

In [639]:
a.grad

tensor(2.)

In [640]:
b.grad

tensor(3.)
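Note that PyTorch adds new gradients to whatever is already stored in `.grad` rather than overwriting it, which is why the training loops in this lab zero the gradients on every iteration. A minimal sketch of this behaviour, using a tensor `w` that is purely illustrative and not part of the lab:

```python
import torch

# `w` is an illustrative tensor, not part of the lab code
w = torch.tensor(5., requires_grad=True)

# first backward pass: d(2w)/dw = 2
(2 * w).backward()
first = w.grad.item()

# second backward pass without zeroing: the new gradient is added to the stored one
(2 * w).backward()
accumulated = w.grad.item()

# reset the stored gradient in place, then run backward again
w.grad.zero_()
(2 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)
```

The second value is double the first because nothing cleared `w.grad` between the two calls.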

Let's load and format the Palmer penguins dataset for multi-class classification.

In [641]:
from palmerpenguins import load_penguins
from matplotlib import pyplot as plt

In [642]:
df = load_penguins()

# drop rows with missing values
df.dropna(inplace=True)

# get two features
X = df[['flipper_length_mm','bill_length_mm']].values

# convert species labels to integers
y = df['species'].map({'Adelie':0,'Chinstrap':1,'Gentoo':2}).values

To make the learning algorithm work more smoothly, we will subtract the mean of each feature.

Here `np.mean` calculates the mean, and `axis=0` tells NumPy to average along the row axis, which yields one mean per column (that is, per feature).

In [643]:
X -= np.mean(X,axis=0)
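To see what `axis=0` and the subtraction do, here is a small sketch on a made-up matrix (`X_demo` and its values are invented for illustration; they are not penguin measurements):

```python
import numpy as np

# a small made-up feature matrix (rows are samples, columns are features)
X_demo = np.array([[180., 40.],
                   [200., 45.],
                   [220., 50.]])

col_means = np.mean(X_demo, axis=0)  # one mean per column
X_centered = X_demo - col_means      # broadcasting subtracts each column's mean from every row

print(col_means)
print(np.mean(X_centered, axis=0))
```

After centering, each column of `X_centered` averages to zero.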

Now we will convert our `X` and `y` arrays to PyTorch tensors, storing the features as 32-bit floats and the labels as 64-bit integers.

In [644]:
X = torch.tensor(X).float()
y = torch.tensor(y).long()

In [645]:
from torch import nn

The `torch.nn.Sequential` class creates a feed-forward network from a sequence of `nn.Module` objects.  Here we provide a single `nn.Linear` module, which performs an affine transformation ($Wx+b$), so the model is a linear classifier.

In [646]:
linear_model = torch.nn.Sequential(
    torch.nn.Linear(2,3), # two inputs, three outputs
)
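As a rough check of what `nn.Linear` computes, the sketch below compares a layer's output with the affine map written out by hand. The seed, the layer `layer`, and the input `x_demo` are arbitrary choices for illustration only.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make this sketch reproducible

layer = torch.nn.Linear(2, 3)
x_demo = torch.randn(4, 2)  # four made-up samples with two features each

# nn.Linear stores W with shape (out_features, in_features) = (3, 2),
# so a batch of row vectors is transformed as x @ W.T + b
manual = x_demo @ layer.weight.T + layer.bias

print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(layer(x_demo), manual, atol=1e-6))
```

Each of the three outputs is one class score, so the layer has a 3×2 weight matrix and three biases.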

Now we create a cross-entropy loss function object and a stochastic gradient descent (SGD) optimizer.

In [647]:
loss_fn = torch.nn.CrossEntropyLoss()
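`CrossEntropyLoss` takes raw scores (logits) and integer class labels; with its default settings it applies a log-softmax internally and averages the negative log-probability of the correct class over the batch. The following sketch reproduces that by hand on made-up logits and labels (all values here are invented for illustration):

```python
import torch

# made-up logits for three samples and three classes, with made-up labels
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 0.3],
                       [-1.0, 3.0, 0.0]])
labels = torch.tensor([0, 2, 1])

builtin = torch.nn.CrossEntropyLoss()(logits, labels)

# manual version: log-softmax over the classes, pick the log-probability
# of the true class for each sample, then average the negatives
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(labels)), labels].mean()

print(builtin.item(), manual.item())
```

Because the softmax is built in, the model itself should output logits and not probabilities.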

In [648]:
lr = 1e-2
opt = torch.optim.SGD(linear_model.parameters(), lr=lr)
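With no momentum or weight decay, one SGD step moves each parameter by `-lr` times its gradient. A minimal sketch on a toy problem (the parameter `p`, the toy loss, and the optimizer name `demo_opt` are illustrative and not part of the lab):

```python
import torch

demo_lr = 1e-2
p = torch.tensor([1.0, -2.0], requires_grad=True)  # illustrative parameter
demo_opt = torch.optim.SGD([p], lr=demo_lr)

toy_loss = (p ** 2).sum()  # toy loss whose gradient is 2p
toy_loss.backward()

# manual update, computed before the optimizer changes p in place
expected = p.detach() - demo_lr * p.grad

demo_opt.step()

print(p.detach(), expected)
```

The optimizer only reads the `.grad` values, so `backward()` must be called before `step()`.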

Finally, we can iteratively optimize the model. Each epoch here is a single gradient step computed on the full dataset.

In [649]:
epochs = 100
for epoch in range(epochs):
    opt.zero_grad() # zero out the gradients

    z = linear_model(X) # compute logits (unnormalized class scores)
    loss = loss_fn(z,y) # compute cross-entropy loss

    loss.backward() # compute gradients

    opt.step() # apply gradients

    print(f'epoch {epoch}: loss is {loss.item()}')

epoch 0: loss is 11.014744758605957
epoch 1: loss is 8.893004417419434
epoch 2: loss is 6.873161792755127
epoch 3: loss is 5.09170389175415
epoch 4: loss is 3.690431594848633
epoch 5: loss is 2.5843398571014404
epoch 6: loss is 1.7286109924316406
epoch 7: loss is 1.189647912979126
epoch 8: loss is 0.9589129686355591
epoch 9: loss is 0.8545036315917969
epoch 10: loss is 0.7861957550048828
epoch 11: loss is 0.7327059507369995
epoch 12: loss is 0.6872875690460205
epoch 13: loss is 0.6471538543701172
epoch 14: loss is 0.6109429597854614
epoch 15: loss is 0.5779216885566711
epoch 16: loss is 0.5476758480072021
epoch 17: loss is 0.5199657082557678
epoch 18: loss is 0.4946480989456177
epoch 19: loss is 0.4716269373893738
epoch 20: loss is 0.4508169889450073
epoch 21: loss is 0.4321153461933136
epoch 22: loss is 0.4153860807418823
epoch 23: loss is 0.4004614055156708
epoch 24: loss is 0.3871553838253021
epoch 25: loss is 0.3752804696559906
epoch 26: loss is 0.3646599054336548
epoch 27: loss is

### Exercises

Extend the above code to implement an MLP with a single hidden layer of size 100.

Write code to compute the accuracy of each model.

Can you get the MLP to outperform the linear model?

In [650]:
# MLP with a single hidden layer of size 100
mlp_model = torch.nn.Sequential(
    torch.nn.Linear(2, 100), # 2 inputs, 100 hidden units
    
    # nonlinear activation between the two linear layers
    torch.nn.ReLU(),
    
    torch.nn.Linear(100, 3) # 100 hidden units, 3 outputs (one per species)
)
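The activation between the layers matters: without it, two stacked linear layers are equivalent to a single affine map, so the model would still be a linear classifier. The sketch below checks this numerically; the layer names, seed, and input `x_demo` are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make this sketch reproducible

first = torch.nn.Linear(2, 100)
second = torch.nn.Linear(100, 3)
stacked = torch.nn.Sequential(first, second)  # no activation in between

# collapse the two layers into one affine map: W = W2 W1, b = W2 b1 + b2
W = second.weight @ first.weight
b = second.weight @ first.bias + second.bias

x_demo = torch.randn(5, 2)  # five made-up samples
collapsed_matches = torch.allclose(stacked(x_demo), x_demo @ W.T + b, atol=1e-4)
print(collapsed_matches)
```

Inserting `ReLU` breaks this equivalence, which is what lets the MLP represent nonlinear decision boundaries.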

In [651]:
# create a new SGD optimizer over the MLP's parameters
lr = 1e-2
opt = torch.optim.SGD(mlp_model.parameters(), lr=lr)

In [652]:
epochs = 100
for epoch in range(epochs):
    opt.zero_grad() # zero out the gradients

    z = mlp_model(X) # compute logits (unnormalized class scores)
    loss = loss_fn(z,y) # compute cross-entropy loss

    loss.backward() # compute gradients

    opt.step() # apply gradients

    print(f'epoch {epoch}: loss is {loss.item()}')

epoch 0: loss is 1.6419166326522827
epoch 1: loss is 0.6736262440681458
epoch 2: loss is 0.504383385181427
epoch 3: loss is 0.39080289006233215
epoch 4: loss is 0.320997416973114
epoch 5: loss is 0.2786753177642822
epoch 6: loss is 0.2524169087409973
epoch 7: loss is 0.23527199029922485
epoch 8: loss is 0.22316941618919373
epoch 9: loss is 0.21385875344276428
epoch 10: loss is 0.20623008906841278
epoch 11: loss is 0.1997596174478531
epoch 12: loss is 0.19416135549545288
epoch 13: loss is 0.18925464153289795
epoch 14: loss is 0.18491105735301971
epoch 15: loss is 0.18103379011154175
epoch 16: loss is 0.17754794657230377
epoch 17: loss is 0.17439445853233337
epoch 18: loss is 0.17152613401412964
epoch 19: loss is 0.16890431940555573
epoch 20: loss is 0.16649754345417023
epoch 21: loss is 0.16427981853485107
epoch 22: loss is 0.1622294932603836
epoch 23: loss is 0.16032828390598297
epoch 24: loss is 0.15856070816516876
epoch 25: loss is 0.15691353380680084
epoch 26: loss is 0.155375391244

In [653]:
# Compute the accuracy of a model
def accuracy(model, X, y):

    # Set model to evaluation mode (no effect for these layers, but good practice)
    model.eval()

    # Gradients are not needed for evaluation
    with torch.no_grad():
        z = model(X)

    # Dimension 0 of the outputs indexes the samples
    # Dimension 1 of the outputs indexes the classes
    # The index of the highest score for each sample is the predicted label
    predicted_labels = torch.argmax(z, dim=1)

    # Calculate the accuracy (the number of correct predictions divided by the total number of samples)
    correct = (predicted_labels == y).sum().item()

    # size(0) is the length of the first dimension, i.e. the number of samples
    total = y.size(0)

    return correct/total

print(f"Linear Classifier Accuracy {accuracy(linear_model, X, y)}")
print(f"MLP Classifier Accuracy {accuracy(mlp_model, X, y)}")

Linear Classifier Accuracy 0.9309309309309309
MLP Classifier Accuracy 0.9579579579579579
