### Lab 3.1: Basic Neural Network in PyTorch - Solution

Let's build a linear classifier once more, this time using PyTorch's automatic differentiation and built-in optimizers. Then you will extend the linear model into a multi-layer perceptron (MLP).

In [77]:
import numpy as np
import torch

When creating a tensor, we must explicitly tell PyTorch (with `requires_grad=True`) that we will later want to compute gradients with respect to it.

In [78]:
a = torch.tensor(5.,requires_grad=True)
a

tensor(5., requires_grad=True)

In [79]:
b = torch.tensor(6.,requires_grad=True)
c = 2*a+3*b
c

tensor(28., grad_fn=<AddBackward0>)

To compute the gradients, we first call `backward()` on `c`.

In [80]:
c.backward()

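Now, to get the gradient of `c` with respect to any tensor created with `requires_grad=True`, we access that tensor's `grad` attribute. Since $c = 2a + 3b$, we expect $\partial c/\partial a = 2$ and $\partial c/\partial b = 3$.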

In [81]:
a.grad

tensor(2.)

In [82]:
b.grad

tensor(3.)
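The same mechanism works for non-linear expressions. The sketch below uses two hypothetical tensors `u` and `v` (not part of the lab) and checks autograd against the hand-derived derivatives of $d = uv + u^2$. Those derivatives are $\partial d/\partial u = v + 2u$ and $\partial d/\partial v = u$.

```python
import torch

# Hypothetical scalars, named u and v so they don't overwrite the lab's a and b
u = torch.tensor(5., requires_grad=True)
v = torch.tensor(6., requires_grad=True)

d = u * v + u ** 2   # dd/du = v + 2u = 16, dd/dv = u = 5
d.backward()

print(u.grad)  # tensor(16.)
print(v.grad)  # tensor(5.)
```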

Let's load and format the Palmer penguins dataset for multi-class classification.

In [83]:
from palmerpenguins import load_penguins
from matplotlib import pyplot as plt

In [84]:
df = load_penguins()

# drop rows with missing values
df.dropna(inplace=True)

# get two features
X = df[['flipper_length_mm','bill_length_mm']].values

# convert species labels to integers
y = df['species'].map({'Adelie':0,'Chinstrap':1,'Gentoo':2}).values

To make the learning algorithm work more smoothly, we will subtract the mean of each feature so that both features are centered at zero.

Here `np.mean` calculates a mean, and `axis=0` tells NumPy to average over the rows, producing one mean per column (feature).

In [85]:
X -= np.mean(X,axis=0)
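To see what `axis=0` and the in-place subtraction do, here is a small sketch on a hypothetical 3×2 array (`X_toy`, made-up values). It is not the penguin data. Broadcasting subtracts each column's mean from every row of that column.

```python
import numpy as np

# Hypothetical toy data: 3 rows (samples), 2 columns (features)
X_toy = np.array([[190., 40.],
                  [200., 45.],
                  [210., 50.]])

col_means = np.mean(X_toy, axis=0)   # one mean per column
print(col_means)                      # [200.  45.]

X_toy -= col_means                    # broadcast: each column minus its own mean
print(X_toy.mean(axis=0))             # [0. 0.]
```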

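Now we convert `X` and `y` to PyTorch tensors. `X` becomes `float32` to match the model's weights, and `y` becomes `int64` (`long`) class indices, which `CrossEntropyLoss` expects as targets.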

In [86]:
X = torch.tensor(X).float()
y = torch.tensor(y).long()

In [87]:
from torch import nn

The `torch.nn.Sequential` class creates a feed-forward network from a sequence of `nn.Module` objects passed as arguments, applying them in order. Here we provide a single `nn.Linear` module, which performs an affine transformation ($Wx+b$). The result is a linear classifier that outputs one score (logit) per class.

In [88]:
linear_model = torch.nn.Sequential(
    torch.nn.Linear(2,3), # two inputs, three outputs
)
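As a quick check of the $Wx+b$ description, the sketch below compares `nn.Linear` against the same affine map written by hand on a hypothetical random batch. PyTorch stores the weight with shape `(out_features, in_features)`, so a batch of row vectors is multiplied by the transposed weight.

```python
import torch
from torch import nn

layer = nn.Linear(2, 3)          # weight: (3, 2), bias: (3,)
x = torch.randn(5, 2)            # hypothetical batch of 5 samples, 2 features

manual = x @ layer.weight.T + layer.bias   # affine map applied row-wise
print(layer.weight.shape, layer.bias.shape)  # torch.Size([3, 2]) torch.Size([3])
print(torch.allclose(layer(x), manual))      # True
```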

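Now we create a cross-entropy loss function object and a stochastic gradient descent (SGD) optimizer. `CrossEntropyLoss` takes the raw class scores and integer labels and applies the softmax internally. Because we pass the whole dataset at every step, `torch.optim.SGD` here performs plain full-batch gradient descent.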

In [89]:
loss_fn = torch.nn.CrossEntropyLoss()
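The sketch below uses hypothetical random logits `z_toy` and labels `y_toy` to show what `CrossEntropyLoss` computes. With its default settings, it equals the mean over samples of $-\log \mathrm{softmax}(z)_{y}$, the negative log-probability assigned to the correct class.

```python
import torch
import torch.nn.functional as F

z_toy = torch.randn(4, 3)               # hypothetical logits: 4 samples, 3 classes
y_toy = torch.tensor([0, 2, 1, 2])      # integer class labels (dtype long)

builtin = torch.nn.CrossEntropyLoss()(z_toy, y_toy)
# Same quantity by hand: pick each sample's log-probability for its true class
manual = -F.log_softmax(z_toy, dim=1)[torch.arange(4), y_toy].mean()
print(torch.allclose(builtin, manual))  # True
```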

In [90]:
lr = 1e-2
opt = torch.optim.SGD(linear_model.parameters(), lr=lr)
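With its defaults (no momentum, no weight decay), `opt.step()` applies the rule $w \leftarrow w - \eta \nabla_w L$ to every parameter, where $\eta$ is `lr`. Here is a minimal check on a hypothetical parameter `w_toy` with the toy loss $\sum w^2$, using the separate name `toy_opt` so the lab's `opt` is untouched.

```python
import torch

w_toy = torch.randn(3, requires_grad=True)
toy_opt = torch.optim.SGD([w_toy], lr=1e-2)

loss = (w_toy ** 2).sum()        # toy loss; its gradient is 2 * w_toy
toy_opt.zero_grad()
loss.backward()

# Plain SGD rule computed by hand before stepping: w <- w - lr * grad
expected = (w_toy - 1e-2 * w_toy.grad).detach()
toy_opt.step()                   # updates w_toy in place
print(torch.allclose(w_toy.detach(), expected))  # True
```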

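Finally, we iteratively optimize the model. Each epoch computes the class scores for the whole dataset, evaluates the loss, backpropagates, and takes one gradient step.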

In [91]:
epochs = 100
for epoch in range(epochs):
    opt.zero_grad() # zero out the gradients

    z = linear_model(X) # compute z values
    loss = loss_fn(z,y) # compute loss

    loss.backward() # compute gradients

    opt.step() # apply gradients

    print(f'epoch {epoch}: loss is {loss.item()}')

epoch 0: loss is 1.1159321069717407
epoch 1: loss is 0.6858724355697632
epoch 2: loss is 0.4304100573062897
epoch 3: loss is 0.31856605410575867
epoch 4: loss is 0.27364620566368103
epoch 5: loss is 0.25274404883384705
epoch 6: loss is 0.2410580962896347
epoch 7: loss is 0.2334849089384079
epoch 8: loss is 0.22800186276435852
epoch 9: loss is 0.2236955165863037
epoch 10: loss is 0.2201092392206192
epoch 11: loss is 0.21699605882167816
epoch 12: loss is 0.21421390771865845
epoch 13: loss is 0.2116764783859253
epoch 14: loss is 0.20932897925376892
epoch 15: loss is 0.20713503658771515
epoch 16: loss is 0.20506957173347473
epoch 17: loss is 0.2031146138906479
epoch 18: loss is 0.20125681161880493
epoch 19: loss is 0.1994858682155609
epoch 20: loss is 0.19779366254806519
epoch 21: loss is 0.19617345929145813
epoch 22: loss is 0.19461971521377563
epoch 23: loss is 0.19312767684459686
epoch 24: loss is 0.19169317185878754
epoch 25: loss is 0.1903124898672104
epoch 26: loss is 0.1889824569225
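The loop calls `opt.zero_grad()` at the start of each epoch because `backward()` adds new gradients to whatever is already stored in `.grad` rather than overwriting it. The sketch below uses a hypothetical scalar `w` to show that accumulation.

```python
import torch

w = torch.tensor(1., requires_grad=True)
grads = []

for step in range(3):
    loss = 3 * w              # d(loss)/dw = 3 on every step
    loss.backward()           # adds 3 to the value already in w.grad
    grads.append(w.grad.item())

print(grads)                  # [3.0, 6.0, 9.0]

w.grad.zero_()                # clear before the next backward(); opt.zero_grad() clears every parameter
print(w.grad)                 # tensor(0.)
```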

### Exercises

Extend the above code to implement an MLP with a single hidden layer of size 100.

Write code to compute the accuracy of each model.

Can you get the MLP to outperform the linear model?
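Both models are trained with the same loop, so one option is to wrap it in a function. Then each freshly constructed model gets exactly the same number of epochs. This is a minimal sketch: `train`, `X_demo` and `y_demo` are hypothetical names, and the synthetic data only stands in for the penguin features.

```python
import torch
from torch import nn

def train(model: nn.Module, X: torch.Tensor, y: torch.Tensor,
          epochs: int = 100, lr: float = 1e-2) -> list[float]:
    """Full-batch gradient descent on cross-entropy; returns the loss at each epoch."""
    loss_fn = nn.CrossEntropyLoss()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    losses = []
    for _ in range(epochs):
        opt.zero_grad()                 # clear gradients from the previous epoch
        loss = loss_fn(model(X), y)     # forward pass on the whole dataset
        loss.backward()                 # compute gradients
        opt.step()                      # apply one gradient step
        losses.append(loss.item())
    return losses

# Hypothetical synthetic data: 3 classes defined by the signs of the two features
X_demo = torch.randn(60, 2)
y_demo = (X_demo[:, 0] > 0).long() + (X_demo[:, 1] > 0).long()  # labels in {0, 1, 2}

linear = nn.Sequential(nn.Linear(2, 3))
mlp = nn.Sequential(nn.Linear(2, 100), nn.ReLU(), nn.Linear(100, 3))
lin_losses = train(linear, X_demo, y_demo)
mlp_losses = train(mlp, X_demo, y_demo)
print(f"linear: {lin_losses[-1]:.3f}, mlp: {mlp_losses[-1]:.3f}")
```

Rebuilding the models before each call (rather than re-running a training cell) keeps the epoch budget identical for both.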

In [92]:
# MLP with a single hidden layer of 100 units
mlp_model = torch.nn.Sequential(
    torch.nn.Linear(2, 100), # 2 inputs -> 100 hidden units
    
    # hidden activation: the ReLU nonlinearity is what lets the MLP
    # represent non-linear decision boundaries
    torch.nn.ReLU(),
    
    torch.nn.Linear(100, 3) # 100 hidden units -> 3 outputs (one per class)
)
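Why the ReLU matters: without it, two stacked `Linear` layers compose into a single affine map with $W = W_2 W_1$ and $b = W_2 b_1 + b_2$. That network would be no more expressive than the linear model. The sketch below checks this numerically on a hypothetical random batch and also counts the parameters of each architecture.

```python
import torch
from torch import nn

no_relu = nn.Sequential(nn.Linear(2, 100), nn.Linear(100, 3))
l1, l2 = no_relu[0], no_relu[1]

# Collapse the two affine maps into one equivalent affine map
W_eff = l2.weight @ l1.weight               # (3, 100) @ (100, 2) -> (3, 2)
b_eff = l2.weight @ l1.bias + l2.bias       # (3,)

x = torch.randn(8, 2)                       # hypothetical batch
print(torch.allclose(no_relu(x), x @ W_eff.T + b_eff, atol=1e-5))  # True

# Parameter counts: linear model vs. MLP with a 100-unit hidden layer
n_linear = sum(p.numel() for p in nn.Linear(2, 3).parameters())
n_mlp = sum(p.numel() for p in nn.Sequential(
    nn.Linear(2, 100), nn.ReLU(), nn.Linear(100, 3)).parameters())
print(n_linear, n_mlp)                      # 9 603
```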

In [93]:
# new optimizer over the MLP's parameters; loss_fn is reused from above
lr = 1e-2
opt = torch.optim.SGD(mlp_model.parameters(), lr=lr)

In [95]:
epochs = 100
for epoch in range(epochs):
    opt.zero_grad() # zero out the gradients

    z = mlp_model(X) # compute z values
    loss = loss_fn(z,y) # compute loss

    loss.backward() # compute gradients

    opt.step() # apply gradients

    print(f'epoch {epoch}: loss is {loss.item()}')

epoch 0: loss is 0.12434127181768417
epoch 1: loss is 0.12423999607563019
epoch 2: loss is 0.12414035201072693
epoch 3: loss is 0.1240423172712326
epoch 4: loss is 0.12394576519727707
epoch 5: loss is 0.12385069578886032
epoch 6: loss is 0.12375707179307938
epoch 7: loss is 0.12366484105587006
epoch 8: loss is 0.12357396632432938
epoch 9: loss is 0.12348445504903793
epoch 10: loss is 0.12339618057012558
epoch 11: loss is 0.1233091950416565
epoch 12: loss is 0.12322341650724411
epoch 13: loss is 0.12313883751630783
epoch 14: loss is 0.12305545806884766
epoch 15: loss is 0.12297313660383224
epoch 16: loss is 0.12289175391197205
epoch 17: loss is 0.12281142920255661
epoch 18: loss is 0.12273216992616653
epoch 19: loss is 0.12265397608280182
epoch 20: loss is 0.12257673591375351
epoch 21: loss is 0.12250050157308578
epoch 22: loss is 0.12242519110441208
epoch 23: loss is 0.12235084921121597
epoch 24: loss is 0.1222773939371109
epoch 25: loss is 0.12220485508441925
epoch 26: loss is 0.12213

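Judging by training loss, yes: my MLP reaches a lower cross-entropy loss than the linear model (about 0.122 versus about 0.189 at the last printed epoch). However, a lower loss does not by itself guarantee a higher accuracy, because accuracy only counts whether the highest-scoring class is correct. To answer the exercise, accuracy should be computed directly.

The comparison is also not quite like-for-like. The MLP's loss at epoch 0 is already 0.124, far below the linear model's starting loss of 1.116. The execution counter also skips from `In [93]` to `In [95]`. Together these suggest the training cell was run twice, so the MLP received more than 100 epochs of training. Both figures are losses on the training data, not on held-out data.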
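For the accuracy exercise, here is a minimal helper sketch. `accuracy`, `toy`, `X_toy` and `y_toy` are hypothetical names, and the hand-set weights exist only so the expected result can be checked by hand. The helper takes the highest-scoring class as the prediction and reports the fraction of correct predictions.

```python
import torch
from torch import nn

def accuracy(model: nn.Module, X: torch.Tensor, y: torch.Tensor) -> float:
    """Fraction of samples whose highest-scoring class matches the label."""
    with torch.no_grad():                # no gradients needed for evaluation
        z = model(X)                     # logits, shape (N, num_classes)
        preds = z.argmax(dim=1)          # predicted class index per sample
    return (preds == y).float().mean().item()

# Self-check with a hand-set linear model (hypothetical weights):
# class 0 scores x0, class 1 scores x1, class 2 always scores 0
toy = nn.Linear(2, 3)
with torch.no_grad():
    toy.weight.copy_(torch.eye(3, 2))
    toy.bias.zero_()
X_toy = torch.tensor([[2., 0.], [0., 2.], [-1., -1.], [1., 3.]])
y_toy = torch.tensor([0, 1, 2, 0])       # the last sample is predicted as class 1
print(accuracy(toy, X_toy, y_toy))       # 0.75
```

Applied to the notebook's tensors, `accuracy(linear_model, X, y)` and `accuracy(mlp_model, X, y)` give each model's accuracy on the training data.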