Inspired by the quickstart tutorial on the official PyTorch website, this is a simple example of a neural network that learns to classify images of clothing from the FashionMNIST dataset.


https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html

Importing the following libraries:

- torch: PyTorch, the ML library we're using
- torch.nn: Contains neural network building blocks (layers, activations, loss functions); functional versions live in torch.nn.functional
- DataLoader: An iterable that wraps a dataset and yields it in batches
- torchvision: A vision focused library containing datasets, models, and transformations

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

Using the `FashionMNIST` dataset, in `torchvision.datasets`, create a training dataset object and a testing dataset object.

We can specify the following arguments:
- `root`: `str` The directory where the dataset is stored
- `train`: `bool` Whether this is the training or testing dataset
- `download`: `bool` Download if not found at `root`
- `transform`: `callable` Any preprocessing steps

In [2]:
# training dataset for FashionMNIST
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

# test dataset for FashionMNIST
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

Now let's create a dataloader, an iterable object that feeds data to the model in batches.

Before that, we must choose a batch size, the number of samples fed to the model at once. You can choose any number; powers of 2 such as 64 are a common convention.

The `DataLoader` class takes the following arguments:
- `dataset`: The dataset object
- `batch_size`: The number of samples fed to the model at once

Let's take one batch from the test dataloader and see its shape.

In [3]:
# specify batch size
batch_size = 64

# instantiate train and test dataloaders
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

# look at the first batch of the test dataloader
for X, y in test_dataloader:
    N, C, H, W = X.shape
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    break

Shape of X [N, C, H, W]:  torch.Size([64, 1, 28, 28])
Shape of y:  torch.Size([64]) torch.int64


Now let's specify which device we'd like to use.

`torch.cuda.is_available()` returns `True` if a CUDA-capable GPU is available to PyTorch, and `False` otherwise.

In [4]:
# specify device
device = (
    "cuda" if torch.cuda.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using cpu device
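
The device string matters because the model and its inputs need to live on the same device. A small sketch of moving a tensor, assuming the same CUDA-or-CPU choice as above (the tensor `x` is just an example):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Tensors are created on the CPU by default
x = torch.ones(2, 3)

# .to(device) returns a tensor on the target device
# (the same tensor if it is already there)
x_dev = x.to(device)
print(x.device.type, "->", x_dev.device.type)
```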


Let's create a basic network: a fully connected neural network with two hidden layers of 512 units each.

Recall:
- The images are 28 by 28
- The output is 10 classes
- We need to flatten from 28 x 28 to 784
- Place non-linearities between layers
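
The shape bookkeeping in the list above can be checked on a random batch. This is only a sketch: `batch` is random data, and the single `nn.Linear` here is a stand-in for the full stack defined in the next cell.

```python
import torch
from torch import nn

# A random batch shaped like the dataloader output: [N, C, H, W]
batch = torch.rand(64, 1, 28, 28)

# nn.Flatten keeps the batch dimension and merges the rest: 1 * 28 * 28 = 784
flat = nn.Flatten()(batch)
print(flat.shape)

# A linear layer mapping 784 features to 10 class scores
logits = nn.Linear(28 * 28, 10)(flat)
print(logits.shape)
```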

In [5]:
# define the network
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__() # call the parent class constructor (nn.Module)

        # network expects vectors, not 28 x 28 arrays
        self.flatten = nn.Flatten()

        # compose the linear and non-linear operations
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x): # this is how the network consumes input 

        # turn a 28 x 28 image into a vector
        x = self.flatten(x)

        # pass the vector through the composed linear and non-linear operations
        logits = self.linear_relu_stack(x)
        return logits

# instantiate the network and move it to the device
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Let's inspect this model and look at its parameters.

In [6]:
# loop over model.parameters()
for param in model.parameters():
    print(param.shape)
    print(param.requires_grad)

torch.Size([512, 784])
True
torch.Size([512])
True
torch.Size([512, 512])
True
torch.Size([512])
True
torch.Size([10, 512])
True
torch.Size([10])
True
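
The shapes printed above also give the total number of parameters. As a sketch, the block below rebuilds the same layer stack on its own (the name `stack` is only for this example) and counts the entries.

```python
import torch
from torch import nn

# Same layer sizes as linear_relu_stack in the NeuralNetwork class
stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# numel() is the number of entries in each weight matrix or bias vector
total = sum(p.numel() for p in stack.parameters())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)

# (784*512 + 512) + (512*512 + 512) + (512*10 + 10) = 669706
print(total, trainable)
```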


In [7]:
# loop over model.named_parameters()
for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values: {param[:2]} \n")

Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values: tensor([[ 0.0014, -0.0177,  0.0307,  ...,  0.0232,  0.0247, -0.0006],
        [-0.0135, -0.0253,  0.0206,  ...,  0.0252,  0.0282,  0.0181]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values: tensor([ 0.0136, -0.0214], grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values: tensor([[-0.0441,  0.0161,  0.0122,  ...,  0.0010,  0.0237, -0.0245],
        [ 0.0408,  0.0428, -0.0416,  ...,  0.0004, -0.0276,  0.0145]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values: tensor([ 0.0271, -0.0016], grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values: tensor([[ 0.0267, -0.0047, -0.0070,  ..., -0.0234, -0.0414, -0.0113],
        [ 0.0029, -0.0154, -0.0190,  ...,  0.0032,  0.0071, -0.0119]],
       grad_fn=<SliceBackward0>) 

La

Choose a loss function and optimizer.

- Classification: `nn.CrossEntropyLoss()`
- Regression: `nn.MSELoss()`
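
`nn.CrossEntropyLoss()` takes raw logits and integer class labels, so the model does not need a softmax layer at the end. The sketch below, on made-up logits and targets, checks that it matches log-softmax followed by the negative log-likelihood loss.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up logits for 4 samples over 10 classes, plus integer class labels
logits = torch.randn(4, 10)
targets = torch.tensor([3, 0, 9, 1])

# Built-in loss on raw logits
loss = nn.CrossEntropyLoss()(logits, targets)

# The same quantity computed in two explicit steps
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss.item(), manual.item())
```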

In [8]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

Next let's define the training loop.

- Set the model to training mode
- Iterate over the training dataset
- Make a prediction by passing the input to the model
- Calculate the loss
- Compute the gradients
- Update parameters
- Zero the gradients for next iteration
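
Before the full loop, here is one iteration of the steps above on a tiny model with random data. `tiny_model`, `X`, and `y` are illustrative only. The check at the end confirms that plain SGD updates each weight by `lr * gradient`.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Tiny stand-in model and random data, just to exercise one step
tiny_model = nn.Linear(3, 2)
loss_fn = nn.CrossEntropyLoss()
lr = 0.1
opt = torch.optim.SGD(tiny_model.parameters(), lr=lr)

X = torch.randn(8, 3)
y = torch.randint(0, 2, (8,))

w_before = tiny_model.weight.detach().clone()

# Predict and compute the loss
loss = loss_fn(tiny_model(X), y)

# Compute gradients, keeping a copy before they are cleared
loss.backward()
grad = tiny_model.weight.grad.detach().clone()

# Update the parameters, then clear the gradients for the next iteration
opt.step()
opt.zero_grad()

# Plain SGD: new weight = old weight - lr * gradient
expected = w_before - lr * grad
print(torch.allclose(tiny_model.weight.detach(), expected))
```

Without `zero_grad()`, the next call to `backward()` would add to the old gradients rather than replace them.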

In [9]:
def train(dataloader, model, loss_fn, optimizer):
    # get size of dataset for pretty printing
    size = len(dataloader.dataset)

    # set model to training mode
    model.train()

    # loop over the dataloader
    for batch, (X, y) in enumerate(dataloader):
        # send the data to the device
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # print periodic progress
        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

Now let's define the test loop.

- Set the model to evaluation mode
- Specify that we don't want to calculate gradients
- Iterate over the test dataset
- Make a prediction on the input
- Calculate and build the average loss
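
Two pieces of the test loop can be tried on their own: `torch.no_grad()` and the argmax-based accuracy count. The logits and labels below are made up for illustration.

```python
import torch

# Inside no_grad, results are not tracked for backpropagation
w = torch.ones(2, requires_grad=True)
with torch.no_grad():
    out = w * 2
print(out.requires_grad)

# Made-up logits for 3 samples over 2 classes, with their true labels
logits = torch.tensor([[2.0, 1.0], [0.0, 3.0], [1.0, 0.0]])
labels = torch.tensor([0, 1, 1])

# argmax(1) picks the highest-scoring class per row
predicted = logits.argmax(1)
correct = (predicted == labels).type(torch.float).sum().item()
accuracy = correct / len(labels)
print(predicted.tolist(), correct, accuracy)
```

Here the first two predictions match their labels and the third does not, so two of three are correct.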

In [10]:
def test(dataloader, model, loss_fn):
    # get size of dataset for pretty printing
    size = len(dataloader.dataset)

    # get number of batches to find average loss over a batch
    num_batches = len(dataloader)

    # set model to evaluation mode
    model.eval()

    # initialize loss and prediction counts
    # without gradients, loop over dataloader
    # and accumulate loss and correct predictions
    # on the test dataset

    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

We have all the pieces, so we can now train and test the model.

In [15]:
# create the train & test loop! this is where things happen
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 1.163800  [   64/60000]
loss: 1.176977  [ 6464/60000]
loss: 1.015588  [12864/60000]
loss: 1.155159  [19264/60000]
loss: 1.033150  [25664/60000]
loss: 1.093275  [32064/60000]
loss: 1.132521  [38464/60000]
loss: 1.070952  [44864/60000]
loss: 1.129929  [51264/60000]
loss: 1.075571  [57664/60000]
Test Error: 
 Accuracy: 64.8%, Avg loss: 1.094951 

Epoch 2
-------------------------------
loss: 1.163800  [   64/60000]
loss: 1.176977  [ 6464/60000]
loss: 1.015588  [12864/60000]
loss: 1.155159  [19264/60000]
loss: 1.033150  [25664/60000]
loss: 1.093275  [32064/60000]
loss: 1.132521  [38464/60000]
loss: 1.070952  [44864/60000]
loss: 1.129929  [51264/60000]
loss: 1.075571  [57664/60000]
Test Error: 
 Accuracy: 64.8%, Avg loss: 1.094951 

Epoch 3
-------------------------------
loss: 1.163800  [   64/60000]
loss: 1.176977  [ 6464/60000]
loss: 1.015588  [12864/60000]
loss: 1.155159  [19264/60000]
loss: 1.033150  [25664/60000]
loss: 1.093275  [32064/600

We can save the model's learned parameters (its `state_dict`) to disk.

In [12]:
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth


To load the model, create a new instance and load the saved parameters into it.

In [13]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth", weights_only=True))

<All keys matched successfully>
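
To check that a save and load round trip preserves the parameters, here is a small sketch using a temporary directory and a tiny stand-in model. The names `original`, `restored`, and `demo.pth` are hypothetical and chosen so the example does not overwrite `model.pth`.

```python
import os
import tempfile

import torch
from torch import nn

torch.manual_seed(0)
original = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.pth")

    # Save only the parameters, not the Python class
    torch.save(original.state_dict(), path)

    # A fresh instance starts with different random weights...
    restored = nn.Linear(4, 2)
    # ...until the saved parameters are loaded into it
    restored.load_state_dict(torch.load(path, weights_only=True))

same = all(
    torch.equal(a, b)
    for a, b in zip(original.state_dict().values(), restored.state_dict().values())
)
print(same)
```

Because only the `state_dict` is stored, the class definition has to be available when loading, which is why the cell above constructs `NeuralNetwork()` first.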

Finally, let's use the loaded model to classify the first image in the test set.

In [14]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
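
The model outputs raw logits, so `argmax` gives the predicted class but not a probability. If a confidence score is wanted, one option is to apply softmax to the logits. The sketch below uses made-up logits for three classes rather than the trained model.

```python
import torch

# Made-up logits for one sample over three classes
logits = torch.tensor([[1.0, 2.0, 0.5]])

# Softmax over the class dimension turns logits into probabilities
probs = torch.softmax(logits, dim=1)

# Highest probability and its class index
top_prob, top_idx = probs.max(dim=1)
print(probs, top_idx.item(), top_prob.item())
```

Softmax does not change which class has the highest score, so the predicted label is the same as with `argmax` on the logits.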
