Inspired by the quickstart tutorial on the official PyTorch website, this is a simple example of a neural network that learns to classify images of clothing from the FashionMNIST dataset.

https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html

Importing the following libraries:

- `torch`: PyTorch, the ML library we're using
- `torch.nn`: Contains neural network layers and loss functions (functional versions live in `torch.nn.functional`)
- `DataLoader`: An iterable that serves a dataset in batches
- `torchvision`: A vision-focused library containing datasets, models, and transformations
- `ToTensor`: A transform that converts an image to a tensor

In [7]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

Using the `FashionMNIST` dataset, in `torchvision.datasets`, create a training dataset object and a testing dataset object.

We can specify the following arguments:
- `root`: `str` The directory where the dataset is stored
- `train`: `bool` `True` for the training split, `False` for the test split
- `download`: `bool` Download if not found at `root`
- `transform`: `callable` Any preprocessing steps

In [8]:
# training dataset for FashionMNIST
"""
train_data = ...
"""
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

# test dataset for FashionMNIST
"""
test_data = ...
"""

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

Now let's create a dataloader, an iterable object that feeds data to the model in batches.

Before that, we must choose a batch size, the number of samples fed to the model at once. Any positive integer works; powers of 2 such as 32 or 64 are a common convention.

The `DataLoader` class takes the following arguments:
- `dataset`: The dataset object
- `batch_size`: The number of samples fed to the model at once

Let's take one batch from the test dataloader and see its shape.

In [9]:
# specify batch size
"""
batch_size = ...
"""

batch_size = 64
# instantiate train and test dataloaders
"""
train_dataloader = ...
test_dataloader = ...
"""

train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

# look at the first iterate of the test dataloader
"""
for ... in ...:
    ...
    break

"""

for X, y in test_dataloader:
    N, C, H, W = X.shape  # batch size, channels, height, width
    print("Shape of X: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    break

Shape of X:  torch.Size([64, 1, 28, 28])
Shape of y:  torch.Size([64]) torch.int64
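
The batching behaviour can be checked without downloading anything. The sketch below uses a synthetic `TensorDataset` of 150 blank images as a stand-in for a real dataset (an assumption for illustration only) and shows how a `DataLoader` splits it into batches of 64.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# synthetic stand-in for an image dataset (hypothetical data): 150 samples
images = torch.zeros(150, 1, 28, 28)
labels = torch.zeros(150, dtype=torch.int64)
toy_dataset = TensorDataset(images, labels)

toy_loader = DataLoader(toy_dataset, batch_size=64)

# 150 samples at 64 per batch -> 2 full batches + 1 partial batch of 22
batch_sizes = [X.shape[0] for X, y in toy_loader]
print(len(toy_loader), batch_sizes)
```

Note that the last batch can be smaller than `batch_size`. `DataLoader` also accepts `shuffle=True` to reorder samples each epoch and `drop_last=True` to discard the partial batch; neither is used in this notebook.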


Now let's specify which device we'd like to use.

`torch.cuda.is_available()` returns `True` if a GPU is available, and `False` otherwise.

In [10]:
# specify device
device = (
    "cuda" if torch.cuda.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using cpu device
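
As a small illustration of how the `device` string is used, the sketch below moves a tensor with `.to(device)`. The tensor `x` is a made-up example; the same call is applied to the model and to each batch later on.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# tensors are created on the CPU by default
x = torch.ones(2, 3)

# .to() returns a tensor on the target device
# (the same tensor if it is already there)
x_on_device = x.to(device)

print(x.device, x_on_device.device)
```

The model and its inputs must live on the same device, otherwise the forward pass raises an error.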


Let's create a basic network, a fully connected neural network with two hidden layers of 512 units each.

Recall:
- The images are 28 by 28
- The output is 10 classes
- We need to flatten from 28 x 28 to 784
- Place non-linearities between layers

In [13]:
# define the network
class NeuralNetwork(nn.Module):
    def __init__(self):
        # call the parent class constructor (nn.Module)
        super().__init__()

        # network expects vectors, not 28 x 28 arrays
        """
        self.flatten = ...
        """
        self.flatten = nn.Flatten()

        # compose the linear and non-linear operations
        """
        self.linear_relu_stack = ...
        """

        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28,512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):  # this is how the network consumes input

        # turn a 28 x 28 image into a vector
        """
        x = ...
        """
        x = self.flatten(x)  # (N, 1, 28, 28) -> (N, 784)

        # pass the vector through the composed linear and non-linear operations
        """
        logits = ...
        """
        logits = self.linear_relu_stack(x)
        return logits

# instantiate the network and move it to the device
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
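
One way to sanity-check the printed architecture is to pass a dummy batch through it and count the parameters by hand. The sketch below rebuilds the same layer sizes as a plain `nn.Sequential` so that it runs on its own; the names `net` and `dummy_batch` are illustrative and not part of the notebook.

```python
import torch
from torch import nn

# same layer sizes as the network above, rebuilt so the sketch is self-contained
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# a dummy batch of 64 random "images" (hypothetical input, not real data)
dummy_batch = torch.rand(64, 1, 28, 28)
logits = net(dummy_batch)
print(logits.shape)  # one row of 10 class scores per image

# each Linear layer holds in_features * out_features weights plus out_features biases
num_params = sum(p.numel() for p in net.parameters())
expected = (784 * 512 + 512) + (512 * 512 + 512) + (512 * 10 + 10)
print(num_params, expected)
```

The output of the last layer is a vector of raw scores (logits), one per class, not probabilities.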


Let's inspect this model and look at its parameters.

In [None]:
# loop over model.parameters()
for param in model.parameters():
    """
    print(...)
    """
    print(param.shape)

In [None]:
# loop over model.named_parameters()
for name, param in model.named_parameters():
    """
    print(...)
    """
    print(name, param.shape)

Choose a loss function and optimizer.

- Classification: `nn.CrossEntropyLoss()`
- Regression: `nn.MSELoss()`

In [14]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
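
`nn.CrossEntropyLoss` expects raw logits and integer class labels, which is why the network above has no softmax layer. The sketch below, using made-up logits and targets, shows that the loss equals the mean negative log-probability of the true class.

```python
import torch
from torch import nn
import torch.nn.functional as F

# hypothetical logits for a batch of 3 samples and 4 classes
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0],
                       [0.0, 0.0, 0.0, 0.0],
                       [-1.0, 3.0, 0.2, 0.4]])
targets = torch.tensor([0, 2, 1])  # integer class indices, dtype int64

loss = nn.CrossEntropyLoss()(logits, targets)

# equivalent manual computation: mean negative log-probability of the true class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(3), targets].mean()

print(loss.item(), manual.item())
```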

Next let's define the training loop.

- Set the model to training mode
- Iterate over the training dataset
- Make a prediction by passing the input to the model
- Calculate the loss
- Compute the gradients
- Update parameters
- Zero the gradients for next iteration
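
The steps above can be sketched on a toy problem before applying them to FashionMNIST. The example below runs a single optimization step on a small linear regression; the data, the model `toy_model`, and the learning rate are assumptions chosen for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# toy regression problem (hypothetical data): y = 1*x0 + 2*x1 + 3*x2
X = torch.randn(32, 3)
y = X @ torch.tensor([[1.0], [2.0], [3.0]])

toy_model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.05)

toy_model.train()                        # training mode
loss_before = loss_fn(toy_model(X), y)   # prediction and loss
loss_before.backward()                   # compute gradients
optimizer.step()                         # update parameters
optimizer.zero_grad()                    # clear gradients for the next iteration

# re-evaluate on the same batch without tracking gradients
with torch.no_grad():
    loss_after = loss_fn(toy_model(X), y)

print(loss_before.item(), loss_after.item())
```

With a small enough learning rate, the loss on the same batch goes down after the step. Zeroing the gradients matters because `backward()` accumulates into `.grad` rather than overwriting it.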

In [15]:
def train(dataloader, model, loss_fn, optimizer):
    # get size of dataset for pretty printing
    """
    size = ...
    """
    size = len(dataloader.dataset)

    # set model to training mode
    """
    model.something()
    """
    model.train()

    # loop over the dataloader
    """
    for ... in ...:
        send data to device
        compute prediction error
        backpropagate
        print progress
    """

    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        pred = model(X)
        loss = loss_fn(pred, y)

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss} [{current}/{size}]")
    

Now let's define the test loop.

- Set the model to evaluation mode
- Specify that we don't want to calculate gradients
- Iterate over the test dataset
- Make a prediction on the input
- Accumulate the loss and the number of correct predictions
- Report the average loss per batch and the accuracy
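
Counting correct predictions is the only step here that differs from training. A minimal sketch with made-up logits and labels: the predicted class is the index of the largest logit in each row, and accuracy is the fraction of rows where it matches the label.

```python
import torch

# hypothetical logits for 4 samples and 3 classes, with their true labels
pred = torch.tensor([[2.0, 0.1, 0.3],
                     [0.2, 1.5, 0.1],
                     [0.9, 0.8, 0.7],
                     [0.1, 0.2, 3.0]])
y = torch.tensor([0, 1, 2, 2])

with torch.no_grad():  # no gradient tracking needed for evaluation
    predicted_classes = pred.argmax(1)  # index of the largest logit per row
    correct = (predicted_classes == y).type(torch.float).sum().item()

accuracy = correct / len(y)
print(predicted_classes.tolist(), accuracy)
```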

In [9]:
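def test(dataloader, model, loss_fn):
    # get size of dataset for pretty printing
    """
    size = ...
    """
    size = len(dataloader.dataset)

    # get number of batches to find average loss over a batch
    """
    num_batches = ...
    """
    num_batches = len(dataloader)

    # set model to evaluation mode
    """
    model.something()
    """
    model.eval()

    # initialize loss and prediction counts
    # without gradients, loop over dataloader
    # and accumulate loss and correct predictions
    # on the test dataset

    """
    with ...:
        for ... in ...:
            send data to device
            compute prediction error
            accumulate loss and correct predictions

    calculate test error
    calculate accuracy
    print results
    """
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches  # average loss per batch
    accuracy = correct / size  # fraction of correct predictions
    print(f"Test error: accuracy {100 * accuracy:.1f}%, avg loss {test_loss:.6f}")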


We have all the pieces, so we can now train and test the model.

In [None]:
# create the train & test loop! this is where things happen
"""
epochs = ...
"""
epochs = 5
"""
for ... in ...:
    train(...)
    test(...)
"""
for t in range(epochs):
    print(f"Epoch {t + 1}")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)

We can save the model.

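In [None]:
"""
torch.save(..., ...)
"""
# save only the learned parameters (the state dict);
# "model.pth" is an example filename, any path works
torch.save(model.state_dict(), "model.pth")

And load the model.

In [None]:
"""
model = ...
model.load_state_dict(..., ...)
"""
# rebuild the architecture, then load the saved parameters into it
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth", map_location=device))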

In [None]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')