# Quick Start

This work is part of [PyTorch's tutorial library](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html). Follow the link for a more detailed explanation.

# Working with Data

PyTorch has two primitives for working with data:

1. `torch.utils.data.Dataset`
    - Stores samples and their corresponding labels
2. `torch.utils.data.DataLoader`
    - Wraps an iterable around `Dataset`
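
To make the division of labor between the two primitives concrete, here is a minimal sketch of a custom map-style dataset wrapped in a `DataLoader`. The `SquaresDataset` class and its contents are hypothetical and exist only for illustration; `Dataset` and `DataLoader` are the PyTorch classes listed above.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the number i, its label is i squared."""

    def __init__(self, n):
        self.features = torch.arange(n, dtype=torch.float32).unsqueeze(1)  # shape [n, 1]
        self.labels = torch.arange(n) ** 2                                 # shape [n]

    def __len__(self):
        # Number of samples in the dataset
        return len(self.features)

    def __getitem__(self, idx):
        # One (sample, label) pair
        return self.features[idx], self.labels[idx]


dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4)

# The loader stacks individual samples into batches
batch_shapes = [(X.shape, y.shape) for X, y in loader]
for X_shape, y_shape in batch_shapes:
    print(X_shape, y_shape)
```

The dataset only answers "how many samples?" and "what is sample `idx`?"; batching is left entirely to the loader.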

PyTorch offers domain-specific libraries such as TorchText, TorchVision, and TorchAudio. This tutorial uses TorchVision.

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

Each library also ships its own datasets that we can use to train our models. TorchVision includes many popular ones, such as CIFAR and COCO (see the [full list](https://pytorch.org/vision/stable/datasets.html)); today we will use the FashionMNIST dataset. Let's load it now.

In [3]:
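# grab the data to train our model on
training_data = datasets.FashionMNIST(
    root='data',           # directory where the dataset is stored
    train=True,            # use the training split
    download=True,         # fetch the files only if they are not already in root
    transform=ToTensor()   # convert each image to a float tensor
)

# grab the data to test our model against
test_data = datasets.FashionMNIST(
    root='data',
    train=False,           # use the test split
    download=True,
    transform=ToTensor()
)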

Now we can pass our dataset to a `DataLoader`. This wraps an iterable around the dataset and supports automatic batching, sampling, shuffling, and multiprocess data loading. With a batch size of 64, each element of the dataloader iterable is a batch of 64 features and their 64 labels.

In [4]:
batch_size = 64

# create data loaders
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f'Shape of X [N, C, H, W]: {X.shape}')
    print(f'Shape of y: {y.shape} {y.dtype}')
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
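
The batch arithmetic can be checked without downloading anything. The sketch below assumes a synthetic stand-in dataset (random tensors shaped like FashionMNIST images, with an arbitrarily chosen 1,000 samples) rather than the real data.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in: 1,000 random grayscale 28x28 "images" with integer
# labels in [0, 10). These are made-up values, not FashionMNIST.
num_samples, batch_size = 1000, 64
images = torch.rand(num_samples, 1, 28, 28)
labels = torch.randint(0, 10, (num_samples,))
loader = DataLoader(TensorDataset(images, labels), batch_size=batch_size)

# The loader yields ceil(num_samples / batch_size) batches
expected_batches = math.ceil(num_samples / batch_size)
batch_sizes = [len(X) for X, _ in loader]

print(len(loader), expected_batches)    # number of batches
print(batch_sizes[0], batch_sizes[-1])  # first and last batch sizes
```

Because `drop_last` defaults to `False`, the final batch holds whatever is left over, so code that uses `len(X)` (as the training loop later does) stays correct for a short last batch.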


# Creating Models

To define a neural network in PyTorch, we create a class that inherits from `nn.Module`. We define the layers in the `__init__` function and specify how data passes through the network in the `forward` function. To accelerate operations in the neural network, we move it to a GPU (CUDA) or MPS device if one is available.

In [5]:
# get cpu, gpu, or mps device for training
device = (
    'cuda'
    if torch.cuda.is_available()
    else 'mps'
    if torch.backends.mps.is_available()
    else 'cpu'
)
print(f'Using {device} device')

# define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
    
model = NeuralNetwork().to(device)
print(model)

Using cpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
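
To see how the shapes change on the way through the network, the following sketch pushes a dummy batch through the same layer sizes. The random input is an assumption standing in for a real FashionMNIST batch.

```python
import torch
from torch import nn

# Dummy batch with the same shape as one FashionMNIST batch: [N, C, H, W]
X = torch.rand(64, 1, 28, 28)

flatten = nn.Flatten()  # keeps dim 0 (the batch) and flattens the rest
stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

flat = flatten(X)     # [64, 1, 28, 28] -> [64, 784]
logits = stack(flat)  # [64, 784] -> [64, 10], one raw score per class
print(flat.shape)
print(logits.shape)

# Count trainable parameters: weights plus biases of the three linear layers
num_params = sum(p.numel() for p in stack.parameters())
print(num_params)
```

The parameter count follows from the layer sizes: (784·512 + 512) + (512·512 + 512) + (512·10 + 10).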


# Optimizing Model Parameters

To train a model, we need a loss function and an optimizer.

In [6]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
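
As a rough illustration of what `nn.CrossEntropyLoss` computes, the sketch below reproduces it by hand on made-up logits and labels: a log-softmax across classes, followed by the mean negative log-probability of the true class.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)           # made-up raw model outputs for 4 samples
targets = torch.tensor([3, 0, 9, 1])  # made-up integer class labels

# Built-in loss, applied directly to raw logits
loss_builtin = nn.CrossEntropyLoss()(logits, targets)

# Manual version: log-softmax, pick the true class's log-probability,
# then negate and average over the batch
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(loss_builtin.item(), loss_manual.item())
```

This is why the model above ends with a plain `nn.Linear` layer and no softmax: the loss expects unnormalized logits and applies the normalization itself.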

In a single training loop, the model makes predictions on the training dataset (fed to it in batches) and backpropagates the prediction error to adjust the model's parameters.

In [7]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()

    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f'loss: {loss:>7f} [{current:>5d} / {size:>5d}]')
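
The three calls in the backpropagation step can be examined in isolation. The sketch below assumes a tiny stand-in model and a made-up regression batch, and checks that plain SGD (no momentum or weight decay) moves each weight by `-lr * grad`.

```python
import torch
from torch import nn

torch.manual_seed(0)
lr = 1e-3
model = nn.Linear(3, 1)                       # tiny stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
X, y = torch.randn(8, 3), torch.randn(8, 1)   # made-up batch

# backward() fills .grad on every parameter
loss = nn.MSELoss()(model(X), y)
loss.backward()

weight_before = model.weight.detach().clone()
grad = model.weight.grad.detach().clone()

# step() applies the update; for plain SGD that is w <- w - lr * grad
optimizer.step()
expected = weight_before - lr * grad

# zero_grad() clears the gradients so they do not accumulate into the next batch
optimizer.zero_grad()
grads_cleared = all(
    p.grad is None or bool(torch.all(p.grad == 0)) for p in model.parameters()
)

print(torch.allclose(model.weight, expected), grads_cleared)
```

Without the `zero_grad()` call, the next `backward()` would add its gradients to the stale ones, since PyTorch accumulates into `.grad` by default.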

We also check the model's performance against the test dataset to confirm that it is learning.

In [8]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
        test_loss /= num_batches
        correct /= size
        print(f'Test Error: \n Accuracy: {(100 * correct):>0.1f}%, Avg loss: {test_loss:>8f} \n')
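
The accuracy line is easier to follow on a small worked example. The logits and labels below are made up; the expression is the same one used in `test`.

```python
import torch

# Made-up logits for 4 samples and 3 classes
pred = torch.tensor([
    [2.0, 0.1, 0.3],  # highest score at index 0
    [0.2, 1.5, 0.1],  # highest score at index 1
    [0.3, 0.2, 0.9],  # highest score at index 2
    [1.1, 0.4, 0.2],  # highest score at index 0
])
y = torch.tensor([0, 1, 1, 0])  # made-up true labels

predicted_classes = pred.argmax(1)  # index of the largest logit in each row
correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = correct / len(y)

print(predicted_classes.tolist(), correct, accuracy)
# prints: [0, 1, 2, 0] 3.0 0.75
```

The comparison yields a boolean tensor, which is cast to float so that summing it counts the matches; here three of the four predictions agree with the labels.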

The training process is carried out over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model's accuracy and loss at each epoch; we'd like to see the accuracy increase and the loss decrease with every epoch.

In [9]:
epochs = 5
for t in range(epochs):
    print(f'Epoch: {t+1}\n----------------------')
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print('Done!')

Epoch: 1
----------------------
loss: 2.300294 [   64 / 60000]
loss: 2.290239 [ 6464 / 60000]
loss: 2.274131 [12864 / 60000]
loss: 2.274196 [19264 / 60000]
loss: 2.251212 [25664 / 60000]
loss: 2.220516 [32064 / 60000]
loss: 2.229866 [38464 / 60000]
loss: 2.192703 [44864 / 60000]
loss: 2.193669 [51264 / 60000]
loss: 2.169819 [57664 / 60000]
Test Error: 
 Accuracy: 45.7%, Avg loss: 2.161454 

Epoch: 2
----------------------
loss: 2.169209 [   64 / 60000]
loss: 2.159018 [ 6464 / 60000]
loss: 2.102790 [12864 / 60000]
loss: 2.127158 [19264 / 60000]
loss: 2.073288 [25664 / 60000]
loss: 2.014630 [32064 / 60000]
loss: 2.041771 [38464 / 60000]
loss: 1.962308 [44864 / 60000]
loss: 1.973387 [51264 / 60000]
loss: 1.915938 [57664 / 60000]
Test Error: 
 Accuracy: 56.3%, Avg loss: 1.905005 

Epoch: 3
----------------------
loss: 1.936152 [   64 / 60000]
loss: 1.905844 [ 6464 / 60000]
loss: 1.787696 [12864 / 60000]
loss: 1.833627 [19264 / 60000]
loss: 1.731061 [25664 / 60000]
loss: 1.676755 [32064 / 6

# Saving Models

A common way to save a model is to serialize its internal state dictionary, which contains the model parameters.

In [10]:
torch.save(model.state_dict(), 'model.pth')
print('Saved PyTorch Model State to model.pth')

Saved PyTorch Model State to model.pth
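
To get a feel for what a state dictionary holds, this sketch lists the entries for a small stand-in model (a hypothetical three-layer `nn.Sequential`, not the tutorial's network).

```python
import torch
from torch import nn

# Small stand-in model; layer sizes are arbitrary
toy_model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# state_dict() maps each parameter's name to its tensor
state = toy_model.state_dict()
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
```

Only layers with parameters appear: the `ReLU` at index 1 has none, so the keys jump from `0.*` to `2.*`. The dictionary stores tensors but not the class definition, which is why loading requires recreating the model first.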


# Loading Models

Loading a model involves recreating the model structure and then loading the state dictionary into it.

In [11]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load('model.pth'))

<All keys matched successfully>
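
A save-and-load round trip can be verified end to end. The sketch below assumes a small stand-in model and a temporary file (both hypothetical) and checks that the restored model reproduces the original's outputs.

```python
import os
import tempfile

import torch
from torch import nn


def make_model():
    # Hypothetical stand-in architecture; must be identical when saving and loading
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))


original = make_model()
x = torch.rand(5, 4)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_model.pth")
    torch.save(original.state_dict(), path)

    # A fresh instance has different random weights until the state is loaded
    restored = make_model()
    # map_location lets a checkpoint saved on one device load on another
    result = restored.load_state_dict(torch.load(path, map_location="cpu"))

original.eval()
restored.eval()
with torch.no_grad():
    same_output = torch.equal(original(x), restored(x))

print(result, same_output)
```

`load_state_dict` returns an object listing any missing or unexpected keys; both lists being empty corresponds to the `<All keys matched successfully>` message above.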

This model can now be used to make predictions.

In [17]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
item = 0
x, y = test_data[item]  # (image tensor, integer label)
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
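
The raw logits can also be turned into class probabilities to see how confident a prediction is. The sketch below uses made-up logits for a single image rather than real model output, together with the same `classes` list.

```python
import torch

classes = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

# Made-up logits for one image, shape [1, 10]
logits = torch.tensor([[0.1, -1.2, 0.3, 0.0, 0.2, 1.5, 0.1, 2.0, 0.4, 3.1]])

# softmax turns logits into probabilities that sum to 1 across classes
probs = torch.softmax(logits, dim=1)

# topk returns the k largest probabilities and their class indices
top_probs, top_idx = torch.topk(probs, k=3, dim=1)
top3 = [
    (classes[i], round(p, 3))
    for i, p in zip(top_idx[0].tolist(), top_probs[0].tolist())
]
for name, p in top3:
    print(f"{name}: {p}")
```

Softmax preserves the ordering of the logits, so the top class here is the same one `argmax` would pick; the probabilities only add a sense of how close the runners-up are.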
