## Importing the libraries

In [1]:
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch import nn
from torch.utils.data import DataLoader


## Data Loading in TorchVision
Every TorchVision dataset accepts two constructor arguments, `transform` and `target_transform`, which modify the samples and the labels respectively.
- transform: takes a sample and returns a possibly transformed version of it, e.g. ```transforms.RandomCrop``` for images.
- target_transform: takes the target (label) and returns a transformed version of it, e.g. a ```transforms.Lambda``` that one-hot encodes an integer class label.
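
As a minimal sketch of a `target_transform`, the hypothetical helper below (`one_hot_label` is not part of TorchVision) converts an integer class label into a one-hot vector. It assumes 10 classes, as in FashionMNIST, and relies on `target_transform` accepting any callable.

```python
import torch

NUM_CLASSES = 10  # assumption: FashionMNIST has 10 classes


def one_hot_label(y: int, num_classes: int = NUM_CLASSES) -> torch.Tensor:
    """Hypothetical helper: turn an integer label into a one-hot float vector."""
    vec = torch.zeros(num_classes, dtype=torch.float)
    vec[y] = 1.0
    return vec


encoded = one_hot_label(3)
print(encoded)

# It could then be passed to a dataset constructor, for example:
# datasets.FashionMNIST(root='data', train=True, transform=ToTensor(),
#                       target_transform=one_hot_label)
```

The rest of this notebook keeps the integer labels, which is the form `nn.CrossEntropyLoss` expects by default.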

In [2]:
## Training data
training_data = datasets.FashionMNIST(
    train=True,
    transform=ToTensor(),
    download=True, # download the dataset only if it is not already present under root
    root='data' # folder in which the data is stored
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data\FashionMNIST\raw\train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [01:48<00:00, 243542.95it/s]


Extracting data\FashionMNIST\raw\train-images-idx3-ubyte.gz to data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data\FashionMNIST\raw\train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 188397.24it/s]


Extracting data\FashionMNIST\raw\train-labels-idx1-ubyte.gz to data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz


100%|██████████| 4422102/4422102 [00:18<00:00, 233826.43it/s]


Extracting data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz to data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 5153288.06it/s]

Extracting data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz to data\FashionMNIST\raw

In [3]:
## Test data
test_data = datasets.FashionMNIST(
    root='data' ,
    download=True,
    transform=ToTensor(),
    train=False
)

We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.

In [8]:
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)


# data shape
for X, y in train_dataloader:
    print(X.shape)
    print(y.shape)

    break

torch.Size([64, 1, 28, 28])
torch.Size([64])
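
To illustrate the batching described above without downloading anything, here is a small sketch that uses a synthetic `TensorDataset` as a stand-in for FashionMNIST. The dataset size of 150 and the names `toy_dataset` and `toy_loader` are assumptions made for the example. It also enables `shuffle=True`, an option that the cell above does not use.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for FashionMNIST: 150 fake 1x28x28 images with labels 0-9.
features = torch.rand(150, 1, 28, 28)
labels = torch.randint(0, 10, (150,))
toy_dataset = TensorDataset(features, labels)

# shuffle=True reshuffles the samples at the start of every epoch.
toy_loader = DataLoader(toy_dataset, batch_size=64, shuffle=True)

batch_sizes = [X.shape[0] for X, y in toy_loader]
print(len(toy_loader))  # number of batches per epoch
print(batch_sizes)      # the last batch holds the remaining samples
```

Because 150 is not a multiple of 64, the final batch is smaller than the others. Passing `drop_last=True` would discard it instead.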


## Creating Models
To define a neural network in PyTorch, we create a class that inherits from **nn.Module**. We define the layers of the network in the `__init__` function and specify how data passes through the network in the `forward` function. To accelerate operations in the neural network, we move it to a CUDA GPU or to MPS (Apple's Metal backend) if one is available.

### Getting current device

In [9]:
# Get cpu, gpu or mps device for training.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using cpu device


### Model

In [10]:
class NeuralNetwork(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512), # input layer: each 28x28-pixel image is flattened to 784 features and mapped to 512 features
            nn.ReLU(), # non-linear activation, applied element-wise
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10) # output layer: maps 512 features to 10 logits, one per FashionMNIST clothing class
        )

    def forward(self, x): # defines how data moves forward in the network
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
    
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
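
As a rough check on the printed structure, the sketch below rebuilds the same layer sizes with a plain `nn.Sequential` (so that it runs on its own) and counts the trainable parameters. The variable names are illustrative only.

```python
from torch import nn

# Same layer sizes as the printed NeuralNetwork, rebuilt here so the
# snippet runs on its own.
stack = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Each Linear layer holds in_features * out_features weights plus out_features biases.
expected = (784 * 512 + 512) + (512 * 512 + 512) + (512 * 10 + 10)
total = sum(p.numel() for p in stack.parameters())
print(total, expected)
```

`nn.Flatten` and `nn.ReLU` contribute no parameters, so the count depends only on the three linear layers.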


## Optimizing the model parameters
To train a model, we need a loss function and an optimizer.

### Loss Function
A loss function measures the difference between the model's predicted values (output) and the actual target values (ground truth) for a given set of input data. It quantifies how well or poorly the model performs on a specific task and serves as the basis for training the neural network.

In [11]:
loss_fn = nn.CrossEntropyLoss()
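
To make the loss concrete, the following sketch compares `nn.CrossEntropyLoss` with a manual computation on made-up logits. The numbers are arbitrary and serve only as an illustration of what the loss measures.

```python
import torch
from torch import nn

# Made-up logits for 2 samples and 3 classes, with their true class indices.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0]])
targets = torch.tensor([0, 2])

# Built-in loss: expects raw logits, not probabilities.
builtin = nn.CrossEntropyLoss()(logits, targets)

# Manual version: log-softmax, pick the log-probability of the true class,
# negate, and average over the batch.
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(builtin.item(), manual.item())
```

The two values agree, which shows that the model should output raw logits and leave the softmax to the loss function.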

### Optimizer
An optimizer is an algorithm that adjusts the weights and biases of the model in order to minimize the loss function. It uses the gradients of the loss to update the parameters, which reduces the prediction error over time. Here we use stochastic gradient descent (SGD) with a learning rate of 1e-3.

In [12]:
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
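
A single SGD update can be traced by hand on a toy problem. The sketch below assumes one parameter `w`, a toy loss `w ** 2`, and a learning rate of 0.1. None of these values come from the notebook's model.

```python
import torch

# One trainable parameter and a toy loss, loss = w ** 2, whose gradient is 2 * w.
w = torch.tensor([2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()
loss.backward()                    # w.grad is now 2 * 2.0 = 4.0
grad_before_step = w.grad.clone()
opt.step()                         # w <- w - lr * grad = 2.0 - 0.1 * 4.0

print(w.item())
```

Plain SGD applies the same rule, parameter minus learning rate times gradient, to every tensor returned by `model.parameters()`.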

## Training the model
In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and backpropagates the prediction error to adjust the model’s parameters.

In [13]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        # Compute the gradient of the loss with respect to every model parameter;
        # PyTorch stores each gradient in the parameter's .grad attribute.
        loss.backward()
        # Update the parameters using the stored gradients, moving them in the
        # direction that reduces the loss.
        optimizer.step()
        # Reset the gradients, because backward() accumulates into .grad; without
        # this, the next batch's update would include this batch's gradients.
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
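
The reason for resetting gradients in every iteration can be seen in a small standalone sketch. It assumes a single scalar parameter `w` and clears the gradient by hand rather than through an optimizer.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# First backward pass: d(3 * w)/dw = 3
(3 * w).backward()
first = w.grad.item()

# Second backward pass without resetting: the new gradient is added to the old one.
(3 * w).backward()
accumulated = w.grad.item()

# Clear the gradient by hand; optimizer.zero_grad() clears it for every
# parameter the optimizer manages.
w.grad.zero_()
after_reset = w.grad.item()

print(first, accumulated, after_reset)
```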

We also check the model’s performance against the test dataset to ensure it is learning.

In [14]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
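
The accuracy computation in `test` can be followed on a handful of made-up logits. The tensor values below are arbitrary and chosen only to illustrate `argmax`.

```python
import torch

# Made-up logits for 4 samples and 3 classes, plus their true labels.
pred = torch.tensor([[0.1, 2.0, 0.3],
                     [1.5, 0.2, 0.1],
                     [0.3, 0.4, 0.9],
                     [0.8, 0.1, 0.2]])
y = torch.tensor([1, 0, 1, 0])

predicted_classes = pred.argmax(1)   # index of the largest logit in each row
correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = correct / len(y)

print(predicted_classes.tolist(), correct, accuracy)
```

Three of the four rows match their label here, so the accuracy is 0.75.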

## Train
The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.

In [15]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.319875  [   64/60000]
loss: 2.297073  [ 6464/60000]
loss: 2.281264  [12864/60000]
loss: 2.265108  [19264/60000]
loss: 2.253100  [25664/60000]
loss: 2.235931  [32064/60000]
loss: 2.230354  [38464/60000]
loss: 2.202765  [44864/60000]
loss: 2.211284  [51264/60000]
loss: 2.166266  [57664/60000]
Test Error: 
 Accuracy: 55.1%, Avg loss: 2.162713 

Epoch 2
-------------------------------
loss: 2.183338  [   64/60000]
loss: 2.157350  [ 6464/60000]
loss: 2.109827  [12864/60000]
loss: 2.114559  [19264/60000]
loss: 2.069529  [25664/60000]
loss: 2.029360  [32064/60000]
loss: 2.040752  [38464/60000]
loss: 1.971407  [44864/60000]
loss: 1.987021  [51264/60000]
loss: 1.900869  [57664/60000]
Test Error: 
 Accuracy: 57.9%, Avg loss: 1.900298 

Epoch 3
-------------------------------
loss: 1.945298  [   64/60000]
loss: 1.893683  [ 6464/60000]
loss: 1.796674  [12864/60000]
loss: 1.821684  [19264/60000]
loss: 1.717047  [25664/60000]
loss: 1.688978  [32064/600

## Saving Models
A common way to save a model is to serialize the internal state dictionary (containing the model parameters).

In [16]:
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth


## Loading Models
The process for loading a model includes re-creating the model structure and loading the state dictionary into it.

In [17]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth"))

<All keys matched successfully>

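The save and load round trip can be verified on a small stand-in model. In this sketch, `nn.Linear(4, 2)` and the file name `toy_model.pth` are assumptions made for illustration. The `map_location="cpu"` argument is optional and is useful when a checkpoint saved on a GPU is loaded on a machine without one.

```python
import os
import tempfile

import torch
from torch import nn

# Small stand-in model so the sketch runs quickly on its own.
original = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pth")  # hypothetical file name
    torch.save(original.state_dict(), path)

    # map_location="cpu" loads the tensors onto the CPU even if they were
    # saved from a GPU.
    state = torch.load(path, map_location="cpu")

restored = nn.Linear(4, 2)          # re-create the same structure first
restored.load_state_dict(state)

same_weights = torch.equal(original.weight, restored.weight)
same_bias = torch.equal(original.bias, restored.bias)
print(same_weights, same_bias)
```
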
## Making Predictions
The loaded model can now be used to make predictions.

In [28]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[1][0], test_data[1][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Pullover", Actual: "Pullover"
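
The model returns raw logits, so `argmax` alone says nothing about confidence. As a hedged sketch, the logits can be turned into probabilities with `torch.softmax`. The logits and the three-class list below are made up for illustration and do not come from the trained model.

```python
import torch

# Made-up logits for one sample and three illustrative classes.
class_names = ["Pullover", "Coat", "Shirt"]
logits = torch.tensor([[2.0, 1.0, 0.1]])

probs = torch.softmax(logits, dim=1)   # each row now sums to 1
best = probs[0].argmax(0).item()

print(class_names[best], round(probs[0, best].item(), 3))
```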
