In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

### Load Data - Part 1

In [2]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw

### Load Data - Part 2

Dataset: In PyTorch, a Dataset is an object that holds your data and is responsible for loading it from disk or another source. It's a collection of data samples, and each sample is usually a tuple (data, label).

DataLoader: The DataLoader takes a Dataset as input and makes it iterable, meaning that you can loop over it. It handles loading the data in batches, which is important because it's more efficient to process data in small groups rather than individually or all at once. This is especially true when training neural networks.

Batching: This refers to taking a subset of your dataset and processing it at one time. Here, batch_size is set to 64, which means the DataLoader will give you 64 samples each time you iterate over it. Each sample consists of features and a label.

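Multiprocess Data Loading: This is an optional efficiency feature of DataLoader. When num_workers is set above its default of 0, the DataLoader uses worker processes to load data in parallel, which can speed up loading, especially for large datasets or expensive transforms. The loaders in this notebook keep the default, so all loading happens in the main process.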

The Loop: Iterating over the DataLoader with a for loop is how batches reach your training or evaluation code. Each iteration gives you a batch of data (X) and labels (y). X.shape and y.shape are the dimensions of the data and label tensors, respectively.

X [N, C, H, W]: This is the shape of the data tensor. N is the batch size (64 in your case), C is the number of channels (1 for grayscale images, 3 for RGB color images), H is the height of the image, and W is the width of the image.
y: This is the shape of the labels tensor. It's just [N] because there's one label per image.

The break statement stops the loop after the first batch, which is often used for testing to make sure that the data loading is working as expected without having to iterate over the entire dataset.

Each time you run this loop, the DataLoader will automatically handle the fetching and transforming of the data, allowing you to focus on implementing and training your model. This setup is fundamental when working with neural networks as it enables efficient and manageable data handling.


In the for loop below, X and y represent two different, but related, components of your dataset:

X is commonly used to denote the input features of your data. In the context of image processing, X would be the actual image data that your model will learn from. If you're dealing with a batch of images, X would be a tensor containing several images, each represented as a grid of pixel values.

y is commonly used to denote the labels or targets associated with your input data. For supervised learning, where the goal is to predict a label given some input, y would contain the correct answers or the ground truth for each input sample in X. For instance, in this notebook's FashionMNIST task, y contains the integer class index (0 through 9) of the clothing item that each image in X shows.

The DataLoader combines your dataset's input features and labels into batches. When you iterate over the DataLoader, it yields pairs of (X, y) for each batch. This is a tuple where the first element is the batch of input features and the second element is the corresponding batch of labels.

Summary:

We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.
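
To see the batching behavior without downloading anything, here is a minimal sketch that wraps a synthetic dataset in a DataLoader. The tensors, the dataset size of 150, and the names toy_dataset and toy_loader are made up for illustration; only the batch size of 64 comes from this notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for an image dataset: 150 grayscale 28x28 "images"
# with integer labels in [0, 10). These values are invented for illustration.
images = torch.zeros(150, 1, 28, 28)
labels = torch.arange(150) % 10
toy_dataset = TensorDataset(images, labels)

toy_loader = DataLoader(toy_dataset, batch_size=64)

# len(toy_dataset) counts samples, len(toy_loader) counts batches.
batch_shapes = [(tuple(X.shape), tuple(y.shape)) for X, y in toy_loader]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)
```

With 150 samples and a batch size of 64, the loader yields two full batches and one final batch of 22, because the last partial batch is kept by default.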



In [3]:
batch_size = 64

# Create data loaders
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64


### Creating Models

Device Setup: Before defining a model, the code selects the device on which the computations will run. PyTorch can run on a Graphics Processing Unit (GPU), which can greatly accelerate the training of deep learning models. The code first checks whether CUDA (NVIDIA's GPU computing platform) is available and uses it if so. Otherwise it checks for Apple's Metal Performance Shaders (MPS) backend, which runs on Apple Silicon (M1/M2 chips), and finally falls back to the CPU.

In [4]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)

print(f"using device {device}")

using device cpu


In PyTorch, defining a model involves creating a class that inherits from nn.Module. This base class provides built-in functionality that neural networks need, such as moving the model between devices (CPU or GPU), keeping track of its parameters, and switching between training and evaluation modes.

Here's how the class concepts apply to your PyTorch model:

The NeuralNetwork class inherits from nn.Module, which means it gets all the functionality of nn.Module on top of what you define.
__init__ is the constructor where you define the layers of your neural network.
forward is a special method that defines the forward pass of your model. When you pass an input through your model, PyTorch will automatically call this method.
By defining your neural network as a class, you're able to create instances of this network that have their own weights and biases, which can be trained independently of other instances.

To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ function and specify how data will pass through the network in the forward function. To accelerate operations in the neural network, we move it to the GPU or MPS if available.



##### The code below, explained step by step

class NeuralNetwork(nn.Module):
This line defines a new class called NeuralNetwork, which inherits from nn.Module. In PyTorch, nn.Module is a base class for all neural network modules, and your custom models should also subclass this. This inheritance gives your model access to a lot of functionality and utilities provided by PyTorch.

def __init__(self):
This is the initializer for your NeuralNetwork class. It is called when you create an instance of the NeuralNetwork.

super().__init__()
This line calls the initializer of the base class (nn.Module). This is necessary to properly set up the internals of the module.

self.flatten = nn.Flatten()
Here, you're defining a layer that will flatten the input. nn.Flatten() is a PyTorch layer that collapses all dimensions of the input except the batch dimension. It's commonly used to convert 2D images (like the 28x28 images in FashionMNIST) into a 1D tensor of 784 values before feeding them into a fully connected layer.

self.linear_relu_stack = nn.Sequential(
This line starts the definition of a sequence of layers. nn.Sequential is a container that will process the input sequentially, passing it through each layer in the order they are added.

nn.Linear(28*28, 512),
This is the first layer in your sequence. nn.Linear is a fully connected linear layer. Here, it takes an input size of 28*28 = 784 (the size of a flattened FashionMNIST image) and outputs a tensor of size 512.

nn.ReLU(),
This adds a Rectified Linear Unit (ReLU) activation function. Activation functions like ReLU are used to introduce non-linearities into the network, which are crucial for learning complex patterns.

nn.Linear(512, 512),
Another linear layer that takes the 512 inputs from the previous layer and outputs another 512.

nn.ReLU(),
Another ReLU activation function.

nn.Linear(512, 10)
The final linear layer reduces the tensor from 512 to 10, which is the number of classes in FashionMNIST (ten types of clothing item).

def forward(self, x):
This method defines the forward pass of your network. It takes an input tensor x.

x = self.flatten(x)
The input x is passed through the flatten layer defined earlier.

logits = self.linear_relu_stack(x)
Then, the flattened x is passed through the sequence of linear and ReLU layers.

return logits
The output logits (raw, unnormalized scores, one per class) are returned. They only become probabilities after a softmax is applied.

model = NeuralNetwork().to(device)
Here, an instance of your NeuralNetwork is created and moved to the specified device (like a CPU or GPU).

print(model)
Finally, the model's structure is printed. This is useful for debugging and understanding the architecture of your model.

This is a typical structure for a simple feedforward neural network in PyTorch. It's designed to process input data, apply a series of transformations and activations, and produce output that can be used for tasks like classification.
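
As a quick check of the walkthrough above, the following sketch traces tensor shapes through the same layer sizes and counts the trainable parameters. It assumes a plain nn.Sequential named net as a stand-in for the NeuralNetwork class, and the input batch is random data rather than real images.

```python
import torch
from torch import nn

# Same layer sizes as the NeuralNetwork class, written as a plain Sequential
# so the snippet is self-contained. `net` is a hypothetical stand-in name.
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

batch = torch.rand(64, 1, 28, 28)   # fake batch in the [N, C, H, W] layout
flat = nn.Flatten()(batch)          # [64, 1, 28, 28] -> [64, 784]
logits = net(batch)                 # [64, 1, 28, 28] -> [64, 10]

# Each Linear layer holds in_features * out_features weights plus out_features biases.
by_hand = (784 * 512 + 512) + (512 * 512 + 512) + (512 * 10 + 10)
counted = sum(p.numel() for p in net.parameters())
print(tuple(flat.shape), tuple(logits.shape), counted)
```

The hand count and the count reported by PyTorch agree at 669,706 parameters, almost all of them in the first two linear layers.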

In [5]:
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )
    
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


### Optimise Model Parameters

To train a model, we need a loss function and an optimizer.

In this section of the PyTorch tutorial, you're setting up the components needed for training the neural network: the loss function and the optimizer. Let's break down these two lines:

loss_fn = nn.CrossEntropyLoss()

This line initializes the loss function that you will use to evaluate how well your model is performing during training. In PyTorch, nn.CrossEntropyLoss is a commonly used loss function for classification tasks.

Cross-Entropy Loss: This loss function measures the difference between two probability distributions - the actual labels and the predictions from the model. It's particularly suitable for multi-class classification problems such as the ten-class FashionMNIST task in this notebook.
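
For a single sample with an integer class label, cross-entropy reduces to the negative log of the softmax probability assigned to the correct class. The sketch below computes that by hand in plain Python; the function name cross_entropy and the example scores are made up for illustration, and this is not how PyTorch implements the loss internally.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    shift = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    prob_target = exps[target] / sum(exps)
    return -math.log(prob_target)

# Made-up scores for a 3-class problem; class 0 has the highest score.
confident_right = cross_entropy([2.0, 1.0, 0.1], target=0)
confident_wrong = cross_entropy([2.0, 1.0, 0.1], target=2)

# With identical scores for 10 classes, every class gets probability 1/10,
# so the loss equals ln(10).
uniform_guess = cross_entropy([0.0] * 10, target=3)
print(confident_right, confident_wrong, uniform_guess)
```

The loss is small when the correct class has the highest score and large when it does not. The uniform case gives ln(10), about 2.3026, which is close to the first loss value logged in the training output later in this notebook (2.303594): an untrained ten-class model is guessing almost uniformly.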

Working of Cross-Entropy Loss: nn.CrossEntropyLoss takes the model's raw outputs (logits), converts them to log-probabilities with a log-softmax internally, and compares them with the true labels. Because the softmax is built in, the model itself should not apply one before the loss. The goal of training is to minimize this loss, bringing the model's predictions closer to the actual labels.

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

Here, you're defining the optimizer, which is the algorithm used to update the model's parameters (weights and biases) during training based on the gradients computed during backpropagation.

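Stochastic Gradient Descent (SGD): torch.optim.SGD implements stochastic gradient descent, in which each update uses the gradient computed from one mini-batch rather than from the whole dataset. It's a very basic yet powerful optimization algorithm used in training neural networks.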

Parameters: model.parameters() passes all the trainable parameters of your model to the optimizer. These are the variables that will be adjusted to minimize the loss function.

Learning Rate (lr): 1e-3 or 0.001 is the learning rate. It determines the step size at each iteration while moving towards a minimum of the loss function. A smaller learning rate takes smaller steps and usually needs more training epochs to converge, whereas a larger learning rate moves faster but can overshoot the minimum or even diverge.
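
The effect of the learning rate can be illustrated on a toy problem. The sketch below runs plain (non-stochastic) gradient descent on the one-dimensional function f(w) = (w - 3)^2, whose minimum is at w = 3. The function, the starting point, and the three learning rates are assumptions chosen for illustration and are unrelated to the network in this notebook.

```python
def gradient_descent(lr, steps, w=0.0):
    """Minimize f(w) = (w - 3)^2 with the update w <- w - lr * f'(w)."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # derivative of (w - 3)^2
        w = w - lr * grad
    return w

small_lr = gradient_descent(lr=0.001, steps=100)   # still far from 3 after 100 steps
large_lr = gradient_descent(lr=0.1, steps=100)     # essentially at 3
too_large = gradient_descent(lr=1.1, steps=100)    # each step overshoots further
print(small_lr, large_lr, too_large)
```

On this toy function the smallest rate makes slow progress, the middle one converges, and the largest one diverges because every step jumps past the minimum by more than the previous one.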

The choice of loss function and optimizer, and their parameters like learning rate, are crucial in determining how effectively your model learns from the training data. Fine-tuning these can significantly impact the performance of your neural network.

In [6]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and backpropagates the prediction error to adjust the model’s parameters.

This code snippet defines a function for training the neural network. It's a crucial part of the learning process where the model's parameters are updated based on the data. 

step by step:

def train(dataloader, model, loss_fn, optimizer):

This line defines the train function with four parameters:

dataloader: This provides batches of training data.
model: The neural network model that you're training.
loss_fn: The loss function used to evaluate the model's performance.
optimizer: The optimization algorithm used to update model parameters.

size = len(dataloader.dataset)

This calculates the total number of samples in the dataset.

model.train()

Puts the model in training mode. This is important because some models may behave differently during training than during testing (e.g., dropout is used during training but not during testing).

for batch, (X, y) in enumerate(dataloader):

This loop iterates over the dataloader. In each iteration, it provides a batch of data (X) and the corresponding labels (y).

X, y = X.to(device), y.to(device)

This line moves the data and labels to the device (like a CPU or GPU), which you specified earlier.

pred = model(X)

Here, the model makes predictions based on the batch of data X.

loss = loss_fn(pred, y)

The loss function calculates the loss, comparing the predictions pred to the actual labels y.

loss.backward()

This line performs backpropagation. It computes the gradient of the loss with respect to all the weights in the model.

optimizer.step()

This line updates the weights of the model using the gradients computed by loss.backward().

optimizer.zero_grad()

Resets the gradients of the model parameters. It's important to clear them before the next batch otherwise the gradients will accumulate.

if batch % 100 == 0:
This conditional statement is for logging. It prints the current loss and the number of samples processed so far after every 100 batches.

loss, current = loss.item(), (batch + 1) * len(X)
This line extracts the loss value as a Python float and calculates the number of samples processed so far.

print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
Finally, this line prints the loss and the progress (number of samples processed out of the total).

In summary, this function iteratively updates the model's parameters to minimize the loss function, thereby training the model on the dataset. The training process involves making predictions, calculating the loss, performing backpropagation to compute gradients, and then updating the model parameters.
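
To make the backward / step / zero_grad sequence concrete, here is a minimal sketch on a single two-element tensor instead of a network. The tensor w, the loss sum(w^2), and the learning rate of 0.1 are made up for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# loss = sum(w^2), so d(loss)/dw = 2 * w = [2, 4].
(w ** 2).sum().backward()
first_grad = w.grad.clone()

# A second backward() without zero_grad() adds to the stored gradient.
(w ** 2).sum().backward()
accumulated_grad = w.grad.clone()

optimizer.zero_grad()            # clear the stale gradients before the real update
(w ** 2).sum().backward()
optimizer.step()                 # w <- w - 0.1 * [2, 4] = [0.8, 1.6]
print(first_grad, accumulated_grad, w.detach())
```

The second backward call doubles the stored gradient, which is why the training loop clears gradients once per batch. Whether zero_grad runs right after step (as in this notebook) or right before backward, the effect on the next update is the same.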

In [7]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

We also check the model’s performance against the test dataset to ensure it is learning.

This code snippet defines a function for evaluating the performance of your neural network on a test dataset. Testing on a separate dataset that wasn't used during training is crucial for assessing how well your model generalizes to new, unseen data. 

step by step:

def test(dataloader, model, loss_fn):

This line defines the test function with three parameters:

dataloader: This provides batches of test data.
model: The neural network model that you're evaluating.
loss_fn: The loss function used to evaluate the model's performance.

size = len(dataloader.dataset)

This calculates the total number of samples in the test dataset.

num_batches = len(dataloader)

This calculates the total number of batches in the dataloader.

model.eval()

Sets the model to evaluation mode. This is important for certain types of layers (like dropout layers or batch normalization layers) that behave differently during training and testing.

test_loss, correct = 0, 0

Initializes variables to track the total test loss and the number of correct predictions.

with torch.no_grad():

This context manager tells PyTorch that gradient computation is not needed in this block. This is because during testing, we don't need to update the weights of the model, and thus, don't need the gradients.

for X, y in dataloader:

Iterates over the test dataloader. In each iteration, it provides a batch of data (X) and the corresponding labels (y).

X, y = X.to(device), y.to(device)

Moves the data and labels to the specified device.

pred = model(X)

The model makes predictions based on the batch of data X.

test_loss += loss_fn(pred, y).item()

Adds up the loss for each batch. .item() converts the loss tensor to a Python number.

correct += (pred.argmax(1) == y).type(torch.float).sum().item()

Counts the number of correct predictions. pred.argmax(1) gets the index of the highest value in each prediction (which represents the predicted class). This is compared to the actual labels y. The sum of correct predictions is accumulated.

test_loss /= num_batches

Calculates the average test loss by dividing the total loss by the number of batches.

correct /= size

Calculates the accuracy by dividing the number of correct predictions by the total number of samples.

print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

Prints out the test accuracy and the average test loss.

This function is a standard way to evaluate the performance of a neural network. It provides a good indication of how well the model is likely to perform on unseen data, which is a critical aspect of building reliable machine learning models.
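
The accuracy line is the densest part of the test function, so here is the same computation on a tiny hand-made example. The logits and labels below are invented (4 samples, 3 classes) so that the result can be checked by eye.

```python
import torch

# Made-up logits for 4 samples and 3 classes, plus their true labels.
pred = torch.tensor([
    [0.1, 2.0, 0.3],
    [1.5, 0.2, 0.1],
    [0.0, 0.1, 3.0],
    [0.9, 0.8, 0.1],
])
y = torch.tensor([1, 0, 2, 1])

predicted_class = pred.argmax(1)                  # index of the largest score in each row
matches = predicted_class == y                    # boolean tensor, one entry per sample
correct = matches.type(torch.float).sum().item()  # True counts as 1.0, False as 0.0
accuracy = correct / len(y)
print(predicted_class.tolist(), matches.tolist(), accuracy)
```

Three of the four predictions match their labels, so the accuracy is 0.75. In the test function the same count is accumulated across all batches and divided by the dataset size at the end.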

In [8]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.

step-by-step:

epochs = 5

This sets the number of epochs to 5. The model makes five full passes over the training set, and is evaluated on the test set after each pass.

for t in range(epochs):

This is a loop that will iterate five times (once for each epoch).

print(f"Epoch {t+1}\n-------------------------------")

This prints the current epoch number. t+1 is used because t starts from 0, so t+1 gives you the human-readable epoch number (starting from 1).

train(train_dataloader, model, loss_fn, optimizer)

This calls the train function defined earlier. It will train the model using the training dataset (train_dataloader) and the specified loss function and optimizer. The training process involves passing the training data through the model, calculating the loss, performing backpropagation to update the model's weights based on this loss, and iterating over the entire training dataset.

test(test_dataloader, model, loss_fn)

After each training epoch, this line calls the test function. The test function evaluates the model's performance on the test dataset (test_dataloader). This is critical for understanding how well your model is learning and generalizing to new data.

print("Done!")

This prints "Done!" after all epochs are completed, indicating the end of the training and testing process.

The purpose of iterating over multiple epochs is to allow the model to learn from the dataset effectively. In each epoch, the model has an opportunity to adjust its weights and biases to improve its predictions. The test after each epoch gives you insights into whether the model is improving, overfitting (performing well on training data but poorly on test data), or underfitting (performing poorly on both).

The number of epochs, 5 in this case, is a hyperparameter that you can adjust. The optimal number of epochs varies depending on the specific dataset and model architecture. Too few epochs might underfit the model, while too many might lead to overfitting.
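
The numbers that appear in the training log follow directly from the dataset size, the batch size, and the logging condition in train. The sketch below reproduces them with plain arithmetic, using the values from this notebook (60,000 training samples, batch size 64, 5 epochs); the variable names are chosen here for illustration.

```python
import math

train_size, batch_size, epochs = 60_000, 64, 5

batches_per_epoch = math.ceil(train_size / batch_size)   # the last batch is partial
last_batch_size = train_size - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs               # one optimizer.step() per batch

# train() logs when batch % 100 == 0 and reports (batch + 1) * len(X) samples.
logged = [(b + 1) * batch_size for b in range(batches_per_epoch) if b % 100 == 0]
print(batches_per_epoch, last_batch_size, total_updates, logged)
```

Each epoch therefore consists of 938 parameter updates and produces ten log lines, starting at 64 and ending at 57664 samples, which matches the progress counters in the output below.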

In [9]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.303594  [   64/60000]
loss: 2.292017  [ 6464/60000]
loss: 2.270508  [12864/60000]
loss: 2.264066  [19264/60000]
loss: 2.245029  [25664/60000]
loss: 2.222055  [32064/60000]
loss: 2.232006  [38464/60000]
loss: 2.197491  [44864/60000]
loss: 2.194942  [51264/60000]
loss: 2.181595  [57664/60000]
Test Error: 
 Accuracy: 52.6%, Avg loss: 2.160689 

Epoch 2
-------------------------------
loss: 2.169685  [   64/60000]
loss: 2.164948  [ 6464/60000]
loss: 2.098836  [12864/60000]
loss: 2.116326  [19264/60000]
loss: 2.070799  [25664/60000]
loss: 2.015916  [32064/60000]
loss: 2.047490  [38464/60000]
loss: 1.967980  [44864/60000]
loss: 1.968687  [51264/60000]
loss: 1.924091  [57664/60000]
Test Error: 
 Accuracy: 59.2%, Avg loss: 1.898909 

Epoch 3
-------------------------------
loss: 1.928288  [   64/60000]
loss: 1.908277  [ 6464/60000]
loss: 1.775222  [12864/60000]
loss: 1.819747  [19264/60000]
loss: 1.723400  [25664/60000]
loss: 1.674419  [32064/600

### Saving Models

This code is about saving the trained state of your PyTorch model. Saving a model in PyTorch is a crucial step as it allows you to load the model later for further training, evaluation, or to make predictions on new data. Let's break it down:

torch.save(model.state_dict(), "model.pth")

torch.save: This is a PyTorch function used for serializing Python objects. In the context of PyTorch models, it's typically used to save the model's parameters.
model.state_dict(): This is a method of nn.Module that returns an ordered dictionary mapping each parameter (and buffer) name to its tensor. For this model, that means the weights and biases of the three linear layers. The state dict is a snapshot of the model's parameters at a particular point in time.
"model.pth": This is the filename for the saved model. The .pth extension is a convention used for PyTorch model files, but you could use any other filename or extension if you prefer.

print("Saved PyTorch Model State to model.pth")

This line simply prints a confirmation message indicating that the model has been saved successfully.

When you save a model using state_dict, you're only saving the parameters of the model, not the Python class that defines its architecture. This keeps the saved file independent of how your code is organized. However, when you want to load the model for further use, you'll need to recreate the model architecture in code and load the saved parameters into it.
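
The save-and-restore round trip can be sketched on a tiny model. This example assumes a small stand-in architecture built by a hypothetical make_model helper, and it writes to an in-memory buffer instead of model.pth so that nothing touches the disk.

```python
import io
import torch
from torch import nn

def make_model():
    # Tiny stand-in architecture; the layer sizes are arbitrary.
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

torch.manual_seed(0)
original = make_model()
keys = list(original.state_dict().keys())

# Save to an in-memory buffer instead of a file.
buffer = io.BytesIO()
torch.save(original.state_dict(), buffer)
buffer.seek(0)

# The architecture must be rebuilt in code before the weights can be loaded.
restored = make_model()
restored.load_state_dict(torch.load(buffer))

x = torch.rand(5, 4)
same_output = torch.equal(original(x), restored(x))
print(keys, same_output)
```

The state dict only contains entries for the two linear layers (the ReLU has no parameters), and the restored model produces exactly the same outputs as the original.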

In [10]:
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth


### Loading Models

In [11]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth"))

<All keys matched successfully>

This model can now be used to make predictions.

This final code snippet demonstrates how to use your trained PyTorch model to make a prediction on a single data sample from the test dataset. It also includes a nice way of presenting the model's prediction and the actual label for human readability. Let's go through it:

classes = [...]

This list defines the classes for your classification problem. Each class corresponds to a label that the model can predict. These labels represent different types of clothing items.

model.eval()

Sets the model to evaluation mode. This is important as certain layers in your model might behave differently during training (like dropout layers), and you want to evaluate the model in its testing configuration.

x, y = test_data[0][0], test_data[0][1]

This line retrieves the first sample ([0]) from your test dataset. x is the image data, and y is the corresponding label.

with torch.no_grad():

Disables gradient tracking inside the block. It does not set any gradients to zero; it tells PyTorch not to record operations for backpropagation. During inference (making predictions) no weights are updated, so gradients are not needed, and skipping them saves memory and time.

x = x.to(device)

Moves the input data x to the specified device (CPU or GPU), making it ready for the model to process.

pred = model(x)

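Feeds the input data x through the model to get a prediction. The output pred is a tensor of logits (raw scores, one per class), not probabilities. Its shape is [1, 10]: the single image has shape [1, 28, 28], and nn.Flatten treats that leading dimension as the batch dimension, which is why the next line indexes pred[0].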

predicted, actual = classes[pred[0].argmax(0)], classes[y]

pred[0].argmax(0): This finds the index of the highest value in the predictions tensor, which corresponds to the most likely class predicted by the model.
classes[...]: It then uses this index to find the corresponding class label in the classes list.
classes[y]: Retrieves the actual label for the data sample.

print(f'Predicted: "{predicted}", Actual: "{actual}"')

Finally, it prints out the model's predicted class and the actual class for the data sample. This is useful for visually inspecting how well your model is performing on individual examples.

This snippet is often used to quickly test a model's prediction on a specific sample or to showcase the model's capabilities. It provides a clear and concise way to compare the model's prediction against the actual label.
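
If you want probabilities rather than raw scores, a softmax can be applied to the model's output. The sketch below assumes an untrained stand-in model (toy_model) and a random input in place of a real test image, so the predicted class itself is meaningless; the point is the shapes and the effect of torch.no_grad().

```python
import torch
from torch import nn

torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # untrained stand-in
toy_model.eval()

x = torch.rand(1, 28, 28)        # one fake image, laid out like test_data[0][0]
with torch.no_grad():
    logits = toy_model(x)        # shape [1, 10]; Flatten treats dim 0 as the batch

probs = torch.softmax(logits, dim=1)   # normalize the scores so they sum to 1
print(logits.requires_grad, tuple(logits.shape), probs.sum().item())
```

Softmax preserves the ordering of the scores, so taking argmax of the logits or of the probabilities picks the same class. The output computed under torch.no_grad() has requires_grad set to False, confirming that no graph was recorded for backpropagation.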

In [12]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
