# Note
In the next few notebooks we will walk through the basic machine learning workflow, which consists of:

    1. Working with Data - Tensors, Datasets, DataLoaders and Transforms

    2. Creating Models - Building the neural network, understanding the automatic differentiation using torch.autograd

    3. Optimizing Model Parameters

    4. Save and Load Model.

All the code is written in .ipynb notebooks using the PyTorch API.

Dataset used: FashionMNIST
Tutorial Link: https://pytorch.org/tutorials/beginner/basics/intro.html

# 1. Working with Data

PyTorch has two primitives for working with data: 'torch.utils.data.DataLoader' and 'torch.utils.data.Dataset'. A Dataset stores the samples and their corresponding labels, while a DataLoader wraps an iterable around the Dataset.
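To make the division of labour concrete, here is a minimal sketch of a custom Dataset wrapped by a DataLoader. The class name `ToyDataset` and its contents are hypothetical and exist only for illustration; the only requirement assumed here is that a map-style Dataset implements `__len__` and `__getitem__`.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical in-memory dataset: 10 samples with 3 features each."""

    def __init__(self):
        self.samples = torch.arange(30, dtype=torch.float32).reshape(10, 3)
        self.labels = torch.arange(10) % 2  # alternating 0/1 labels

    def __len__(self):
        # Number of samples the DataLoader can draw from
        return len(self.samples)

    def __getitem__(self, idx):
        # Return one (sample, label) pair
        return self.samples[idx], self.labels[idx]


toy_dataset = ToyDataset()
toy_loader = DataLoader(toy_dataset, batch_size=4)

# The DataLoader groups individual samples into batches
batch_shapes = [tuple(X.shape) for X, y in toy_loader]
print(len(toy_dataset))
print(batch_shapes)
```

The last batch is smaller than the others because 10 samples do not divide evenly into batches of 4.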

In [2]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

Every TorchVision "Dataset" accepts two arguments, "transform" and "target_transform", to modify the samples and labels respectively.
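As an illustration of what a "target_transform" could look like, the sketch below turns an integer label into a one-hot vector. The helper name `one_hot` and the constant `NUM_CLASSES` are assumptions made for this example; the notebook itself only uses "transform=ToTensor()" and leaves the labels as integers.

```python
import torch
import torch.nn.functional as F

NUM_CLASSES = 10  # FashionMNIST has 10 classes


def one_hot(label):
    """Turn an integer class index into a one-hot float vector."""
    return F.one_hot(torch.tensor(label), num_classes=NUM_CLASSES).float()


# Label 9 becomes a vector with a single 1 in the last position
encoded = one_hot(9)
print(encoded)
```

A function like this could be passed as `target_transform=one_hot` when constructing the dataset. Note that `nn.CrossEntropyLoss`, used later in this notebook, works directly with integer class labels, so the one-hot encoding is not needed here.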

In [3]:
# Download training data from open datasets
training_data = datasets.FashionMNIST(
    root="./data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets
test_data = datasets.FashionMNIST(
    root="./data",
    train=False,
    download=True,
    transform=ToTensor(),
)

Now we pass the "Dataset" as an argument to "DataLoader". This wraps an iterable over our FashionMNIST dataset and supports the following operations:

    a) Automatic batching

    b) Sampling

    c) Shuffling

    d) Multiprocess data loading
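The operations listed above can be tried out on a small synthetic dataset. This is a sketch, not part of the original tutorial: the tensors, the batch size of 4, and the seeded `torch.Generator` are assumptions chosen to keep the example quick and reproducible.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten samples with one feature each; the label equals the sample index
features = torch.arange(10, dtype=torch.float32).unsqueeze(1)  # shape [10, 1]
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# A seeded generator makes the shuffled order reproducible
generator = torch.Generator().manual_seed(0)
loader = DataLoader(dataset, batch_size=4, shuffle=True, generator=generator)

seen = []
for X, y in loader:
    print(X.shape, y.tolist())
    seen.extend(y.tolist())

# Shuffling changes the order, but every sample is still visited once per pass
print(sorted(seen) == list(range(10)))
```

Multiprocess loading is controlled by the `num_workers` argument of DataLoader; it is left at its default of 0 here so that loading happens in the main process.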

In [4]:
batch_size = 64

In [5]:
# Create data loaders
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64


# 2. Creating Models

In PyTorch, we define a neural network as a class that inherits from nn.Module and implements two methods, __init__() and forward(). __init__() defines the layers of the network; forward() specifies how data passes through the network.

To accelerate neural network operations, we can move the model to an accelerator such as CUDA, MPS, MTIA, or XPU. If no accelerator is available, we fall back to the CPU.

In [6]:
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"

print(f"Using  {device} device")

Using  cuda device
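The `torch.accelerator` namespace is only present in recent PyTorch releases. The sketch below shows one possible fallback for older installations; the helper name `pick_device` is hypothetical, and the order of the checks is an assumption rather than an official recommendation.

```python
import torch


def pick_device():
    """Return a device string, preferring an accelerator when one is usable."""
    # Newer PyTorch releases expose a unified accelerator API
    if hasattr(torch, "accelerator") and torch.accelerator.is_available():
        return torch.accelerator.current_accelerator().type
    # Older releases: check the individual backends
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
x = torch.ones(2, 2).to(device)  # tensors are moved the same way as models
print(device, x.device.type)
```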


In [7]:
'''
__init__(): the constructor, a special method that is called automatically when a NeuralNetwork object is created. Here
            we define the layers of the neural network.

self - the first parameter of every instance method; it refers to the current object.

super().__init__() - calls the constructor of the parent class (nn.Module). It is required so that nn.Module can
                    initialize its internals, such as the tracking of layers and parameters.

self.flatten - a layer named flatten. nn.Flatten() converts each multi-dimensional sample (e.g. a 1 x 28 x 28 image) into
               a 1-D vector of 784 values while keeping the batch dimension. It is required because the first nn.Linear
               layer expects 784 features per sample.

self.linear_relu_stack = nn.Sequential() - defines the sequence of layers in the network. nn.Sequential combines
                                           multiple layers into a single block that applies them in order.

nn.Linear(28 * 28, 512) - a fully connected layer that takes the 784 flattened inputs and outputs 512 features.
nn.ReLU() - an activation function that introduces non-linearity; it replaces negative values with zero.
nn.Linear(512, 512) and nn.ReLU() - a fully connected layer with 512 inputs and 512 outputs, followed by a ReLU activation.
nn.Linear(512, 10) - the final fully connected layer, which takes 512 features and outputs 10 values (logits), one for
                     each of the 10 possible classes.
'''
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )
    '''
    forward(): defines the forward pass of the network, i.e. how input data flows through the network to produce an output.
    x - the input data
    self.flatten - converts each sample to a 1-D vector
    self.linear_relu_stack - the flattened input is passed through the layer stack defined earlier; the final outputs are
                             returned as logits.
    '''

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In [8]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
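The model returns raw scores (logits), not probabilities. The following sketch shows how logits could be turned into class probabilities with softmax. It uses a deliberately tiny stand-in model (`tiny_model`, a hypothetical name) and random input so that it runs on its own; the shapes mirror the FashionMNIST setup above.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the network above: flatten, then a single linear layer
tiny_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 10),
)

batch = torch.rand(3, 1, 28, 28)  # 3 fake grayscale images, shape [N, C, H, W]

with torch.no_grad():
    logits = tiny_model(batch)         # shape [3, 10], one score per class
    probs = nn.Softmax(dim=1)(logits)  # each row now sums to 1

print(logits.shape)
print(probs.sum(dim=1))
print(probs.argmax(dim=1))  # predicted class index for each image
```

Because softmax preserves the ordering of the scores, taking the argmax of the logits gives the same predicted class as taking the argmax of the probabilities.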


# 3. Optimizing the Model Parameters

To train a model, we need a loss function and an optimizer.

In [9]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
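To see what the loss function computes, the sketch below reproduces `nn.CrossEntropyLoss` by hand on made-up logits: apply log-softmax, pick the log-probability of the correct class for each sample, negate, and average. The input values are arbitrary and chosen only for illustration.

```python
import torch
from torch import nn

# Two samples, three classes (values are arbitrary)
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # integer class labels

# Manual computation: mean negative log-probability of the true class
log_probs = torch.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(len(targets)), targets].mean()

# Built-in loss with default settings (mean reduction)
builtin_loss = nn.CrossEntropyLoss()(logits, targets)

print(manual_loss.item(), builtin_loss.item())
```

This also shows why the model does not need a softmax layer at the end: the loss function applies log-softmax to the logits internally.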

In a single training loop, the model makes predictions on the training dataset (fed to it in batches) and backpropagates the prediction error to adjust the model's parameters.

In [10]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}|{size:>5d}]")
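The call to `optimizer.zero_grad()` matters because PyTorch accumulates gradients across `backward()` calls. The sketch below demonstrates this on a single two-element parameter; the parameter `w`, the quadratic loss, and the learning rate of 0.1 are assumptions made to keep the arithmetic easy to follow.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# loss = w0^2 + w1^2, so d(loss)/dw = 2 * w = [2, 4]
loss = (w ** 2).sum()
loss.backward()
first_grad = w.grad.clone()

# A second backward pass without zeroing adds to the stored gradient
loss = (w ** 2).sum()
loss.backward()
accumulated_grad = w.grad.clone()  # [4, 8], twice the true gradient

# Reset, recompute, and take one SGD step: w <- w - lr * grad
optimizer.zero_grad()
loss = (w ** 2).sum()
loss.backward()
optimizer.step()       # [1, 2] - 0.1 * [2, 4] = [0.8, 1.6]
optimizer.zero_grad()  # leave a clean slate for the next batch

print(first_grad, accumulated_grad, w.detach())
```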

In [12]:
# Function for evaluating the trained model on the test dataset
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>.8f} \n")
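The accuracy line in the test function can be traced step by step on a small example. The logits and labels below are made up for illustration; the expressions are the same ones used in `test()`.

```python
import torch

# Four samples, three classes (hand-written logits)
pred = torch.tensor([[0.1, 2.0, 0.3],
                     [1.5, 0.2, 0.1],
                     [0.0, 0.1, 0.9],
                     [0.8, 0.7, 0.1]])
y = torch.tensor([1, 0, 1, 0])

# argmax(1) picks the highest-scoring class in each row
predicted_classes = pred.argmax(1)  # tensor([1, 0, 2, 0])

# Compare with the labels, convert booleans to floats, and count the matches
correct = (predicted_classes == y).type(torch.float).sum().item()  # 3.0

accuracy = correct / len(y)
print(predicted_classes, correct, accuracy)
```

Three of the four predictions match the labels, so the accuracy is 0.75.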


The training process runs over several iterations (epochs). During each epoch, the model sees the whole training dataset in batches and adjusts its parameters to make better predictions. The loss function measures how far the predictions are from the true labels, so the aim of training is to reduce the loss. We print the model's accuracy and loss at each epoch; we would like to see the accuracy increase and the loss decrease with every epoch.

In [13]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n -------------------------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print()

Epoch 1
 -------------------------------------------------
loss: 2.303051 [   64|60000]
loss: 2.295276 [ 6464|60000]
loss: 2.277853 [12864|60000]
loss: 2.276852 [19264|60000]
loss: 2.246889 [25664|60000]
loss: 2.219332 [32064|60000]
loss: 2.236468 [38464|60000]
loss: 2.195301 [44864|60000]
loss: 2.206052 [51264|60000]
loss: 2.164716 [57664|60000]
Test Error: 
 Accuracy: 36.0%, Avg loss: 2.16628930 

Epoch 2
 -------------------------------------------------
loss: 2.176670 [   64|60000]
loss: 2.173671 [ 6464|60000]
loss: 2.122136 [12864|60000]
loss: 2.136147 [19264|60000]
loss: 2.082084 [25664|60000]
loss: 2.025294 [32064|60000]
loss: 2.056125 [38464|60000]
loss: 1.978423 [44864|60000]
loss: 1.995018 [51264|60000]
loss: 1.908728 [57664|60000]
Test Error: 
 Accuracy: 52.3%, Avg loss: 1.91812415 

Epoch 3
 -------------------------------------------------
loss: 1.955384 [   64|60000]
loss: 1.931765 [ 6464|60000]
loss: 1.823907 [12864|60000]
loss: 1.847648 [19264|60000]
loss: 1.745743 [256

# 4. Saving Models

A common way to save a model is to serialize its internal state dictionary, which contains the model parameters.

In [15]:
torch.save(model.state_dict(), "model.pth")
print("Saved Pytorch model state to model.pth")

Saved Pytorch model state to model.pth
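To get a feel for what a state dictionary contains, the sketch below inspects one for a small stand-in model (`small_model`, a hypothetical name; the layer sizes are arbitrary). Each entry maps a parameter name to its tensor.

```python
import torch
from torch import nn

small_model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# state_dict() maps parameter names to tensors; ReLU has no parameters
state = small_model.state_dict()
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
```

Only layers with learnable parameters appear in the dictionary, which is why there are no entries for index 1 (the ReLU).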


# 5. Loading Models
The process for loading a model includes re-creating the model structure and loading the state dictionary into it.

In [16]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth", weights_only=True))

<All keys matched successfully>
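A quick way to check that saving and loading worked is to compare the outputs of the original and restored models on the same input. The sketch below does this with a small stand-in model and a temporary file; the helper `make_model`, the file name, and `map_location="cpu"` are assumptions made for this example.

```python
import os
import tempfile

import torch
from torch import nn


def make_model():
    """Hypothetical small model; both copies must share this structure."""
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))


torch.manual_seed(0)
original = make_model()
x = torch.rand(5, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pth")
    torch.save(original.state_dict(), path)

    # A freshly created model has different random weights until we load the saved ones
    restored = make_model()
    restored.load_state_dict(torch.load(path, map_location="cpu", weights_only=True))

original.eval()
restored.eval()
with torch.no_grad():
    same = torch.equal(original(x), restored(x))
print(same)
```

Passing `map_location` lets a checkpoint that was saved on an accelerator be loaded on a machine that only has a CPU.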

In [18]:
# Prediction code
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]
model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
