# Basic PyTorch Concepts in Practice: Building an MNIST CNN Model

## Introduction to the Lecture

**Purpose of this Lecture:**
This lecture aims to introduce you to the fundamental concepts of PyTorch, a powerful open-source machine learning library. We'lldo this by working through a practical example: building an image classification model. By the end of this session, you'll have a foundational understanding of how to define, train, and test a simple neural network using PyTorch.

**The MNIST Example: What Are We Doing?**
We will be working with the **MNIST dataset** (pronounced "em-nist"). This is a famous dataset in the machine learning world, consisting of 70,000 grayscale images of handwritten digits, from 0 to 9 (60,000 for training and 10,000 for testing). Each image is small, 28x28 pixels.
Our goal is to build a **Convolutional Neural Network (CNN)** that can look at one of these handwritten digit images and correctly predict which digit it is (0, 1, 2, ..., 9).

**Why This Example?**
*   **MNIST - A Classic Starting Point:** MNIST is often called the "hello world" of image classification. Its simplicity and small size make it an ideal dataset for beginners to learn the basics without getting bogged down by large, complex data.
*   **CNNs - Fundamental for Images:** Convolutional Neural Networks are a cornerstone of modern computer vision. Understanding their basic structure is crucial for anyone interested in working with image data. This example provides a gentle introduction.
*   **PyTorch - Flexible and Pythonic:** PyTorch is widely used in both research and industry. It's known for its Python-friendly interface, dynamic computation graphs (which offer flexibility), and strong GPU acceleration. Learning PyTorch is a valuable skill.

**Learning Outcomes:**
By completing this exercise, you will learn to:
1.  Load and prepare image datasets (like MNIST) specifically for use in PyTorch.
2.  Understand and use **Tensors**, the core data structure in PyTorch.
3.  Define a simple Convolutional Neural Network (CNN) architecture using PyTorch's `nn.Module`.
4.  Understand the roles of an **optimizer** (like Adam) and a **loss function** (like Cross-Entropy Loss) in training a neural network.
5.  Implement the basic training loop:
    *   **Forward Pass:** Getting predictions from the model.
    *   **Loss Calculation:** Measuring how good/bad the predictions are.
    *   **Backward Pass (Backpropagation):** Calculating gradients to understand how to improve the model.
    *   **Optimizer Step:** Updating the model's parameters (weights) to make it better.
6.  Evaluate your model's performance on unseen test data to see how well it has learned.

**Brief Explanation of Key Concepts (for Absolute Beginners):**
Don't worry if these terms are new; we'll see them in action!
*   **Neural Network (NN):** Imagine a computer system that learns from examples, much like a human brain. It's made of interconnected units called "neurons" that work together to process information and make decisions or predictions.
*   **Convolutional Neural Network (CNN):** A special type of neural network that's really good at understanding images. It uses "convolutional" layers that act like sets of learnable filters, sliding across images to detect patterns like edges, textures, and shapes.
*   **Tensor:** In PyTorch (and other deep learning frameworks), a tensor is the main way we store and manipulate data. Think of it as a multi-dimensional array or grid of numbers. A 1D tensor is a vector, a 2D tensor is a matrix, and you can have 3D, 4D, or even higher-dimensional tensors (e.g., a batch of color images might be a 4D tensor: batch_size x height x width x color_channels).
*   **Training:** This is the process of "teaching" the neural network. We show it lots of example images and their correct labels (e.g., this image is a "7"). The network makes a prediction, we see how wrong it is, and then we slightly adjust its internal settings (called "weights") to make it more accurate next time.
*   **Epoch:** One complete round of showing the *entire* training dataset to the neural network. We usually train for multiple epochs.
*   **Batch:** Because training datasets can be very large, we often break them into smaller chunks called batches. The model processes one batch at a time within an epoch.
*   **Loss Function (or Criterion):** A mathematical function that measures how "wrong" the model's predictions are compared to the actual correct answers (labels). A high loss means the model is doing poorly; a low loss means it's doing well. The main goal of training is to minimize this loss.
*   **Optimizer:** An algorithm that helps the neural network adjust its internal weights to reduce the loss. It uses the information from the loss function to decide how to change the weights to make better predictions. Adam, SGD (Stochastic Gradient Descent), and RMSprop are common optimizers.
*   **Activation Function (e.g., ReLU):** A function applied to the output of neurons within the network. They introduce non-linearity, which is crucial for the network to learn complex relationships in the data. ReLU (Rectified Linear Unit) is a very common one; it basically outputs the input if it's positive, and zero otherwise.
*   **Softmax:** Often used as the last activation function in a classification model. It takes a vector of raw scores (logits) from the network and converts them into a vector of probabilities, where each probability represents how likely the input image is to belong to a particular class (e.g., 70% chance it's a '2', 10% it's a '7', etc.). All probabilities sum up to 1.
---

## 1. Load Libraries

This first code block is dedicated to importing all the necessary Python libraries and modules that we'll use throughout this notebook. Libraries are collections of pre-written code that provide useful functions and tools, so we don't have to write everything from scratch.

Here's a brief overview of the key libraries we're importing:

*   **`torch`**: This is the main PyTorch library. It provides the core functionalities like tensor operations (the fundamental data structures for PyTorch) and automatic differentiation (which is key for training neural networks).
*   **`torch.nn`**: This submodule of PyTorch contains the building blocks for constructing neural networks, such_as layers (convolutional, linear, etc.), activation functions, and loss functions. `nn` stands for Neural Network.
*   **`torch.nn.functional` (often imported as `F`)**: This module contains functions that are used in building neural networks, like activation functions (e.g., ReLU) and pooling operations. It's often used for functions that don't have learnable parameters.
*   **`torchvision`**: This library is part of the PyTorch ecosystem and provides access to popular datasets, pre-trained model architectures, and common image transformations for computer vision tasks.
*   **`torchvision.transforms`**: A submodule of `torchvision` that provides tools for pre-processing image data, such as converting images to tensors, normalizing them, or applying data augmentation techniques.
*   **`torchvision.datasets`**: This submodule makes it easy to download and use standard datasets like MNIST, CIFAR10, ImageNet, etc.
*   **`pandas` (as `pd`)**: A powerful library for data manipulation and analysis, particularly useful for working with tabular data (though not heavily used in this specific image-focused notebook).
*   **`numpy` (as `np`)**: A fundamental package for numerical computation in Python. PyTorch tensors can be easily converted to and from NumPy arrays.
*   **`torch.utils.data.Dataset` and `torch.utils.data.DataLoader`**: These are PyTorch utilities that help in creating custom datasets and loading data efficiently in batches during model training and evaluation.
*   **`sklearn.metrics.recall_score`**: A function from scikit-learn, a comprehensive machine learning library, to calculate the recall score. While not directly used in the main training loop here, it's imported and could be used for more detailed evaluation.
*   **`matplotlib.pyplot` (as `plt`)**: A widely used plotting library in Python. We'll use it to display images from our dataset.
*   **`joblib`**: A library for lightweight pipelining in Python. It can be useful for saving and loading Python objects, including trained models or data.
*   **`tqdm`**: A library that provides a simple and effective way to add progress bars to loops, which is very helpful for monitoring the progress of time-consuming tasks like training neural networks.
*   **`os`**: A standard Python library for interacting with the operating system, for example, to manage files and directories.
*   **`random`**: A standard Python library for generating random numbers, which can be useful for various tasks like shuffling data or initializing parameters.

The lines `%reload_ext autoreload`, `%autoreload 2`, and `%matplotlib inline` are "magic commands" often used in Jupyter Notebooks:
*   `%reload_ext autoreload` and `%autoreload 2`: These commands automatically reload modules before executing code. This is useful if you're editing external Python scripts and want the changes to be reflected in the notebook without restarting the kernel.
*   `%matplotlib inline`: This command ensures that plots generated by `matplotlib` are displayed directly within the notebook interface.
---

## 1. Load Libraries

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torchvision import datasets

import pandas as pd
import numpy as np
from torch.utils.data import Dataset, DataLoader
from sklearn.metrics import recall_score
import matplotlib.pyplot as plt
import joblib
from tqdm import tqdm
import os
import random

%reload_ext autoreload
%autoreload 2
%matplotlib inline

## 2. Read / Import Data & Initial Setup

In this section, we'll set up some basic parameters and then download and prepare our dataset.

*   **`BATCH_SIZE = 32` (or 64, 128):**
    *   When training neural networks, we usually don't feed the entire dataset to the model at once. Instead, we divide it into smaller, manageable chunks called **batches**.
    *   `BATCH_SIZE` defines how many samples (images, in our case) are included in each batch.
    *   **Why use batches?**
        *   **Memory Efficiency:** Processing the entire dataset at once might require more memory (RAM or GPU VRAM) than available. Batches make it feasible to train on large datasets.
        *   **Faster Training (per update):** The model's weights are updated after processing each batch. Smaller batches mean more frequent updates, which can sometimes lead to faster convergence.
        *   **Stable Gradient Estimation:** While very small batches can lead to noisy gradient estimates, moderately sized batches provide a good balance, offering a more stable estimate of the gradient across the dataset compared to processing one sample at a time (stochastic gradient descent).
    *   The choice of batch size (e.g., 32, 64, 128) is a hyperparameter that can affect training speed and model performance. There's no single "best" size; it often depends on the dataset, model, and available hardware.

*   **`transform = transforms.Compose([...])`**:
    *   PyTorch's `torchvision.transforms` module provides tools for common image transformations. `transforms.Compose` allows us to chain multiple transformations together.
    *   **`transforms.ToTensor()`**: This is a crucial transformation. It converts images (which might be in formats like PIL Image or NumPy arrays) into PyTorch **Tensors**.
        *   For PIL Images, it also changes the pixel value range from `[0, 255]` (typical for images) to `[0.0, 1.0]` (a floating-point range suitable for neural networks) by dividing each pixel value by 255.
        *   It rearranges the dimensions of the image tensor from HWC (Height, Width, Channel) to CHW (Channel, Height, Width), which is the format PyTorch's convolutional layers expect. Since MNIST images are grayscale, the channel will be 1.

*   **`trainset = torchvision.datasets.MNIST(...)`** and **`testset = torchvision.datasets.MNIST(...)`**:
    *   These lines download the MNIST dataset using `torchvision.datasets.MNIST`. PyTorch makes it very convenient to access many standard datasets.
    *   **`root='./data'`**: Specifies the directory where the MNIST data will be downloaded or, if already downloaded, where it's stored.
    *   **`train=True`**: This flag indicates that we want to load the **training** portion of the MNIST dataset. This is the data the model will learn from.
    *   **`train=False`**: This flag indicates that we want to load the **testing** (or evaluation) portion of the MNIST dataset. This data is kept separate and is used to evaluate how well our trained model generalizes to new, unseen images. It's crucial not to train the model on the test data.
    *   **`download=True`**: If the MNIST dataset is not found in the `root` directory, this option allows PyTorch to download it automatically.
    *   **`transform=transform`**: This applies the transformations we defined earlier (in our case, `transforms.ToTensor()`) to each image as it's loaded from the dataset. This means each image will be converted into a PyTorch tensor with pixel values between 0.0 and 1.0.

The output of this cell (the download progress bars) shows that PyTorch is fetching the dataset files. These typically include files for training images, training labels, test images, and test labels.
---

## 2. Read / Import Data

In [None]:
BATCH_SIZE = 32 # or 64, 128

## transform the data into 'tensors' using the 'transforms' module
transform = transforms.Compose(
    [transforms.ToTensor()])

## download training dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                        download=True, transform=transform)
## download testing dataset
testset = torchvision.datasets.MNIST(root='./data', train=False,
                                       download=True, transform=transform)

## 3. Load Data on DataLoader

Now that we have our datasets (`trainset` and `testset`), we need an efficient way to load the data in batches during the training and evaluation phases. This is where PyTorch's `DataLoader` comes in.

*   **`torch.utils.data.DataLoader`**:
    *   This utility takes a `Dataset` object (like our `trainset` and `testset`) and provides an iterable over it. This means we can easily loop through our data.
    *   It handles batching, shuffling, and can even use multiple worker processes to load data in parallel (though we're keeping it simple here).

Let's look at the parameters used:

*   **`trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)`**:
    *   `trainset`: The training dataset we prepared earlier.
    *   `batch_size=BATCH_SIZE`: We use the `BATCH_SIZE` (which we set to 32) that we defined previously. This means the `trainloader` will provide 32 images (and their labels) at a time.
    *   `shuffle=True`: This is a very important parameter for the training data loader. Setting it to `True` means that the order of the data will be randomized at the beginning of each epoch.
        *   **Why shuffle?** Shuffling helps the model generalize better and prevents it from learning any unintended order present in the dataset. If the data isn't shuffled, the model might learn patterns related to the sequence of data, especially if similar samples are grouped together. It also helps ensure that batches are more representative of the overall data distribution.
    *   `num_workers=0`: This parameter determines how many subprocesses to use for data loading. `0` means the data will be loaded in the main process. For simple datasets like MNIST and on most personal machines, `0` is often fine. Increasing this can speed up data loading for larger datasets or more complex transformations by loading data in parallel, but can sometimes cause issues on Windows or in certain environments if not configured correctly.

*   **`testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE, shuffle=False, num_workers=0)`**:
    *   `testset`: The test dataset.
    *   `batch_size=BATCH_SIZE`: We often use the same batch size for testing as for training, but it can be different. For evaluation, especially if memory is a concern, a larger batch size can sometimes speed things up as there are fewer iterations.
    *   `shuffle=False`: For the test (and validation) dataset, we **do not** shuffle the data. The order of evaluation doesn't impact the model's learning (as weights are not updated during testing), and keeping it consistent allows for reproducible evaluation metrics and easier comparison across different test runs or models.
    *   `num_workers=0`: Same reasoning as for the `trainloader`.

After this cell runs, `trainloader` and `testloader` are ready to be used in our training and testing loops to feed data to the model in an organized and efficient manner.
---

## 3. Load Data on DataLoader

In [None]:
# Feed data in batches into deep-learning models
# num_workers=0 in Windows machine
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,
                                          shuffle=True, num_workers=0)
testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE,
                                         shuffle=False, num_workers=0)

## 4. Explore the Data (EDA)

Before diving into model building, it's often a good idea to visually inspect your data. This is a part of Exploratory Data Analysis (EDA). It helps ensure that the data has loaded correctly and gives you a feel for what you're working with. In this section, we'll define a helper function to display images and then use it to look at a few examples from our training set.

*   **`def imshow(img):`**:
    *   This defines a Python function named `imshow` that takes an image tensor (`img`) as input.
    *   **`#img = img / 2 + 0.5 # unnormalize`**: This line is commented out. If the images were normalized to a range like `[-1, 1]` (e.g., by subtracting the mean and dividing by standard deviation), this line would be used to "unnormalize" them back to a displayable range (often `[0, 1]`) before showing them. Since `transforms.ToTensor()` already scales images to `[0, 1]`, this specific unnormalization isn't strictly needed here.
    *   **`npimg = img.numpy()`**: Converts the PyTorch tensor `img` into a NumPy array. `matplotlib`'s `imshow` function typically expects data in NumPy array format.
    *   **`plt.imshow(np.transpose(npimg, (1, 2, 0)))`**: This is the core line for displaying the image.
        *   `plt.imshow()`: The function from `matplotlib.pyplot` (imported as `plt`) used to display data as an image.
        *   `np.transpose(npimg, (1, 2, 0))`: This is an important step. PyTorch tensors for images are typically in the CHW format (Channel, Height, Width). However, `matplotlib` expects images in HWC format (Height, Width, Channel). `np.transpose(npimg, (1, 2, 0))` rearranges the dimensions of the NumPy array from `(Channel, Height, Width)` to `(Height, Width, Channel)`. For our grayscale MNIST images, the input might be `(1, 28, 28)`, and this would transpose it to `(28, 28, 1)`.

*   **Getting and Displaying Images:**
    *   **`dataiter = iter(trainloader)`**: Creates an iterator from our `trainloader`. An iterator is an object that allows us to loop through a sequence one item (in this case, one batch) at a time.
    *   **`images, labels = next(dataiter)`**: Retrieves the next batch (the first batch, in this instance) of images and their corresponding labels from the `trainloader`. `images` will be a tensor containing a batch of image data, and `labels` will be a tensor containing their labels.
    *   **`imshow(torchvision.utils.make_grid(images))`**:
        *   `torchvision.utils.make_grid(images)`: This utility function takes a batch of images (as a 4D tensor) and arranges them into a single grid image. This is very convenient for visualizing multiple images from a batch at once.
        *   The resulting grid image (which is a tensor) is then passed to our `imshow` function to be displayed.

The output of this cell will be a grid showing several of the MNIST handwritten digit images from the first batch of our training data. This helps confirm that our data loading and transformations are working as expected.
---

## 4. Explore the Data (EDA)

In [None]:
## functions to show an image
def imshow(img):
    #img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

## get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

## show images
imshow(torchvision.utils.make_grid(images))

### Checking Data Dimensions

After loading the data and visualizing some samples, it's good practice to explicitly check the dimensions (or "shape") of our data tensors. This helps us confirm that the data is structured as expected by our model and can prevent errors down the line.

*   **`for images, labels in trainloader:`**: We start a loop that iterates through the `trainloader`. Each iteration will give us one batch of `images` and their corresponding `labels`.
*   **`print("Image batch dimensions:", images.shape)`**:
    *   `images.shape`: This attribute of a PyTorch tensor returns a `torch.Size` object, which is like a tuple describing the size of each dimension of the tensor.
    *   For our MNIST data, we expect this to be something like `torch.Size([32, 1, 28, 28])`. Let's break this down:
        *   `32`: This is the `BATCH_SIZE`, meaning there are 32 images in this batch.
        *   `1`: This is the number of **color channels**. Since MNIST images are grayscale, there's only one channel. For RGB color images, this would typically be 3.
        *   `28`: This is the **height** of each image in pixels.
        *   `28`: This is the **width** of each image in pixels.
*   **`print("Image label dimensions:", labels.shape)`**:
    *   `labels.shape`: This gives us the dimensions of the labels tensor for the current batch.
    *   For our MNIST data, we expect this to be `torch.Size([32])`. This means there are 32 labels in this batch, one for each of the 32 images. Each label will be a number between 0 and 9.
*   **`break`**: After processing and printing the dimensions for the first batch, we use `break` to exit the loop. We only need to check the dimensions once to verify the structure.

The commented-out code cell that follows in the original notebook (`# Image batch dimensions: torch.Size([32, 1, 28, 28]) --> ...`) simply reiterates this interpretation. Understanding these dimensions is crucial for designing the input layer of our neural network correctly.
---

In [None]:
## Check the dimensions of a batch:
for images, labels in trainloader:
    print("Image batch dimensions:", images.shape)
    print("Image label dimensions:", labels.shape)
    break

In [None]:
# Image batch dimensions: torch.Size([32, 1, 28, 28]) -->
# 32: samples, 1 color channel, 28 x 28 (height x width)
# Image label dimensions: torch.Size([32])

## 5. Create a model, optimizer and criterion

In [8]:
# The model below consists of an __init__() portion where you include the layers and components of the neural network.
# In our model, we have a convolutional layer denoted by nn.Conv2d(...).
# We are dealing with an image dataset that is in grayscale so we only need one channel going in, so "in_channels=1".
# We hope to get a nice representation of this layer, so we use "out_channels=32".
# Kernel size is 3, and for the rest of parameters, we use the default values which you can find here.

In [9]:
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()

        # 28x28x1 => 26x26x32
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
        self.d1 = nn.Linear(26 * 26 * 32, 128) # 128 represents the size we want as output, and (26*26*32) represents the dimension of the incoming data
        self.d2 = nn.Linear(128, 10) #  The same applies for the second dense layer (d2) where the dimension of the output of the previous linear layer was added as in_features=128,
        # and 10 is the size of the output (It should be same the final number of classes we want to predict)

        # To see how to calculate this, go to https://pytorch.org/docs/stable/nn.html?highlight=linear#conv2d

        # Apply an activation function such as ReLU in the middle of each layer
        # For prediction purposes, we then apply a softmax layer to the last transformation and return the output of that.
    def forward(self, x):
    # 32x1x28x28 => 32x32x26x26
        x = self.conv1(x)
        x = F.relu(x)

        # flatten => 32 x (32*26*26)
        x = x.flatten(start_dim = 1)

        # 32 x (32*26*26) => 32x128
        x = self.d1(x)
        x = F.relu(x)

        # logits => 32x10
        logits = self.d2(x)
        out = F.softmax(logits, dim=1)
        return out

### 5.1. Test one batch

In [10]:
model = MyModel()
## We always want to test 1 batch
for images, labels in trainloader:
    print("batch size:", images.shape)
    out = model(images)
    print(out.shape)
    break

batch size: torch.Size([32, 1, 28, 28])
torch.Size([32, 10])


### 5.2 optimizer and criterion

In [11]:
# Define Model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = MyModel()
model = model.to(device)
# Learning Rate / Epoch
learning_rate = 0.001
num_epochs = 5
# optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
# criterion
criterion = nn.CrossEntropyLoss()

## 6. Train the Model

In [12]:
## Custom accuracy function
def get_accuracy(logit, target, batch_size):
    ''' Obtain accuracy for training round '''
    corrects = (torch.max(logit, 1)[1].view(target.size()).data == target.data).sum()
    accuracy = 100.0 * corrects/batch_size
    return accuracy.item()

In [13]:
for epoch in range(num_epochs):
    train_running_loss = 0.0
    train_acc = 0.0

    model = model.train()

    ## training step
    for i, (images, labels) in enumerate(trainloader):

        images = images.to(device)
        labels = labels.to(device)

        ## forward + backprop + loss
        logits = model(images)
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()

        ## update model params
        optimizer.step()

        train_running_loss += loss.detach().item()
        train_acc += get_accuracy(logits, labels, BATCH_SIZE)

    model.eval()
    print('Epoch: %d | Loss: %.4f | Train Accuracy: %.2f' \
          %(epoch, train_running_loss / i, train_acc/i))

Epoch: 0 | Loss: 1.6332 | Train Accuracy: 83.23
Epoch: 1 | Loss: 1.4937 | Train Accuracy: 97.12
Epoch: 2 | Loss: 1.4827 | Train Accuracy: 98.11
Epoch: 3 | Loss: 1.4781 | Train Accuracy: 98.54
Epoch: 4 | Loss: 1.4753 | Train Accuracy: 98.80


## 7. Test the Model

In [14]:
test_acc = 0.0
for i, (images, labels) in enumerate(testloader, 0):
    images = images.to(device)
    labels = labels.to(device)
    outputs = model(images)
    test_acc += get_accuracy(outputs, labels, BATCH_SIZE)

print('Avg. Test Accuracy: %.2f'%( test_acc/i))

Avg. Test Accuracy: 98.24


In [None]:
# Inspired by
# https://medium.com/dair-ai/building-rnns-is-fun-with-pytorch-and-google-colab-3903ea9a3a79