# Introduction

## Purpose of the Lecture
This lecture introduces fundamental PyTorch concepts by building a model for a common image classification task.

## The MNIST Example
The MNIST dataset consists of handwritten digits 0-9. In this lecture, we'll build a Convolutional Neural Network (CNN) to classify these digits.

## Why this Example?
- MNIST is a classic "hello world" dataset for image classification, making it ideal for beginners.
- CNNs are a fundamental type of neural network for image tasks.
- PyTorch is a popular deep learning framework known for its flexibility and ease of use.

## Learning Outcomes
Students will learn:
- How to load and prepare image data in PyTorch.
- The concept of Tensors.
- How to define a simple CNN architecture.
- The roles of an optimizer and a loss function.
- The basic training loop: forward pass, loss calculation, backward pass (backpropagation), and optimizer step.
- How to evaluate a model's performance.

## Brief Explanation of Key Concepts (for absolute beginners)
- **Neural Network (NN):** A computing system inspired by the human brain, composed of interconnected processing units (neurons) that learn from data to perform tasks.
- **Convolutional Neural Network (CNN):** A specialized type of NN particularly effective for image processing, using layers that apply "filters" to detect patterns like edges, shapes, and textures.
- **Tensor:** The primary data structure in PyTorch (similar to a NumPy array, but with GPU support). Think of a tensor as a multi-dimensional array of numbers.
- **Training:** The process of teaching a neural network by showing it examples (data) and adjusting its internal parameters (weights) to minimize errors.
- **Epoch:** One complete pass through the entire training dataset.
- **Batch:** A small subset of the training data processed at one time during an epoch.
- **Loss Function:** A way to measure how wrong the model's predictions are compared to the actual labels. The goal of training is to minimize this loss.
- **Optimizer:** An algorithm that adjusts the model's parameters based on the loss to improve its performance (e.g., Adam, SGD).
- **Activation Function (e.g., ReLU):** A function applied to the output of neurons to introduce non-linearity, allowing the network to learn complex patterns.
- **Softmax:** An activation function often used in the final layer of a classification network to convert raw scores (logits) into probabilities for each class.
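
To make the softmax and loss-function definitions above concrete, here is a minimal pure-Python sketch. The three logits are made-up values for a hypothetical 3-class problem, and the helper names `softmax` and `cross_entropy` are illustrative, not PyTorch functions.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the largest logit does not change the result,
    # but it keeps exp() from overflowing on large scores.
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Loss for one sample: negative log-probability of the correct class."""
    return -math.log(probs[true_class])


logits = [2.0, 1.0, 0.1]  # made-up scores for classes 0, 1, 2
probs = softmax(logits)

# The loss is small when the model favors the correct class and large otherwise.
loss_if_class0 = cross_entropy(probs, 0)
loss_if_class2 = cross_entropy(probs, 2)

print("probabilities:", [round(p, 3) for p in probs])
print("loss if true class is 0:", round(loss_if_class0, 3))
print("loss if true class is 2:", round(loss_if_class2, 3))
```

The class with the highest logit gets the highest probability, and the loss grows as the probability assigned to the true class shrinks.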


# 1. Load Libraries

This cell imports all necessary Python libraries used in this notebook.

- **torch, torch.nn, torch.nn.functional**: Core PyTorch libraries for building neural networks.
- **torchvision, torchvision.transforms, torchvision.datasets**: For accessing standard datasets (like MNIST) and image transformation tools.
- **matplotlib.pyplot**: For plotting images.
- **numpy**: For numerical operations (though PyTorch tensors are preferred for model building).
- **tqdm**: For displaying progress bars during training.


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torchvision import datasets

import pandas as pd
import numpy as np
from torch.utils.data import Dataset, DataLoader
from sklearn.metrics import recall_score
import matplotlib.pyplot as plt
import joblib
from tqdm import tqdm
import os
import random

%reload_ext autoreload
%autoreload 2
%matplotlib inline

# 2. Read / Import Data

This cell loads the MNIST dataset and prepares it for training and testing.

- **BATCH_SIZE**: Data is processed in batches for memory efficiency and stable gradient estimation.
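- **transforms.Compose and transforms.ToTensor()**: `Compose` chains transforms together; here the only one is `ToTensor`, which converts each PIL image into a PyTorch tensor, the data format the model expects. `ToTensor` also rescales 8-bit pixel values from [0, 255] to floats in [0.0, 1.0] and puts the channel dimension first (C x H x W).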
- **torchvision.datasets.MNIST**: PyTorch provides easy access to the MNIST dataset.
- **root='./data'**: Directory where the data will be downloaded/stored.
- **train=True/False**: Separate datasets for training the model and testing its performance on unseen data.
- **download=True**: Automatically downloads the dataset if not found locally.
- **transform=transform**: Applies the defined tensor transformation to the images.


In [None]:
BATCH_SIZE = 32 # or 64, 128

## transform the data into 'tensors' using the 'transforms' module
transform = transforms.Compose(
    [transforms.ToTensor()])

## download training dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                        download=True, transform=transform)
## download testing dataset
testset = torchvision.datasets.MNIST(root='./data', train=False,
                                       download=True, transform=transform)

In [None]:
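# 3. Load Data on DataLoader
# A DataLoader feeds the data to the model in batches.
# shuffle=True reshuffles the training data every epoch; the test set keeps its order.
# num_workers=0 loads data in the main process, which avoids multiprocessing issues on Windows.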
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,
                                          shuffle=True, num_workers=0)
testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE,
                                         shuffle=False, num_workers=0)
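
A quick way to see how a `DataLoader` splits a dataset into batches is to run it on a small synthetic dataset. The sketch below uses 100 fake images as a hypothetical stand-in for MNIST; note that the last batch is smaller whenever the dataset size is not a multiple of the batch size.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 100 blank 1x28x28 images with labels 0-9.
fake_images = torch.zeros(100, 1, 28, 28)
fake_labels = torch.arange(100) % 10
fake_dataset = TensorDataset(fake_images, fake_labels)

loader = DataLoader(fake_dataset, batch_size=32, shuffle=False, num_workers=0)

num_batches = len(loader)                                # ceil(100 / 32)
batch_sizes = [images.shape[0] for images, _ in loader]  # samples per batch

print(num_batches)  # 4
print(batch_sizes)  # [32, 32, 32, 4]
```

The same arithmetic applies to the real data: any code that assumes every batch holds exactly `BATCH_SIZE` samples will miscount on the final batch unless the dataset size divides evenly.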

In [None]:
# 4. Explore the Data (EDA)
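# helper function to show an image
def imshow(img):
    # ToTensor already scales pixels to [0, 1], so no un-normalization is needed
    npimg = img.numpy()
    # matplotlib expects H x W x C, while PyTorch tensors are C x H x W
    plt.imshow(np.transpose(npimg, (1, 2, 0)))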

## get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

## show images
imshow(torchvision.utils.make_grid(images))

In [None]:
# Check the dimensions of a batch:
for images, labels in trainloader:
    print("Image batch dimensions:", images.shape)
    print("Image label dimensions:", labels.shape)
    break
# Image batch dimensions: torch.Size([32, 1, 28, 28]) -->
# 32: samples, 1 color channel, 28 x 28 (height x width)
# Image label dimensions: torch.Size([32])

In [None]:
# 5. Create a model, optimizer and criterion
# The model's __init__() method defines the layers of the neural network.
# The first layer is a convolutional layer, nn.Conv2d(...).
# MNIST images are grayscale, so the input has a single channel: in_channels=1.
# out_channels=32 means the layer learns 32 filters and produces 32 feature maps.
# The kernel size is 3; the remaining parameters (stride, padding, ...) keep their
# default values, which are listed in the PyTorch nn.Conv2d documentation.
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()

        # 28x28x1 => 26x26x32
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
        # in_features = 26 * 26 * 32 (the flattened conv output); out_features = 128 (hidden size)
        self.d1 = nn.Linear(26 * 26 * 32, 128)
        # in_features = 128 (the output of d1); out_features = 10 (one score per digit class)
        self.d2 = nn.Linear(128, 10)

        # With the default stride=1 and padding=0, a 3x3 kernel shrinks each side
        # from 28 to 28 - 3 + 1 = 26. For the general formula, see
        # https://pytorch.org/docs/stable/nn.html?highlight=linear#conv2d

    def forward(self, x):
        # ReLU is applied after the convolution and after the first linear layer.
        # The last layer returns raw logits: nn.CrossEntropyLoss applies log-softmax
        # internally, so no softmax is applied here. To get class probabilities at
        # prediction time, call F.softmax(logits, dim=1) on the output.

        # 32x1x28x28 => 32x32x26x26
        x = self.conv1(x)
        x = F.relu(x)

        # flatten => 32 x (32*26*26)
        x = x.flatten(start_dim=1)

        # 32 x (32*26*26) => 32x128
        x = self.d1(x)
        x = F.relu(x)

        # logits => 32x10
        logits = self.d2(x)
        return logits
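
The layer sizes used in the model can be derived by hand. The following pure-Python sketch applies the output-size formula from the `nn.Conv2d` documentation to a 28x28 input and then counts the learnable parameters of each layer. The helper name `conv2d_output_size` is illustrative and not part of PyTorch.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output height (or width) of a Conv2d layer for a given input size."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


side = conv2d_output_size(28, kernel_size=3)  # 28 - 3 + 1 = 26
flattened = side * side * 32                  # 26 * 26 * 32 inputs to the first linear layer

# Each layer has one weight per input-output connection plus one bias per output.
conv1_params = 32 * (1 * 3 * 3) + 32          # 32 filters of shape 1x3x3, plus 32 biases
d1_params = flattened * 128 + 128
d2_params = 128 * 10 + 10
total_params = conv1_params + d1_params + d2_params

print(side)          # 26
print(flattened)     # 21632
print(total_params)  # 2770634
```

Almost all of the parameters sit in the first linear layer, because it connects every one of the 21,632 flattened conv outputs to each of the 128 hidden units.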

In [None]:
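### 5.1. Test one batch
model = MyModel()
## Sanity check: pass a single batch through the model before training
for images, labels in trainloader:
    print("batch shape:", images.shape)
    out = model(images)
    print("output shape:", out.shape)  # torch.Size([32, 10]): one score per class for each image
    break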

In [None]:
### 5.2 optimizer and criterion
# Define Model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = MyModel()
model = model.to(device)
# Learning Rate / Epoch
learning_rate = 0.001
num_epochs = 5
# optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
# criterion
criterion = nn.CrossEntropyLoss()
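
One detail of `nn.CrossEntropyLoss` is worth checking directly: it expects raw logits and applies log-softmax internally. The sketch below, using random logits and made-up targets, compares the criterion with the explicit log-softmax plus negative log-likelihood computation, and shows that feeding it softmax probabilities gives a different value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up batch: 4 samples, 10 classes.
logits = torch.randn(4, 10)
targets = torch.tensor([3, 0, 7, 1])

criterion = nn.CrossEntropyLoss()

# The criterion applied to raw logits ...
loss_from_logits = criterion(logits, targets)
# ... matches log-softmax followed by the negative log-likelihood loss.
manual_loss = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Passing probabilities instead of logits applies softmax twice and changes the loss.
loss_from_probs = criterion(F.softmax(logits, dim=1), targets)

print(torch.allclose(loss_from_logits, manual_loss))      # True
print(torch.allclose(loss_from_logits, loss_from_probs))  # False
```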

In [None]:
## 6. Train the Model
## Custom accuracy function
def get_accuracy(logit, target, batch_size):
    ''' Return the percentage of correct predictions in one batch '''
    # the predicted class is the index of the largest score in each row
    corrects = (logit.argmax(dim=1) == target).sum()
    accuracy = 100.0 * corrects / batch_size
    return accuracy.item()

for epoch in range(num_epochs):
    train_running_loss = 0.0
    train_acc = 0.0

    model.train()

    ## training step
    for i, (images, labels) in enumerate(trainloader):

        images = images.to(device)
        labels = labels.to(device)

        ## forward pass + loss
        logits = model(images)
        loss = criterion(logits, labels)

        ## backward pass: clear old gradients, then backpropagate
        optimizer.zero_grad()
        loss.backward()

        ## update model params
        optimizer.step()

        train_running_loss += loss.item()
        ## use the actual batch size, since the last batch can be smaller than BATCH_SIZE
        train_acc += get_accuracy(logits, labels, labels.size(0))

    ## i is zero-based, so the number of batches is i + 1
    num_batches = i + 1
    print('Epoch: %d | Loss: %.4f | Train Accuracy: %.2f'
          % (epoch, train_running_loss / num_batches, train_acc / num_batches))

## switch to evaluation mode once training is finished
model.eval()
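
The four steps of the training loop (forward pass, loss calculation, backward pass, optimizer step) can be seen in isolation on a toy problem. The sketch below is not the MNIST model: it trains a single linear layer on synthetic data whose labels depend only on the sign of the first feature, and the names `toy_model` and `toy_optimizer` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic 2-class data (an assumption for illustration):
# the label is 1 when the first feature is positive, else 0.
inputs = torch.randn(64, 4)
targets = (inputs[:, 0] > 0).long()

toy_model = nn.Linear(4, 2)
toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.05)

losses = []
for step in range(100):
    logits = toy_model(inputs)             # forward pass
    loss = toy_criterion(logits, targets)  # loss calculation
    toy_optimizer.zero_grad()              # clear gradients left over from the previous step
    loss.backward()                        # backward pass: compute gradients
    toy_optimizer.step()                   # update the parameters
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f} | last loss: {losses[-1]:.4f}")
```

Without `zero_grad()`, gradients from successive steps would accumulate, because `backward()` adds to the existing `.grad` values rather than overwriting them.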

In [None]:
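## 7. Test the Model
model.eval()
correct = 0
total = 0
## gradients are not needed for evaluation
with torch.no_grad():
    for images, labels in testloader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        ## count correct predictions; the last batch can be smaller than BATCH_SIZE
        correct += (outputs.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)

print('Test Accuracy: %.2f' % (100.0 * correct / total))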