# **Introduction to PyTorch and Libraries for Deep Learning**

By *Joshua Otis*

## **Pre-shared Content for Advanced PyTorch Tutorial**

**Title:** *Preparation for "Introduction to PyTorch and Libraries for Deep Learning"*


### 1. **Prerequisites Checklist**

Participants should be comfortable with:

* Python programming (functions, classes, NumPy)
* Basic concepts in machine learning (e.g., regression, classification)
* Jupyter notebooks or a Python IDE

**NOTE**: This tutorial session will be run entirely in Google Colab.

Participants can complete a quick Python refresher if needed:

* [Python for Data Science Handbook (Chapters 1–3)](https://jakevdp.github.io/PythonDataScienceHandbook/)




### 2. **Environment Setup Instructions**

Install the following libraries in Google Colab before the session:

#### Install via `pip`:

```python
# Check if running in Colab
import sys
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("Running in Google Colab. Installing required packages...")
    !pip install torch torchvision lightning matplotlib scikit-learn pandas --quiet
else:
    print("You're not in Colab. Please ensure all dependencies are installed.")
```

Link to the official PyTorch install guide (auto-configures commands):

[https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)




### 3. **Pre-reading / Learning Materials**

#### I. **What is Deep Learning?**

Brief introduction:

* [Deep Learning Overview - MIT](https://introtodeeplearning.com)
* [3Blue1Brown – What is a Neural Network?](https://www.youtube.com/watch?v=aircAruvnKk)
* [Neural Networks and Deep Learning by Michael A. Nielsen](http://neuralnetworksanddeeplearning.com/)

#### II. **Intro to PyTorch**

* [Official PyTorch 60-Minute Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)
* [PyTorch for Deep Learning (FreeCodeCamp, first 45 mins)](https://www.youtube.com/watch?v=GIsg-ZUy0MY)




### 4. **Libraries to Familiarize With**

Familiarize yourself with the following libraries ahead of the session:

* `torchvision` – For loading datasets (like CIFAR-10, MNIST)
* `torch.nn` – For building neural network architectures
* `torch.optim` – For optimization algorithms like SGD, Adam
* `sklearn` – For evaluation metrics (accuracy, confusion matrix, etc.)


Link to some cheat sheets or docs:

* [PyTorch Documentation](https://pytorch.org/docs/stable/index.html)
* [Torchvision Documentation](https://pytorch.org/vision/stable/index.html)
* [`torch.nn` Documentation](https://docs.pytorch.org/docs/stable/nn.html)
* [PyTorch Lightning Documentation](https://lightning.ai/docs/pytorch/stable/)

### 5. **Pre-work Notebook or Exercise**

**Objective:** Warm-up with PyTorch tensors and basic NN structure.

Work through a simple notebook or script that covers:

* Creating and manipulating tensors
* Building a simple linear model
* Using `autograd` and `.backward()`

You can create this or adapt it from:

* [PyTorch Tutorials GitHub](https://github.com/pytorch/tutorials/tree/main/beginner_source/basics)

## **Session Objectives**


## **PART A**

## I. Introduction
* Review foundational deep learning principles to contextualize PyTorch's role in model development.

* Understand the significance of PyTorch in modern deep learning research and production pipelines.

* Clarify the objectives of the session and align expectations with advanced learning outcomes.

## II. Advanced PyTorch Concepts
* Perform complex tensor operations and transformations using PyTorch’s tensor API for high-performance computation.

* Utilize PyTorch’s autograd system to implement custom optimization workflows with automatic differentiation.

* Design and build reusable and composable neural network architectures using torch.nn.Module.

* Develop custom datasets and implement efficient data pipelines using DataLoader.

## **PART B**

## III. Deep Learning Libraries and Tools
* Apply torchvision for image/video processing tasks and leverage pre-trained models for transfer learning.

* Use PyTorch Lightning to simplify training loops.

* Explore additional PyTorch libraries like TorchText, TorchAudio, Ignite and Catalyst to enhance training workflows and performance monitoring.

## IV. Sample Dataset for PyTorch
* Use PyTorch to train a classifier on the FashionMNIST dataset.

## **PART C**

## V. Challenge
* Create a task for participants to check their level of understanding.


## VI. Conclusion and Next Steps
* Summarize the session’s key takeaways, reinforcing practical skills and theoretical insights.

* Identify reliable resources for continuous learning (e.g., research papers, tutorials, code repositories).

* Develop confidence and curiosity to independently explore and implement advanced deep learning models using PyTorch.




# PART A

## **Why Deep Learning?**


#### **Real-World Problems Solved by Deep Learning**

* **Computer Vision**:

  * Object detection, facial recognition, autonomous driving
  * E.g., Tesla Autopilot, Google Photos tagging
* **Natural Language Processing (NLP)**:

  * Translation, chatbots, sentiment analysis
  * E.g., Google Translate, Siri, ChatGPT
* **Games & Reinforcement Learning**:

  * Mastering Go, Chess, and StarCraft
  * E.g., AlphaGo by DeepMind
* **Healthcare & Bioinformatics**:

  * Cancer detection from scans, protein folding
  * E.g., DeepMind’s AlphaFold


## **PyTorch**

PyTorch is an open source machine learning (ML) framework based on the Python programming language and the Torch library.



### **Why PyTorch?**

* **Flexible & Dynamic**:

  * Eager execution model—debug and test like regular Python
* **Pythonic & Intuitive**:

  * Seamlessly integrates with Python’s ecosystem
* **Researcher-Friendly**:

  * Widely adopted in academic papers and ML research
* **Strong Ecosystem**:

  * Libraries like TorchVision, TorchText, PyTorch Lightning
* **Production-Ready**:

  * TorchScript, ONNX support, and deployment tools

### Over the course of this tutorial we are going to

#### **1. Understand Core PyTorch Concepts**

* Learn how PyTorch works under the hood: tensors, autograd, and modules
* Explore model building, training loops, and optimization
* Understand how PyTorch differs from other frameworks (e.g., TensorFlow)


#### **2. Explore Key Libraries for Deep Learning**

* **TorchVision** – tools for image data: transforms, datasets, pre-trained models
* **PyTorch Lightning** - a lightweight wrapper for PyTorch that simplifies training loops, logging, and scaling.
* Speak briefly on ***other libraries*** like
  * TorchAudio
  * TorchText
  * PyTorch Ignite
  * Catalyst

#### **3. Apply Knowledge in a Hands-On Challenge**

* Build and train a PyTorch model on toy image data
* Use real deep learning workflows: data preprocessing, model creation, training, evaluation
* Gain practical experience integrating PyTorch libraries


### **PyTorch**

PyTorch is a powerful and intuitive deep learning framework that combines:

* NumPy-like tensors

* Automatic gradient computation

* Modular network design

* Easy GPU support



We begin by looking at the PyTorch library and some of what it contains, and then we explore the core concepts.

In [None]:
import torch

In [None]:
dir(torch.stack)

In [None]:
help(torch.arange)

## **Core PyTorch Concepts**

### **PyTorch Tensors: Creation and Operations**
* Tensors are multidimensional arrays, similar to NumPy arrays, but with GPU support.

* Support broadcasting, indexing, slicing, reshaping (view, reshape), and efficient in-place operations.

* Advanced operations: einsum, matrix multiplication (matmul, bmm, mm), and tensor arithmetic.

For documentation on PyTorch tensors, learn more at

https://pytorch.org/docs/stable/tensors.html
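
The bullet points above mention broadcasting, `einsum`, and in-place operations. The following is a minimal sketch of each, using small made-up tensors; the variable names are illustrative only.

```python
import torch

# Broadcasting: a (3, 1) column and a (1, 4) row expand to a common (3, 4) shape.
col = torch.arange(3.0).reshape(3, 1)
row = torch.arange(4.0).reshape(1, 4)
outer_sum = col + row
print(outer_sum.shape)  # torch.Size([3, 4])

# einsum: "ik,kj->ij" sums over the shared index k, which is a matrix product.
a = torch.randn(3, 4)
b = torch.randn(4, 5)
via_einsum = torch.einsum("ik,kj->ij", a, b)
print(torch.allclose(via_einsum, a @ b, atol=1e-5))

# In-place operations end with an underscore and modify the tensor directly.
t = torch.ones(2, 2)
t.add_(1.0)
print(t)
```

In-place operations save memory but overwrite values, so they are best avoided on tensors that autograd still needs for a backward pass.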

#### Examples
The examples below show how tensors are created and used in multiplication, indexing, slicing, and reshaping.


In [None]:
import torch

a = torch.tensor(3.) #Scalar
b = torch.tensor([1., 2, 3, 4]) #Vector
c = torch.tensor([[5., 6],
                  [7, 8],
                  [9, 10]]) #Matrix
a.dtype
a.shape

In [None]:
##EXAMPLE

import torch

a = torch.randn(3,4)
b = torch.randn(2,4,5)
a
b

In [None]:
### A simple 3D tensor

tensor_3d = torch.tensor([
    [[ 1,  2,  3,  4],
     [ 5,  6,  7,  8],
     [ 9, 10, 11, 12]],

    [[13, 14, 15, 16],
     [17, 18, 19, 20],
     [21, 22, 23, 24]]
])

print("Original 3D Tensor (2x3x4):\n", tensor_3d)


In [None]:
##Indexing in 3D

# Get element at batch 1, row 2, column 3
element = tensor_3d[1, 2, 3]
print("Indexed Element [1, 2, 3]:", element)  # prints tensor(24)

In [None]:
##Slicing in 3D

# Get all rows and columns of the first batch
slice_1 = tensor_3d[0, :, :]
print("Slice of first batch (0):\n", slice_1)

# Get only the first two columns across all batches and rows
slice_2 = tensor_3d[:, :, :2]
print("First two columns of every row in every batch:\n", slice_2)

# Get the middle row (row 1) of each batch
slice_3 = tensor_3d[:, 1, :]
print("Middle row of each batch:\n", slice_3)


In [None]:
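##Reordering and reshaping in 3D

# Swap the first two dimensions: (2x3x4) becomes (3x2x4).
# Note: permute reorders axes; it is not the same operation as reshape.
permuted = tensor_3d.permute(1, 0, 2)
print("Permuted Tensor (3x2x4):\n", permuted)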

# Flatten each batch (from 3x4 to 12)
flattened_batches = tensor_3d.view(2, -1)
print("Each batch flattened:\n", flattened_batches)
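
The difference between `view` and `reshape` matters once a tensor has been permuted. The sketch below, on a small illustrative tensor, shows that `permute` produces a non-contiguous tensor, that `view` refuses to flatten it, and that `reshape` (or `contiguous()` followed by `view`) works.

```python
import torch

t = torch.arange(24).reshape(2, 3, 4)

# permute returns a view with reordered strides; no data is copied.
p = t.permute(1, 0, 2)
print(p.shape)             # torch.Size([3, 2, 4])
print(p.is_contiguous())   # False

# view needs a compatible memory layout, so it raises on this permuted tensor.
try:
    p.view(3, -1)
    view_failed = False
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)

# reshape falls back to a copy when a view is not possible.
r = p.reshape(3, -1)
print(r.shape)             # torch.Size([3, 8])

# Alternatively, make the tensor contiguous first, then use view.
v = p.contiguous().view(3, -1)
```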


In [None]:
##EXAMPLE

#Using Standard Matrix Multiplication (2D only)

# a: shape (3, 4)
# b: shape (4, 5)
a = torch.randn(3, 4)
b = torch.randn(4, 5)

result = torch.mm(a, b)
print(result.shape)  # (3, 5)

In [None]:
#EXAMPLE

#Using Batch Matrix Multiplication (3D only with first dimension as batch size)

# a: shape (10, 3, 4) — batch of 10 matrices (3x4)
# b: shape (10, 4, 2) — batch of 10 matrices (4x2)
a = torch.randn(10, 3, 4)
b = torch.randn(10, 4, 2)

result = torch.bmm(a, b)
print(result.shape)  # (10, 3, 2)

In [None]:
##EXAMPLE

#Using Matrix Multiplication (3D)

# a: shape (2, 3, 4)
# b: shape (2, 4, 5) — compatible for batch matrix multiplication
a = torch.randn(2, 3, 4)
b = torch.randn(2, 4, 5)

# Will perform batched matrix multiplication across batch dimension
result = torch.matmul(a, b)
print(result.shape)  # (2, 3, 5)

In [None]:
##EXAMPLE

#Using  Matrix Multiplication with Broadcasting

# a: shape (2, 3, 4) — batch of 2 matrices, each 3×4
a = torch.randn(2, 3, 4)

# b: shape (4, 5) — a single matrix 4×5
b = torch.randn(4, 5)

# Broadcasted: b is broadcast across the batch dimension of a
result = torch.matmul(a, b)
print(result.shape)  # (2, 3, 5)

In [None]:
##Similarly, the first operand can be broadcast across the batch dimension

a = torch.randn(3, 4)     # shape (3, 4)
b = torch.randn(2, 4, 5)  # shape (2, 4, 5)

result = torch.matmul(a, b)  # a is broadcast to (2, 3, 4)
print(result.shape)  # (2, 3, 5)
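
One way to convince yourself of what broadcasting does in `torch.matmul` is to compare it with an explicit loop over the batch. This is a small check on random tensors, not a performance recommendation.

```python
import torch

torch.manual_seed(0)
a = torch.randn(3, 4)      # a single matrix
b = torch.randn(2, 4, 5)   # a batch of 2 matrices

# Broadcasted product: a is reused for every matrix in the batch.
batched = torch.matmul(a, b)

# The same computation, one batch element at a time.
looped = torch.stack([a @ b[i] for i in range(b.shape[0])])

print(batched.shape)       # torch.Size([2, 3, 5])
print(torch.allclose(batched, looped, atol=1e-5))
```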

In [None]:
##EXAMPLE

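#Computing an L2 norm

# Create a 1D tensor (vector)
v = torch.tensor([3.0, 4.0])

# Compute the L2 norm (Euclidean norm, p=2)
# Note: torch.norm is deprecated; torch.linalg.vector_norm(v, ord=2) is the recommended equivalent.
l2_norm = torch.norm(v, p=2)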

print(f"Vector: {v}")
print(f"L2 Norm: {l2_norm}")  # Should output 5.0 (√(3² + 4²))


In [None]:
##EXAMPLE

#Computing a Frobenius norm

# 2D tensor (matrix)
A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Frobenius norm: the square root of the sum of squared entries
fro_norm = torch.norm(A, p='fro')

print(f"Matrix:\n{A}")
print(f"Frobenius Norm: {fro_norm}")  # √(1² + 2² + 3² + 4²) = √30 ≈ 5.477


In [None]:
import numpy as np
import torch

x = np.array([[1,2],[3,4]])
x

In [None]:
#convert a NumPy array to a torch tensor
y = torch.from_numpy(x)
y

In [None]:
x.dtype, y.dtype

In [None]:
z = y.numpy()
z
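
A detail worth knowing about the conversions above: `torch.from_numpy` shares memory with the source array, whereas `torch.tensor` copies it. The sketch below illustrates the difference with a small array.

```python
import numpy as np
import torch

x = np.array([[1, 2], [3, 4]])
shared = torch.from_numpy(x)   # shares memory with x
copied = torch.tensor(x)       # copies the data

x[0, 0] = 100                  # modify the NumPy array in place

print(shared[0, 0].item())     # 100: the change is visible through the shared tensor
print(copied[0, 0].item())     # 1: the copy is unaffected
```

If the array will be modified later and the tensor should not change, the copying form is the safer choice.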

#### **Example: Image Tensor Transformations**
Let us say you have an image tensor of shape **[batch_size, height, width, channels]**, and you want to feed it into a CNN which expects **[batch_size, channels, height, width]**.

In [None]:
import torch

# Assume your image data is in a NumPy array called 'image_np'
# Replace this with your actual image data loading
import numpy as np
image_np = np.random.randint(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)  # upper bound is exclusive

# Convert the NumPy array to a PyTorch tensor
image = torch.from_numpy(image_np)

# Now you can permute the dimensions
image = image.permute(0, 3, 1, 2)  # Rearranges dimensions

print(image.shape)

In [None]:
# Define a sample tensor for 'x'
x = torch.randn(4, 5, 6)  # Example tensor with shape (4, 5, 6)


x = x.view(x.size(0), -1)  # Flatten all but batch dimension

print(x.shape)

### **PyTorch Autograd**
Automatic Differentiation
* PyTorch dynamically builds a computation graph as operations occur.

* Uses `requires_grad=True` to track operations for automatic gradient calculation.

* Backpropagation via `.backward()` triggers gradient computation.

Gradient Computation & Optimization
* Access gradients via `.grad` attributes on tensors.

Common workflow:

`loss.backward()`

`optimizer.step()`

`optimizer.zero_grad()`

* Integrates with `torch.optim` for efficient parameter updates.

For documentation on PyTorch automatic differentiation, learn more at

https://docs.pytorch.org/docs/stable/autograd.html
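
The reason `optimizer.zero_grad()` appears in the workflow above is that `.backward()` adds to `.grad` rather than overwriting it. A minimal sketch with a single scalar parameter:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()                 # d(3w)/dw = 3
first_grad = w.grad.item()

(3 * w).backward()                 # no zeroing: the new gradient is added to the old one
accumulated_grad = w.grad.item()

w.grad.zero_()                     # reset, as optimizer.zero_grad() does for every parameter
(3 * w).backward()
fresh_grad = w.grad.item()

print(first_grad, accumulated_grad, fresh_grad)   # 3.0 6.0 3.0
```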

In [None]:
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 2 * x
y.backward()  #compute derivatives
print(x.grad)  # dy/dx = 3x² + 2 = 3(2²) + 2 = 14

A more detailed example using gradient descent with an optimizer:

In [None]:
import torch

# Set random seed for reproducibility
torch.manual_seed(0)

# Example data: y = 2x + 3
x = torch.tensor([[1.0], [2.0], [3.0]])
y_true = torch.tensor([[5.0], [7.0], [9.0]])

# Initialize weights and bias
weight = torch.randn(1, 1, requires_grad=True)
bias = torch.randn(1, requires_grad=True)

# Define a simple linear model
def model(x):
    return x @ weight + bias  # matrix multiply + bias

# Define mean squared error loss
def mse_loss(y_pred, y_true):
    return ((y_pred - y_true) ** 2).mean()

# Use SGD optimizer to update weight and bias
optimizer = torch.optim.SGD([weight, bias], lr=0.01)

# Forward pass
y_pred = model(x)
loss = mse_loss(y_pred, y_true)

print("Loss before backward:", loss.item())

# Backward pass
loss.backward()

# Update parameters
optimizer.step()

# Zero gradients
optimizer.zero_grad()

# Check updated weights and bias
print("Updated weight:", weight.data)
print("Updated bias:", bias.data)


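The code above fits a linear regression model: a weight and a bias predict `y` from `x`, and one step of gradient descent updates both parameters to reduce the mean squared error.

Mathematically, the model can be expressed as
$$y = XW^T + b$$
Here `weight` has shape (1, 1), so $W^T = W$ and the expression matches `x @ weight + bias` in the code.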
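
The cell above performs a single parameter update. As a sketch of how the same steps behave when repeated, the loop below runs many updates on the same toy data. The learning rate and step count are illustrative choices, not tuned values.

```python
import torch

torch.manual_seed(0)

# Same toy data as above: y = 2x + 3
x = torch.tensor([[1.0], [2.0], [3.0]])
y_true = torch.tensor([[5.0], [7.0], [9.0]])

weight = torch.randn(1, 1, requires_grad=True)
bias = torch.randn(1, requires_grad=True)

optimizer = torch.optim.SGD([weight, bias], lr=0.05)

for step in range(2000):
    y_pred = x @ weight + bias
    loss = ((y_pred - y_true) ** 2).mean()

    optimizer.zero_grad()   # clear gradients from the previous step
    loss.backward()         # compute d(loss)/d(weight) and d(loss)/d(bias)
    optimizer.step()        # move each parameter against its gradient

print(f"weight = {weight.item():.3f}, bias = {bias.item():.3f}, loss = {loss.item():.6f}")
```

The learned weight and bias should end up close to 2 and 3, the values used to generate the data.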


### **PyTorch Modules**

Building Custom Modules
* Subclass `nn.Module` to define custom models.

* Implement `__init__()` for layers and `forward()` for logic.

Example

```python
class MyModel(nn.Module):
    def __init__(self): ...
    def forward(self, x): ...
    
```
Module Composition & Reuse
* Nest modules inside others using `nn.Sequential`, containers, or custom hierarchies.

* Enables model reuse, better organization, and cleaner code.

For documentation on PyTorch Modules, learn more at

https://docs.pytorch.org/docs/stable/notes/modules.html
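
To illustrate what nesting buys you, the sketch below places one custom module inside another and lists the parameters that PyTorch registers automatically. `Block` and `TinyNet` are hypothetical names used only for this example.

```python
import torch
import torch.nn as nn

class Block(nn.Module):          # hypothetical name, for illustration only
    def __init__(self, features):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(features, features), nn.ReLU())

    def forward(self, x):
        return self.layers(x)

class TinyNet(nn.Module):        # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.block = Block(8)    # nested custom module
        self.head = nn.Linear(8, 2)

    def forward(self, x):
        return self.head(self.block(x))

model = TinyNet()

# Parameters of nested modules are registered under dotted names.
names = [name for name, _ in model.named_parameters()]
print(names)

n_params = sum(p.numel() for p in model.parameters())
print("total parameters:", n_params)   # 8*8 + 8 + 8*2 + 2 = 90

out = model(torch.randn(4, 8))
print(out.shape)                        # torch.Size([4, 2])
```

Because registration is automatic, `model.parameters()` can be passed straight to an optimizer no matter how deeply the modules are nested.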

#### A simple custom model

---

```python
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)
```

#### A more complex model


---

```python

import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.layer1 = nn.Linear(10, 20)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(20, 1)

    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        return self.layer2(x)

class ResidualBlock(nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(in_features, in_features),
            nn.ReLU(),
            nn.Linear(in_features, in_features)
        )

    def forward(self, x):
        return x + self.block(x)  # Skip connection

class ResNetMini(nn.Module):
    def __init__(self):
        super().__init__()
        self.input = nn.Linear(64, 64)
        self.resblock1 = ResidualBlock(64)
        self.resblock2 = ResidualBlock(64)
        self.output = nn.Linear(64, 10)

    def forward(self, x):
        x = self.input(x)
        x = self.resblock1(x)
        x = self.resblock2(x)
        return self.output(x)

model = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 1)
)

```

### **PyTorch Data Loaders**
Custom Data Loading & Preprocessing
* Implement custom `Dataset` by subclassing `torch.utils.data.Dataset`.

* Use `DataLoader` to batch, shuffle, and load data with multiprocessing.

For documentation on PyTorch Data Loaders and Datasets, learn more at

https://docs.pytorch.org/tutorials/beginner/basics/data_tutorial.html

```python

import torch
from torch.utils.data import DataLoader, TensorDataset
dataset = TensorDataset(torch.rand(100, 10), torch.rand(100, 1))
loader = DataLoader(dataset, batch_size=16, shuffle=True)
```
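
The snippet above uses the ready-made `TensorDataset`. For the custom case mentioned in the bullets, a `Dataset` subclass needs only `__len__` and `__getitem__`. The following is a minimal sketch; `SquaresDataset` is a made-up example, not part of PyTorch.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):   # hypothetical example dataset
    """Returns (x, x**2) pairs for x = 0, 1, ..., n - 1."""

    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32).unsqueeze(1)  # shape (n, 1)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        x = self.x[idx]
        return x, x ** 2

dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# 10 samples with batch_size=4 gives batches of 4, 4, and 2.
batch_sizes = [batch_x.shape[0] for batch_x, batch_y in loader]
print(batch_sizes)
```

In a real project, `__getitem__` is where a file would be read and preprocessing applied, so that data is loaded lazily one sample at a time.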

Before we proceed, let us combine our knowledge so far to create a simple Neural Network Module

In [None]:
# PyTorch Neural Network Module

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# 1. Create a Toy Dataset
X = torch.rand(100, 10)  # 100 samples, 10 features
y = torch.rand(100, 1)   # 100 target values (e.g., regression)

dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# 2. Define a Simple Neural Network using nn.Module
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(10, 32)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(32, 1)

    def forward(self, x):
        x = self.relu(self.fc1(x))
        return self.fc2(x)

model = SimpleNet()

# 3. Define Loss and Optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# 4. Train the Model
for epoch in range(5):  # 5 epochs for demo
    for batch_X, batch_y in loader:
        pred = model(batch_X)               # Forward pass
        loss = criterion(pred, batch_y)     # Compute loss

        optimizer.zero_grad()               # Zero gradients
        loss.backward()                     # Backpropagation
        optimizer.step()                    # Update weights

    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")


print("Training complete. You can now evaluate or save the model.")
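
As a sketch of the "save the model" step mentioned in the final message, one common pattern is to save the `state_dict` and load it into a freshly built model of the same architecture. The stand-in model and the file name below are assumptions for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for a trained model (same layer sizes as SimpleNet above).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

path = os.path.join(tempfile.mkdtemp(), "model_weights.pt")  # illustrative file name
torch.save(model.state_dict(), path)          # save only the learned parameters

# Rebuild the same architecture, then load the saved parameters into it.
restored = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
restored.load_state_dict(torch.load(path))
restored.eval()                                # switch to inference mode

x = torch.rand(4, 10)
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))
print("outputs match:", same)
```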


# PART B

In this second session, we will explore

* Deep learning libraries associated with PyTorch
* A dataset with PyTorch
* Building a model

## **Key Deep Learning Libraries**

### **Torchvision**
Image and Video Processing
* Built-in transforms for cropping, resizing, normalizing, augmenting images.

* Datasets like MNIST, CIFAR-10, ImageNet, COCO readily available.

Pre-trained Models & Transfer Learning
* Access pretrained CNNs (ResNet, VGG, EfficientNet, etc.).

* Easily fine-tune on custom datasets:
```python
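from torch import nn
from torchvision import models
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)  # replaces the deprecated pretrained=True
model.fc = nn.Linear(model.fc.in_features, 10)  # 10 is an example value for the number of custom classes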
```

For documentation on Torchvision, learn more at

https://docs.pytorch.org/vision/stable/index.html

In [None]:
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Load the MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())

# Create a DataLoader for batching
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Example of a transformation
resize_transform = transforms.Resize((32, 32))

### **PyTorch Lightning**

PyTorch Lightning is a lightweight wrapper around PyTorch designed to:

* Decouple model code from engineering code (e.g., training loops, GPU management)

* Simplify complex training pipelines

* Make your research more readable, scalable, and reproducible

It follows the philosophy:

"Focus on the science, not the boilerplate."

For documentation on PyTorch Lightning, learn more at

https://lightning.ai/docs/pytorch/stable/


In [None]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger
import matplotlib.pyplot as plt

# Step 1: Get the data
transform = transforms.ToTensor()

train_data = datasets.MNIST(root='.', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='.', train=False, download=True, transform=transform)

train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = DataLoader(test_data, batch_size=64)

# Step 2: Create a model
class MNISTModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = nn.functional.cross_entropy(logits, y)
        acc = (logits.argmax(dim=1) == y).float().mean()
        self.log("train_loss", loss)
        self.log("train_acc", acc)
        return loss

    def test_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = nn.functional.cross_entropy(logits, y)
        acc = (logits.argmax(dim=1) == y).float().mean()
        self.log("test_loss", loss)
        self.log("test_acc", acc)
        return {"x": x, "y": y, "preds": logits.argmax(dim=1)}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Step 3: Train the model
logger = TensorBoardLogger("tb_logs", name="mnist")
model = MNISTModel()
trainer = pl.Trainer(max_epochs=5, logger=logger, enable_checkpointing=False)
trainer.fit(model, train_loader)

# Step 4: Test the model
test_results = trainer.test(model, test_loader)

# Step 5: Visualize sample predictions
model.eval()
sample_batch = next(iter(test_loader))
images, labels = sample_batch
with torch.no_grad():
    preds = model(images).argmax(dim=1)

plt.figure(figsize=(10, 4))
for i in range(10):
    plt.subplot(2, 5, i+1)
    plt.imshow(images[i].squeeze(), cmap='gray')
    plt.title(f"Pred: {preds[i].item()}\nTrue: {labels[i].item()}")
    plt.axis('off')
plt.tight_layout()
plt.show()


## **Advanced Deep Learning Libraries and Tools**

### **Torchtext**
Text Processing and Preprocessing
* Tokenization, vocabulary building, padding, and batching built-in.

* Supports datasets like IMDB, AG News, and more.

Word Embeddings and Language Models
* Pre-trained embeddings (GloVe, FastText) for initializing models.

* Interface with Transformers for fine-tuning NLP tasks.

For documentation on TorchText, learn more at

https://docs.pytorch.org/text/stable/index.html


### **TorchAudio**

TorchAudio is a domain-specific library built to integrate audio processing capabilities directly into the PyTorch ecosystem.
It provides:

* Easy-to-use dataset loaders for popular audio datasets

* Efficient audio transformations and feature extraction (e.g., Mel spectrograms)

* Tools for building and training speech models, including support for pre-trained models

It allows you to go from raw waveform → features → deep model → predictions in one pipeline.

For documentation on TorchAudio, learn more at

https://docs.pytorch.org/audio/stable/index.html

### **PyTorch Ignite**

PyTorch Ignite is a high-level library built on PyTorch for:

* Rapidly developing research workflows

* Managing training and evaluation loops

* Simplifying event-driven logging, checkpointing, and metrics

It provides flexible, yet powerful abstractions for both beginners and experienced ML engineers.

For documentation on PyTorch Ignite, learn more at

https://docs.pytorch.org/ignite/index.html

### **Catalyst**

Catalyst is a deep learning framework for fast prototyping and reproducible experiments. It is built on top of PyTorch and, like Ignite, offers a higher-level interface for training and evaluation loops.

Catalyst is especially suited for:

* ML research

* Kaggle competitions

* Industrial production pipelines

For more on Catalyst, see

https://catalyst-team.com/

## **Dataset Overview – FashionMNIST**

#### **Why FashionMNIST?**

* Designed as a **drop-in replacement for MNIST**
* Slightly more complex: better for benchmarking real-world models
* Small, grayscale images (28×28), ideal for beginners


#### **Categories of Fashion Items**

FashionMNIST includes **10 classes**:

1. T-shirt/top
2. Trouser
3. Pullover
4. Dress
5. Coat
6. Sandal
7. Shirt
8. Sneaker
9. Bag
10. Ankle boot



#### **Data Shape and Labels**

* **Images**: `28 x 28` grayscale (1 channel)
* **Label type**: integer (0 to 9)
* **Train/Test split**:

  * **60,000** training images
  * **10,000** test images


### **Problem Statement**

####  **Goal**

* Build a model that can **accurately classify** clothing images from the **FashionMNIST** dataset.


#### **Task**

* Train a **neural network classifier** using PyTorch
* Input: `28x28` grayscale image
* Output: One of **10 fashion item classes**


#### **Evaluation Metric**

* **Accuracy**: Percentage of correctly predicted labels over total samples
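
As a small worked example of the accuracy metric, the sketch below computes it from raw model outputs. The logits and labels are made-up numbers for 4 samples and 3 classes.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes (made-up numbers).
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.3, 0.2, 0.9],
                       [1.2, 0.4, 0.1]])
labels = torch.tensor([0, 1, 2, 1])

preds = logits.argmax(dim=1)                        # predicted class per sample
accuracy = (preds == labels).float().mean().item()  # fraction of correct predictions

print(preds)      # tensor([0, 1, 2, 0])
print(accuracy)   # 0.75
```

Three of the four predictions match their labels, which gives 75% accuracy.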


In [None]:
# Hands-On with PyTorch: FashionMNIST Classifier

# Step 1: Import Libraries
import torch
from torch import nn
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using device: {device}')

# Step 2: Load the FashionMNIST Dataset
transform = transforms.ToTensor()

train_data = torchvision.datasets.FashionMNIST(root='data', train=True, download=True, transform=transform)
test_data = torchvision.datasets.FashionMNIST(root='data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = DataLoader(test_data, batch_size=64, shuffle=False)

# Step 3: Visualize Sample Images
classes = train_data.classes
def show_images(images, labels):
    fig, ax = plt.subplots(1, 6, figsize=(12, 4))
    for i in range(6):
        ax[i].imshow(images[i][0], cmap='gray')
        ax[i].set_title(classes[labels[i]])
        ax[i].axis('off')

images, labels = next(iter(train_loader))
show_images(images, labels)

# Step 4: Define a Neural Network
class FashionClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.model = nn.Sequential(
            nn.Linear(28*28, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )
    def forward(self, x):
        x = self.flatten(x)
        return self.model(x)

model = FashionClassifier().to(device)

# Step 5: Train the Model
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        pred = model(X)
        loss = loss_fn(pred, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            print(f"loss: {loss.item():>7f}  [{batch * len(X):>5d}/{size:>5d}]")

for epoch in range(5):
    print(f"Epoch {epoch+1}\n-------------------------------")
    train(train_loader, model, loss_fn, optimizer)
print("Training done!")

# Step 6: Evaluate the Model
def test(dataloader, model):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
            total += y.size(0)
    accuracy = correct / total
    print(f"Accuracy: {accuracy:.2f}")

test(test_loader, model)

## **Step-by-Step – Build Your First Model**


#### **1. Define the Network**

* Use `nn.Module` to create a **simple feedforward neural network**
* Input: Flattened `28x28` $\rightarrow$ Linear layers $\rightarrow$ ReLU $\rightarrow$ Output (10 classes)
* Tip: Start with one hidden layer to keep it simple

```python
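import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.relu(self.fc1(x))  # hidden layer followed by activation
        return self.fc2(x)          # raw logits for the 10 classes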
```

#### **2. Choose Loss and Optimizer**

* Loss: `nn.CrossEntropyLoss()` – for multi-class classification
* Optimizer: `torch.optim.Adam()` or `SGD` for training

```python
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```
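
One point that often trips people up: `nn.CrossEntropyLoss` expects raw logits and integer class indices, because it applies log-softmax internally. The sketch below, on random logits, checks that it matches the explicit two-step computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw model outputs: 4 samples, 10 classes
labels = torch.tensor([3, 0, 9, 1])    # integer class indices, not one-hot vectors

ce = nn.CrossEntropyLoss()(logits, labels)

# Equivalent two-step computation: log-softmax followed by negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(ce.item(), manual.item())
print(torch.allclose(ce, manual))
```

This is why the models in this tutorial end with a plain `nn.Linear` layer and no softmax.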


#### **3. Training Loop**

* Iterate through batches
* Perform **forward pass**, **loss computation**, **backpropagation**, and **parameter update**

```python
for images, labels in train_loader:
    outputs = model(images.view(-1, 28*28))
    loss = criterion(outputs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```


Putting the three steps together for our model:

In [None]:
# Build Your First PyTorch Model – FashionMNIST
#In this notebook, you will
#Define a simple neural network
#Train it on FashionMNIST
#Track loss and accuracy

# Imports
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data loading
## 1. Load the FashionMNIST Dataset
transform = transforms.ToTensor()

train_dataset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Define model
## 2. Define the Neural Network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)  # Flatten
        x = self.relu(self.fc1(x))
        return self.fc2(x)

model = Net()

# Loss and optimizer
## 3. Set Loss Function and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
## 4. Train the Model
epochs = 5
for epoch in range(epochs):
    running_loss = 0.0
    for images, labels in train_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {running_loss/len(train_loader):.4f}")

# Evaluation
## 5. Evaluate on Test Set
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Test Accuracy: {100 * correct / total:.2f}%')


# PART C

In this last session, we will look at
* An improved version of the FashionMNIST dataset model
* A challenge
* Best practices for PyTorch
* Next steps

In [None]:
# Create an improved version of the FashionMNIST model with added features and visualizations

#Improved FashionMNIST Model – PyTorch
#This notebook extends the basic model with:
#Additional hidden layers
#Dropout for regularization
#Training/validation loss visualization
#Sample predictions

# Imports
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split

# Data loading and train/val split
## 1. Load and Split the Dataset
transform = transforms.ToTensor()

train_full = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
train_size = int(0.8 * len(train_full))
val_size = len(train_full) - train_size
train_dataset, val_dataset = random_split(train_full, [train_size, val_size])

test_dataset = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Improved model
## 2. Define an Improved Neural Network
class ImprovedNet(nn.Module):
    def __init__(self):
        super(ImprovedNet, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(28*28, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        x = x.view(-1, 28*28)
        return self.model(x)

model = ImprovedNet()

# Loss and optimizer
## 3. Set Loss and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training with validation tracking
## 4. Train the Model with Validation
train_losses, val_losses = [], []
epochs = 5

for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    train_losses.append(running_loss / len(train_loader))

    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for images, labels in val_loader:
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
    val_losses.append(val_loss / len(val_loader))

    print(f"Epoch [{epoch+1}/{epochs}], Train Loss: {train_losses[-1]:.4f}, Val Loss: {val_losses[-1]:.4f}")

# Plotting training vs validation loss
## 5. Visualize Training and Validation Loss
plt.plot(train_losses, label='Train Loss')
plt.plot(val_losses, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.title('Training vs Validation Loss')
plt.show()

# Evaluation
## 6. Evaluate on Test Set
model.eval()  # make sure dropout is disabled during evaluation
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Test Accuracy: {100 * correct / total:.2f}%')

# Sample predictions
## 7. Show Sample Predictions
classes = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat','Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

def imshow(img):
    plt.imshow(img.squeeze(), cmap='gray')
    plt.axis('off')

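dataiter = iter(test_loader)
images, labels = next(dataiter)
model.eval()  # disable dropout for inference
with torch.no_grad():  # no gradients needed for predictions
    outputs = model(images)
_, preds = torch.max(outputs, 1)

plt.figure(figsize=(10, 4))
for i in range(6):
    plt.subplot(1, 6, i + 1)  # one row of six images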
    imshow(images[i])
    plt.title(f"Pred: {classes[preds[i]]}")
plt.tight_layout()
plt.show()

## **Challenge!**

#### **Your Task: Improve the Model Accuracy**

You have trained a basic neural network — now make it better!

#### **Things You Can Try:**

* **Change Model Architecture:**
  Add more layers, try different activation functions, or switch to CNNs.

* **Apply Data Augmentation:**
  Use `transforms` (e.g., `RandomHorizontalFlip`, `RandomCrop`) to improve generalization.

* **Tune Hyperparameters:**
  Adjust:

  * Learning rate
  * Batch size
  * Optimizer (try SGD, Adam, etc.)

* **Add Regularization Techniques:**

  * **Dropout**
  * **Batch Normalization**
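
As one possible starting point for the architecture and regularization ideas above, here is a minimal sketch of a small CNN that combines batch normalization and dropout. The class name `SmallCNN`, the channel counts, and the dropout rate are illustrative assumptions, not values from the session; treat them as knobs to experiment with.

```python
import torch
import torch.nn as nn


class SmallCNN(nn.Module):
    """Hypothetical small CNN for 1x28x28 FashionMNIST images."""

    def __init__(self, num_classes=10, dropout=0.3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # -> 16 x 28 x 28
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16 x 14 x 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32 x 14 x 14
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32 x 7 x 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),
            nn.Linear(32 * 7 * 7, num_classes),
        )

    def forward(self, x):
        # Input is an image batch of shape (N, 1, 28, 28); no manual flattening needed
        return self.classifier(self.features(x))


cnn = SmallCNN()
dummy_batch = torch.randn(8, 1, 28, 28)  # stand-in for a batch of images
logits = cnn(dummy_batch)
print(logits.shape)  # torch.Size([8, 10]): 10 class scores per image
```

Because dropout and batch normalization behave differently during training and inference, call `model.train()` before each training epoch and `model.eval()` before evaluating.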

The following code is for the facilitator's explanation:
```python
# Challenge: Improve Your FashionMNIST Model
# In this notebook, you will experiment with different ways to improve model performance on the FashionMNIST dataset.
# Imports
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt


## 1. Data Loading and Augmentation
# TODO: Add your own data augmentation transforms here
transform = transforms.Compose([
    transforms.ToTensor(),
    # transforms.RandomHorizontalFlip(),
    # transforms.RandomRotation(10),
])

train_dataset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

## 2. Define Your Custom Model
class CustomNet(nn.Module):
    def __init__(self):
        super(CustomNet, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(28*28, 128),
            nn.ReLU(),
            # TODO: Add dropout, batch norm, more layers
            nn.Linear(128, 10)
        )

    def forward(self, x):
        x = x.view(-1, 28*28)
        return self.model(x)

model = CustomNet()


## 3. Loss and Optimizer (Tune Me!)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)  #TODO: Try different optimizers (SGD) or learning rates


## 4. Training Loop
epochs = 5
for epoch in range(epochs):
    model.train()  # needed once dropout or batch norm layers are added
    running_loss = 0.0
    for images, labels in train_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {running_loss/len(train_loader):.4f}")


## 5. Evaluate Your Model
model.eval()  # switch dropout/batch norm to inference behavior
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Test Accuracy: {100 * correct / total:.2f}%')

# Final challenge prompt
## Challenge Tips
#Add hidden layers
#Use dropout or batch norm
#Tune learning rate, optimizer
#Try data augmentation
#Run, modify, experiment!

```

The next code cell contains blanks and is the challenge that participants will work on during the session.

Participants fill in the blanks with different options, and we then discuss which options work best and why.

The exercise recalls the essential material from the session: participants reproduce the code covered in the tutorial, which lets us check their level of understanding.


---
Fill in the blanks

In [None]:
# Challenge: Improve Your FashionMNIST Model
# In this notebook, you will experiment with different ways to improve model performance on the FashionMNIST dataset.

# Imports
import torch
import ... as nn
import ... as optim
from ... import datasets, transforms
from torch.utils.data import DataLoader
import ... as plt


## 1. Data Loading and Augmentation
# TODO: Add your own data augmentation transforms here
transform = transforms.Compose([
    transforms.ToTensor(),
    # transforms.RandomHorizontalFlip(),
    # transforms.RandomRotation(10),
])

train_dataset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())

train_loader = DataLoader(train_dataset, batch_size= ..., shuffle=True)
test_loader = DataLoader(test_dataset, batch_size= ..., shuffle=False)

## 2. Define Your Custom Model
class CustomNet(nn.Module):
    def __init__(self):
        super(CustomNet, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(28*28, 128),
            nn.ReLU(),
            # TODO: Add dropout, batch norm, more layers
            nn.Linear(128, 10)
        )

    def forward(self, x):
        x = x.view(-1, 28*28)
        return self.model(x)

model = CustomNet()


## 3. Loss and Optimizer (Tune Me!)
criterion = ...
optimizer = ...(model.parameters(), lr=...)


## 4. Training Loop
epochs = ...
for epoch in range(epochs):
    model.train()  # needed once dropout or batch norm layers are added
    running_loss = 0.0
    for images, labels in train_loader:
        outputs = model(...)
        loss = criterion(outputs, ...)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {running_loss/len(train_loader):.4f}")


## 5. Evaluate Your Model
model.eval()  # switch dropout/batch norm to inference behavior
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(...)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == ...).sum().item()

print(f'Test Accuracy: {100 * correct / total:.2f}%')

## **Best Practices in PyTorch**

#### **1. Use `Dataset` + `DataLoader`**

* Wrap your data using `torch.utils.data.Dataset`
* Efficiently load it using `DataLoader` (supports batching, shuffling, parallelism)
* Enables easy scaling to large datasets

```python
from torch.utils.data import Dataset, DataLoader
```
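
To make the pattern concrete, here is a minimal sketch of a custom `Dataset` wrapped in a `DataLoader`. The class name `ArrayDataset` and the synthetic tensors are assumptions for illustration only; in practice `__getitem__` would typically read an image or a table row from disk.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ArrayDataset(Dataset):
    """Hypothetical dataset wrapping in-memory feature and label tensors."""

    def __init__(self, features, labels):
        assert len(features) == len(labels)
        self.features = features
        self.labels = labels

    def __len__(self):
        # Number of samples; the DataLoader uses this to build batches
        return len(self.features)

    def __getitem__(self, idx):
        # Return one (input, target) pair
        return self.features[idx], self.labels[idx]


# Synthetic stand-in data: 100 samples, 4 features, 3 classes
features = torch.randn(100, 4)
labels = torch.randint(0, 3, (100,))

dataset = ArrayDataset(features, labels)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch_x, batch_y in loader:
    print(batch_x.shape, batch_y.shape)  # the last batch holds the 4 leftover samples
```

A `Dataset` only needs `__len__` and `__getitem__`; batching, shuffling, and optional parallel loading (via the `num_workers` argument) are handled by the `DataLoader`.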


#### **2. Track Loss and Accuracy**

* Log training/validation **loss** and **accuracy** at every epoch
* Helps diagnose underfitting/overfitting and guide improvements
* Use tools like:

  * Simple print statements
  * Matplotlib plots
  * TensorBoard (`torch.utils.tensorboard`)
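
One way to track both metrics is a small helper that returns the average loss and the accuracy over a loader, called once per epoch. The sketch below is an illustration under assumptions: the function name `evaluate`, the `history` dictionary, and the synthetic data are not from the session.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion):
    """Return (average loss per sample, accuracy) over a loader."""
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            outputs = model(inputs)
            # Weight each batch loss by its size so a smaller last batch counts correctly
            total_loss += criterion(outputs, targets).item() * targets.size(0)
            correct += (outputs.argmax(dim=1) == targets).sum().item()
            total += targets.size(0)
    return total_loss / total, correct / total


# Tiny synthetic example (assumed data, untrained model)
torch.manual_seed(0)
data = TensorDataset(torch.randn(50, 4), torch.randint(0, 3, (50,)))
loader = DataLoader(data, batch_size=16)
model = nn.Linear(4, 3)
criterion = nn.CrossEntropyLoss()

history = {"val_loss": [], "val_acc": []}
val_loss, val_acc = evaluate(model, loader, criterion)
history["val_loss"].append(val_loss)
history["val_acc"].append(val_acc)
print(f"Val Loss: {val_loss:.4f}, Val Acc: {100 * val_acc:.2f}%")
```

Appending to `history` every epoch gives lists that can be passed straight to `plt.plot`, as in the loss visualization earlier in this part.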


#### **3. Save & Load Models**

* Always save your trained model checkpoints:

```python
torch.save(model.state_dict(), 'model.pth')
```

* Load them later for inference or resuming training:

```python
model.load_state_dict(torch.load('model.pth'))
model.eval()
```
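
To resume training rather than only run inference, the optimizer state and the epoch counter are usually saved alongside the model weights. The following is a minimal sketch under assumptions: the dictionary keys, the file name `checkpoint.pth`, and the stand-in `nn.Linear` model are illustrative choices.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)  # stand-in for the trained model
optimizer = optim.Adam(model.parameters(), lr=0.001)
epoch = 3  # assumed: index of the last completed epoch

# Hypothetical checkpoint location
checkpoint_path = os.path.join(tempfile.gettempdir(), "checkpoint.pth")

# Save everything needed to resume training
torch.save(
    {
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    },
    checkpoint_path,
)

# Later: rebuild the same architecture and optimizer, then restore their states
restored_model = nn.Linear(4, 2)
restored_optimizer = optim.Adam(restored_model.parameters(), lr=0.001)

checkpoint = torch.load(checkpoint_path, map_location="cpu")
restored_model.load_state_dict(checkpoint["model_state_dict"])
restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1  # continue with the next epoch
```

Passing `map_location="cpu"` lets a checkpoint saved on a GPU machine load on a CPU-only one; the model can be moved to another device afterwards.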



### **Tip:** Structure your training scripts for reproducibility and clarity.
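
A common first step towards reproducibility is to seed the random number generators at the top of the script. The helper below is a minimal sketch; the name `set_seed` and the default seed value are assumptions, and seeding alone may not make GPU runs fully deterministic.

```python
import random

import torch


def set_seed(seed=42):
    """Seed Python's and PyTorch's random number generators."""
    random.seed(seed)
    torch.manual_seed(seed)  # seeds PyTorch's default generators


set_seed(0)
first = torch.randn(3)
set_seed(0)
second = torch.randn(3)
print(torch.equal(first, second))  # True: same seed, same random numbers
```

The same idea applies to data splitting: `random_split` accepts a `generator` argument, so passing `torch.Generator().manual_seed(0)` keeps the train/validation split identical across runs.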

## **Resources for Further Learning**

#### **1. PyTorch Official Tutorials**

* Step-by-step guides for beginners to experts
  
  [pytorch.org/tutorials](https://pytorch.org/tutorials/)


#### **2. Fast.ai Deep Learning Course**

* Practical, top-down learning with PyTorch
  
  [course.fast.ai](https://course.fast.ai)

#### **3. Papers with Code**

* Find state-of-the-art research with implementations
  
  [paperswithcode.com](https://paperswithcode.com)


#### **4. Hugging Face Model Hub**

* Access thousands of pretrained models for NLP, vision, and audio
  
  [huggingface.co/models](https://huggingface.co/models)
