In [1]:
import struct
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
import os


### **Step 1: We’ll write functions to load the binary data from the .idx files.**

In [2]:
def read_images(path):
    with open(path, 'rb') as f:
        magic, num, rows, cols = struct.unpack(">IIII", f.read(16))
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(num, 1, rows, cols)

def read_labels(path):
    with open(path, 'rb') as f:
        magic, num = struct.unpack(">II", f.read(8))
        return np.frombuffer(f.read(), dtype=np.uint8)

### **Step 2: Create a Custom PyTorch Dataset**

In [3]:
class MNISTDataset(Dataset):
    def __init__(self, image_path, label_path):
        self.images = torch.tensor(read_images(image_path), dtype=torch.float32) / 255.0
        self.labels = torch.tensor(read_labels(label_path), dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]


### **Step 3: Load the Dataset**

In [4]:
base_path = r"D:\Code for tutorials\Machine Learning\datasets\MNIST Dataset"

train_dataset = MNISTDataset(
    image_path=os.path.join(base_path, "train-images.idx3-ubyte"),
    label_path=os.path.join(base_path, "train-labels.idx1-ubyte")
)

test_dataset = MNISTDataset(
    image_path=os.path.join(base_path, "t10k-images.idx3-ubyte"),
    label_path=os.path.join(base_path, "t10k-labels.idx1-ubyte")
)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)


In [5]:
train_loader.dataset.labels

tensor([5, 0, 4,  ..., 5, 6, 8])

In [6]:
train_loader.dataset.images

tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],


        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],


        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],


        ...,


        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0.

Absolutely! Let’s **go through the data loading part line by line, word by word**, and explain what **each part means and does** in detail.

---

## 🔍 Part: `read_images` and `read_labels` Functions

```python
import struct
import numpy as np
```

* `import struct`:
  Imports Python's built-in `struct` module. This module helps **decode binary data** (like `.idx3-ubyte`) into Python values.

* `import numpy as np`:
  Imports the NumPy library with alias `np`. It's used for fast array and matrix operations.

---

### 📦 `read_images` Function

```python
def read_images(path):
```

* `def`: Starts a **function definition**.
* `read_images`: The name of the function. This function **reads image data**.
* `path`: A parameter that will hold the file path (like `train-images.idx3-ubyte`).

---

```python
    with open(path, 'rb') as f:
```

* `with`: Opens a context block for handling the file. It ensures the file is closed automatically.
* `open(path, 'rb')`: Opens the file located at `path` in **binary read mode** (`rb` = read binary).
* `as f`: Assigns the opened file to variable `f`.

---

```python
        magic, num, rows, cols = struct.unpack(">IIII", f.read(16))
```

* `struct.unpack(">IIII", ...)`:
  This **unpacks** the first 16 bytes of the file into 4 integers:

  * `>IIII`:

    * `>` = big-endian byte order
    * `I` = unsigned integer (4 bytes) — four times for 4 integers
  * `f.read(16)`: Reads the first 16 bytes of the file
* `magic`: A number identifying the file type (should be `2051` for images)
* `num`: Number of images
* `rows`: Number of rows per image (should be `28`)
* `cols`: Number of columns per image (should be `28`)

✅ At this point, we know how many images there are and their shape.
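
To see the header decoding in isolation, here is a minimal sketch that builds a fake 16-byte header in memory instead of reading a real file. The values (3 images of 28×28) are made up for illustration; only the format string `">IIII"` comes from the code above.

```python
import struct

# Build a fake 16-byte IDX image header: magic 2051, then 3 images of 28x28.
header = struct.pack(">IIII", 2051, 3, 28, 28)

# Unpack it the same way read_images does.
magic, num, rows, cols = struct.unpack(">IIII", header)
print(len(header), magic, num, rows, cols)

# Reading the first field with the wrong byte order gives a nonsense value,
# which is why the ">" (big-endian) prefix matters.
wrong_magic = struct.unpack("<I", header[:4])[0]
print(wrong_magic)
```

If the magic number you read from a real file is not `2051`, the most likely causes are a wrong byte order or a file that is still compressed.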

---

```python
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(num, 1, rows, cols)
```

* `f.read()`: Reads the rest of the file, which contains all image pixels
* `np.frombuffer(...)`: Converts the binary data into a NumPy array
* `dtype=np.uint8`: Each pixel is an unsigned 8-bit integer (0 to 255)
* `.reshape(num, 1, rows, cols)`:

  * `num`: number of images
  * `1`: 1 channel (grayscale)
  * `rows` & `cols`: 28x28 pixels
  * Shape becomes `(N, 1, 28, 28)`, i.e. `(batch, channels, height, width)`, the layout that PyTorch image layers such as `nn.Conv2d` expect.
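
The same two calls can be tried on a handful of bytes. This is a small sketch with made-up sizes (2 images of 2×3 pixels) so the result is easy to read; it also shows that the array returned by `np.frombuffer` over a `bytes` object is read-only.

```python
import numpy as np

# 12 bytes standing in for the pixel data of 2 tiny 2x3 images (hypothetical sizes).
raw = bytes(range(12))

pixels = np.frombuffer(raw, dtype=np.uint8)   # flat array of 12 values, no copy made
images = pixels.reshape(2, 1, 2, 3)           # (num, channels, rows, cols)

print(images.shape)
print(images[1, 0])               # the second image as a 2x3 grid
print(pixels.flags.writeable)     # the array only views the immutable bytes
```

In the notebook this read-only view is harmless, because `torch.tensor(...)` copies the data when the dataset is built.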

---

### 📦 `read_labels` Function

```python
def read_labels(path):
```

Defines another function to load the **labels** (digits 0–9) from `.idx1-ubyte` files.

---

```python
    with open(path, 'rb') as f:
```

Same as before: Opens the label file in binary mode.

---

```python
        magic, num = struct.unpack(">II", f.read(8))
```

* Reads the first 8 bytes:

  * `magic`: file identifier (should be `2049` for label files)
  * `num`: number of labels

---

```python
        return np.frombuffer(f.read(), dtype=np.uint8)
```

* Reads the rest of the file into a NumPy array of 8-bit unsigned integers (each representing a digit between 0–9)
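
The functions above read `magic` and `num` but never use them. A possible hardening, sketched below, is to check both against what the file actually contains. `read_labels_checked` and the tiny synthetic file are hypothetical names made up for this example; they are not part of the notebook.

```python
import os
import struct
import tempfile

import numpy as np

LABEL_MAGIC = 2049  # magic number for IDX label files, as noted above


def read_labels_checked(path):
    """Hypothetical variant of read_labels that validates the header."""
    with open(path, "rb") as f:
        magic, num = struct.unpack(">II", f.read(8))
        if magic != LABEL_MAGIC:
            raise ValueError(f"unexpected magic number {magic}, expected {LABEL_MAGIC}")
        labels = np.frombuffer(f.read(), dtype=np.uint8)
    if labels.size != num:
        raise ValueError(f"header promises {num} labels, file contains {labels.size}")
    return labels


# Write a tiny synthetic label file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny-labels.idx1-ubyte")
    with open(path, "wb") as f:
        f.write(struct.pack(">II", LABEL_MAGIC, 3))
        f.write(bytes([5, 0, 4]))
    labels = read_labels_checked(path)

print(labels)
```

The same idea would apply to `read_images` with magic number `2051` and an expected size of `num * rows * cols` bytes.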

---

## 🧱 MNISTDataset Class

```python
class MNISTDataset(Dataset):
```

* Defines a **custom PyTorch dataset**.
* `Dataset` is a base class from `torch.utils.data` that we’re extending.

---

```python
    def __init__(self, image_path, label_path):
```

* `__init__`: The constructor runs when you create an instance of this class.
* `image_path` & `label_path`: Parameters for the file paths.

---

```python
        self.images = torch.tensor(read_images(image_path), dtype=torch.float32) / 255.0
```

* `read_images(image_path)`: Loads image data as a NumPy array.
* `torch.tensor(..., dtype=torch.float32)`: Converts it to a PyTorch tensor of type `float32`
* `/ 255.0`: Normalizes pixel values from `[0–255]` to `[0.0–1.0]`
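
Here is the conversion and scaling on three sample pixel values, as a quick sketch. The values `0`, `128`, `255` are chosen only to show the two ends and the middle of the range.

```python
import numpy as np
import torch

pixels = np.array([0, 128, 255], dtype=np.uint8)

# Convert to float32 first, then scale into [0.0, 1.0].
scaled = torch.tensor(pixels, dtype=torch.float32) / 255.0

print(scaled.dtype)
print(scaled)
```

Converting to `float32` before dividing keeps the result in the dtype that `nn.Linear` layers use by default.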

---

```python
        self.labels = torch.tensor(read_labels(label_path), dtype=torch.long)
```

* `read_labels(...)`: Loads label data
* Converts to a PyTorch tensor of type `long` (required by `CrossEntropyLoss`)

---

```python
    def __len__(self):
        return len(self.labels)
```

* Returns the number of samples (i.e., length of the dataset)

---

```python
    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]
```

* This method allows accessing a single `(image, label)` pair by index
* Required for `DataLoader` to work
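
To see when these two methods run, here is a sketch with a stand-in dataset. `TinyDataset` is a hypothetical class filled with zeros, used here so the example works without the MNIST files.

```python
import torch
from torch.utils.data import Dataset


class TinyDataset(Dataset):
    """Hypothetical stand-in for MNISTDataset, filled with placeholder data."""

    def __init__(self, n):
        self.images = torch.zeros(n, 1, 28, 28)
        self.labels = torch.arange(n) % 10

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]


ds = TinyDataset(25)
image, label = ds[13]              # indexing calls __getitem__(13)
print(len(ds))                     # len() calls __len__()
print(image.shape, label.item())
```

`DataLoader` relies on exactly these two calls: `len(ds)` to know how many indices exist and `ds[i]` to fetch each sample.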

---

## 📦 Dataloader Code

```python
train_dataset = MNISTDataset(
    image_path=os.path.join(base_path, "train-images.idx3-ubyte"),
    label_path=os.path.join(base_path, "train-labels.idx1-ubyte")
)
```

* Creates an object of your custom dataset class for **training data**
* Loads both images and labels

---

```python
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
```

* Wraps your dataset in a `DataLoader`:

  * `batch_size=64`: Each batch has 64 samples
  * `shuffle=True`: Shuffles the dataset every epoch for better training
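
A small sketch of what batching and shuffling do, using 10 fake samples and `batch_size=4` instead of the real data. `TensorDataset` is used here only as a convenient stand-in for `MNISTDataset`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 fake samples with 3 features each, plus integer labels 0..9.
data = TensorDataset(torch.randn(10, 3), torch.arange(10))
loader = DataLoader(data, batch_size=4, shuffle=True)

# The last batch is smaller when the dataset size is not a multiple of batch_size.
batch_sizes = [labels.size(0) for _, labels in loader]
print(len(loader), batch_sizes)

# Every sample still appears exactly once per epoch; only the order changes.
seen = sorted(int(label) for _, labels in loader for label in labels)
print(seen == list(range(10)))
```

By the same arithmetic, the 60,000 MNIST training images with `batch_size=64` give 938 batches per epoch, the last one holding 32 images.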

---



### **Step 5: Define a Simple Neural Network**

In [7]:
import torch.nn as nn
import torch.nn.functional as F

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = SimpleNN()
model

SimpleNN(
  (fc1): Linear(in_features=784, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=10, bias=True)
)

In [8]:
model.state_dict()

OrderedDict([('fc1.weight',
              tensor([[ 0.0207, -0.0087,  0.0202,  ...,  0.0051, -0.0087,  0.0154],
                      [-0.0242, -0.0257,  0.0267,  ...,  0.0060, -0.0357, -0.0235],
                      [-0.0100, -0.0350,  0.0344,  ..., -0.0110,  0.0331, -0.0077],
                      ...,
                      [ 0.0330, -0.0262,  0.0330,  ..., -0.0071,  0.0195,  0.0123],
                      [ 0.0010,  0.0090, -0.0156,  ..., -0.0271, -0.0320,  0.0101],
                      [-0.0173,  0.0200,  0.0264,  ...,  0.0134, -0.0336,  0.0217]])),
             ('fc1.bias',
              tensor([ 2.1869e-02, -1.4737e-02,  3.0590e-02, -1.6426e-02,  1.8911e-02,
                       5.7711e-03,  1.8876e-02, -6.3875e-03, -1.5170e-02, -3.2584e-02,
                      -4.5517e-03, -2.5429e-02,  3.0657e-02,  3.2786e-02,  3.2774e-02,
                      -1.2552e-02, -1.9630e-02,  2.1215e-02,  1.4565e-02, -1.3482e-02,
                       7.1670e-03, -3.4134e-02, -1.8708e-03,  2.

In [9]:
model.parameters

<bound method Module.parameters of SimpleNN(
  (fc1): Linear(in_features=784, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=10, bias=True)
)>

In [10]:
import torch
print(torch.__version__)


2.6.0+cpu


### **Step 6: Loss and Optimizer**

In [11]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(params=model.parameters(),
                      lr=0.01  # learning rate: possibly the most important hyperparameter
                      )
print(optimizer)


SGD (
Parameter Group 0
    dampening: 0
    differentiable: False
    foreach: None
    fused: None
    lr: 0.01
    maximize: False
    momentum: 0
    nesterov: False
    weight_decay: 0
)


Perfect! Let’s now dive into **Step 5 (Model definition)** and **Step 6 (Loss & Optimizer setup)** with a **word-by-word, line-by-line explanation** just like before.

---

## 🧠 Step 5: Define a Simple Neural Network

```python
import torch.nn as nn
import torch.nn.functional as F
```

* `import torch.nn as nn`:
  Imports PyTorch’s **neural network module** as `nn`.
  This includes building blocks like `Linear`, `Conv2d`, `ReLU`, etc.

* `import torch.nn.functional as F`:
  Imports PyTorch’s **functional API** under alias `F`.
  This gives access to activation functions like `F.relu()`, `F.softmax()`, etc. — used in forward passes.

---

### 🧱 Define the Model Class

```python
class SimpleNN(nn.Module):
```

* `class SimpleNN`:
  Defines a **new neural network class** called `SimpleNN`.

* `(nn.Module)`:
  Inherits from PyTorch’s base class for models: `nn.Module`.
  This gives our model structure, parameter tracking, etc.

---

```python
    def __init__(self):
        super(SimpleNN, self).__init__()
```

* `def __init__(self):`
  Constructor method: This is run when we create a new model object.

* `super(SimpleNN, self).__init__()`:
  Calls the constructor of the **parent class** (`nn.Module`) to set up model infrastructure like `parameters`, `state_dict`, etc.

---

### 🧩 Define Layers

```python
        self.fc1 = nn.Linear(28*28, 128)
```

* `self.fc1`: First fully connected layer.
* `nn.Linear(28*28, 128)`:

  * Input size = 784 (i.e., 28 × 28 flattened image)
  * Output size = 128 neurons

---

```python
        self.fc2 = nn.Linear(128, 64)
```

* Second fully connected layer
* Takes input of 128 (from `fc1`), outputs 64 neurons

---

```python
        self.fc3 = nn.Linear(64, 10)
```

* Third and final layer
* Outputs 10 values (one for each digit class 0–9)
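
Each `nn.Linear(in, out)` stores a weight matrix of shape `[out, in]` and a bias of shape `[out]`. The sketch below counts the parameters of the three layer sizes used above; the `layers` list is just a convenience for this example, not how the model itself is built.

```python
import torch.nn as nn

layers = [nn.Linear(28 * 28, 128), nn.Linear(128, 64), nn.Linear(64, 10)]

for layer in layers:
    n_params = layer.weight.numel() + layer.bias.numel()
    print(tuple(layer.weight.shape), tuple(layer.bias.shape), n_params)

# Total trainable parameters across the three layers.
total = sum(p.numel() for layer in layers for p in layer.parameters())
print(total)
```

Almost all of the parameters sit in the first layer (784 × 128 weights plus 128 biases), because that is where the input is widest.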

---

### 🔁 Forward Propagation Function

```python
    def forward(self, x):
```

* `def forward(self, x)`:
  Defines how input `x` flows through the layers — i.e., **the forward pass**.

---

```python
        x = x.view(-1, 28*28)
```

* `x.view(-1, 28*28)`:

  * Flattens every image in the batch: a tensor of shape `[B, 1, 28, 28]` becomes `[B, 784]`
  * `-1` means "infer this dimension from the total number of elements", which here works out to the batch size `B`
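
The reshape can be checked on a fake batch. This sketch also shows `torch.flatten` with `start_dim=1`, which is an alternative spelling of the same operation and not what the notebook uses.

```python
import torch

batch = torch.zeros(64, 1, 28, 28)          # fake batch shaped like one from train_loader

flat = batch.view(-1, 28 * 28)              # -1 is inferred as 64
same = torch.flatten(batch, start_dim=1)    # flatten everything after the batch axis

print(flat.shape, same.shape)
```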

---

```python
        x = F.relu(self.fc1(x))
```

* Passes `x` through `fc1`, then applies **ReLU activation**
* `F.relu(...)`: Applies Rectified Linear Unit: `max(0, x)`
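
A quick sketch of `max(0, x)` applied element by element, on a few hand-picked values:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
y = F.relu(x)      # negatives become 0, positives pass through unchanged

print(y)
```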

---

```python
        x = F.relu(self.fc2(x))
```

* Same: pass through second fully connected layer and apply ReLU

---

```python
        x = self.fc3(x)
```

* Pass through the third layer (no activation here because it will go to CrossEntropyLoss)

---

```python
        return x
```

* Return the **raw output logits**, not probabilities.
  `CrossEntropyLoss` applies log-softmax to them internally, so the model should not apply softmax itself.

---

### ✅ Create Model Instance

```python
model = SimpleNN()
```

* Instantiates the `SimpleNN` class
* `model` now holds the neural network and its parameters

---

## ⚙️ Step 6: Define Loss and Optimizer

```python
import torch.optim as optim
```

* Imports the **optimizer module** in PyTorch.
* This provides optimizers like SGD, Adam, RMSProp, etc.

---

### 📉 Define Loss Function

```python
criterion = nn.CrossEntropyLoss()
```

* `criterion`: A variable holding the **loss function**.
* `nn.CrossEntropyLoss()`:

  * Combines **log-softmax + negative log likelihood**
  * Used for **multi-class classification**
  * Expects **raw logits as input** and **class index as target**
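
To make the "combines two steps" point concrete, the sketch below computes the loss once with `nn.CrossEntropyLoss` and once by hand with `F.log_softmax` followed by `F.nll_loss`. The logits and targets are random placeholders, not outputs of the model above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)              # fake model outputs for a batch of 4
targets = torch.tensor([3, 0, 9, 1])     # fake class indices

loss_ce = nn.CrossEntropyLoss()(logits, targets)

# The same value computed in two explicit steps.
log_probs = F.log_softmax(logits, dim=1)
loss_manual = F.nll_loss(log_probs, targets)

print(loss_ce.item(), loss_manual.item())
```

Both numbers should agree up to floating-point rounding, which is why the model returns raw logits and leaves the softmax to the loss.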

---

### 🧮 Define Optimizer

```python
optimizer = optim.SGD(model.parameters(), lr=0.01)
```

* `optimizer`: A variable holding the **optimizer object**.
* `optim.SGD(...)`: Stochastic Gradient Descent
* `model.parameters()`: Gives the optimizer all the model’s weights to update during training
* `lr=0.01`: Learning rate — controls **how fast** the model learns (step size in weight updates)
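
With the default settings shown above (no momentum, no weight decay), one SGD step is simply `w_new = w - lr * grad`. Here is a sketch on a single toy parameter; the loss `sum(w**2)` is chosen only because its gradient, `2 * w`, is easy to check by hand.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.01)

loss = (w ** 2).sum()      # toy loss; its gradient is 2 * w
loss.backward()

# What plain SGD should produce: w_new = w - lr * grad
expected = w.detach() - 0.01 * w.grad
optimizer.step()

print(w.detach(), expected)
```

A larger `lr` takes bigger steps along the same direction; too large and the loss can bounce around or diverge, too small and training crawls.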

---

### Summary Table

| Component           | Meaning                     |
| ------------------- | --------------------------- |
| `fc1`, `fc2`, `fc3` | Fully connected layers      |
| `F.relu`            | Activation function         |
| `CrossEntropyLoss`  | Loss for classification     |
| `SGD`               | Optimizer to adjust weights |
| `x.view(...)`       | Flatten the image           |

---



### **Step 7: Train the Model**

### Building a training loop and a testing loop in PyTorch

A couple of things we need in a training loop:
0. Loop through the data
1. Forward pass (the data moves through our model's `forward()` function) to make predictions on the data, also called forward propagation
2. Calculate the loss (compare the forward-pass predictions to the ground-truth labels)
3. `optimizer.zero_grad()` - clear the gradients left over from the previous step (this only has to happen before `loss.backward()`, so the code below does it first)
4. `loss.backward()` - move backwards through the network to calculate the gradient of the loss with respect to each parameter of our model (**backpropagation**)
5. `optimizer.step()` - use the optimizer to adjust our model's parameters to try and improve the loss (**gradient descent**)


In [12]:
epochs = 5

for epoch in range(epochs):
    running_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {running_loss / len(train_loader):.4f}")


Epoch [1/5], Loss: 1.7709
Epoch [2/5], Loss: 0.5747
Epoch [3/5], Loss: 0.3954
Epoch [4/5], Loss: 0.3435
Epoch [5/5], Loss: 0.3153


### **Step 8: Evaluate Accuracy**

In [13]:
correct = 0
total = 0
model.eval()

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Test Accuracy: {100 * correct / total:.2f}%")


Test Accuracy: 91.46%


Awesome! Let’s continue with the **step-by-step, word-by-word breakdown** of **Step 7: Training the Model** and **Step 8: Evaluating the Model**.

---

## 🔁 Step 7: Training the Model

Here's the full training loop code first:

```python
epochs = 5
for epoch in range(epochs):
    running_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {running_loss/len(train_loader)}")
```

---

### 🔢 `epochs = 5`

* `epochs`: Number of **complete passes** through the training dataset.
* `= 5`: We’ll train the model for 5 full rounds.

---

### 🔁 `for epoch in range(epochs):`

* `for ... in range(epochs)`: A loop that will repeat `epochs` times (i.e., 5).
* `epoch`: This variable keeps track of which round (0 to 4) we’re in.

---

### `running_loss = 0.0`

* This variable stores the **total loss** for the current epoch.
* We’ll use this to calculate and print **average loss** at the end of the epoch.

---

### 🔂 `for images, labels in train_loader:`

* `train_loader`: The DataLoader that returns batches of data.
* `images`: A batch of input images (e.g., shape `[64, 1, 28, 28]`)
* `labels`: Corresponding ground-truth labels (e.g., `[3, 5, 1, 7, ...]`)
* This inner loop goes over **one batch at a time**.

---

### 🧼 `optimizer.zero_grad()`

* Clears the **previous gradients** stored in the optimizer.
* This is critical: PyTorch accumulates gradients by default, so we need to reset them each step.
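
The accumulation is easy to observe on a single parameter. In this sketch the "loss" is just `2 * w`, so each `backward()` call should contribute a gradient of `2`; the setup is a toy, not the MNIST model.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(w * 2).backward()
first = w.grad.item()          # gradient of 2*w with respect to w

(w * 2).backward()             # no reset in between: the new gradient is added
accumulated = w.grad.item()

optimizer.zero_grad()          # reset, as in the training loop
(w * 2).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)
```

Without the reset, every batch would be updated with the sum of all gradients seen so far rather than its own.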

---

### 🤖 `outputs = model(images)`

* Pass the batch of `images` through the model (calls `forward()` method)
* `outputs`: Model’s raw predictions (called **logits**), shape `[batch_size, 10]`

---

### 📉 `loss = criterion(outputs, labels)`

* Calculates the **loss** between the predicted `outputs` and the `labels`.
* `criterion` is `nn.CrossEntropyLoss()` which internally applies **log-softmax** + **negative log likelihood**.
* `loss`: A single scalar value showing **how wrong** the predictions were.

---

### 🔁 `loss.backward()`

* Computes the **gradients** of loss with respect to model weights.
* This is the **backpropagation** step.

---

### 🔧 `optimizer.step()`

* Updates the model parameters using the gradients computed.
* This is where the model **learns**.

---

### ➕ `running_loss += loss.item()`

* `loss.item()`: Converts PyTorch tensor to a Python float.
* Accumulates the total loss for the current epoch.

---

### 🖨️ `print(f"Epoch {epoch+1}, Loss: {running_loss/len(train_loader)}")`

* Prints average loss for the epoch.
* `epoch+1`: `range(epochs)` starts counting at 0, so we add 1 to show a human-readable epoch number.
* `running_loss / len(train_loader)`: Computes average loss across all batches.

---

## 🧪 Step 8: Evaluating the Model

Here’s the code:

```python
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f"Accuracy: {100 * correct / total:.2f}%")
```

---

### 🔢 `correct = 0` and `total = 0`

* `correct`: To count how many predictions were correct.
* `total`: To count total number of test images.

---

### ❌ `with torch.no_grad():`

* Disables gradient tracking.
* Saves memory and speeds up evaluation (no learning happens here).
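
You can check the effect directly on the `requires_grad` flag of an output. The sketch below uses a single small `nn.Linear` layer as a stand-in for the model.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
x = torch.randn(3, 4)

tracked = layer(x)                 # normal call: autograd records the operation
with torch.no_grad():
    untracked = layer(x)           # same numbers, but no graph is recorded

print(tracked.requires_grad, untracked.requires_grad)
```

The values are identical in both cases; only the bookkeeping needed for `backward()` is skipped.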

---

### 🔁 `for images, labels in test_loader:`

* Loops through batches from the **test set** (not training set).

---

### 🤖 `outputs = model(images)`

* Pass images through the model to get predictions.

---

### 📊 `_, predicted = torch.max(outputs.data, 1)`

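* `outputs.data`: The raw prediction scores as a plain tensor detached from autograd. Inside `torch.no_grad()` this is redundant, so `outputs` alone would work too.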
* `torch.max(..., 1)`: Finds the **index of the maximum value** in each row (i.e., the predicted class).
* `_`: We ignore the actual max value.
* `predicted`: The model’s predicted class labels.
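
Here is the prediction step on two hand-written rows of logits, as a sketch with three classes instead of ten. `argmax` is shown as an alternative spelling that returns only the indices.

```python
import torch

logits = torch.tensor([[0.1, 2.5, -1.0],
                       [3.0, 0.2, 0.4]])
labels = torch.tensor([1, 2])

values, predicted = torch.max(logits, 1)   # max value and its index for each row
same = logits.argmax(dim=1)                # indices only

correct = (predicted == labels).sum().item()
print(predicted, same, correct)
```

The first row is classified correctly (class 1) and the second is not (predicted 0, label 2), so `correct` counts one hit for this batch.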

---

### ➕ `total += labels.size(0)`

* `labels.size(0)`: Number of samples in the batch.
* We add that to `total`.

---

### ✅ `correct += (predicted == labels).sum().item()`

* `(predicted == labels)`: Boolean tensor where `True` for correct predictions.
* `.sum()`: Total number of correct predictions in this batch.
* `.item()`: Converts it to a Python number.
* Accumulates correct predictions.

---

### 🖨️ `print(f"Accuracy: {100 * correct / total:.2f}%")`

* Calculates accuracy = `(correct / total) * 100`
* Formats to 2 decimal places using `:.2f`

---

## ✅ Summary

| Concept             | Meaning                           |
| ------------------- | --------------------------------- |
| `zero_grad()`       | Reset gradients before backprop   |
| `loss.backward()`   | Compute gradients                 |
| `optimizer.step()`  | Apply gradients to weights        |
| `torch.no_grad()`   | Turns off autograd for inference  |
| `torch.max(..., 1)` | Get predicted class               |
| `accuracy`          | Percentage of correct predictions |

---

