# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST and consists of images of handwritten digits. You can find more about it on the website of Yann LeCun (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 3 hidden layers of 200 units each.

### Importing Packages

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

### Data

In [3]:
transform = transforms.Compose([
    transforms.ToTensor()  # Converts to float tensor and scales [0,255] → [0,1]
])

# Images become shape (1, 28, 28) (channels-first, PyTorch style)

mnist_train = datasets.MNIST(
    root='../../Data',
    train=True,
    download=True,
    transform=transform
)

mnist_test = datasets.MNIST(
    root='../../Data',
    train=False,
    download=True,
    transform=transform
)


In [7]:
mnist_train, mnist_test

(Dataset MNIST
     Number of datapoints: 60000
     Root location: ../../Data
     Split: Train
     StandardTransform
 Transform: Compose(
                ToTensor()
            ),
 Dataset MNIST
     Number of datapoints: 10000
     Root location: ../../Data
     Split: Test
     StandardTransform
 Transform: Compose(
                ToTensor()
            ))
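
As a rough illustration of what the `ToTensor()` step listed in the transform does to pixel values, the sketch below rescales 8-bit intensities to floats in [0, 1] using plain Python. The helper name `scale_pixels` and the sample row are made up for this example and are not part of torchvision; the real transform also rearranges the image into a (channels, height, width) tensor.

```python
def scale_pixels(row):
    """Map 8-bit pixel intensities (0..255) to floats in [0, 1]."""
    return [p / 255.0 for p in row]

# A hypothetical row of raw pixel intensities
raw_row = [0, 51, 255]
scaled_row = scale_pixels(raw_row)
print(scaled_row)  # [0.0, 0.2, 1.0]
```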

In [12]:
# We need validation samples, so we hold out part of the training data.

# Split: 90% training, 10% validation

# random_split assigns samples to the two subsets at random

num_train = len(mnist_train)
num_val = int(0.1 * num_train)
num_train = num_train - num_val

train_dataset, val_dataset = random_split(
    mnist_train, [num_train, num_val]
)
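
The split sizes follow from simple arithmetic: 10% of the 60,000 training images gives 6,000 validation samples, leaving 54,000 for training. The sketch below mimics the idea with the standard library only: shuffle the indices, then cut them into two disjoint groups. The helper `split_indices` and the seed value are assumptions for illustration; `random_split` itself returns `Subset` objects rather than index lists.

```python
import random

def split_indices(n_total, val_fraction, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into train/validation lists."""
    n_val = int(val_fraction * n_total)
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices(60000, 0.1)
print(len(train_idx), len(val_idx))  # 54000 6000
```

Fixing the seed is a design choice made here so that repeated runs produce the same validation set; the notebook cell itself does not set one.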

In [13]:
# Shuffle and batch the training data; validation and test data keep their order

BATCH_SIZE = 100

train_loader = DataLoader(
    train_dataset, batch_size=BATCH_SIZE, shuffle=True
)

# The validation and test sets are each loaded as a single full-size batch
val_loader = DataLoader(
    val_dataset, batch_size=num_val, shuffle=False
)

test_loader = DataLoader(
    mnist_test, batch_size=len(mnist_test), shuffle=False
)
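
With these settings, the number of batches each loader yields per epoch can be worked out directly. The short calculation below assumes the 54,000 / 6,000 / 10,000 sample counts that result from the 90/10 split and the default `DataLoader` behaviour of keeping a final partial batch; the helper `batches_per_epoch` is hypothetical.

```python
import math

def batches_per_epoch(num_samples, batch_size):
    """Number of batches when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)

train_batches = batches_per_epoch(54000, 100)
val_batches = batches_per_epoch(6000, 6000)
test_batches = batches_per_epoch(10000, 10000)
print(train_batches, val_batches, test_batches)  # 540 1 1
```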

### Model

#### Outline of the model

In [14]:
class MNISTModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28 * 28, 200)
        self.fc2 = nn.Linear(200, 200)
        self.fc3 = nn.Linear(200, 200)
        self.out = nn.Linear(200, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.flatten(x)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.out(x)  # No softmax here!
        return x
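
To get a feel for the size of this network, the parameter count can be worked out by hand: each `nn.Linear(in, out)` layer holds `in * out` weights plus `out` biases. The sketch below does that arithmetic in plain Python; `linear_params` is a helper invented here, and the layer sizes are copied from the class definition.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

# (inputs, outputs) for fc1, fc2, fc3 and the output layer
layer_sizes = [(28 * 28, 200), (200, 200), (200, 200), (200, 10)]
total = sum(linear_params(i, o) for i, o in layer_sizes)
print(total)  # 239410
```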


In [15]:
# Move the model to the GPU if one is available, otherwise stay on the CPU

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = MNISTModel().to(device)

#### Choosing Optimizer and Loss Function

This is a classification problem, so we use `CrossEntropyLoss` as the loss function, the standard choice for multi-class classification. It expects raw logits, which is why the model's `forward` applies no softmax.

`CrossEntropyLoss = LogSoftmax + NLLLoss`
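
More precisely, the loss applies a log-softmax to the raw logits and then takes the negative log-likelihood of the true class. The following is a minimal sketch of that computation for a single sample, written with the standard library only; the function names and the example logits are illustrative, and `nn.CrossEntropyLoss` additionally averages over the batch by default.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax for one sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy(logits, target):
    """Negative log-likelihood of the target class under log-softmax."""
    return -log_softmax(logits)[target]

loss = cross_entropy([2.0, 1.0, 0.1], target=0)
uniform_loss = cross_entropy([0.0] * 10, target=3)  # equals ln(10)
print(round(loss, 4), round(uniform_loss, 4))
```

The second value is a useful reference point: a 10-class model that spreads its probability evenly scores ln(10), roughly 2.30, so losses well below that indicate the network has learned something.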

For the optimizer we use Adam, a widely used adaptive optimizer, with a learning rate of 0.001.
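
For intuition about what the optimizer does on each step, here is a hedged sketch of the standard Adam update rule for a single scalar parameter. It is not PyTorch's implementation; the function `adam_step` and the example gradient are invented for illustration, and the default values mirror the documented defaults of `optim.Adam` (`betas=(0.9, 0.999)`, `eps=1e-8`).

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter at step t (1-based)."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# First step from param = 1.0 with a made-up gradient of 0.5
param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(param)
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude.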

In [16]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

#### Training

We train the model and use the validation set for early stopping: training stops once the validation loss has failed to improve for `patience` consecutive epochs.

In [21]:
NUM_EPOCHS = 10
patience = 2
best_val_loss = float('inf')
early_stop_counter = 0

for epoch in range(NUM_EPOCHS):
    # ---- Training ----
    model.train() # training mode
    train_loss = 0

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad() # clear old gradients
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward() # backprop
        optimizer.step() # update weights

        train_loss += loss.item() # running sum of per-batch losses (a total, not an average)

    # ---- Validation ----
    model.eval() # inference mode
    with torch.no_grad(): 
        val_images, val_labels = next(iter(val_loader))
        val_images, val_labels = val_images.to(device), val_labels.to(device)

        val_outputs = model(val_images)
        val_loss = criterion(val_outputs, val_labels).item() # store as a plain float

    print(f"Epoch {epoch+1}: "
          f"Train Loss = {train_loss:.4f}, "
          f"Val Loss = {val_loss:.4f}")

    # ---- Early Stopping ----
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        early_stop_counter = 0
    else:
        early_stop_counter += 1
        if early_stop_counter >= patience:
            print("Early stopping triggered")
            break


Epoch 1: Train Loss = 7.7496, Val Loss = 0.1130
Epoch 2: Train Loss = 3.5243, Val Loss = 0.1202
Epoch 3: Train Loss = 7.1631, Val Loss = 0.1118
Epoch 4: Train Loss = 6.1999, Val Loss = 0.1103
Epoch 5: Train Loss = 3.6760, Val Loss = 0.1136
Epoch 6: Train Loss = 5.8490, Val Loss = 0.1145
Early stopping triggered
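
The stopping point can be traced by replaying the logged validation losses through the same patience logic. The sketch below does this with the six values printed in the log and `patience = 2`; the function `early_stop_epoch` is a hypothetical helper written for this walkthrough, not part of PyTorch.

```python
def early_stop_epoch(val_losses, patience):
    """Return (1-based epoch at which training stops, best loss seen)."""
    best = float("inf")
    counter = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, counter = loss, 0  # improvement: reset the counter
        else:
            counter += 1             # no improvement this epoch
            if counter >= patience:
                return epoch, best
    return len(val_losses), best

logged = [0.1130, 0.1202, 0.1118, 0.1103, 0.1136, 0.1145]
stop_epoch, best_loss = early_stop_epoch(logged, patience=2)
print(stop_epoch, best_loss)  # 6 0.1103
```

Epochs 5 and 6 both fail to beat the best value from epoch 4, which is what triggers the stop. Note that the training loop keeps the weights from the final epoch rather than the best one; restoring the best weights would require saving a copy of `model.state_dict()` whenever the validation loss improves.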


#### Testing

In [22]:
model.eval()
with torch.no_grad():
    test_images, test_labels = next(iter(test_loader))
    test_images, test_labels = test_images.to(device), test_labels.to(device)

    outputs = model(test_images)
    test_loss = criterion(outputs, test_labels)

    predicted = outputs.argmax(dim=1)  # index of the highest logit = predicted digit
    accuracy = (predicted == test_labels).float().mean()

print(f"Test Loss: {test_loss:.2f}, Test Accuracy: {accuracy*100:.2f}%")


Test Loss: 0.11, Test Accuracy: 97.94%
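
An accuracy of 97.94% on 10,000 test images corresponds to 9,794 correct predictions. The way that figure is computed, taking the index of the largest logit per sample and comparing it with the label, can be sketched in plain Python as follows; the `accuracy` helper and the toy logits are made up for illustration.

```python
def accuracy(logit_rows, labels):
    """Fraction of rows whose largest logit is at the labelled index."""
    predicted = [row.index(max(row)) for row in logit_rows]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return correct / len(labels)

# Three made-up samples with three classes each; the last one is misclassified
toy_logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [0.0, 4.0, 0.5]]
toy_labels = [0, 2, 0]
toy_accuracy = accuracy(toy_logits, toy_labels)
print(toy_accuracy)  # 0.6666666666666666
```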


In [23]:
print(model)

MNISTModel(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=784, out_features=200, bias=True)
  (fc2): Linear(in_features=200, out_features=200, bias=True)
  (fc3): Linear(in_features=200, out_features=200, bias=True)
  (out): Linear(in_features=200, out_features=10, bias=True)
  (relu): ReLU()
)
