# Neural Network on GPU

> Author : Badr TAJINI - Machine Learning 2 & Deep learning - ECE 2025-2026

---


From Kaggle: 
"MNIST ("Modified National Institute of Standards and Technology") is the de facto “hello world” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike."

[Read more.](https://www.kaggle.com/c/digit-recognizer)


<a title="By Josef Steppan [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], from Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:MnistExamples.png"><img width="512" alt="MnistExamples" src="https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png"/></a>

In [56]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets 


## STEP 1: LOADING DATASET

In [57]:
train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

## STEP 2: MAKING DATASET ITERABLE

In [58]:
batch_size = 100
n_iters = 3000 
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)


print("Number of epochs: " + str(num_epochs))

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)
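
The epoch count computed above is plain arithmetic: one epoch takes `len(train_dataset) / batch_size` iterations. The sketch below reproduces it without loading the data, assuming the standard MNIST training split of 60,000 images; the helper name `epochs_for` is hypothetical and not part of the notebook.

```python
def epochs_for(n_iters, dataset_size, batch_size):
    """Number of whole epochs covered by n_iters mini-batch updates."""
    iters_per_epoch = dataset_size / batch_size
    return int(n_iters / iters_per_epoch)

# MNIST training split: 60,000 images (assumed here rather than read from disk)
print(epochs_for(3000, 60000, 100))  # 5
print(epochs_for(1200, 60000, 100))  # 2
```

With `batch_size = 100`, each epoch is 600 iterations, so `n_iters = 3000` corresponds to 5 epochs.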


## STEP 3: CREATE MODEL CLASS

In [59]:
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()

        # Convolution 1
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=0)
        self.relu1 = nn.ReLU()

        # Max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)

        # Convolution 2
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=0)
        self.relu2 = nn.ReLU()

        # Max pool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)

        # Fully connected 1 (readout)
        self.fc1 = nn.Linear(32 * 4 * 4, 10) 

    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)

        # Max pool 1
        out = self.maxpool1(out)

        # Convolution 2 
        out = self.cnn2(out)
        out = self.relu2(out)

        # Max pool 2 
        out = self.maxpool2(out)

        # Flatten
        # Original size: (100, 32, 4, 4)
        # out.size(0): 100
        # New out size: (100, 32*4*4)
        out = out.view(out.size(0), -1)

        # Linear function (readout)
        out = self.fc1(out)

        return out
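
The input size of `fc1`, `32 * 4 * 4`, can be checked by tracing the spatial size through each layer. A minimal sketch, assuming 28x28 MNIST inputs and PyTorch's default pooling stride (equal to the kernel size); the helper `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                  # MNIST images are 28x28
size = conv_out(size, kernel=5)            # cnn1     -> 24
size = conv_out(size, kernel=2, stride=2)  # maxpool1 -> 12
size = conv_out(size, kernel=5)            # cnn2     -> 8
size = conv_out(size, kernel=2, stride=2)  # maxpool2 -> 4
flat_features = 32 * size * size           # 32 channels * 4 * 4 = 512
print(size, flat_features)
```

Because both convolutions use `padding=0`, each one shrinks the feature map by 4 pixels, which is why the flattened vector has 512 features rather than `32 * 7 * 7`.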



## STEP 4: INSTANTIATE MODEL CLASS

In [60]:
# Number of CUDA devices
# The first device is always named "cuda:0"
# The second one is "cuda:1", etc.
print(torch.cuda.device_count())

In [61]:
model = CNNModel()

####################################################################
#  USE GPU FOR MODEL                                               #
#  The model must be put on the GPU before declaring the optimizer #
####################################################################

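# Use the first GPU when one is available, otherwise fall back to the CPU.
# To reproduce the CPU timings, set device = torch.device("cpu") instead.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

print(device)
model.to(device)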
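
Moving a model to a device only works if its inputs are on the same device. The sketch below illustrates this with a small stand-in layer (hypothetical, not the notebook's `CNNModel`) and runs on either CPU or GPU.

```python
import torch
import torch.nn as nn

# Fall back to the CPU when no CUDA device is visible
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in layer (hypothetical, not the notebook's CNNModel)
layer = nn.Linear(4, 2).to(device)

# Inputs must live on the same device as the parameters
x = torch.randn(3, 4, device=device)
y = layer(x)

print(next(layer.parameters()).device, y.device)
```

Passing a CPU tensor to a model whose parameters are on the GPU raises a runtime error, which is why the loops in this notebook call `.to(device)` on every batch.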





In [62]:
print(model)

## STEP 5: INSTANTIATE LOSS CLASS

In [63]:
criterion = nn.CrossEntropyLoss()



## STEP 6: INSTANTIATE OPTIMIZER CLASS

In [64]:
learning_rate = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)



In [65]:
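# model.parameters is a method; call it to list the shape of each parameter tensor
print([tuple(p.shape) for p in model.parameters()])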
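
The number of trainable parameters can also be worked out by hand from the layer definitions. A minimal sketch, with hypothetical helper names, counting weights plus biases for each layer of the model defined in Step 3:

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights (out * in * k * k) plus one bias per output channel."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

def linear_params(in_features, out_features):
    """Weights (out * in) plus one bias per output feature."""
    return out_features * in_features + out_features

counts = {
    "cnn1": conv2d_params(1, 16, 5),       # 416
    "cnn2": conv2d_params(16, 32, 5),      # 12832
    "fc1": linear_params(32 * 4 * 4, 10),  # 5130
}
total = sum(counts.values())
print(counts, total)  # total is 18378
```

The ReLU and max-pooling layers have no parameters, so they do not appear in the count.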

Function to compute the accuracy on the test set

### Question: modify the following code to exploit the GPU instead of the CPU

In [66]:
def test_model(test_loader, model, device):
  # Calculate Accuracy         
  correct = 0
  total = 0
  # Gradients are not needed for evaluation; disabling them saves memory and time
  with torch.no_grad():
    # Iterate through test dataset
    for images, labels in test_loader:
      #######################
      #  USE GPU FOR MODEL  #
      #######################
      images = images.to(device)
      labels = labels.to(device)

      # Forward pass only to get logits/output
      outputs = model(images)

      # Get predictions from the maximum value
      _, predicted = torch.max(outputs, 1)

      # Total number of labels
      total += labels.size(0)

      # Total correct predictions (.item() converts the 0-d tensor to a Python int)
      correct += (predicted == labels).sum().item()

  accuracy = 100 * correct / total

  return accuracy
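
The accuracy computation can be followed on a tiny batch. The logits and labels below are made up for illustration; the calls are the same ones used in `test_model`.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes
outputs = torch.tensor([[2.0, 0.1, 0.3],
                        [0.2, 1.5, 0.1],
                        [0.3, 0.2, 0.9],
                        [1.2, 0.4, 0.8]])
labels = torch.tensor([0, 1, 2, 2])

# torch.max over dim 1 returns (values, indices); the indices are the predicted classes
_, predicted = torch.max(outputs, 1)

# .item() converts the 0-d tensor to a Python int
correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)
print(predicted.tolist(), accuracy)  # [0, 1, 2, 0] 75.0
```

Three of the four predictions match the labels, giving 75% accuracy.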

## STEP 7: TRAIN THE MODEL

### Question: modify the following code to exploit the GPU instead of the CPU

In [67]:
%%time
# Time execution of a Python statement or expression.
# wall time is the actual time taken from the start of a computer program to the end
print("device : ", device)
# Named "iteration" to avoid shadowing the built-in iter()
iteration = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):

        #######################
        #  USE GPU FOR MODEL  #
        #######################
        images = images.to(device)
        labels = labels.to(device)

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = model(images)

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iteration += 1

        if iteration % 500 == 0:
            # Calculate Accuracy on the test set        
            accuracy = test_model(test_loader, model, device)

            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy on test set: {}'.format(iteration, loss.item(), accuracy))
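
CUDA operations are launched asynchronously, so timing GPU code by hand (outside of `%%time`) can be misleading unless the GPU is synchronized before reading the clock. The sketch below shows one way to do this; the `timed` helper and the matrix product are illustrative assumptions, not part of the notebook.

```python
import time
import torch

def timed(fn, device):
    """Run fn() once and return (result, elapsed seconds)."""
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for pending GPU work before starting the clock
    start = time.perf_counter()
    result = fn()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the kernels launched by fn()
    return result, time.perf_counter() - start

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
a = torch.randn(256, 256, device=device)
product, seconds = timed(lambda: a @ a, device)
print(device, seconds)
```

In the training loop above, `loss.item()` copies a value back to the host and therefore waits for the GPU, so the wall time reported for the whole cell already includes the GPU work.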

### Question: compare the wall time on GPU to the wall time on CPU

In [68]:
'''
First result (n_iters = 1200) - CPU device
device :  cpu
Iteration: 500. Loss: 0.4888801872730255. Accuracy on test set: 87.5
Iteration: 1000. Loss: 0.37688079476356506. Accuracy on test set: 91.95
CPU times: user 34.6 s, sys: 1.12 s, total: 35.7 s
Wall time: 35.7 s

First result (n_iters = 1200) - GPU device
device :  cuda:0
Iteration: 500. Loss: 0.31107470393180847. Accuracy on test set: 88.87
Iteration: 1000. Loss: 0.2122945636510849. Accuracy on test set: 93.0
CPU times: user 12.9 s, sys: 109 ms, total: 13 s
Wall time: 13 s

Second result (n_iters = 3000) - GPU device
device :  cuda:0
Iteration: 500. Loss: 0.39977511763572693. Accuracy on test set: 89.4
Iteration: 1000. Loss: 0.24916988611221313. Accuracy on test set: 92.89
Iteration: 1500. Loss: 0.23252594470977783. Accuracy on test set: 93.8
Iteration: 2000. Loss: 0.059734128415584564. Accuracy on test set: 95.59
Iteration: 2500. Loss: 0.1804923117160797. Accuracy on test set: 96.07
Iteration: 3000. Loss: 0.07106972485780716. Accuracy on test set: 96.5
CPU times: user 32.3 s, sys: 268 ms, total: 32.5 s
Wall time: 32.6 s

Second result (n_iters = 3000) - CPU device
device :  cpu
Iteration: 500. Loss: 0.5236935615539551. Accuracy on test set: 88.25
Iteration: 1000. Loss: 0.21130454540252686. Accuracy on test set: 92.09
Iteration: 1500. Loss: 0.22272621095180511. Accuracy on test set: 94.18
Iteration: 2000. Loss: 0.13368134200572968. Accuracy on test set: 95.29
Iteration: 2500. Loss: 0.17730632424354553. Accuracy on test set: 95.83
Iteration: 3000. Loss: 0.16622531414031982. Accuracy on test set: 96.33
CPU times: user 1min 26s, sys: 1.15 s, total: 1min 27s
Wall time: 1min 28s

'''
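
The comparison can be summarized as a speedup factor, CPU wall time divided by GPU wall time. The sketch below uses only the wall times reported above, with the shorter runs taken to be the `n_iters = 1200` setting; the figures are specific to the hardware used for those runs.

```python
# Wall times in seconds, copied from the runs reported above
wall_time = {
    1200: {"cpu": 35.7, "gpu": 13.0},
    3000: {"cpu": 88.0, "gpu": 32.6},  # 1min 28s = 88 s
}

speedup = {n: t["cpu"] / t["gpu"] for n, t in wall_time.items()}
for n_iters, factor in speedup.items():
    print(f"n_iters={n_iters}: GPU is {factor:.2f}x faster")
```

Both settings give a speedup of roughly 2.7x, so on this small model the GPU advantage stays about the same as the run gets longer.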

### Question: increase the number of epochs to 5 to see whether we can expect a better average accuracy

In [69]:
'''
n_iters = 1200 (2 epochs, last evaluation at iteration 1000)
CPU => Accuracy on test set: 91.95
GPU => Accuracy on test set: 93.0

n_iters = 3000 (5 epochs, last evaluation at iteration 3000)
CPU => Accuracy on test set: 96.33
GPU => Accuracy on test set: 96.5
'''