1. You have been given partially implemented code for a feed-forward neural network in PyTorch. Your task is to complete the missing parts so that the code runs.

In [1]:
import torch
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using cuda device
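
The cell above only selects a device; for the GPU to be used, both the model and each batch have to be moved to it, which the later cells do not appear to do. A minimal sketch of that step, assuming a toy `nn.Linear` layer and a random batch as stand-ins (neither is part of the exercise):

```python
import torch
import torch.nn as nn

# Same idea as the device check above, reduced to CUDA vs. CPU for brevity.
device = "cuda" if torch.cuda.is_available() else "cpu"

layer = nn.Linear(10, 5).to(device)          # moves the parameters to `device`
inputs = torch.randn(4, 10, device=device)   # creates the batch on the same device
outputs = layer(inputs)                      # both operands now share a device

print(outputs.device.type)
```

Inside a training loop the equivalent would be `images, labels = images.to(device), labels.to(device)` before the forward pass.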


In [2]:

import torch.nn as nn
import torch.optim as optim

# Define the neural network architecture
class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

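    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax itself,
        # so a ReLU here would clip negative logits and hinder training.
        x = self.fc2(x)
        return x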

# Define the hyperparameters
input_size = 10
hidden_size = 20
label_size = 5
learning_rate = 0.001
num_epochs = 1000

# Create the neural network object
model = NeuralNetwork(input_size, hidden_size, label_size)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# Generate some dummy data for training
train_data = torch.randn(100, input_size)
train_labels = torch.randint(label_size, (100,))

# Training loop
for epoch in range(num_epochs):
    # Forward pass
    # Complete this line to pass the training data through the model and obtain the predictions
    outputs = model(train_data)

    # Compute the loss
    loss = criterion(outputs, train_labels)

    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch+1) % 100 == 0:
        print(f'Epoch: {epoch+1}/{num_epochs}, Loss: {loss.item()}')

# Test the trained model
test_data = torch.randn(10, input_size)
with torch.no_grad():
    # Complete this line to pass the test data through the model and obtain the predictions
    test_outputs = model(test_data)

    # Print the predictions (index of the largest logit for each sample)
    _, predicted = torch.max(test_outputs, 1)
    print("Predictions:", predicted)

Epoch: 100/1000, Loss: 1.60367751121521
Epoch: 200/1000, Loss: 1.6002953052520752
Epoch: 300/1000, Loss: 1.5970964431762695
Epoch: 400/1000, Loss: 1.594032883644104
Epoch: 500/1000, Loss: 1.5911498069763184
Epoch: 600/1000, Loss: 1.5880122184753418
Epoch: 700/1000, Loss: 1.5845439434051514
Epoch: 800/1000, Loss: 1.5812548398971558
Epoch: 900/1000, Loss: 1.5783064365386963
Epoch: 1000/1000, Loss: 1.575341820716858
Predictions: tensor([3, 0, 0, 2, 1, 4, 0, 4, 2, 1])
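
The losses printed above stay close to ln(5) ≈ 1.609, which is the cross-entropy of a uniform guess over 5 classes. Since the labels are random, there is nothing to learn beyond memorizing the 100 samples, so a loss near this value is expected. A small check of that reference value, assuming all-zero logits as a stand-in for an uninformed model:

```python
import math
import torch
import torch.nn as nn

num_classes = 5  # matches label_size in the exercise above

# All-zero logits give a uniform softmax distribution over the classes.
uniform_logits = torch.zeros(100, num_classes)
random_labels = torch.randint(num_classes, (100,))

chance_loss = nn.CrossEntropyLoss()(uniform_logits, random_labels).item()
print(f"Chance-level loss: {chance_loss:.4f}, ln(5) = {math.log(num_classes):.4f}")
```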


2. In this coding exercise, you will train a deep MLP on the MNIST dataset using PyTorch and tune its hyperparameters manually. Follow the steps below:

* Load the MNIST dataset using torchvision.datasets.MNIST. The dataset contains handwritten digit images, and it can be easily accessed through PyTorch's torchvision module.

In [3]:
# Load the MNIST dataset
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader
from tqdm import tqdm



train_dataset = MNIST('./data', train=True, download=True, transform=ToTensor())
test_dataset = MNIST('./data', train=False, transform=ToTensor())



* Define your deep MLP model. Specify the number of hidden layers, the number of neurons in each layer, and the activation function to be used. You can use the nn.Sequential container to stack the layers.

In [4]:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        # 784 flattened pixels -> 20 hidden units -> 10 class logits
        self.layers = nn.Sequential(
            nn.Linear(28*28, 20),
            nn.ReLU(),
            nn.Linear(20, 10)
        )

    def forward(self, x):
        return self.layers(x)


* Set up the training loop and the hyperparameters. You can use the CrossEntropyLoss as the loss function and the Stochastic Gradient Descent (SGD) optimizer.

In [12]:
# Set hyperparameters
learning_rate = 0.01
epochs = 100
batch_size = 32
# Create data loaders
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

# Create an instance of the model
model = MLP()
print(model)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)



MLP(
  (layers): Sequential(
    (0): Linear(in_features=784, out_features=20, bias=True)
    (1): ReLU()
    (2): Linear(in_features=20, out_features=10, bias=True)
  )
)


* Train the model by iterating over the training dataset for the specified number of epochs. Compute the loss, perform backpropagation, and update the model's parameters. 

In [13]:
# Training loop
for epoch in range(epochs):
    for images, labels in tqdm(train_loader):
        # Flatten each 28x28 image into a 784-element vector
        images = images.view(-1, 784)

        # Zero the gradients
        optimizer.zero_grad()


        # Forward pass
        outputs = model(images)
        loss = criterion(outputs,labels)
        
        # Backward pass and optimization
        loss.backward()
        optimizer.step()
    # Note: this reports the loss of the last mini-batch only, not the epoch average
    if (epoch+1) % 10 == 0:
        print(f'Epoch: {epoch+1}/{epochs}, Loss: {loss.item()}')

Epoch: 10/100, Loss: 0.26769542694091797
Epoch: 20/100, Loss: 0.05084151402115822
Epoch: 30/100, Loss: 0.04214125871658325
Epoch: 40/100, Loss: 0.11616217344999313
Epoch: 50/100, Loss: 0.021920910105109215
Epoch: 60/100, Loss: 0.08003880828619003
Epoch: 70/100, Loss: 0.1568288952112198
Epoch: 80/100, Loss: 0.0832170620560646
Epoch: 90/100, Loss: 0.11947625875473022
Epoch: 100/100, Loss: 0.026048120111227036

* Evaluate the trained model on the test dataset and calculate the accuracy (Please take a moment to consider the code below!)

In [110]:
# Evaluation
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images = images.view(-1, 784)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print("Accuracy: {:.2f}%".format(accuracy))

Accuracy: 96.08%


* Manually tune the hyperparameters, such as the learning rate, by trying different values and observing the effect on performance. You can also search for a good learning rate with a learning rate range test: increase the learning rate gradually over a short run and monitor the loss.
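
The learning rate range test mentioned in the last step can be sketched as follows. This is one possible version, not a prescribed solution: the helper name `lr_range_test`, its default bounds, the divergence threshold, and the synthetic stand-in data are all assumptions made so the snippet runs without downloading MNIST.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def lr_range_test(model, loader, criterion, start_lr=1e-5, end_lr=1.0, num_steps=50):
    """Raise the learning rate exponentially each batch and record the loss."""
    optimizer = optim.SGD(model.parameters(), lr=start_lr)
    gamma = (end_lr / start_lr) ** (1 / max(num_steps - 1, 1))  # per-step multiplier
    lrs, losses = [], []
    lr = start_lr
    data_iter = iter(loader)
    for _ in range(num_steps):
        try:
            inputs, targets = next(data_iter)
        except StopIteration:  # restart the loader if it runs out of batches
            data_iter = iter(loader)
            inputs, targets = next(data_iter)
        for group in optimizer.param_groups:
            group["lr"] = lr
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        lrs.append(lr)
        losses.append(loss.item())
        # Hypothetical stopping rule: quit once the loss blows up.
        if not torch.isfinite(loss) or loss.item() > 4 * losses[0]:
            break
        lr *= gamma
    return lrs, losses


# Synthetic stand-in for flattened MNIST batches (assumed: 784 features, 10 classes).
torch.manual_seed(0)
dummy = TensorDataset(torch.randn(256, 784), torch.randint(10, (256,)))
dummy_loader = DataLoader(dummy, batch_size=32, shuffle=True)
probe_model = nn.Sequential(nn.Linear(784, 20), nn.ReLU(), nn.Linear(20, 10))

lrs, losses = lr_range_test(probe_model, dummy_loader, nn.CrossEntropyLoss())
print(f"Tested {len(lrs)} learning rates from {lrs[0]:.1e} to {lrs[-1]:.1e}")
```

Plotting `losses` against `lrs` on a logarithmic x-axis typically shows a region where the loss falls fastest; a rate somewhat below the point where the loss starts rising is a common starting choice. The test updates the model's weights, so it is best run on a throwaway copy of the model.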

3. In this coding exercise, you'll have an opportunity to explore the behavior of a deep neural network trained on the CIFAR10 image dataset. Follow the steps below:

* a. Construct a deep neural network (DNN) with 20 hidden layers of 100 neurons each, using the Swish activation function (nn.SiLU in PyTorch) in every hidden layer. Use nn.ModuleList to manage the layers.

* b. Load the CIFAR10 dataset for training your network using torchvision.datasets.CIFAR10. The dataset consists of 60,000 color images of 32×32 pixels, divided into 50,000 training samples and 10,000 test samples. Because the dataset has 10 classes, the network needs a 10-neuron output layer; note that nn.CrossEntropyLoss applies log-softmax internally, so the layer should output raw logits rather than softmax probabilities. Whenever you modify the model's architecture or hyperparameters, search for an appropriate learning rate. Implement early stopping during training and use the Nadam optimizer (torch.optim.NAdam).

* c. Experiment by adding batch normalization to your network. Compare the learning curves obtained with and without batch normalization. Analyze whether the model converges faster with batch normalization and observe any improvements in its performance. Additionally, assess the impact of batch normalization on training speed.

* d. As an additional experiment, replace batch normalization with SELU (Scaled Exponential Linear Units). Make the necessary adjustments so that the network self-normalizes: standardize the input features, initialize the weights with LeCun normal initialization (in PyTorch, nn.init.kaiming_normal_ with mode='fan_in' and nonlinearity='linear', which gives a weight variance of 1/fan_in), and make sure the DNN consists solely of dense layers. Observe the effect of SELU activation and self-normalization on the network's training stability and performance.
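
One way to structure the network for parts a, c, and d is sketched below. It is an illustration rather than a reference solution: the class name `DeepNet`, its constructor flags, and the random input batch are assumptions, and the placement of batch normalization before the activation is one of several reasonable choices.

```python
import torch
import torch.nn as nn


class DeepNet(nn.Module):
    """20 hidden layers of 100 units; names and defaults here are illustrative."""

    def __init__(self, in_features=3 * 32 * 32, hidden=100, depth=20,
                 num_classes=10, use_batch_norm=False, use_selu=False):
        super().__init__()
        self.hidden_layers = nn.ModuleList()
        self.norms = nn.ModuleList()
        width = in_features
        for _ in range(depth):
            self.hidden_layers.append(nn.Linear(width, hidden))
            # nn.Identity keeps the forward pass uniform when batch norm is off.
            self.norms.append(nn.BatchNorm1d(hidden) if use_batch_norm else nn.Identity())
            width = hidden
        # nn.SiLU is PyTorch's name for Swish, x * sigmoid(x).
        self.activation = nn.SELU() if use_selu else nn.SiLU()
        self.output = nn.Linear(hidden, num_classes)
        if use_selu:
            for layer in self.hidden_layers:
                # Gain 1 with fan_in gives std = 1/sqrt(fan_in), i.e. LeCun normal.
                nn.init.kaiming_normal_(layer.weight, mode="fan_in", nonlinearity="linear")
                nn.init.zeros_(layer.bias)

    def forward(self, x):
        x = x.flatten(start_dim=1)  # (N, 3, 32, 32) -> (N, 3072)
        for layer, norm in zip(self.hidden_layers, self.norms):
            x = self.activation(norm(layer(x)))
        return self.output(x)  # raw logits; CrossEntropyLoss applies log-softmax


model = DeepNet()
batch = torch.randn(8, 3, 32, 32)  # random stand-in for a CIFAR10 batch
logits = model(batch)
print(logits.shape)  # torch.Size([8, 10])
```

`DeepNet(use_batch_norm=True)` and `DeepNet(use_selu=True)` would give the variants for parts c and d; for the SELU variant the inputs still need to be standardized separately.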

In [None]:
# TODO
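
As a possible starting point for the training part of this exercise, here is a hedged sketch of early stopping combined with `torch.optim.NAdam`. The function name `train_with_early_stopping`, the `patience` default, the small stand-in network, and the random batches are assumptions chosen so the snippet runs without downloading CIFAR10.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim


def train_with_early_stopping(model, train_batches, val_batches, lr=1e-3,
                              max_epochs=50, patience=5):
    """Train with NAdam; stop once validation loss has not improved for `patience` epochs."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.NAdam(model.parameters(), lr=lr)
    best_loss, best_state, stale_epochs = float("inf"), None, 0
    history = []
    for epoch in range(max_epochs):
        model.train()
        for inputs, targets in train_batches:
            optimizer.zero_grad()
            criterion(model(inputs), targets).backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x), y).item()
                           for x, y in val_batches) / len(val_batches)
        history.append(val_loss)
        if val_loss < best_loss:
            best_loss, stale_epochs = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())  # snapshot the best weights
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)  # roll back to the best epoch
    return history, best_loss


# Random stand-ins for CIFAR10 batches (assumed shape: N x 3 x 32 x 32, 10 classes).
torch.manual_seed(0)
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100), nn.SiLU(), nn.Linear(100, 10))
train_batches = [(torch.randn(32, 3, 32, 32), torch.randint(10, (32,))) for _ in range(4)]
val_batches = [(torch.randn(32, 3, 32, 32), torch.randint(10, (32,))) for _ in range(2)]

history, best_loss = train_with_early_stopping(net, train_batches, val_batches,
                                               max_epochs=10, patience=3)
print(f"Stopped after {len(history)} epochs, best validation loss {best_loss:.4f}")
```

With real data, `train_batches` and `val_batches` would be `DataLoader` objects over a training split and a held-out validation split, and `history` can be plotted to compare learning curves with and without batch normalization.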