1. You have been given a partially implemented code for a feed-forward neural network using PyTorch. Your task is to complete the missing parts of the code to make it functional.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim

# Define the neural network architecture
class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Define the hyperparameters
input_size = 10
hidden_size = 20
label_size = 5
learning_rate = 0.001
num_epochs = 1000

# Create the neural network object
model = NeuralNetwork(input_size, hidden_size, label_size)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# Generate some dummy data for training
train_data = torch.randn(100, input_size)
train_labels = torch.randint(label_size, (100,))

# Training loop
for epoch in range(num_epochs):
    # Forward pass
    # Complete this line to pass the training data through the model and obtain the predictions
    outputs = model(train_data)

    # Compute the loss
    loss = criterion(outputs, train_labels)

    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch+1) % 100 == 0:
        print(f'Epoch: {epoch+1}/{num_epochs}, Loss: {loss.item()}')

# Test the trained model
test_data = torch.randn(10, input_size)
with torch.no_grad():
    # Complete this line to pass the test data through the model and obtain the predictions
    test_outputs = model(test_data)

    # Print the predictions
    _, predicted = torch.max(test_outputs.data, 1)
    print("Predictions:", predicted)

Epoch: 100/1000, Loss: 1.6340762376785278
Epoch: 200/1000, Loss: 1.627159833908081
Epoch: 300/1000, Loss: 1.6205631494522095
Epoch: 400/1000, Loss: 1.614249587059021
Epoch: 500/1000, Loss: 1.6082069873809814
Epoch: 600/1000, Loss: 1.6024212837219238
Epoch: 700/1000, Loss: 1.596855640411377
Epoch: 800/1000, Loss: 1.5915049314498901
Epoch: 900/1000, Loss: 1.5863336324691772
Epoch: 1000/1000, Loss: 1.5812948942184448
Predictions: tensor([0, 0, 2, 0, 1, 0, 2, 4, 4, 0])


2. In this coding exercise, you need to implement the training of a deep MLP on the MNIST dataset using PyTorch and manually tune the hyperparameters. Follow the steps below to proceed:

* Load the MNIST dataset using torchvision.datasets.MNIST. The dataset contains handwritten digit images, and it can be easily accessed through PyTorch's torchvision module.

In [2]:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader


train_dataset = datasets.MNIST(root='./data', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transforms.ToTensor(), download=True)


* Define your deep MLP model. Specify the number of hidden layers, the number of neurons in each layer, and the activation function to be used. You can use the nn.Sequential container to stack the layers.

In [3]:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.ly1 = nn.Linear(784,256)
        self.relu1 = nn.ReLU()
        self.ly2 = nn.Linear(256,128)
        self.relu2 = nn.ReLU()
        self.ly3 = nn.Linear(128,64)
        self.relu3 = nn.ReLU()
        self.ly4 = nn.Linear(64,10)
    
    def forward(self, x):
        x = self.flatten(x)
        x = self.ly1(x)
        x = self.relu1(x)
        x = self.ly2(x)
        x = self.relu2(x)
        x = self.ly3(x)
        x = self.relu3(x)
        x = self.ly4(x)
        return x

* Set up the training loop and the hyperparameters. You can use the CrossEntropyLoss as the loss function and the Stochastic Gradient Descent (SGD) optimizer.

In [4]:
# Set hyperparameters
learning_rate = 0.01
epochs = 20
batch_size = 64 

# Create data loaders
train_loader = DataLoader(train_dataset,batch_size=batch_size , shuffle=True)
test_loader = DataLoader(train_dataset,batch_size=batch_size , shuffle=True)

# Create an instance of the model
model = MLP()

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

* Train the model by iterating over the training dataset for the specified number of epochs. Compute the loss, perform backpropagation, and update the model's parameters. 

In [7]:
# Training loop
from tqdm import tqdm
for epoch in range(epochs):
    for images, labels in tqdm(train_loader):
        images = images.view(-1, 784)
        optimizer.zero_grad()
        output=model(images)
        loos =criterion(output,labels)
        loos.backward()
        optimizer.step()

100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 91.77it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 93.20it/s]
100%|████████████████████████████████████████| 938/938 [00:08<00:00, 104.75it/s]
100%|████████████████████████████████████████| 938/938 [00:08<00:00, 105.44it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 96.18it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 96.39it/s]
100%|████████████████████████████████████████| 938/938 [00:09<00:00, 100.11it/s]
100%|████████████████████████████████████████| 938/938 [00:07<00:00, 124.03it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 93.86it/s]
100%|████████████████████████████████████████| 938/938 [00:08<00:00, 111.65it/s]
100%|████████████████████████████████████████| 938/938 [00:08<00:00, 110.40it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 92.25it/s]
100%|███████████████████████

* Evaluate the trained model on the test dataset and calculate the accuracy (Please take a moment to consider the code below!)

In [8]:
# Evaluation
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images = images.view(-1, 784)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print("Accuracy: {:.2f}%".format(accuracy))

Accuracy: 97.19%


* Manually tune the hyperparameters, such as the learning rate, by experimenting with different values and observing the performance. You can also search for the optimal learning rate by using techniques like learning rate range test, where you gradually increase the learning rate and monitor the loss.

In [9]:
learning_rates = [1, 0.1, 0.01, 0.001, 0.0001]
best_accuracy = 0
best_lr = None

for lr in learning_rates:
    model = MLP()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for images, labels in tqdm(train_loader):
            images = images.view(-1, 784)
            optimizer.zero_grad()
            output = model(images)
            loss = criterion(output, labels)
            loss.backward()
            optimizer.step()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.view(-1, 784)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print("Accuracy: {:.2f}%".format(accuracy))

    if accuracy > best_accuracy:  # Fix typo here
        best_accuracy = accuracy
        best_lr = lr

print(f"Best learning rate: {best_lr}")


100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 94.37it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 96.27it/s]
100%|████████████████████████████████████████| 938/938 [00:07<00:00, 123.33it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 92.94it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 93.15it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 92.75it/s]
100%|████████████████████████████████████████| 938/938 [00:07<00:00, 124.16it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 95.53it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 98.69it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 95.87it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 93.30it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 93.66it/s]
100%|███████████████████████

Accuracy: 99.21%


100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 93.28it/s]
100%|████████████████████████████████████████| 938/938 [00:08<00:00, 113.01it/s]
100%|████████████████████████████████████████| 938/938 [00:03<00:00, 239.81it/s]
100%|████████████████████████████████████████| 938/938 [00:07<00:00, 131.59it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 95.69it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 95.75it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 92.31it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 93.85it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 92.18it/s]
100%|████████████████████████████████████████| 938/938 [00:07<00:00, 118.61it/s]
100%|████████████████████████████████████████| 938/938 [00:09<00:00, 100.34it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 92.63it/s]
100%|███████████████████████

Accuracy: 100.00%


100%|████████████████████████████████████████| 938/938 [00:09<00:00, 100.72it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 94.76it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 94.09it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 94.76it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 93.34it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 93.19it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 94.66it/s]
100%|████████████████████████████████████████| 938/938 [00:07<00:00, 123.06it/s]
100%|████████████████████████████████████████| 938/938 [00:09<00:00, 103.27it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 94.33it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 94.27it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 95.85it/s]
100%|███████████████████████

Accuracy: 97.25%


100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 93.67it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 96.36it/s]
100%|████████████████████████████████████████| 938/938 [00:08<00:00, 114.91it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 93.16it/s]
100%|████████████████████████████████████████| 938/938 [00:06<00:00, 135.69it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 97.05it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 96.49it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 91.27it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 98.14it/s]
100%|████████████████████████████████████████| 938/938 [00:09<00:00, 103.45it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 90.35it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 93.97it/s]
100%|███████████████████████

Accuracy: 85.10%


100%|████████████████████████████████████████| 938/938 [00:08<00:00, 115.85it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 95.34it/s]
100%|████████████████████████████████████████| 938/938 [00:09<00:00, 100.41it/s]
100%|████████████████████████████████████████| 938/938 [00:08<00:00, 104.91it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 94.64it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 96.59it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 98.81it/s]
100%|████████████████████████████████████████| 938/938 [00:04<00:00, 194.85it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 96.01it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 93.29it/s]
100%|█████████████████████████████████████████| 938/938 [00:10<00:00, 93.58it/s]
100%|█████████████████████████████████████████| 938/938 [00:09<00:00, 94.41it/s]
100%|███████████████████████

Accuracy: 13.97%
Best learning rate: 0.1


3. In this coding exercise, you'll have an opportunity to explore the behavior of a deep neural network trained on the CIFAR10 image dataset. Follow the steps below:

* a. Construct a deep neural network (DNN) using 20 hidden layers, each comprising 100 neurons. To facilitate this exploration, employ the Swish activation function for each layer. Utilize nn.ModuleList to manage the layers effectively.

* b. Load the CIFAR10 dataset for training your network. Utilize the appropriate function, such as torchvision.datasets.CIFAR10. The dataset consists of 60,000 color images, with dimensions of 32×32 pixels. It is divided into 50,000 training samples and 10,000 testing samples. With 10 classes in the dataset, ensure that your network has a softmax output layer comprising 10 neurons. When modifying the model's architecture or hyperparameters, conduct a search to identify an appropriate learning rate. Implement early stopping during training and employ the Nadam optimization algorithm.

* c. Experiment by adding batch normalization to your network. Compare the learning curves obtained with and without batch normalization. Analyze whether the model converges faster with batch normalization and observe any improvements in its performance. Additionally, assess the impact of batch normalization on training speed.

* d. As an additional experiment, substitute batch normalization with SELU (Scaled Exponential Linear Units). Make the necessary adjustments to ensure the network self-normalizes. This involves standardizing the input features, initializing the network's weights using LeCun normal initialization (nn.init.kaiming_normal_), and ensuring that the DNN consists solely of dense layers. Observe the effects of utilizing SELU activation and self-normalization on the network's training stability and performance.

In [None]:
# TODO