# Exercise 2: Shallow and Deep Neural Networks

# Q1: The MNIST dataset

Below is code that trains a neural network on the MNIST dataset. The accuracy achieved is ~77%. Change the training process to improve the network's performance as much as you can.

You can change:

- The loss function.
- The batch size (We will talk about this next week. Meanwhile, if needed, you can read about it a little).
- The learning rate.

Tune these parameters to achieve the best accuracy.

**Don't change the network or the number of epochs**.

**Note:** If you change the loss function, you might need to change the relevant parts of the code accordingly.
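
The note above can be illustrated with a minimal sketch of the two target formats. It assumes the network returns raw scores of shape `(batch, 10)`; the tensors `logits` and `target` are made-up stand-ins, not MNIST data. `nn.MSELoss` compares two float tensors of the same shape (here softmax probabilities against one-hot vectors), whereas `nn.CrossEntropyLoss` takes raw logits and integer class indices and applies log-softmax internally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # hypothetical network outputs: batch of 4, 10 classes
target = torch.tensor([3, 0, 7, 1])   # hypothetical integer class labels

# MSE: inputs and targets must have the same shape, so the labels are one-hot encoded
probs = torch.softmax(logits, dim=1)
one_hot = F.one_hot(target, num_classes=10).float()
mse_loss = nn.MSELoss()(probs, one_hot)

# Cross-entropy: raw logits plus integer class indices, no softmax and no one-hot
ce_loss = nn.CrossEntropyLoss()(logits, target)

# The same value computed by hand: mean negative log-probability of the true class
manual_ce = -torch.log_softmax(logits, dim=1)[torch.arange(4), target].mean()

print(f"MSE: {mse_loss.item():.4f}  CE: {ce_loss.item():.4f}  manual CE: {manual_ce.item():.4f}")
```

This is why switching the criterion also means dropping the softmax from `forward` and the one-hot encoding from the training loop.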

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import numpy as np

# Load MNIST dataset from torch datasets
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# batch size changed from 64 to 128
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)


In [9]:
# Define a network class
class SoftmaxNet(nn.Module):
    def __init__(self):
        super(SoftmaxNet, self).__init__()
        torch.manual_seed(0)
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = torch.flatten(x, 1)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        # softmax removed: return raw logits
        x = self.fc3(x)
        return x

In [10]:
# One-hot encoding
def one_hot_encode(labels):
    one_hot = torch.zeros(labels.shape[0], 10)
    one_hot[torch.arange(labels.shape[0]), labels] = 1
    return one_hot


In [11]:
def accuracy(net, test_loader):
    net.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for data, target in test_loader:
            outputs = net(data)
            _, predicted = torch.max(outputs.data, 1)
            total += target.size(0)
            correct += (predicted == target).sum().item()
    return 100 * correct / total

In [12]:
softmax_net = SoftmaxNet()

# loss function changed from MSELoss to CrossEntropyLoss
criterion = nn.CrossEntropyLoss()


def train(epochs):
    # learning rate changed from 0.01 to 0.05
    optimizer = optim.SGD(softmax_net.parameters(), lr=0.05)
    LOSS = []

    for epoch in range(epochs):
        for data, target in train_loader:
            optimizer.zero_grad()
            output = softmax_net(data)

            # loss no longer uses one-hot encoding
            loss = criterion(output, target)

            loss.backward()
            optimizer.step()
            LOSS.append(loss.item())

        print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}')
    return LOSS


train(15)
accuracy(softmax_net, test_loader)


Epoch 1, Loss: 0.2979
Epoch 2, Loss: 0.2279
Epoch 3, Loss: 0.2626
Epoch 4, Loss: 0.2806
Epoch 5, Loss: 0.0939
Epoch 6, Loss: 0.0843
Epoch 7, Loss: 0.1554
Epoch 8, Loss: 0.2217
Epoch 9, Loss: 0.1184
Epoch 10, Loss: 0.0356
Epoch 11, Loss: 0.0709
Epoch 12, Loss: 0.0573
Epoch 13, Loss: 0.0461
Epoch 14, Loss: 0.0935
Epoch 15, Loss: 0.0690


97.4

# Q2: XOR functions

Train a neural network on the XOR dataset (see below). Experiment with different input sizes ($n=4,8,\dots$). We aim to work with inputs of $n=16$ bits or more. Our goals in this exercise are:

1. Train a neural network to achieve the best accuracy on the XOR dataset. To this end, choose the best network by tuning at least a subset of the following parameters:

  - The input representation (e.g., 0/1 or 1/-1).
  - Number of layers.
  - Number of neurons in each layer.
  - Choice of activation function(s).
  - Batch size, for the mini-batch algorithm.
  - Number of epochs.
  - Learning rate.

Note: When you change one parameter, you might need to re-tune a parameter you have already tuned. For example, if you change the batch size, you might want to consider a different learning rate. Or, if you use a bigger network, you might want to use fewer epochs, etc.

2. Study, and demonstrate:
  
  - The effect of the number of layers on the number of neurons needed, and the accuracy attained.
  - The effect of the batch size in the minibatch gradient descent algorithm.
  - The effect of the batch size on the learning rate and other network parameters.
  - How the problem changes when the number of input bits grows.

3. On your final network, try to interpret the representation in the different hidden layers.
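
One possible way to look at the hidden representations is to record each layer's activations with forward hooks. The sketch below is only an illustration: the small `nn.Sequential` model is an untrained, hypothetical stand-in for the final network, and the names `activations` and `save_to` are made up for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-in for a trained network: 4 -> 8 -> 4 -> 1
model = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(),
    nn.Linear(8, 4), nn.ReLU(),
    nn.Linear(4, 1),
)

activations = {}

def save_to(name):
    # Build a hook that stores the output of the module it is attached to
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Attach one hook per ReLU, i.e. after every hidden layer
handles = []
for idx, layer in enumerate(model):
    if isinstance(layer, nn.ReLU):
        handles.append(layer.register_forward_hook(save_to(f"relu_{idx}")))

x = torch.tensor([[0., 0., 0., 0.],
                  [1., 0., 0., 0.],
                  [1., 1., 0., 0.],
                  [1., 1., 1., 1.]])
with torch.no_grad():
    model(x)

# Remove the hooks so later forward passes are not recorded
for handle in handles:
    handle.remove()

for name, act in activations.items():
    print(name, tuple(act.shape))
```

With a trained network, the stored tensors can be grouped by the number of ones in the input to check whether a hidden unit tracks a partial count or a partial parity.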

If needed you can apply any of the regularization methods we will learn in the next lesson.

Note: If you work with very large $n$, you will not be able to generate all possible 0/1 vectors, and you need to construct the dataset differently. Also, in this case it might be necessary to work with regularization.
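
For large $n$, one possible construction is to sample random bit vectors instead of enumerating all $2^n$ of them. The sketch below assumes this approach; the class name `SampledParityDataset` and its parameters are hypothetical and not part of the exercise code. Duplicates and overlap between the two splits are not excluded, which is negligible for large $n$ but should be checked for small $n$.

```python
import torch
from torch.utils.data import Dataset

class SampledParityDataset(Dataset):
    """Hypothetical dataset: random 0/1 vectors of length n with parity labels."""

    def __init__(self, n=64, num_samples=10_000, seed=0):
        # A dedicated generator keeps the sample reproducible
        gen = torch.Generator().manual_seed(seed)
        self.X = torch.randint(0, 2, (num_samples, n), generator=gen)
        # Label is the XOR (parity) of all bits
        self.Y = self.X.sum(dim=1) % 2

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        return self.X[idx], self.Y[idx]

# Different seeds give different samples for training and testing
train_ds = SampledParityDataset(n=64, num_samples=10_000, seed=0)
test_ds = SampledParityDataset(n=64, num_samples=2_000, seed=1)
print(len(train_ds), len(test_ds))
```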

## The dataset

In [13]:
import torch
from torch.utils.data import DataLoader
import torch.optim as optim
import torch.nn as nn
import itertools as it
import numpy as np
import pandas as pd

In [14]:
import itertools as it
class XORDataset(torch.utils.data.Dataset):

    @staticmethod
    def random_seubset(m, p=0.7):
        # Fixed seed: every call returns the same 0/1 mask, so the train split
        # (mask == 1) and the test split (mask == 0) are complementary.
        np.random.seed(0)
        return (np.random.uniform(0, 1, m) <= p).astype(int)

    # Generate all 0/1 vectors of length n
    @staticmethod
    def generate(n):
        return list(it.product((0, 1), repeat=n))

    def __init__(self, n=16, Train=True):

        # Avoid shadowing the built-in all()
        all_vectors = np.array(self.generate(n))
        mask = self.random_seubset(2**n)
        if Train:
            self.X = torch.tensor(all_vectors[mask == 1])
        else:
            self.X = torch.tensor(all_vectors[mask == 0])

        # Label is the parity (XOR) of the input bits
        self.Y = self.X.sum(dim=1) % 2

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        return self.X[idx], self.Y[idx]
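
The train/test split above depends on `random_seubset` reseeding NumPy with the same seed on every call, so the `Train=True` and `Train=False` instances draw the same mask and select complementary rows. A small sketch of that mechanism for $n=4$, where the helper name `split_mask` is hypothetical and simply mirrors the static method:

```python
import itertools as it
import numpy as np

def split_mask(m, p=0.7):
    # Same logic as the dataset's mask method: reseed, then threshold uniform draws
    np.random.seed(0)
    return (np.random.uniform(0, 1, m) <= p).astype(int)

n = 4
vectors = np.array(list(it.product((0, 1), repeat=n)))

mask_a = split_mask(2**n)  # as drawn by the training instance
mask_b = split_mask(2**n)  # as drawn by the test instance

train_X = vectors[mask_a == 1]
test_X = vectors[mask_b == 0]

# Every vector lands in exactly one of the two splits
train_set = {tuple(v) for v in train_X}
test_set = {tuple(v) for v in test_X}
print(len(train_set), len(test_set), train_set.isdisjoint(test_set))
```

If the seed were not reset inside the method, the two instances would draw different masks and the splits could overlap.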

In [15]:

class XORNet(nn.Module):
    def __init__(self, n_in, hidden_layers, neurons_per_layer, activation=nn.ReLU):
        # Note: hidden_layers is not used; the depth is len(neurons_per_layer).
        super(XORNet, self).__init__()
        layers = []
        in_dim = n_in

        for h_dim in neurons_per_layer:
            layers.append(nn.Linear(in_dim, h_dim))
            layers.append(activation())
            in_dim = h_dim

        layers.append(nn.Linear(in_dim, 1))
        layers.append(nn.Sigmoid()) # For Binary Cross Entropy
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x.float())
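
The exercise lists the 1/-1 input representation as one of the options to try. A minimal sketch of the remapping, assuming bit 0 maps to +1 and bit 1 maps to -1 (the opposite convention works just as well). Under this mapping the parity label is determined by the product of the inputs. `to_pm1` is a hypothetical helper, not part of the notebook.

```python
import torch

def to_pm1(x_bits):
    # Map 0 -> +1 and 1 -> -1 (assumed convention)
    return 1.0 - 2.0 * x_bits.float()

x_bits = torch.tensor([[0, 0, 0, 0],
                       [1, 0, 0, 0],
                       [1, 1, 0, 0],
                       [1, 1, 1, 0]])
y = x_bits.sum(dim=1) % 2            # parity labels, as in the dataset

x_pm1 = to_pm1(x_bits)
prod = x_pm1.prod(dim=1)             # +1 for even parity, -1 for odd parity
y_from_prod = ((1 - prod) / 2).long()

print(y.tolist(), y_from_prod.tolist())
```

The remapped tensor can be passed to the network in place of the 0/1 input; whether it trains faster is something to measure rather than assume.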

In [16]:
# 1. configuration
n_bits = 16
batch_size = 64
lr = 0.1 # higher learning rate for sgd
epochs = 100

# 2. data preparation
# note: generating all 2**n vectors might take 5-10 seconds for n=16
print(f"generating dataset for {n_bits} bits... please wait")
train_ds = XORDataset(n=n_bits, Train=True)
test_ds = XORDataset(n=n_bits, Train=False)

train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=batch_size, shuffle=False)

# 3. model initialization
# using 2 hidden layers to solve the 16-bit complexity
# layers: 16 -> 64 -> 32 -> 1
model = XORNet(n_in=n_bits, hidden_layers=2, neurons_per_layer=[64, 32])

# simple binary cross entropy loss
criterion = nn.BCELoss()

# plain SGD optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)

print("starting training on cpu...")

# 4. training loop
for epoch in range(epochs):
    model.train()
    epoch_loss = 0

    for inputs, labels in train_loader:
        # reset gradients
        optimizer.zero_grad()

        # forward pass
        # squeeze only dim 1 so a final batch of size 1 keeps shape (1,)
        outputs = model(inputs).squeeze(1)

        # calculate loss
        loss = criterion(outputs, labels.float())

        # backward pass
        loss.backward()

        # update weights
        optimizer.step()
        epoch_loss += loss.item()

    # 5. testing every 10 epochs
    if (epoch + 1) % 10 == 0:
        model.eval()
        correct = 0
        total = 0
        with torch.no_grad():
            for inputs, labels in test_loader:
                outputs = model(inputs).squeeze(1)
                predicted = (outputs > 0.5).float()
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        # named test_accuracy so the accuracy() function defined for Q1 is not overwritten
        test_accuracy = 100 * correct / total
        print(f'epoch [{epoch+1}/{epochs}] | loss: {epoch_loss/len(train_loader):.4f} | accuracy: {test_accuracy:.2f}%')

generating dataset for 16 bits... please wait
starting training on cpu...
epoch [10/100] | loss: 0.6931 | accuracy: 49.03%
epoch [20/100] | loss: 0.6930 | accuracy: 48.32%
epoch [30/100] | loss: 0.4567 | accuracy: 75.47%
epoch [40/100] | loss: 0.3011 | accuracy: 93.38%
epoch [50/100] | loss: 0.0401 | accuracy: 99.10%
epoch [60/100] | loss: 0.0221 | accuracy: 98.46%
epoch [70/100] | loss: 0.0153 | accuracy: 99.42%
epoch [80/100] | loss: 0.0091 | accuracy: 99.71%
epoch [90/100] | loss: 0.0181 | accuracy: 99.46%
epoch [100/100] | loss: 0.0084 | accuracy: 99.75%
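
The loss of about 0.6931 logged at epochs 10 and 20 is consistent with a network that outputs values close to 0.5 for every input: binary cross-entropy at a constant prediction of 0.5 equals $\ln 2 \approx 0.6931$, which matches the near-50% accuracy at those epochs. A quick check with made-up labels:

```python
import math
import torch
import torch.nn as nn

# Constant prediction of 0.5, regardless of the (hypothetical) labels
outputs = torch.full((8,), 0.5)
labels = torch.tensor([0., 1., 1., 0., 1., 0., 0., 1.])

chance_loss = nn.BCELoss()(outputs, labels).item()
print(f"BCE at p=0.5: {chance_loss:.4f}, ln 2: {math.log(2):.4f}")
```

A long plateau at this value therefore indicates that the network has not yet started to separate the two classes.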
