# **Multi-Class Classification: Optimization of Deep Neural Networks (DNN)**

In this Jupyer Notebook, we will implement a multiple layer feed-forward neural network for a multi-class classification task.

***

## **1. Build a Deep Neural Network Model**

First, we implement the `NeuralNetModel1` class. The model takes a $28\times 28$ grey-scale image as input, and pass it through a deep neural network.

The network has 2 hidden layers and 1 output layers, whose sizes are: 512 -> 512 -> 10. That is, the number of output classes is 10. The activation function for each hidden layer is `ReLU`.

The input image is first passed through a `nn.Flatten()` layer so that a 2D tensor becomes 1D.

In [1]:
import torch
torch.manual_seed(0)
torch.use_deterministic_algorithms(True)

from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

In [2]:
class NeuralNetModel1(nn.Module):
    def __init__(self):
        super(NeuralNetModel1, self).__init__()
        self.flatten = nn.Flatten() # Use nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512), # Input size is 28*28
            nn.ReLU(), # ReLU
            nn.Linear(512, 512), # 512 -> 512
            nn.ReLU(), # ReLU
            nn.Linear(512, 10), # 512 -> 10
        )

    def forward(self, x):
        x = self.flatten(x) # Call self.flatten()
        logits = self.linear_relu_stack(x) # Call self.linear_relu_stack()
        return logits

In [3]:
# Test code
sample_input = torch.randn(5, 28, 28)
print('input size:', sample_input.size())

model1 = NeuralNetModel1()
with torch.no_grad():
    output = model1(sample_input)
print('output size:', output.size())

input size: torch.Size([5, 28, 28])
output size: torch.Size([5, 10])


***

## **2. Use Dataloader**

Next, we download the FashionMNIST dataset provided by PyTorch to the folder "data", which takes some time for the first time execution.
We will use the `DataLoader` module to wrap the loaded training and test data, and then specify the `batch_size` correctly for both training and test dataloader.

See <https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader> for more information.

In [4]:
training_data = datasets.FashionMNIST(
    root="data",
    train=True, # True
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False, # False
    download=True,
    transform=ToTensor()
)

batch_size = 64
train_loader = DataLoader(training_data, batch_size=batch_size) # Specify data source and batch size correctly
test_loader = DataLoader(test_data, batch_size=batch_size)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [00:01<00:00, 16972754.95it/s]


Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 206245.40it/s]


Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 4422102/4422102 [00:08<00:00, 537172.28it/s] 


Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 2257897.83it/s]

Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw






In [5]:
# Test code
print('Training data size:', len(training_data))
print('Testing data size:', len(test_data))

count = 0
for batch in train_loader:
    X, y = batch
    print('X size:', X.size())
    print('y size:', y.size())
    count += 1
    if count > 0:
        break

Training data size: 60000
Testing data size: 10000
X size: torch.Size([64, 1, 28, 28])
y size: torch.Size([64])


***

## **3. Define Loss and Optimizer**

We will use `nn.CrossEntropyLoss()` as the loss function, and `torch.optim.SGD()` as the optimizer. We need to specify the arguments for `SGD()`, including the learning rate correctly.

See <https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html> and <https://pytorch.org/docs/stable/optim.html> for more information.

In [6]:
learning_rate = 1e-3
loss_fn = nn.CrossEntropyLoss()
optimizer_sgd = torch.optim.SGD(model1.parameters(), lr=learning_rate)

In [7]:
# Test code
print(loss_fn)
print(type(optimizer_sgd))

CrossEntropyLoss()
<class 'torch.optim.sgd.SGD'>


***

## **4. Implement Train and Test Functions**

Now, we have to implement the code for training the model in `train()`, as well as implement the code for testing the model in `test()`. For the backpropagation step, we need to first zero out all gradients by calling `optimizer.zero_grad()` before carrying out `backward()` and `step()` to update parameters.

In `test()`, we need to calculate the number of correct prediction in the current batch, and add it to the `correct` variable.
Finally, we need to divide `correct` by the total number of test examples to obtain the test accuracy.

In [8]:
def train_loop(dataloader, model, loss_fn, optimizer, verbose=True):
    for i, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X) # Get the prediction output from model
        loss = loss_fn(pred, y) # compute loss by calling loss_fn()

        # Backpropagation
        optimizer.zero_grad() # zero_grad()
        loss.backward() # backward()
        optimizer.step() # step()

        if verbose and i % 100 == 0:
            loss = loss.item()
            current_step = i * len(X)
            print(f"loss: {loss:>7f}  [{current_step:>5d}/{len(dataloader.dataset):>5d}]")

In [9]:
@torch.no_grad()
def test_loop(dataloader, model, loss_fn):
    test_loss, correct = 0, 0

    for X, y in dataloader:
        pred = model(X) # Similar to how it is computed in train()
        loss = loss_fn(pred, y)
        test_loss += loss.item()
        correct += (pred.argmax(1) == y).type(torch.float).sum().item() # Add the number of correct prediction in the current batch to `correct`

    test_loss /= len(dataloader)
    test_acc = correct / len(dataloader.dataset) # Use `correct` to compute accuracy
    print(f"Test Error: \n Accuracy: {(100*test_acc):>0.1f}%, Avg loss: {test_loss:>8f} \n")

Next, we will execute the following cell to start the training and testing loop after making sure that the cell containing the loss function and optimizers has already been executed.

In [10]:
model1 = NeuralNetModel1() # Reset the model
optimizer_sgd = torch.optim.SGD(model1.parameters(), lr=learning_rate) # Because the model1 is reset, the optimizer also needs redefined.

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_loader, model1, loss_fn, optimizer_sgd, verbose=True) # Use verbose=False to see less information
    test_loop(test_loader, model1, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.296701  [    0/60000]
loss: 2.284246  [ 6400/60000]
loss: 2.266165  [12800/60000]
loss: 2.270359  [19200/60000]
loss: 2.252516  [25600/60000]
loss: 2.231632  [32000/60000]
loss: 2.239763  [38400/60000]
loss: 2.206202  [44800/60000]
loss: 2.202750  [51200/60000]
loss: 2.174662  [57600/60000]
Test Error: 
 Accuracy: 40.0%, Avg loss: 2.171464 

Epoch 2
-------------------------------
loss: 2.178694  [    0/60000]
loss: 2.166120  [ 6400/60000]
loss: 2.112745  [12800/60000]
loss: 2.130643  [19200/60000]
loss: 2.084808  [25600/60000]
loss: 2.037292  [32000/60000]
loss: 2.053999  [38400/60000]
loss: 1.979528  [44800/60000]
loss: 1.982393  [51200/60000]
loss: 1.910791  [57600/60000]
Test Error: 
 Accuracy: 58.6%, Avg loss: 1.913630 

Epoch 3
-------------------------------
loss: 1.944621  [    0/60000]
loss: 1.911706  [ 6400/60000]
loss: 1.797971  [12800/60000]
loss: 1.834718  [19200/60000]
loss: 1.737230  [25600/60000]
loss: 1.692371  [32000/600

Next, we train an ADAM optimizer. Note that the model needs be reset.

In [11]:
model1 = NeuralNetModel1() # Reset the model
optimizer_adam = torch.optim.Adam(model1.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_loader, model1, loss_fn, optimizer_adam, verbose=True) # Use verbose=False to see less information
    test_loop(test_loader, model1, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.299667  [    0/60000]
loss: 0.565680  [ 6400/60000]
loss: 0.392925  [12800/60000]
loss: 0.491291  [19200/60000]
loss: 0.450433  [25600/60000]
loss: 0.447974  [32000/60000]
loss: 0.378937  [38400/60000]
loss: 0.541005  [44800/60000]
loss: 0.462502  [51200/60000]
loss: 0.504917  [57600/60000]
Test Error: 
 Accuracy: 84.5%, Avg loss: 0.424086 

Epoch 2
-------------------------------
loss: 0.265505  [    0/60000]
loss: 0.358143  [ 6400/60000]
loss: 0.300393  [12800/60000]
loss: 0.383802  [19200/60000]
loss: 0.442644  [25600/60000]
loss: 0.385706  [32000/60000]
loss: 0.301127  [38400/60000]
loss: 0.506399  [44800/60000]
loss: 0.393916  [51200/60000]
loss: 0.408847  [57600/60000]
Test Error: 
 Accuracy: 85.4%, Avg loss: 0.394956 

Epoch 3
-------------------------------
loss: 0.213341  [    0/60000]
loss: 0.321439  [ 6400/60000]
loss: 0.251663  [12800/60000]
loss: 0.315905  [19200/60000]
loss: 0.405041  [25600/60000]
loss: 0.343355  [32000/600

***

## **5. Add Batchnorm and Dropout**

We use `torch.nn.BatchNorm1d()` and `nn.Dropout()` after the ReLU activation of each hidden layer. `Batchnorm1d()` takes the size of previous activation as input. `Dropout()` takes the probability of dropout as input.

For more information, see <https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm1d.html> and <https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html>.

In [12]:
class NeuralNetModel2(nn.Module):
    def __init__(self, dropout = 0.1): # Note the additional dropout parameter here
        """
        :param dropout: float, the probability of dropout
        """
        super(NeuralNetModel2, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(), # ReLU
            nn.BatchNorm1d(512), # Batchnorm
            nn.Dropout(dropout), # Dropout, use the `dropout` parameter

            nn.Linear(512, 512),
            nn.ReLU(), # ReLU
            torch.nn.BatchNorm1d(512), # Batchnorm
            nn.Dropout(dropout), # Dropout, use the `dropout` parameter
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In the following cell, we test with different `dropout` rates, and observe how that affects the test accuracy.

In [13]:
model2 = NeuralNetModel2(dropout=0.5) # Call NeuralNetModel2() with different dropout values
optimizer = torch.optim.Adam(model2.parameters(), lr=learning_rate) # Can also try Adam/SGD optimizer

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_loader, model2, loss_fn, optimizer, verbose=True) # Use verbose=False to see less information
    test_loop(test_loader, model2, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.625102  [    0/60000]
loss: 0.610666  [ 6400/60000]
loss: 0.552281  [12800/60000]
loss: 0.565507  [19200/60000]
loss: 0.617299  [25600/60000]
loss: 0.525321  [32000/60000]
loss: 0.482823  [38400/60000]
loss: 0.575578  [44800/60000]
loss: 0.607219  [51200/60000]
loss: 0.560870  [57600/60000]
Test Error: 
 Accuracy: 80.9%, Avg loss: 0.538211 

Epoch 2
-------------------------------
loss: 0.416563  [    0/60000]
loss: 0.445116  [ 6400/60000]
loss: 0.357440  [12800/60000]
loss: 0.490875  [19200/60000]
loss: 0.521853  [25600/60000]
loss: 0.494577  [32000/60000]
loss: 0.425976  [38400/60000]
loss: 0.566340  [44800/60000]
loss: 0.519707  [51200/60000]
loss: 0.460394  [57600/60000]
Test Error: 
 Accuracy: 81.6%, Avg loss: 0.515284 

Epoch 3
-------------------------------
loss: 0.276398  [    0/60000]
loss: 0.480249  [ 6400/60000]
loss: 0.375430  [12800/60000]
loss: 0.468507  [19200/60000]
loss: 0.329319  [25600/60000]
loss: 0.494466  [32000/600