# Assignment 1: Multi-Layer Perceptron with MNIST Dataset

In this assignment, you are required to train two MLPs to classify images from the [MNIST database](http://yann.lecun.com/exdb/mnist/) hand-written digit database by using PyTorch.

The process will be broken down into the following steps:
>1. Load and visualize the data.
2. Define a neural network. (30 marks)
3. Train the models. (30 marks)
4. Evaluate the performance of our trained models on the test dataset. (20 marks)
5. Analysis your results. (20 marks)

In [1]:
import torch
import numpy as np

---
## Load and Visualize the Data

Downloading may take a few moments, and you should see your progress as the data is loading. You may also choose to change the `batch_size` if you want to load more data at a time.

This cell will create DataLoaders for each of our datasets.

In [2]:
from torchvision import datasets
import torchvision.transforms as transforms

# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20

# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# choose the training and test datasets
train_data = datasets.MNIST(root='data', train=True,
                                   download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                                  download=True, transform=transform)

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)

---
## Define the Network Architecture (30 marks)

* Input: a 784-dim Tensor of pixel values for each image.
* Output: a 10-dim Tensor of number of classes that indicates the class scores for an input image. 

You need to implement three models:
1. a vanilla multi-layer perceptron. (10 marks)
2. a multi-layer perceptron with regularization (dropout or L2 or both). (10 marks)
3. the corresponding loss functions and optimizers. (10 marks)

### Build model_1

In [5]:
import torch.nn as nn
import torch.nn.functional as F
import time
import torch
## Define the MLP architecture
#“Vanilla” Neural Network : this type of MLP has only a single hidden layer 
device = torch.
class VanillaMLP(nn.Module):
    def __init__(self):
        super(VanillaMLP, self).__init__()
        # implement your codes here
        H = 64
        self.linear1 = torch.nn.Linear(784,H)
        self.linear2 = torch.nn.Linear(H,10)

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        # implement your codes here   
        return F.log_softmax(y_pred,dim=1)

# initialize the MLP

model_1 = VanillaMLP()

# specify loss function
# implement your codes here
def loss_function(y_pred,y):
    loss = F.nll_loss(y_pred,y)
    return loss

# specify your optimizer
# implement your codes here
#随机梯度下降优化
optimizer = torch.optim.SGD(model_1.parameters(),lr=1e-2)

n_epochs = 30  # suggest training between 20-50 epochs
# device= torch.cuda("cpu") 
# model_1.to(device)
# model_1 = model_1.cuda()
model_1.train() # prep model for training

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    total_correct = 0
    
    for data, target in train_loader:
#         data = data.to(device)
#         target = target.to(device)
#         data = data.cuda()
#         target = target.cuda()
        y_pred = model_1(data)
        loss = loss_function(y_pred,target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # implement your code here
        # the total loss of this batch
        train_loss += loss.item()
        # the accumulated number of correctly classified samples of this batch
        _,pred = torch.max(y_pred.data,1)
        total_correct += torch.sum(pred == target.data).item()
        
    # print training statistics 
    # calculate average loss and accuracy over an epoch
    train_loss = train_loss / len(train_loader.dataset)
    train_acc = 100. * total_correct / len(train_loader.dataset)
    
    print('Epoch: {} \tTraining Loss: {:.6f} \tTraining Acc: {:.2f}%%'.format(
        epoch+1, 
        train_loss,
        train_acc
        ))

---
## Train the Network (30 marks)

Train your models in the following two cells.

The following loop trains for 30 epochs; feel free to change this number. For now, we suggest somewhere between 20-50 epochs. As you train, take a look at how the values for the training loss decrease over time. We want it to decrease while also avoiding overfitting the training data. 

**The key parts in the training process are left for you to implement.**

### Train model_1

In [8]:
# number of epochs to train the model

Epoch: 1 	Training Loss: 0.035669 	Training Acc: 82.47%%
Epoch: 2 	Training Loss: 0.016511 	Training Acc: 90.72%%
Epoch: 3 	Training Loss: 0.014309 	Training Acc: 91.94%%
Epoch: 4 	Training Loss: 0.012896 	Training Acc: 92.75%%
Epoch: 5 	Training Loss: 0.011753 	Training Acc: 93.40%%
Epoch: 6 	Training Loss: 0.010799 	Training Acc: 93.94%%
Epoch: 7 	Training Loss: 0.009991 	Training Acc: 94.33%%
Epoch: 8 	Training Loss: 0.009291 	Training Acc: 94.70%%
Epoch: 9 	Training Loss: 0.008683 	Training Acc: 95.08%%
Epoch: 10 	Training Loss: 0.008147 	Training Acc: 95.42%%
Epoch: 11 	Training Loss: 0.007669 	Training Acc: 95.67%%
Epoch: 12 	Training Loss: 0.007240 	Training Acc: 95.92%%
Epoch: 13 	Training Loss: 0.006847 	Training Acc: 96.15%%
Epoch: 14 	Training Loss: 0.006491 	Training Acc: 96.35%%
Epoch: 15 	Training Loss: 0.006167 	Training Acc: 96.52%%
Epoch: 16 	Training Loss: 0.005870 	Training Acc: 96.68%%
Epoch: 17 	Training Loss: 0.005599 	Training Acc: 96.87%%
Epoch: 18 	Training Los

### Train model_2

In [None]:
# number of epochs to train the model
n_epochs = 30  # suggest training between 20-50 epochs

model_2.train() # prep model for training

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    total_correct = 0
    
    for data, target in train_loader:
        
        y_pred_2 = model_2(data)
        loss_2 = loss_function_2(y_pred_2,target)
        optimizer_2.zero_grad()
        loss_2.backward()
        optimizer_2.step()
        
        train_loss += loss_2.item()
        _,pred_2 = torch.max(y_pred_2.data,1)
        total_correct += torch.sum(pred_2 == target.data).item()
        
    # print training statistics 
    # calculate average loss and accuracy over an epoch
    train_loss = train_loss / len(train_loader.dataset)
    train_acc = 100. * total_correct / len(train_loader.dataset)
    
    print('Epoch: {} \tTraining Loss: {:.6f} \tTraining Acc: {:.2f}%%'.format(
        epoch+1, 
        train_loss,
        train_acc
        ))

Epoch: 1 	Training Loss: 0.045496 	Training Acc: 74.35%%
Epoch: 2 	Training Loss: 0.024550 	Training Acc: 85.99%%
Epoch: 3 	Training Loss: 0.020899 	Training Acc: 88.01%%
Epoch: 4 	Training Loss: 0.018911 	Training Acc: 89.09%%
Epoch: 5 	Training Loss: 0.017607 	Training Acc: 89.76%%
Epoch: 6 	Training Loss: 0.016677 	Training Acc: 90.45%%
Epoch: 7 	Training Loss: 0.016066 	Training Acc: 90.67%%
Epoch: 8 	Training Loss: 0.015407 	Training Acc: 91.06%%
Epoch: 9 	Training Loss: 0.014900 	Training Acc: 91.25%%
Epoch: 10 	Training Loss: 0.014397 	Training Acc: 91.58%%
Epoch: 11 	Training Loss: 0.014067 	Training Acc: 91.78%%
Epoch: 12 	Training Loss: 0.013784 	Training Acc: 91.98%%
Epoch: 13 	Training Loss: 0.013410 	Training Acc: 92.12%%
Epoch: 14 	Training Loss: 0.013272 	Training Acc: 92.25%%
Epoch: 15 	Training Loss: 0.012963 	Training Acc: 92.34%%
Epoch: 16 	Training Loss: 0.012738 	Training Acc: 92.55%%
Epoch: 17 	Training Loss: 0.012484 	Training Acc: 92.55%%
Epoch: 18 	Training Los

---
## Test the Trained Network (20 marks)

Test the performance of trained models on test data. Except the total test accuracy, you should calculate the accuracy for each class.

### Test model_1

In [53]:
# initialize lists to monitor test loss and accuracy
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model_1.eval() # prep model for *evaluation*

for data, target in test_loader:
    
    # implement your code here
    y_pred = model_1(data)
    loss = loss_function(y_pred,target)

    test_loss += loss.item()
    _,pred = torch.max(y_pred.data,1)
    for label in range(10):
        # the list of number of correctly classified samples of each class of this batch. label is the index.
        for i in range(20):
            if target[i] == label and target[i] == pred[i]:
                class_correct[label] += 1
            if target[i] == label:
                class_total[label] += 1
#         print(label,class_correct[label])
        # the list of total number of samples of each class of this batch. label is the index.
#         class_total[label] += torch.sum(torch.eq(target.data, label)).item()
#         print(label,class_total[label])

# calculate and print avg test loss
test_loss = test_loss / len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of class %d: %.2f%%' % (i, 100 * class_correct[i] / class_total[i]))
    else:
        print('Test Accuracy of class %d: N/A (no training examples)' % (i))

print('\nTest Accuracy (Overall): %.2f%%' % (100. * np.sum(class_correct) / np.sum(class_total)))



Test Loss: 0.012647

Test Accuracy of class 0: 98.06%
Test Accuracy of class 1: 97.53%
Test Accuracy of class 2: 90.12%
Test Accuracy of class 3: 91.68%
Test Accuracy of class 4: 93.89%
Test Accuracy of class 5: 88.00%
Test Accuracy of class 6: 94.99%
Test Accuracy of class 7: 92.80%
Test Accuracy of class 8: 89.53%
Test Accuracy of class 9: 91.77%

Test Accuracy (Overall): 92.93%


### Test model_2

In [None]:
# initialize lists to monitor test loss and accuracy
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model_2.eval() # prep model for *evaluation*

for data, target in test_loader:
    
    # implement your code here
    y_pred_2 = model_2(data)
    loss_2 = loss_function(y_pred_2,target)

    
    test_loss += loss_2.item()
    _,pred_2 = torch.max(y_pred_2.data,1)
    for label in range(10):
        # the list of number of correctly classified samples of each class of this batch. label is the index.
        for i in range(20):
            if target[i] == label and target[i] == pred_2[i]:
                class_correct[label] += 1
            if target[i] == label:
                class_total[label] += 1
# calculate and print avg test loss
test_loss = test_loss / len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of class %d: %.2f%%' % (i, 100 * class_correct[i] / class_total[i]))
    else:
        print('Test Accuracy of class %d: N/A (no training examples)' % (i))

print('\nTest Accuracy (Overall): %.2f%%' % (100. * np.sum(class_correct) / np.sum(class_total)))


## Analyze Your Result (20 marks)
Compare the performance of your models with the following analysis. Both English and Chinese answers are acceptable.
1. Does your vanilla MLP overfit to the training data? (5 marks)

Answer:

2. If yes, how do you observe it? If no, why? (5 marks)

Answer:

3. Is regularized model help prevent overfitting? (5 marks)

Answer:

4. Generally compare the performance of two models. (5 marks)

Answer:
