# Assignment 1: Multi-Layer Perceptron with MNIST Dataset

In this assignment, you will use PyTorch to train two MLPs that classify images from the [MNIST](http://yann.lecun.com/exdb/mnist/) handwritten digit database.

The process will be broken down into the following steps:
1. Load and visualize the data.
2. Define a neural network. (30 marks)
3. Train the models. (30 marks)
4. Evaluate the performance of our trained models on the test dataset. (20 marks)
5. Analyze your results. (20 marks)

In [1]:
import torch
import numpy as np

---
## Load and Visualize the Data

Downloading may take a few moments, and you should see your progress as the data is loading. You may also choose to change the `batch_size` if you want to load more data at a time.

This cell will create DataLoaders for each of our datasets.

In [2]:
from torchvision import datasets
import torchvision.transforms as transforms

# number of subprocesses to use for data loading
num_workers = 8
# how many samples per batch to load
batch_size = 20

# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# choose the training and test datasets
train_data = datasets.MNIST(root='data', train=True,
                                   download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                                  download=True, transform=transform)

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)
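
To see what a `DataLoader` yields without waiting for the download, here is a minimal sketch that uses random tensors as a stand-in for MNIST. The names `fake_data` and `loader` are hypothetical, and `num_workers=0` is an assumption made only to keep the example in the main process.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 100 random 1x28x28 "images" with labels 0-9.
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
fake_data = TensorDataset(images, labels)

# Same batch_size as the cell above; num_workers=0 loads in the main process.
loader = DataLoader(fake_data, batch_size=20, shuffle=True, num_workers=0)

# Each iteration yields one (images, labels) batch.
batch_images, batch_labels = next(iter(loader))
print(batch_images.shape)  # torch.Size([20, 1, 28, 28])
print(batch_labels.shape)  # torch.Size([20])
print(len(loader))         # 5 batches = 100 samples / 20 per batch
```

Changing `batch_size` changes the first dimension of each batch and the number of batches per epoch, not the data itself.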

---
## Define the Network Architecture (30 marks)

* Input: a 784-dim Tensor of pixel values for each image.
* Output: a 10-dim Tensor of class scores, one score per digit class, for an input image.
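
As a rough illustration of these shapes, the sketch below flattens a hypothetical batch of random images and maps it to 10 scores with a single linear layer. The single layer is only an assumption for the shape check, not the architecture you are asked to build.

```python
import torch
import torch.nn as nn

# Hypothetical batch of 4 grayscale 28x28 images (random values, not real MNIST).
x = torch.rand(4, 1, 28, 28)

# Flatten each image into a 784-dim vector, matching the input description.
flat = x.view(-1, 28 * 28)

# One linear layer is enough to illustrate the 784 -> 10 mapping.
layer = nn.Linear(784, 10)
scores = layer(flat)

print(flat.shape)    # torch.Size([4, 784])
print(scores.shape)  # torch.Size([4, 10])

# The predicted class is the index of the highest score for each image.
predicted = scores.argmax(dim=1)
```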

You need to implement the following:
1. a vanilla multi-layer perceptron. (10 marks)
2. a multi-layer perceptron with regularization (dropout or L2 or both). (10 marks)
3. the corresponding loss functions and optimizers. (10 marks)
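
One possible way to combine dropout and L2 regularization is sketched below. The class name `DropoutMLP`, the variable names `model_2` and `optimizer_2`, the hidden size, the dropout rate, and the `weight_decay` value are all assumptions chosen for illustration, not values prescribed by the assignment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropoutMLP(nn.Module):  # hypothetical name
    def __init__(self, hidden=128, p_drop=0.2):  # assumed hyperparameters
        super().__init__()
        self.fc1 = nn.Linear(784, hidden)
        self.fc2 = nn.Linear(hidden, 10)
        # Dropout randomly zeroes activations in train() mode only.
        self.dropout = nn.Dropout(p=p_drop)

    def forward(self, x):
        x = x.view(-1, 28 * 28)                 # flatten image input
        x = self.dropout(F.relu(self.fc1(x)))
        return self.fc2(x)                      # raw class scores (logits)

model_2 = DropoutMLP()
# CrossEntropyLoss applies log-softmax internally, so the model returns logits.
criterion = nn.CrossEntropyLoss()
# weight_decay adds an L2 penalty on all parameters passed to the optimizer.
optimizer_2 = torch.optim.SGD(model_2.parameters(), lr=1e-2, weight_decay=1e-4)
```

Remember to call `model_2.eval()` before evaluation so that dropout is disabled, and `model_2.train()` before resuming training.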

### Build model_1

In [3]:
import torch.nn as nn
import torch.nn.functional as F
import time
import torch
## Define the MLP architecture
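# "Vanilla" MLP: fully connected layers with ReLU activations and no regularization
# (this implementation uses four hidden layers)
# use the first GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")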
class VanillaMLP(nn.Module):
    def __init__(self):
        super(VanillaMLP, self).__init__()
        # implement your codes here
        self.linear1 = torch.nn.Linear(784,200)
        self.linear2 = torch.nn.Linear(200,100)
        self.linear3 = torch.nn.Linear(100,50)
        self.linear4 = torch.nn.Linear(50,20)
        self.linear5 = torch.nn.Linear(20,10)

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)
        h_relu1 = self.linear1(x).clamp(min=0)
        h_relu2 = self.linear2(h_relu1).clamp(min=0)
        h_relu3 = self.linear3(h_relu2).clamp(min=0)
        h_relu4 = self.linear4(h_relu3).clamp(min=0)
        y_pred = self.linear5(h_relu4)
        # implement your codes here   
        return F.log_softmax(y_pred,dim=1)

# initialize the MLP

model_1 = VanillaMLP()
model_1.to(device)
# specify loss function
# implement your codes here
def loss_function(y_pred,y):
    loss = F.nll_loss(y_pred,y)
    return loss

# specify your optimizer
# implement your codes here
# stochastic gradient descent (SGD) optimizer
optimizer = torch.optim.SGD(model_1.parameters(),lr=1e-2)

n_epochs = 5  # suggest training between 20-50 epochs
# model_1 = model_1.cuda()
model_1.train() # prep model for training

start = time.perf_counter()  # time.clock() was removed in Python 3.8

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    total_correct = 0
    
    for data, target in train_loader:
        # move the batch to the same device as the model
        data = data.to(device)
        target = target.to(device)
        y_pred = model_1(data)
        loss = loss_function(y_pred,target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # implement your code here
        # accumulate this batch's loss (nll_loss returns the batch mean by default)
        train_loss += loss.item()
        # the accumulated number of correctly classified samples of this batch
        _,pred = torch.max(y_pred.data,1)
        total_correct += torch.sum(pred == target.data).item()
        
    # print training statistics 
    # calculate average loss and accuracy over an epoch
    train_loss = train_loss / len(train_loader.dataset)
    train_acc = 100. * total_correct / len(train_loader.dataset)
    
    print('Epoch: {} \tTraining Loss: {:.6f} \tTraining Acc: {:.2f}%%'.format(
        epoch+1, 
        train_loss,
        train_acc
        ))
end = time.perf_counter()
print(end-start)  # elapsed time in seconds

Epoch: 1 	Training Loss: 0.098675 	Training Acc: 28.02%%
Epoch: 2 	Training Loss: 0.029445 	Training Acc: 81.98%%
Epoch: 3 	Training Loss: 0.014186 	Training Acc: 92.00%%
Epoch: 4 	Training Loss: 0.009195 	Training Acc: 94.78%%
Epoch: 5 	Training Loss: 0.006911 	Training Acc: 96.12%%
102.5456319
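
A note on reading the "Training Loss" column: the loop above adds one batch-mean loss per batch and then divides by the number of samples, so with `batch_size = 20` the printed value is the per-sample mean loss divided by 20 (for example, 0.098675 × 20 ≈ 1.97 in epoch 1). The sketch below checks this relationship on random data; the dataset size and variable names are hypothetical.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch_size = 20
n_samples = 100  # hypothetical small dataset, divisible by batch_size

logits = torch.randn(n_samples, 10)
targets = torch.randint(0, 10, (n_samples,))
log_probs = F.log_softmax(logits, dim=1)

sum_of_batch_means = 0.0    # what the training loop above accumulates
sum_of_sample_losses = 0.0  # alternative: weight each batch mean by its size
for i in range(0, n_samples, batch_size):
    # nll_loss uses reduction='mean' by default, so this is a batch mean.
    loss = F.nll_loss(log_probs[i:i + batch_size], targets[i:i + batch_size])
    sum_of_batch_means += loss.item()
    sum_of_sample_losses += loss.item() * batch_size

reported = sum_of_batch_means / n_samples      # value printed by the loop
per_sample = sum_of_sample_losses / n_samples  # true mean loss per sample
print(reported * batch_size, per_sample)       # the two values agree
```

Multiplying `loss.item()` by `data.size(0)` inside the loop is one way to report a true per-sample average; the accuracy figures are unaffected either way.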


---
## Train the Network (30 marks)

Train your models in the following two cells.

The loop below trains for `n_epochs` epochs (set to 5 in the run shown here); feel free to change this number. For now, we suggest somewhere between 20 and 50 epochs. As you train, watch how the training loss decreases over time. We want it to decrease while also avoiding overfitting the training data.
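
Training loss alone cannot reveal overfitting, so one common approach is to hold out part of the training set and track the loss on it after each epoch. The sketch below shows the idea on random stand-in data; `val_loader`, `mean_loss`, and the 80/20 split are hypothetical choices, not part of the assignment template.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)
# Hypothetical stand-in for the MNIST training set.
full_train = TensorDataset(torch.rand(100, 1, 28, 28),
                           torch.randint(0, 10, (100,)))

# Hold out 20% of the samples for validation (assumed ratio).
n_val = 20
train_subset, val_subset = random_split(full_train,
                                        [len(full_train) - n_val, n_val])
train_loader_small = DataLoader(train_subset, batch_size=20, shuffle=True)
val_loader = DataLoader(val_subset, batch_size=20)

def mean_loss(model, loader, loss_fn):
    """Average per-sample loss of `model` over `loader`, without gradients."""
    model.eval()  # disables dropout; call model.train() before training again
    total, count = 0.0, 0
    with torch.no_grad():
        for data, target in loader:
            total += loss_fn(model(data), target).item() * data.size(0)
            count += data.size(0)
    return total / count
```

If the validation loss starts rising while the training loss keeps falling, the model is likely overfitting, and fewer epochs or stronger regularization may help.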

**The key parts in the training process are left for you to implement.**

### Train model_1

In [4]:
import torch.nn as nn
import torch.nn.functional as F
import time
import torch
## Define the MLP architecture
# "Vanilla" MLP: fully connected layers with ReLU activations and no regularization
# (this implementation uses four hidden layers); this run uses the CPU for comparison
device = torch.device("cpu") 

class VanillaMLP(nn.Module):
    def __init__(self):
        super(VanillaMLP, self).__init__()
        # implement your codes here
        self.linear1 = torch.nn.Linear(784,200)
        self.linear2 = torch.nn.Linear(200,100)
        self.linear3 = torch.nn.Linear(100,50)
        self.linear4 = torch.nn.Linear(50,20)
        self.linear5 = torch.nn.Linear(20,10)

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)
        h_relu1 = self.linear1(x).clamp(min=0)
        h_relu2 = self.linear2(h_relu1).clamp(min=0)
        h_relu3 = self.linear3(h_relu2).clamp(min=0)
        h_relu4 = self.linear4(h_relu3).clamp(min=0)
        y_pred = self.linear5(h_relu4)
        # implement your codes here   
        return F.log_softmax(y_pred,dim=1)

# initialize the MLP

model_1 = VanillaMLP()
model_1.to(device)
# specify loss function
# implement your codes here
def loss_function(y_pred,y):
    loss = F.nll_loss(y_pred,y)
    return loss

# specify your optimizer
# implement your codes here
# stochastic gradient descent (SGD) optimizer
optimizer = torch.optim.SGD(model_1.parameters(),lr=1e-2)

n_epochs = 5  # suggest training between 20-50 epochs
# model_1 = model_1.cuda()
model_1.train() # prep model for training

start = time.perf_counter()  # time.clock() was removed in Python 3.8

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    total_correct = 0
    
    for data, target in train_loader:
        # move the batch to the same device as the model
        data = data.to(device)
        target = target.to(device)
        y_pred = model_1(data)
        loss = loss_function(y_pred,target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # implement your code here
        # accumulate this batch's loss (nll_loss returns the batch mean by default)
        train_loss += loss.item()
        # the accumulated number of correctly classified samples of this batch
        _,pred = torch.max(y_pred.data,1)
        total_correct += torch.sum(pred == target.data).item()
        
    # print training statistics 
    # calculate average loss and accuracy over an epoch
    train_loss = train_loss / len(train_loader.dataset)
    train_acc = 100. * total_correct / len(train_loader.dataset)
    
    print('Epoch: {} \tTraining Loss: {:.6f} \tTraining Acc: {:.2f}%%'.format(
        epoch+1, 
        train_loss,
        train_acc
        ))
end = time.perf_counter()
print(end-start)  # elapsed time in seconds

Epoch: 1 	Training Loss: 0.095409 	Training Acc: 34.72%%
Epoch: 2 	Training Loss: 0.025924 	Training Acc: 84.44%%
Epoch: 3 	Training Loss: 0.015137 	Training Acc: 91.43%%
Epoch: 4 	Training Loss: 0.009951 	Training Acc: 94.26%%
Epoch: 5 	Training Loss: 0.007405 	Training Acc: 95.74%%
40.38718510000001
