# Assignment 1: Multi-Layer Perceptron with MNIST Dataset

In this assignment, you are required to train two MLPs to classify images from the [MNIST database](http://yann.lecun.com/exdb/mnist/) of handwritten digits using PyTorch.

The process will be broken down into the following steps:
1. Load and visualize the data.
2. Define a neural network. (30 marks)
3. Train the models. (30 marks)
4. Evaluate the performance of our trained models on the test dataset. (20 marks)
5. Analyze your results. (20 marks)

In [1]:
import torch
from torch import nn
import numpy as np
import logging
import sys

# set log
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s: %(message)s',
                     datefmt='%Y-%m-%d %H:%M:%S',)

Get the version information of your packages.

In [2]:
logging.info('The version information:')
logging.info(f'Python: {sys.version}')
logging.info(f'PyTorch: {torch.__version__}')
assert torch.cuda.is_available(), 'Please finish setting up your GPU development environment'

2023-10-11 12:42:00 INFO: The version information:
2023-10-11 12:42:00 INFO: Python: 3.9.18 (main, Sep 11 2023, 14:09:26) [MSC v.1916 64 bit (AMD64)]
2023-10-11 12:42:00 INFO: PyTorch: 1.12.0


---
## Load and Visualize the Data

Downloading may take a few moments, and you should see your progress as the data is loading. You may also choose to change the `batch_size` if you want to load more data at a time.

This cell will create DataLoaders for each of our datasets.
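To make the role of `batch_size` concrete, here is a minimal sketch that uses synthetic tensors instead of MNIST (the random images and the name `toy_loader` are assumptions for illustration, not part of the assignment). It shows that a `DataLoader` yields `(images, labels)` batches and that the last batch can be smaller than `batch_size`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 50 grayscale 28x28 "images" with labels 0-9.
images = torch.rand(50, 1, 28, 28)
labels = torch.randint(0, 10, (50,))
toy_loader = DataLoader(TensorDataset(images, labels), batch_size=20, shuffle=True)

# Each iteration yields one (images, labels) batch; the last batch may be smaller.
batch_sizes = [batch_images.shape[0] for batch_images, _ in toy_loader]
first_images, first_labels = next(iter(toy_loader))

print(batch_sizes)          # number of samples in each batch
print(first_images.shape)   # (batch, channels, height, width)
```

A larger `batch_size` means fewer, larger batches per epoch; the real loaders in this notebook follow the same pattern.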

In [3]:
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data.dataset import Dataset

# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20

# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# choose the training and test datasets
train_data = datasets.MNIST(root='data', train=True,
                                   download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                                  download=True, transform=transform)

# helper functions for building a class-balanced training subset
def classify_label(dataset, num_classes):
    list_index = [[] for _ in range(num_classes)]
    for idx, datum in enumerate(dataset):
        list_index[datum[1]].append(idx)
    return list_index

def partition_train(list_label2indices: list, num_per_class: int):
    random_state = np.random.RandomState(0)
    list_label2indices_train = []
    for indices in list_label2indices:
        random_state.shuffle(indices)
        list_label2indices_train.extend(indices[:num_per_class])
    return list_label2indices_train

class Indices2Dataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset
        self.indices = None

    def load(self, indices: list):
        self.indices = indices

    def __getitem__(self, idx):
        idx = self.indices[idx]
        image, label = self.dataset[idx]
        return image, label

    def __len__(self):
        return len(self.indices)

# group training-set indices by label
list_label2indices = classify_label(dataset=train_data, num_classes=10)

# keep 500 randomly chosen samples per class for training
list_train = partition_train(list_label2indices, 500)

# prepare data loaders
indices2data = Indices2Dataset(train_data)
indices2data.load(list_train)
train_loader = torch.utils.data.DataLoader(indices2data, batch_size=batch_size, num_workers=num_workers, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers, shuffle=True)

### Visualize a Batch of Training Data

The first step in a classification task is to take a look at the data, make sure it is loaded in correctly, then make any initial observations about patterns in that data.

In [4]:
import matplotlib.pyplot as plt
%matplotlib inline  

# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20//2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    # print out the correct label for each image
    # .item() gets the value contained in a Tensor
    ax.set_title(str(labels[idx].item()))

### View an Image in More Detail

In [None]:
img = np.squeeze(images[1])

fig = plt.figure(figsize = (12,12)) 
ax = fig.add_subplot(111)
ax.imshow(img, cmap='gray')
width, height = img.shape
thresh = img.max()/2.5
for x in range(width):
    for y in range(height):
        val = round(img[x][y],2) if img[x][y] !=0 else 0
        ax.annotate(str(val), xy=(y,x),
                    horizontalalignment='center',
                    verticalalignment='center',
                    color='white' if img[x][y]<thresh else 'black')

---
## Set random seed
A random seed is used to make results reproducible. In other words, anyone who re-runs your code with the same seed value (and the same software and hardware setup) should get the same outputs. Reproducibility is an extremely important concept in data science and other fields. Further reading: [How to Use Random Seeds Effectively](https://towardsdatascience.com/how-to-use-random-seeds-effectively-54a4cd855a79)
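As a small illustration of the idea, the sketch below seeds NumPy and PyTorch on the CPU and checks that the same seed reproduces the same random numbers. The helper name `draw` is hypothetical. Note that on a GPU some operations can remain nondeterministic even with fixed seeds, so this is a sketch of the principle rather than a guarantee.

```python
import numpy as np
import torch

def draw(seed):
    """Seed NumPy and PyTorch, then draw a few random numbers from each."""
    np.random.seed(seed)
    torch.manual_seed(seed)
    return np.random.rand(3), torch.rand(3)

np_a, torch_a = draw(2023)
np_b, torch_b = draw(2023)   # same seed as above
np_c, torch_c = draw(7)      # different seed

same_seed_matches = np.array_equal(np_a, np_b) and torch.equal(torch_a, torch_b)
different_seed_matches = np.array_equal(np_a, np_c)
print(same_seed_matches, different_seed_matches)
```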

In [None]:
import random
import os

# choose any seed value you like, e.g. 2023
seed_value = 2023

np.random.seed(seed_value)
random.seed(seed_value)
os.environ['PYTHONHASHSEED'] = str(seed_value)

torch.manual_seed(seed_value)     
torch.cuda.manual_seed(seed_value)     
torch.cuda.manual_seed_all(seed_value)   
torch.backends.cudnn.deterministic = True
logging.info(f"the value of the random seed: {seed_value}")

2023-10-10 23:14:38 INFO: the value of the random seed: 2023


---
## Define the Network Architecture (30 marks)

* Input: a 784-dim Tensor of pixel values for each image.
* Output: a 10-dim Tensor of class scores for an input image, one score per digit class.

You need to implement the following:
1. a vanilla multi-layer perceptron. (10 marks)
2. a multi-layer perceptron with regularization (dropout or L2 or both). (10 marks)
3. the corresponding loss functions and optimizers. (10 marks)
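The input and output shapes above, together with one common loss choice, can be sketched as follows. Using `nn.CrossEntropyLoss` is an assumption here (the assignment does not name a loss), and `toy_layer` is a hypothetical single-layer stand-in for a real MLP. The sketch shows that this loss takes raw class scores and integer labels, and that it matches log-softmax followed by negative log-likelihood.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch = torch.rand(4, 1, 28, 28)          # 4 fake images
labels = torch.tensor([3, 0, 7, 1])       # integer class labels, not one-hot

flat = batch.view(-1, 28 * 28)            # shape (4, 784)
toy_layer = nn.Linear(28 * 28, 10)        # hypothetical stand-in for an MLP
logits = toy_layer(flat)                  # shape (4, 10), raw class scores

# CrossEntropyLoss applies log-softmax internally, so it expects raw scores.
ce = nn.CrossEntropyLoss()(logits, labels)
manual = nn.functional.nll_loss(torch.log_softmax(logits, dim=1), labels)
print(logits.shape, ce.item(), manual.item())
```

Because the softmax is folded into the loss, a model trained this way would typically return raw scores from `forward` and apply softmax only when probabilities are needed, for example for AUC.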

### Build model_1

In [None]:
## Define the MLP architecture
class VanillaMLP(nn.Module):
    def __init__(self):
        super(VanillaMLP, self).__init__()

        # implement your codes here
        # 784 input pixels -> 100 hidden units -> 10 class scores
        self.linear1 = nn.Linear(784, 100)
        self.linear2 = nn.Linear(100, 10)

    def forward(self, x):
        # flatten image input; view() reshapes the tensor and -1 lets PyTorch infer the batch size
        x = x.view(-1, 28 * 28)

        # implement your codes here
        h_relu = self.linear1(x).clamp(min=0)  # clamp(min=0) is equivalent to ReLU
        x = self.linear2(h_relu)  # raw class scores (logits), no softmax here

        return x

# initialize the MLP
model_1 = VanillaMLP()

# specify loss function
# implement your codes here
# cross-entropy for the 10-class task; it applies log-softmax to the logits internally
loss_model_1 = nn.CrossEntropyLoss()

# specify your optimizer
# implement your codes here
optimizer_model_1 = torch.optim.SGD(model_1.parameters(), lr=1e-4)

### Build model_2

In [None]:
## Define the MLP architecture
class RegularizedMLP(nn.Module):
    def __init__(self):
        super(RegularizedMLP, self).__init__()

        # implement your codes here
        # hidden layer with dropout as the regularizer
        self.layer1 = nn.Sequential(
            nn.Linear(784, 100),
            nn.ReLU(),
            nn.Dropout(0.5))
        # output layer: 100 hidden units -> 10 raw class scores (logits)
        self.layer2 = nn.Sequential(
            nn.Linear(100, 10))

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)

        # implement your codes here
        x = self.layer1(x)
        x = self.layer2(x)

        return x


# initialize the MLP
model_2 = RegularizedMLP()

# specify loss function
# implement your codes here
# cross-entropy for the 10-class task; it applies log-softmax to the logits internally
loss_model_2 = nn.CrossEntropyLoss()

# specify your optimizer
# implement your codes here
optimizer_model_2 = torch.optim.SGD(model_2.parameters(), lr=1e-4)

---
## Train the Network (30 marks)

Train your models in the following two cells.

The following loop trains for 20 epochs; feel free to change this number. For now, we suggest somewhere between 20 and 50 epochs. As you train, watch how the training loss changes over time. We want it to decrease while also avoiding overfitting the training data.
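The basic shape of such a loop can be sketched on a tiny synthetic problem. Everything here (the random data, `toy_model`, the learning rate of 0.1) is an assumption chosen for illustration, not the assignment's setup. Each iteration clears old gradients, computes the loss, backpropagates, and updates the weights, and the recorded loss goes down.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Synthetic, linearly separable 2-class data (illustration only).
x = torch.randn(200, 5)
y = (x[:, 0] > 0).long()

toy_model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

losses = []
for epoch in range(30):
    optimizer.zero_grad()                 # clear gradients from the previous step
    loss = criterion(toy_model(x), y)     # forward pass and loss
    loss.backward()                       # backpropagate
    optimizer.step()                      # update weights
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

A falling training loss alone does not rule out overfitting; that only shows up when training metrics are compared with metrics on held-out data.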

We will introduce some metrics for classification tasks, and you will learn how to implement these metrics with scikit-learn.

Here is a reference to learn from: [evaluation_metrics_spring2020](https://cs229.stanford.edu/section/evaluation_metrics_spring2020.pdf).

During training, we will use accuracy, area under the ROC curve (AUC), and top-k accuracy.
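To show what each of these three metrics consumes, here is a minimal sketch on a hand-made 3-class example (the labels and scores are invented for illustration). Accuracy compares predicted class indices with labels, top-k accuracy takes the full score matrix, and AUC is computed here from one-hot labels and per-class scores.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, top_k_accuracy_score

y_true = np.array([0, 1, 2, 2])                 # integer labels
y_score = np.array([[0.7, 0.2, 0.1],            # per-class scores, one row per sample
                    [0.2, 0.6, 0.2],
                    [0.1, 0.3, 0.6],
                    [0.5, 0.1, 0.4]])
y_pred = y_score.argmax(axis=1)                 # predicted class = highest score
one_hot = np.eye(3, dtype=int)[y_true]          # shape (n_samples, n_classes)

acc = accuracy_score(y_true, y_pred)            # needs class indices
top2 = top_k_accuracy_score(y_true, y_score, k=2)  # needs the score matrix
auc = roc_auc_score(one_hot, y_score)           # needs scores, not argmax
print(acc, top2, auc)
```

The last sample is misclassified by argmax but its true class is still among the two highest scores, which is why top-2 accuracy is higher than plain accuracy in this example.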

**The key parts in the training process are left for you to implement.**

### Train model_1

In [None]:
# import scikit-learn packages
# please use the functions imported from scikit-learn to measure the training progress of the model
from sklearn.metrics import accuracy_score,roc_auc_score, top_k_accuracy_score
# number of epochs to train the model
n_epochs = 20  # suggest training between 20-50 epochs

model_1.train() # prep model for training

train_loss_list = []
train_acc_list = []
train_auc_list = []
train_top_k_acc_list = []


# GPU check
logging.info(f'GPU is available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    gpu_num = torch.cuda.device_count()
    logging.info(f"Train model on {gpu_num} GPUs:")
    for i in range(gpu_num):
        print('\t GPU {}.: {}'.format(i,torch.cuda.get_device_name(i)))
    model_1 = model_1.cuda()

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    pred_array = None
    label_array =  None
    
    one_hot_label_matrix = None
    pred_matrix = None
    

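    for data, label in train_loader:
        data = data.cuda()
        label = label.cuda()
        # implement your code here
        optimizer_model_1.zero_grad()      # clear gradients from the previous step
        pred = model_1(data)               # forward pass: raw class scores (logits)
        loss = loss_model_1(pred, label)   # mean loss over the batch
        loss.backward()
        optimizer_model_1.step()
        train_loss += loss.detach() * data.size(0)  # accumulate the summed loss

        # finish the computation of the metric variables
        # implement your codes here
        # class probabilities and one-hot ground-truth labels for this batch
        prob = torch.softmax(pred.detach(), dim=1).cpu().numpy()
        one_hot = nn.functional.one_hot(label, num_classes=10).cpu().numpy()
        if pred_matrix is None:
            pred_matrix = prob
        else:
            pred_matrix = np.concatenate((pred_matrix, prob), axis=0)  # stack batches row-wise

        if one_hot_label_matrix is None:
            one_hot_label_matrix = one_hot
        else:
            one_hot_label_matrix = np.concatenate((one_hot_label_matrix, one_hot), axis=0)

        pred = torch.argmax(pred, axis=1)  # index of the highest score = predicted class
        if pred_array is None:
            pred_array = pred.cpu().numpy()
        else:
            pred_array = np.concatenate((pred_array, pred.cpu().numpy()), axis=0)

        if label_array is None:
            label_array = label.cpu().numpy()
        else:
            label_array = np.concatenate((label_array, label.cpu().numpy()), axis=0)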

        


        
        
    # print training statistics 
    # read the API document at https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics to finish your code
    # don't craft your own code
    # calculate average loss and accuracy over an epoch
    
    top_k = 3
    train_loss = train_loss / len(train_loader.dataset)
    train_acc = 100. * accuracy_score(label_array, pred_array)  # share of predictions that match the labels
    train_auc = roc_auc_score(one_hot_label_matrix, pred_matrix , multi_class='ovo')
    # one_hot_label_matrix has shape (n_samples, n_classes)
    top_k_acc = 100. * top_k_accuracy_score(label_array, pred_matrix , k=top_k,)  # in percent, like train_acc
    # append the value of the metric to the list
    train_loss_list.append(train_loss.cpu().detach().numpy())
    train_acc_list.append(train_acc)
    train_auc_list.append(train_auc)
    train_top_k_acc_list.append(top_k_acc)
    
    logging.info('Epoch: {} \tTraining Loss: {:.6f} \tTraining Acc: {:.2f}% \t top {} Acc: {:.2f}% \t AUC Score: {:.4f}'.format(
        epoch+1, 
        train_loss,
        train_acc,
        top_k,
        top_k_acc,
        train_auc,
        ))

#### Visualize the training process of the model_1
Please read the [document](https://matplotlib.org/stable/tutorials/introductory/pyplot.html#sphx-glr-tutorials-introductory-pyplot-py) to finish the training process visualization.
For more information, please refer to the [document](https://matplotlib.org/stable/tutorials/index.html)
##### Plot the change of the loss of model_1 during training

In [None]:
epochs_list = list(range(1,n_epochs+1))
plt.figure(figsize=(20, 8))
plt.plot(epochs_list, train_loss_list)
plt.title('Model_1 loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper right')
plt.show()

##### Plot the change of the accuracy of model_1 during training

In [None]:
plt.figure(figsize=(20, 8))
plt.plot(epochs_list, train_acc_list)
plt.title('Model_1 accuracy')
plt.ylabel('accuracy')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper right')
plt.show()

##### Plot the change of the AUC Score of model_1 during training


In [None]:
plt.figure(figsize=(20, 8))
plt.plot(epochs_list, train_auc_list)
plt.title('Model_1 auc')
plt.ylabel('Auc')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper right')
plt.show()

### Train model_2

In [None]:
# import scikit-learn packages
# please use the function imported from scikit-learn to metric the process of training of the model
from sklearn.metrics import accuracy_score,roc_auc_score, top_k_accuracy_score
# number of epochs to train the model
n_epochs = 20  # suggest training between 20-50 epochs

model_1.train() # prep model for training

train_loss_list = []
train_acc_list = []
train_auc_list = []
train_top_k_acc_list = []


# GPU check
logging.info(f'GPU is available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    gpu_num = torch.cuda.device_count()
    logging.info(f"Train model on {gpu_num} GPUs:")
    for i in range(gpu_num):
        print('\t GPU {}.: {}'.format(i,torch.cuda.get_device_name(i)))
    model_2 = model_2.cuda()

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    pred_array = None
    label_array =  None
    
    one_hot_label_matrix = None
    pred_matrix = None
    

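    for data, label in train_loader:
        data = data.cuda()
        label = label.cuda()
        # implement your code here
        optimizer_model_2.zero_grad()      # clear gradients from the previous step
        pred = model_2(data)               # forward pass: raw class scores (logits)
        loss = loss_model_2(pred, label)   # mean loss over the batch
        loss.backward()
        optimizer_model_2.step()
        train_loss += loss.detach() * data.size(0)  # accumulate the summed loss

        # finish the computation of the metric variables
        # implement your codes here
        # class probabilities and one-hot ground-truth labels for this batch
        prob = torch.softmax(pred.detach(), dim=1).cpu().numpy()
        one_hot = nn.functional.one_hot(label, num_classes=10).cpu().numpy()
        if pred_matrix is None:
            pred_matrix = prob
        else:
            pred_matrix = np.concatenate((pred_matrix, prob), axis=0)

        if one_hot_label_matrix is None:
            one_hot_label_matrix = one_hot
        else:
            one_hot_label_matrix = np.concatenate((one_hot_label_matrix, one_hot), axis=0)

        pred = torch.argmax(pred, axis=1)  # index of the highest score = predicted class
        if pred_array is None:
            pred_array = pred.cpu().numpy()
        else:
            pred_array = np.concatenate((pred_array, pred.cpu().numpy()), axis=0)

        if label_array is None:
            label_array = label.cpu().numpy()
        else:
            label_array = np.concatenate((label_array, label.cpu().numpy()), axis=0)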

        


        
        
    # print training statistics 
    # read the API document at https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics to finish your code
    # don't craft your own code
    # calculate average loss and accuracy over an epoch
    
    top_k = 3
    train_loss = train_loss / len(train_loader.dataset)
    train_acc = 100. * accuracy_score(label_array, pred_array)
    train_auc = roc_auc_score(one_hot_label_matrix, pred_matrix , multi_class='ovo')
    top_k_acc = 100. * top_k_accuracy_score(label_array, pred_matrix , k=top_k,)  # in percent, like train_acc
    # append the value of the metric to the list
    train_loss_list.append(train_loss.cpu().detach().numpy())
    train_acc_list.append(train_acc)
    train_auc_list.append(train_auc)
    train_top_k_acc_list.append(top_k_acc)
    
    logging.info('Epoch: {} \tTraining Loss: {:.6f} \tTraining Acc: {:.2f}% \t top {} Acc: {:.2f}% \t AUC Score: {:.4f}'.format(
        epoch+1, 
        train_loss,
        train_acc,
        top_k,
        top_k_acc,
        train_auc,
        ))

#### Visualize the training process of the model_2
Please read the [document](https://matplotlib.org/stable/tutorials/introductory/pyplot.html#sphx-glr-tutorials-introductory-pyplot-py) to finish the training process visualization.
For more information, please refer to the [document](https://matplotlib.org/stable/tutorials/index.html)
##### Plot the change of the loss of model_2 during training

In [None]:
epochs_list = list(range(1,n_epochs+1))
plt.figure(figsize=(20, 8))
plt.plot(epochs_list, train_loss_list)
plt.title('Model_2 loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper right')
plt.show()

##### Plot the change of the accuracy of model_2 during training

In [None]:
plt.figure(figsize=(20, 8))
plt.plot(epochs_list, train_acc_list)
plt.title('Model_2 accuracy')
plt.ylabel('accuracy')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper right')
plt.show()

##### Plot the change of the AUC Score of model_2 during training


In [None]:
plt.figure(figsize=(20, 8))
plt.plot(epochs_list, train_auc_list)
plt.title('Model_2 auc')
plt.ylabel('Auc')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper right')
plt.show()

---
## Test the Trained Network (20 marks)

Test the performance of the trained models on the test data. In addition to the overall test accuracy, you should calculate the accuracy for each class.

For testing, we will use accuracy, top-k accuracy, precision, recall, F1-score, and the confusion matrix.

We will also visualize the confusion matrix.

Finally, we will compare your own implementation of the accuracy function with the scikit-learn implementation.
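One possible way to get the accuracy for each class is to read it off the confusion matrix, as in the sketch below. The toy labels are invented for illustration, and this is only one of several valid approaches. With rows as true classes, the diagonal divided by the row sums gives the share of each class that was classified correctly (which is also that class's recall).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# cm[i, j] counts samples of true class i that were predicted as class j.
cm = confusion_matrix(y_true, y_pred)

per_class_acc = cm.diagonal() / cm.sum(axis=1)   # correct / total, per true class
overall_acc = cm.diagonal().sum() / cm.sum()     # correct / total, all classes
print(cm)
print(per_class_acc, overall_acc)
```

The per-class values also appear in the `recall` column of `classification_report`, which can serve as a cross-check.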

In [None]:
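## define your own function to compute accuracy
def accuracy_score_manual(label_array, pred_array):
    # implement your codes here
    # fraction of samples whose predicted class equals the true label
    return float(np.mean(np.asarray(label_array) == np.asarray(pred_array)))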

### Test model_1

In [None]:
from sklearn.metrics import classification_report,ConfusionMatrixDisplay
# initialize lists to monitor test loss and accuracy
test_loss = 0.0

pred_array = None
label_array =  None

one_hot_label_matrix = None
pred_matrix = None

model_1.eval() # prep model for *evaluation*

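for data, label in test_loader:
    data = data.cuda()
    label = label.cuda()
    # implement your code here
    with torch.no_grad():  # no gradients are needed for evaluation
        pred = model_1(data)  # raw class scores (logits)
        loss = loss_model_1(pred, label)
    test_loss = test_loss + loss.item() * data.size(0)  # accumulate the summed loss

    # class probabilities and one-hot ground-truth labels for this batch
    prob = torch.softmax(pred, dim=1).cpu().numpy()
    one_hot = nn.functional.one_hot(label, num_classes=10).cpu().numpy()
    if pred_matrix is None:
        pred_matrix = prob
    else:
        pred_matrix = np.concatenate((pred_matrix, prob), axis=0)

    if one_hot_label_matrix is None:
        one_hot_label_matrix = one_hot
    else:
        one_hot_label_matrix = np.concatenate((one_hot_label_matrix, one_hot), axis=0)
    pred = torch.argmax(pred, axis=1)

    if pred_array is None:
        pred_array = pred.cpu().numpy()
    else:
        pred_array = np.concatenate((pred_array, pred.cpu().numpy()), axis=0)

    if label_array is None:
        label_array = label.cpu().numpy()
    else:
        label_array = np.concatenate((label_array, label.cpu().numpy()), axis=0)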
# calculate and print avg test loss
test_loss = test_loss / len(test_loader.dataset)
test_acc = accuracy_score(label_array, pred_array)
test_auc = roc_auc_score(one_hot_label_matrix, pred_matrix , multi_class='ovo')
test_top_k3_acc = top_k_accuracy_score(label_array, pred_matrix , k=3)
test_top_k5_acc = top_k_accuracy_score(label_array, pred_matrix , k=5)

logging.info('Test Loss: {:.6f}'.format(test_loss))
logging.info('Test Accuracy: {:.6f}'.format(test_acc))
logging.info('Test top 3 Accuracy: {:.6f}'.format(test_top_k3_acc ))
logging.info('Test top 5 Accuracy: {:.6f}'.format(test_top_k5_acc ))
logging.info('The classification report of test for model_1')
print(classification_report(label_array, pred_array))

In [None]:
ConfusionMatrixDisplay.from_predictions(label_array,pred_array)
plt.show()

In [None]:
## compare your implementation of function to compute accuracy with the implementation of scikit-learn.
your_test_acc = accuracy_score_manual(label_array, pred_array)
assert abs(your_test_acc - test_acc) < 1e-5 , 'Please check your implementation of function to compute accuracy'

### Test model_2

In [None]:
from sklearn.metrics import classification_report,ConfusionMatrixDisplay
# initialize lists to monitor test loss and accuracy
test_loss = 0.0

pred_array = None
label_array =  None

one_hot_label_matrix = None
pred_matrix = None

model_2.eval() # prep model for *evaluation*

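for data, label in test_loader:
    data = data.cuda()
    label = label.cuda()
    # implement your code here
    with torch.no_grad():  # no gradients are needed for evaluation
        pred = model_2(data)  # raw class scores (logits)
        loss = loss_model_2(pred, label)
    test_loss = test_loss + loss.item() * data.size(0)  # accumulate the summed loss

    # class probabilities and one-hot ground-truth labels for this batch
    prob = torch.softmax(pred, dim=1).cpu().numpy()
    one_hot = nn.functional.one_hot(label, num_classes=10).cpu().numpy()
    if pred_matrix is None:
        pred_matrix = prob
    else:
        pred_matrix = np.concatenate((pred_matrix, prob), axis=0)

    if one_hot_label_matrix is None:
        one_hot_label_matrix = one_hot
    else:
        one_hot_label_matrix = np.concatenate((one_hot_label_matrix, one_hot), axis=0)
    pred = torch.argmax(pred, axis=1)

    if pred_array is None:
        pred_array = pred.cpu().numpy()
    else:
        pred_array = np.concatenate((pred_array, pred.cpu().numpy()), axis=0)

    if label_array is None:
        label_array = label.cpu().numpy()
    else:
        label_array = np.concatenate((label_array, label.cpu().numpy()), axis=0)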
# calculate and print avg test loss
test_loss = test_loss / len(test_loader.dataset)
test_acc = accuracy_score(label_array, pred_array)
test_auc = roc_auc_score(one_hot_label_matrix, pred_matrix , multi_class='ovo') #one_hot_label_matrix ytrue
test_top_k3_acc = top_k_accuracy_score(label_array, pred_matrix , k=3)
test_top_k5_acc = top_k_accuracy_score(label_array, pred_matrix , k=5)

logging.info('Test Loss: {:.6f}'.format(test_loss))
logging.info('Test Accuracy: {:.6f}'.format(test_acc))
logging.info('Test top 3 Accuracy: {:.6f}'.format(test_top_k3_acc ))
logging.info('Test top 5 Accuracy: {:.6f}'.format(test_top_k5_acc ))
logging.info('The classification report of test for model_2')
print(classification_report(label_array, pred_array))

In [None]:
ConfusionMatrixDisplay.from_predictions(label_array,pred_array)
plt.show()

In [None]:
## compare your implementation of function to compute accuracy with the implementation of scikit-learn.
your_test_acc = accuracy_score_manual(label_array, pred_array)
assert abs(your_test_acc - test_acc) < 1e-5 , 'Please check your implementation of function to compute accuracy'

---
## Analyze Your Result (20 marks)
Analyze and compare the performance of your models by answering the following questions. Both English and Chinese answers are acceptable.
1. Does your vanilla MLP overfit to the training data? (5 marks)

Answer:

2. If yes, how do you observe it? If no, why not? (5 marks)

Answer:

3. Does the regularized model help prevent overfitting? (5 marks)

Answer:

4. Compare the overall performance of the two models. (5 marks)

Answer:
