<a href="https://colab.research.google.com/github/SJZHZ/Multi-modal-Learning/blob/main/assignment_1_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 1
## [Section 2]
This assignment will familiarize you with:
1. loading and preprocessing data using built-in functions
2. constructing a simple CNN model
3. the training and testing pipeline


In this assignment, you might find some tutorials useful, such as https://pytorch.org/tutorials/beginner/basics/intro.html.

In [1]:
# Import dependencies.
import random
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

In [2]:
# Set up your device 
cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if cuda else "cpu")

In [3]:
# Set up random seed to 1008. Do not change the random seed.
# Yes, these are all necessary when you run experiments!
seed = 1008
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if cuda:
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

## 1. Data: MNIST [2 pts]
#### Load the MNIST training and test datasets using $\texttt{torch.utils.data.DataLoader}$ and $\texttt{torchvision.datasets}$. 

This dataset consists of images of handwritten digits, so the number of classes is 10. Each image in the MNIST dataset has shape (28, 28, 1) (height, width, channel); `transforms.ToTensor()` converts it to the channel-first shape (1, 28, 28).

The normalization parameters we will use are (mean, std) = (0.1307, 0.3081).

For more details, please refer to http://yann.lecun.com/exdb/mnist/.
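
The `transforms.ToTensor()` plus `transforms.Normalize` pipeline used below can be mimicked in NumPy to see what happens to a single image. This is a minimal sketch, assuming an 8-bit grayscale image stored height × width × channel as described above; the helper name `to_normalized_chw` is hypothetical and not part of torchvision.

```python
import numpy as np

MEAN, STD = 0.1307, 0.3081  # normalization parameters quoted above


def to_normalized_chw(img_hwc: np.ndarray, mean: float = MEAN, std: float = STD) -> np.ndarray:
    """Hypothetical helper mimicking ToTensor followed by Normalize for a uint8 H x W x C image."""
    # ToTensor: scale uint8 values from [0, 255] to [0.0, 1.0] and move channels first (C x H x W).
    x = img_hwc.astype(np.float32) / 255.0
    x = np.transpose(x, (2, 0, 1))
    # Normalize: subtract the mean and divide by the standard deviation, per channel.
    return (x - mean) / std


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(28, 28, 1), dtype=np.uint8)  # random stand-in for one image
out = to_normalized_chw(img)
print(out.shape)  # (1, 28, 28)
```

Note that (0.1307, 0.3081) are the MNIST statistics; since the cells below actually load FashionMNIST, statistics computed from FashionMNIST's own training images would center that data more precisely.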

### 1.1. Load Training Set [1 pt]

In [4]:
# Load the MNIST training set with batch size 128, apply data shuffling and normalization
# train_loader = TODO
channel_mean = 0.1307                   # MNIST statistics given in the assignment
channel_std = 0.3081
train_data = datasets.FashionMNIST(     # the MNIST download source was unreachable, so FashionMNIST is used instead
    root="data",
    train=True,                         # training split
    download=True,                      # reuses a locally cached copy if present, so later runs skip the download
    transform=transforms.Compose(       # Compose chains the transforms into one object and applies them in order
        [transforms.ToTensor(), transforms.Normalize(mean=(channel_mean,), std=(channel_std,))])
)
train_loader = DataLoader(
    dataset=train_data,
    batch_size=128,
    shuffle=True
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw

### 1.2. Load Test Set [1 pt]

In [5]:
# Load the MNIST test set with batch size 128, apply normalization
# test_loader = TODO
test_data = datasets.FashionMNIST(
    root="data",
    train=False,                        # test split
    #download=True,                     # not needed: the test files were downloaded together with the training set above
    transform=transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize(mean=(channel_mean,), std=(channel_std,))])
)
test_loader = DataLoader(
    dataset=test_data,
    batch_size=128,
    shuffle=False                       # evaluation does not need shuffling
)

for X, y in test_loader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([128, 1, 28, 28])
Shape of y: torch.Size([128]) torch.int64
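
With `batch_size=128` and the default `drop_last=False`, a `DataLoader` yields ⌈N/128⌉ batches, and the final batch holds whatever is left over. A quick sketch of that arithmetic, assuming the standard 60,000/10,000 train/test split shared by MNIST and FashionMNIST; `loader_batches` is a hypothetical helper:

```python
import math


def loader_batches(num_samples: int, batch_size: int) -> tuple[int, int]:
    """Hypothetical helper: batch count of a DataLoader with drop_last=False, and the last batch's size."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch


print(loader_batches(60000, 128))  # (469, 96)
print(loader_batches(10000, 128))  # (79, 16)
```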


## 2. Models [3 pts]
#### You are going to define two convolutional neural networks that will be trained to classify MNIST digits

### 2.1. CNN without Batch Norm [2 pts]

In [6]:
# Fill in the values below that make this network valid for MNIST data
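# Hint: to determine these values, compute the shape of x after each line of forward().
conv1_in_ch = 1             # TODO
# grayscale images have a single channel
conv2_in_ch = 20            # TODO
# conv layers: a layer's input channels equal the previous layer's output channels
fc1_in_features = 50*4*4    # TODO
# the fully connected layer takes a flattened feature vector: every position of every conv channel becomes one input feature
fc2_in_features = 500       # TODO
# fully connected layers: a layer's input features equal the previous layer's output features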
n_classes = 10              # TODO
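
The value `50*4*4` follows from the output-size formula shared by `nn.Conv2d` and `F.max_pool2d`: ⌊(size + 2·padding − kernel) / stride⌋ + 1. The sketch below traces a 28 × 28 input through the layers of the network; it assumes the default dilation of 1, and `conv_out` is a hypothetical helper name.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Hypothetical helper: spatial output size of Conv2d / max_pool2d with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


h = 28                                 # input images are 28 x 28
h = conv_out(h, kernel=5)              # conv1 (5x5, stride 1): 24
h = conv_out(h, kernel=2, stride=2)    # max_pool2d (2x2, stride 2): 12
h = conv_out(h, kernel=5)              # conv2 (5x5, stride 1): 8
h = conv_out(h, kernel=2, stride=2)    # max_pool2d (2x2, stride 2): 4
fc1_in = 50 * h * h                    # conv2 has 50 output channels, flattened
print(h, fc1_in)  # 4 800
```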

In [7]:
class NetWithoutBatchNorm(nn.Module):
    def __init__(self):
        super(NetWithoutBatchNorm, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=conv1_in_ch, out_channels=20, kernel_size=5, stride=1)
        self.conv2 = nn.Conv2d(in_channels=conv2_in_ch, out_channels=50, kernel_size=5, stride=1)
        self.fc1 = nn.Linear(in_features=fc1_in_features, out_features=500)
        self.fc2 = nn.Linear(in_features=fc2_in_features, out_features=n_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = x.view(-1, fc1_in_features) # reshaping
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        
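        # Return the log_softmax of x.
        #print(F.softmax(x, dim=1)[0])      # normalizes each row (one sample across the classes): correct
        #print(F.softmax(x, dim=0)[0])      # normalizes each column (one class across the batch): wrong
        return F.log_softmax(x, dim=1)      # TODO
        # converts the outputs into log-probabilities over the 10 classes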
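
The commented-out `dim=1` versus `dim=0` check can be reproduced outside PyTorch. This NumPy sketch mirrors what `F.log_softmax(x, dim=...)` computes (the function and the toy logits are illustrative assumptions): only normalizing along the class axis makes each sample's probabilities sum to 1.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along `axis`, mirroring F.log_softmax(x, dim=axis)."""
    shifted = x - x.max(axis=axis, keepdims=True)  # subtracting the max avoids overflow in exp
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 1.0]])  # 2 samples x 3 classes
row_wise = log_softmax(logits, axis=1)  # like dim=1: normalize over classes
col_wise = log_softmax(logits, axis=0)  # like dim=0: normalize over the batch

print(np.exp(row_wise).sum(axis=1))  # each sample's class probabilities sum to 1
print(np.exp(col_wise).sum(axis=1))  # generally not 1: dim=0 mixes different samples
```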

### 2.2. CNN with Batch Norm [1 pt]

In [8]:
# Fill in the values below that make this network valid for MNIST data
# Hint: to determine these values, compute the shape of x after each line of forward().
conv1_bn_size = 20          # TODO
# 20 channels; BatchNorm2d normalizes each channel separately
conv2_bn_size = 50          # TODO
# 50 channels; BatchNorm2d normalizes each channel separately
fc1_bn_size = 500           # TODO
# 500 features; BatchNorm1d normalizes each feature separately
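
"Normalizes each channel separately" means that, in training mode, `BatchNorm2d` computes one mean and one variance per channel over the batch and both spatial axes. A NumPy sketch of that computation, leaving out the learnable scale and shift (γ = 1, β = 0); the function name and the random activations are hypothetical:

```python
import numpy as np


def batch_norm_2d(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Training-mode batch norm for x of shape (N, C, H, W), without learnable gamma/beta.

    Statistics are computed per channel over the N, H and W axes.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)  # biased variance, as used for normalization in training
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 20, 24, 24))  # stand-in for conv1 outputs: 20 channels of 24x24
y = batch_norm_2d(x)
print(y.mean(axis=(0, 2, 3))[:3])  # about 0 for every channel
print(y.std(axis=(0, 2, 3))[:3])   # about 1 for every channel
```

For `BatchNorm1d` on the (N, 500) output of `fc1`, the same computation runs over axis 0 only, giving one mean and variance per feature.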

In [9]:
# Define the CNN with architecture explained in Part 2.2
class NetWithBatchNorm(nn.Module):
    def __init__(self):
        super(NetWithBatchNorm, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=conv1_in_ch, out_channels=20, kernel_size=5, stride=1)
        self.conv1_bn = nn.BatchNorm2d(conv1_bn_size)
        self.conv2 = nn.Conv2d(in_channels=conv2_in_ch, out_channels=50, kernel_size=5, stride=1)
        self.conv2_bn = nn.BatchNorm2d(conv2_bn_size)
        self.fc1 = nn.Linear(in_features=fc1_in_features, out_features=500)
        self.fc1_bn = nn.BatchNorm1d(fc1_bn_size)
        self.fc2 = nn.Linear(in_features=fc2_in_features, out_features=n_classes)

    def forward(self, x):
        x = F.relu(self.conv1_bn(self.conv1(x)))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = F.relu(self.conv2_bn(self.conv2(x)))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = x.view(-1, fc1_in_features)
        x = F.relu(self.fc1_bn(self.fc1(x)))
        x = self.fc2(x)

        # Return the log_softmax of x.
        return F.log_softmax(x, dim=1)      # TODO
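
As a sanity check on the two architectures, their trainable parameter counts can be worked out by hand. This sketch counts what `sum(p.numel() for p in model.parameters())` should report; it assumes that each BatchNorm layer contributes one scale (γ) and one shift (β) per channel or feature. The running statistics are buffers, not parameters, so they are not counted. The helper names are hypothetical.

```python
def conv2d_params(in_ch: int, out_ch: int, k: int) -> int:
    """Weights of a k x k Conv2d plus one bias per output channel."""
    return in_ch * out_ch * k * k + out_ch


def linear_params(in_f: int, out_f: int) -> int:
    """Weight matrix plus bias vector of a Linear layer."""
    return in_f * out_f + out_f


base = (conv2d_params(1, 20, 5)      # conv1: 520
        + conv2d_params(20, 50, 5)   # conv2: 25,050
        + linear_params(800, 500)    # fc1: 400,500
        + linear_params(500, 10))    # fc2: 5,010
bn_extra = 2 * (20 + 50 + 500)       # gamma and beta for conv1_bn, conv2_bn, fc1_bn: 1,140
print(base, base + bn_extra)  # 431080 432220
```

Batch norm therefore adds only about 0.3% more parameters; nearly all of the weights sit in `fc1`.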


## 3. Training & Evaluation [4 pts]

### 3.1. Define training method [1 pt]

In [10]:
def train(model, device, train_loader, optimizer, epoch, log_interval=100):
    # Set model to training mode
    model.train()
    # Loop over mini-batches
    for batch_idx, (data, target) in enumerate(train_loader):
    
        # Send data and target to device
        x = data.to(device)
        y = target.to(device)       # integer class labels
        # TODO
        
        # Zero out the gradients held by the optimizer
        optimizer.zero_grad()
        # TODO
        
        # Pass data through model
        y_pred = model(x)           # one 10-dim vector of log-probabilities per sample
        # TODO
        
        # Compute the negative log likelihood loss
        loss = F.nll_loss(y_pred, y)
        # nll_loss takes a batch of prediction vectors and a batch of labels and returns a scalar:
        # for each row (one sample), it picks the component at the label's index and negates it;
        # over the batch, it returns the mean of these values
        # TODO
        
        # Backpropagate loss
        loss.backward()
        # TODO
        
        # Make a step with the optimizer
        optimizer.step()
        # TODO
    
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
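
The comments on `F.nll_loss` above describe a simple computation. This plain-Python sketch reproduces it with the default `reduction='mean'` and no class weights; the helper name and the toy log-probabilities are hypothetical.

```python
import math


def nll_loss(log_probs: list[list[float]], targets: list[int]) -> float:
    """Mean negative log-likelihood, mirroring F.nll_loss(log_probs, targets) with reduction='mean'."""
    picked = [-row[t] for row, t in zip(log_probs, targets)]  # negated log-prob of each sample's true class
    return sum(picked) / len(picked)


# Two samples, three classes; each row holds log-probabilities (e.g. the output of log_softmax).
log_probs = [[math.log(0.7), math.log(0.2), math.log(0.1)],
             [math.log(0.1), math.log(0.6), math.log(0.3)]]
targets = [0, 1]
print(nll_loss(log_probs, targets))  # (-log 0.7 - log 0.6) / 2, about 0.434
```

Because the models return `log_softmax` outputs, `F.nll_loss` on them gives the same value as `F.cross_entropy` applied directly to the raw logits.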

### 3.2. Define test method [1 pt]

In [11]:
# Define test method
def test(model, device, test_loader):
    # Set model to evaluation mode
    model.eval()
    # Variable for the total loss 
    test_loss = 0
    # Counter for the correct predictions
    num_correct = 0
    
    # don't need autograd for eval
    with torch.no_grad():
        # Loop over mini-batches
        for data, target in test_loader:

            # Send data and target to device
            x = data.to(device)
            y = target.to(device)
            # TODO
            
            # Pass data through model
            y_pred = model(x)
            # TODO
            
            # Compute the negative log likelihood loss with reduction='sum' and add to total test_loss
            test_loss += F.nll_loss(y_pred, y, reduction='sum').item()
            # TODO
            
            # Get predictions from the model for each data point
            pred = y_pred.argmax(dim=1)         # argmax over each row (the class dimension)
            # TODO
            
            # Add number of correct predictions to total num_correct
            num_correct += (pred == y).sum().item()
            # TODO
    
    # Compute the average test_loss
    avg_test_loss = test_loss / len(test_loader.dataset)    # TODO
    
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        avg_test_loss, num_correct, len(test_loader.dataset),
        100. * num_correct / len(test_loader.dataset)))
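
Summing the loss with `reduction='sum'` and dividing once by `len(test_loader.dataset)` matters because the last test batch holds only 16 images (10,000 = 78 × 128 + 16). The sketch below uses hypothetical per-sample losses, chosen so the short final batch stands out, to compare that approach with averaging the per-batch means:

```python
# Hypothetical per-sample losses: 9,984 samples with loss 1.0, then 16 with loss 3.0,
# so the final, short batch of 16 samples is the odd one out.
losses = [1.0] * 9984 + [3.0] * 16
batches = [losses[i:i + 128] for i in range(0, len(losses), 128)]

# What test() does: sum each batch (reduction='sum'), then divide by the dataset size.
sum_then_divide = sum(sum(b) for b in batches) / len(losses)
# Averaging per-batch means instead gives the 16-sample batch the same weight as a full one.
mean_of_means = sum(sum(b) / len(b) for b in batches) / len(batches)

print(sum_then_divide)  # 1.0032, the true per-sample mean
print(mean_of_means)    # about 1.0253, pulled toward the last batch
```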

### 3.3 Train NetWithoutBatchNorm() [1 pt]

In [12]:
# Define the model and send it to the device
model = NetWithoutBatchNorm().to(device)                                    # TODO

# Optimizer: SGD with learning rate of 1e-2 and momentum of 0.5
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.5)      # TODO

# Training loop with 10 epochs
for epoch in range(1, 10 + 1):

    # Train model
    train(model, device, train_loader, optimizer, epoch)
    # TODO

    # Test model
    test(model, device, test_loader)
    # TODO


Test set: Average loss: 0.6528, Accuracy: 7491/10000 (75%)


Test set: Average loss: 0.4984, Accuracy: 8250/10000 (82%)


Test set: Average loss: 0.4518, Accuracy: 8408/10000 (84%)


Test set: Average loss: 0.4128, Accuracy: 8535/10000 (85%)


Test set: Average loss: 0.4104, Accuracy: 8514/10000 (85%)


Test set: Average loss: 0.3783, Accuracy: 8636/10000 (86%)


Test set: Average loss: 0.3585, Accuracy: 8712/10000 (87%)


Test set: Average loss: 0.3440, Accuracy: 8738/10000 (87%)


Test set: Average loss: 0.3444, Accuracy: 8755/10000 (88%)


Test set: Average loss: 0.3738, Accuracy: 8586/10000 (86%)
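
The optimizer `torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.5)` keeps a velocity buffer for every parameter. With the defaults `dampening=0` and `nesterov=False`, the PyTorch documentation describes the update as b_t = μ·b_{t−1} + g_t (with b_1 = g_1), followed by θ_t = θ_{t−1} − lr·b_t. The sketch below applies that rule to a single scalar on a toy quadratic. The objective and the helper `sgd_momentum` are illustrative assumptions, not part of the assignment.

```python
def sgd_momentum(grad_fn, theta: float, lr: float = 1e-2, momentum: float = 0.5, steps: int = 100) -> float:
    """Hypothetical helper: plain-Python SGD with momentum (dampening=0, nesterov=False)."""
    buf = None
    for _ in range(steps):
        g = grad_fn(theta)
        buf = g if buf is None else momentum * buf + g  # velocity: decaying sum of past gradients
        theta = theta - lr * buf
    return theta


# Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta = sgd_momentum(lambda t: 2.0 * (t - 3.0), theta=0.0, steps=500)
print(round(theta, 4))  # 3.0
```

With momentum 0.5, a constant gradient eventually produces steps about 1 / (1 − 0.5) = 2 times larger than plain SGD with the same learning rate.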



### 3.4 Train NetWithBatchNorm() [1 pt]

In [13]:
# Define the model and send it to the device
model = NetWithBatchNorm().to(device)                                       # TODO

# Optimizer: SGD with learning rate of 1e-2 and momentum of 0.5
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.5)      # TODO

# Training loop with 10 epochs
for epoch in range(1, 10 + 1):
    
    # Train model
    train(model, device, train_loader, optimizer, epoch)
    # TODO
    
    # Test model
    test(model, device, test_loader)
    # TODO


Test set: Average loss: 0.3746, Accuracy: 8655/10000 (87%)


Test set: Average loss: 0.3288, Accuracy: 8803/10000 (88%)


Test set: Average loss: 0.2964, Accuracy: 8942/10000 (89%)


Test set: Average loss: 0.3191, Accuracy: 8839/10000 (88%)


Test set: Average loss: 0.2936, Accuracy: 8907/10000 (89%)


Test set: Average loss: 0.3118, Accuracy: 8834/10000 (88%)


Test set: Average loss: 0.2659, Accuracy: 9060/10000 (91%)


Test set: Average loss: 0.2620, Accuracy: 9091/10000 (91%)


Test set: Average loss: 0.2850, Accuracy: 8993/10000 (90%)


Test set: Average loss: 0.2583, Accuracy: 9101/10000 (91%)
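
The `model.train()` and `model.eval()` calls in `train()` and `test()` matter for `NetWithBatchNorm`. In training mode, each BatchNorm layer normalizes with the current batch's statistics and updates running estimates. In evaluation mode, it normalizes with those running estimates instead, so a test prediction does not depend on which other images share its batch. This NumPy sketch shows the running-estimate update. It assumes PyTorch's default BatchNorm `momentum=0.1` (unrelated to the SGD momentum) and PyTorch's use of the unbiased batch variance for `running_var`. The function name and the synthetic activations are hypothetical.

```python
import numpy as np


def update_running_stats(running_mean, running_var, batch, momentum=0.1):
    """Hypothetical helper: one BatchNorm1d running-statistics update for a batch of shape (N, features)."""
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0, ddof=1)  # unbiased (N - 1) variance for the running estimate
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var
    return running_mean, running_var


rng = np.random.default_rng(0)
mean, var = np.zeros(500), np.ones(500)  # initial running_mean / running_var
for _ in range(200):  # synthetic fc1 activations with mean 2 and variance 9
    mean, var = update_running_stats(mean, var, rng.normal(2.0, 3.0, size=(128, 500)))
print(mean.mean(), var.mean())  # close to 2 and 9
```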



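## 4. Empirically, which of the models achieves higher accuracy faster? [1 pt]

Answer: NetWithBatchNorm. In the logs above, it reaches 87% test accuracy after the first epoch (vs. 75% without batch norm) and ends at 91% after 10 epochs (vs. 86%).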

In [14]:
NetWithBatchNorm

__main__.NetWithBatchNorm
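
The comparison can be made concrete from the recorded logs. This sketch copies the per-epoch correct counts from the two training runs above and reports the first epoch at which each model reaches 85% test accuracy, along with its best accuracy. The 85% threshold and the helper name are arbitrary illustrative choices, and the figures come from this single seeded run.

```python
# Test-set correct counts (out of 10,000) per epoch, copied from the two training logs above.
without_bn = [7491, 8250, 8408, 8535, 8514, 8636, 8712, 8738, 8755, 8586]
with_bn = [8655, 8803, 8942, 8839, 8907, 8834, 9060, 9091, 8993, 9101]


def first_epoch_reaching(correct_counts: list[int], threshold: float, total: int = 10000):
    """Hypothetical helper: first 1-based epoch whose accuracy is at least `threshold`, or None."""
    for epoch, correct in enumerate(correct_counts, start=1):
        if correct / total >= threshold:
            return epoch
    return None


for name, counts in [("NetWithoutBatchNorm", without_bn), ("NetWithBatchNorm", with_bn)]:
    print(name, first_epoch_reaching(counts, 0.85), max(counts) / 10000)
# NetWithoutBatchNorm 4 0.8755
# NetWithBatchNorm 1 0.9101
```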