# DenseNet with MNIST Dataset

`Author: YUAN Yanzhe`

- This notebook is a reproduction of the [DenseNet paper](https://openaccess.thecvf.com/content_cvpr_2017/html/Huang_Densely_Connected_Convolutional_CVPR_2017_paper.html).
  - To tune hyperparameters, expose them as arguments of the model's constructor.
    - e.g. `def __init__(self, param)`
- The code runs on Google Colab in GPU mode.

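Some details:
- The biggest difference between DenseNet and ResNet: in DenseNet the output of the previous module is not added to the output of the current module as in ResNet (y=y+identity(x)); instead the two are concatenated along the feature_num dimension (cat on dim=1). This way a module's output is passed directly to the layers that follow the current module.
- DenseNet's main building blocks are the dense block and the transition layer. The former defines how inputs and outputs are concatenated; the latter controls the number of channels so that it does not grow too large.
  - cnn_block uses the improved ResNet "BN-activation-convolution" ordering. It changes the number of features but not the image size. A dense block is made of several such cnn_blocks.
  - A dense block takes the number of cnn_blocks, the number of input features, and the growth rate (the output feature_num of each cnn_block). Note that the last conv inside every cnn_block always outputs the same number of features (the growth rate). That output is concatenated with the block's input along the feature_num dimension, and the result is both the output of this cnn_block and the input of the next one, which is how the dense connection is made. So each cnn_block outputs input features + growth rate, and the whole dense block outputs input features + number of cnn_blocks \* growth rate.
  - The transition block reduces the number of features. It is mainly a 1×1 convolution followed by average pooling: the former reduces the number of features, the latter halves the image size.
- DenseNet network structure:
  - densenet_layer
    - First, a structure similar to the first layer of ResNet
    - Then 4 dense blocks, each followed by a transition_block except the last one
    - A global pooling layer (BN-ReLU-Pool), which acts as the last transition block
  - flatten: removes the last 2 dimensions.
  - Fully connected layer on the feature dimension: nn.Linear(~,10). Here feature_num is 248; it could of course be larger.
- Unlike the earlier model frameworks, a verification step is added here: one input is fed through the network and the output shapes are printed, which helps with debugging. For this to work, the design of the model matters more; getting the order right inside forward is not enough.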
``` python 
X = torch.rand((1, 1, 96, 96))
for layer in net.children():
    X = layer(X)
    print(' output shape:\t', X.shape)
```
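  - Correspondingly, because the model is a class that inherits from nn.Module and the check uses net.children() or net.named_children(), the loop walks over the first level of nn.Module children of net (think of the model as a tree) and runs the net level by level.
    - So the flatten layer has to be implemented as a module; otherwise the check cannot run.
    - Also, do not assign the small structures nested inside a larger one to self; assign only the outermost structure (e.g. the Sequential) to self, because net.children() yields every attribute that is an nn.Module, in order. If, as in the earlier GoogLeNet main network's \_\_init\_\_, the net holds several self.blocks that are then chained into a self.block_layer defined alongside fc, flatten and other layers, the self must be dropped from the self.blocks; otherwise the check breaks, because the chain is interrupted.
      - That said, forward still runs normally in that case, so it makes no difference when training or testing on a dataset. It is simply worth keeping in mind when designing the network, so that the check works and debugging stays easy.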
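
The registration pitfall described in the last bullets can be reproduced with a small sketch. The classes `DoubleRegistered` and `SingleRegistered` and the helper `run_children` are hypothetical names made up for this illustration, and `nn.Flatten` stands in for the notebook's own `flattenLayer`; this is not the notebook's network, only a toy model with the same kind of structure.

```python
import torch
from torch import nn


class DoubleRegistered(nn.Module):
    """Toy net that keeps an inner block on self AND inside the outer Sequential."""
    def __init__(self):
        super().__init__()
        self.block = nn.Conv2d(1, 4, kernel_size=3, padding=1)  # inner block registered on self
        self.layers = nn.Sequential(self.block, nn.ReLU())      # outer container reuses it
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(4 * 8 * 8, 10)

    def forward(self, x):
        return self.fc(self.flatten(self.layers(x)))


class SingleRegistered(nn.Module):
    """Same toy net, but only the outermost structures are assigned to self."""
    def __init__(self):
        super().__init__()
        block = nn.Conv2d(1, 4, kernel_size=3, padding=1)       # local variable, not on self
        self.layers = nn.Sequential(block, nn.ReLU())
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(4 * 8 * 8, 10)

    def forward(self, x):
        return self.fc(self.flatten(self.layers(x)))


def run_children(net, x):
    """Feed x through the first-level children in registration order."""
    for layer in net.children():
        x = layer(x)
    return x


x = torch.rand(1, 1, 8, 8)
print([name for name, _ in DoubleRegistered().named_children()])
print([name for name, _ in SingleRegistered().named_children()])
print(DoubleRegistered()(x).shape)                 # forward() itself still works
print(run_children(SingleRegistered(), x).shape)   # the children() loop works here
try:
    run_children(DoubleRegistered(), x)            # the conv runs twice: once alone, once inside layers
except RuntimeError as err:
    print('children() loop failed:', type(err).__name__)
```

In the first class the convolution appears twice among the first-level children, so the loop applies it twice and the second call receives 4 channels instead of 1.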
     

In [2]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [2]:
import os
os.chdir('/content/drive/MyDrive/Colab Notebooks/d2dl_pytorch')

In [3]:
# Import Packages
import torch
from torch import nn as nn
from torch import optim as optim
from torch.utils import data as Data

import torchvision
from torchvision import datasets
from torchvision import transforms

import numpy as np
import pandas as pd 
import time

import d2lzh_pytorch as d2dl

print(torch.__version__)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('device on:', device)

1.7.0+cu101
device on: cuda


In [33]:
# Hyperparameters
batch_size = 256
num_epochs = 5
learning_rate = 0.001

num_classes = 10

# network hyperparameter
num_channels, growth_rate = 64, 32  # num_channels is the current number of channels
num_convs_in_dense_blocks = [4, 4, 4, 4]

# Load Data
# Parameters without a default (batch_size) must come before those with one;
# otherwise Python raises "non-default argument follows default argument"
def load_data_from_mnist(batch_size, resize=None, root=''):
    trans = []
    if resize:
        trans.append(transforms.Resize(resize))
    trans.append(transforms.ToTensor())
    transform = transforms.Compose(trans)

    train_data = torchvision.datasets.MNIST(root=root,train=True,transform=transform,download=False)
    test_data = torchvision.datasets.MNIST(root=root,train=False,transform=transform,download=False)
    train_iterator = Data.DataLoader(train_data,batch_size=batch_size,shuffle=True,num_workers=4)
    test_iterator = Data.DataLoader(test_data,batch_size=batch_size,shuffle=False,num_workers=4)  # no need to shuffle the test set

    return train_iterator, test_iterator

def load_data_fashion_mnist(batch_size, resize=None, root=''):
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())

    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)

    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=4)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=4)

    return train_iter, test_iter

#train_iterator, test_iterator = load_data_fashion_mnist(batch_size,resize=96)
train_iterator, test_iterator = load_data_from_mnist(batch_size,resize=96)

# Define Model
class flattenLayer(nn.Module):
    # the same result can be obtained with x.view(x.shape[0],-1) inside forward();
    # it is defined as a module so that the layer-by-layer shape check can see it
    def __init__(self):
        super(flattenLayer,self).__init__()
    def forward(self,x):
        return x.view(x.shape[0],-1)

class globalAvgPool(nn.Module):
    # the function of global average pooling is to reduce the image size to (1,1),
    # which is convenient to reduce dimension later
    def __init__(self):
        super(globalAvgPool,self).__init__()
    def forward(self, x):
        return nn.functional.avg_pool2d(x,x.size()[2:])

class denseBlock(nn.Module):
    # A dense block is a stack of num_cnns cnn blocks with dense connections between them.
    # The image size is unchanged, but feature_num accumulates from one cnn block to the next:
    # each cnn block's output is concatenated with its input along the feature_num dim (f_num = f_num_previous + c_out).
    # Inside every cnn block, the improved ResNet ordering (bn-relu-conv) is used.
    def __init__(self, num_cnns, c_in, c_out):
        super(denseBlock,self).__init__()
        block = []
        for i in range(num_cnns):
            feature_in = c_in + i * c_out  
            block.append(self.cnn_block(feature_in,c_out))  # c_in + i * c_out is accumulated 
        self.denseBlock_list = nn.ModuleList(block)
        self.c_out = c_in + num_cnns * c_out  # record the final output feature_num in one dense block
         
    def cnn_block(self, c_in, c_out):
        blk = nn.Sequential(
            nn.BatchNorm2d(c_in),
            nn.ReLU(),
            nn.Conv2d(c_in,c_out,kernel_size=3,padding=1)
        )
        return blk

    def forward(self, x):
        for block in self.denseBlock_list:
            y = block(x)
            x = torch.cat((x,y),dim=1)   # this is where dense connection happens
        return x

class denseNet(nn.Module):
    # DenseNet consists of dense blocks and transition blocks. The details are as follows:
    # the first part is like GoogLeNet and ResNet: 7*7conv-bn-relu-pool
    # the second part is 4 dense blocks, each followed by a transition block except the last (like the 4 residual stages in ResNet)
    # the transition block halves the image size (ResNet uses a residual block with stride 2 instead)
    # the third part is a global average pooling layer that reduces the image size to 1*1, replacing a large fc layer
    # the fourth part is the linear layer that maps feature_num to the class scores fed into softmax
    def __init__(self, num_channels, growth_rate, num_convs_in_dense_blocks): 
        super(denseNet,self).__init__()
        block_1 = nn.Sequential(  # just like ResNet
            nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64), 
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        block_2 = nn.Sequential()
        for i, num_convs in enumerate(num_convs_in_dense_blocks):
            dense_block = denseBlock(num_convs,num_channels,growth_rate)
            block_2.add_module('DenseBlock_%d' %i, dense_block)  # add dense block to increase features
            num_channels = dense_block.c_out  # record the final output feature_num in one dense block
            #print('num_channels:',num_channels)
            
            if i != (len(num_convs_in_dense_blocks)-1):
                block_2.add_module('TransitionBlock_%d' %i, self.transitions_block(num_channels,num_channels//2))  # add transition block to reduce features. 
                num_channels = num_channels//2
        global_pool_block = nn.Sequential(
            nn.BatchNorm2d(num_channels),
            nn.ReLU(),
            globalAvgPool(),
        )
        self.denseNet_layer = nn.Sequential(block_1,block_2,global_pool_block)
        self.flatten_layer = flattenLayer()
        self.fc_layer = nn.Linear(num_channels,10)
    
    def transitions_block(self, c_in, c_out):
        blk = nn.Sequential(
            nn.BatchNorm2d(c_in),
            nn.ReLU(),
            nn.Conv2d(c_in,c_out,kernel_size=1),
            nn.AvgPool2d(kernel_size=2,stride=2),  # reduce the image size by half
        )
        return blk

    def forward(self, x):
        y = self.denseNet_layer(x)
        y = self.flatten_layer(y)
        y = self.fc_layer(y)  
        return y

net = denseNet(num_channels,growth_rate,num_convs_in_dense_blocks)
print(net)

loss_func = nn.CrossEntropyLoss()
optimizor = optim.Adam(net.parameters(), lr=learning_rate)



denseNet(
  (denseNet_layer): Sequential(
    (0): Sequential(
      (0): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    )
    (1): Sequential(
      (DenseBlock_0): denseBlock(
        (denseBlock_list): ModuleList(
          (0): Sequential(
            (0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (1): ReLU()
            (2): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
          (1): Sequential(
            (0): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (1): ReLU()
            (2): Conv2d(96, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
          (2): Sequential(
            (0): BatchNorm2d(128, eps=1e-05, momentum

In [34]:
# Take a look at the network
X = torch.rand((1, 1, 96, 96))
for layer in net.children():
    X = layer(X)
    print(' output shape:\t', X.shape)


 output shape:	 torch.Size([1, 248, 1, 1])
 output shape:	 torch.Size([1, 248])
 output shape:	 torch.Size([1, 10])
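
The 248 channels in the shape check above can also be derived by hand from the hyperparameters. The sketch below uses only plain Python arithmetic; `trace_densenet_shapes` and `conv_out` are hypothetical helper names, and the defaults are assumed to mirror the values used in this notebook (96×96 input, 64 initial channels, growth rate 32, four dense blocks of 4 convolutions).

```python
def trace_densenet_shapes(image_size=96, num_channels=64, growth_rate=32,
                          num_convs_in_dense_blocks=(4, 4, 4, 4)):
    """Return (stage name, channels, image size) after each stage, computed by hand."""
    def conv_out(size, kernel, stride, padding):
        # standard output-size formula for convolution and pooling layers
        return (size + 2 * padding - kernel) // stride + 1

    stages = []
    size = conv_out(image_size, kernel=7, stride=2, padding=3)  # 7x7 conv, stride 2
    size = conv_out(size, kernel=3, stride=2, padding=1)        # 3x3 max pool, stride 2
    stages.append(('stem', num_channels, size))

    last = len(num_convs_in_dense_blocks) - 1
    for i, num_convs in enumerate(num_convs_in_dense_blocks):
        num_channels += num_convs * growth_rate                 # dense block: channels grow, size unchanged
        stages.append((f'DenseBlock_{i}', num_channels, size))
        if i != last:
            num_channels //= 2                                   # 1x1 conv halves the channels
            size = conv_out(size, kernel=2, stride=2, padding=0) # 2x2 average pool halves the size
            stages.append((f'TransitionBlock_{i}', num_channels, size))

    stages.append(('global_pool', num_channels, 1))              # global average pooling gives 1x1
    return stages


for name, channels, size in trace_densenet_shapes():
    print(f'{name:18s} channels={channels:4d} size={size}x{size}')
```

The channel count goes 64, 192, 96, 224, 112, 240, 120, 248, which matches the `[1, 248, 1, 1]` shape printed by the check.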


In [25]:
# Train Model
def evaluate_model(net, test_iterator, device):
    if isinstance(net, nn.Module):
        net = net.to(device)  # only nn.Module instances can be moved to a device
    print('testing on:', device)
    with torch.no_grad():
        correct,num_exp = 0.0,0
        for X,y in test_iterator:
            if isinstance(net, nn.Module):
                net.eval()  # eval mode disables dropout and makes BatchNorm use its running statistics
                correct += (net(X.to(device)).argmax(1)==y.to(device)).float().sum().cpu().item()
                net.train()  # switch back to training mode
            else:
                # net is a plain function (hand-written model); the GPU is not considered here
                if 'is_training' in net.__code__.co_varnames:
                    correct += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item()
                else:
                    correct += (net(X).argmax(dim=1) == y).float().sum().item()
            num_exp += y.size(0)

    return correct/num_exp*100

def train_model(num_epochs, train_iterator, test_iterator, loss_func, optimizor, net, device):
    net = net.to(device)
    print('training on:', device)
    for epoch in range(num_epochs):
        total_loss,total_batch,total_acc,total_num,start_time = 0.0,0,0.0,0,time.time()
        for X, y in train_iterator:
            X = X.to(device)
            y = y.to(device)

            output = net(X)
            loss = loss_func(output,y)
            optimizor.zero_grad()
            loss.backward()
            optimizor.step()
            
            total_loss += loss.cpu().item()
            total_batch += 1
            total_acc += (output.argmax(1)==y).sum().cpu().item()
            total_num += y.size(0)
        
        test_acc = evaluate_model(net, test_iterator, device)
        print('Epoch: {}, Average loss: {:.4f}, Average accuracy: {:.2f}%, Test Accuracy: {:.2f}%, time: {:.1f}sec' \
              .format(epoch+1, total_loss/total_batch, total_acc/total_num*100, test_acc, time.time()-start_time))

train_model(num_epochs,train_iterator,test_iterator,loss_func,optimizor,net,device)
        
# Prediction

training on: cuda
testing on: cuda
Epoch: 1, Average loss: 0.1517, Average accuracy: 96.65%, Test Accuracy: 98.46%, time: 35.8sec
testing on: cuda
Epoch: 2, Average loss: 0.0360, Average accuracy: 98.88%, Test Accuracy: 99.13%, time: 35.9sec
testing on: cuda
Epoch: 3, Average loss: 0.0273, Average accuracy: 99.18%, Test Accuracy: 98.35%, time: 35.7sec
testing on: cuda
Epoch: 4, Average loss: 0.0210, Average accuracy: 99.36%, Test Accuracy: 98.90%, time: 35.9sec
testing on: cuda
Epoch: 5, Average loss: 0.0199, Average accuracy: 99.40%, Test Accuracy: 99.22%, time: 35.8sec
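
The `# Prediction` step is left empty in the notebook. One possible way to fill it is sketched below. The function name `predict` is an assumption, not the author's code, and `toy_net` is a stand-in model so that the block runs on its own; in the notebook the trained `net` and the existing `device` would be passed instead.

```python
import torch
from torch import nn


def predict(net, X, device):
    """Return the predicted class index for every image in the batch X."""
    net = net.to(device)
    was_training = net.training
    net.eval()                       # BatchNorm uses its running statistics in eval mode
    with torch.no_grad():
        labels = net(X.to(device)).argmax(dim=1).cpu()
    net.train(was_training)          # restore the mode the net was in before
    return labels


# Stand-in model (hypothetical) so the sketch is self-contained.
toy_net = nn.Sequential(nn.Flatten(), nn.Linear(96 * 96, 10))
batch = torch.rand(4, 1, 96, 96)     # 4 random "images" with the notebook's input size
preds = predict(toy_net, batch, torch.device('cpu'))
print(preds.shape)                   # torch.Size([4])
```

Restoring the previous mode with `net.train(was_training)` keeps the helper safe to call in the middle of training.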
