## DenseNet Case Study with the MindSpore Framework

### 1 Model Overview

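DenseNet was proposed in the 2017 paper *Densely Connected Convolutional Networks*. It achieves feature reuse by concatenating features along the channel dimension. This lets DenseNet outperform ResNet with fewer parameters and lower computational cost, and the paper won the Best Paper Award at CVPR 2017. This article first introduces the principle and network architecture of DenseNet, then walks through its implementation in MindSpore.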

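#### 1.1 Model Structure
Compared with ResNet, DenseNet proposes a more aggressive dense connection mechanism: all layers are connected to one another, so each layer receives the outputs of all preceding layers as additional input. Figure 1 shows DenseNet's dense connectivity. In ResNet, each layer is joined to an earlier layer (usually 2 to 3 layers back) by a shortcut connection, and the two are combined by element-wise addition. In DenseNet, each layer is concatenated with all preceding layers along the channel dimension, and the result is the input of the next layer. A network with L layers therefore contains $\frac{L(L+1)}{2}$ connections, which is dense compared with ResNet. Because DenseNet directly concatenates feature maps from different layers, it reuses features and improves efficiency; this is the main difference between DenseNet and ResNet.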

<center>
    <img src="https://s1.ax1x.com/2022/09/24/xAeebR.jpg" alt="image-2022092401" style="zoom:75%;" />
    <br>
    <div style="color:orange;
    display: inline-block;
    color: #999;
    padding: 2px;">Figure 1 A 5-layer dense block with growth rate k=4. Each layer takes all preceding feature maps as input.</div>
</center>
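The connection count can be checked by direct enumeration. The sketch below is plain Python with a hypothetical helper name (`count_dense_connections` is not part of the original code); it counts one incoming connection per earlier feature map, including the block input $x_0$.

```python
def count_dense_connections(num_layers):
    """Count connections in a densely connected stack of `num_layers` layers.

    Layer l (1-indexed) reads x_0, ..., x_{l-1}, i.e. it has l incoming connections.
    """
    return sum(l for l in range(1, num_layers + 1))


for L in (1, 5, 12):
    # The enumeration agrees with the closed form L(L+1)/2.
    assert count_dense_connections(L) == L * (L + 1) // 2
    print(L, count_dense_connections(L))
```

For the 5-layer block of Figure 1 this gives 15 connections.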

In a conventional network, the output of layer $l$ is:
$$
x_{l}=H_{l}(x_{l-1})
$$

ResNet adds an identity function of the previous layer's output:

$$
x_{l} = H_{l}(x_{l-1})+x_{l-1}
$$

In DenseNet, all preceding layers are concatenated as the input:
$$
x_{l}=H_{l}([x_{0},x_{1},...,x_{l-1}])
$$

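Here $H_{l}(\cdot)$ denotes a non-linear transformation: a composite operation that may include a series of BN (Batch Normalization), ReLU, Pooling, and Conv operations. Note that there may in fact be several convolutional layers between layer $l$ and layer $l-1$.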
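The practical difference between addition and concatenation shows up in the channel count each layer sees. The following sketch uses two hypothetical helpers and assumes that every $H_l$ emits a fixed number of feature maps (the growth rate k from the caption of Figure 1); it does not reproduce any framework code.

```python
def resnet_channels(c_in, num_layers):
    """Element-wise addition keeps the channel count fixed at every layer."""
    return [c_in for _ in range(num_layers)]


def densenet_input_channels(k0, growth_rate, num_layers):
    """Channels read by each layer when every H_l emits `growth_rate` maps.

    Layer l (1-indexed) reads [x_0, ..., x_{l-1}]: k0 + growth_rate * (l - 1) channels.
    """
    return [k0 + growth_rate * (l - 1) for l in range(1, num_layers + 1)]


# Hypothetical block: 4 input channels, growth rate k = 4, 5 layers.
print(resnet_channels(4, 5))             # [4, 4, 4, 4, 4]
print(densenet_input_channels(4, 4, 5))  # [4, 8, 12, 16, 20]
```

The input width grows linearly with depth inside a block, which is why each layer only needs to add a small number of new maps.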

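CNNs usually reduce the feature-map size through Pooling or a Conv with stride>1, whereas DenseNet's dense connectivity requires feature maps of equal size. To resolve this, DenseNet uses a DenseBlock+Transition structure. A DenseBlock is a module of many layers whose feature maps all have the same size and are densely connected. A Transition module joins two adjacent DenseBlocks and reduces the feature-map size through Pooling. Figure 2 shows a deep DenseNet with three dense blocks, joined to one another by Transition modules.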

<center>
    <img src="https://s1.ax1x.com/2022/09/24/xAu3xU.jpg" alt="image-2022092402" style="zoom:75%;" />
    <br>
    <div style="color:orange;
    display: inline-block;
    color: #999;
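    padding: 2px;">Figure 2 A deep DenseNet with three dense blocks. The layers between two adjacent blocks are called transition layers; they change the feature-map size through convolution and pooling.</div>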
</center>


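#### 1.2 Advantages of DenseNet

a) Fewer parameters than ResNet;

b) Bypass connections strengthen feature reuse;

c) The network is easier to train and has a certain regularizing effect;

d) Vanishing gradients and model degradation are alleviated.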


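## 2 Case Implementation

### 2.1 Downloading and Preparing the Dataset
+ Mini-ImageNet: https://drive.google.com/drive/folders/17a09kkqVivZQFggCw9I_YboJ23tcexNM

+ The root directory dataset/ of the downloaded dataset contains train, val, and test folders.
+ Because this dataset targets meta-learning and few-shot learning, its split is not sampled from every class. The data must therefore be re-split so that train, val, and test each contain all classes.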

In [1]:
import os
import random
import shutil

# First move every class folder under the downloaded val and test directories into train to merge them.
# The code below then gives train, val, and test all 100 classes, splitting each class 6:2:2.

def moveFile(fileDir, tarDir, rate):
    pathDir = os.listdir(fileDir)  # image file names in the source folder
    filenumber = len(pathDir)
    # After splitting, each class in train holds 360 images; this guard prevents splitting twice.
    if filenumber > 360:
        picknumber = int(filenumber * rate)  # number of images to move, e.g. rate=0.1 moves 10 out of 100
        sample = random.sample(pathDir, picknumber)  # pick picknumber images at random
        for name in sample:
            shutil.move(os.path.join(fileDir, name), os.path.join(tarDir, name))

data_dir = "dataset/train"

for classes_num in os.listdir(data_dir):
    src_dir = os.path.join(data_dir, classes_num)
    dst_dir_val = os.path.join("dataset/val", classes_num)
    dst_dir_test = os.path.join("dataset/test", classes_num)

    # Create the per-class target folders if they do not exist yet
    os.makedirs(dst_dir_val, exist_ok=True)
    os.makedirs(dst_dir_test, exist_ok=True)

    moveFile(src_dir, dst_dir_val, 0.2)    # 20% of the class goes to val
    moveFile(src_dir, dst_dir_test, 0.25)  # 25% of the remaining 80% (another 20%) goes to test
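The two rates 0.2 and 0.25 look unequal but produce equal val and test shares, because the second call samples from what is left after the first. The sketch below mirrors that arithmetic with a hypothetical helper; it assumes 600 images per class, which is consistent with the 360 images per class that the guard in `moveFile` expects after splitting.

```python
def split_counts(total, val_rate=0.2, test_rate=0.25):
    """Mirror the two successive moveFile calls on one class folder."""
    n_val = int(total * val_rate)          # taken from the full folder
    remaining = total - n_val
    n_test = int(remaining * test_rate)    # taken from what is left
    n_train = remaining - n_test
    return n_train, n_val, n_test


print(split_counts(600))  # (360, 120, 120), i.e. a 6:2:2 split
```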

+ After re-splitting, the directory structure under each of train, val, and test is as follows:
```
.
└── train/val/test 
     ├── class1
     │    ├── 000000000001.jpg
     │    ├── 000000000002.jpg
     │    ├── ...
     ├── class2
     │    ├── 000000000001.jpg
     │    ├── 000000000002.jpg
     │    ├── ...
     ├── class3
     │    ├── 000000000001.jpg
     │    ├── 000000000002.jpg
     │    ├── ...
     ├── class100
          ├── 000000000001.jpg
          ├── 000000000002.jpg
          ├── ...
```

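### 2.2 Creating the Dataset
The dataset is created with ImageFolderDataset from the MindSpore framework, which builds a source dataset by reading images from a tree-structured directory, here the train/val/test directories prepared in Section 2.1. Each of the three directories contains 100 class folders, and all images in the same class folder receive the same label, so the resulting dataset has two columns: \[image, label\]. Dataset creation also defines transforms such as data augmentation and normalization, which are then applied to the images and labels respectively through MindSpore's map method.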
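How folder names become integer labels can be sketched in plain Python. This is a stand-in for illustration, not MindSpore's implementation: it assumes that with the default `class_indexing=None` the class folders are indexed in sorted name order starting from 0, which is worth checking against the documentation of the MindSpore version in use. The folder names are hypothetical.

```python
import os
import tempfile


def folder_class_index(root):
    """Map each sub-folder name to an integer label in sorted order."""
    classes = sorted(d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
    return {name: idx for idx, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    for name in ("n02", "n01", "n03"):   # hypothetical class folder names
        os.makedirs(os.path.join(root, name))
    mapping = folder_class_index(root)

print(mapping)  # {'n01': 0, 'n02': 1, 'n03': 2}
```

Passing an explicit `class_indexing` dictionary is the way to pin the mapping when train, val, and test must share identical labels.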

In [2]:
import os
import math
import numpy as np
from PIL import Image, ImageFile
from mindspore import dtype as mstype
import mindspore.dataset as de
import mindspore.dataset.vision as vision_C
import mindspore.dataset.transforms as normal_C
from mindspore.dataset.vision import Inter

ImageFile.LOAD_TRUNCATED_IMAGES = True

def classification_dataset(data_dir, image_size, per_batch_size, max_epoch, rank,group_size,
                           mode='train', input_mode='folder', root='', num_parallel_workers=None,
                           shuffle=None, sampler=None, class_indexing=None, drop_remainder=True,
                           transform=None, target_transform=None):
   

    mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
    std = [0.229 * 255, 0.224 * 255, 0.225 * 255]

    if transform is None:
        if mode == 'train':
            transform_img = [
                vision_C.Decode(),
                vision_C.Resize(image_size, interpolation=Inter.NEAREST),
                vision_C.RandomHorizontalFlip(prob=0.5),
                vision_C.RandomColorAdjust(brightness=0.4, contrast=0.4, saturation=0.4),
                vision_C.Normalize(mean=mean, std=std),
                vision_C.HWC2CHW()]
        else:
            transform_img = [
                vision_C.Decode(),
                vision_C.Resize(image_size, interpolation=Inter.NEAREST),
                vision_C.Normalize(mean=mean, std=std),
                vision_C.HWC2CHW()]
    else:
        transform_img = transform

    if target_transform is None:
        transform_label = [normal_C.TypeCast(mstype.int32)]
    else:
        transform_label = target_transform

    de_dataset = de.ImageFolderDataset(data_dir, num_parallel_workers=num_parallel_workers,
                                       shuffle=shuffle, sampler=sampler, class_indexing=class_indexing,
                                       num_shards=group_size, shard_id=rank)

    de_dataset = de_dataset.map(input_columns="image", num_parallel_workers=8, operations=transform_img)
    de_dataset = de_dataset.map(input_columns="label", num_parallel_workers=8, operations=transform_label)

    columns_to_project = ["image", "label"]
    de_dataset = de_dataset.project(columns=columns_to_project)
    de_dataset = de_dataset.batch(per_batch_size, drop_remainder=drop_remainder)
    de_dataset = de_dataset.repeat(1)

    return de_dataset

In [3]:
if __name__ == '__main__':
    data_dir = "dataset/train"
    train_dataset = classification_dataset(data_dir, image_size=[224, 224],per_batch_size=4, max_epoch=20, rank=0, group_size=1)
    for item, (image, label) in enumerate(train_dataset):
        if item < 5:
            print(f"Shape of image [N, C, H, W]: {image.shape} {image.dtype}",'---',f"Shape of label [N]: {label.shape} {label.dtype}")

Shape of image [N, C, H, W]: (4, 3, 224, 224) Float32 --- Shape of label [N]: (4,) Int32
Shape of image [N, C, H, W]: (4, 3, 224, 224) Float32 --- Shape of label [N]: (4,) Int32
Shape of image [N, C, H, W]: (4, 3, 224, 224) Float32 --- Shape of label [N]: (4,) Int32
Shape of image [N, C, H, W]: (4, 3, 224, 224) Float32 --- Shape of label [N]: (4,) Int32
Shape of image [N, C, H, W]: (4, 3, 224, 224) Float32 --- Shape of label [N]: (4,) Int32
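The `mean` and `std` values in `classification_dataset` are multiplied by 255 because the decoded images still hold pixel values in the 0 to 255 range when `Normalize` runs. The sketch below, with hypothetical helper names and an arbitrary example pixel, checks that this is equivalent to first scaling pixels to [0, 1] and then using the unscaled statistics.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_0_255(pixel):
    """Normalize raw 0..255 values with statistics scaled by 255."""
    return [(p - m * 255) / (s * 255) for p, m, s in zip(pixel, MEAN, STD)]


def normalize_0_1(pixel):
    """Scale to 0..1 first, then normalize with the unscaled statistics."""
    return [(p / 255 - m) / s for p, m, s in zip(pixel, MEAN, STD)]


pixel = [128, 64, 255]  # one RGB pixel, arbitrary example values
a = normalize_0_255(pixel)
b = normalize_0_1(pixel)
# Both routes give the same normalized values up to floating-point rounding.
assert all(abs(x - y) < 1e-9 for x, y in zip(a, b))
print(a)
```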


### 2.3 Building the Model

In [4]:
import math
from functools import reduce
from collections import OrderedDict
import mindspore.nn as nn
import mindspore as ms
from mindspore import Tensor
from mindspore.ops import operations as P
from mindspore.common import initializer as init
__all__ = ["DenseNet121"]

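+ Define the Kaiming initialization functions

Building the DenseNet121 network in this case requires Kaiming initialization. Kaiming Normal draws weights with standard deviation $\frac{gain}{\sqrt{fan}}$. The code first defines the KaimingInit class, which computes the numerator gain through the `_calculate_gain` function. It then defines the KaimingNormal class, which inherits from KaimingInit and computes the denominator fan, yielding a Kaiming initializer for arrays.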

In [5]:
def _select_fan(array, mode):
    mode = mode.lower()
    valid_modes = ['fan_in', 'fan_out']
    if mode not in valid_modes:
        raise ValueError("Mode {} not supported, please use one of {}".format(mode, valid_modes))

    fan_in, fan_out = _calculate_in_and_out(array)
    return fan_in if mode == 'fan_in' else fan_out

def _calculate_gain(nonlinearity, param=None):
    linear_fns = ['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d']
    if nonlinearity in linear_fns or nonlinearity == 'sigmoid':
        return 1
    if nonlinearity == 'tanh':
        return 5.0 / 3
    if nonlinearity == 'relu':
        return math.sqrt(2.0)
    if nonlinearity == 'leaky_relu':
        if param is None:
            negative_slope = 0.01
        elif not isinstance(param, bool) and isinstance(param, int) or isinstance(param, float):
            negative_slope = param
        else:
            raise ValueError("negative_slope {} not a valid number".format(param))
        return math.sqrt(2.0 / (1 + negative_slope ** 2))

    raise ValueError("Unsupported nonlinearity {}".format(nonlinearity))
    
    
def _assignment(arr, num):
    if arr.shape == ():
        arr = arr.reshape((1))
        arr[:] = num
        arr = arr.reshape(())
    else:
        if isinstance(num, np.ndarray):
            arr[:] = num[:]
        else:
            arr[:] = num
    return arr


def _calculate_in_and_out(arr):
    dim = len(arr.shape)
    if dim < 2:
        raise ValueError("The dimension of the data must be greater than 1 to compute fan_in and fan_out.")

    n_in = arr.shape[1]
    n_out = arr.shape[0]

    if dim > 2:
        counter = reduce(lambda x, y: x * y, arr.shape[2:])
        n_in *= counter
        n_out *= counter
    return n_in, n_out


class KaimingInit(init.Initializer):
    def __init__(self, a=0, mode='fan_in', nonlinearity='leaky_relu'):
        super(KaimingInit, self).__init__()
        self.mode = mode
        self.gain = _calculate_gain(nonlinearity, a)

    def _initialize(self, arr):
        pass

class KaimingUniform(KaimingInit):
    def _initialize(self, arr):
        fan = _select_fan(arr, self.mode)
        bound = math.sqrt(3.0) * self.gain / math.sqrt(fan)
        data = np.random.uniform(-bound, bound, arr.shape)

        _assignment(arr, data)

class KaimingNormal(KaimingInit):
    def _initialize(self, arr):
        fan = _select_fan(arr, self.mode)
        std = self.gain / math.sqrt(fan)
        data = np.random.normal(0, std, arr.shape)

        _assignment(arr, data)


def default_recurisive_init(custom_cell):
    for _, cell in custom_cell.cells_and_names():
        if isinstance(cell, nn.Conv2d):
            cell.weight.set_data(init.initializer(KaimingUniform(a=math.sqrt(5)), cell.weight.shape, cell.weight.dtype))
            if cell.bias is not None:
                fan_in, _ = _calculate_in_and_out(cell.weight.asnumpy())
                bound = 1 / math.sqrt(fan_in)
                cell.bias.set_data(Tensor(np.random.uniform(-bound, bound, cell.bias.shape), cell.bias.dtype))
        elif isinstance(cell, nn.Dense):
            cell.weight.set_data(init.initializer(KaimingUniform(a=math.sqrt(5)), cell.weight.shape, cell.weight.dtype))
            if cell.bias is not None:
                fan_in, _ = _calculate_in_and_out(cell.weight.asnumpy())
                bound = 1 / math.sqrt(fan_in)
                cell.bias.set_data(Tensor(np.random.uniform(-bound, bound, cell.bias.shape), cell.bias.dtype))
        elif isinstance(cell, (nn.BatchNorm2d, nn.BatchNorm1d)):
            pass
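As a worked check of the fan computation, the sketch below recomputes fan_in, fan_out, and the Kaiming Normal standard deviation for a weight of shape (64, 3, 7, 7), the shape of the 7x7 stem convolution used later in DenseNet121. The helper names are hypothetical re-implementations written for this example, so the block runs without MindSpore or NumPy.

```python
import math
from functools import reduce


def fan_in_and_out(shape):
    """fan_in = in_channels * kernel area, fan_out = out_channels * kernel area."""
    receptive = reduce(lambda x, y: x * y, shape[2:], 1)
    return shape[1] * receptive, shape[0] * receptive


def kaiming_normal_std(shape, mode="fan_out", gain=math.sqrt(2.0)):
    """Standard deviation gain / sqrt(fan); the default gain is the ReLU value sqrt(2)."""
    fan_in, fan_out = fan_in_and_out(shape)
    fan = fan_in if mode == "fan_in" else fan_out
    return gain / math.sqrt(fan)


shape = (64, 3, 7, 7)              # (out_channels, in_channels, kH, kW)
print(fan_in_and_out(shape))       # (147, 3136)
print(kaiming_normal_std(shape))   # sqrt(2) / 56, roughly 0.0253
```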


+ Build the DenseBlock

In [6]:
class GlobalAvgPooling(nn.Cell):
    def __init__(self):
        super(GlobalAvgPooling, self).__init__()
        self.mean = P.ReduceMean(True)
        self.shape = P.Shape()
        self.reshape = P.Reshape()

    def construct(self, x):
        x = self.mean(x, (2, 3))
        b, c, _, _ = self.shape(x)
        x = self.reshape(x, (b, c))
        return x

class CommonHead(nn.Cell):
    def __init__(self, num_classes, out_channels):
        super(CommonHead, self).__init__()
        self.avgpool = GlobalAvgPooling()
        self.fc = nn.Dense(out_channels, num_classes, has_bias=True)

    def construct(self, x):
        x = self.avgpool(x)
        x = self.fc(x)
        return x

def conv7x7(in_channels, out_channels, stride=1, padding=3, has_bias=False):
    return nn.Conv2d(in_channels, out_channels, kernel_size=7, stride=stride, has_bias=has_bias,
                     padding=padding, pad_mode="pad")


def conv3x3(in_channels, out_channels, stride=1, padding=1, has_bias=False):
    return nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, has_bias=has_bias,
                     padding=padding, pad_mode="pad")


def conv1x1(in_channels, out_channels, stride=1, padding=0, has_bias=False):
    return nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, has_bias=has_bias,
                     padding=padding, pad_mode="pad")


class _DenseLayer(nn.Cell):
    """
    the dense layer, include 2 conv layer
    """
    def __init__(self, num_input_features, growth_rate, bn_size, drop_rate):
        super(_DenseLayer, self).__init__()
        self.norm1 = nn.BatchNorm2d(num_input_features)
        self.relu1 = nn.ReLU()
        self.conv1 = conv1x1(num_input_features, bn_size*growth_rate)

        self.norm2 = nn.BatchNorm2d(bn_size*growth_rate)
        self.relu2 = nn.ReLU()
        self.conv2 = conv3x3(bn_size*growth_rate, growth_rate)

        # nn.Dropout is given keep_prob here (probability of keeping a unit), unlike PyTorch's drop probability
        self.keep_prob = 1.0 - drop_rate
        self.dropout = nn.Dropout(keep_prob=self.keep_prob)

    def construct(self, features):
        bottleneck = self.conv1(self.relu1(self.norm1(features)))
        new_features = self.conv2(self.relu2(self.norm2(bottleneck)))
        if self.keep_prob < 1:
            new_features = self.dropout(new_features)
        return new_features

class _DenseBlock(nn.Cell):
    """
    the dense block
    """
    def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate):
        super(_DenseBlock, self).__init__()
        self.cell_list = nn.CellList()
        for i in range(num_layers):
            layer = _DenseLayer(
                num_input_features + i * growth_rate,
                growth_rate=growth_rate,
                bn_size=bn_size,
                drop_rate=drop_rate
            )
            self.cell_list.append(layer)

        self.concate = P.Concat(axis=1)

    def construct(self, init_features):
        features = init_features
        for layer in self.cell_list:
            new_features = layer(features)
            features = self.concate((features, new_features))
        return features

+ Build the Transition layer

In [7]:
class _Transition(nn.Cell):
    """
    the transition layer
    """
    def __init__(self, num_input_features, num_output_features, avgpool=False):
        super(_Transition, self).__init__()
        if avgpool:
            poollayer = nn.AvgPool2d(kernel_size=2, stride=2)
        else:
            poollayer = nn.MaxPool2d(kernel_size=2, stride=2)
        self.features = nn.SequentialCell(OrderedDict([
            ('norm', nn.BatchNorm2d(num_input_features)),
            ('relu', nn.ReLU()),
            ('conv', conv1x1(num_input_features, num_output_features)),
            ('pool', poollayer)
        ]))

    def construct(self, x):
        x = self.features(x)
        return x

+ Build the DenseNet backbone

In [8]:
class Densenet(nn.Cell):
    """
    the densenet architecture
    """
    __constants__ = ['features']

    def __init__(self, growth_rate, block_config, num_init_features=None, bn_size=4, drop_rate=0):
        super(Densenet, self).__init__()

        layers = OrderedDict()
        if num_init_features:
            layers['conv0'] = conv7x7(3, num_init_features, stride=2, padding=3)
            layers['norm0'] = nn.BatchNorm2d(num_init_features)
            layers['relu0'] = nn.ReLU()
            layers['pool0'] = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same')
            num_features = num_init_features
        else:
            layers['conv0'] = conv3x3(3, growth_rate*2, stride=1, padding=1)
            layers['norm0'] = nn.BatchNorm2d(growth_rate*2)
            layers['relu0'] = nn.ReLU()
            num_features = growth_rate * 2

        # Each denseblock
        for i, num_layers in enumerate(block_config):
            block = _DenseBlock(
                num_layers=num_layers,
                num_input_features=num_features,
                bn_size=bn_size,
                growth_rate=growth_rate,
                drop_rate=drop_rate
            )
            layers['denseblock%d'%(i+1)] = block
            num_features = num_features + num_layers*growth_rate

            if i != len(block_config)-1:
                if num_init_features:
                    trans = _Transition(num_input_features=num_features, num_output_features=num_features // 2,
                                        avgpool=False)
                else:
                    trans = _Transition(num_input_features=num_features, num_output_features=num_features // 2,
                                        avgpool=True)
                layers['transition%d'%(i+1)] = trans
                num_features = num_features // 2

        # Final batch norm
        layers['norm5'] = nn.BatchNorm2d(num_features)
        layers['relu5'] = nn.ReLU()

        self.features = nn.SequentialCell(layers)
        self.out_channels = num_features

    def construct(self, x):
        x = self.features(x)
        return x

    def get_out_channels(self):
        return self.out_channels

    
def _densenet121(**kwargs):
    return Densenet(growth_rate=32, block_config=(6, 12, 24, 16), num_init_features=64, **kwargs)
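The channel and size bookkeeping in `Densenet.__init__` can be traced without building the network. The sketch below is a hypothetical plain-Python helper that follows the branch taken when `num_init_features` is set. The size formulas are derived from the layer settings in the code above (7x7 stride-2 convolution with padding 3, stride-2 pooling with `pad_mode='same'`, and 2x2 stride-2 pooling in each transition) and should be read as an assumption about how those layers round.

```python
def trace_densenet(growth_rate, block_config, num_init_features, input_size):
    """Follow channel count and spatial size through the ImageNet-style branch."""
    size = (input_size - 1) // 2 + 1      # conv0: 7x7, stride 2, padding 3
    size = -(-size // 2)                  # pool0: stride 2, pad_mode='same' (ceiling)
    channels = num_init_features
    trace = [("stem", channels, size)]
    for i, num_layers in enumerate(block_config):
        channels += num_layers * growth_rate          # each layer appends growth_rate maps
        trace.append((f"denseblock{i + 1}", channels, size))
        if i != len(block_config) - 1:
            channels //= 2                            # 1x1 conv halves the channels
            size //= 2                                # 2x2 pooling with stride 2
            trace.append((f"transition{i + 1}", channels, size))
    return trace


for row in trace_densenet(32, (6, 12, 24, 16), 64, 224):
    print(row)
```

For a 224x224 input the trace ends at 1024 channels on a 7x7 map, which is the `out_channels` value handed to the classification head.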

+ Build the DenseNet121 network

In [9]:
class DenseNet121(nn.Cell):
    """
    the densenet121 architecture
    """
    def __init__(self, num_classes, include_top=True):
        super(DenseNet121, self).__init__()
        self.backbone = _densenet121()
        out_channels = self.backbone.get_out_channels()
        self.include_top = include_top
        if self.include_top:
            self.head = CommonHead(num_classes, out_channels)

        default_recurisive_init(self)
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                cell.weight.set_data(init.initializer(KaimingNormal(a=math.sqrt(5), mode='fan_out',
                                                                    nonlinearity='relu'),
                                                      cell.weight.shape,
                                                      cell.weight.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer('ones', cell.gamma.shape))
                cell.beta.set_data(init.initializer('zeros', cell.beta.shape))
            elif isinstance(cell, nn.Dense):
                cell.bias.set_data(init.initializer('zeros', cell.bias.shape))

    def construct(self, x):
        x = self.backbone(x)
        if not self.include_top:
            return x
        x = self.head(x)
        return x

In [10]:
import os
import time
import argparse
import datetime
from collections import Counter
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.nn.optim import Momentum
from mindspore.communication.management import get_rank, get_group_size
from mindspore.train.callback import ModelCheckpoint
from mindspore.train.callback import CheckpointConfig, Callback
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.train.model import Model
from mindspore.train.loss_scale_manager import DynamicLossScaleManager, FixedLossScaleManager
from mindspore import context
from mindspore.context import ParallelMode
from mindspore.common import set_seed
from mindspore import save_checkpoint
from mindspore.common import initializer as init
from mindspore.nn.loss.loss import _Loss
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore.common import dtype as mstype

#from utils.lr_scheduler import MultiStepLR, CosineAnnealingLR
#from utils.crossentropy import CrossEntropy
#from utils.optimizers import get_param_groups
# set_seed(1)


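### 2.5 Model Training and Evaluation

+ Define the learning-rate schedule

A CosineAnnealingLR learning-rate scheduler is defined for use in the subsequent training. Its main parameters are lr, T_max, and eta_min: lr is the initial learning rate, T_max is the number of epochs over which the cosine curve decays from lr to its minimum (half a cosine period), and eta_min is the lowest learning rate reached during the cosine cycle. Adjusting T_max and eta_min directly changes the learning-rate curve during training.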

In [11]:
class _WarmUp():
    def __init__(self, warmup_init_lr):
        self.warmup_init_lr = warmup_init_lr

    def get_lr(self):
        # Get learning rate during warmup
        raise NotImplementedError

class _LinearWarmUp(_WarmUp):
    """
    linear warmup function
    """
    def __init__(self, lr, warmup_epochs, steps_per_epoch, warmup_init_lr=0):
        self.base_lr = lr
        self.warmup_init_lr = warmup_init_lr
        self.warmup_steps = int(warmup_epochs * steps_per_epoch)

        super(_LinearWarmUp, self).__init__(warmup_init_lr)

    def get_warmup_steps(self):
        return self.warmup_steps

    def get_lr(self, current_step):
        lr_inc = (float(self.base_lr) - float(self.warmup_init_lr)) / float(self.warmup_steps)
        lr = float(self.warmup_init_lr) + lr_inc * current_step
        return lr

class _LRScheduler():

    def __init__(self, lr, max_epoch, steps_per_epoch):
        self.base_lr = lr
        self.steps_per_epoch = steps_per_epoch
        self.total_steps = int(max_epoch * steps_per_epoch)

    def get_lr(self):
        # Compute learning rate using chainable form of the scheduler
        raise NotImplementedError
        
        
        
class CosineAnnealingLR(_LRScheduler):
    def __init__(self, lr, T_max, steps_per_epoch, max_epoch, warmup_epochs=0, eta_min=0):
        self.T_max = T_max
        self.eta_min = eta_min
        self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
        super(CosineAnnealingLR, self).__init__(lr, max_epoch, steps_per_epoch)

    def get_lr(self):
        warmup_steps = self.warmup.get_warmup_steps()

        lr_each_step = []
        current_lr = self.base_lr
        for i in range(self.total_steps):
            if i < warmup_steps:
                lr = self.warmup.get_lr(i+1)
            else:
                cur_ep = i // self.steps_per_epoch
                if i % self.steps_per_epoch == 0 and i > 0:
                    current_lr = self.eta_min + \
                                 (self.base_lr - self.eta_min) * (1. + math.cos(math.pi*cur_ep / self.T_max)) / 2

                lr = current_lr

            lr_each_step.append(lr)

        return np.array(lr_each_step).astype(np.float32)
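The effect of T_max is easiest to see on per-epoch values. The sketch below is a hypothetical pure-Python reduction of `CosineAnnealingLR.get_lr` with no warmup: it returns one learning rate per epoch instead of one per step and uses the same cosine expression.

```python
import math


def cosine_lr_per_epoch(base_lr, t_max, max_epoch, eta_min=0.0):
    """Per-epoch learning rates of the cosine schedule (no warmup)."""
    lrs = []
    current = base_lr
    for epoch in range(max_epoch):
        if epoch > 0:  # the rate is refreshed at the start of every epoch after the first
            current = eta_min + (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max)) / 2
        lrs.append(current)
    return lrs


short = cosine_lr_per_epoch(1e-4, t_max=120, max_epoch=3)
print(short)      # three nearly identical values just below 1e-4
full = cosine_lr_per_epoch(1e-4, t_max=120, max_epoch=121)
print(full[-1])   # reaches eta_min only at epoch == T_max
```

When max_epoch is much smaller than T_max the schedule is effectively constant, so T_max is usually chosen close to the number of training epochs.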

+ Define the cross-entropy loss function

In [12]:
class CrossEntropy(_Loss):
    """
    loss function CrossEntropy
    """
    def __init__(self, smooth_factor=0., num_classes=1000):
        super(CrossEntropy, self).__init__()
        self.onehot = P.OneHot()
        self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
        self.off_value = Tensor(1.0 * smooth_factor / (num_classes -1), mstype.float32)
        self.ce = nn.SoftmaxCrossEntropyWithLogits()
        self.mean = P.ReduceMean(False)

    def construct(self, logit, label):
        one_hot_label = self.onehot(label,
                                    F.shape(logit)[1], self.on_value, self.off_value)
        loss = self.ce(logit, one_hot_label)
        loss = self.mean(loss, 0)
        return loss
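The `on_value` and `off_value` pair implements label smoothing: the true class keeps 1 - smooth_factor and the remaining mass is spread evenly over the other classes. The sketch below reproduces that target construction and a numerically stable softmax cross-entropy in plain Python. The helper names are hypothetical, and the code illustrates the arithmetic rather than replacing the MindSpore operators.

```python
import math


def smoothed_one_hot(label, num_classes, smooth_factor=0.0):
    """Target distribution with the same on/off values as CrossEntropy."""
    on_value = 1.0 - smooth_factor
    off_value = smooth_factor / (num_classes - 1)
    return [on_value if i == label else off_value for i in range(num_classes)]


def softmax_cross_entropy(logits, target):
    """-sum(target * log_softmax(logits)), computed with the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(target, logits))


target = smoothed_one_hot(2, 5, smooth_factor=0.1)
print(target)        # 0.9 for class 2, 0.025 for each other class (up to rounding)
print(sum(target))   # the smoothed target still sums to 1 (up to rounding)

# A model that outputs identical logits for 100 classes has loss ln(100).
uniform_loss = softmax_cross_entropy([0.0] * 100, smoothed_one_hot(7, 100))
print(uniform_loss, math.log(100))
```

The value ln(100), about 4.605, is a useful reference: a 100-class loss near it means the predictions are close to uniform.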

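+ Train the model

Training first sets the number of epochs and then creates the training and validation sets with the classification_dataset function defined in Section 2.2, using a batch size of 512 and 168×168 input images. The loss function is nn.SoftmaxCrossEntropyWithLogits (the custom CrossEntropy above is left as a commented-out alternative), the optimizer is Momentum, and the initial learning rate is 0.0001. Instead of callbacks, custom train and val functions report the loss and the Top-1 and Top-5 accuracy after every epoch, and the checkpoint with the best validation Top-1 accuracy so far is saved.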

In [13]:
from mindspore import ops
from mindspore import ms_function
from mindspore import context
import mindspore
def get_param_groups(network):
    """
    get parameter groups
    """
    decay_params = []
    no_decay_params = []
    for x in network.trainable_params():
        parameter_name = x.name
        if parameter_name.endswith('.bias'):
            # biases do not use weight decay
            no_decay_params.append(x)
        elif parameter_name.endswith('.gamma'):
            # BatchNorm scale (gamma) does not use weight decay
            no_decay_params.append(x)
        elif parameter_name.endswith('.beta'):
            # BatchNorm shift (beta) does not use weight decay
            no_decay_params.append(x)
        else:
            decay_params.append(x)

    return [{'params': no_decay_params, 'weight_decay': 0.0}, {'params': decay_params}]

def get_top5_acc(top5_arg, gt_class):
    sub_count = 0
    for top5, gt in zip(top5_arg, gt_class):
        if gt in top5:
            sub_count += 1
    return sub_count

def train(model, dataset, loss_fn, optimizer):
    # Define forward function
    def forward_fn(data, label):
        logits = model(data)
        loss = loss_fn(logits, label)
        return loss, logits
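    # Get gradient function; has_aux=True differentiates only the first output (the loss)
    # and passes logits through as auxiliary data
    grad_fn = ops.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)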
    # Define function of one-step training
    # @ms_function
    def train_step(data, label):
        (loss, logits), grads = grad_fn(data, label)
        loss = ops.depend(loss, optimizer(grads))
        return loss, logits

    size = dataset.get_dataset_size()
    model.set_train(True)
    train_loss = 0
    top1_correct = 0
    top5_correct = 0
    img_tot = 0
    for batch, (data, label) in enumerate(dataset.create_tuple_iterator()):
        loss, logits = train_step(data, label)

        logits = logits.asnumpy()
        label = label.asnumpy()
        top1_output = np.argmax(logits, (-1))
        top5_output = np.argsort(logits)[:, -5:]

        t1_correct = np.equal(top1_output, label).sum()
        top1_correct += t1_correct
        top5_correct += get_top5_acc(top5_output, label)
        img_tot += label.shape[0]
        train_loss += loss.asnumpy()

    train_loss /= size
    acc1 = 100.0 * top1_correct / img_tot
    acc5 = 100.0 * top5_correct / img_tot
    print('Train Loss={:.4f}, top1_correct={}, top5_correct={}, tot={}, Top1_acc={:.2f}%, Top5_acc={:.2f}%'.format(train_loss, top1_correct, top5_correct, img_tot, acc1, acc5))

def val(model, dataset, loss_fn):
    size = dataset.get_dataset_size()
    model.set_train(False)
    val_loss = 0
    top1_correct = 0
    top5_correct = 0
    img_tot = 0
    for batch, (data, label) in enumerate(dataset.create_tuple_iterator()):
        logits = model(data)
        val_loss += loss_fn(logits, label).asnumpy()
        logits = logits.asnumpy()
        label = label.asnumpy()
        top1_output = np.argmax(logits, (-1))
        top5_output = np.argsort(logits)[:, -5:]

        t1_correct = np.equal(top1_output, label).sum()
        top1_correct += t1_correct
        top5_correct += get_top5_acc(top5_output, label)
        img_tot += label.shape[0]
        
    val_loss /= size
    acc1 = 100.0 * top1_correct / img_tot
    acc5 = 100.0 * top5_correct / img_tot
    print('Val Loss={:.4f} top1_correct={}, top5_correct={}, tot={}, Top1_acc={:.2f}%, Top5_acc={:.2f}%'.format(val_loss, top1_correct, top5_correct, img_tot, acc1, acc5))
    return top1_correct / img_tot

context.set_context(mode=context.GRAPH_MODE, device_target='Ascend')
train_dataset = classification_dataset("dataset/train", image_size=[168, 168],per_batch_size=512, max_epoch=20, rank=0, group_size=1, mode = 'train')
val_dataset = classification_dataset("dataset/val", image_size=[168, 168],per_batch_size=512, max_epoch=20, rank=0, group_size=1, mode = 'val')
steps_per_epoch = train_dataset.get_dataset_size()

epochs = 3
net = DenseNet121(100)
criterion = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
# criterion = CrossEntropy(smooth_factor=0.0, num_classes=100)
lr_scheduler = CosineAnnealingLR(lr=0.0001,T_max=120,steps_per_epoch=steps_per_epoch,max_epoch=epochs,warmup_epochs=0,eta_min=0)
lr_schedule = lr_scheduler.get_lr()
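# forward_fn does not scale the loss, so the optimizer's loss_scale stays at its default of 1.0;
# a larger value would divide the gradients without a matching scale-up of the loss
opt = Momentum(params=get_param_groups(net),learning_rate=Tensor(lr_schedule),momentum=0.9,weight_decay=0.0001)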
# opt = nn.SGD(params=net.trainable_params(),learning_rate=0.001)

best_acc = 0
for epoch in range(epochs):
    print(f"Epoch [{epoch+1} / {epochs}]")
    train(net, train_dataset, criterion, opt)
    checkpoint_best = val(net, val_dataset,criterion)
    if checkpoint_best > best_acc:
        print('Acc improved from %0.4f to %0.4f' % (best_acc, checkpoint_best))
        best_acc = checkpoint_best
        mindspore.save_checkpoint(net, 'checkpoint/densenet.ckpt')
        print("saving best checkpoint at: {} ".format('checkpoint/densenet.ckpt'))
    else:
        print('Acc did not improve from %0.4f' % (best_acc),"\n-------------------------------")
print("Done!")

Epoch [1 / 3]
Train Loss=4.6541, top1_correct=391, top5_correct=1897, tot=35840, Top1_acc=1.09%, Top5_acc=5.29%
Val Loss=4.6363 top1_correct=136, top5_correct=634, tot=11776, Top1_acc=1.15%, Top5_acc=5.38%
Acc improved from 0.0000 to 0.0115
saving best checkpoint at: checkpoint/densenet.ckpt 
Epoch [2 / 3]
Train Loss=4.6438, top1_correct=362, top5_correct=1791, tot=35840, Top1_acc=1.01%, Top5_acc=5.00%
Val Loss=4.6475 top1_correct=137, top5_correct=589, tot=11776, Top1_acc=1.16%, Top5_acc=5.00%
Acc improved from 0.0115 to 0.0116
saving best checkpoint at: checkpoint/densenet.ckpt 
Epoch [3 / 3]
Train Loss=4.6490, top1_correct=321, top5_correct=1820, tot=35840, Top1_acc=0.90%, Top5_acc=5.08%
Val Loss=4.6559 top1_correct=96, top5_correct=590, tot=11776, Top1_acc=0.82%, Top5_acc=5.01%
Acc did not improve from 0.0116 
-------------------------------
Done!
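The Top-1 and Top-5 bookkeeping in `train` and `val` ranks the logits of each sample and checks whether the ground-truth label is among the k largest. The sketch below reproduces that logic on a toy batch in plain Python; the helper name and the numbers are made up for illustration.

```python
def topk_correct(logits_batch, labels, k):
    """Count samples whose label is among the k highest-scoring classes."""
    correct = 0
    for logits, gt in zip(logits_batch, labels):
        ranked = sorted(range(len(logits)), key=lambda i: logits[i])  # ascending, like argsort
        if gt in ranked[-k:]:
            correct += 1
    return correct


logits = [[0.1, 0.7, 0.2, 0.0],
          [0.4, 0.1, 0.3, 0.2],
          [0.0, 0.1, 0.2, 0.7]]
labels = [1, 2, 0]
print(topk_correct(logits, labels, 1))  # 1
print(topk_correct(logits, labels, 2))  # 2
```

With 100 classes, random guessing gives about 1% Top-1 and 5% Top-5 accuracy, which is the baseline against which the logged values should be read.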


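### 2.6 Evaluating the Model

For evaluation, the test set is first created with the classification_dataset function defined in Section 2.2, and the model parameters are loaded with MindSpore's load_checkpoint. The test set is then fed through the model to compute the Top-1 and Top-5 accuracy of the predicted classes.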

In [14]:
import argparse
import glob
from mindspore.communication.management import get_rank, get_group_size, release
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore.common import dtype as mstype
from mindspore.common import initializer as init
from mindspore import Tensor

In [17]:
def evaluation():
    data_dir = "dataset/test"
    dataset = classification_dataset(data_dir, image_size=[168, 168], per_batch_size=512, max_epoch=20, rank=0, group_size=1, mode = 'val')
    
    network = DenseNet121(100)
    load_checkpoint("checkpoint/densenet.ckpt", net=network)
    
    # network.add_flags_recursive(fp16=True)

    img_tot = 0
    top1_correct = 0
    top5_correct = 0
    network.set_train(False)
    for batch, (data, label) in enumerate(dataset.create_tuple_iterator()):
        logits = network(data)
        logits = logits.asnumpy()
        label = label.asnumpy()
        top1_output = np.argmax(logits, (-1))
        top5_output = np.argsort(logits)[:, -5:]

        t1_correct = np.equal(top1_output, label).sum()
        top1_correct += t1_correct
        top5_correct += get_top5_acc(top5_output, label)
        img_tot += label.shape[0]
        
    acc1 = 100.0 * top1_correct / img_tot
    acc5 = 100.0 * top5_correct / img_tot
    print('Test top1_correct={}, top5_correct={}, tot={}, Top1_acc={:.2f}%, Top5_acc={:.2f}%'.format(top1_correct, top5_correct, img_tot, acc1, acc5))


In [16]:
evaluation()

Test top1_correct=137, top5_correct=589, tot=11776, Top1_acc=1.16%, Top5_acc=5.00%
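The totals in the logs (35840 training images and 11776 evaluation images) are smaller than the split sizes because `classification_dataset` batches with `drop_remainder=True` by default, which discards the last incomplete batch. The sketch below checks the arithmetic with a hypothetical helper, assuming 100 classes with 360 training and 120 held-out images each.

```python
def evaluated_samples(num_samples, batch_size, drop_remainder=True):
    """Number of samples that actually reach the model after batching."""
    full_batches, rest = divmod(num_samples, batch_size)
    if drop_remainder or rest == 0:
        return full_batches * batch_size
    return num_samples


print(evaluated_samples(100 * 120, 512))                        # 11776
print(evaluated_samples(100 * 360, 512))                        # 35840
print(evaluated_samples(100 * 120, 512, drop_remainder=False))  # 12000
```

Passing `drop_remainder=False` for the evaluation datasets would score every image, at the cost of one smaller final batch.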


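## 3 Summary

This case uses the MindSpore framework on the mini-ImageNet dataset. The original dataset is first re-split and loaded into MindSpore datasets, a DenseNet121 model is then built with the framework, and custom training and validation loops are used to train the model and to evaluate its Top-1 and Top-5 accuracy. In the recorded three-epoch run the Top-1 accuracy stays around 1%, which is chance level for 100 classes, so the run demonstrates the workflow rather than a converged model. Working through the case deepens the understanding of the DenseNet structure and its properties, as well as of the APIs that MindSpore provides.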