## **Introduction**

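Convolutional neural networks have driven progress in many computer vision tasks, such as image recognition and object detection. Deploying them on mobile devices, however, remains an open problem, mainly because existing models are **large and slow**. Some research therefore proposes model compression methods such as pruning, quantization, and knowledge distillation, while other work focuses on efficient architecture design, for example MobileNet and ShuffleNet. The paper introduced here designs a new basic building unit, the Ghost module, and uses it to build the lightweight architecture **GhostNet**.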

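It is well known that a trained deep neural network usually contains rich, even **redundant, feature maps**, which help ensure a comprehensive understanding of the input data.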

![image1](images/image-20221010175743115.png)

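Figure 1. Visualization of the feature maps after the first residual block of ResNet-50

As the figure shows, among the feature maps produced by the first residual block of **ResNet-50**, three example pairs of similar feature maps are marked with boxes of the same color (**red, blue, green**). One map in each pair can be obtained from the other through a **cheap operation**, so it can be regarded as a "**ghost**" of the other.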

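In the paper, the authors propose a novel **Ghost module** that generates more feature maps with **fewer parameters**.

Specifically, an **ordinary convolutional layer** in a deep neural network is split into **two parts**. The first part is an **ordinary convolution** whose number of filters is strictly limited. Given the **intrinsic feature maps** from the first part, a series of **simple linear operations** is then applied to **generate more feature maps**.

Compared with an ordinary convolutional layer, the Ghost module needs fewer **parameters** and has lower **computational complexity**, without changing the size of the output feature maps. Building on the Ghost module, the authors construct an efficient architecture, **GhostNet**. They first **replace the original convolutional layers** in benchmark architectures to demonstrate the effectiveness of the Ghost module, and then verify the superiority of GhostNet on several benchmark vision datasets.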

Ordinary convolution: a **convolution** is the process of **sliding** a **kernel** (filter) over the input image to produce a **feature map**.

![image-20221010180117058](./images/image-20221010180117058.png)

Figure 2. Image and convolution kernel



![image-20221010180259527](./images/image-20221010180259527.png)
![image-20221010180324315](./images/image-20221010180324315.png)

Figure 3. The convolution operation

![image-20221010182805976](./images/image-20221010182805976.png)

Figure 4. The convolution operation

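By **changing the values in the kernel matrix** before the convolution, we can perform operations such as edge detection, sharpening, and blurring. In other words, different kernels detect different features in an image, such as edges and curves.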
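
The sliding-window idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the framework implementation: it assumes a single channel, stride 1, and no padding, and, like deep learning frameworks, it does not flip the kernel. The 4x4 image and the vertical-edge kernel are made-up values.

```python
def conv2d_single(image, kernel):
    """Stride-1, no-padding sliding-window convolution on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the products.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# An illustrative vertical-edge kernel.
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = conv2d_single(image, edge_kernel)
print(feature_map)
```

On a constant image the same kernel returns zeros, which is why it responds only where the intensity changes from left to right.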

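![image-20221010183348805](./images/image-20221010183348805.png)

Figure 5. Convolution result

In the figure above, the input has 3 channels, each of size 3x3.

The filter bank holds 3 kernels per output channel (one for each of the 3 input channels) and has 2 output channels; the kernel_size is not specified.

The output therefore has 2 channels.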
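
A rough sketch of the multi-channel case follows, again in plain Python. The weight layout `[C_out][C_in][k][k]` and the kernel size of 2 are assumptions made for illustration only, since the figure does not state the kernel size. Each output channel is the sum of the per-channel convolutions over all input channels.

```python
def conv2d_multi(x, weight):
    """x: [C_in][H][W]; weight: [C_out][C_in][k][k]; no padding, stride 1."""
    c_out, c_in = len(weight), len(weight[0])
    k = len(weight[0][0])
    out_h = len(x[0]) - k + 1
    out_w = len(x[0][0]) - k + 1
    out = []
    for o in range(c_out):
        fmap = [[0] * out_w for _ in range(out_h)]
        for c in range(c_in):  # every output channel sums over all input channels
            for i in range(out_h):
                for j in range(out_w):
                    for di in range(k):
                        for dj in range(k):
                            fmap[i][j] += x[c][i + di][j + dj] * weight[o][c][di][dj]
        out.append(fmap)
    return out


# 3 input channels of size 3x3, all ones (illustrative data).
x = [[[1] * 3 for _ in range(3)] for _ in range(3)]
# 2 output channels, 3 input channels, assumed 2x2 kernels filled with ones.
weight = [[[[1] * 2 for _ in range(2)] for _ in range(3)] for _ in range(2)]
y = conv2d_multi(x, weight)
print(len(y), len(y[0]), len(y[0][0]))  # channels, height, width
print(len(weight) * len(weight[0]))     # number of 2-D kernels
```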

Ordinary convolution
![image-20221011152412155](./images/image-20221011152412155.png)

Ghost module operation
![image-20221011152542091](./images/image-20221011152542091.png)


Figure 6. Ordinary convolution and the Ghost module operation

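Producing an n-channel output from an m-channel input with an ordinary convolution takes m × n kernels, which becomes very inefficient when the channel counts are large. Moreover, the output of an ordinary convolution usually contains rich and even redundant **feature maps**. Research shows that these feature maps can be obtained from one another through **simple linear transformations**.

The Ghost module obtains **part of the feature maps** with an ordinary convolution, as before, and derives **the rest** from these **already computed feature maps** through **linear transformations**. This greatly **reduces the number of convolution operations** and improves efficiency.

Example: producing an n-channel output from an m-channel input with an ordinary convolution takes **mn** kernels.

With a Ghost module, let the number of intermediate feature channels be p, with m<p<n, and let the linear transformation be a **depthwise conv**. The number of kernels is then **mp+(n-p)**.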
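
The two counts can be compared with a short calculation. The channel numbers below are hypothetical and chosen only to satisfy m<p<n; the functions count 2-D kernels exactly as in the formulas above and ignore kernel size.

```python
def conv_kernel_count(m, n):
    """2-D kernels in an ordinary convolution: one per (input, output) channel pair."""
    return m * n


def ghost_kernel_count(m, n, p):
    """2-D kernels in a Ghost module: m*p for the primary convolution,
    plus one depthwise kernel for each of the remaining n - p maps."""
    return m * p + (n - p)


m, n, p = 64, 256, 128  # hypothetical channel counts with m < p < n
plain = conv_kernel_count(m, n)
ghost = ghost_kernel_count(m, n, p)
print(plain, ghost, round(plain / ghost, 2))
```

With p equal to half of n, the saving is close to a factor of two, because the depthwise term n-p is small compared with mp.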

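## GhostNet Architecture

The figure below shows the Ghost bottleneck. It closely resembles a ResNet block, except that the channel dimension is first expanded and then reduced.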

![image-20221011152243605](./images/image-20221011152243605.png)
![image-20221011152249621](./images/image-20221011152249621.png)

Figure 7. Structure of the Ghost bottleneck

The GhostNet architecture is shown below.

<img src="./images/image-20221011152641626.png" alt="image-20221011152641626" style="zoom:50%;" />

Figure 8. The GhostNet architecture

First, import the required packages.

In [1]:
import math
import warnings

import numpy as np
import mindspore as ms
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.ops import operations as P

# Suppress warnings
warnings.filterwarnings("ignore")

Loading the dataset

In [None]:
# Uncomment to run on an Ascend device if one is available
# ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")

# Dataset reader
from mindvision.classification.dataset import Cifar10

# Dataset root directory
data_dir = "./CIFAR10"

# Download, extract, and load the CIFAR-10 training set
download_train = Cifar10(path=data_dir, split="train", batch_size=4096, repeat_num=1, shuffle=True, resize=32, download=True)
dataset_train = download_train.run()

step_size = dataset_train.get_dataset_size()

# Download, extract, and load the CIFAR-10 test set
download_eval = Cifar10(path=data_dir, split="test", batch_size=1024, resize=32, download=True)
dataset_eval = download_eval.run()

Inspecting the dataset

In [None]:
# Visualization
import numpy as np
import matplotlib.pyplot as plt

data = next(dataset_train.create_dict_iterator())

images = data["image"].asnumpy()
labels = data["label"].asnumpy()
print(f"Image shape: {images.shape}, Label: {labels}")

plt.figure()
for i in range(1, 7):
    plt.subplot(2, 3, i)
    image_trans = np.transpose(images[i - 1], (1, 2, 0))
    mean = np.array([0.4914, 0.4822, 0.4465])
    std = np.array([0.2023, 0.1994, 0.2010])
    image_trans = std * image_trans + mean
    image_trans = np.clip(image_trans, 0, 1)
    plt.title(f"{download_train.index2label[labels[i - 1]]}")
    plt.imshow(image_trans)
    plt.axis("off")
plt.show()

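The basic convolution block
Conv2d + BatchNorm + activation. This block is an ordinary convolution; the depthwise convolutions are created later by passing `group` to `nn.Conv2d`.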

In [None]:
class ConvBnAct(nn.Cell):
    def __init__(self, in_chs, out_chs, kernel_size,
                 stride=1, act_layer=nn.ReLU):
        super(ConvBnAct, self).__init__()
        self.conv = nn.Conv2d(in_chs, out_chs, kernel_size, stride, pad_mode='pad', padding=kernel_size // 2,
                              has_bias=False)
        self.bn = nn.BatchNorm2d(out_chs)
        self.act = act_layer()

    def construct(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.act(x)
        return x

Building the GhostModule layer
This is the most important part.

In [None]:
class GhostModule(nn.Cell):
    """Ghost module: a primary convolution followed by a cheap depthwise operation."""

    def __init__(self, in_channels, out_channels, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True):
        super(GhostModule, self).__init__()
        self.out_channels = out_channels
        # Intrinsic feature maps produced by the ordinary convolution
        init_channels = math.ceil(out_channels / ratio)
        # Ghost feature maps produced by the cheap operation
        new_channels = init_channels * (ratio - 1)

        self.primary_conv = nn.SequentialCell(
            nn.Conv2d(in_channels=in_channels, out_channels=init_channels, kernel_size=kernel_size, stride=stride,
                      pad_mode='pad', padding=kernel_size // 2, has_bias=False),
            nn.BatchNorm2d(init_channels),
            nn.ReLU() if relu else nn.SequentialCell(),
        )
        # Depthwise convolution (group=init_channels) serves as the cheap linear operation
        self.cheap_operation = nn.SequentialCell(
            nn.Conv2d(in_channels=init_channels, out_channels=new_channels, kernel_size=dw_size, stride=1,
                      pad_mode='pad', padding=dw_size // 2, group=init_channels, has_bias=False),
            nn.BatchNorm2d(new_channels),
            nn.ReLU() if relu else nn.SequentialCell(),
        )
        # Operators used in construct are created once here
        self.concat = ms.ops.Concat(axis=1)

    def construct(self, x):
        """ construct """
        x1 = self.primary_conv(x)
        x2 = self.cheap_operation(x1)
        out = self.concat((x1, x2))
        # Drop surplus channels when out_channels is not a multiple of ratio
        return out[:, :self.out_channels, :, :]
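
The channel bookkeeping inside `GhostModule` can be checked without MindSpore. The helper below is a hypothetical stand-alone function that repeats the arithmetic of `__init__` and shows why the final slice `[:, :self.out_channels]` is needed when `out_channels` is not a multiple of `ratio`.

```python
import math


def ghost_channel_split(out_channels, ratio=2):
    init_channels = math.ceil(out_channels / ratio)   # from the primary convolution
    new_channels = init_channels * (ratio - 1)        # from the cheap operation
    concat_channels = init_channels + new_channels    # channel count before slicing
    return init_channels, new_channels, concat_channels


for out_channels in (16, 17, 48):
    init_c, new_c, cat_c = ghost_channel_split(out_channels)
    print(out_channels, init_c, new_c, cat_c, "dropped:", cat_c - out_channels)
```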

In [None]:
def _make_divisible(x, divisor=4):
    return int(np.ceil(x * 1. / divisor) * divisor)
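
To see what `_make_divisible` does, here is a standard-library equivalent applied to a few channel counts. The width multiplier of 0.75 is a hypothetical value used only for illustration; the function rounds each scaled count up to the next multiple of the divisor.

```python
import math


def make_divisible(x, divisor=4):
    """Round x up to the nearest multiple of divisor (mirrors _make_divisible above)."""
    return int(math.ceil(x / divisor) * divisor)


width = 0.75  # hypothetical width multiplier
for base in (16, 24, 40, 112):
    print(base, "->", make_divisible(base * width))
```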

In [None]:
def hard_sigmoid(x, inplace: bool = False):
    # Hard sigmoid: ReLU6(x + 3) / 6, so the output stays in [0, 1].
    # `inplace` is accepted for signature compatibility but has no effect.
    relu6 = ms.ops.ReLU6()
    return relu6(x + 3.) / 6.
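
For reference, the hard sigmoid is commonly defined as ReLU6(x + 3) / 6, a piecewise-linear approximation of the sigmoid whose output stays in [0, 1]. The scalar sketch below uses only built-in functions and is meant as an illustration of that definition, not as the tensor implementation.

```python
def relu6(x):
    """Clamp x to the interval [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hard_sigmoid_scalar(x):
    """Hard sigmoid as commonly defined: ReLU6(x + 3) / 6."""
    return relu6(x + 3.0) / 6.0


for x in (-4.0, -3.0, 0.0, 3.0, 6.0):
    print(x, hard_sigmoid_scalar(x))
```

The upper clamp matters: with a plain ReLU in place of ReLU6, an input of 6 would give 1.5 instead of 1.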

In [None]:
class GlobalAvgPooling(nn.Cell):
    def __init__(self, keep_dims=False):
        super(GlobalAvgPooling, self).__init__()
        self.mean = P.ReduceMean(keep_dims=keep_dims)

    def construct(self, x):
        """ construct """
        x = self.mean(x, (2, 3))
        return x

In [None]:
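class SE(nn.Cell):
    def __init__(self, num_out, ratio=0.25):
        super(SE, self).__init__()
        # `ratio` is the squeeze ratio passed in as se_ratio (e.g. 0.25), so the
        # hidden width is num_out * ratio rounded up to a multiple of 4
        num_mid = _make_divisible(num_out * ratio)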
        self.pool = GlobalAvgPooling(keep_dims=True)
        self.conv_reduce = nn.Conv2d(in_channels=num_out, out_channels=num_mid,
                                     kernel_size=1, has_bias=True, pad_mode='pad')
        self.act1 = nn.ReLU()
        self.conv_expand = nn.Conv2d(in_channels=num_mid, out_channels=num_out,
                                     kernel_size=1, has_bias=True, pad_mode='pad')
        self.act2 = hard_sigmoid
        self.mul = P.Mul()

    def construct(self, x):
        """ construct of SE module """
        out = self.pool(x)
        out = self.conv_reduce(out)
        out = self.act1(out)
        out = self.conv_expand(out)
        out = self.act2(out)
        out = self.mul(x, out)
        return out

The bottleneck structure

In [None]:
class GhostBottleneck(nn.Cell):
    """ Ghost bottleneck w/ optional SE"""

    def __init__(self, in_chs, mid_chs, out_chs, dw_kernel_size=3,
                 stride=1, act_layer=nn.ReLU, se_ratio=0.):
        super(GhostBottleneck, self).__init__()
        has_se = se_ratio is not None and se_ratio > 0.
        self.stride = stride

        self.ghost1 = GhostModule(in_chs, mid_chs, relu=True)

        if self.stride > 1:
            self.conv_dw = nn.Conv2d(in_channels=mid_chs, out_channels=mid_chs, kernel_size=dw_kernel_size,
                                     stride=stride, pad_mode='pad',
                                     padding=(dw_kernel_size - 1) // 2,
                                     group=mid_chs, has_bias=False)
            self.bn_dw = nn.BatchNorm2d(mid_chs)

        if has_se:
            self.se = SE(mid_chs, ratio=se_ratio)
        else:
            self.se = None
            
        self.ghost2 = GhostModule(mid_chs, out_chs, relu=False)
        
        if (in_chs == out_chs and self.stride == 1):
            self.shortcut = nn.SequentialCell()
        else:
            self.shortcut = nn.SequentialCell(
                nn.Conv2d(in_channels=in_chs, out_channels=in_chs, kernel_size=dw_kernel_size, stride=stride,
                          pad_mode='pad', padding=(dw_kernel_size - 1) // 2, group=in_chs, has_bias=False),
                nn.BatchNorm2d(in_chs),
                nn.Conv2d(in_channels=in_chs, out_channels=out_chs, kernel_size=1, stride=1,
                          pad_mode='valid', padding=0, has_bias=False),
                nn.BatchNorm2d(out_chs),
            )

    def construct(self, x):
        residual = x
        x = self.ghost1(x)
        if self.stride > 1:
            x = self.conv_dw(x)
            x = self.bn_dw(x)
        if self.se is not None:
            x = self.se(x)
        x = self.ghost2(x)
        x += self.shortcut(residual)
        return x

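The main network

Two points are worth noting:

1. In MindSpore, anything used in the forward pass (`construct`) must be defined in `__init__` as an attribute of `self` before it can be used.

2. The final linear output layer is implemented with `nn.Dense`.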

In [None]:
class GhostNet(nn.Cell):
    def __init__(self, cfgs,in_channel = 3, num_classes=10, width=1.0, dropout=0.2):
        super(GhostNet, self).__init__()
        # setting of inverted residual blocks
        self.cfgs = cfgs

        # ---- building first layer ---- #
        output_channel = _make_divisible(16 * width, 4)  # setting divisible channels | output_channel = 16
        self.conv_stem = nn.Conv2d(in_channels=in_channel,
                                   out_channels=output_channel,
                                   kernel_size=3, 
                                   padding=1, 
                                   stride=2,
                                   has_bias=False, 
                                   pad_mode='pad')  # first conv
        self.bn1 = nn.BatchNorm2d(output_channel)
        self.act1 = nn.ReLU()
        input_channel = output_channel

        # ---- building inverted residual [blocks] ---- #
        stages = nn.SequentialCell()
        block = GhostBottleneck
        for cfg in self.cfgs:
            layers = nn.SequentialCell()
            for k, exp_size, c, se_ratio, s in cfg:
                output_channel = _make_divisible(c * width, 4)
                hidden_channel = _make_divisible(exp_size * width, 4)
                layers.append(block(input_channel, hidden_channel, output_channel, k, s,
                                    se_ratio=se_ratio))
                input_channel = output_channel
            stages.append(layers)

        output_channel = _make_divisible(exp_size * width, 4)
        stages.append(ConvBnAct(input_channel, output_channel, 1))

        input_channel = output_channel
        self.blocks = stages

        # ---- building last several layers ---- #
        output_channel = 128
        self.global_pool = GlobalAvgPooling(keep_dims=True)
        self.conv_head = nn.Conv2d(in_channels=input_channel,
                                   out_channels=output_channel,
                                   kernel_size=1, padding=0, stride=1,
                                   has_bias=True, pad_mode='pad')
        self.act2 = nn.ReLU()
        self.squeeze = P.Flatten()

        # ---- cls head ---- #
        self.final_dropout = dropout
        if self.final_dropout > 0.:
            self.Dropout = nn.Dropout(self.final_dropout)
            
        self.classifier = nn.Dense(output_channel, num_classes,has_bias=True)
        
        self._initialize_weights()

    def construct(self, x):
        # ---- Input Conv ---- #
        x = self.conv_stem(x)  # (bs, 3, H, W) -> (bs, 16, H/2, W/2)
        x = self.bn1(x)
        x = self.act1(x)
        # ---- Blocks Conv ---- #
        x = self.blocks(x)

        # ---- Last Conv ---- #
        x = self.global_pool(x)
        x = self.conv_head(x)
        x = self.act2(x)

        # ---- FC && Cls ---- #
        x = self.squeeze(x)
        if self.final_dropout > 0.:
            x = self.Dropout(x)
        x = self.classifier(x)

        return x
    
    def _initialize_weights(self):
        self.init_parameters_data()
        for _, m in self.cells_and_names():
            if isinstance(m, (nn.Conv2d)):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.set_data(Tensor(np.random.normal(0, np.sqrt(2. / n),
                                                          m.weight.data.shape).astype("float32")))
                if m.bias is not None:
                    m.bias.set_data(
                        Tensor(np.zeros(m.bias.data.shape, dtype="float32")))
            elif isinstance(m, nn.BatchNorm2d):
                m.gamma.set_data(
                    Tensor(np.ones(m.gamma.data.shape, dtype="float32")))
                m.beta.set_data(
                    Tensor(np.zeros(m.beta.data.shape, dtype="float32")))
            elif isinstance(m, nn.Dense):
                m.weight.set_data(Tensor(np.random.normal(
                    0, 0.01, m.weight.data.shape).astype("float32")))
                if m.bias is not None:
                    m.bias.set_data(
                        Tensor(np.zeros(m.bias.data.shape, dtype="float32")))
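
The convolution weights above are drawn from a normal distribution with standard deviation sqrt(2 / n), where n = kernel_h * kernel_w * out_channels. The sketch below reproduces that arithmetic with the standard library for two layers of this network; `random.gauss` stands in for `np.random.normal` purely for illustration.

```python
import math
import random


def he_normal_std(kernel_h, kernel_w, out_channels):
    """Standard deviation used for Conv2d weights: sqrt(2 / (kh * kw * out_channels))."""
    fan_out = kernel_h * kernel_w * out_channels
    return math.sqrt(2.0 / fan_out)


std_stem = he_normal_std(3, 3, 16)    # 3x3 stem convolution with 16 output channels
std_head = he_normal_std(1, 1, 128)   # 1x1 head convolution with 128 output channels
print(round(std_stem, 4), round(std_head, 4))

# Draw a few weights from N(0, std^2).
random.seed(0)
sample = [random.gauss(0.0, std_stem) for _ in range(5)]
print(len(sample))
```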

Configuration list

Refer to the GhostNet architecture figure above.

In [None]:
def ghostnet(**kwargs):
    """
    Constructs a GhostNet model
    """
    cfgs = [
        # kernel_size, exp_size(hidden_size), output_channels, SE_rate, stride
        # stage 1
        [[3, 16, 16, 0, 1]],   
        [[3, 48, 24, 0, 2]],

        # stage 2
        [[3, 72, 24, 0, 1]],
        [[5, 72, 40, 0.25, 2]],

        # stage 3
        [[5, 120, 40, 0.25, 1]],
        [[3, 240, 80, 0, 2]],

        # stage 4
        [[3, 200, 80, 0, 1],
         [3, 184, 80, 0, 1],
         [3, 184, 80, 0, 1],
         [3, 480, 112, 0.25, 1],
         [3, 672, 112, 0.25, 1]],
        [[5, 672, 160, 0.25, 2]],

        # stage 5
        [[5, 960, 160, 0, 1],
         [5, 960, 160, 0.25, 1],
         [5, 960, 160, 0, 1],
         [5, 960, 160, 0.25, 1]]
    ]
    return GhostNet(cfgs, **kwargs)
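
As a sanity check on the configuration, the spatial resolution can be traced through the network by hand. The sketch below assumes, as in the code above, that every strided convolution uses padding = kernel_size // 2 and that stride-1 blocks keep the resolution unchanged; the `(kernel_size, stride)` pairs are copied from `cfgs`.

```python
def conv_out_size(size, kernel, stride):
    """Output size of a convolution with padding = kernel // 2."""
    padding = kernel // 2
    return (size + 2 * padding - kernel) // stride + 1


def trace_resolution(input_size, block_cfgs):
    """Feature-map size after the stem and after every stride-2 bottleneck."""
    size = conv_out_size(input_size, kernel=3, stride=2)  # 3x3 stem, stride 2
    sizes = [size]
    for kernel, stride in block_cfgs:
        if stride > 1:
            size = conv_out_size(size, kernel, stride)
            sizes.append(size)
    return sizes


# (kernel_size, stride) of each bottleneck, in the order listed in cfgs.
block_cfgs = [(3, 1), (3, 2), (3, 1), (5, 2), (5, 1), (3, 2),
              (3, 1), (3, 1), (3, 1), (3, 1), (3, 1), (5, 2),
              (5, 1), (5, 1), (5, 1), (5, 1)]

print(trace_resolution(32, block_cfgs))   # CIFAR-10 sized input
print(trace_resolution(224, block_cfgs))  # ImageNet sized input
```

For 32x32 CIFAR-10 images the feature map is already 1x1 after the last strided block, so the global average pooling that follows has nothing left to average over spatially.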

Check that the network runs end to end

In [None]:
net = ghostnet()
uniform = ms.ops.UniformReal()
# Random dummy batch: 24 images, 3 channels, 36x36 pixels
dummy_input = uniform((24, 3, 36, 36))
output = net(dummy_input)

print(output.shape)

In [None]:
epochs_num = 1000

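Below is the training loss monitor from MindSpore Vision (`mindvision`). We subclass it and add a few features so that it also writes its output directly to a text file.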

In [None]:
import time

from mindvision.engine.callback import LossMonitor

class Mylossmonitor(LossMonitor):
    def __init__(self, fp, lr_init=None, per_print_times=1):
        self.lr_init = lr_init
        self.per_print_times = per_print_times
        self.fp =fp
        self.last_print_time = 0
    
    def epoch_end(self, run_context):
        callback_params = run_context.original_args()
        epoch_mseconds = (time.time() - self.epoch_time) * 1000
        per_step_mseconds = epoch_mseconds / callback_params.batch_num
        print(f"Epoch time: {epoch_mseconds:5.3f} ms, "
              f"per step time: {per_step_mseconds:5.3f} ms, "
              f"avg loss: {np.mean(self.losses):5.3f}", flush=True)
        self.fp.write(f"Epoch time: {epoch_mseconds:5.3f} ms, " +
              f"per step time: {per_step_mseconds:5.3f} ms, " + 
              f"avg loss: {np.mean(self.losses):5.3f}\n")
    
    def step_end(self, run_context):
        """After step end print training info."""
        callback_params = run_context.original_args()
        step_mseconds = (time.time() - self.step_time) * 1000
        loss = callback_params.net_outputs

        if isinstance(loss, (tuple, list)):
            if isinstance(loss[0], ms.Tensor) and isinstance(loss[0].asnumpy(), np.ndarray):
                loss = loss[0]

        if isinstance(loss, ms.Tensor) and isinstance(loss.asnumpy(), np.ndarray):
            loss = np.mean(loss.asnumpy())

        self.losses.append(loss)
        cur_step_in_epoch = (callback_params.cur_step_num - 1) % callback_params.batch_num + 1

        # Boundary check.
        if isinstance(loss, float) and (np.isnan(loss) or np.isinf(loss)):
            raise ValueError(f"Invalid loss, terminate training.")

        def print_info():
            lr_output = self.lr_init[callback_params.cur_step_num - 1] if isinstance(self.lr_init,
                                                                                     list) else self.lr_init
            print(f"Epoch:[{(callback_params.cur_epoch_num - 1):3d}/{callback_params.epoch_num:3d}], "
                  f"step:[{cur_step_in_epoch:5d}/{callback_params.batch_num:5d}], "
                  f"loss:[{loss:5.3f}/{np.mean(self.losses):5.3f}], "
                  f"time:{step_mseconds:5.3f} ms, "
                  f"lr:{lr_output:5.5f}", flush=True)
            self.fp.write(f"Epoch:[{(callback_params.cur_epoch_num - 1):3d}/{callback_params.epoch_num:3d}], " +
                  f"step:[{cur_step_in_epoch:5d}/{callback_params.batch_num:5d}], " +
                  f"loss:[{loss:5.3f}/{np.mean(self.losses):5.3f}], " +
                  f"time:{step_mseconds:5.3f} ms, " +
                  f"lr:{lr_output:5.5f}\n")

        if (callback_params.cur_step_num - self.last_print_time) >= self.per_print_times:
            self.last_print_time = callback_params.cur_step_num
            print_info()
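
The two small pieces of index arithmetic in `step_end` are easy to check in isolation. The helpers below are hypothetical stand-alone versions: one maps a global step number to its 1-based position within the epoch, the other decides whether enough steps have passed since the last printout. The batch count of 13 and the print interval of 5 are illustrative values.

```python
def step_in_epoch(cur_step_num, batch_num):
    """1-based position of a global step inside its epoch (same formula as step_end)."""
    return (cur_step_num - 1) % batch_num + 1


def should_print(cur_step_num, last_print_step, per_print_times):
    """True once at least per_print_times steps have passed since the last printout."""
    return (cur_step_num - last_print_step) >= per_print_times


batch_num = 13  # hypothetical number of batches per epoch
print([step_in_epoch(s, batch_num) for s in (1, 13, 14, 26, 27)])

printed, last = [], 0
for step in range(1, 13):
    if should_print(step, last, per_print_times=5):
        printed.append(step)
        last = step
print(printed)
```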
        

Training

The loss function is SoftmaxCrossEntropyWithLogits.

The optimizer is Momentum (SGD with momentum) with a learning rate of 0.01.
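
As a rough illustration of these two ingredients, the sketch below computes a sparse softmax cross-entropy for one sample and applies a few momentum updates to a single scalar weight. It assumes the standard non-Nesterov update `v = momentum * v + grad; w = w - lr * v`; the logits and the constant gradient are made-up values, and the exact behavior of `nn.Momentum` should be checked against the MindSpore documentation.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample given raw logits and an integer class label."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label]


def momentum_step(weight, velocity, grad, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: v = momentum * v + grad; w = w - lr * v."""
    velocity = momentum * velocity + grad
    weight = weight - lr * velocity
    return weight, velocity


loss = sparse_softmax_cross_entropy([2.0, 1.0, 0.1], label=0)
print(round(loss, 4))

w, v = 1.0, 0.0
for _ in range(3):
    w, v = momentum_step(w, v, grad=1.0)  # constant gradient, purely illustrative
print(w, v)
```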

In [None]:
from mindspore.train.serialization import load_checkpoint, load_param_into_net

net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')

net_opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)

config_ck = ms.CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10)

# `directory` is where the checkpoints are saved
ckpoint = ms.ModelCheckpoint(prefix="GhostNet", directory="./GhostNet", config=config_ck)

# Uncomment to resume from previously saved network parameters
# CKPT_1 = './GhostNet/GhostNet-600_24.ckpt'
# param_dict = load_checkpoint(CKPT_1)
# load_param_into_net(net, param_dict)
# load_param_into_net(net_opt, param_dict)

model = ms.Model(net, loss_fn=net_loss, optimizer=net_opt, metrics={'accuracy'})

import datetime
today = datetime.datetime.today().isoformat()
# Wall-clock timer (time.clock() is unavailable from Python 3.8 onward)
t1 = time.perf_counter()
file_name = './log' + today + '.txt'

with open(file_name, 'w+', buffering=1, encoding='utf-8') as f:
    model.train(epochs_num, dataset_train, callbacks=[ckpoint, Mylossmonitor(f, 0.01, 1875)])

    acc = model.eval(dataset_eval)

    f.write(str(epochs_num) + " : " + str(acc) + '\n')
    t2 = time.perf_counter()
    f.write(f"cost time: {(t2 - t1) * 1000} ms\n")
    # The with-statement closes the file on exit

print("{}".format(acc))