<a href="https://colab.research.google.com/github/OUCTheoryGroup/colab_demo/blob/master/MobileNetV2_CIFAR10.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## MobileNet V2

Paper: MobileNetV2: Inverted Residuals and Linear Bottlenecks, *CVPR* 2018

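**Main problems with MobileNet V1:** The architecture is very simple, but it does not use the residual learning introduced in ResNet. In addition, although depthwise convolution does reduce the computation considerably, in practice many of the trained depthwise kernels turn out to be empty.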

**Main change 1 in MobileNet V2: the inverted residual block**

![alt text](https://gaopursuit.oss-cn-beijing.aliyuncs.com/202003/20200309092536334.jpg)

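In the ResNet bottleneck, a 1x1 convolution first reduces the number of channels from 256 to 64 before the 3x3 convolution is applied; otherwise the 3x3 convolution in the middle would be too expensive. The bottleneck is therefore wide at both ends and narrow in the middle, which is where the name comes from.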

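Here the 3x3 convolution in the middle can be made depthwise, which costs very little computation, so it can afford more channels. MobileNet V2 therefore first uses a 1x1 convolution to increase the number of channels, then applies a 3x3 depthwise convolution, and finally uses a 1x1 convolution to reduce the dimensionality again. The authors call this an inverted residual block: wide in the middle and narrow at both ends.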
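To make the cost argument concrete, here is a minimal sketch that counts the weights of a standard 3x3 convolution and of a depthwise 3x3 convolution. The helper `conv_weights` is hypothetical, and the 384-channel width (64 expanded by a factor of 6) is an illustrative assumption rather than a value taken from the figure.

```python
def conv_weights(in_ch, out_ch, kernel, groups=1):
    """Number of weights in a bias-free 2D convolution."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    return (in_ch // groups) * out_ch * kernel * kernel

# ResNet-style bottleneck: the 3x3 conv runs on the narrow 64-channel tensor.
standard_narrow = conv_weights(64, 64, 3)                 # 36,864
# The same 3x3 conv on the wide 256-channel tensor would be 16x larger.
standard_wide = conv_weights(256, 256, 3)                 # 589,824
# Depthwise 3x3 on an expanded 384-channel tensor: one 3x3 filter per channel.
depthwise_wide = conv_weights(384, 384, 3, groups=384)    # 3,456

print(standard_narrow, standard_wide, depthwise_wide)
```

Even on a tensor six times wider than the ResNet bottleneck's middle layer, the depthwise convolution needs roughly a tenth of the weights, which is why the block can afford to be wide in the middle.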

**Main change 2 in MobileNet V2: removing the ReLU6 at the block output**

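MobileNet V1 uses ReLU6, which is an ordinary ReLU whose output is capped at 6, so that good numerical resolution is retained on mobile devices that compute in low precision such as float16 or int8. The output at the end of the block is low-dimensional, and applying ReLU to it would lose information, so the final ReLU is removed (note that the part marked in red in the figure below has no ReLU).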
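As a small illustration of both points, the sketch below defines ReLU6 in plain Python and shows how ReLU can make two different low-dimensional feature vectors indistinguishable. The function names and the sample vectors are hypothetical; in PyTorch the same activation is available as `torch.nn.functional.relu6`.

```python
def relu(x):
    """Ordinary ReLU: max(x, 0)."""
    return max(x, 0.0)

def relu6(x):
    """ReLU capped at 6: min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)

print([relu6(v) for v in (-2.0, 0.5, 6.0, 9.5)])   # [0.0, 0.5, 6.0, 6.0]

# Two different 2-D feature vectors collapse to the same vector after ReLU,
# so the difference between them is lost for all later layers.
a, b = [-1.0, 2.0], [-3.0, 2.0]
print([relu(v) for v in a] == [relu(v) for v in b])   # True
```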

![alt text](https://gaopursuit.oss-cn-beijing.aliyuncs.com/202003/20200309142715231.jpg)

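Below is the code for the inverted residual block. The main idea is:

expand + depthwise + pointwise, where expand means increasing the number of feature maps. Note that when the stride is 1, a shortcut is added; when the stride is 2, the goal is to reduce the feature-map size, so no shortcut is needed.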
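The stride rule can be checked with the usual convolution output-size formula. This is a minimal sketch; `conv_out_size` and `can_add_shortcut` are hypothetical helpers, and the defaults (3x3 kernel, padding 1) are assumed to match the depthwise convolution used in the block.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def can_add_shortcut(in_size, stride):
    """An element-wise sum needs input and output maps of equal spatial size."""
    return conv_out_size(in_size, stride=stride) == in_size

print(conv_out_size(32, stride=1), can_add_shortcut(32, stride=1))   # 32 True
print(conv_out_size(32, stride=2), can_add_shortcut(32, stride=2))   # 16 False
```

With stride 2 the output is half the size of the input, so the two tensors cannot be added element-wise without also downsampling the shortcut path.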



In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim

class Block(nn.Module):
    '''expand + depthwise + pointwise'''
    def __init__(self, in_planes, out_planes, expansion, stride):
        super(Block, self).__init__()
        self.stride = stride
        # Use expansion to increase the number of feature maps
        planes = expansion * in_planes
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, groups=planes, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(out_planes)

        # Stride 1 with different in/out channel counts: use a 1x1 convolution to match the channels
        if stride == 1 and in_planes != out_planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(out_planes))
        # Stride 1 with identical in/out channel counts: pass the input through unchanged
        if stride == 1 and in_planes == out_planes:
            self.shortcut = nn.Sequential()

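    def forward(self, x):
        # Note: plain ReLU is used here rather than ReLU6
        out = F.relu(self.bn1(self.conv1(x)))    # 1x1 expand
        out = F.relu(self.bn2(self.conv2(out)))  # 3x3 depthwise
        out = self.bn3(self.conv3(out))          # 1x1 pointwise, no ReLU
        # Stride 1: add the shortcut
        if self.stride == 1:
            return out + self.shortcut(x)
        # Stride 2: return the output directly
        else:
            return out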

## Building the MobileNetV2 network

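Note: because CIFAR10 images are 32x32, the network is modified slightly. In the code below, the first convolution and the 24-channel stage use stride 1 where the paper uses stride 2, and the final average pooling covers a 4x4 feature map.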

In [0]:
class MobileNetV2(nn.Module):
    # (expansion, out_planes, num_blocks, stride)
    cfg = [(1,  16, 1, 1),
           (6,  24, 2, 1), 
           (6,  32, 3, 2),
           (6,  64, 4, 2),
           (6,  96, 3, 1),
           (6, 160, 3, 2),
           (6, 320, 1, 1)]

    def __init__(self, num_classes=10):
        super(MobileNetV2, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.layers = self._make_layers(in_planes=32)
        self.conv2 = nn.Conv2d(320, 1280, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn2 = nn.BatchNorm2d(1280)
        self.linear = nn.Linear(1280, num_classes)

    def _make_layers(self, in_planes):
        layers = []
        for expansion, out_planes, num_blocks, stride in self.cfg:
            strides = [stride] + [1]*(num_blocks-1)
            for stride in strides:
                layers.append(Block(in_planes, out_planes, expansion, stride))
                in_planes = out_planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layers(out)
        out = F.relu(self.bn2(self.conv2(out)))
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
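
The effect of the strides in `cfg` on a 32x32 input can be traced with plain arithmetic. The sketch below copies `cfg` from the class above; `trace_shapes` is a hypothetical helper, and it assumes 3x3 depthwise convolutions with padding 1, as in `Block`.

```python
# (expansion, out_planes, num_blocks, stride), copied from MobileNetV2.cfg
cfg = [(1, 16, 1, 1), (6, 24, 2, 1), (6, 32, 3, 2), (6, 64, 4, 2),
       (6, 96, 3, 1), (6, 160, 3, 2), (6, 320, 1, 1)]

def trace_shapes(size=32):
    """Return (channels, spatial size) after each stage of the network."""
    shapes = []
    for _expansion, out_planes, _blocks, stride in cfg:
        # Only the first block of a stage uses the stage stride; the
        # remaining blocks use stride 1 and keep the size unchanged.
        size = (size + 2 * 1 - 3) // stride + 1
        shapes.append((out_planes, size))
    return shapes

shapes = trace_shapes()
total_blocks = sum(blocks for _, _, blocks, _ in cfg)

for channels, size in shapes:
    print(f"{channels:4d} channels, {size}x{size}")
print("total blocks:", total_blocks)
```

The last stage yields a 320-channel 4x4 map, which is why `forward` can use `F.avg_pool2d(out, 4)` to reduce the 1280-channel output of `conv2` to a single 1280-dimensional vector for the linear layer.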

## Creating the DataLoaders

In [0]:
# Train on the GPU; in Colab this can be set via the menu "Runtime" -> "Change runtime type"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,  download=True, transform=transform_train)
testset  = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=False, num_workers=2)
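
A quick back-of-the-envelope check of what these loaders produce, assuming the standard CIFAR10 split of 50,000 training and 10,000 test images and the `DataLoader` default of keeping the last, smaller batch. The variable names are illustrative only.

```python
import math

train_images, test_images = 50_000, 10_000   # standard CIFAR10 split sizes
batch_size = 128

# The final batch of each epoch is kept even though it is smaller.
train_batches = math.ceil(train_images / batch_size)
test_batches = math.ceil(test_images / batch_size)
last_train_batch = train_images - (train_batches - 1) * batch_size

# RandomCrop(32, padding=4): pad to 40x40, then choose a 32x32 window.
padded = 32 + 2 * 4
crop_offsets = (padded - 32 + 1) ** 2

print(train_batches, test_batches, last_train_batch, crop_offsets)
# 391 79 80 81
```

With 391 minibatches per epoch, a log statement that fires every 100 iterations prints four lines per epoch (minibatches 1, 101, 201 and 301).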

Instantiate the network

In [0]:
# Move the network to the GPU
net = MobileNetV2().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)
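
As a sanity check on the loss function, the sketch below computes the cross-entropy of a single sample in plain Python. `cross_entropy` is a hypothetical helper that mirrors the per-sample quantity `nn.CrossEntropyLoss` averages over a batch, namely the negative log of the softmax probability of the target class.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    peak = max(logits)   # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[target]

# An untrained 10-class model that scores every class equally.
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(round(uniform_loss, 3))   # 2.303, i.e. ln(10)
```

A freshly initialised network should therefore start near a loss of ln(10) ≈ 2.303, which is consistent with the first logged value of 2.309 in the training output.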

## Training the model

In [16]:
for epoch in range(10):  # loop over the dataset multiple times
    for i, (inputs, labels) in enumerate(trainloader):
        inputs = inputs.to(device)
        labels = labels.to(device)
        # Zero the optimizer's gradients
        optimizer.zero_grad()
        # Forward pass + backward pass + optimization step
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # Print statistics
        if i % 100 == 0:
            print('Epoch: %d Minibatch: %5d loss: %.3f' %(epoch + 1, i + 1, loss.item()))

print('Finished Training')

Epoch: 1 Minibatch:     1 loss: 2.309
Epoch: 1 Minibatch:   101 loss: 1.566
Epoch: 1 Minibatch:   201 loss: 1.462
Epoch: 1 Minibatch:   301 loss: 1.226
Epoch: 2 Minibatch:     1 loss: 1.218
Epoch: 2 Minibatch:   101 loss: 1.381
Epoch: 2 Minibatch:   201 loss: 1.089
Epoch: 2 Minibatch:   301 loss: 1.117
Epoch: 3 Minibatch:     1 loss: 1.056
Epoch: 3 Minibatch:   101 loss: 0.964
Epoch: 3 Minibatch:   201 loss: 0.877
Epoch: 3 Minibatch:   301 loss: 1.126
Epoch: 4 Minibatch:     1 loss: 0.841
Epoch: 4 Minibatch:   101 loss: 0.734
Epoch: 4 Minibatch:   201 loss: 0.775
Epoch: 4 Minibatch:   301 loss: 0.738
Epoch: 5 Minibatch:     1 loss: 0.739
Epoch: 5 Minibatch:   101 loss: 0.812
Epoch: 5 Minibatch:   201 loss: 0.708
Epoch: 5 Minibatch:   301 loss: 0.544
Epoch: 6 Minibatch:     1 loss: 0.769
Epoch: 6 Minibatch:   101 loss: 0.683
Epoch: 6 Minibatch:   201 loss: 0.486
Epoch: 6 Minibatch:   301 loss: 0.583
Epoch: 7 Minibatch:     1 loss: 0.559
Epoch: 7 Minibatch:   101 loss: 0.596
Epoch: 7 Min

## Testing the model

In [17]:
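correct = 0
total = 0

net.eval()  # use BatchNorm running statistics instead of per-batch statistics
with torch.no_grad():  # gradients are not needed for evaluation
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        # The predicted class is the index of the largest logit
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %.2f %%' % (
    100 * correct / total))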

Accuracy of the network on the 10000 test images: 82.13 %
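
The accuracy computation itself can be reproduced in plain Python on a toy batch. This is a sketch only: `accuracy_percent`, the logits and the labels are made-up illustrative values, not outputs of the trained network.

```python
def accuracy_percent(logits_batch, labels):
    """Percentage of samples whose highest-scoring class equals the label."""
    correct = 0
    for logits, label in zip(logits_batch, labels):
        # Same role as torch.max(outputs, 1): index of the largest logit.
        predicted = max(range(len(logits)), key=lambda c: logits[c])
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)

logits_batch = [[0.1, 2.0, -1.0],
                [1.5, 0.2, 0.3],
                [0.0, 0.1, 3.0],
                [0.9, 0.8, 0.1]]
labels = [1, 0, 2, 1]

print(accuracy_percent(logits_batch, labels))   # 75.0
```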
