# State-of-the-Art model architecture

This article introduces the CNN models you need to know. The underlying concepts are largely the same across them.

## CNN architecture

- Image classification: feature extractor + classifier
- Image segmentation: backbone + fully convolutional network (FCN)
- Object detection: backbone + classifier

### Image classification

- LeNet, AlexNet, [VGG16](): essentially a feature extractor + classifier
- [ResNet](https://arxiv.org/pdf/1512.03385.pdf): uses residual blocks so the network can be made deeper
  - Identity residual block
  - Convolution residual block
  - BasicBlock
  - Bottleneck
  - ResNeXt: simplified Inception + GConv (grouped convolution)
- [DenseNet](https://arxiv.org/pdf/1608.06993.pdf): concatenates (concat) feature maps so the network can be made deeper
  - Bottleneck layers
  - Compression
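- [MobileNet](https://arxiv.org/pdf/1704.04861.pdf): makes the network lightweight with DW (depthwise) + PW (pointwise) convolutions
  - V1 $\to$ depthwise + pointwise
    - depthwise: one spatial filter per input channel (`in_channels` filters in total), used for feature extraction
    - pointwise: `out_channels` 1x1 convolutions, each spanning all `in_channels` channels, used to combine features
  - V2 $\to$ Expansion layer, Linear Bottleneck
    - Expansion layer: maps a low-dimensional representation to a higher-dimensional one, which can be pictured as an unzip
    - Linear Bottleneck: applying ReLU in a low-dimensional space loses information, so a linear activation function is used instead
  - V3 $\to$ SE, h-swish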
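- [SPP](https://arxiv.org/pdf/1406.4729.pdf): partitions the feature map at multiple scales, then pools each partition
- [ASPP](https://arxiv.org/pdf/1606.00915v2.pdf): atrous (dilated) convolution, sum fusion
- [Inception](https://arxiv.org/pdf/1409.4842.pdf)
  - V1 $\to$ 1x1, 3x3, 5x5, and MaxPool2d branches extract features in parallel (GoogLeNet)
  - V2 $\to$ replaces a 5x5 with two 3x3 convolutions, and an nxn with a 1xn followed by an nx1
  - V3 $\to$ BN in the auxiliary classifier, label smoothing
  - V4 $\to$ Inception + residual connection (Inception-ResNet)
- [Xception](https://arxiv.org/pdf/1610.02357.pdf): applies a 1x1 convolution first, then runs a separate 3x3 convolution on each output channel and concatenates the results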
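- [ShuffleNet](https://arxiv.org/pdf/1807.11164.pdf)
  - V1 $\to$ GConv + channel shuffle
    - GConv: splits the feature maps and convolution kernels into groups along the channel dimension, computes each group separately, then concatenates the results
    - channel shuffle: subdivides the feature maps of each GConv group and regroups them, so that every group contains information from the other groups
  - V2 $\to$ network design guidelines and a new block
    - Keep the input channel count as close to the output channel count as possible
    - Do not blindly increase the number of groups
    - Avoid too many branches in the network
    - Minimize element-wise operations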
- [SENet](https://arxiv.org/pdf/1709.01507.pdf): feature recalibration
  - feature recalibration: predicts one scalar for each feature map and uses it as that feature map's weight
- [PolyNet](https://arxiv.org/pdf/1611.05725.pdf): replaces the conv inside the Inception module with Inception modules, with shared weights
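- [NASNet](https://arxiv.org/pdf/1707.07012.pdf): uses PPO to learn the network Block automatically, but the overall network still has to be defined by hand
  - Normal Cell: the output feature map has the same size as the input feature map
  - Reduction Cell: downsamples the input feature map once (the convolution stride defaults to 2)
  - Steps:
    - Choose one of the outputs produced before layer $h_i$ as the input of hidden layer A
    - Choose one of the outputs produced before layer $h_i$ as the input of hidden layer B
    - Choose an operation for A's feature map (separable convolution, dilated convolution, convolution, ...)
    - Choose an operation for B's feature map (separable convolution, dilated convolution, convolution, ...)
    - Choose add or concat to merge A and B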
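- [AmoebaNet](https://arxiv.org/pdf/1802.01548.pdf): follows NASNet and learns the network architecture automatically
  - Steps:
    - Randomly initialize P individuals (networks) and add them to the population; train each one, validate its accuracy, and add all P individuals to the history
    - While the history holds fewer than C individuals, randomly draw S samples from the population and pick the most accurate one as the parent
    - Mutate the parent to obtain a child, then train the child and validate its accuracy
    - Add the child to both the history and the population, and remove the oldest individual from the population (not the least accurate one)
    - Repeat the steps above until the history size reaches C, then select the most accurate network
  - Mutation (a sub-network contains 5 Blocks):
    - Decide whether to change a Block's input (hidden layer) or its operation (dilated convolution, pooling, ...) (one of two)
    - Pick a cell type (Normal cell or Reduction cell) (one of two)
    - Pick the Block to mutate (one of five)
    - Pick one of the Block's input branches (hidden layer A or hidden layer B) (one of two)
    - Apply the input change or operation change chosen in the first step to that Block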
- [EfficientNet](https://arxiv.org/pdf/1905.11946.pdf): under a limited resource budget, scales resolution, depth, and width together to obtain the best model
  - depth: $d=\alpha^\phi$
  - width: $w=\beta^\phi$
  - resolution: $r=\gamma^\phi$
  - subject to $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$ and $\alpha, \beta, \gamma \geq 1$
  - Steps:
    - Fix $\phi=1$ and use grid search to find the best $\alpha, \beta, \gamma$
    - Fix $\alpha, \beta, \gamma$ and vary $\phi$ to generate EfficientNet-B1 to B7
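
The EfficientNet scaling rule above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the coefficients $\alpha=1.2$, $\beta=1.1$, $\gamma=1.15$ reported in the EfficientNet paper for the B0 baseline. The function names `compound_scale` and `flops_factor` are hypothetical helpers, not part of any library.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi


def flops_factor(alpha=1.2, beta=1.1, gamma=1.15):
    """Value of alpha * beta^2 * gamma^2, which the constraint keeps close to 2."""
    return alpha * beta ** 2 * gamma ** 2


print(f"alpha * beta^2 * gamma^2 = {flops_factor():.3f}")
for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```

Because the product is held near 2, raising $\phi$ by one roughly doubles the FLOPs of the scaled network.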

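The depthwise + pointwise split listed under MobileNet V1 can be sketched in PyTorch with the `groups` argument of `nn.Conv2d`. This is a simplified illustration: the class name `DepthwiseSeparableConv` and the helper `count_params` are hypothetical, and the BatchNorm and ReLU layers that MobileNet places after each convolution are left out to keep the parameter comparison clear.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (one filter per channel) followed by a 1x1 pointwise conv."""

    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        # groups=in_channels makes every filter see exactly one input channel
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_channels, bias=False)
        # 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


def count_params(module):
    return sum(p.numel() for p in module.parameters())


separable = DepthwiseSeparableConv(32, 64)
standard = nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=False)

x = torch.randn(1, 32, 56, 56)
print(separable(x).shape)      # same output shape as the standard convolution
print(count_params(separable))  # 32*3*3 + 32*64 = 2336
print(count_params(standard))   # 32*64*3*3 = 18432
```

For this layer the separable version needs roughly one eighth of the weights of the standard 3x3 convolution.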

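The channel shuffle used by ShuffleNet V1 is commonly written as a reshape, transpose, reshape sequence. The sketch below assumes the channel count is divisible by the number of groups. The function name `channel_shuffle` is chosen here for illustration.

```python
import torch


def channel_shuffle(x, groups):
    """Interleave channels so each group receives channels from every other group."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channels must be divisible by groups"
    # (N, C, H, W) -> (N, groups, C // groups, H, W)
    x = x.view(n, groups, c // groups, h, w)
    # Swap the group axis and the per-group channel axis, then flatten back
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)


# Six channels labelled 0..5, split into two groups: [0, 1, 2] and [3, 4, 5]
x = torch.arange(6).view(1, 6, 1, 1)
shuffled = channel_shuffle(x, groups=2)
print(shuffled.flatten().tolist())  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, each consecutive pair of channels contains one channel from each original group, so the next grouped convolution mixes information across groups.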

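The AmoebaNet search loop listed above (sample S, mutate the best, drop the oldest) can be sketched on a toy problem. Everything problem-specific here is a stand-in: an "architecture" is just a list of 5 operation indices, and `toy_fitness` replaces the expensive train-and-validate step. The names `aging_evolution`, `random_arch`, `mutate`, and `toy_fitness` are hypothetical. P, C, and S have the same meaning as in the steps above.

```python
import collections
import random

NUM_BLOCKS, NUM_OPS = 5, 4


def random_arch(rng):
    return [rng.randrange(NUM_OPS) for _ in range(NUM_BLOCKS)]


def mutate(arch, rng):
    # Change the operation of one randomly chosen block
    child = list(arch)
    child[rng.randrange(NUM_BLOCKS)] = rng.randrange(NUM_OPS)
    return child


def toy_fitness(arch):
    # Stand-in for "train the network, then measure validation accuracy"
    return sum(arch)


def aging_evolution(fitness, P=10, C=100, S=3, seed=0):
    rng = random.Random(seed)
    population = collections.deque()  # oldest individual sits on the left
    history = []

    # Initialize P random individuals
    while len(population) < P:
        arch = random_arch(rng)
        individual = (arch, fitness(arch))
        population.append(individual)
        history.append(individual)

    # Evolve until C individuals have been evaluated in total
    while len(history) < C:
        sample = rng.sample(list(population), S)
        parent = max(sample, key=lambda ind: ind[1])
        child_arch = mutate(parent[0], rng)
        child = (child_arch, fitness(child_arch))
        population.append(child)
        history.append(child)
        population.popleft()  # remove the oldest, not the worst

    return max(history, key=lambda ind: ind[1])


best_arch, best_score = aging_evolution(toy_fitness)
print(best_arch, best_score)
```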
# ResNet

In [None]:
import torchvision

In [None]:
torchvision.models.resnet50()

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

In [None]:
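import torch
import torch.nn as nn


class BottleNeck(nn.Module):
    # The last 1x1 convolution widens the output to out_channels * expansion
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super().__init__()

        # 1x1 reduce -> 3x3 (carries the stride) -> 1x1 expand
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.relu = nn.ReLU(inplace=True)

        # Optional projection so the shortcut matches the shape of the main path
        self.downsample = downsample

    def forward(self, x):
        identity = x

        x = self.relu(self.bn1(self.conv1(x)))
        x = self.relu(self.bn2(self.conv2(x)))
        x = self.bn3(self.conv3(x))

        # Project the shortcut (not the main path) when the shapes differ
        if self.downsample is not None:
            identity = self.downsample(identity)

        x = self.relu(x + identity)
        return x


class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes):
        super().__init__()

        self.in_channels = 3
        out_channels = 64

        self.conv1 = nn.Conv2d(self.in_channels, out_channels, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)

        self.in_channels = out_channels

        # Stage widths and strides follow the torchvision ResNet-50 layout printed above
        self.layer1 = self._make_layer(block, 64, layers[0], 1)
        self.layer2 = self._make_layer(block, 128, layers[1], 2)
        self.layer3 = self._make_layer(block, 256, layers[2], 2)
        self.layer4 = self._make_layer(block, 512, layers[3], 2)

        # Adaptive pooling keeps the classifier independent of the input resolution
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, out_channels, num_blocks, stride=1):
        expanded = out_channels * block.expansion
        downsample = None
        if stride != 1 or self.in_channels != expanded:
            # 1x1 projection shortcut for the first block of the stage
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, expanded, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(expanded)
            )

        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = expanded

        # The first block is already added, so only num_blocks - 1 remain
        for _ in range(1, num_blocks):
            layers.append(block(self.in_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))
        x = torch.flatten(self.avg_pool(x), 1)
        return self.fc(x)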



        

In [None]:
import torch.nn as nn

class CBR(nn.Module):

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super().__init__()

        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.layers(x)

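class IRB(nn.Module):
    """Identity residual block: the shortcut is the unmodified input."""

    def __init__(self, in_channels, out_channels, activation=nn.ReLU()):
        super().__init__()

        # The identity shortcut only works when the channel count is unchanged
        assert in_channels == out_channels

        # Main path: two Conv-BN-ReLU units, then Conv-BN without activation
        self.layers = nn.Sequential(
            CBR(in_channels, out_channels, 3, 1, 1),
            CBR(out_channels, out_channels, 3, 1, 1),

            nn.Conv2d(out_channels, out_channels, 3, 1, 1),
            nn.BatchNorm2d(out_channels)
        )

        # Applied after the residual addition
        self.activation = activation

    def forward(self, x):
        identity = x
        x = self.layers(x)
        x = x + identity
        x = self.activation(x)
        return x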

  
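class CRB(nn.Module):
    """Convolution residual block: a 1x1 convolution projects the shortcut."""

    def __init__(self, in_channels, out_channels, activation=nn.ReLU()):
        super().__init__()

        self.layers = nn.Sequential(
            CBR(in_channels, out_channels, 3, 1, 1),
            CBR(out_channels, out_channels, 3, 1, 1),

            nn.Conv2d(out_channels, out_channels, 3, 1, 1),
            nn.BatchNorm2d(out_channels)
        )

        # Projection shortcut so the addition works when in_channels != out_channels
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, 1, 0),
            nn.BatchNorm2d(out_channels)
        )

        self.activation = activation

    def forward(self, x):
        identity = self.shortcut(x)
        x = self.layers(x)
        x = x + identity
        x = self.activation(x)
        return x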