# Modify the width of the code block (can be ignored)

In [12]:
from IPython.display import display, HTML

display(HTML("<style>.container { width:90% !important; }</style>"))  # Change the width percentage (0-100) as needed.

---

# EfficientNetV2: Smaller Models and Faster Training
[(Tan & Le, 2021)](https://arxiv.org/pdf/2104.00298.pdf)
<img src="Assets/m4.png" width="70%"/>

## Abstract
In EfficientNetV1, the authors focused on accuracy, parameter efficiency, and FLOPs. However, FLOPs are only an indirect indicator of inference speed and cannot serve as the sole speed criterion ([Ma, Zhang, Zheng, & Sun, 2018](https://arxiv.org/abs/1807.11164)).
1. **FLOPs ignore important factors that affect speed**
 * **MAC (memory access cost):** operations such as group convolution have few FLOPs but a high memory access cost, which takes up a large share of the running time.
 * **Degree of parallelism:** the number of operations the hardware can execute at the same time. At the same FLOPs, a model with high parallelism may run faster than one with low parallelism.
2. **Different hardware:** at the same FLOPs, the same model runs at different speeds on different hardware platforms.

In EfficientNetV2, the authors aim to improve training speed while maintaining parameter efficiency. The table below shows that V2 trains and runs inference faster than V1.

<img src="Assets/m7.png" width="40%"/>

### EfficientNetV2 is worth paying attention to
In past research, the main focus was on accuracy and parameter efficiency. In recent years, accuracy gains have started to saturate, and attention has shifted to the training and inference speed of the network. EfficientNetV2 makes two main contributions:
1. It introduces the Fused-MBConv module.
2. It introduces a progressive learning strategy (faster training).

## EfficientNetV1 vs EfficientNetV2
### 1. Training is very slow when the training images are large.
Reducing the training image size speeds up training and allows a larger batch_size.

<img src="Assets/m8.png" width="60%"/>

### 2. Depthwise convolutions are slow in the shallow layers of the network.

**Depthwise convolutions** in the early (shallow) stages can be very slow because they cannot fully utilize existing accelerators: their FLOPs are small in theory, but in practice they do not run as fast as expected. The authors therefore introduced **Fused-MBConv**.

<img src="Assets/m9.png" width="60%"/>
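
To make the FLOPs-versus-speed point concrete, the sketch below counts the weights (and therefore the multiply-accumulates per output pixel) in the front half of each block. The helper names and the 24-channel, expansion-4 example are illustrative assumptions, not measurements from the paper.

```python
def mbconv_front_weights(c_in: int, expand: int, k: int = 3) -> int:
    """Weights in the 1x1 expansion conv plus the k x k depthwise conv."""
    c_mid = c_in * expand
    pointwise = c_in * c_mid          # 1x1 conv: c_in -> c_mid
    depthwise = c_mid * k * k         # one k x k filter per channel
    return pointwise + depthwise


def fused_front_weights(c_in: int, expand: int, k: int = 3) -> int:
    """Weights in the single k x k regular conv that replaces both layers."""
    c_mid = c_in * expand
    return c_in * c_mid * k * k


mb = mbconv_front_weights(24, 4)      # 24*96 + 96*9
fused = fused_front_weights(24, 4)    # 24*96*9
print(mb, fused, fused / mb)
```

The fused layer carries several times more weights and FLOPs than the depthwise pair, which is why its speed advantage in shallow stages cannot be read off the FLOP count alone.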

### 3. Scaling up every stage equally is sub-optimal.
In EfficientNetV1, the depth and width of every stage are scaled up by the same factor. However, the stages contribute differently to training speed and parameter count, so applying one scaling factor to all of them is sub-optimal. The authors therefore use a non-uniform scaling strategy.

<img src="Assets/m10.png" width="50%"/>

##### EfficientNetV1 [(Tan & Le, 2019)](https://arxiv.org/pdf/1905.11946.pdf)
1. **input_size** is the size of the images fed to the network during training.
2. **width_coefficient** is the scaling factor for the number of channels.
3. **depth_coefficient** is the scaling factor for the depth (number of layers).
4. **drop_connect_rate** is the maximum drop rate of the Stochastic Depth layer in MBConv (the rate increases from 0 to drop_connect_rate across the blocks).
5. **dropout_rate** is the drop rate of the dropout layer before the last FC layer.
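
As a rough illustration of how **width_coefficient** and **depth_coefficient** are applied, the sketch below scales a channel count and a repeat count. It follows a rounding convention common in EfficientNet implementations; the function names and the divisor of 8 are assumptions, not something this notebook defines.

```python
import math


def round_filters(filters: int, width_coefficient: float, divisor: int = 8) -> int:
    """Scale a channel count and round it to a multiple of `divisor`."""
    scaled = filters * width_coefficient
    new_filters = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
    # Do not let rounding shrink the layer by more than 10%.
    if new_filters < 0.9 * scaled:
        new_filters += divisor
    return new_filters


def round_repeats(repeats: int, depth_coefficient: float) -> int:
    """Scale the number of block repeats in a stage, rounding up."""
    return int(math.ceil(depth_coefficient * repeats))


print(round_filters(32, 1.4), round_repeats(3, 1.8))
```

Because V1 applies the same two coefficients to every stage, all stages grow together, which is the uniform scaling that EfficientNetV2 moves away from.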

## Result
1. The new network, EfficientNetV2, outperforms previous networks in training speed and parameter efficiency.
2. An improved progressive learning method is proposed: it dynamically adjusts the regularization strength (Dropout, RandAugment, Mixup) according to the image size, improving both training speed and accuracy.
3. Training is up to 11 times faster (EfficientNetV2-M compared with EfficientNet-B7), and the number of parameters is reduced by up to 6.8 times.

<img src="Assets/m11.png" width="50%"/>
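
The progressive learning idea (small images with weak regularization early, large images with strong regularization late) can be sketched as a linear schedule over training stages. The function name, the number of stages, and the ranges below are illustrative assumptions, not settings taken from this notebook.

```python
def progressive_schedule(num_stages, size_range, reg_ranges):
    """Linearly interpolate image size and regularization strength per stage."""
    s0, s1 = size_range
    schedule = []
    for i in range(num_stages):
        t = i / (num_stages - 1) if num_stages > 1 else 1.0
        stage = {"image_size": int(round(s0 + (s1 - s0) * t))}
        for name, (lo, hi) in reg_ranges.items():
            stage[name] = lo + (hi - lo) * t
        schedule.append(stage)
    return schedule


# Illustrative ranges only; tune them for your own model and dataset.
schedule = progressive_schedule(
    num_stages=4,
    size_range=(128, 300),
    reg_ranges={"dropout": (0.1, 0.3), "randaugment_magnitude": (5, 15)},
)
for stage in schedule:
    print(stage)
```

Each training stage would then rebuild its data pipeline with the stage's image size and apply the matching regularization strength.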



***

# Model Implementation

<h3>Note: Swish (nn.SiLU) requires torch >= 1.7</h3>

In [13]:
import torch
print(f"Pytorch version: {torch.__version__}")

Pytorch version: 1.8.1


# Import libraries

In [29]:
import os
import sys
import glob

import json
import pickle

import random
import math

from collections import OrderedDict
from functools import partial
from typing import Callable, Optional

from PIL import Image
import matplotlib.pyplot as plt
import cv2

import torch
import torch.nn as nn
from torch import Tensor

import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler
from torch.utils.tensorboard import SummaryWriter
from torchsummary import summary

from torch.utils.data import Dataset
import torchvision.transforms as transforms

from tqdm.notebook import tqdm

## Deep Networks with Stochastic Depth

<img src="Assets/m2.png" width="50%" style="float: left;"/>

##### Deep Networks with Stochastic Depth [(Huang, Sun, Liu, Sedra, & Weinberger, 2016)](https://arxiv.org/pdf/1603.09382.pdf)
<p>
1. Improves training speed.<br>
2. Gives a small increase in accuracy.<br>
3. In EfficientNetV2, drop_prob ranges from 0 to 0.2 (red block).<br>
4. Applied in the layer labelled <b>Dropout</b> in Fused-MBConv and MBConv.
</p>

Drops paths (Stochastic Depth) per sample when applied in the main path of residual blocks.
This function is taken from [rwightman's pytorch-image-models](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/layers/drop.py#L140).

In [15]:
#================================================= * Define DropPath Function * =================================================#

def drop_path(x, drop_prob: float = 0., training: bool = False):
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype = x.dtype, device = x.device)
    random_tensor.floor_()  # binarize
    output = x.div(keep_prob) * random_tensor
    return output

#=================================================== * Define Model Class * =====================================================#

class DropPath(nn.Module):
    def __init__(self, drop_prob = None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob
    
    def forward(self, x):
        # self.training is False after model.eval(), which disables drop path.
        return drop_path(x, self.drop_prob, self.training)  
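
To see what drop path does to a batch, the sketch below re-implements the same logic as `drop_path` in a standalone function (the name `drop_path_demo` is hypothetical) and applies it to a tensor of ones.

```python
import torch


def drop_path_demo(x: torch.Tensor, drop_prob: float, training: bool) -> torch.Tensor:
    """Zero whole samples with probability drop_prob and rescale the survivors."""
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)   # one Bernoulli draw per sample
    mask = torch.floor(keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device))
    return x / keep_prob * mask


torch.manual_seed(0)
x = torch.ones(8, 3, 4, 4)
out_train = drop_path_demo(x, drop_prob=0.5, training=True)
out_eval = drop_path_demo(x, drop_prob=0.5, training=False)

# Each sample is either dropped entirely (all 0) or scaled by 1 / keep_prob (all 2).
per_sample = out_train.flatten(1)
print(per_sample[:, 0].tolist())
print(torch.equal(out_eval, x))
```

Dividing by `keep_prob` keeps the expected value of the residual branch unchanged, so no rescaling is needed at evaluation time.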

## Squeeze-and-Excitation(SE)
<img src="Assets/m15.png" width="50%" style= "float:left"/>

##### Squeeze-and-Excitation Networks [(Hu, Shen, Albanie, Sun, & Wu, 2017)](https://arxiv.org/pdf/1709.01507.pdf)
<p>
1. Global average pooling.<br>
2. FC1 reduces the channels to <b>1/4 of the channels input to the MBConv</b> and uses the <b>Swish</b> activation function.<br>
3. FC2 restores the channels to the <b>output channels of the Depthwise Conv (the SE input)</b> and uses the <b>Sigmoid</b> activation function.
</p>

In [16]:
class SqueezeExcite(nn.Module):
    def __init__(self,
                 input_channels: int,   # <------- MBConv input channels.
                 expand_channels: int,  # <------- SE input channels (DW output channels).
                 se_ratio: float = 0.25):        
        super(SqueezeExcite, self).__init__()
        
        
        squeeze_channels = int(input_channels * se_ratio)
        
        #=============================================== * FC1 * =====================================================#
        self.conv_reduce = nn.Conv2d(expand_channels, squeeze_channels, 1)
        self.act1 = nn.SiLU()  # alias Swish
        
        #=============================================== * FC2 * =====================================================#
        self.conv_expand = nn.Conv2d(squeeze_channels, expand_channels, 1)
        self.act2 = nn.Sigmoid()

        
    def forward(self, x: Tensor) -> Tensor:
        scale = x.mean((2, 3), keepdim=True) # Global average pooling
        scale = self.conv_reduce(scale)      # FC1
        scale = self.act1(scale)             # Swish
        scale = self.conv_expand(scale)      # FC2
        scale = self.act2(scale)             # Sigmoid
        return scale * x
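
The channel bookkeeping in `SqueezeExcite` is easy to get wrong, because the squeeze width is derived from the MBConv input rather than from the SE input. The sketch below works through the arithmetic; the helper names and the 64-channel, expansion-4 example are illustrative assumptions.

```python
def se_channel_plan(mbconv_in: int, expand_ratio: int, se_ratio: float = 0.25):
    """Return (se_input, squeeze) channel counts as SqueezeExcite computes them."""
    se_input = mbconv_in * expand_ratio        # depthwise conv output
    squeeze = int(mbconv_in * se_ratio)        # based on the MBConv input, not se_input
    return se_input, squeeze


def se_param_count(se_input: int, squeeze: int) -> int:
    """Weights plus biases of the two 1x1 convolutions (FC1 and FC2)."""
    fc1 = se_input * squeeze + squeeze
    fc2 = squeeze * se_input + se_input
    return fc1 + fc2


se_in, sq = se_channel_plan(mbconv_in=64, expand_ratio=4)
print(se_in, sq, se_param_count(se_in, sq))
```

With an expansion ratio of 4, the bottleneck is therefore 1/16 of the SE input width, which keeps the module cheap.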

## Define Conv + BN + Act  Layer

In [17]:
class ConvBNAct(nn.Module):
    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int = 1,
                 norm_layer: Optional[Callable[..., nn.Module]] = None,
                 activation_layer: Optional[Callable[..., nn.Module]] = None):        
        super(ConvBNAct, self).__init__()

        padding = (kernel_size - 1) // 2
        
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.SiLU

        self.conv = nn.Conv2d(in_channels  = in_channels,
                              out_channels = out_channels,
                              kernel_size  = kernel_size,
                              stride       = stride,
                              padding      = padding,
                              groups       = groups,
                              bias         = False)

        self.bn = norm_layer(out_channels)
        self.act = activation_layer()

    def forward(self, x):
        result = self.conv(x)
        result = self.bn(result)
        result = self.act(result)

        return result                                                                   
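
The padding rule `(kernel_size - 1) // 2` keeps the spatial size unchanged at stride 1 for odd kernels and halves it (rounding up) at stride 2. A minimal sketch of the standard output-size formula, with a hypothetical helper name:

```python
def conv_output_size(size: int, kernel_size: int = 3, stride: int = 1) -> int:
    """Spatial output size of a conv with padding = (kernel_size - 1) // 2."""
    padding = (kernel_size - 1) // 2
    return (size + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(224, kernel_size=3, stride=1))
print(conv_output_size(224, kernel_size=3, stride=2))
print(conv_output_size(300, kernel_size=3, stride=2))
```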

## MBConv Block
<img src="Assets/m14.png" width="60%"/>

##### EfficientNetV1 [(Tan & Le, 2019)](https://arxiv.org/pdf/1905.11946.pdf)
MBConv is an improved version of the **InvertedResidualBlock** in MobileNetV3 [(Howard, Sandler, Chu, Chen, Chen, Tan, ... & Adam, 2019)](https://arxiv.org/pdf/1905.02244.pdf). The differences are that MBConv uses the Swish activation function and adds an SE (Squeeze-and-Excitation) module to every block.

1. The first 1x1 convolutional layer increases the dimension: its number of kernels is n times the input channels, n ∈ {1, 6}. **When n = 1, the first 1x1 convolutional layer is removed**; that is, the MBConv in Stage 2 has no expansion layer (similar to MobileNetV3).
2. Depthwise Conv kxk, where k is the kernel size; EfficientNetV1 uses 3x3 and 5x5.
3. **The shortcut only exists when the input and output of the MBConv have the same shape, i.e. the same number of channels and stride == 1.**
4. SE (Squeeze-and-Excitation).
5. A 1x1 convolution reduces the dimension.
6. **Dropout (Stochastic Depth) only exists when the shortcut is used.**

In [18]:
class MBConv(nn.Module):
    def __init__(self,
                 kernel_size: int,
                 input_channels: int,
                 out_channels: int,
                 expand_ratio: int,
                 stride: int,
                 se_ratio: float,
                 drop_rate: float,
                 norm_layer: Callable[..., nn.Module]):
        super(MBConv, self).__init__()
       
        
        if stride not in [1, 2]:
            raise ValueError("illegal stride value.")
            
        self.out_channels = out_channels       
        self.drop_rate = drop_rate

        #shortcut used condition.(stride == 1, in-channels == out-channels)
        self.has_shortcut = (stride == 1 and input_channels == out_channels)

        # alias Swish
        activation_layer = nn.SiLU  
        
        # In EfficientNetV2, there is no expansion=1 in MBConv, so conv_pw must exist.
        assert expand_ratio != 1
        
        #expand width
        expanded_channels = input_channels * expand_ratio
        
           
        #====================================================== * Point-wise Expansion(1x1, s1) * ==================================================#
        # in: *para-IN* || out: *para-IN* x ratio || kernel: 1 || stride: 1 || groups: 1 || norm: *para* (BN) || act: SiLU
        
        self.expand_conv = ConvBNAct(in_channels      = input_channels,
                                     out_channels     = expanded_channels,
                                     kernel_size      = 1,
                                     norm_layer       = norm_layer,
                                     activation_layer = activation_layer)
      
        #==================================================== * Depth-wise convolution(3x3, s1/s2) * ===============================================#
        # in: *para-IN* x ratio || out: *para-IN* x ratio || kernel: *para* || stride: *para* || groups: *para-IN* x ratio || norm: *para* (BN) || act: SiLU
        
        self.dwconv = ConvBNAct(in_channels      = expanded_channels,
                                out_channels     = expanded_channels,
                                kernel_size      = kernel_size,
                                stride           = stride,
                                groups           = expanded_channels,
                                norm_layer       = norm_layer,
                                activation_layer = activation_layer)

        #======================================================== * Squeeze-and-Excitation(SE) * ===================================================#
        # input_channel:*para-IN*(MBConv-IN) || expanded_channel: *para-IN* x ratio(DW-OUT)
        
        self.SE = SqueezeExcite(input_channels   = input_channels,
                                expand_channels  = expanded_channels,
                                se_ratio         = se_ratio) if se_ratio > 0 else nn.Identity()
        
        #======================================================= * Point-wise linear projection * ==================================================#
        # in: *para-IN* x ratio || out: *para-OUT* || kernel: 1 || stride: 1 || groups: 1 || norm: *para* (BN) || act: None
        
        self.project_conv = ConvBNAct(in_channels       = expanded_channels,
                                      out_channels      = out_channels,
                                      kernel_size       = 1,
                                      norm_layer        = norm_layer,
                                      activation_layer  = nn.Identity)  # There is no activation function, so use Identity (empty layer)


        #================================================================ * Drop Path * ============================================================#
        # Use the dropout layer only when using the shortcut.
        if self.has_shortcut and drop_rate > 0:
            self.dropout = DropPath(drop_rate)

            
    def forward(self, x: Tensor) -> Tensor:
        result = self.expand_conv(x)
        result = self.dwconv(result)
        result = self.SE(result)
        result = self.project_conv(result)

        #shortcut used condition.(stride == 1, in-channels == out-channels)
        if self.has_shortcut:
            if self.drop_rate > 0:
                result = self.dropout(result)
            result += x

        return result

## Fused-MBConv Block

<img src="Assets/m13.png" width="60%"/>

1. When **expansion = 1**, the main branch contains **only a 3×3 ProjectConv**, followed by BN, SiLU (activation function), and **Dropout**.
2. When **expansion != 1**, the main branch contains a **3×3 ExpandConv** that increases the dimension, followed by BN and SiLU (activation function), then a **1×1 ProjectConv**, followed by BN and **Dropout**.
3. The **shortcut** only exists when **stride = 1 and the input channels equal the output channels** of the main branch.
4. **Dropout (Stochastic Depth) only exists when the shortcut is used.**
5. **Fused-MBConv does not use Squeeze-and-Excitation (SE)**: se_ratio is 0 in every Fused-MBConv stage of the configurations below.


 `Problems that occur when BN and Dropout are used together` [(Li, Chen, Hu, & Yang, 2018)](https://arxiv.org/pdf/1801.05134.pdf)

In [19]:
class FusedMBConv(nn.Module):
    def __init__(self,
                 kernel_size: int,
                 input_channels: int,
                 out_channels: int,
                 expand_ratio: int,
                 stride: int,
                 se_ratio: float,
                 drop_rate: float,
                 norm_layer: Callable[..., nn.Module]):
        super(FusedMBConv, self).__init__()

        
        assert stride in [1, 2]
        assert se_ratio == 0

        self.out_channels = out_channels        
        self.drop_rate = drop_rate
        
        # shortcut used condition.(stride == 1, in-channels == out-channels)
        self.has_shortcut = (stride == 1 and input_channels == out_channels)

        # alias Swish
        activation_layer = nn.SiLU  
        
        #expand width
        expanded_channels = input_channels * expand_ratio
        
        # Expand-Conv only exists when the expansion ratio !=1.
        self.has_expansion = (expand_ratio != 1)
        
        
        ################################################################### * Expansion != 1 * ###########################################################
        if self.has_expansion:
            
            #===================================================== * Expansion convolution(3x3, s1/S2) * ===============================================#
            # in: *para-IN* || out: *para-IN* x ratio || kernel: *para* || stride: *para* || groups: 1 || norm: *para* (BN) || act: SiLU
            self.expand_conv = ConvBNAct(in_channels      = input_channels,
                                         out_channels     = expanded_channels,
                                         kernel_size      = kernel_size,
                                         stride           = stride,
                                         norm_layer       = norm_layer,
                                         activation_layer = activation_layer)
            
            #==================================================== * Point-wise linear projection(1x1, s1) * ============================================#
            # in: *para-IN* x ratio || out: *para-OUT* || kernel: 1 || stride: 1 || groups: 1 || norm: *para* (BN) || act: None
            self.project_conv = ConvBNAct(in_channels      = expanded_channels,
                                          out_channels     = out_channels,
                                          kernel_size      = 1,
                                          norm_layer       = norm_layer,
                                          activation_layer = nn.Identity)  # There is no activation function, so use Identity (empty layer)
            
       
        ################################################################### * Expansion = 1 * ############################################################
        # When only project_conv exists.
        else:
            
            #=================================================== * Projection convolution(3x3, s1/s2) * ==========================================#
            # in: *para-IN* || out: *para-OUT* || kernel: *para* || stride: *para* || groups: 1 || norm: *para* (BN) || act: SiLU
            self.project_conv = ConvBNAct(in_channels      = input_channels,
                                          out_channels     = out_channels,
                                          kernel_size      = kernel_size,
                                          stride           = stride,
                                          norm_layer       = norm_layer,
                                          activation_layer = activation_layer)  # There is activation function.

        
        #=================================================================== * Drop Path * =============================================================#
        # Use the dropout layer only when using the shortcut.
        if self.has_shortcut and drop_rate > 0:
            self.dropout = DropPath(drop_rate)

            
    def forward(self, x: Tensor) -> Tensor:
        if self.has_expansion:
            result = self.expand_conv(x)
            result = self.project_conv(result)
        else:
            result = self.project_conv(x)

        # shortcut used condition.(stride == 1, in-channels == out-channels)
        if self.has_shortcut:
            if self.drop_rate > 0:
                result = self.dropout(result)
            result += x

        return result

# Define EfficientNetV2 Base Model

<img src="Assets/m12.png" width="50%" style="float: left;"/>
<p> 
<h5>Differences from EfficientNetV1</h5>
1. In addition to MBConv, the Fused-MBConv module is also used.<br>
2. Uses smaller expansion ratios (V1 mostly uses 6).<br>
3. Prefers the smaller 3×3 kernel size (V1 also uses 5×5).<br>
4. Removes the last stride-1 stage of EfficientNetV1 (Stage 8 in V1).
</p>
<p>
<h5>Where:</h5>
1. <b>MBConv4</b> means the expansion factor of the first convolutional layer on the main branch is 4.<br>
2. <b>SE0.25</b> means the number of channels in the first fully connected layer of the SE module is 1/4 of the channels input to the MBConv.<br>
3. <b>Layers</b> is the number of times the block is repeated.<br>
4. <b>Stride</b> applies only to the first block of each stage; the remaining blocks use stride 1.
</p>

## Parameters of each layer (model_cnf)

1. First dimension: one row per stage **(excluding the stem, Stage 0, and the head, Stage 7 in the S model)**.
2. Second dimension:
  * [0] is the number of **times the operator is repeated** in the current stage.
  * [1] is the **kernel_size**.
  * [2] is the **stride**.
  * [3] is the **expansion ratio**.
  * [4] is the number of **input channels**.
  * [5] is the number of **output channels**.
  * [6] is the conv_type: **0 is Fused-MBConv**, **1 is MBConv**.
  * [7] is the **se_ratio** (0 means SE is not used).
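
Positional indices such as `cnf[-2]` are easy to misread, so one option is to give each column a name. The sketch below is only an illustration: `StageConfig` and `operator_name` are hypothetical and are not used by the `EfficientNetV2` class in this notebook.

```python
from typing import NamedTuple


class StageConfig(NamedTuple):
    repeats: int
    kernel_size: int
    stride: int
    expand_ratio: int
    in_channels: int
    out_channels: int
    conv_type: int      # 0 = Fused-MBConv, 1 = MBConv
    se_ratio: float

    def operator_name(self) -> str:
        """Name of the block class selected by conv_type."""
        return "FusedMBConv" if self.conv_type == 0 else "MBConv"


row = [6, 3, 2, 4, 64, 128, 1, 0.25]
stage = StageConfig(*row)
print(stage.operator_name(), stage.repeats, stage.in_channels, "->", stage.out_channels)
```

Because a `NamedTuple` is still a tuple, `len(stage) == 8` and index access such as `stage[4]` keep working alongside the named fields.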

In [20]:
class EfficientNetV2(nn.Module):
    def __init__(self,
                 model_cnf: list,
                 num_classes: int = 1000,
                 num_features: int = 1280,
                 dropout_rate: float = 0.2,
                 drop_connect_rate: float = 0.2):
        super(EfficientNetV2, self).__init__()

        # The len of each layer of para must be 8.
        for cnf in model_cnf:
            assert len(cnf) == 8

        # Bind BatchNorm param, eps = 10^-3, momentum = 0.1
        norm_layer = partial(nn.BatchNorm2d, eps = 1e-3, momentum=0.1)
        
        
        #========================================================= * Stage 0 * =========================================================#
        # Stage0 Conv (In: 3(RGB Image) || Out: stage1-IN || kernel: 3 || stride: 2 || norm: BN || Act: SiLu)
        
        self.stem = ConvBNAct(in_channels = 3,
                              out_channels = model_cnf[0][4],
                              kernel_size = 3,
                              stride = 2,
                              norm_layer = norm_layer)  # Default Act is SiLu
        
        #======================================================== * Stage 1-6 * ========================================================#
        total_blocks = sum([i[0] for i in model_cnf])
        block_id = 0
        blocks = []       
        for cnf in model_cnf:
            repeats = cnf[0]
            Operator_FMBConv = FusedMBConv if cnf[-2] == 0 else MBConv
            for i in range(repeats):
                blocks.append(Operator_FMBConv( kernel_size    = cnf[1],
                                                input_channels = cnf[4] if i == 0 else cnf[5],
                                                out_channels   = cnf[5],
                                                expand_ratio   = cnf[3],
                                                stride         = cnf[2] if i == 0 else 1,
                                                se_ratio       = cnf[-1],
                                                drop_rate      = drop_connect_rate * block_id / total_blocks,
                                                norm_layer     = norm_layer))
                block_id += 1                
        self.blocks = nn.Sequential(*blocks)
        
        #========================================================= * Stage 7 * =========================================================#
        
        head = OrderedDict()

        head.update({"project_conv": ConvBNAct(in_channels  = model_cnf[-1][-3],
                                               out_channels = num_features,
                                               kernel_size  = 1,
                                               norm_layer   = norm_layer)})  # Default Act is SiLu

        head.update({"avgpool": nn.AdaptiveAvgPool2d(1)})
        
        head.update({"flatten": nn.Flatten()})

        if dropout_rate > 0:
            head.update({"dropout": nn.Dropout(p = dropout_rate, inplace=True)})
            
        head.update({"classifier": nn.Linear(num_features, num_classes)})

        self.head = nn.Sequential(head)
        
        #====================================================== * Initial Weights * =====================================================#
        
        for m in self.modules():
            
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode = "fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
                    
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
                
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

                
    def forward(self, x: Tensor) -> Tensor:
        x = self.stem(x)
        x = self.blocks(x)
        x = self.head(x)

        return x
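
The per-block stochastic depth rate in the constructor grows linearly with the block index. The sketch below reproduces that expression on its own (the helper name is hypothetical) using the repeat counts of the S configuration.

```python
def block_drop_rates(repeats_per_stage, drop_connect_rate: float = 0.2):
    """Per-block rate: drop_connect_rate * block_id / total_blocks."""
    total_blocks = sum(repeats_per_stage)
    return [drop_connect_rate * block_id / total_blocks for block_id in range(total_blocks)]


rates = block_drop_rates([2, 4, 4, 6, 9, 15])   # repeats of the six stages in the S config
print(len(rates), rates[0], round(rates[-1], 3))
```

Since `block_id` stops at `total_blocks - 1`, the last block gets a rate slightly below `drop_connect_rate`, and the first block is never dropped.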

---

# Define EfficientNetV2_S Model

* Image_size:
  * train_size: 300
  * eval_size: 384
* Model_config: repeat, kernel, stride, expansion, in_c, out_c, operator, se_ratio

In [21]:
def efficientnetv2_s(num_classes: int = 1000):

    
    model_config = [[2, 3, 1, 1, 24, 24, 0, 0],
                    [4, 3, 2, 4, 24, 48, 0, 0],
                    [4, 3, 2, 4, 48, 64, 0, 0],
                    [6, 3, 2, 4, 64, 128, 1, 0.25],
                    [9, 3, 1, 6, 128, 160, 1, 0.25],
                    [15, 3, 2, 6, 160, 256, 1, 0.25]]

    model = EfficientNetV2(model_cnf    = model_config,
                           num_classes  = num_classes,
                           dropout_rate = 0.2)
    return model
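
To check how far the S configuration downsamples an image, the sketch below follows the feature-map side length through the stride-2 stem and the stride column of the config. The helper name is hypothetical, and it assumes 3x3 kernels with padding 1 throughout, as in this configuration.

```python
def trace_spatial_size(image_size: int, stage_strides, kernel_size: int = 3):
    """Follow the feature-map side length through the stem and each stage."""
    def conv_out(size, stride):
        padding = (kernel_size - 1) // 2
        return (size + 2 * padding - kernel_size) // stride + 1

    sizes = [conv_out(image_size, 2)]            # stem conv has stride 2
    for stride in stage_strides:                 # only the first block of a stage strides
        sizes.append(conv_out(sizes[-1], stride))
    return sizes


s_strides = [1, 2, 2, 2, 1, 2]                   # stride column of the S config
print(trace_spatial_size(300, s_strides))
print(trace_spatial_size(384, s_strides))
```

The overall stride is 32, so the head's global average pooling sees a 10x10 map at the training size and a 12x12 map at the evaluation size.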

# Define EfficientNetV2_M Model

* Image_size:
  * train_size: 384
  * eval_size: 480
* Model_config: repeat, kernel, stride, expansion, in_c, out_c, operator, se_ratio

In [22]:
def efficientnetv2_m(num_classes: int = 1000):
    
    model_config = [[3, 3, 1, 1, 24, 24, 0, 0],
                    [5, 3, 2, 4, 24, 48, 0, 0],
                    [5, 3, 2, 4, 48, 80, 0, 0],
                    [7, 3, 2, 4, 80, 160, 1, 0.25],
                    [14, 3, 1, 6, 160, 176, 1, 0.25],
                    [18, 3, 2, 6, 176, 304, 1, 0.25],
                    [5, 3, 1, 6, 304, 512, 1, 0.25]]

    model = EfficientNetV2(model_cnf    = model_config,
                           num_classes  = num_classes,
                           dropout_rate = 0.3)
    return model


# Define EfficientNetV2_L Model

* Image_size:
  * train_size: 384
  * eval_size: 480
* Model_config: repeat, kernel, stride, expansion, in_c, out_c, operator, se_ratio

In [24]:
def efficientnetv2_l(num_classes: int = 1000):
    
    model_config = [[4, 3, 1, 1, 32, 32, 0, 0],
                    [7, 3, 2, 4, 32, 64, 0, 0],
                    [7, 3, 2, 4, 64, 96, 0, 0],
                    [10, 3, 2, 4, 96, 192, 1, 0.25],
                    [19, 3, 1, 6, 192, 224, 1, 0.25],
                    [25, 3, 2, 6, 224, 384, 1, 0.25],
                    [7, 3, 1, 6, 384, 640, 1, 0.25]]

    model = EfficientNetV2(model_cnf    = model_config,
                           num_classes  = num_classes,
                           dropout_rate = 0.4)
    return model

---

# Define the Dataset class

For the official implementation of **default_collate**, see [this file](https://github.com/pytorch/pytorch/blob/67b7e751e6b5931a9f45274653f4f653a4e6cdf6/torch/utils/data/_utils/collate.py).

In [27]:
class MyDataSet(Dataset):
    def __init__(self, images_path: list, images_class: list, transform=None):
        self.images_path = images_path
        self.images_class = images_class
        self.transform = transform

    def __len__(self):
        return len(self.images_path)

    def __getitem__(self, item):
        img = Image.open(self.images_path[item])
        # RGB is a color image, L is a grayscale image.
        if img.mode != 'RGB':
            raise ValueError("image: {} isn't RGB mode.".format(self.images_path[item]))
            
        label = self.images_class[item]

        if self.transform is not None:
            img = self.transform(img)

        return img, label
    
    @staticmethod
    def collate_fn(batch):
        images, labels = tuple(zip(*batch))

        images = torch.stack(images, dim=0)
        labels = torch.as_tensor(labels)
        return images, labels
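
To show what `collate_fn` produces, the sketch below runs the same stacking logic on a fake batch. The 3x8x8 zero tensors stand in for transformed images and are purely illustrative.

```python
import torch


def collate_fn(batch):
    """Stack per-sample image tensors and turn the labels into one tensor."""
    images, labels = tuple(zip(*batch))
    return torch.stack(images, dim=0), torch.as_tensor(labels)


# A fake batch of four (image, label) pairs, standing in for MyDataSet items.
fake_batch = [(torch.zeros(3, 8, 8), label) for label in (0, 2, 1, 2)]
images, labels = collate_fn(fake_batch)
print(images.shape, labels.tolist())
```

In practice the method would be passed to the loader, for example `DataLoader(dataset, batch_size=8, collate_fn=MyDataSet.collate_fn)`. Note that `torch.stack` requires every image in the batch to have the same shape, which the resize and crop transforms guarantee.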

## Define Preprocess Function

In [None]:
def read_split_data(root: str, val_rate: float = 0.2):
    
    # Fix the random seed so the split is reproducible.
    random.seed(0)
    assert os.path.exists(root), "dataset root: {} does not exist.".format(root)

    # Walk the root folder; each sub-folder corresponds to one class.
    flower_class = [cla for cla in os.listdir(root) if os.path.isdir(os.path.join(root, cla))]

    # Sort so the class order is consistent across runs.
    flower_class.sort()

    # Map each class name to a numeric index and save the reverse mapping as JSON.
    class_indices = dict((k, v) for v, k in enumerate(flower_class))
    json_str = json.dumps(dict((val, key) for key, val in class_indices.items()), indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    train_images_path = []  # paths of all training images
    train_images_label = []  # class index of each training image
    val_images_path = []  # paths of all validation images
    val_images_label = []  # class index of each validation image
    every_class_num = []  # number of samples in each class

    supported = [".jpg", ".JPG", ".png", ".PNG"]  # supported file extensions
    # Go through the files in each class folder.
    for cla in flower_class:
        cla_path = os.path.join(root, cla)
        # Collect the paths of all files with a supported extension.
        images = [os.path.join(root, cla, i) for i in os.listdir(cla_path)
                  if os.path.splitext(i)[-1] in supported]
        # Look up the index of this class.
        image_class = class_indices[cla]
        # Record the number of samples in this class.
        every_class_num.append(len(images))
        # Randomly sample the validation images according to val_rate.
        val_path = random.sample(images, k=int(len(images) * val_rate))

        for img_path in images:
            if img_path in val_path:  # sampled for validation
                val_images_path.append(img_path)
                val_images_label.append(image_class)
            else:  # otherwise it goes to the training set
                train_images_path.append(img_path)
                train_images_label.append(image_class)

    print("{} images were found in the dataset.".format(sum(every_class_num)))
    print("{} images for training.".format(len(train_images_path)))
    print("{} images for validation.".format(len(val_images_path)))

    plot_image = False
    if plot_image:
        # Plot a bar chart of the number of images per class.
        plt.bar(range(len(flower_class)), every_class_num, align='center')
        # Replace the x ticks 0, 1, 2, 3, 4 with the class names.
        plt.xticks(range(len(flower_class)), flower_class)
        # Add the count above each bar.
        for i, v in enumerate(every_class_num):
            plt.text(x=i, y=v + 5, s=str(v), ha='center')
        # Set the x-axis label.
        plt.xlabel('image class')
        # Set the y-axis label.
        plt.ylabel('number of images')
        # Set the chart title.
        plt.title('flower class distribution')
        plt.show()

    return train_images_path, train_images_label, val_images_path, val_images_label
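
The per-class split inside `read_split_data` can be sketched in isolation as follows. The helper name and the file names are hypothetical; unlike the function above, this version uses a local random generator and a set for the membership test.

```python
import random


def split_paths(paths, val_rate: float = 0.2, seed: int = 0):
    """Split one class's file list into (train, val) with a fixed seed."""
    rng = random.Random(seed)                       # local RNG, leaves global state alone
    val = set(rng.sample(paths, k=int(len(paths) * val_rate)))
    train = [p for p in paths if p not in val]      # set lookup is O(1) per path
    return train, sorted(val)


paths = [f"daisy/img_{i:03d}.jpg" for i in range(10)]   # hypothetical file names
train, val = split_paths(paths)
print(len(train), len(val))
```

Sampling within each class keeps the validation share roughly equal across classes, and the set lookup avoids the linear `in` check on a list when a class has many images.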

In [None]:
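# Assumed definitions: [train_size, eval_size] per model, using the sizes listed above.
img_size = {"s": [300, 384],
            "m": [384, 480],
            "l": [384, 480]}
num_model = "s"

data_transform = {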
        "train": transforms.Compose([transforms.RandomResizedCrop(img_size[num_model][0]),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]),
        "val": transforms.Compose([transforms.Resize(img_size[num_model][1]),
                                   transforms.CenterCrop(img_size[num_model][1]),
                                   transforms.ToTensor(),
                                   transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])}
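
The `Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])` step maps each channel from [0, 1] to [-1, 1]. The sketch below shows the per-value arithmetic with a hypothetical helper.

```python
def normalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-channel arithmetic applied by Normalize: (value - mean) / std."""
    return (value - mean) / std


# ToTensor scales 8-bit pixels to [0, 1]; Normalize then maps them to [-1, 1].
for pixel in (0, 128, 255):
    print(pixel, normalize(pixel / 255))
```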