# Remote Sensing Image Segmentation with a Heavily Modified DeepLab V3+ Model

**25_天使天才天王队**

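## Abstract

We use **a heavily modified DeepLab V3+ model**: the Decoder is extended with attention-based modules to strengthen the model's ability to extract global features. The model was trained for 96 epochs with a combination of weighted cross-entropy and LovaszSoftmaxLoss, using cosine learning-rate decay and CutMix data augmentation, and reached an mIoU of 65%.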

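## Model Overview

We use a heavily modified DeepLab V3+ model. DeepLab V3+ fuses multi-scale information through an encoder-decoder structure while retaining the atrous convolutions and the ASPP layer of DeepLab V3. With a ResNet backbone, it improves the robustness and speed of semantic segmentation, and it achieved state-of-the-art performance on the PASCAL VOC 2012 dataset with 89.0% mIoU. The architecture of the original DeepLab V3+ is shown below.

![Original DeepLab V3+](https://ai-studio-static-online.cdn.bcebos.com/31e5d281555f47a4a6114b323c9b74d5ce1cf72ea0304d55a5b4a6d3847740af)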


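To better fit the characteristics of our task, namely foreground-background imbalance and a strong need for global feature extraction, we made three main modifications to the model:

- Apply CBAM (Convolutional Block Attention Module) to the low-level features and to the output of the ASPP module to add attention.

- Insert SE (Squeeze-and-Excitation) blocks before and after the middle convolution block of the Decoder to add attention.

- Replace the final convolution of the Decoder with a deformable convolution to enlarge the receptive field.

For comparison, the structure of our modified model is shown below.

![Heavily modified DeepLab V3+](https://ai-studio-static-online.cdn.bcebos.com/406d268bf4a14960bb887c897b7b144e2b540096939a4fa6bb6031f56719dbd1)

By introducing attention throughout the Decoder and using a deformable convolution to enlarge the receptive field, we improved mIoU and made the model a better fit for our task. The considerably deeper Decoder slows down training, but the performance gain is large enough that we consider the trade-off worthwhile. We also tried other changes, such as swapping the backbone, but did not adopt them after weighing accuracy against speed.

Our modified DeepLab V3+ is implemented in model/models/deeplabv3p.py. The key part, the modified Decoder, is shown below: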

```python
class Decoder(nn.Layer):
    """
    Decoder module of DeepLabV3P model
    Args:
        num_classes (int): The number of classes.
        in_channels (int): The number of input channels in decoder module.
    """

    def __init__(self,
                 num_classes,
                 in_channels,
                 align_corners,
                 data_format='NCHW'):
        super(Decoder, self).__init__()

        self.data_format = data_format

        self.cbam_low = CBAM(channels=48)
        self.cbam_high = CBAM(channels=256)

        self.conv_bn_relu1 = layers.ConvBNReLU(
            in_channels=in_channels,
            out_channels=48,
            kernel_size=1,
            data_format=data_format)

        self.conv_bn_relu2 = layers.SeparableConvBNReLU(
            in_channels=304,
            out_channels=256,
            kernel_size=3,
            padding=1,
            data_format=data_format)

        self.se_in = SEBlock(channel=256)
        self.conv_bn_relu3 = layers.SeparableConvBNReLU(
            in_channels=256,
            out_channels=256,
            kernel_size=3,
            padding=1,
            data_format=data_format)
        self.se_out= SEBlock(channel=256)

        # self.conv = nn.Conv2D(
        #     in_channels=256,
        #     out_channels=num_classes,
        #     kernel_size=1,
        #     data_format=data_format)
        self.conv = DeformableConvV2(
            in_channels=256,
            out_channels=num_classes,
            kernel_size=1,
            data_format=data_format)
        
        self.align_corners = align_corners

    def forward(self, x, low_level_feat):
        # CBAM
        low_level_feat = self.conv_bn_relu1(low_level_feat)
        low_level_feat = self.cbam_low(low_level_feat)

        if self.data_format == 'NCHW':
            low_level_shape = paddle.shape(low_level_feat)[-2:]
            axis = 1
        else:
            low_level_shape = paddle.shape(low_level_feat)[1:3]
            axis = -1
        
        # CBAM
        x = self.cbam_high(x)
        x = F.interpolate(
            x,
            low_level_shape,
            mode='bilinear',
            align_corners=self.align_corners,
            data_format=self.data_format)
        x = paddle.concat([x, low_level_feat], axis=axis)

        x = self.conv_bn_relu2(x)

        # SE
        x = self.se_in(x)
        x = self.conv_bn_relu3(x)
        x = self.se_out(x)

        # DCN
        x = self.conv(x)

        return x

```

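The modules we added are described below.

### CBAM Module

CBAM comes from the ECCV 2018 paper [CBAM: Convolutional Block Attention Module](https://openaccess.thecvf.com/content_ECCV_2018/papers/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.pdf). Its key idea is to apply attention along both the channel and the spatial dimensions.

![CBAM](https://ai-studio-static-online.cdn.bcebos.com/37cd8f14c5a540e6ab26e363d586ed5b8dc4c96ded2e4bbe97312bcd467f49f4)

As shown above, the input feature map first passes through channel attention. Global average pooling (GAP) and global max pooling (GMP) are applied over the spatial dimensions (height and width), the two pooled vectors go through a shared MLP, and the results are summed and passed through a sigmoid to obtain normalized channel weights. These weights are multiplied channel-wise with the input feature map, which recalibrates the original features along the channel dimension.

To obtain attention along the spatial dimensions, the output of channel attention is then average-pooled and max-pooled along the channel axis, which reduces the C×H×W feature map to two 1×H×W maps. The two maps are concatenated and passed through a single 7×7 convolution followed by a sigmoid, giving a 1×H×W spatial attention map. This map is multiplied element-wise with the channel-attention output, which completes the recalibration of the feature map in both the channel and the spatial dimensions.

The spatial attention module keeps the input and output shapes identical. Because its 7×7 convolution takes only two input channels and produces one output channel, it adds very few parameters and little computation while still covering a large spatial neighborhood. After CBAM, the feature map carries attention weights in both the channel and the spatial dimensions, which strengthens the relationships among features and helps extract the features that matter for the target.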
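For reference, the two attention steps can be restated compactly in the notation of the CBAM paper. Here $F$ is the input feature map, $\sigma$ is the sigmoid, $f^{7\times 7}$ is a 7×7 convolution, $\mathrm{AvgPool}_c$ and $\mathrm{MaxPool}_c$ pool along the channel axis, and $\otimes$ is element-wise multiplication with broadcasting. This is only a summary of the description above, not an additional component.

$$
M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), \qquad F' = M_c(F) \otimes F
$$

$$
M_s(F') = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}_c(F');\ \mathrm{MaxPool}_c(F')])\big), \qquad F'' = M_s(F') \otimes F'
$$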

Our CBAM implementation lives in model/ops/cbam.py. The full code is shown below:

```python

import paddle
import paddle.nn as nn

class ChannelAttention(nn.Layer):
    def __init__(self, in_planes, rotio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(1)
        self.max_pool = nn.AdaptiveMaxPool2D(1)

        self.sharedMLP = nn.Sequential(
            nn.Conv2D(in_planes, in_planes // rotio , 1, bias_attr=False),
            nn.ReLU(),
            nn.Conv2D(in_planes // rotio, in_planes, 1, bias_attr=False))
        
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = self.sharedMLP(self.avg_pool(x))
        maxout = self.sharedMLP(self.max_pool(x))
        return self.sigmoid(avgout + maxout)

class SpatialAttention(nn.Layer):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3,7), "kernel size must be 3 or 7"

        padding = 3 if kernel_size == 7 else 1

        self.conv = nn.Conv2D(2,1,kernel_size, padding=padding, bias_attr=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = paddle.mean(x, axis=1, keepdim=True)
        maxout = paddle.max(x, axis=1, keepdim=True)
        x = paddle.concat([avgout, maxout], axis=1)
        x = self.conv(x)
        return self.sigmoid(x)

class CBAM(nn.Layer):
    def __init__(self, channels, reduction=16):
        super(CBAM, self).__init__()
        self.ca = ChannelAttention(channels,rotio=reduction)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = self.ca(x) * x  
        x = self.sa(x) * x 

        return x

```


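### SE Module

The SE block comes from the paper [Squeeze-and-Excitation Networks](https://arxiv.org/abs/1709.01507). Its structure is shown below.

![SE](https://ai-studio-static-online.cdn.bcebos.com/30ea4e45293e49629adde567bad469ec6eb3f786b01a43bb8e14a7dae407f2f3)

SE is a channel attention mechanism. Because it compresses each feature map globally and then applies fully connected (FC) layers, the channel attention it learns carries global information. It adaptively adjusts the response of each channel and models the dependencies between channels. An SE block computes its output in three steps:

- Squeeze: compress the features along the spatial dimensions, turning each two-dimensional feature channel into a single number that has a global receptive field.
- Excitation: generate one weight per feature channel that represents the importance of that channel.
- Reweight: treat the Excitation output as per-channel importance and apply it to each channel by multiplication.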
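The three steps can also be written as formulas, following the SE paper. In this sketch $x_c$ is channel $c$ of an input with $C$ channels and spatial size $H \times W$, $\delta$ is ReLU, $\sigma$ is the sigmoid, and $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$, $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are the two FC layers with reduction ratio $r$.

$$
z_c = \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j), \qquad
s = \sigma\big(W_2\, \delta(W_1 z)\big), \qquad
\tilde{x}_c = s_c \cdot x_c
$$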

Our SE block is implemented in model/ops/se_block.py. The implementation is shown below:

```python
import paddle
import paddle.nn as nn

class SEBlock(nn.Layer):
    def __init__(self, channel, reduction=16):
        super(SEBlock, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(1)  # global average pooling down to 1x1
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias_attr=False),
            nn.ReLU(),
            nn.Linear(channel // reduction, channel, bias_attr=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        y = self.avg_pool(x).reshape([b, c]) # squeeze
        y = self.fc(y).reshape([b, c, 1, 1]) # FC layers produce channel attention weights that carry global information
        return x * y # apply the attention weight to each channel

```


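### Deformable Convolution

Deformable convolution was proposed in the paper [Deformable Convolutional Networks](https://arxiv.org/pdf/1703.06211.pdf). Its structure is shown below.

![](https://ai-studio-static-online.cdn.bcebos.com/1c9dab5d34954fc3be370d2dc19316debc95cfb737ac416c9aad6ee724c0a646)

Because of their fixed geometric structure, CNNs are inherently limited in modeling geometric transformations. The paper introduces deformable convolution to address this. Without any extra supervision, it augments the spatial sampling locations in the module with additional offsets and learns those offsets from the target task. The new module can replace a plain convolution in an existing CNN and is trained end to end with standard backpropagation; the resulting network is called a deformable convolutional network. Using PaddlePaddle's `DeformConv2D` operator, we implemented a drop-in replacement for an ordinary convolution layer in model/ops/dcn.py. The implementation is shown below: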

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

from paddle.vision.ops import DeformConv2D

class DeformableConvV2(nn.Layer):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride=1,
                 padding=0,
                 dilation=1,
                 groups=1,
                 weight_attr=None,
                 bias_attr=None,
                 regularizer=None,
                 lr_scale=1.,
                 skip_quant=False,
                 dcn_bias_regularizer=paddle.regularizer.L2Decay(0.),
                 dcn_bias_lr_scale=2.,
                 data_format="NCHW"):
        super().__init__()
        self.offset_channel = 2 * kernel_size**2
        self.mask_channel = kernel_size**2

        offset_bias_attr = paddle.ParamAttr(
            initializer=nn.initializer.Constant(0.),
            learning_rate=lr_scale,
            regularizer=regularizer)

        self.conv_offset = nn.Conv2D(
            in_channels,
            3 * kernel_size**2,
            kernel_size,
            stride=stride,
            padding=(kernel_size - 1) // 2,
            weight_attr=paddle.ParamAttr(initializer=nn.initializer.Constant(0.0)),
            bias_attr=offset_bias_attr,
            data_format=data_format)

        if bias_attr:
            # in FCOS-DCN head, specifically need learning_rate and regularizer
            dcn_bias_attr = paddle.ParamAttr(
                initializer=nn.initializer.Constant(value=0),
                regularizer=dcn_bias_regularizer,
                learning_rate=dcn_bias_lr_scale)
        else:
            # in ResNet backbone, do not need bias
            dcn_bias_attr = False
        self.conv_dcn = DeformConv2D(
            in_channels,
            out_channels,
            kernel_size,
            stride=stride,
            padding=(kernel_size - 1) // 2 * dilation,
            dilation=dilation,
            groups=groups,
            weight_attr=weight_attr,
            bias_attr=dcn_bias_attr)

    def forward(self, x):
        offset_mask = self.conv_offset(x)
        offset, mask = paddle.split(
            offset_mask,
            num_or_sections=[self.offset_channel, self.mask_channel],
            axis=1)
        mask = F.sigmoid(mask)
        y = self.conv_dcn(x, offset, mask=mask)
        return y

```
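To make the channel bookkeeping in `conv_offset` easier to follow, here is a minimal sketch of how the `3 * kernel_size**2` output channels divide into offsets and modulation masks. The helper name `dcn_channels` is hypothetical and is not part of model/ops/dcn.py.

```python
def dcn_channels(kernel_size):
    """Return (offset_channels, mask_channels, total) for a modulated deformable conv."""
    k2 = kernel_size ** 2
    offset_channels = 2 * k2  # one 2-D offset per kernel sampling point
    mask_channels = k2        # one modulation scalar per kernel sampling point
    return offset_channels, mask_channels, offset_channels + mask_channels


for k in (1, 3):
    print(k, dcn_channels(k))
```

With `kernel_size=1`, as used for the last layer of the Decoder, this gives 2 offset channels and 1 mask channel, so each output pixel samples a single learned, shifted location.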




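## Model Training

Our modifications substantially increase the number of parameters. To stay within AI Studio's limit of 72 hours per run, we adopted a relatively complex training scheme, described below:


### Training Strategy


1. Train the original DeepLab V3+ network for 80 epochs and save its parameters. This stage uses a learning rate of 0.05, cross-entropy loss, and cosine learning-rate decay.

2. Load the parameters into the modified DeepLab V3+, freeze the Backbone, and train the Decoder for 32 epochs. This stage uses a learning rate of 0.002, weighted cross-entropy loss, and cosine learning-rate decay.

3. Unfreeze the Backbone and train for another 64 epochs with a mix of weighted cross-entropy and LovaszSoftmaxLoss, using polynomial learning-rate decay.

Although this procedure is fairly involved, it reconciles the training time a larger model needs with the AI Studio limit, and it gave good results.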
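The two decay schedules named in the stages can be sketched in closed form. This is a minimal illustration, not the training code: the function names are hypothetical, the cosine form covers only steps 0 to `t_max`, and the stage 3 base learning rate is not stated in this document, so the value used for it is an assumption.

```python
import math


def cosine_lr(base_lr, step, t_max, eta_min=0.0):
    """Cosine annealing from base_lr (step 0) down to eta_min (step t_max)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))


def poly_lr(base_lr, step, decay_steps, end_lr, power=1.0):
    """Polynomial decay from base_lr to end_lr over decay_steps, then constant."""
    step = min(step, decay_steps)
    return (base_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr


# Stage 1 (lr 0.05) and stage 2 (lr 0.002) use cosine decay.
print([round(cosine_lr(0.05, s, 100), 5) for s in (0, 50, 100)])
print([round(cosine_lr(0.002, s, 100), 6) for s in (0, 50, 100)])

# Stage 3 uses polynomial decay; base_lr=0.002 is an assumed value for illustration.
print([round(poly_lr(0.002, s, 100, end_lr=0.002 / 5), 6) for s in (0, 50, 100)])
```

The cosine schedule changes slowly at the start and end of the period and fastest in the middle, while the polynomial schedule with `power=1.0` decays linearly to its floor.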

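### Data Augmentation

We mainly use CutMix for data augmentation. CutMix combines Cutout and Mixup: instead of simply erasing a region as Cutout does, it fills that region with the same-sized region cropped from another image and updates the label of the new image accordingly. Our dataset class implements CutMix in model/data/dataset.py. The CutMix part is shown below: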

```python
    def rand_bbox(self,size, lam):
        W,H=size
        cut_rat = np.sqrt(1. - lam)
        cut_w = int(W * cut_rat)
        cut_h = int(H * cut_rat)

        # uniform
        cx = np.random.randint(W)
        cy = np.random.randint(H)

        bbx1 = np.clip(cx - cut_w // 2, 0, W)
        bby1 = np.clip(cy - cut_h // 2, 0, H)
        bbx2 = np.clip(cx + cut_w // 2, 0, W)
        bby2 = np.clip(cy + cut_h // 2, 0, H)

        return bbx1, bby1, bbx2, bby2

    def do_cutmix(self,img,lbl):
        if not self.enable_cutmix:
            return img,lbl
        
        if random.uniform(0,1)<self.cutmix_threshold:
            return img,lbl

        idx = random.randrange(1,len(self.files)-1)
        nimg,nlbl=self.do_getitem(idx)
        bbx1, bby1, bbx2, bby2 = self.rand_bbox(self.img_size, self.cutmix_lambda)# randomly generate the four coordinates of the box
        img[:, bbx1:bbx2, bby1:bby2] = nimg[:, bbx1:bbx2, bby1:bby2]
        lbl[bbx1:bbx2, bby1:bby2] = nlbl[bbx1:bbx2, bby1:bby2]

        return img,lbl
```
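The relationship between the CutMix lambda and the size of the pasted box can be checked with a small standalone sketch. It mirrors `rand_bbox` using only the standard library; the value `lam=0.75` is an assumption for illustration, since the actual `cutmix_lambda` is not given in this document.

```python
import math
import random


def rand_bbox(width, height, lam, rng=random):
    """Sample a box whose unclipped area is (1 - lam) of the image."""
    cut_rat = math.sqrt(1.0 - lam)
    cut_w = int(width * cut_rat)
    cut_h = int(height * cut_rat)
    cx = rng.randrange(width)   # box center, uniform over the image
    cy = rng.randrange(height)
    x1 = max(cx - cut_w // 2, 0)
    y1 = max(cy - cut_h // 2, 0)
    x2 = min(cx + cut_w // 2, width)
    y2 = min(cy + cut_h // 2, height)
    return x1, y1, x2, y2


random.seed(0)
x1, y1, x2, y2 = rand_bbox(512, 512, lam=0.75)
area_fraction = (x2 - x1) * (y2 - y1) / (512 * 512)
print((x1, y1, x2, y2), area_fraction)
```

Because the side length scales with `sqrt(1 - lam)`, the box covers at most `1 - lam` of the image; boxes whose center falls near a border are clipped and cover less.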

The training pipeline is implemented below.

In [1]:
!mkdir /home/aistudio/external-libraries
!mkdir /home/aistudio/work/checkpoints

!pip install imgaug -i https://mirror.baidu.com/pypi/simple
!pip install paddleseg -i https://mirror.baidu.com/pypi/simple

import sys
sys.path.append('/home/aistudio/external-libraries')

mkdir: cannot create directory "/home/aistudio/external-libraries": File exists
mkdir: cannot create directory "/home/aistudio/work/checkpoints": File exists
Looking in indexes: https://mirror.baidu.com/pypi/simple
You should consider upgrading via the '/opt/conda/envs/python35-paddle120-env/bin/python -m pip install --upgrade pip' command.
Looking in indexes: https://mirror.baidu.com/pypi/simple
You should consider upgrading via the '/opt/conda/envs/python35-paddle120-env/bin/python -m pip install --upgrade pip' command.



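## Training Parameters
The training parameters are listed below. Because the CBAM and SE modules make the model harder to train, we train for 96 epochs (the `epochs` setting below). The initial learning rate is 0.005 with cosine decay. The whole training process takes about 270 hours.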

In [2]:
train_parameters = {
    "test_path":"/home/aistudio/data/data80164/img_test.zip", 
    "train_path":"/home/aistudio/data/data80164/train_and_label.zip", 
    "data_path":"/home/aistudio/data/",                     # path to extract the archives into
    "label_dict":{},
    "skip_steps": 100,
    "save_steps": 300, 
    "image_size":512,
    "learning_strategy": {                                    # optimizer-related settings
        "lr": 0.005,                                          # learning-rate hyperparameter
        "cos_decay_T":100,  
        "step_decay_step":100,
        "workers":4,
        "batch_size":16,
        "epochs":96,
        "train_iters":80000
    },
    "learning_strategy_warmup": {                               
        "lr": 0.001,                                     
        "cos_decay_T":100,  
        "step_decay_step":100,
        "workers":4,
        "batch_size":8,
        "epochs":5,
        "train_iters":80000
    },
    "max_items":80000,
    "checkpoints": "/home/aistudio/work/checkpoints",    
    "pretrained": "/home/aistudio/work/pretrained"         
}

In [3]:
!mkdir /home/aistudio/work/checkpoints
!mkdir /home/aistudio/work/pretrained

mkdir: cannot create directory "/home/aistudio/work/checkpoints": File exists
mkdir: cannot create directory "/home/aistudio/work/pretrained": File exists


## Data Processing
Write the data lists to text files and prepare the dataset.

In [4]:
import os
import zipfile
import random
import json
import paddle
import sys
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from paddle.io import Dataset
import paddle.fluid as fluid
from pathlib import Path
import imgaug.augmenters as iaa



In [5]:
from datetime import datetime
import time

random.seed(datetime.now().timestamp())
np.random.seed(int(time.time()))

In [6]:
def unzip_data(src_path,target_path,postfix):
    if not os.path.isdir(target_path + postfix):     
        z = zipfile.ZipFile(src_path, 'r')
        z.extractall(path=target_path + postfix)
        z.close()

data_path=train_parameters['data_path']

unzip_data(train_parameters['train_path'],data_path,"train")
unzip_data(train_parameters['test_path'],data_path,"test")

In [7]:
datas = []

path_base=train_parameters['data_path']+'train/'

image_base = path_base +'img_train'   # path to the training images
annos_base = path_base +'lab_train'   # path to the training labels

ids_ = [v.split('.')[0] for v in os.listdir(image_base)]

# Write the training image and label paths into datas
for id_ in ids_:
    img_pt0 = os.path.join(image_base, '{}.jpg'.format(id_))
    img_pt1 = os.path.join(annos_base, '{}.png'.format(id_))
    datas.append((img_pt0.replace(path_base, ''), img_pt1.replace(path_base, '')))
    if os.path.exists(img_pt0) and os.path.exists(img_pt1):
        pass
    else:
        raise FileNotFoundError("path invalid!")

# Print the length of datas and a few stored examples
print('total:', len(datas))
print(datas[0][0])
print(datas[0][1])
print(datas[10])

total: 66652
img_train/T124582.jpg
lab_train/T124582.png
('img_train/T039627.jpg', 'lab_train/T039627.png')


In [8]:
import numpy as np

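# The four class labels; of little use here, since the competition is scored on class ids 0, 1, 2, 3
labels = ['building', 'farmland', 'forest', 'other']

# Write labels to the label file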
with open('labels.txt', 'w') as f:
    for v in labels:
        f.write(v+'\n')

MAX_ITEMS=train_parameters['max_items']
EVAL_PORTION=0.015
EVAL_MAX=180

if len(datas)>MAX_ITEMS:
    datas=random.sample(datas,MAX_ITEMS)

np.random.seed(5)
np.random.shuffle(datas)

# Train/validation split: EVAL_PORTION (1.5%) of the data is held out for validation, capped at EVAL_MAX samples
split_num = int(EVAL_PORTION*len(datas))
split_num = EVAL_MAX if split_num>EVAL_MAX else split_num

# Split into training and validation sets
train_data = datas[:-split_num]
valid_data = datas[-split_num:]

# Write the training list
with open('train_list.txt', 'w') as f:
    for img, lbl in train_data:
        f.write(img + ' ' + lbl + '\n')

# Write the validation list
with open('valid_list.txt', 'w') as f:
    for img, lbl in valid_data:
        f.write(img + ' ' + lbl + '\n')

# Print the sizes of the training and validation sets
print('train:', len(train_data))
print('valid:', len(valid_data))

train: 66472
valid: 180
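The sizes printed above follow directly from the split rule. As a quick check, this small sketch reproduces them from the total of 66652 samples; the helper name `split_sizes` is hypothetical.

```python
def split_sizes(total, eval_portion=0.015, eval_max=180):
    """Return (train_size, valid_size) for a capped hold-out split."""
    n_valid = min(int(eval_portion * total), eval_max)
    return total - n_valid, n_valid


print(split_sizes(66652))  # → (66472, 180)
```

Here 1.5% of 66652 would be 999 samples, so the `EVAL_MAX` cap of 180 is what determines the validation set size.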


## Dataset
Define the dataset. It implements CutMix data augmentation; see model/data/dataset.py.

In [9]:
from model.data.dataset import SegmentDataset

img_size_tuple=(train_parameters["image_size"],train_parameters["image_size"])

train_dataset=SegmentDataset(base_path=path_base,
    file_list='train_list.txt',
    img_size=img_size_tuple)

eval_dataset=SegmentDataset(base_path=path_base,
    file_list='valid_list.txt',
    img_size=img_size_tuple)

In [10]:
def assert_is_label(lbl):
    assert len(lbl[lbl==0])+len(lbl[lbl==1])+len(lbl[lbl==2])+len(lbl[lbl==3])+len(lbl[lbl==255])==len(lbl.flatten().ravel())

In [11]:
print(len(train_dataset))
print(len(eval_dataset))

import matplotlib.pyplot as plt
img,lbl=train_dataset[random.randint(0,len(train_dataset)-1)]

assert_is_label(lbl)

# plt.imshow(np.transpose(img, (1, 2, 0)))
# plt.show()
# plt.imshow(lbl)
# plt.show()


66472
180


In [12]:
def find_lastest():
    path=train_parameters['checkpoints']+"/"+'best.pdparames'
    
    if not os.path.exists(train_parameters['checkpoints']):
        return None

    if not os.path.exists(path):
        return None
    
    return path
    

In [13]:
find_lastest()

In [14]:
!wget -O work/checkpoints/best.pdparames "https://bj.bcebos.com/v1/ai-studio-online/netdisk/40f121450fab4a439525932234ca0d48d91f544fb164434cbb4cb06fe0724037?responseContentDisposition=attachment%3B%20filename%3Dbest0615.pdparames&authorization=bce-auth-v1%2F0ef6765c1e494918bc0d4c3ca3e5c6d1%2F2022-06-15T12%3A10%3A24Z%2F-1%2F%2F8ed73e87997df85c38f1e7d90615232e5978e54f0c2b2bbfe3bc5355d413ffaa"
!wget -O work/pretrained/resnet.tar.gz https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz

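## Model Definition

We use the DeepLab V3+ model, adding Squeeze-and-Excitation and CBAM modules to the decoder and replacing the last convolution layer with a deformable convolution.
The implementation is in model/models/deeplabv3p.py.


Adding CBAM and SE to the decoder introduces attention and strengthens the model's ability to extract global features, while the deformable convolution enlarges the receptive field of the final convolution layer.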


In [15]:
import paddleseg.core

from model.models.deeplabv3p import DeepLabV3P
from model.backbones.resnetvd import ResNet50_vd

# model:
#   type: DeepLabV3P
#   backbone:
#     type: ResNet50_vd
#     output_stride: 8
#     multi_grid: [1, 2, 4]
#     pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
#   num_classes: 19
#   backbone_indices: [0, 3]
#   aspp_ratios: [1, 12, 24, 36]
#   aspp_out_channels: 256
#   align_corners: False
#   pretrained: null


def get_backbone():
    backbone = ResNet50_vd(output_stride=8,
        multi_grid=[1, 2, 4],
        pretrained="https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz")
    return backbone


def get_new_model():
    backbone =get_backbone()

    model = DeepLabV3P(
        num_classes=4,
        backbone=backbone,
        backbone_indices=[0, 3],
        aspp_ratios=[1, 12, 24, 36],
        aspp_out_channels=256,
        align_corners=True
    )
    return model

def get_trained_model():
    trained_path=find_lastest()
    if trained_path is None:
        return get_new_model()

    if not os.path.exists(trained_path):
        return get_new_model()

    backbone = get_backbone()

    model = DeepLabV3P(
        num_classes=4,
        backbone=backbone,
        backbone_indices=[0, 3],
        aspp_ratios=[1, 12, 24, 36],
        aspp_out_channels=256,
        align_corners=True,
        pretrained=trained_path
    )

    if trained_path is not None:
        print("Load previous state")
        model.set_state_dict(paddle.load(trained_path))
    
    return model

def save_model(model,miou):
    print('Model of mIoU {} saving...'.format(miou))
    paddle.save(model.state_dict(),  train_parameters['checkpoints']+"/"+'best.pdparames')

def save_checkpoint(model,epoch):
    print('Checkpoint {} saving...'.format(epoch))
    paddle.save(model.state_dict(),  train_parameters['checkpoints']+"/"+'epoch{}.pdparames'.format(epoch))

In [16]:
train_config = {
    "optimizer":"sgd",
    "scheduler":"cosine",
    "scheduler_verbose":True
}

def get_scheduler(model,strategy):
    if train_config["scheduler"]=="cosine":
        return paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=strategy['lr'],
            T_max=strategy['cos_decay_T'],
            verbose=train_config["scheduler_verbose"])
    elif train_config["scheduler"]=="step":
        return paddle.optimizer.lr.StepDecay(learning_rate=strategy['lr'],
            step_size=strategy['step_decay_step'],
            gamma=0.8,
            verbose=train_config["scheduler_verbose"])
    elif train_config["scheduler"]=="linear":
        return paddle.optimizer.lr.LinearWarmup(
            learning_rate=strategy['lr'],
            warmup_steps=20,
            start_lr=strategy['lr']/100.0,
            end_lr=strategy['lr'],
            verbose=train_config["scheduler_verbose"])
    elif train_config["scheduler"]=="poly":
        return paddle.optimizer.lr.PolynomialDecay(
            learning_rate=strategy['lr'],
            decay_steps=strategy['step_decay_step'],
            end_lr=strategy['lr']/5.0,
            verbose=train_config["scheduler_verbose"])

def get_optimizer(model,scheduler):
    if train_config["optimizer"]=="sgd":
        return paddle.optimizer.Momentum(learning_rate=scheduler,
                        use_nesterov=True,
                        weight_decay= paddle.regularizer.L2Decay(4.0e-5),
                        parameters=model.parameters())
    elif train_config["optimizer"]=="adamw":
        return paddle.optimizer.AdamW(learning_rate=scheduler,
                        parameters=model.parameters())

## Loss Function
We train with a mix of weighted cross-entropy loss and LovaszSoftmaxLoss, weighted 7:3.

In [17]:
from paddleseg.models.losses import LovaszSoftmaxLoss,OhemCrossEntropyLoss,SemanticConnectivityLoss,MixedLoss
from model.loss.focal import MultiClassFocalLoss

def get_losses():
    ce = paddle.nn.CrossEntropyLoss(axis=1,ignore_index=255)

    weights = paddle.to_tensor([0.26,0.21,0.25,0.28],dtype='float32')
    wce = paddle.nn.CrossEntropyLoss(weight=weights, axis=1,ignore_index=255)

    oce = OhemCrossEntropyLoss(ignore_index=255)
    
    lsl = LovaszSoftmaxLoss(ignore_index=255)

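    # SCL (Semantic Connectivity-aware Learning) framework: it introduces SC Loss (Semantic Connectivity-aware Loss), which improves segmentation quality from the perspective of connectivity. Supports multi-class segmentation.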
    sce = SemanticConnectivityLoss(ignore_index=255)

    # mixed = MixedLoss([wce,sce,lsl],[0.2,0.4,0.4])
    # return mixed
    mixed = MixedLoss([wce,lsl],[0.7,0.3])
    return mixed

def loss_combine(losses):
    if not isinstance(losses,list):
        return losses 
    else:
        step_loss = paddle.zeros_like(losses[0])
        for l in losses:
            step_loss+=l
        return step_loss
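To show what the 7:3 mix computes, here is a minimal pure-Python sketch of a class-weighted cross-entropy and of the weighted sum of loss terms. It assumes the weighted loss is normalized by the sum of the target weights and that ignored pixels (label 255) are skipped; the function names and the Lovász value are hypothetical placeholders, and the Lovász term itself is not implemented here.

```python
import math

CLASS_WEIGHTS = [0.26, 0.21, 0.25, 0.28]  # same values as in get_losses()


def weighted_ce(logits, targets, weights, ignore_index=255):
    """Weighted cross-entropy over pixels; each pixel has one logit per class."""
    total, weight_sum = 0.0, 0.0
    for z, t in zip(logits, targets):
        if t == ignore_index:
            continue  # ignored pixels contribute nothing
        m = max(z)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        total += weights[t] * (log_sum_exp - z[t])
        weight_sum += weights[t]
    return total / weight_sum if weight_sum > 0 else 0.0


def mix(losses, coefs):
    """Weighted sum of already-computed loss values."""
    return sum(c * l for c, l in zip(coefs, losses))


pixels = [[2.0, 0.5, 0.1, 0.0], [0.0, 0.0, 0.0, 0.0], [0.3, 0.2, 3.0, 0.1]]
labels = [0, 3, 255]
wce_value = weighted_ce(pixels, labels, CLASS_WEIGHTS)
lovasz_value = 0.4  # placeholder, stands in for the LovaszSoftmaxLoss output
print(wce_value, mix([wce_value, lovasz_value], [0.7, 0.3]))
```

The class weights shift the emphasis of the cross-entropy term between classes, while the mixing coefficients decide how much each loss contributes to the gradient.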

## Training Procedure

The following defines the training loop and reports mIoU. The mIoU metric is implemented in metrics/mIoU.py; the key part is shown here:

```python
    def _fast_hist(self, label_pred, label_true):
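        # keep only pixels whose true label is a valid class index (this drops ignored labels such as 255)
        mask = (label_true >= 0) & (label_true < self.num_classes)
        # np.bincount counts how often each value in 0..n**2-1 occurs; the result is reshaped to (n, n)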
        hist = np.bincount(
            self.num_classes * label_true[mask].astype(int) +
            label_pred[mask], minlength=self.num_classes ** 2).reshape(self.num_classes, self.num_classes)
        return hist

    # Input: predictions and ground truth
    # Semantic segmentation assigns a label to every pixel
    def evaluate(self, predictions, gts):
        for lp, lt in zip(predictions, gts):
            assert len(lp.flatten()) == len(lt.flatten())
            self.hist += self._fast_hist(lp.flatten(), lt.flatten())
            
        # miou
        iou = np.diag(self.hist) / (self.hist.sum(axis=1) + self.hist.sum(axis=0) - np.diag(self.hist))
        miou = np.nanmean(iou) 

        return iou,miou
```
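A tiny worked example may help in reading the confusion-matrix code above. This sketch uses plain Python lists instead of `np.bincount`, with made-up labels and hypothetical function names; rows are true classes and columns are predicted classes, as in `_fast_hist`.

```python
def confusion_matrix(pred, true, num_classes):
    """hist[t][p] counts pixels with true class t and predicted class p."""
    hist = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, true):
        if 0 <= t < num_classes:  # skip ignored labels such as 255
            hist[t][p] += 1
    return hist


def iou_per_class(hist):
    """IoU = TP / (true pixels + predicted pixels - TP) for each class."""
    n = len(hist)
    ious = []
    for c in range(n):
        tp = hist[c][c]
        row = sum(hist[c])                       # pixels whose true class is c
        col = sum(hist[r][c] for r in range(n))  # pixels predicted as c
        union = row + col - tp
        ious.append(tp / union if union else float("nan"))
    return ious


true = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
hist = confusion_matrix(pred, true, num_classes=3)
ious = iou_per_class(hist)
valid = [v for v in ious if v == v]  # drop NaN entries, as np.nanmean does
miou = sum(valid) / len(valid)
print(hist, ious, miou)
```

In this example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18; the pixel labeled 255 does not affect the result.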

## Detailed Training Loop

In [18]:
from metrics.mIoU import IOUMetric

from tqdm import tqdm

miou = IOUMetric(4)

def train_one_epoch(model,epoch, epochs,optimizer,scheduler,loss,train_loader):
    print('Start Training...')
    print('Epoch/Epochs:{}/{}'.format(epoch, epochs))
    print('Train...')
    train_loss = 0
    train_miou = 0
    model.train()
    for batch_id, (img, label) in tqdm(enumerate(train_loader)):
        pred = model(img)
        pred=pred[0]
        step_loss = loss_combine(loss(pred, label))
        train_loss += step_loss.numpy()[0]

        # compute mIoU, pred: num_loss * NCHW -> NHW 
        mask = np.argmax(pred.numpy(), axis=1)
        iou,step_miou = miou.evaluate(mask[0], label.numpy()[0])
        for i in range(1,mask.shape[0]):
            # print(mask[i].shape, label.shape)
            clsiou,alliou=miou.evaluate(mask[i], label.numpy()[i])
            iou+=clsiou
            step_miou+=alliou
            
        step_miou /= mask.shape[0]
        iou/=mask.shape[0]
        train_miou += step_miou

        step_loss.backward()
        optimizer.step()
        if (batch_id + 1) % 100 == 0:
            scheduler.step()
            print('Epoch/Epochs:{}/{} Batch/Batchs:{}/{} Step Loss:{} Step Miou:{} Class Miou:{}'.format(epoch, epochs, batch_id+1, len(train_loader), \
                                                                                            step_loss.numpy(), step_miou,iou))
        optimizer.clear_grad()
    
    print('Train Loss:{} Train Miou:{}'.format(train_loss/len(train_loader), train_miou/len(train_loader)))

def eval_one_epoch(model,epoch, epochs,optimizer,loss,eval_loader):
    print('Start Evaluation...')
    val_loss = 0
    val_miou = 0
    val_iou=None
    model.eval()
    for batch_id, (img, label) in tqdm(enumerate(eval_loader)):
        pred = model(img)
        pred=pred[0]
        step_loss = loss_combine(loss(pred, label))
        val_loss += step_loss.numpy()[0]

        # compute mIoU, pred: num_loss * NCHW -> NHW 
        mask = np.argmax(pred.numpy(), axis=1)
        iou,step_miou = miou.evaluate(mask[0], label.numpy()[0])
        for i in range(1,mask.shape[0]):
            # print(mask[i].shape, label.shape)
            clsiou,alliou=miou.evaluate(mask[i], label.numpy()[i])
            iou+=clsiou
            step_miou+=alliou
        step_miou /= mask.shape[0]
        iou /= mask.shape[0]
        val_miou += step_miou

        if val_iou is None:
            val_iou=iou
        else:
            val_iou+=iou

    print('Val Loss:{} Val Miou:{}, Class Miou:{}'.format(val_loss/len(eval_loader), val_miou/len(eval_loader),val_iou/len(eval_loader)))

    return val_miou/len(eval_loader)


def train(model, mode='train',epoches_override=None):
    key='learning_strategy'
    if mode=='warmup':
        key+='_warmup'

    strategy=train_parameters[key]

    scheduler = get_scheduler(model, strategy)
    assert scheduler

    optimizer = get_optimizer(model, scheduler)
    assert optimizer

    train_loader = paddle.io.DataLoader(train_dataset,
                        batch_size=strategy['batch_size'],
                        shuffle=True,
                        drop_last=True,
                        num_workers=strategy["workers"])

    eval_loader = paddle.io.DataLoader(eval_dataset,
                        batch_size=strategy['batch_size'],
                        shuffle=True,
                        drop_last=True,
                        num_workers=strategy["workers"])


    loss = get_losses()

    epoches=strategy['epochs']
    if epoches_override is not None:
        epoches = epoches_override

    best_miou=0
    for epoch in range(1,epoches+1):
        # with paddle.amp.auto_cast():
        train_one_epoch(model,epoch,epoches,optimizer,scheduler,loss,train_loader)
        
        save_checkpoint(model,epoch)
        
        with paddle.no_grad():
            iou=eval_one_epoch(model,epoch,epoches,optimizer,loss,eval_loader)
            if iou>=best_miou:
                save_model(model,iou)
                best_miou=iou
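Note that `train_one_epoch` calls `scheduler.step()` once every 100 batches, so `cos_decay_T` counts those calls rather than epochs. The following sketch works out how many scheduler steps one epoch produces for the training set size and batch size shown earlier; the helper name is hypothetical.

```python
def scheduler_steps_per_epoch(num_samples, batch_size, step_every=100):
    """Return (batches per epoch, scheduler steps per epoch)."""
    batches = num_samples // batch_size  # drop_last=True discards the partial batch
    return batches, batches // step_every


batches, steps = scheduler_steps_per_epoch(66472, 16)
print(batches, steps)
print(100 / steps)  # epochs needed for the cosine schedule to go from lr to its minimum
```

With 66472 training samples and a batch size of 16 this gives 4154 batches and 41 scheduler steps per epoch, so a `cos_decay_T` of 100 is reached after roughly two and a half epochs.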



## Run Training with the Functions Above

In [None]:
from model.utils.freezer import Freezer

model=get_trained_model()

train(model)

W0618 22:55:46.885257 16705 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0618 22:55:46.890126 16705 gpu_context.cc:306] device: 0, cuDNN Version: 7.6.


2022-06-18 22:55:50 [INFO]	Loading pretrained model from https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
2022-06-18 22:55:51 [INFO]	There are 275/275 variables loaded into ResNet_vd.
Epoch 0: CosineAnnealingDecay set learning rate to 0.005.
Start Training...
Epoch/Epochs:1/96
Train...
55it [01:42,  1.88s/it]

## Generate the Submission
Run prediction on the test set and package the results.

In [None]:
from tqdm import tqdm
import cv2

model = get_trained_model()

state_path = find_lastest()
assert state_path is not None
model.set_state_dict(paddle.load(state_path))

model.eval()

test_base = train_parameters['data_path']+"test/"+'img_testA/'    # test set path
out_base = 'data/result/'        # where predictions are saved

# Create the output directory if it does not exist
if not os.path.exists(out_base):
    os.makedirs(out_base)

# Run inference and save the predicted masks
for im in tqdm(os.listdir(test_base)):
    if not im.endswith('.jpg'):
        continue
        
    pt = test_base + im
    img = Image.open(pt)

    if img.mode != 'RGB':
        img = img.convert('RGB') 

    img = img.resize((512,512), Image.BILINEAR)
    img = np.array(img).astype('float32')
    img = img.transpose((2, 0, 1)) / 255

    pred = model(paddle.to_tensor([img]))
    result = np.argmax(pred[0].numpy(), axis=1)
    result = result[0].astype('uint8')  # cast class ids (0-3) to uint8, a dtype OpenCV handles reliably

    assert_is_label(result)
    result = cv2.resize(result,dsize=(256, 256),interpolation=cv2.INTER_NEAREST)
    assert_is_label(result)

    cv2.imwrite(out_base+im.replace('jpg', 'png'), result)

In [None]:
!zip -r result.zip data/result/