# Training a Semantic Segmentation Model with MMSegmentation

同济子豪兄 2023-2-13

## Enter the MMSegmentation root directory

In [1]:
import os
os.chdir('../mmsegmentation')

In [2]:
os.getcwd()

'/home/featurize/work/MMSegmentation教程/mmsegmentation'

## Import packages

In [3]:
import numpy as np
from PIL import Image

import os.path as osp
from tqdm import tqdm

import mmcv
import mmengine
import matplotlib.pyplot as plt
%matplotlib inline

## Define the dataset class

In [9]:
!wget https://zihao-openmmlab.obs.cn-east-3.myhuaweicloud.com/20230130-mmseg/Dubai/DubaiDataset.py -P mmseg/datasets

--2023-02-15 17:32:35--  https://zihao-openmmlab.obs.cn-east-3.myhuaweicloud.com/20230130-mmseg/Dubai/DubaiDataset.py
Connecting to 172.16.0.13:5848... connected.
Proxy request sent, awaiting response... 200 OK
Length: 867 [binary/octet-stream]
Saving to: ‘mmseg/datasets/DubaiDataset.py.1’

2023-02-15 17:32:35 (13.1 MB/s) - ‘mmseg/datasets/DubaiDataset.py.1’ saved [867/867]
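Note the `.1` suffix in the wget output above: when a file with the target name already exists, wget keeps the old file and writes the new download under a numbered name, so Python will still import the older `DubaiDataset.py`. Below is a minimal sketch of a download helper that overwrites the existing file instead. It uses only the standard library; the name `download_overwrite` is hypothetical and not part of MMSegmentation.

```python
import os
import urllib.request
from urllib.parse import urlparse


def download_overwrite(url: str, dest_dir: str) -> str:
    """Download `url` into `dest_dir`, replacing any existing file of the same name.

    Hypothetical helper for illustration; not an MMSegmentation API.
    """
    os.makedirs(dest_dir, exist_ok=True)
    filename = os.path.basename(urlparse(url).path)
    dest_path = os.path.join(dest_dir, filename)
    tmp_path = dest_path + ".part"
    # Download to a temporary name first so a failed transfer
    # does not leave a half-written file under the real name.
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, dest_path)  # overwrites dest_path if it exists
    return dest_path


# Example call (requires network access, so it is left commented out):
# download_overwrite(
#     "https://zihao-openmmlab.obs.cn-east-3.myhuaweicloud.com/20230130-mmseg/Dubai/DubaiDataset.py",
#     "mmseg/datasets",
# )
```

On the command line, a comparable effect can be had by passing an explicit output path with `-O mmseg/datasets/DubaiDataset.py` instead of `-P mmseg/datasets`.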



In [10]:
!wget https://zihao-openmmlab.obs.cn-east-3.myhuaweicloud.com/20230130-mmseg/Dubai/__init__.py -P mmseg/datasets

--2023-02-15 17:33:10--  https://zihao-openmmlab.obs.cn-east-3.myhuaweicloud.com/20230130-mmseg/Dubai/__init__.py
Connecting to 172.16.0.13:5848... connected.
Proxy request sent, awaiting response... 200 OK
Length: 2714 (2.7K) [binary/octet-stream]
Saving to: ‘mmseg/datasets/__init__.py.1’

2023-02-15 17:33:10 (82.6 MB/s) - ‘mmseg/datasets/__init__.py.1’ saved [2714/2714]
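The two downloads work together: `DubaiDataset.py` defines the dataset class, and the replaced `__init__.py` presumably imports it so that the class gets registered and a config can refer to it by name. The following is a toy sketch of that registry pattern in plain Python. It is a simplified illustration, not mmengine's actual `Registry`, and `ToyDubaiDataset` with its arguments is hypothetical.

```python
class Registry:
    """Toy name-to-class registry (simplified illustration only)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        # Used as a decorator: stores the class under its own name.
        def decorator(cls):
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        # Look up the class named by cfg['type'] and pass the rest as kwargs.
        cfg = dict(cfg)
        cls = self._modules[cfg.pop("type")]
        return cls(**cfg)


DATASETS = Registry("dataset")


@DATASETS.register_module()
class ToyDubaiDataset:
    """Hypothetical stand-in for a real dataset class."""

    def __init__(self, data_root, img_suffix=".jpg", seg_map_suffix=".png"):
        self.data_root = data_root
        self.img_suffix = img_suffix
        self.seg_map_suffix = seg_map_suffix


# A config only needs the class name; the registry resolves it.
dataset = DATASETS.build(dict(type="ToyDubaiDataset", data_root="data/toy"))
```

The decorator only runs when the module defining the class is imported, which is why the package `__init__.py` matters: if the class is never imported, looking it up by name fails.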



## Define the preprocessing pipeline

In [11]:
!wget https://zihao-openmmlab.obs.cn-east-3.myhuaweicloud.com/20230130-mmseg/Dubai/DubaiDataset_pipeline.py -P configs/_base_/datasets

--2023-02-15 17:33:39--  https://zihao-openmmlab.obs.cn-east-3.myhuaweicloud.com/20230130-mmseg/Dubai/DubaiDataset_pipeline.py
Connecting to 172.16.0.13:5848... connected.
Proxy request sent, awaiting response... 200 OK
Length: 2268 (2.2K) [binary/octet-stream]
Saving to: ‘configs/_base_/datasets/DubaiDataset_pipeline.py’

2023-02-15 17:33:39 (48.7 MB/s) - ‘configs/_base_/datasets/DubaiDataset_pipeline.py’ saved [2268/2268]



## Download the config file

In [12]:
!wget https://zihao-openmmlab.obs.cn-east-3.myhuaweicloud.com/20230130-mmseg/Dubai/pspnet_r50-d8_4xb2-40k_DubaiDataset.py -P configs/pspnet 

--2023-02-15 17:35:30--  https://zihao-openmmlab.obs.cn-east-3.myhuaweicloud.com/20230130-mmseg/Dubai/pspnet_r50-d8_4xb2-40k_DubaiDataset.py
Connecting to 172.16.0.13:5848... connected.
Proxy request sent, awaiting response... 200 OK
Length: 344 [binary/octet-stream]
Saving to: ‘configs/pspnet/pspnet_r50-d8_4xb2-40k_DubaiDataset.py.1’

2023-02-15 17:35:30 (9.95 MB/s) - ‘configs/pspnet/pspnet_r50-d8_4xb2-40k_DubaiDataset.py.1’ saved [344/344]



## Load the config file

In [13]:
from mmengine import Config
cfg = Config.fromfile('./configs/pspnet/pspnet_r50-d8_4xb2-40k_DubaiDataset.py')
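The downloaded config is only 344 bytes, which suggests it mostly inherits from `_base_` files and overrides a few fields. The idea of such inheritance can be sketched as a recursive dictionary merge. This is a simplified illustration in plain Python, not mmengine's implementation; `merge_config` and all values in the toy dictionaries are made up for the example.

```python
import copy


def merge_config(base, override):
    """Recursively merge `override` into a copy of `base`.

    Nested dicts are merged key by key; any other value in
    `override` replaces the value from `base`.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Toy "base" config (values are illustrative only).
base_cfg = dict(
    model=dict(decode_head=dict(num_classes=19, norm_cfg=dict(type="SyncBN"))),
    train_cfg=dict(max_iters=40000, val_interval=4000),
)

# Toy "child" config that overrides just two fields.
child_cfg = dict(
    model=dict(decode_head=dict(num_classes=6)),
    train_cfg=dict(max_iters=1600),
)

merged_cfg = merge_config(base_cfg, child_cfg)
```

Fields the child does not mention keep their base values, which is why a short config file can still describe a complete model.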

## Modify the config file

In [14]:
cfg.norm_cfg = dict(type='BN', requires_grad=True) # single-GPU (non-distributed) training: use BN instead of SyncBN
cfg.crop_size = (256, 256)
cfg.model.data_preprocessor.size = cfg.crop_size
cfg.model.backbone.norm_cfg = cfg.norm_cfg
cfg.model.decode_head.norm_cfg = cfg.norm_cfg
cfg.model.auxiliary_head.norm_cfg = cfg.norm_cfg
# modify num classes of the model in decode/auxiliary head
cfg.model.decode_head.num_classes = 6
cfg.model.auxiliary_head.num_classes = 6

cfg.train_dataloader.batch_size = 8

cfg.test_dataloader = cfg.val_dataloader

# Working directory
cfg.work_dir = './work_dirs/tutorial'

# Number of training iterations
cfg.train_cfg.max_iters = 1600
# Validation interval (in iterations)
cfg.train_cfg.val_interval = 400
# Logging interval (in iterations)
cfg.default_hooks.logger.interval = 100
# Checkpoint saving interval (in iterations)
cfg.default_hooks.checkpoint.interval = 400

# Random seed
cfg['randomness'] = dict(seed=0)
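To see what these interval settings imply, the schedule can be worked out with plain arithmetic. This is a sketch under the assumption that validation, logging, and checkpointing each fire at every multiple of their interval; the helper `milestones` is hypothetical.

```python
def milestones(interval, max_iters):
    """Iterations at which an event with the given interval fires."""
    return list(range(interval, max_iters + 1, interval))


# Values copied from the config cell above.
max_iters = 1600
val_interval = 400
log_interval = 100
checkpoint_interval = 400
batch_size = 8

val_iters = milestones(val_interval, max_iters)
checkpoint_iters = milestones(checkpoint_interval, max_iters)
log_iters = milestones(log_interval, max_iters)

# Total number of training samples drawn over the whole run.
samples_seen = max_iters * batch_size

print("validate at:", val_iters)
print("checkpoints at:", checkpoint_iters)
print("log lines:", len(log_iters))
print("samples seen:", samples_seen)
```

Under that assumption the run validates and saves four times, and each checkpoint coincides with a validation pass, so every saved weight file has a matching evaluation result.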

## View the full config file

In [15]:
# print(cfg.pretty_text)

## Save the config file

In [17]:
cfg.dump('pspnet-DubaiDataset_20230215.py')

## Prepare for training

In [18]:
from mmengine.runner import Runner
from mmseg.utils import register_all_modules

# register all modules in mmseg into the registries
# do not init the default scope here because it will be init in the runner
register_all_modules(init_default_scope=False)
runner = Runner.from_cfg(cfg)

02/15 17:36:43 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.7.10 (default, Jun  4 2021, 14:48:32) [GCC 7.5.0]
    CUDA available: True
    numpy_random_seed: 0
    GPU 0: NVIDIA RTX A4000
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.2, V11.2.152
    GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
    PyTorch: 1.10.1+cu113
    PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code

  'Default ``avg_non_ignore`` is False, if you would like to '


02/15 17:36:50 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
02/15 17:36:50 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook                    
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
before_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) DistSamplerSeedHook                
 -------------------- 
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_train_iter:
(VERY_HIGH   ) Runti
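The hook listing above is ordered by priority within each stage. A toy sketch of that ordering follows. The numeric levels are an assumption meant to mirror mmengine's priority names (lower number runs earlier), and the registration order of the hooks is made up for the example.

```python
# Assumed numeric levels for the priority names seen in the log;
# a lower number means the hook runs earlier.
PRIORITY = {
    "HIGHEST": 0,
    "VERY_HIGH": 10,
    "HIGH": 30,
    "ABOVE_NORMAL": 40,
    "NORMAL": 50,
    "BELOW_NORMAL": 60,
    "LOW": 70,
    "VERY_LOW": 90,
    "LOWEST": 100,
}


def order_hooks(hooks):
    """Sort (name, priority_name) pairs by priority.

    sorted() is stable, so hooks with equal priority keep
    their registration order.
    """
    return sorted(hooks, key=lambda hook: PRIORITY[hook[1]])


# Hypothetical registration order; priorities taken from the log above.
registered = [
    ("CheckpointHook", "VERY_LOW"),
    ("IterTimerHook", "NORMAL"),
    ("DistSamplerSeedHook", "NORMAL"),
    ("LoggerHook", "BELOW_NORMAL"),
    ("RuntimeInfoHook", "VERY_HIGH"),
]

ordered = order_hooks(registered)
for name, priority in ordered:
    print(f"({priority:<12}) {name}")
```

This matches the pattern in the log: `RuntimeInfoHook` always comes first, and `CheckpointHook` comes last so that it saves state after the other hooks have run.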



## Start training

If you hit a `CUDA out of memory` error, restart the instance or switch to an instance with more GPU memory.

In [19]:
runner.train()





02/15 17:38:15 - mmengine - INFO - load model from: open-mmlab://resnet50_v1c
02/15 17:38:15 - mmengine - INFO - Loads checkpoint by openmmlab backend from path: open-mmlab://resnet50_v1c

unexpected key in source state_dict: fc.weight, fc.bias

02/15 17:38:16 - mmengine - INFO - Checkpoints will be saved to /home/featurize/work/MMSegmentation教程/mmsegmentation/work_dirs/tutorial.
02/15 17:38:19 - mmengine - INFO - Exp name: pspnet_r50-d8_4xb2-40k_DubaiDataset_20230215_173642
02/15 17:38:38 - mmengine - INFO - Iter(train) [ 100/1600]  lr: 9.9779e-03  eta: 0:05:41  time: 0.2100  data_time: 0.0075  memory: 5948  loss: 0.1080  decode.loss_ce: 0.0757  decode.acc_seg: 75.7721  aux.loss_ce: 0.0322  aux.acc_seg: 71.5393
02/15 17:39:00 - mmengine - INFO - Iter(train) [ 200/1600]  lr: 9.9557e-03  eta: 0:05:08  time: 0.2130  data_time: 0.0075  memory: 3522  loss: 0.1019  decode.loss_ce: 0.0726  decode.acc_seg: 57.7820  
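The training log lines above have a regular `key: value` layout, so they can be parsed for plotting or quick checks. A minimal sketch using only the standard library follows. The function `parse_train_line` is hypothetical, and the sample line is the iteration-100 entry from the log with the terminal color codes removed.

```python
import re

# Iteration-100 line from the training log above (color codes removed).
LOG_LINE = (
    "02/15 17:38:38 - mmengine - INFO - Iter(train) [ 100/1600]  "
    "lr: 9.9779e-03  eta: 0:05:41  time: 0.2100  data_time: 0.0075  "
    "memory: 5948  loss: 0.1080  decode.loss_ce: 0.0757  "
    "decode.acc_seg: 75.7721  aux.loss_ce: 0.0322  aux.acc_seg: 71.5393"
)

ITER_RE = re.compile(r"Iter\(train\)\s*\[\s*(\d+)/(\d+)\]")
# A metric is "name: number" where the number is followed by whitespace
# or end of line; this skips non-numeric fields such as "eta: 0:05:41".
METRIC_RE = re.compile(
    r"([\w.]+):\s+([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)(?=\s|$)"
)


def parse_train_line(line):
    """Return a dict of iteration info and numeric metrics, or None."""
    match = ITER_RE.search(line)
    if match is None:
        return None
    parsed = {"iter": int(match.group(1)), "max_iters": int(match.group(2))}
    for key, value in METRIC_RE.findall(line):
        parsed[key] = float(value)
    return parsed


metrics = parse_train_line(LOG_LINE)
print(metrics["iter"], metrics["loss"], metrics["decode.acc_seg"])
```

In this line the decode-head and auxiliary-head losses (0.0757 and 0.0322) add up to the reported total of 0.1080 within rounding, which is a quick sanity check on the parsed values.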

EncoderDecoder(
  (data_preprocessor): SegDataPreProcessor()
  (backbone): ResNetV1c(
    (stem): Sequential(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (8): ReLU(inplace=True)
    )
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): ResLayer(
      (0): Bottleneck(
        (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-0