<a href="https://colab.research.google.com/github/Eclipse-01/MMSegDamnMan/blob/main/demo/MMSegmentation_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

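# MMSegmentation Tutorial
Welcome to MMSegmentation!

In this tutorial, we demonstrate:
* How to run inference with weights trained by MMSeg
* How to train on your own dataset and visualize the results.

## Install MMSegmentation
This step may take several minutes.

This tutorial uses PyTorch 1.12 and CUDA 11.3, matching the install command below. You can install other versions by changing the version numbers in the pip install command.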

In [1]:
# Check the nvcc version
!nvcc -V
# Check the GCC version
!gcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



In [None]:
# Install PyTorch
!pip install torch==1.12.0 torchvision --extra-index-url https://download.pytorch.org/whl/cu113
# Install MMCV
!pip install openmim
!mim install mmcv-full==1.6.0

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113
ERROR: Could not find a version that satisfies the requirement torch==1.12.0 (from versions: 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.6.0, 2.7.0)
ERROR: No matching distribution found for torch==1.12.0
Looking in links: https://download.openmmlab.com/mmcv/dist/cu124/torch2.6.0/index.html
Collecting mmcv-full==1.6.0
  Downloading mmcv-full-1.6.0.tar.gz (554 kB)
  Preparing metadata (setup.py) ... done
Collecting addict (from mmcv-full==1.6.0)
  Downloading addict-2.4.0-py3-none-any.whl.metadata (1.0 kB)
Collecting yapf (from mmcv-full==1.6.0)
  Downloading yapf-0.43.0-py3-none-any.whl.metadata (46 kB)

In [None]:
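!rm -rf mmsegmentation
# The APIs used below (init_segmentor, inference_segmentor, mmseg.core) belong to the 0.x series,
# so check out a 0.x release instead of the default branch; the v0.30.0 tag is an assumption.
!git clone -b v0.30.0 https://github.com/open-mmlab/mmsegmentation.git
%cd mmsegmentation
!pip install -e .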

In [None]:
# Check the PyTorch installation
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())

# Check the MMSegmentation installation
import mmseg
print(mmseg.__version__)

## Run Inference with MMSeg Trained Weights

In [None]:
# Create the checkpoints folder (no error if it already exists) and download the pretrained weights
!mkdir -p checkpoints
!wget https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth -P checkpoints

In [None]:
# Import the MMSegmentation inference and visualization APIs
from mmseg.apis import inference_segmentor, init_segmentor, show_result_pyplot
from mmseg.core.evaluation import get_palette

In [None]:
# Paths to the config file and the checkpoint file
config_file = 'configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py'
checkpoint_file = 'checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'

In [None]:
# Build the model from the config file and the checkpoint file
model = init_segmentor(config_file, checkpoint_file, device='cuda:0')

In [None]:
# Run inference on a single image
img = 'demo/demo.png'
result = inference_segmentor(model, img)

In [None]:
# Show the segmentation result
show_result_pyplot(model, img, result, get_palette('cityscapes'))

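## Train a Semantic Segmentation Model on a New Dataset

To train on a custom dataset, the following steps are needed:
1. Add a new dataset class.
2. Create a config file accordingly.
3. Run training and evaluation.

### Add a New Dataset

Datasets in MMSegmentation require each image and its semantic segmentation map to share the same filename prefix, differing only in suffix, with images and annotations kept in their own folders. To support a new dataset, we may need to modify the original file structure.

In this tutorial, we give an example of converting a dataset. For more details on reorganizing a dataset, refer to the [official documentation](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/tutorials/customize_datasets.md#customize-datasets-by-reorganizing-data).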
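
Before training, it can help to confirm that every image really has a matching annotation. The following is a minimal sketch, not part of MMSegmentation: `pair_by_prefix` is a hypothetical helper, and the default suffixes `.jpg` and `.png` are assumptions that match the dataset class defined later in this tutorial.

```python
from pathlib import Path


def pair_by_prefix(img_dir, ann_dir, img_suffix=".jpg", seg_map_suffix=".png"):
    """Pair images with annotations that share the same filename prefix.

    Returns (pairs, missing): a list of (image_path, annotation_path) tuples
    and a list of image prefixes that have no annotation file.
    """
    img_dir, ann_dir = Path(img_dir), Path(ann_dir)
    pairs, missing = [], []
    for img_path in sorted(img_dir.glob(f"*{img_suffix}")):
        # Strip the suffix to get the shared prefix, e.g. "6000124"
        prefix = img_path.name[: -len(img_suffix)]
        ann_path = ann_dir / f"{prefix}{seg_map_suffix}"
        if ann_path.is_file():
            pairs.append((img_path, ann_path))
        else:
            missing.append(prefix)
    return pairs, missing
```

Once the annotations have been converted to `.png`, calling `pair_by_prefix('iccv09Data/images', 'iccv09Data/labels')` should return an empty `missing` list if the layout is consistent.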

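We use the [Stanford Background Dataset](http://dags.stanford.edu/projects/scenedataset.html) as an example. The dataset contains 715 images chosen from existing public datasets: [LabelMe](http://labelme.csail.mit.edu), [MSRC](http://research.microsoft.com/en-us/projects/objectclassrecognition), [PASCAL VOC](http://pascallin.ecs.soton.ac.uk/challenges/VOC) and [Geometric Context](http://www.cs.illinois.edu/homes/dhoiem/). The images are mainly outdoor scenes, each approximately 320x240 pixels.
In this tutorial, we use the region annotations as labels. There are 8 classes in total: sky, tree, road, grass, water, building, mountain, and foreground object.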

In [None]:
# Download and extract the dataset
!wget http://dags.stanford.edu/data/iccv09Data.tar.gz -O stanford_background.tar.gz
!tar xf stanford_background.tar.gz

In [None]:
# Take a look at a sample from the dataset
import mmcv
import matplotlib.pyplot as plt

img = mmcv.imread('iccv09Data/images/6000124.jpg')
plt.figure(figsize=(8, 6))
plt.imshow(mmcv.bgr2rgb(img))
plt.show()

We need to convert the annotations into semantic segmentation maps stored as images.
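
As a rough illustration of what this conversion has to handle, the sketch below parses label text into a 2D list of class indices. It assumes that each `.regions.txt` file holds one whitespace-separated integer per pixel and that a negative value marks an unlabeled pixel; neither is stated above. `parse_regions` is a hypothetical helper, and mapping unlabeled pixels to 255 is an assumption chosen to match the `seg_pad_val=255` used in the training pipeline later on.

```python
IGNORE_INDEX = 255  # assumption: label value treated as "ignore" during training


def parse_regions(text, num_classes=8, ignore_index=IGNORE_INDEX):
    """Parse whitespace-separated integer labels into a list of rows."""
    seg_map = []
    for line in text.strip().splitlines():
        row = []
        for token in line.split():
            label = int(token)
            if label < 0:
                # Assumed meaning: unlabeled pixel, map it to the ignore index
                label = ignore_index
            elif label >= num_classes:
                raise ValueError(f"unexpected label {label}")
            row.append(label)
        seg_map.append(row)
    if len({len(row) for row in seg_map}) > 1:
        raise ValueError("rows have different lengths")
    return seg_map


sample = """
0 0 1
-1 7 1
"""
print(parse_regions(sample))  # prints [[0, 0, 1], [255, 7, 1]]
```

The conversion cell relies on `astype(np.uint8)` for this step; handling negative values explicitly avoids depending on how that cast treats them.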

In [None]:
import os.path as osp
import numpy as np
from PIL import Image
# Convert the dataset annotations to semantic segmentation images
data_root = 'iccv09Data'
img_dir = 'images'
ann_dir = 'labels'
# Define the classes and the palette for better visualization
classes = ('sky', 'tree', 'road', 'grass', 'water', 'bldg', 'mntn', 'fg obj')
palette = [[128, 128, 128], [129, 127, 38], [120, 69, 125], [53, 125, 34],
           [0, 11, 123], [118, 20, 12], [122, 81, 25], [241, 134, 51]]
for file in mmcv.scandir(osp.join(data_root, ann_dir), suffix='.regions.txt'):
  seg_map = np.loadtxt(osp.join(data_root, ann_dir, file)).astype(np.uint8)
  seg_img = Image.fromarray(seg_map).convert('P')
  seg_img.putpalette(np.array(palette, dtype=np.uint8))
  seg_img.save(osp.join(data_root, ann_dir, file.replace('.regions.txt',
                                                         '.png')))

In [None]:
# Take a look at the segmentation map we obtained
import matplotlib.patches as mpatches
img = Image.open('iccv09Data/labels/6000124.png')
plt.figure(figsize=(8, 6))
im = plt.imshow(np.array(img.convert('RGB')))

# Create a legend patch for each color
patches = [mpatches.Patch(color=np.array(palette[i])/255.,
                          label=classes[i]) for i in range(8)]
# Add the legend to the figure
plt.legend(handles=patches, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,
           fontsize='large')

plt.show()

In [None]:
# Split into train/val sets (no shuffling: the split follows the scan order)
split_dir = 'splits'
mmcv.mkdir_or_exist(osp.join(data_root, split_dir))
filename_list = [osp.splitext(filename)[0] for filename in mmcv.scandir(
    osp.join(data_root, ann_dir), suffix='.png')]
with open(osp.join(data_root, split_dir, 'train.txt'), 'w') as f:
  # Use the first 4/5 as the training set
  train_length = int(len(filename_list)*4/5)
  f.writelines(line + '\n' for line in filename_list[:train_length])
with open(osp.join(data_root, split_dir, 'val.txt'), 'w') as f:
  # Use the last 1/5 as the validation set
  f.writelines(line + '\n' for line in filename_list[train_length:])
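
The split cell above takes the first 4/5 of the scanned file names without shuffling them, so the result depends on the directory scan order. If a random but reproducible split is preferred, one possible approach is sketched below. `split_train_val` is a hypothetical helper, and the default seed and fraction are illustrative assumptions.

```python
import random


def split_train_val(names, train_fraction=0.8, seed=0):
    """Shuffle names with a fixed seed and split them into train and val lists."""
    # Sort first so the result does not depend on the directory scan order
    names = sorted(names)
    # A private Random instance keeps the global random state untouched
    random.Random(seed).shuffle(names)
    train_length = int(len(names) * train_fraction)
    return names[:train_length], names[train_length:]
```

The two returned lists could then be written to `train.txt` and `val.txt` in the same way as in the cell above.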

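After preparing the data, we define a new dataset class `StanfordBackgroundDataset`. It subclasses `CustomDataset`, which already provides `load_annotations`, so we only need to set the class names, the palette, the file suffixes, and the split.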

In [None]:
from mmseg.datasets.builder import DATASETS
from mmseg.datasets.custom import CustomDataset

@DATASETS.register_module()
class StanfordBackgroundDataset(CustomDataset):
  CLASSES = classes
  PALETTE = palette
  def __init__(self, split, **kwargs):
    super().__init__(img_suffix='.jpg', seg_map_suffix='.png',
                     split=split, **kwargs)
    assert osp.exists(self.img_dir) and self.split is not None



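### Create a Config File
In the next step, we need to modify the config used for training. To speed up training, we finetune the model from existing weights.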

In [None]:
from mmcv import Config
cfg = Config.fromfile('configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py')

Since the given config is used to train PSPNet on the Cityscapes dataset, we need to modify it accordingly for our new dataset.
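
The modified config in the next cell uses a very short schedule (`max_iters = 200` with `samples_per_gpu = 8`). To get a feel for how short that is, the arithmetic below estimates how many passes over the training split it amounts to. It assumes that all 715 images have a converted label, that the 4/5 split rule from earlier applies, and that a single GPU is used.

```python
num_images = 715                        # dataset size stated earlier
train_length = int(num_images * 4 / 5)  # same rule as the split cell
samples_per_gpu = 8
num_gpus = 1                            # assumption: single-GPU training
max_iters = 200

samples_seen = max_iters * samples_per_gpu * num_gpus
epochs = samples_seen / train_length
print(train_length, samples_seen, round(epochs, 2))  # prints 572 1600 2.8
```

Under these assumptions the model sees each training image fewer than three times, which is why starting from pretrained weights matters here.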

In [None]:
from mmseg.apis import set_random_seed
from mmseg.utils import get_device

# Since we use only one GPU, BN is used instead of SyncBN
cfg.norm_cfg = dict(type='BN', requires_grad=True)
cfg.model.backbone.norm_cfg = cfg.norm_cfg
cfg.model.decode_head.norm_cfg = cfg.norm_cfg
cfg.model.auxiliary_head.norm_cfg = cfg.norm_cfg
# Modify the number of classes in the decode/auxiliary heads
cfg.model.decode_head.num_classes = 8
cfg.model.auxiliary_head.num_classes = 8

# Modify the dataset type and path
cfg.dataset_type = 'StanfordBackgroundDataset'
cfg.data_root = data_root

cfg.data.samples_per_gpu = 8
cfg.data.workers_per_gpu = 8

cfg.img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
cfg.crop_size = (256, 256)
cfg.train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(320, 240), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=cfg.crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **cfg.img_norm_cfg),
    dict(type='Pad', size=cfg.crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]

cfg.test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(320, 240),
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **cfg.img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]


cfg.data.train.type = cfg.dataset_type
cfg.data.train.data_root = cfg.data_root
cfg.data.train.img_dir = img_dir
cfg.data.train.ann_dir = ann_dir
cfg.data.train.pipeline = cfg.train_pipeline
cfg.data.train.split = 'splits/train.txt'

cfg.data.val.type = cfg.dataset_type
cfg.data.val.data_root = cfg.data_root
cfg.data.val.img_dir = img_dir
cfg.data.val.ann_dir = ann_dir
cfg.data.val.pipeline = cfg.test_pipeline
cfg.data.val.split = 'splits/val.txt'

cfg.data.test.type = cfg.dataset_type
cfg.data.test.data_root = cfg.data_root
cfg.data.test.img_dir = img_dir
cfg.data.test.ann_dir = ann_dir
cfg.data.test.pipeline = cfg.test_pipeline
cfg.data.test.split = 'splits/val.txt'

# Load the pretrained PSPNet weights (trained on Cityscapes) as the starting point for finetuning
cfg.load_from = 'checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'

# Set the working directory for saving files and logs
cfg.work_dir = './work_dirs/tutorial'

cfg.runner.max_iters = 200
cfg.log_config.interval = 10
cfg.evaluation.interval = 200
cfg.checkpoint_config.interval = 200

# Set the random seed for reproducible results
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)
cfg.device = get_device()

# Look at the final config used for training
print(f'Config:\n{cfg.pretty_text}')

### Train and Evaluate
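Validation during training reports segmentation quality. Assuming the inherited config evaluates with mean IoU (not stated above), the sketch below shows how per-class IoU and their mean can be computed on flat lists of labels. `mean_iou` is a hypothetical helper for illustration only, not MMSegmentation's implementation, and treating 255 as the ignored label is an assumption.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Return (per-class IoU list, mean IoU) for flat label sequences."""
    intersect = [0] * num_classes
    pred_area = [0] * num_classes
    target_area = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # pixels marked as ignore do not count at all
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersect[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_area[c] + target_area[c] - intersect[c]
        # Classes absent from both prediction and target get no score
        ious.append(intersect[c] / union if union else None)
    valid = [v for v in ious if v is not None]
    return ious, (sum(valid) / len(valid) if valid else float("nan"))


ious, miou = mean_iou([0, 0, 1, 1, 2], [0, 1, 1, 255, 2], num_classes=3)
print(ious)  # prints [0.5, 0.5, 1.0]
```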

In [None]:
from mmseg.datasets import build_dataset
from mmseg.models import build_segmentor
from mmseg.apis import train_segmentor

# Build the dataset
datasets = [build_dataset(cfg.data.train)]

# Build the segmentor
model = build_segmentor(cfg.model)
# Add the class names as an attribute for convenient visualization
model.CLASSES = datasets[0].CLASSES

# Create the working directory
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_segmentor(model, datasets, cfg, distributed=False, validate=True,
                meta=dict())

Inference with the trained model

In [None]:
img = mmcv.imread('iccv09Data/images/6000124.jpg')

model.cfg = cfg
result = inference_segmentor(model, img)
plt.figure(figsize=(8, 6))
show_result_pyplot(model, img, result, palette)

In [None]:
# End