This is an introduction to the initial version of GFBN, a dual-branch multi-modal semantic segmentation network. The main innovations are a cross-attention correction module and a large-kernel convolution feature fusion module. The code is largely based on CMNeXt and CMX; if you use this code, please cite the two papers listed at the end of this page.
- 10/2023: initialized repository.
Prepare three datasets:
- NYU Depth V2, for RGB-Depth semantic segmentation.
- MFNet, for RGB-Thermal semantic segmentation.
- MCubeS, for multimodal material segmentation with RGB-A-D-N modalities.
Then, structure all datasets as follows:
```
data/
├── NYUDepthv2
│   ├── RGB
│   ├── HHA
│   └── Label
├── MFNet
│   ├── rgb
│   ├── ther
│   └── labels
├── MCubeS
│   ├── polL_color
│   ├── polL_aolp
│   ├── polL_dolp
│   ├── NIR_warped
│   └── SS
```
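As a sanity check before training, the layout above can be verified with a small script. This is a minimal sketch (not part of the original codebase); the expected sub-directories are taken directly from the tree shown in this README.

```python
from pathlib import Path

# Expected dataset layout, copied from the directory tree above.
EXPECTED = {
    "NYUDepthv2": ["RGB", "HHA", "Label"],
    "MFNet": ["rgb", "ther", "labels"],
    "MCubeS": ["polL_color", "polL_aolp", "polL_dolp", "NIR_warped", "SS"],
}

def missing_dirs(root="data"):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [str(root / ds / sub)
            for ds, subs in EXPECTED.items()
            for sub in subs
            if not (root / ds / sub).is_dir()]
```

Running `missing_dirs()` from the repository root should return an empty list once all three datasets are in place.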
Coming soon!
Before training, please download the pre-trained SegFormer weights, e.g. `checkpoints/pretrained/segformer/mit_b2.pth`:
```
checkpoints/pretrained/segformer
├── mit_b2.pth
└── mit_b4.pth
```
To train a GFBN model, change the YAML file passed to `--cfg`. Example training commands using 4 A100 GPUs:
```bash
cd path/to/GFBN
export PYTHONPATH="path/to/GFBN"
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/nyu_rgbd.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/mcubes_rgbadn.yaml
```
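Note that `torch.distributed.launch` is deprecated in newer PyTorch releases (1.10+); `torchrun` is the drop-in replacement, with `--use_env` behavior implied by default. Assuming the same script arguments, the equivalent invocation would be:

```shell
# torchrun replaces "python -m torch.distributed.launch --use_env"
cd path/to/GFBN
export PYTHONPATH="path/to/GFBN"
torchrun --nproc_per_node=4 tools/train_mm.py --cfg configs/nyu_rgbd.yaml
```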
To evaluate GFBN models, download the respective model weights (GoogleDrive) and place them as:
```
output/
├── MCubeS
│   ├── cmnext_b2_mcubes_rgb.pth
│   ├── cmnext_b2_mcubes_rgba.pth
│   ├── cmnext_b2_mcubes_rgbad.pth
│   └── cmnext_b2_mcubes_rgbadn.pth
```
Then, point `--cfg` at the respective config file and run:
```bash
cd path/to/GFBN
export PYTHONPATH="path/to/GFBN"
CUDA_VISIBLE_DEVICES=0 python tools/val_mm.py --cfg configs/mcubes_rgbadn.yaml
```
This repository is released under the Apache-2.0 license. For commercial use, please contact the authors.
If you use the GFBN model, please cite the following works:
- DeLiVER & CMNeXt [PDF]
```
@article{zhang2023delivering,
  title={Delivering Arbitrary-Modal Semantic Segmentation},
  author={Zhang, Jiaming and Liu, Ruiping and Shi, Hao and Yang, Kailun and Reiß, Simon and Peng, Kunyu and Fu, Haodong and Wang, Kaiwei and Stiefelhagen, Rainer},
  journal={arXiv preprint arXiv:2303.01480},
  year={2023}
}
```
- CMX [PDF]
```
@article{liu2022cmx,
  title={CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers},
  author={Liu, Huayao and Zhang, Jiaming and Yang, Kailun and Hu, Xinxin and Stiefelhagen, Rainer},
  journal={arXiv preprint arXiv:2203.04838},
  year={2022}
}
```