
Multi-domain Evaluation of Semantic Segmentation (MESS) with ZSSeg

[Website] [arXiv] [GitHub]

This directory contains the code for the MESS evaluation of ZSSeg. Please see the commits for our changes to the model.

Setup

Create a conda environment zsseg and install the required packages. See mess/README.md for details.

 bash mess/setup_env.sh

Prepare the datasets by following the instructions in mess/DATASETS.md. The zsseg environment can be used for the dataset preparation. If you evaluate multiple models with MESS, you can point the dataset_dir argument and the DETECTRON2_DATASETS environment variable to a common directory (see mess/DATASETS.md and mess/eval.sh, e.g., ../mess_datasets).
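For example, a minimal sketch of such a shared setup (the path ../mess_datasets is only illustrative):

# use one shared dataset directory for all evaluated models (illustrative path)
export DETECTRON2_DATASETS=../mess_datasets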

Download the ZSSeg weights (see https://github.com/MendelXu/zsseg.baseline)

mkdir weights
conda activate zsseg
# Python code for downloading the weights from GDrive. Link: https://drive.google.com/file/d/1pb6UeXoMPy5xdEBtFcQYLOBKZt0xufKY/view
python -c "import gdown; gdown.download(f'https://drive.google.com/uc?export=download&confirm=pbef&id=1pb6UeXoMPy5xdEBtFcQYLOBKZt0xufKY', output='weights/model_final.pth')"

Evaluation

To evaluate the ZSSeg model on the MESS datasets, run

bash mess/eval.sh

# for evaluation in the background:
nohup bash mess/eval.sh > eval.log &
tail -f eval.log 

Note that the mask threshold was changed from 0.5 to 0.4 in commit aa327d4. We used the original threshold of 0.5 in the evaluation.

To evaluate a single dataset, select a DATASET name from mess/DATASETS.md, set the DETECTRON2_DATASETS path, and run

conda activate zsseg
export DETECTRON2_DATASETS="datasets"
DATASET=<dataset_name>

python train_net.py --num-gpus 1 --eval-only --config-file configs/ade20k-150/cross_dataset_test_only.yaml MODEL.WEIGHTS weights/model_final.pth OUTPUT_DIR output/ZSSeg/$DATASET DATASETS.TEST \(\"$DATASET\",\) MODEL.CLIP_ADAPTER.MASK_THR 0.5
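To evaluate several datasets in sequence (essentially what mess/eval.sh does), a minimal sketch with placeholder dataset names:

# <dataset_name_1> and <dataset_name_2> are placeholders for registered MESS dataset names (see mess/DATASETS.md)
for DATASET in <dataset_name_1> <dataset_name_2>; do
  python train_net.py --num-gpus 1 --eval-only --config-file configs/ade20k-150/cross_dataset_test_only.yaml MODEL.WEIGHTS weights/model_final.pth OUTPUT_DIR output/ZSSeg/$DATASET DATASETS.TEST \(\"$DATASET\",\) MODEL.CLIP_ADAPTER.MASK_THR 0.5
done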

--- Original ZSSeg README.md ---

[ECCV2022] A Simple Baseline for Open Vocabulary Semantic Segmentation with Pre-trained Vision-language Model


This repo is for our ECCV2022 paper A Simple Baseline for Open Vocabulary Semantic Segmentation with Pre-trained Vision-language Model. It is based on the official repo of MaskFormer.

🎉[News] Our new work Side Adapter Network for Open-Vocabulary Semantic Segmentation was accepted as a CVPR 2023 highlight paper. The code has been released.

@article{xu2021,
  title={A Simple Baseline for Open Vocabulary Semantic Segmentation with Pre-trained Vision-language Model},
  author={Mengde Xu and Zheng Zhang and Fangyun Wei and Yutong Lin and Yue Cao and Han Hu and Xiang Bai},
  journal={Proceedings of the IEEE/CVF European Conference on Computer Vision (ECCV)},
  year={2022}
}

Guideline

  • Environment

    torch==1.8.0
    torchvision==0.9.0
    detectron2==0.6 # Follow https://detectron2.readthedocs.io/en/latest/tutorials/install.html to install it and the required packages
    mmcv==1.3.14

    Furthermore, install the modified CLIP package.

    cd third_party/CLIP
    python -m pip install -Ue .
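    If you set up the environment manually instead of using mess/setup_env.sh, a minimal sketch might look as follows (the Python version is an assumption; install detectron2 as described in its installation tutorial, since the exact command depends on your CUDA/PyTorch combination):

    conda create -n zsseg python=3.8 -y   # Python version is an assumption
    conda activate zsseg
    pip install torch==1.8.0 torchvision==0.9.0 mmcv==1.3.14
    # install detectron2==0.6 following https://detectron2.readthedocs.io/en/latest/tutorials/install.html
    # then install the modified CLIP package as shown above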
  • Data Preparation

    In our experiments, five datasets are used.

    • For Cityscapes and ADE20k, follow the tutorial in MaskFormer.
    • For Pascal Context:
      • Download data from the official website and extract it as shown below.
        Datasets/
            pcontext/
                #http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar
                JPEGImages/
                #https://codalabuser.blob.core.windows.net/public/trainval_merged.json     
                trainval_merged.json 
      • Format the data to d2 style. Install the detail package from https://github.com/zhanghang1989/detail-api and then run
        python datasets/prepare_pcontext_sem_seg.py --ori_root_dir datasets/pcontext-59 --save_dir datasets/pcontext-59
    • For COCO Stuff 164k:
      • Download data from the official dataset website and extract it as shown below.
        Datasets/
             coco/
                  #http://images.cocodataset.org/zips/train2017.zip
                  train2017/ 
                  #http://images.cocodataset.org/zips/val2017.zip
                  val2017/   
                  #http://images.cocodataset.org/annotations/annotations_trainval2017.zip
                  annotations/ 
                  #http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip
                  stuffthingmaps/ 
      • Format the data to d2 style and split it into the Seen (Base) and Unseen (Novel) subsets.
        python datasets/prepare_coco_stuff_164k_sem_seg.py datasets/coco
        
        python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/train2017_base datasets/coco/stuffthingmaps_detectron2/train2017_base_label_count.pkl
        
        python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/val2017 datasets/coco/stuffthingmaps_detectron2/val2017_label_count.pkl
    • For Pascal VOC 11k:
      • Download data from the official dataset website and extract it as shown below.
      datasets/
         VOC2012/
              #download http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar and extract it here.
              JPEGImages/
              # copy `ImageSets/Segmentation/val.txt` here.
              val.txt
              #Download augmented annotations from http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz and convert the original .mat files to .png format. Or download them from https://www.dropbox.com/s/oeu149j8qtbs1x0/SegmentationClassAug.zip?dl=0 (provided in https://github.com/DrSleep/tensorflow-deeplab-resnet#evaluation).
              SegmentationClassAug/
              #https://gist.githubusercontent.com/sun11/2dbda6b31acc7c6292d14a872d0c90b7/raw/5f5a5270089239ef2f6b65b1cc55208355b5acca/trainaug.txt
              train.txt
            
      • Format the data to d2 style and split it into the Seen (Base) and Unseen (Novel) subsets.
      python datasets/prepare_voc_sem_seg.py datasets/VOC2012
      
      python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/train_base datasets/VOC2012/annotations_detectron2/train_base_label_count.json
      
      python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/val datasets/VOC2012/annotations_detectron2/val_label_count.json
  • Training and Evaluation

    Before training and evaluation, see the tutorial in detectron2. For example, to train a zero-shot semantic segmentation model on COCO Stuff:

    • Training with manually designed prompts:

      # single prompt
      python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_single_prompt_bs32_60k.yaml
      # imagenet prompt
      python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_imagenet_prompt_bs32_60k.yaml
      # vild prompt
      python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_vild_prompt_bs32_60k.yaml
      
    • Training with learned prompts:

      # Training prompts
      python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_proposal_classification_learn_prompt_bs32_10k.yaml --num-gpus 8 
      # Training seg model
      python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_bs32_60k.yaml --num-gpus 8 MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT ${TRAINED_PROMPTS}

      Note: the prompt training is affected by the random seed. It is better to run it multiple times.

      For evaluation, add the --eval-only flag to the training command.
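      For example, a minimal sketch of evaluating the single-prompt model (TRAINED_MODEL_PATH is a placeholder for your checkpoint path):

      python train_net.py --eval-only --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_single_prompt_bs32_60k.yaml --num-gpus 8 MODEL.WEIGHTS ${TRAINED_MODEL_PATH}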

    • Trained Model

      • For the cross-dataset setting (train on seen classes and test on different datasets): Trained Model

        # TRAINED_MODEL_PATH: the path of your downloaded model file.
        # train on coco 156 class, test on other dataset
        # Trained with 
        # python train_net.py --resume --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_vild_prompt_bs32_60k.yaml --num-gpus 8 MODEL.CLIP_ADAPTER.CLIP_ENSEMBLE True SOLVER.MAX_ITER 120000
        # DATASET: the name of the other dataset; it can be ade20k-150, ade20k-847, cityscapes-19, pcontext-59-59. Do not test on Pascal VOC, as it largely overlaps with COCO Stuff.
        python train_net.py --eval-only --resume --config-file configs/${DATASET}/cross_dataset_test_only.yaml --num-gpus 8 MODEL.WEIGHTS ${TRAINED_MODEL_PATH}
         
      • For the zero-shot setting (train on seen classes and test on unseen classes of the same dataset): Trained Model

        # TRAINED_MODEL_PATH: the path of your downloaded model file.
        # Trained with learned prompts
        python train_net.py --eval-only --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_bs32_60k.yaml --num-gpus 8 MODEL.WEIGHTS ${TRAINED_MODEL_PATH}

      Note: For both settings, the model is trained only on the 156 seen classes of COCO Stuff for convenience.

  • For visualization

    You can use https://github.com/facebookresearch/detectron2/blob/main/tools/visualize_json_results.py to visualize the segmentation result.
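    A hedged sketch of such a call (the prediction JSON path and dataset name are placeholders and depend on your evaluator output):

    python visualize_json_results.py --input output/<experiment>/inference/predictions.json --output output/<experiment>/vis --dataset <dataset_name>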

  • Other information:

[1] COCO-Stuff: Thing and Stuff Classes in Context. H. Caesar, J. Uijlings, V. Ferrari. In Computer Vision and Pattern Recognition (CVPR), 2018.
