
Multi-domain Evaluation of Semantic Segmentation (MESS) with X-Decoder

[Website] [arXiv] [GitHub]

This directory contains the code for the MESS evaluation of X-Decoder. Please see the commit history for our changes to the model.

Setup

Create a conda environment xdecoder and install the required packages. See mess/README.md for details.

 bash mess/setup_env.sh

Prepare the datasets by following the instructions in mess/DATASETS.md. The xdecoder environment can be used for the dataset preparation. If you evaluate multiple models with MESS, you can point the dataset_dir argument and the DETECTRON2_DATASETS environment variable to a common directory (see mess/DATASETS.md and mess/eval.sh, e.g., ../mess_datasets), as sketched below.
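A minimal sketch of such a shared setup (the directory name is only an example):

# Point all MESS evaluations to one shared dataset directory (path is illustrative)
export DETECTRON2_DATASETS=../mess_datasets
# Pass the same directory as dataset_dir when preparing the datasets (see mess/DATASETS.md)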

Download the X-Decoder weights (see https://eval.ai/web/challenges/challenge-page/1931/overview). Note that the Focal-Tiny model is the official model from the paper. The Focal-Large model was released by the authors in their SEEM project and differs from the large, non-public model in the X-Decoder paper.

mkdir weights
wget https://huggingface.co/xdecoder/X-Decoder/resolve/main/xdecoder_focalt_best_openseg.pt -O weights/xdecoder_focalt_best_openseg.pt
wget https://huggingface.co/xdecoder/X-Decoder/resolve/main/xdecoder_focall_last.pt -O weights/xdecoder_focall_last.pt

Evaluation

To evaluate the X-Decoder model on the MESS datasets, run

bash mess/eval.sh

# for evaluation in the background:
nohup bash mess/eval.sh > eval.log &
tail -f eval.log 

For evaluating a single dataset, select a DATASET name from mess/DATASETS.md, set the DETECTRON2_DATASETS path, and run

conda activate xdecoder
export DETECTRON2_DATASETS="datasets"
DATASET=<dataset_name>

# Tiny model
python eval.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml  --config_overrides {\"WEIGHT\":\"weights/xdecoder_focalt_best_openseg.pt\", \"DATASETS.TEST\":[\"$DATASET\"], \"SAVE_DIR\":\"output/xDecoder_tiny/$DATASET\"}
# Large model (not the official model from the paper)
python eval.py evaluate --conf_files configs/xdecoder/focall_lang.yaml  --config_overrides {\"WEIGHT\":\"weights/xdecoder_focall_last.pt\", \"DATASETS.TEST\":[\"$DATASET\"], \"SAVE_DIR\":\"output/xDecoder_large/$DATASET\"}
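To run several datasets back to back, a small loop over the Tiny-model call above works as well (a sketch; the dataset names are placeholders, and mess/eval.sh already automates this for the full benchmark):

# Sketch: evaluate the Tiny model on a list of datasets (names are placeholders)
for DATASET in <dataset_1> <dataset_2>; do
  python eval.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml --config_overrides {\"WEIGHT\":\"weights/xdecoder_focalt_best_openseg.pt\", \"DATASETS.TEST\":[\"$DATASET\"], \"SAVE_DIR\":\"output/xDecoder_tiny/$DATASET\"}
done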

--- Original X-Decoder README.md ---

X-Decoder: Generalized Decoding for Pixel, Image, and Language (CVPR2023)

[Project Page] [Paper] [HuggingFace All-in-One Demo] [HuggingFace Instruct Demo] [Video]

by Xueyan Zou*, Zi-Yi Dou*, Jianwei Yang*, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, Jianfeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee^, Jianfeng Gao^.

🔥 News

  • [2023.04.14] We are releasing SEEM, a new universal interactive interface for image segmentation! You can use it for any segmentation task, way beyond what X-Decoder can do!

  • [2023.03.20] Building on X-Decoder, we developed OpenSeeD ([Paper][Code]) to enable open-vocabulary segmentation and detection with a single model. Check it out!
  • [2023.03.14] We release X-GPT, a conversational version of X-Decoder built with GPT-3 and LangChain!
  • [2023.03.01] The Segmentation in the Wild Challenge has been launched and is ready for submissions!
  • [2023.02.28] We released the SGinW benchmark for our challenge. You are welcome to build your own models on the benchmark!
  • [2023.02.27] Our X-Decoder has been accepted by CVPR 2023!
  • [2023.02.07] We combine X-Decoder (strong image understanding), GPT-3 (strong language understanding) and Stable Diffusion (strong image generation) to build an instructional image editing demo. Check it out!
  • [2022.12.21] We release inference code of X-Decoder.
  • [2022.12.21] We release Focal-T pretrained checkpoint.
  • [2022.12.21] We release open-vocabulary segmentation benchmark.

🖌️ DEMO

🔺[X-GPT] 🔺[Instruct X-Decoder]


🎶 Introduction


X-Decoder is a generalized decoding model that can generate pixel-level segmentation and token-level texts seamlessly!

It achieves:

  • State-of-the-art results on open-vocabulary segmentation and referring segmentation on eight datasets;
  • Finetuned performance that is better than or competitive with generalist and specialist models on segmentation and VL tasks;
  • Friendly for efficient finetuning and flexible for novel task composition.

It supports:

  • One suite of parameters pretrained for Semantic/Instance/Panoptic Segmentation, Referring Segmentation, Image Captioning, and Image-Text Retrieval;
  • One model architecture finetuned for Semantic/Instance/Panoptic Segmentation, Referring Segmentation, Image Captioning, Image-Text Retrieval and Visual Question Answering (with an extra cls head);
  • Zero-shot task composition for Region Retrieval, Referring Captioning, Image Editing.

Getting Started

Installation

pip3 install torch==1.13.1 torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu113
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
pip install git+https://github.com/cocodataset/panopticapi.git
python -m pip install -r requirements.txt
sh install_cococapeval.sh
export DATASET=/pth/to/dataset

To prepare the dataset: DATASET.md

Open Vocabulary Segmentation

mpirun -n 8 python eval.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml  --overrides WEIGHT /pth/to/ckpt

Note: Due to zero-padding, filling a single GPU with multiple images may decrease performance.
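For a quick single-GPU check, the same command can be run without mpirun so that only one process evaluates (a sketch based on the command above; the checkpoint path is a placeholder):

# Single-process evaluation on one GPU
python eval.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml --overrides WEIGHT /pth/to/ckpt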

Inference Demo

# For Segmentation Tasks
python demo/demo_semseg.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml  --overrides WEIGHT /pth/to/xdecoder_focalt_best_openseg.pt
# For VL Tasks
python demo/demo_captioning.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml  --overrides WEIGHT /pth/to/xdecoder_focalt_last_novg.pt

Model Zoo

| model | ckpt | ADE PQ | ADE AP | ADE mIoU | ADE-full mIoU | SUN mIoU | SCAN PQ | SCAN mIoU | SCAN40 mIoU | Cityscapes PQ | Cityscapes mAP | Cityscapes mIoU | BDD PQ | BDD mIoU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X-Decoder BestSeg Tiny | | 19.1 | 10.1 | 25.1 | 6.2 | 35.7 | 30.3 | 38.4 | 22.4 | 37.7 | 18.5 | 50.2 | 16.9 | 47.6 |

Additional Results

  • Finetuned ADE 150 (32 epochs)
| Model | Task | Log | PQ | mAP | mIoU |
|---|---|---|---|---|---|
| X-Decoder (davit-d5, Deformable) | PanoSeg | log | 52.4 | 38.7 | 59.1 |

Acknowledgement

  • We appreciate the constructive discussion with Haotian Zhang
  • We build our work on top of Mask2Former
  • We build our demos on HuggingFace 🤗 with sponsored GPUs
  • We appreciate the discussion with Xiaoyu Xiang during rebuttal

Citation

@article{zou2022xdecoder,
  author      = {Zou*, Xueyan and Dou*, Zi-Yi and Yang*, Jianwei and Gan, Zhe and Li, Linjie and Li, Chunyuan and Dai, Xiyang and Wang, Jianfeng and Yuan, Lu and Peng, Nanyun and Wang, Lijuan and Lee*, Yong Jae and Gao*, Jianfeng},
  title       = {Generalized Decoding for Pixel, Image and Language},
  publisher   = {arXiv},
  year        = {2022},
}
