
Multi-domain Evaluation of Semantic Segmentation (MESS) with X-Decoder

[Website] [arXiv] [GitHub]

This directory contains the code for the MESS evaluation of X-Decoder. Please see the commit history for our changes to the model.

Setup

Create a conda environment xdecoder and install the required packages. See mess/README.md for details.

 bash mess/setup_env.sh

Prepare the datasets by following the instructions in mess/DATASETS.md. The xdecoder environment can be used for the dataset preparation. If you evaluate multiple models with MESS, you can point the dataset_dir argument and the DETECTRON2_DATASETS environment variable to a common directory (see mess/DATASETS.md and mess/eval.sh, e.g., ../mess_datasets), as sketched below.
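A minimal sketch of such a shared setup (the directory name is only an example):

# Point all MESS evaluations to one shared dataset directory (path is illustrative)
export DETECTRON2_DATASETS=../mess_datasets
# Pass the same directory as dataset_dir when preparing the datasets (see mess/DATASETS.md)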

Download the X-Decoder weights (see https://eval.ai/web/challenges/challenge-page/1931/overview). Note that the Focal-Tiny model is the official model from the paper. The Focal-Large model was released by the authors in their SEEM project and differs from the large, non-public model in the X-Decoder paper.

mkdir weights
wget https://huggingface.co/xdecoder/X-Decoder/resolve/main/xdecoder_focalt_best_openseg.pt -O weights/xdecoder_focalt_best_openseg.pt
wget https://huggingface.co/xdecoder/X-Decoder/resolve/main/xdecoder_focall_last.pt -O weights/xdecoder_focall_last.pt

Evaluation

To evaluate the X-Decoder model on the MESS datasets, run

bash mess/eval.sh

# for evaluation in the background:
nohup bash mess/eval.sh > eval.log &
tail -f eval.log 

For evaluating a single dataset, select a DATASET name from mess/DATASETS.md, set the DETECTRON2_DATASETS path, and run

conda activate xdecoder
export DETECTRON2_DATASETS="datasets"
DATASET=<dataset_name>

# Tiny model
python eval.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml  --config_overrides {\"WEIGHT\":\"weights/xdecoder_focalt_best_openseg.pt\", \"DATASETS.TEST\":[\"$DATASET\"], \"SAVE_DIR\":\"output/xDecoder_tiny/$DATASET\"}
# Large model (not the official model from the paper)
python eval.py evaluate --conf_files configs/xdecoder/focall_lang.yaml  --config_overrides {\"WEIGHT\":\"weights/xdecoder_focall_last.pt\", \"DATASETS.TEST\":[\"$DATASET\"], \"SAVE_DIR\":\"output/xDecoder_large/$DATASET\"}
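To run several datasets back to back, a small loop over the Tiny-model call above works as well (a sketch; the dataset names are placeholders, and mess/eval.sh already automates this for the full benchmark):

# Sketch: evaluate the Tiny model on a list of datasets (names are placeholders)
for DATASET in <dataset_1> <dataset_2>; do
  python eval.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml --config_overrides {\"WEIGHT\":\"weights/xdecoder_focalt_best_openseg.pt\", \"DATASETS.TEST\":[\"$DATASET\"], \"SAVE_DIR\":\"output/xDecoder_tiny/$DATASET\"}
done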

--- Original X-Decoder README.md ---

X-Decoder: Generalized Decoding for Pixel, Image, and Language (CVPR2023)

[Project Page] [Paper] [HuggingFace All-in-One Demo] [HuggingFace Instruct Demo] [Video]

by Xueyan Zou*, Zi-Yi Dou*, Jianwei Yang*, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, Jianfeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee^, Jianfeng Gao^.

🔥 News

  • [2023.04.14] We are releasing SEEM, a new universal interactive interface for image segmentation! You can use it for any segmentation task, way beyond what X-Decoder can do!

  • [2023.03.20] Building on X-Decoder, we developed OpenSeeD ([Paper][Code]) to enable open-vocabulary segmentation and detection with a single model. Check it out!
  • [2023.03.14] We release X-GPT, a conversational version of X-Decoder built with GPT-3 and LangChain!
  • [2023.03.01] The Segmentation in the Wild Challenge has been launched and is ready for submissions!
  • [2023.02.28] We released the SGinW benchmark for our challenge. You are welcome to build your own models on the benchmark!
  • [2023.02.27] Our X-Decoder has been accepted by CVPR 2023!
  • [2023.02.07] We combine X-Decoder (strong image understanding), GPT-3 (strong language understanding) and Stable Diffusion (strong image generation) to build an instructional image editing demo. Check it out!
  • [2022.12.21] We release inference code of X-Decoder.
  • [2022.12.21] We release Focal-T pretrained checkpoint.
  • [2022.12.21] We release open-vocabulary segmentation benchmark.

🖌️ DEMO

🔺[X-GPT] 🔺[Instruct X-Decoder]


🎶 Introduction


X-Decoder is a generalized decoding model that can generate pixel-level segmentation and token-level texts seamlessly!

It achieves:

  • State-of-the-art results on open-vocabulary segmentation and referring segmentation on eight datasets;
  • Finetuned performance that is better than or competitive with generalist and specialist models on segmentation and VL tasks;
  • Friendly for efficient finetuning and flexible for novel task composition.

It supports:

  • One suite of parameters pretrained for Semantic/Instance/Panoptic Segmentation, Referring Segmentation, Image Captioning, and Image-Text Retrieval;
  • One model architecture finetuned for Semantic/Instance/Panoptic Segmentation, Referring Segmentation, Image Captioning, Image-Text Retrieval and Visual Question Answering (with an extra cls head);
  • Zero-shot task composition for Region Retrieval, Referring Captioning, Image Editing.

Getting Started

Installation

pip3 install torch==1.13.1 torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu113
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
pip install git+https://github.com/cocodataset/panopticapi.git
python -m pip install -r requirements.txt
sh install_cococapeval.sh
export DATASET=/pth/to/dataset

To prepare the dataset: DATASET.md

Open Vocabulary Segmentation

mpirun -n 8 python eval.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml  --overrides WEIGHT /pth/to/ckpt

Note: Due to zero-padding, filling a single GPU with multiple images may decrease performance.
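For a quick single-GPU check, the same command can be run without mpirun so that only one process evaluates (a sketch based on the command above; the checkpoint path is a placeholder):

# Single-process evaluation on one GPU
python eval.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml --overrides WEIGHT /pth/to/ckpt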

Inference Demo

# For Segmentation Tasks
python demo/demo_semseg.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml  --overrides WEIGHT /pth/to/xdecoder_focalt_best_openseg.pt
# For VL Tasks
python demo/demo_captioning.py evaluate --conf_files configs/xdecoder/svlp_focalt_lang.yaml  --overrides WEIGHT /pth/to/xdecoder_focalt_last_novg.pt

Model Zoo

| model | ckpt | ADE PQ | ADE AP | ADE mIoU | ADE-full mIoU | SUN mIoU | SCAN PQ | SCAN mIoU | SCAN40 mIoU | Cityscapes PQ | Cityscapes mAP | Cityscapes mIoU | BDD PQ | BDD mIoU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X-Decoder BestSeg Tiny | | 19.1 | 10.1 | 25.1 | 6.2 | 35.7 | 30.3 | 38.4 | 22.4 | 37.7 | 18.5 | 50.2 | 16.9 | 47.6 |

Additional Results

  • Finetuned ADE 150 (32 epochs)
| Model | Task | Log | PQ | mAP | mIoU |
|---|---|---|---|---|---|
| X-Decoder (davit-d5, Deformable) | PanoSeg | log | 52.4 | 38.7 | 59.1 |

Acknowledgement

  • We appreciate the constructive discussion with Haotian Zhang
  • We build our work on top of Mask2Former
  • We build our demos on HuggingFace 🤗 with sponsored GPUs
  • We appreciate the discussion with Xiaoyu Xiang during rebuttal

Citation

@article{zou2022xdecoder,
  author      = {Zou*, Xueyan and Dou*, Zi-Yi and Yang*, Jianwei and Gan, Zhe and Li, Linjie and Li, Chunyuan and Dai, Xiyang and Wang, Jianfeng and Yuan, Lu and Peng, Nanyun and Wang, Lijuan and Lee*, Yong Jae and Gao*, Jianfeng},
  title       = {Generalized Decoding for Pixel, Image and Language},
  publisher   = {arXiv},
  year        = {2022},
}
