Kaixin Cai*, Pengzhen Ren*, Jianhua Han, Yi Zhu, Hang Xu, Jianzhuang Liu, Xiaodan Liang📧
* equal contribution. 📧 corresponding author.
[arXiv](https://arxiv.org/abs/2603.19575)
- 2026-03: Our MagicSeg has been officially accepted by TPAMI 2026 🎉🎉🎉.
- 2026-01: We have released the code of MagicSeg 🤗.
Each row shows the original image, counterfactual image, and corresponding mask:
| Original Image | Counterfactual Image | Mask |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
- Training Code: based on ZegCLIP
- Generation Code: pipeline for text, image, and mask generation (based on Grounded-SAM)
MagicSeg/
├── example/ # Example data
│ ├── images/ # Original images
│ ├── images_neg/ # Negative images (filename + '-neg')
│ └── masks/ # Generated masks
├── generate/ # Generation pipeline
│ ├── generate_texts.py # Text generation using GPT
│ ├── generate_images.py # Image generation using SD1.5
│ └── mask_generate.py # Mask generation using GroundingDINO+SAM
└── train_code/ # Training framework
    └── ZegCLIP-main/ # Modified ZegCLIP codebase
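The layout above implies a naming link between an original image, its negative image, and its mask. The following is a minimal sketch of that mapping, not code from the repository: the helper name `example_paths` is hypothetical, it assumes `-neg` is appended to the filename stem before the extension, and it assumes the mask shares the image stem with a `.png` suffix.

```python
from pathlib import Path


def example_paths(root, image_name, mask_suffix=".png"):
    """Return the original, negative, and mask paths for one example image."""
    root = Path(root)
    name = Path(image_name)
    return {
        "image": root / "images" / name.name,
        # Assumption: '-neg' is appended to the stem, before the extension.
        "negative": root / "images_neg" / f"{name.stem}-neg{name.suffix}",
        # Assumption: the mask shares the image stem; the suffix is a guess.
        "mask": root / "masks" / f"{name.stem}{mask_suffix}",
    }
```

For instance, `example_paths("example", "cat_dog.jpg")["negative"]` points into `example/images_neg/`. Adjust the suffix handling if the generated files are named differently.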
The training framework has been enhanced with the following features:
- File: `models/segmentor/zegclip.py`
- Function: `forward_train()` method
- Features:
  - Extracts class names from image filenames (max 2 classes separated by '_')
  - Samples additional classes to reach 100 total classes per image
  - Constructs new `text_feat` with shape `[bs, 100, dim]`
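The class-sampling step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `forward_train()`: the function names, the `vocabulary` argument, and the handling of the `-neg` suffix are all hypothetical, and the filename stem is assumed to consist only of class names joined by '_'.

```python
import random
from pathlib import Path


def classes_from_filename(filename, max_classes=2):
    """Parse up to `max_classes` class names from a stem such as 'cat_dog'."""
    stem = Path(filename).stem
    # Assumption: negative images carry a trailing '-neg' that is not a class.
    if stem.endswith("-neg"):
        stem = stem[: -len("-neg")]
    return stem.split("_")[:max_classes]


def sample_class_list(filename, vocabulary, total=100, rng=None):
    """Return the image's own classes followed by randomly sampled extras."""
    if rng is None:
        rng = random.Random()
    positives = classes_from_filename(filename)
    # Exclude the image's own classes so that no name appears twice.
    pool = [name for name in vocabulary if name not in positives]
    extras = rng.sample(pool, total - len(positives))
    return positives + extras
```

Encoding each of the 100 names with the text encoder and stacking the results per image would then give a tensor of shape `[bs, 100, dim]`.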
- File: `models/decode_heads/decode_seg.py`
- Function: Added cosine similarity loss
- Formula: `max(0, cos(cls_token, cls_token_neg))`
- Integration: Added to losses dictionary
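To make the formula concrete, here is a small dependency-free sketch of the loss on plain Python lists. It is not the implementation in `decode_seg.py`: the function names are hypothetical, and averaging over the batch is an assumption. The actual code presumably operates on tensors instead.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors.
    return dot / max(norm_u * norm_v, eps)


def counterfactual_cos_loss(cls_tokens, cls_tokens_neg):
    """Mean over the batch of max(0, cos(cls_token, cls_token_neg))."""
    per_sample = [
        max(0.0, cosine(u, v)) for u, v in zip(cls_tokens, cls_tokens_neg)
    ]
    # Assumption: the per-sample terms are averaged over the batch.
    return sum(per_sample) / len(per_sample)
```

The loss is positive only when the class token of an image and that of its counterfactual point in a similar direction, so minimizing it pushes the two apart until they are at least orthogonal.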
bash dist_train.sh configs/magicseg/vpt_seg_fully_vit-b_512x512_20k_12_10.py Path/to/magicseg/fully
File: `generate/generate_texts.py`
`python generate_texts.py`
File: `generate/generate_images.py`
`python generate_images.py`
File: `generate/mask_generate.py`
Refer to Grounded-SAM.
Refer to ZegCLIP.
If you use MagicSeg in your research, please cite:
@misc{cai2026magicsegopenworldsegmentationpretraining,
title={MagicSeg: Open-World Segmentation Pretraining via Counterfactual Diffusion-Based Auto-Generation},
author={Kaixin Cai and Pengzhen Ren and Jianhua Han and Yi Zhu and Hang Xu and Jianzhuang Liu and Xiaodan Liang},
year={2026},
eprint={2603.19575},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.19575},
}
This project is built upon ZegCLIP and Grounded-Segment-Anything. Please refer to the original repositories for licensing information.












