FlowDIS enables highly accurate foreground segmentation, optionally guided by a text prompt. When ambiguity prevents the model from producing the desired result, the user can specify which elements to retain in the foreground.
- [06/05/2026] 💻 Project page released.
- [06/05/2026] 📄 Paper released on arXiv.
- [21/02/2026] 🔥 FlowDIS has been accepted to CVPR 2026.
- Python 3.12 (the project is tested on 3.12)
- A CUDA-capable GPU (multi-GPU inference is supported)
Clone the repository and install the package:
```shell
git clone https://github.com/Picsart-AI-Research/FlowDIS
cd FlowDIS
pip install -e .
```

Model weights are hosted on the Hugging Face Hub at `PAIR/FlowDIS`. They are downloaded automatically on first run and cached under `~/.cache/huggingface/hub`.
To pre-download manually:
```python
from flowdis.util import download_from_hf_hub

root_model_dir = download_from_hf_hub("PAIR/FlowDIS")
print(root_model_dir)
```

Run inference on a directory of images. If multiple GPUs are available, the workload is automatically split across them.
```shell
python inference.py \
  --images-dir /path/to/images \
  --output-dir /path/to/output \
  --prompts-json /path/to/prompts.json \
  --num-steps 2 \
  --resolution 1024
```

To use local weights instead of auto-downloading, pass `--root-model-dir`:
```shell
python inference.py \
  --root-model-dir /path/to/models \
  --images-dir /path/to/images \
  --output-dir /path/to/output
```

| Argument | Required | Default | Description |
|---|---|---|---|
| `--images-dir` | yes | – | Directory of input images (`.jpg`, `.jpeg`, `.png`); searched recursively. |
| `--output-dir` | yes | – | Directory where predicted masks (`.png`) are written. |
| `--root-model-dir` | no | `None` | Root directory of pre-downloaded weights. If omitted, weights are fetched from `PAIR/FlowDIS` on the Hugging Face Hub. |
| `--prompts-json` | no | `None` | JSON mapping `{ "image_filename": "prompt" }`. If omitted, empty prompts are used. |
| `--num-steps` | no | `2` | Number of flow-matching sampling steps. |
| `--resolution` | no | `1024` | Image resolution used for inference. |
| `--num-samples` | no | `-1` | Limit on the number of images processed (`-1` means all). |
Example `prompts.json`:

```json
{
  "image_001.jpg": "a red sports car",
  "image_002.png": "a golden retriever sitting on grass"
}
```

Pre-generated language prompts for the DIS dataset are available here. Precomputed results for reproducing the paper can be downloaded here.
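If you have a directory of images but no prompts yet, a file in this format can be generated and filled in afterwards. A minimal sketch (`build_prompts_json` is a hypothetical helper; empty-string prompts match the CLI's default behavior when `--prompts-json` is omitted):

```python
import json
from pathlib import Path


def build_prompts_json(images_dir: str, out_path: str) -> dict:
    """Write a prompts.json mapping every image filename to an empty prompt.

    Edit the values afterwards to add per-image text prompts.
    """
    exts = {".jpg", ".jpeg", ".png"}  # extensions accepted by inference.py
    prompts = {
        p.name: ""
        for p in sorted(Path(images_dir).rglob("*"))
        if p.suffix.lower() in exts
    }
    Path(out_path).write_text(json.dumps(prompts, indent=2))
    return prompts
```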
An interactive Gradio demo is included under `demo/`:

```shell
python demo/app.py
```

Hardware requirements:
- At least 48 GB of GPU memory for inference at 1024×1024 px.
- At least 80 GB of GPU memory for inference at higher resolutions (such as 2048×2048 px).
```python
from PIL import Image

from flowdis import flowdis_predict, load_models

# Load all model components onto the GPU once; reuse across images.
models = load_models(device="cuda")

input_img_path = "path/to/input.jpg"     # Input image path
output_mask_path = "path/to/output.png"  # Path to save the output mask

image = Image.open(input_img_path).convert("RGB")
mask = flowdis_predict(
    image=image,
    prompt="",  # Optional text prompt guiding the segmentation
    models=models,
    resolution=1024,
    num_inference_steps=2,
    device="cuda",
)
mask.save(output_mask_path)
```

FlowDIS is licensed under the PicsArt Inc. FlowDIS Model License.
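The same API can be looped over a directory while reusing the loaded models. The sketch below assumes the `load_models` and `flowdis_predict` calls shown above; `segment_directory` and `mask_path_for` are hypothetical helpers, and for brevity only `.jpg` inputs are globbed (the CLI also accepts `.jpeg` and `.png`):

```python
from pathlib import Path


def mask_path_for(image_path: Path, output_dir: Path) -> Path:
    # Predicted masks are written as .png files named after the input image.
    return output_dir / f"{image_path.stem}.png"


def segment_directory(images_dir: str, output_dir: str, prompt: str = "") -> None:
    # Heavy imports stay inside the function so mask_path_for can be
    # imported and tested without flowdis installed.
    from PIL import Image
    from flowdis import flowdis_predict, load_models

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    models = load_models(device="cuda")  # load once, reuse for every image
    for img_path in sorted(Path(images_dir).glob("*.jpg")):
        image = Image.open(img_path).convert("RGB")
        mask = flowdis_predict(
            image=image,
            prompt=prompt,
            models=models,
            resolution=1024,
            num_inference_steps=2,
            device="cuda",
        )
        mask.save(mask_path_for(img_path, out))
```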
This project is built on top of FLUX.1 [schnell] and DIS5K.
If you use our work in your research, please cite our publication:
```bibtex
@article{sargsyan2026flowdis,
  title={{FlowDIS: Language-Guided Dichotomous Image Segmentation with Flow Matching}},
  author={Sargsyan, Andranik and Navasardyan, Shant},
  journal={arXiv preprint arXiv:2605.05077},
  year={2026},
  eprint={2605.05077},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2605.05077}
}
```