Official implementation of CAB-2 and CAB-3 for stable low-NFE sampling in diffusion and flow-matching models.
This repository contains modified implementations for:
- HuggingFace Diffusers: https://github.com/huggingface/diffusers
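For readers unfamiliar with multistep solvers, the following is a minimal sketch of the classical Adams-Bashforth update on a toy ODE. It is not the CAB update itself: the real method lives in scheduling_cab.py and generate_cab.py and also takes a theta parameter that is not modeled here. The assumption that CAB belongs to the Adams-Bashforth family is ours, suggested only by the order flag and the out_ab2/out_ab3 output names used later in this README. The function name adams_bashforth and the toy problem are hypothetical. The sketch only illustrates why a multistep method of order 2 or 3 still costs one function evaluation (NFE) per step.

```python
import math


def adams_bashforth(f, y0, t0, t1, steps, order=2):
    """Integrate dy/dt = f(t, y) from t0 to t1 with fixed-step Adams-Bashforth.

    Generic textbook scheme, NOT the CAB update from this repository.
    Returns (y_end, nfe), where nfe counts calls to f.
    """
    if order not in (2, 3):
        raise ValueError("order must be 2 or 3")
    h = (t1 - t0) / steps
    t, y = t0, y0
    history = []  # past evaluations of f, most recent first
    nfe = 0
    for _ in range(steps):
        # One new evaluation per step; older ones are reused from history.
        history.insert(0, f(t, y))
        nfe += 1
        history = history[:order]
        if len(history) == 1:
            dy = history[0]  # Euler warm-up for the first step
        elif len(history) == 2:
            dy = 1.5 * history[0] - 0.5 * history[1]
        else:
            dy = (23 * history[0] - 16 * history[1] + 5 * history[2]) / 12
        y += h * dy
        t += h
    return y, nfe


# Toy problem: dy/dt = -y, y(0) = 1, exact solution exp(-t).
y2, nfe2 = adams_bashforth(lambda t, y: -y, 1.0, 0.0, 1.0, steps=10, order=2)
y3, nfe3 = adams_bashforth(lambda t, y: -y, 1.0, 0.0, 1.0, steps=10, order=3)
print(abs(y2 - math.exp(-1.0)), nfe2)
print(abs(y3 - math.exp(-1.0)), nfe3)
```

Both runs use exactly 10 evaluations for 10 steps; the higher order only changes how the stored evaluations are combined.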
CAB/
├── README.md
├── gaussian_diffusion.py
├── generate_cab.py
├── pipeline_qwenimage.py
├── qwen.py
├── sample.py
└── scheduling_cab.py
Used for EDM experiments.
Dataset: https://www.cs.toronto.edu/~kriz/cifar.html

Used for EDM ImageNet-64 experiments.
Dataset: https://image-net.org/download-images.php

Used for DiT experiments.
Dataset: https://image-net.org/index.php
DiT preprocessing / setup: https://github.com/facebookresearch/DiT

Used for Qwen-Image text-to-image evaluation.
Dataset: https://cocodataset.org/#download
Captions: https://cocodataset.org/#captions-2015

Used for HunyuanVideo text-to-video evaluation.
Repository: https://github.com/evalcrafter/EvalCrafter
Paper: https://arxiv.org/abs/2310.11440
Base repository:
In the original EDM repository, use the script:
generate_cab.py
provided in this repository.
python generate_cab.py \
    --outdir=out_ab2 \
    --seeds=0-63 \
    --batch=64 \
    --order=2 \
    --theta=0.9 \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl

python generate_cab.py \
    --outdir=out_ab3 \
    --seeds=0-63 \
    --batch=64 \
    --order=3 \
    --theta=0.9 \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl

Base repository:
https://github.com/facebookresearch/DiT
Inside the original DiT repository, replace:
diffusion/gaussian_diffusion.py
with:
gaussian_diffusion.py
from this repository, and replace:
sample.py
with:
sample.py
from this repository.
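The file replacements above can also be scripted. The following is a minimal sketch, not part of this repository: the helper name install_files, the DIT_FILES mapping variable, and the ".orig" backup suffix are hypothetical, and the checkout paths in the usage note are placeholders. Only the file names and destination paths come from the instructions above.

```python
import shutil
from pathlib import Path

# File in this repository -> destination path inside the DiT checkout.
DIT_FILES = {
    "gaussian_diffusion.py": "diffusion/gaussian_diffusion.py",
    "sample.py": "sample.py",
}


def install_files(cab_root, target_root, mapping, backup=True):
    """Copy files from this repository into another checkout.

    Before overwriting, the original file is saved once next to it with an
    ".orig" suffix (hypothetical convention) so the change can be undone.
    Returns the list of destination paths that were written.
    """
    cab_root, target_root = Path(cab_root), Path(target_root)
    installed = []
    for src_name, dst_rel in mapping.items():
        src = cab_root / src_name
        dst = target_root / dst_rel
        if not src.is_file():
            raise FileNotFoundError(src)
        dst.parent.mkdir(parents=True, exist_ok=True)
        saved = dst.with_name(dst.name + ".orig")
        # Keep the first backup only, so a second run does not clobber it.
        if backup and dst.exists() and not saved.exists():
            shutil.copy2(dst, saved)
        shutil.copy2(src, dst)
        installed.append(dst)
    return installed
```

A call such as install_files("path/to/CAB", "path/to/DiT", DIT_FILES) would then perform both replacements. The same helper should work for any other checkout by passing a different mapping.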
python sample.py \
    --image-size 256 \
    --seed 1 \
    --order 2 \
    --theta 0.9

python sample.py \
    --image-size 256 \
    --seed 1 \
    --order 3 \
    --theta 0.9

Base repository:
https://github.com/huggingface/diffusers
Inside the Diffusers repository:
src/diffusers/schedulers/
add:
scheduling_cab.py
from this repository.
After a minimal change to __init__.py that exports the new class, the scheduler can be imported as:
from diffusers import CABScheduler

Inside:
src/diffusers/pipelines/qwenimage/
replace:
pipeline_qwenimage.py
with the modified version provided in this repository:
pipeline_qwenimage.py
Example script:
qwen.py
is provided in this repository.
import os
import sys

import torch

# Use the local (modified) diffusers source instead of an installed package.
sys.path.insert(0, "path/to/diffusers/src")

from diffusers import QwenImagePipeline, CABScheduler

model_name = "Qwen/Qwen-Image"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

pipe = QwenImagePipeline.from_pretrained(
    model_name,
    torch_dtype=dtype,
)

# Swap the default scheduler for CAB, reusing the existing scheduler config.
pipe.scheduler = CABScheduler.from_config(
    pipe.scheduler.config,
    solver_order=2,
    theta=0.2,
    prediction_type="flow_prediction",
    algorithm_type="cab",
    use_flow_sigmas=True,
)
pipe = pipe.to(device)

prompt = "A beautiful music room."

# Low-NFE sampling: 10 inference steps with a fixed seed for reproducibility.
image = pipe(
    prompt=prompt,
    width=1024,
    height=1024,
    num_inference_steps=10,
    true_cfg_scale=4.0,
    generator=torch.Generator(device=device).manual_seed(42),
).images[0]

os.makedirs("outputs", exist_ok=True)
image.save("outputs/cab_qwen_music_room.png")
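If the import at the top of the script above fails, a small check like the following may help tell whether the Python environment is picking up the modified Diffusers checkout and whether CABScheduler is actually exported. This is a sketch and not part of this repository; the helper name exposes is hypothetical.

```python
import importlib


def exposes(module_name, attr_name):
    """Return True if module_name imports and has a top-level attr_name.

    A missing module or a missing attribute gives False. Any other
    exception is left to propagate, since it usually carries the real cause.
    """
    try:
        module = importlib.import_module(module_name)
        getattr(module, attr_name)
    except (ImportError, AttributeError):
        return False
    return True
```

Calling exposes("diffusers", "CABScheduler") after the same sys.path.insert line as in the script should return True once scheduling_cab.py has been added and exported. A False result suggests either that a different Diffusers installation is being imported or that the __init__.py change is missing.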