MARBLE: Multi-Aspect Reward BaLancE for Diffusion RL

Canyu Zhao¹, Hao Chen¹, Yunze Tong¹, Yu Qiao², Jiacheng Li², Chunhua Shen¹,³,✉

¹Zhejiang University ²Hithink ³Zhejiang University of Technology

TL;DR

MARBLE is a multi-reward RL fine-tuning framework for diffusion models. It keeps each reward's gradient information separate end-to-end and combines the per-reward gradients in gradient space. Built on DiffusionNFT, MARBLE improves all reward dimensions simultaneously within a single model, with no sequential training.

Why scalar reward aggregation fails. Specialist rollouts, whose prompts carry signal for only a subset of rewards (a "cat on sofa" image carries no OCR signal), have their reward-specific advantage diluted by averaging. As a result, weighted-sum aggregation and sequential fine-tuning systematically trade specialist rewards for general ones. MARBLE addresses this in gradient space rather than through reward engineering.
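The dilution effect can be illustrated with a toy calculation. The sketch below is not taken from the MARBLE code: the reward values, the two-column layout (aesthetic, OCR), and the helper names `per_reward_advantages` and `scalar_advantages` are all hypothetical, and advantages are simply mean-centered within a rollout group, which is an assumption made for readability.

```python
import numpy as np

def per_reward_advantages(rewards):
    """Center each reward column within a rollout group (rows = rollouts)."""
    rewards = np.asarray(rewards, dtype=float)
    return rewards - rewards.mean(axis=0, keepdims=True)

def scalar_advantages(rewards, weights=None):
    """Advantage of the weighted-sum reward, centered within the group."""
    rewards = np.asarray(rewards, dtype=float)
    k = rewards.shape[1]
    # Default to a plain average over the K rewards (assumed aggregation).
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    scalar = rewards @ w
    return scalar - scalar.mean()

# Hypothetical rewards for 4 rollouts; columns are [aesthetic, ocr].
text_prompt = np.array([[0.9, 0.0], [0.8, 0.0], [0.3, 1.0], [0.2, 0.0]])
# A prompt with no text to render: the OCR column is constant.
no_text_prompt = np.array([[0.9, 0.0], [0.8, 0.0], [0.3, 0.0], [0.2, 0.0]])

if __name__ == "__main__":
    print("per-reward advantages (text prompt):")
    print(per_reward_advantages(text_prompt))
    print("scalar advantages (text prompt):", scalar_advantages(text_prompt))
    print("OCR advantages (no-text prompt):", per_reward_advantages(no_text_prompt)[:, 1])
```

In this toy group, the only rollout that renders text correctly has a clear OCR-specific advantage, but after averaging with the aesthetic reward its scalar advantage shrinks to a fraction of that. For the prompt without text, the OCR column contributes no advantage at all, so those rollouts only ever push the general reward.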

Status

🚧 Code release in progress. Inference code and checkpoints will be released first, training code later. ⭐ Star the repo to get notified.

Planned releases:

- Inference + 🤗 LoRA checkpoints
- Training code, configs, multi-node scripts
- MARBLE for video models

Highlights


- 🎯 One model, all rewards at once. No sequential fine-tuning: training is joint across all reward dimensions.
- 🧩 Specialist rewards survive. OCR / GenEval no longer collapse under joint training.
- ⚡ ~1× single-reward cost. Amortized MGDA + EMA-smoothed $\alpha$.
- 📈 Composite +1.12. Best on every reward axis vs. all baselines (Table 1).

See the project page for the full results table, training-dynamics curves, EMA-decay $\rho$ ablation, qualitative comparisons and human-evaluation results.
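For readers unfamiliar with MGDA (the multiple-gradient descent algorithm), the sketch below shows the generic idea of combining per-reward gradients in gradient space: find simplex weights $\alpha$ that minimize the norm of the weighted gradient sum, then smooth $\alpha$ with an exponential moving average of decay $\rho$. This is a minimal illustration under stated assumptions, not MARBLE's implementation: the Frank-Wolfe solver, the function names, the EMA update rule, and the default `rho=0.9` are all hypothetical, and the amortization scheme is not shown.

```python
import numpy as np

def min_norm_weights(grads, iters=50):
    """Frank-Wolfe solver for min_alpha ||sum_k alpha_k g_k||^2 on the simplex."""
    G = np.asarray(grads, dtype=float)      # shape (K, D): one row per reward
    gram = G @ G.T                          # pairwise gradient inner products
    k = G.shape[0]
    alpha = np.full(k, 1.0 / k)             # start from uniform weights
    for _ in range(iters):
        grad_alpha = gram @ alpha           # gradient of 0.5 * alpha^T gram alpha
        t = int(np.argmin(grad_alpha))      # simplex vertex that decreases most
        d = -alpha.copy()
        d[t] += 1.0                         # direction e_t - alpha
        denom = d @ gram @ d
        if denom <= 1e-12:                  # no further progress possible
            break
        # Exact line search along d, clipped so alpha stays on the simplex.
        step = np.clip(-(alpha @ gram @ d) / denom, 0.0, 1.0)
        alpha = alpha + step * d
    return alpha

def ema_update(alpha_ema, alpha_new, rho=0.9):
    """Assumed smoothing rule: convex blend of old and new weights."""
    return rho * np.asarray(alpha_ema, dtype=float) + (1.0 - rho) * np.asarray(alpha_new, dtype=float)

def combine_gradients(grads, alpha):
    """Weighted sum of per-reward gradients, i.e. the shared update direction."""
    return np.asarray(alpha, dtype=float) @ np.asarray(grads, dtype=float)

if __name__ == "__main__":
    # Two hypothetical, partially conflicting per-reward gradients.
    grads = np.array([[1.0, 0.0], [-1.0, 2.0]])
    alpha = min_norm_weights(grads)
    direction = combine_gradients(grads, alpha)
    print("alpha:", alpha)
    print("direction:", direction)
    print("inner products with each gradient:", grads @ direction)
```

When the minimum-norm combination is nonzero, it has a non-negative inner product with every per-reward gradient, which is why this family of methods avoids sacrificing one objective for another. Because a convex blend of two simplex points stays on the simplex, the EMA-smoothed weights remain valid combination weights.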

Citation

If you find MARBLE useful, please cite:

@article{zhao2026marblemultiaspectrewardbalance,
      title={MARBLE: Multi-Aspect Reward Balance for Diffusion RL},
      author={Canyu Zhao and Hao Chen and Yunze Tong and Yu Qiao and Jiacheng Li and Chunhua Shen},
      year={2026},
      eprint={2605.06507},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.06507},
}

Acknowledgement

MARBLE builds on DiffusionNFT. We thank the authors for their excellent open-source releases.
