Highlights
- Fused top-k forward-KL kernel for OPD
- Multimodal metadata precomputed in the dataloader for VLMs
- Qwen-Image support, including checkpointing, LoRA, and inference
- Reworked MoE load-balance monitor
- NPU OpSlot fixes
What's Changed
- [model] feat: support qwen-image by @FoolPlayer in #770
- [model, data, trainer] feat: precompute multimodal forward metadata in dataloader by @TimYangst in #772
- [model, ops] feat: chunked fused-linear top-k forward-KL distillation kernel by @Luosuu in #771
- [omni] chore: add Omni Model inference script by @FoolPlayer in #777
- [model, data, ci] refactor: collapse multimodal metadata to a single grid_thw key by @TimYangst in #778
- [parallel] feat: disable HSDP gradient all-reduce during gradient accumulation by @nono-Sang in #781
- [ops, model] refactor: shorten loss-wrapper return to (loss, logits, fused_linear_aux) by @Luosuu in #780
- [model, ops] refactor: add NPU support and OpSlot guard for Qwen3/VL/MoE, Qwen3.5/MoE by @yanghw116 in #710
- [model, omni] feat: add Qwen-Image lora config by @FoolPlayer in #784
- [model, ci, agent] feat: wire qwen2-family ViT to the multimodal metadata precompute hook by @TimYangst in #779
- [trainer, ops] feat: rework MoE load-balance monitor (model-agnostic, EP/DP-aware) by @Luosuu in #787
- [ckpt, lora] feat: save LoRA checkpoints and add omni-infer with LoRA by @FoolPlayer in #785
- [model, omni] feat: update Qwen-Image and add VeOmni FSDP state API by @FoolPlayer in #786
- [model, ops] fix: complete #780 3-tuple loss-wrapper migration by @TimYangst in #790
- [release] chore: bump version to 0.1.11 by @Luosuu in #792
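The top-k forward-KL distillation loss named in #771 can be sketched in plain Python. This is a minimal, hypothetical illustration, not VeOmni's kernel. It makes several assumptions. The teacher supplies the ids and log-probabilities of its top-k tokens, and those are used as given rather than renormalized. The student's logits are computed from a hidden state and an LM-head weight one vocabulary chunk at a time. An online log-sum-exp tracks the student's normalizer, so the full logit vector is never held at once. The names `topk_forward_kl`, `chunk_size`, `hidden`, and `weight` are illustrative. The real kernel may also chunk along a different axis, such as tokens.

```python
import math
from typing import Dict, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def topk_forward_kl(
    hidden: Sequence[float],
    weight: Sequence[Sequence[float]],
    teacher_ids: Sequence[int],
    teacher_logprobs: Sequence[float],
    chunk_size: int,
) -> float:
    """Hypothetical sketch for one position:
    sum over teacher top-k ids of p_t * (log p_t - log p_s).

    Student logit for vocab row v is dot(hidden, weight[v]), computed chunk by chunk.
    Teacher log-probs are used as given (assumption: no renormalization over top-k).
    """
    vocab = len(weight)
    wanted = set(teacher_ids)
    picked: Dict[int, float] = {}  # student logits at the teacher's top-k ids
    running_max = -math.inf  # online log-sum-exp state
    running_sum = 0.0
    for start in range(0, vocab, chunk_size):
        end = min(start + chunk_size, vocab)
        # "Fused linear" step: project only this slice of the LM head.
        chunk = [sum(h * w for h, w in zip(hidden, weight[v])) for v in range(start, end)]
        new_max = max(running_max, max(chunk))
        # Rescale the previous partial sum to the new max, then add this chunk.
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(x - new_max) for x in chunk
        )
        running_max = new_max
        for v in range(start, end):
            if v in wanted:
                picked[v] = chunk[v - start]
    log_z = running_max + math.log(running_sum)  # student log-normalizer over the full vocab
    kl = 0.0
    for tok, t_lp in zip(teacher_ids, teacher_logprobs):
        s_lp = picked[tok] - log_z
        kl += math.exp(t_lp) * (t_lp - s_lp)
    return kl
```

The online log-sum-exp is exact, so the result does not depend on `chunk_size`. In a real kernel, the chunk size trades peak memory against per-chunk overhead.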
New Contributors
- @yanghw116 made their first contribution in #710
Full Changelog: v0.1.10...v0.1.11