v0.1.11

Latest


Luosuu released this 26 May 07:42

32 commits to main since this release

f90b3dc

Highlights

Top-k fused kernel for OPD
Multimodal metadata precomputed in the dataloader for VLMs
Qwen-Image support, including checkpointing, LoRA, and inference
Reworked MoE load-balance monitor
NPU OpSlot fixes

What's Changed

[model] feat: support qwen-image by @FoolPlayer in #770
[model, data, trainer] feat: precompute multimodal forward metadata in dataloader by @TimYangst in #772
[model, ops] feat: chunked fused-linear top-k forward-KL distillation kernel by @Luosuu in #771
[omni] chore: add Omni Model inference script by @FoolPlayer in #777
[model, data, ci] refactor: collapse multimodal metadata to a single grid_thw key by @TimYangst in #778
[parallel] feat: disable HSDP gradient all-reduce during gradient accumulation by @nono-Sang in #781
[ops, model] refactor: shorten loss-wrapper return to (loss, logits, fused_linear_aux) by @Luosuu in #780
[model, ops] refactor: add NPU support and OpSlot guard for Qwen3/VL/MoE, Qwen3.5/MoE by @yanghw116 in #710
[model, omni] feat: add Qwen-Image lora config by @FoolPlayer in #784
[model, ci, agent] feat: wire qwen2-family ViT to the multimodal metadata precompute hook by @TimYangst in #779
[trainer, ops] feat: rework MoE load-balance monitor (model-agnostic, EP/DP-aware) by @Luosuu in #787
[ckpt, lora] feat: Save lora ckpt and add omni-infer with lora by @FoolPlayer in #785
[model, omni] feat: Update Qwen-Image & Add veomni fsdp state API by @FoolPlayer in #786
[model, ops] fix: complete #780 3-tuple loss-wrapper migration by @TimYangst in #790
[release] chore: bump version to 0.1.11 by @Luosuu in #792
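The release notes do not describe the internals of the kernel added in #771. As a rough illustration of what a chunked fused-linear top-k forward-KL distillation loss computes, here is a minimal pure-Python sketch. It assumes the teacher supplies, for each token, its top-k token ids with probabilities renormalized over those k ids, and that student log-probabilities come from a full-vocabulary log-softmax. The function `chunked_topk_forward_kl` and its arguments are hypothetical and are not the project's API; the real kernel fuses these steps on the accelerator.

```python
import math
from typing import List, Sequence


def _log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over the full vocabulary.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def chunked_topk_forward_kl(
    hidden: Sequence[Sequence[float]],      # [T][H] student hidden states
    weight: Sequence[Sequence[float]],      # [V][H] LM-head (output projection) weight
    topk_ids: Sequence[Sequence[int]],      # [T][k] teacher top-k token ids
    topk_probs: Sequence[Sequence[float]],  # [T][k] teacher probs, renormalized over top-k (assumption)
    chunk_size: int = 2,
) -> float:
    """Hypothetical sketch, not the project's kernel: mean over tokens of
    sum_j p_j * (log p_j - log q_j) over the teacher's top-k ids."""
    total = 0.0
    num_tokens = len(hidden)
    for start in range(0, num_tokens, chunk_size):
        stop = min(start + chunk_size, num_tokens)
        # Fused-linear step: only this chunk's [chunk][V] logits are materialized,
        # never the full [T][V] matrix.
        chunk_logits = [
            [sum(h * w for h, w in zip(hidden[t], row)) for row in weight]
            for t in range(start, stop)
        ]
        for offset, logits in enumerate(chunk_logits):
            t = start + offset
            log_q = _log_softmax(logits)
            # Forward KL(p || q) restricted to the teacher's top-k support.
            for idx, p in zip(topk_ids[t], topk_probs[t]):
                if p > 0.0:
                    total += p * (math.log(p) - log_q[idx])
    return total / num_tokens
```

Because each token's loss depends only on its own logits, `chunk_size` trades peak logit memory against the number of smaller projections without changing the result.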

New Contributors

@yanghw116 made their first contribution in #710

Full Changelog: v0.1.10...v0.1.11

Contributors

TimYangst, nono-Sang, and 3 other contributors
