Release v0.1.8 · ByteDance-Seed/VeOmni

Since v0.1.7, this release expands VeOmni’s model coverage, distributed training capabilities, and training workflow support. Major updates include broader Qwen family support with Qwen3.5 and Transformers v5 integrations, the introduction of Extra Parallel alongside multiple EP/FSDP/SP improvements, new DPO and DiT/WAN training workflows, stronger MoE and checkpoint compatibility, and broader test, CI, and documentation coverage across both GPU and NPU environments.

Highlights

Model Coverage & Training Workflows

Added Qwen3.5 MoE language model support and extended Qwen3.5 ViT coverage (#547, #552, #602).
Upgraded Qwen2 and Qwen2-VL integrations to the Transformers v5 stack (#526, #543).
Added DPOTrainer together with Qwen3 DPO configs and examples (#558, #583).
Added support for DiT / WAN training workflows, including new task entrypoints, configs, and examples (#570).
Fixed freeze_vit behavior for Qwen3.5 models (#616).
Improved compatibility for DeepSeek V3 rollout and FA4-related paths (#609, #582).
Added FLOPs counting support for qwen3_5 and qwen3_5_moe (#561).
Improved hub-kernel loader compatibility with Transformers v5.3.0+ (#633).

Distributed Training & Parallelism

Introduced Extra Parallel, a breaking upgrade to VeOmni’s distributed parallel abstraction (#429).
Added mixed_precision support in fsdp_config and CPU parameter loading for FSDP1 (#627, #612).
Extended EP support to merged FC1 and quack GEMM backends, and improved EP/FSDP behavior for expert modules (#588, #577).
Added a fused Triton kernel for MoE load-balancing loss, improving distributed MoE training efficiency (#560).
Improved Sequence Parallel stability with a zero-division guard in ReduceLoss and new roll_with_sequence_parallel utilities (#618, #608).
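
The two Sequence Parallel items above can be illustrated with a minimal, framework-free sketch. It is not VeOmni's implementation: the function names `reduce_loss` and `roll_left_across_shards` are hypothetical, ranks are simulated as plain Python lists, and the sketch assumes contiguous, non-empty shards. The first function shows the kind of zero-division guard a token-weighted loss reduction needs when every rank holds only masked tokens; the second shows why rolling a sharded sequence requires borrowing the boundary element from the neighbouring shard.

```python
from typing import List, Sequence


def reduce_loss(loss_sums: Sequence[float], token_counts: Sequence[int]) -> float:
    """Token-weighted mean over ranks; returns 0.0 when no rank has valid tokens."""
    total_tokens = sum(token_counts)
    if total_tokens == 0:  # guard: every rank saw only masked or padded tokens
        return 0.0
    return sum(loss_sums) / total_tokens


def roll_left_across_shards(shards: Sequence[List[int]]) -> List[List[int]]:
    """Shift a sequence left by one when it is split into contiguous, non-empty shards.

    Each shard drops its first element and borrows the first element of the
    next shard; the last shard wraps around to the first one.
    """
    n = len(shards)
    return [shard[1:] + list(shards[(i + 1) % n][:1]) for i, shard in enumerate(shards)]
```

Concatenating the rolled shards gives the same result as rolling the full sequence left by one position. In a real training loop the wrapped-around final position would typically be masked out of the loss.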

Data Pipeline

Added worker-side multi-source dynamic batching resume, a breaking change for resume behavior in multi-source data loading (#603).
Added support for non-divisible frame alignment with frame_factor_remainder and improved video preprocessing behavior (#587, #585).
Improved data transform robustness for source_name handling and multimodal edge cases (#553, #554, #550).
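
As a rough illustration of non-divisible frame alignment, the sketch below trims a frame count to the largest value that leaves a given remainder when divided by a factor (for example, counts of the form 4k + 1). The release notes do not spell out the exact semantics of `frame_factor_remainder`, so this interpretation, the function name `align_frame_count`, and the choice to round down are all assumptions.

```python
def align_frame_count(num_frames: int, frame_factor: int, frame_factor_remainder: int = 0) -> int:
    """Largest count <= num_frames with count % frame_factor == frame_factor_remainder.

    Assumed semantics for illustration only; not taken from VeOmni's code.
    """
    if frame_factor <= 0:
        raise ValueError("frame_factor must be positive")
    if not 0 <= frame_factor_remainder < frame_factor:
        raise ValueError("frame_factor_remainder must be in [0, frame_factor)")
    if num_frames < frame_factor_remainder:
        raise ValueError("not enough frames to satisfy the remainder")
    # Drop the surplus frames beyond the nearest aligned count.
    surplus = (num_frames - frame_factor_remainder) % frame_factor
    return num_frames - surplus
```

With a remainder of 0 this reduces to ordinary round-down-to-multiple alignment.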

Checkpointing & Compatibility

Added runtime checkpoint tensor conversion for Qwen3-MoE Transformers v5 fused expert weights (#589).
Added a DCP consolidation patch for HDFS FUSE compatibility and fixed duplicate checkpoint saves (#536, #595).
Preserved tokenizer config in the MoE merge script and fixed checkpoint writer filtering logic (#622, #593).

Testing, CI & Documentation

Added dummy forward and FSDP equivalence tests to improve distributed correctness coverage (#620).
Expanded and aligned NPU CI coverage with GPU, including additional unit and end-to-end cases (#566, #567, #597, #623).
Expanded the testing documentation and consolidated test helpers (#631).
Updated Ascend TorchCodec installation docs and added a helper installation script (#613).
Added a CI verifier to ensure patchgen outputs stay in sync (#559).

Breaking Changes

Extra Parallel is now the main abstraction for expert-style parallelism, with follow-up API/doc updates from ep_plan to extra_parallel_plan (#429, #579).
Worker-side multi-source dynamic batching resume changes resume semantics and should be validated carefully before upgrading existing jobs (#603).
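
When preparing for the `ep_plan` to `extra_parallel_plan` rename, it can help to locate remaining uses of the old identifier in your own code first. The helper below is a generic sketch, not part of VeOmni: `find_identifier_lines` is a hypothetical name, and it only matches the identifier as a whole word in text you pass in, without assuming how `ep_plan` is exposed by the library.

```python
import re
from typing import List


def find_identifier_lines(source: str, name: str = "ep_plan") -> List[int]:
    """Return 1-based line numbers where `name` appears as a whole identifier."""
    # \b treats underscores as word characters, so "my_ep_plan" is not matched.
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]
```

Reading each project file and passing its contents to this function gives a checklist of call sites to review against the updated documentation.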

New Contributors

@hjshi84 made their first contribution in #536
@JorgenWan made their first contribution in #429
@deerlu made their first contribution in #552
@nono-Sang made their first contribution in #611
@xzzWZY made their first contribution in #595
@whisylan made their first contribution in #608
@cls1206 made their first contribution in #623

Full Changelog: v0.1.7...v0.1.8
