Release v0.1.2 · ByteDance-Seed/VeOmni

What's Changed

[misc] shift bytecheckpoint to optional dependency by @Luosuu in #92
[misc] revert ckpt default to avoid internal exceptions by @Luosuu in #93
[dist] minor fixes by @Luosuu in #94
[misc] feat: add GitHub issue template by @Fazziekey in #95
[data] feat: support megatron-energon dataset by @ziqi-wlb in #62
[data] add interleaved dataset by @Coach257 in #90
fix: remove a failing assertion by @KaijingOfficial in #97
[config] clean gitignore by @Luosuu in #99
[dist] fix: DCP auto load by @Luosuu in #106
[model] fix: Switch qwen3 and seed_oss to veomni defined GradientCheckpointingLayer by @piyifan123 in #109
[misc] feat: Add uv support to allow simple uv sync based python package management by @piyifan123 in #110
[BREAKING][dist] feat: Unified dcp saving for model and optimizer by @Luosuu in #107
[misc] feat: add skip_ulysses flag to bypass Ulysses logic in flash_attention_forward by @Juntian777 in #111
[dist] fix: remove unnecessary assert by @Luosuu in #112
[misc] feat: option to profile rank0 only or all the ranks by @Luosuu in #113
[misc] fix: remove buggy memory timeline export by @Luosuu in #114
[config] feat: add allow_cuda_launch_blocking by @Luosuu in #115
[ckpt] fix: remove unnecessary path joining for dcp by @Luosuu in #121
[ckpt][BREAKING] fix unnecessary wrapping for model and optimizer states by @Luosuu in #122
fix: qwen2 vl yaml by @Ziyi-Wang in #127
[data] fix: data collator for sp with cu_seq_lens_q and max_length_q by @Fazziekey in #126
[data] fix: dataset call hdfs api by @Ziyi-Wang in #128
[misc] fix: update asomeworks by @Fazziekey in #135
[model] fix: remove patch for npu by @heidongxianhua in #134
[dist] feat: faster weight loading through broadcasting from rank0 by @Luosuu in #123
[data] feat: support correct cu_seqlens handling for SP and non-SP by @Juntian777 in #136
[ckpt] fix: rank for getting last iteration for non-dcp path by @Luosuu in #140
[model] fix: deepseek-v3 by @Luosuu in #139
[model] fix: remove Qwen3-MoE redundant flashattention prep and fix input_ids access bug by @Juntian777 in #141
[data] fix: remove hf dependency on prepare_fa_kwargs_from_position_ids by @Juntian777 in #144
fix: wan_attnetion_missing_config_issue by @JeffryLee in #133
[fsdp] feat: support broadcasting large weights by chunk by @ZZWHU in #142
[core] fix: use flash_attention_2 backend by @KKZ20 in #124

New Contributors

@ziqi-wlb made their first contribution in #62
@KaijingOfficial made their first contribution in #97
@Juntian777 made their first contribution in #111
@Ziyi-Wang made their first contribution in #127
@heidongxianhua made their first contribution in #134
@JeffryLee made their first contribution in #133
@ZZWHU made their first contribution in #142

Full Changelog: v0.1.1...v0.1.2
