v0.1.2
Pre-release
What's Changed
- [misc] shift bytecheckpoint to optional dependency by @Luosuu in #92
- [misc] revert ckpt default to avoid internal exceptions by @Luosuu in #93
- [dist] minor fixes by @Luosuu in #94
- [misc] feat: add GitHub issue template by @Fazziekey in #95
- [data] feat: support megatron-energon dataset by @ziqi-wlb in #62
- [data] add interleaved dataset by @Coach257 in #90
- fix: remove a failing assertion by @KaijingOfficial in #97
- [config] clean gitignore by @Luosuu in #99
- [dist] fix: DCP auto load by @Luosuu in #106
- [model] fix: Switch qwen3 and seed_oss to veomni defined GradientCheckpointingLayer by @piyifan123 in #109
- [misc] feat: Add uv support to allow simple `uv sync` based python package management by @piyifan123 in #110
- [BREAKING][dist] feat: Unified dcp saving for model and optimizer by @Luosuu in #107
- [misc] feat: add skip_ulysses flag to bypass Ulysses logic in flash_attention_forward by @Juntian777 in #111
- [dist] fix: remove unnecessary assert by @Luosuu in #112
- [misc] feat: option to profile rank0 only or all the ranks by @Luosuu in #113
- [misc] fix: remove buggy memory timeline export by @Luosuu in #114
- [config] feat: add allow_cuda_launch_blocking by @Luosuu in #115
- [ckpt] fix: remove unnecessary path joining for dcp by @Luosuu in #121
- [ckpt][BREAKING] fix unnecessary wrapping for model and optimizer states by @Luosuu in #122
- fix: qwen2 vl yaml by @Ziyi-Wang in #127
- [data] fix: data collator for sp with cu_seq_lens_q and max_length_q by @Fazziekey in #126
- [data] fix: dataset call hdfs api by @Ziyi-Wang in #128
- [misc] fix: update awesome works by @Fazziekey in #135
- [model] fix: remove patch for npu by @heidongxianhua in #134
- [dist] feat: faster weight loading through broadcasting from rank0 by @Luosuu in #123
- [data] feat: support correct cu_seqlens handling for SP and non-SP by @Juntian777 in #136
- [ckpt] fix: rank for get last iteration for non-dcp path by @Luosuu in #140
- [model] fix: deepseek-v3 by @Luosuu in #139
- [model] fix: remove Qwen3-MoE redundant flashattention prep and fix input_ids access bug by @Juntian777 in #141
- [data] fix: remove hf dependency on prepare_fa_kwargs_from_position_ids by @Juntian777 in #144
- fix: wan_attnetion_missing_config_issue by @JeffryLee in #133
- [fsdp] feat: support broadcast large weight by chunk by @ZZWHU in #142
- [core] fix: use flash_attention_2 backend by @KKZ20 in #124
New Contributors
- @ziqi-wlb made their first contribution in #62
- @KaijingOfficial made their first contribution in #97
- @Juntian777 made their first contribution in #111
- @Ziyi-Wang made their first contribution in #127
- @heidongxianhua made their first contribution in #134
- @JeffryLee made their first contribution in #133
- @ZZWHU made their first contribution in #142
Full Changelog: v0.1.1...v0.1.2