v0.1.2

Pre-release

@Luosuu released this 17 Oct 18:08
· 444 commits to main since this release
600fe6d

What's Changed

  • [misc] shift bytecheckpoint to optional dependency by @Luosuu in #92
  • [misc] revert ckpt default to avoid internal exceptions by @Luosuu in #93
  • [dist] minor fixes by @Luosuu in #94
  • [misc] feat: add GITHUB ISSUE TEMPLATE by @Fazziekey in #95
  • [data] feat: support megatron-energon dataset by @ziqi-wlb in #62
  • [data] add interleaved dataset by @Coach257 in #90
  • fix: remove a failing assertion by @KaijingOfficial in #97
  • [config] clean gitignore by @Luosuu in #99
  • [dist] fix: DCP auto load by @Luosuu in #106
  • [model] fix: Switch qwen3 and seed_oss to veomni defined GradientCheckpointingLayer by @piyifan123 in #109
  • [misc] feat: Add uv support to allow simple uv sync based python package management by @piyifan123 in #110
  • [BREAKING][dist] feat: Unified dcp saving for model and optimizer by @Luosuu in #107
  • [misc] feat: add skip_ulysses flag to bypass Ulysses logic in flash_attention_forward by @Juntian777 in #111
  • [dist] fix: remove unnecessary assert by @Luosuu in #112
  • [misc] feat: option to profile rank0 only or all the ranks by @Luosuu in #113
  • [misc] fix: remove buggy memory timeline export by @Luosuu in #114
  • [config] feat: add allow_cuda_launch_blocking by @Luosuu in #115
  • [ckpt] fix: remove unnecessary path joining for dcp by @Luosuu in #121
  • [ckpt][BREAKING] fix unnecessary wrapping for model and optimizer states by @Luosuu in #122
  • fix: qwen2 vl yaml by @Ziyi-Wang in #127
  • [data] fix: data collator for sp with cu_seq_lens_q and max_length_q by @Fazziekey in #126
  • [data] fix: dataset call hdfs api by @Ziyi-Wang in #128
  • [misc] fix: update asomeworks by @Fazziekey in #135
  • [model] fix: remove patch for npu by @heidongxianhua in #134
  • [dist] feat: faster weight loading through broadcasting from rank0 by @Luosuu in #123
  • [data] feat: support correct cu_seqlens handling for SP and non-SP by @Juntian777 in #136
  • [ckpt] fix: rank for get last iteration for non-dcp path by @Luosuu in #140
  • [model] fix: deepseek-v3 by @Luosuu in #139
  • [model] fix: remove Qwen3-MoE redundant flashattention prep and fix input_ids access bug by @Juntian777 in #141
  • [data] fix: remove hf dependency on prepare_fa_kwargs_from_position_ids by @Juntian777 in #144
  • fix: wan_attnetion_missing_config_issue by @JeffryLee in #133
  • [fsdp] feat: support broadcast large weight by chunk by @ZZWHU in #142
  • [core] fix: use flash_attention_2 backend by @KKZ20 in #124

Full Changelog: v0.1.1...v0.1.2