v0.1.10

hjshi84 released this 21 May 07:21
· 46 commits to main since this release
6ab293e

What's Changed

  • [model] fix: dit + lora related fix by @Coach257 in #735
  • [docker] feat: add dockerfile and image for 9.0.0 cann base image by @phdddd in #742
  • [parallel, perf] fix: sp gather && optimize: input embeds fusing by @JorgenWan in #681
  • [lora] feat: add lora for veomni by @Coach257 in #739
  • [optim] feat: support muon by @FoolPlayer in #744
  • [model, misc] refactor: drop qwen3_5(_moe) vision encoder host-syncs by @TimYangst in #752
  • [model] feat: update seed_oss v5 by @Coach257 in #754
  • [config, ci, docker] feat: make transformers v5.2.0 the default install by @Luosuu in #751
  • [ci, model] test: add v5 loader-path test and glm_moe_dsa coverage by @TimYangst in #727
  • [dist, config, docs] feat: drop FSDP1 by @Luosuu in #756
  • [model] feat: add MoE router replay hook for RL training frameworks by @hjshi84 in #719
  • [model, ci] feat: deepseekv3_update v5 & clean v4 ci by @Coach257 in #755
  • [dist] fix: update dtensor_factory to use partial for local split in paralle… by @kahlun in #757
  • [docs] fix: fix docs for ascend by @phdddd in #738
  • [parallel] feat: add cpu offload for fsdp2 by @nono-Sang in #753
  • [model] fix: read tie_word_embeddings from inner text/decoder config by @TimYangst in #693
  • [model] feat: update qwen2.5omni & llama to v5 by @Coach257 in #767
  • [model, ci] test: add implicit CUDA sync gate for qwen3_5 generated modeling by @TimYangst in #760
  • [docs] fix: fix doc build by @FoolPlayer in #769
  • [BREAKING][ci, model] feat: cleanup v4 by @Coach257 in #768
  • [model, perf] fix: remove production-path CPU syncs from Qwen3.5 / Qwen3.5-MoE patches by @TimYangst in #762
  • [model, perf] fix: remove production-path CPU syncs from Qwen3-VL / VL-MoE / Omni-MoE patches by @TimYangst in #764
  • [release] chore: release v0.1.10 by @Luosuu in #759

Full Changelog: v0.1.9a5...v0.1.10