v0.1.10
What's Changed
- [model] fix: dit + lora related fix by @Coach257 in #735
- [docker] feat: add dockerfile and image for 9.0.0 cann base image by @phdddd in #742
- [parallel, perf] fix: sp gather && optimize: input embeds fusing by @JorgenWan in #681
- [lora] feat: add lora for veomni by @Coach257 in #739
- [optim] feat: support muon by @FoolPlayer in #744
- [model, misc] refactor: drop qwen3_5(_moe) vision encoder host-syncs by @TimYangst in #752
- [model] feat: update seed_oss v5 by @Coach257 in #754
- [config, ci, docker] feat: make transformers v5.2.0 the default install by @Luosuu in #751
- [ci, model] test: add v5 loader-path test and glm_moe_dsa coverage by @TimYangst in #727
- [dist, config, docs] feat: drop FSDP1 by @Luosuu in #756
- [model] feat: add MoE router replay hook for RL training frameworks by @hjshi84 in #719
- [model, ci] feat: deepseekv3_update v5 & clean v4 ci by @Coach257 in #755
- [dist] fix: update dtensor_factory to use partial for local split in paralle… by @kahlun in #757
- [docs] fix: fix docs for ascend by @phdddd in #738
- [parallel] feat: add cpu offload for fsdp2 by @nono-Sang in #753
- [model] fix: read tie_word_embeddings from inner text/decoder config by @TimYangst in #693
- [model] feat: update qwen2.5omni & llama to v5 by @Coach257 in #767
- [model, ci] test: add implicit CUDA sync gate for qwen3_5 generated modeling by @TimYangst in #760
- [docs] fix: fix doc build by @FoolPlayer in #769
- [BREAKING][ci, model] feat: cleanup v4 by @Coach257 in #768
- [model, perf] fix: remove production-path CPU syncs from Qwen3.5 / Qwen3.5-MoE patches by @TimYangst in #762
- [model, perf] fix: remove production-path CPU syncs from Qwen3-VL / VL-MoE / Omni-MoE patches by @TimYangst in #764
- [release] chore: release v0.1.10 by @Luosuu in #759
Full Changelog: v0.1.9a5...v0.1.10