v0.13.1
This release brings support for PyTorch 2.9.1, expands our ecosystem with new experiment trackers (SwanLab and Trackio), and introduces support for a wide range of new models including Olmo3, Ministral 3, InternVL 3.5, and Kimi. We’ve also included significant improvements to quantization workflows and metrics logging.
🎉 New Features
Expanded Model Support
We’ve added support for more models!
- Olmo3: including Olmo and Olmo2. (#3275 by @NanoCode012)
- Ministral 3 (#3297, #3300 by @NanoCode012)
- InternVL 3.5 (#3141 by @NanoCode012)
- Kimi: using experimental training code. (#3257 by @NanoCode012)
- Trinity: by ArceeAI. (#3292 by @NanoCode012)
- Exaone 4 (#3279 by @nayohan)
- MiMo & Plano (#3332 by @NanoCode012)
New Experiment Tracking Integrations
- SwanLab: You can now use SwanLab for experiment tracking. (#3334 by @PraMamba)
- Trackio: Added Trackio validation integration. (#3253 by @abidlabs)
Training & PEFT Improvements
- Liger Kernel for DPO: Added Liger kernel support for DPO training. (#3302 by @ved1beta)
- Distributed Muon: Added support for distributed Muon optimizer. (#3264 by @salmanmohammadi)
- Weight Tying Safety: Added `peft_ensure_weight_tying` to ensure correct parameter handling in PEFT. (#3278 by @NanoCode012)
- Adapter Dtypes: Added `peft_autocast_adapter_dtype` config option for fine-grained control. (#3311 by @xzuyn)
- Cheap PPL Metric: A new metric calculation for Perplexity that is less computationally expensive. (#3317 by @xzuyn)
- Scaled Softmax: Scales softmax calculation by `s * log(n) + b`. (#3338 by @ved1beta)
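To make the Scaled Softmax factor concrete, here is a minimal sketch in plain Python. It is not the Axolotl implementation: the function name `scaled_softmax`, the choice of `n` as the number of entries being normalised, and the decision to multiply the logits by the factor before normalising are all assumptions made for illustration.

```python
import math


def scaled_softmax(logits, s=1.0, b=0.0):
    """Softmax over a non-empty list of logits, scaled by s * log(n) + b.

    Assumption: n is the number of entries being normalised, and s and b
    are plain scalars. The real feature may define these differently.
    """
    n = len(logits)
    scale = s * math.log(n) + b
    scaled = [scale * x for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


probs = scaled_softmax([1.0, 2.0, 3.0], s=0.5, b=1.0)
```

With a positive factor, a longer input (larger `n`) sharpens the distribution, since `log(n)` grows with the number of entries.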
⚠️ Deprecations & Warnings
PyTorch 2.7.1 Deprecation
Support for PyTorch 2.7.1 has been deprecated. We recommend upgrading to newer supported versions.
🔧 Fixes & Improvements
Quantization & CLI
- Save Processor: The quantizer CLI now properly saves the processor alongside the model. (#3290 by @salmanmohammadi)
- FP8 Checks: Fixed checks for FP8 capability and `load_in_8bit` configurations. (#3324, #3327)
- NVFP4 Configs: Added QAT NVFP4 configs for reference. (#3280 by @salmanmohammadi)
Logging & Metrics
- Metric Rounding: You can now set the environment variable `AXOLOTL_METRIC_PRECISION` (default: 5) to control the rounding of logged metrics. (#3325 by @ved1beta)
- Token Logging: Fixed total/trainable tokens logging logic. (#3344, #3293 by @ved1beta)
- Evaluation Loss: Fixed evaluation loss calculation in the KD trainer. (#3271 by @roycho96)
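As a rough illustration of the metric rounding setting, the sketch below reads `AXOLOTL_METRIC_PRECISION` from the environment and rounds float metrics to that many decimal places. Only the variable name and its default of 5 come from these notes. The helper `round_metrics` is hypothetical, and how Axolotl applies the setting internally is an assumption.

```python
import os


def round_metrics(metrics, default_precision=5):
    """Round float values in a metrics dict (hypothetical helper).

    The precision is read from AXOLOTL_METRIC_PRECISION and falls back to
    the documented default of 5 when the variable is unset.
    """
    precision = int(os.environ.get("AXOLOTL_METRIC_PRECISION", default_precision))
    return {
        key: round(value, precision) if isinstance(value, float) else value
        for key, value in metrics.items()
    }


# Example: request three decimal places for this process only.
os.environ["AXOLOTL_METRIC_PRECISION"] = "3"
logged = round_metrics({"loss": 1.23456789, "step": 10})
```

In practice the variable would usually be set in the shell before launching training rather than inside Python.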
Data Processing
- Long Sequence Handling: Added the ability to raise an error when long sequences would be dropped, preventing silent data loss. (#3321 by @kallewoof)
- Qwen3 Tokenization: Fixed an off-by-a-few-tokens issue in Qwen3 Jinja tokenization. (#3295 by @NanoCode012)
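The idea behind raising on long sequence drops can be sketched as follows. This is a simplified illustration, not Axolotl's code: the function `drop_long_sequences` and its `raise_on_drop` flag are hypothetical names chosen for this example.

```python
def drop_long_sequences(sequences, max_len, raise_on_drop=False):
    """Filter out sequences longer than max_len (hypothetical helper).

    With raise_on_drop=True, any over-length sequence raises an error
    instead of being discarded silently.
    """
    too_long = [i for i, seq in enumerate(sequences) if len(seq) > max_len]
    if too_long and raise_on_drop:
        raise ValueError(
            f"{len(too_long)} sequence(s) exceed max_len={max_len} "
            f"(first offending index: {too_long[0]})"
        )
    return [seq for seq in sequences if len(seq) <= max_len]


samples = [[1, 2], [1, 2, 3, 4]]
kept = drop_long_sequences(samples, max_len=3)  # silently drops the second sample
```

Calling the same function with `raise_on_drop=True` on `samples` raises a `ValueError`, which surfaces the data loss at preprocessing time rather than after training.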
📦 Dependency & Infrastructure Updates
- PyTorch 2.9.1: Added base images and support for PyTorch 2.9.x and xformers wheels. (#3268, #3273, #3308 by @winglian)
- CUDA 13.0: Added initial test matrices and wheel support for CUDA 13.0. (#3343, #3342 by @winglian)
- Liger Kernel: Upgraded to 0.6.4. (#3289 by @NanoCode012)
- Pydantic: Upgraded to 2.12. (#3328 by @winglian)
- General Deps: End of year dependency upgrades. (#3299 by @winglian)
- Scikit-learn: Removed unused dependency `scikit-learn==1.4.2`. (#3277 by @ved1beta)
- Transformers and PEFT: Upgraded to 4.57.6 and 0.17.1 respectively. (#3358, #3361 by @winglian)
Others
- fix bin size by @ved1beta in #3307
- pre-commit hooks update. (#3287 by @github-actions[bot])
- `PYTORCH_CUDA_ALLOC_CONF` deprecation fix to ensure compatibility with future PyTorch versions. (#3313 by @NanoCode012)
- fix: Fix evaluation loss in KD trainer by @roycho96 in #3271
- Fix typo in densemixer RuntimeError by @bethrezen in #3327
- fix preview docs failing due to running out of disk by @winglian in #3326
- feature: raise on long sequence drop by @kallewoof in #3321
- feat: cleanup old flex mask patch, suppress Matmul bnb warn, and misc by @NanoCode012 in #3330
- docs for checkpoint saving by @ved1beta in #3335
- fix: gemma3_text model loading vision config by @NanoCode012 in #3354
- fix syntax for secrets in gha yaml by @winglian in #3355
- Update PR template by @salmanmohammadi in #3349
- fix amd64 and set 2.9.1 as latest cloud image by @winglian in #3356
- don't install deepspeed in arm64 images by @winglian in #3357
- don't install xformers for arm64 by @winglian in #3359
- set version to v0.13.1 by @winglian in #3363
New Contributors
- @nayohan made their first contribution in #3279
- @roycho96 made their first contribution in #3271
- @bethrezen made their first contribution in #3327
- @abidlabs made their first contribution in #3253
- @PraMamba made their first contribution in #3334
- @1sand0s made their first contribution in #3346
Full Changelog: v0.13.0...v0.13.1