Release v0.13.1 · axolotl-ai-cloud/axolotl

This release brings support for PyTorch 2.9.1, expands our ecosystem with new experiment trackers (SwanLab and Trackio), and introduces support for a wide range of new models including Olmo3, Ministral 3, InternVL 3.5, and Kimi. We’ve also included significant improvements to quantization workflows and metrics logging.

🎉 New Features

Expanded Model Support

We’ve added support for more models!

Olmo3, including Olmo and Olmo2 (#3275 by @NanoCode012)
Ministral 3 (#3297, #3300 by @NanoCode012)
InternVL 3.5 (#3141 by @NanoCode012)
Kimi, using experimental training code (#3257 by @NanoCode012)
Trinity by ArceeAI (#3292 by @NanoCode012)
Exaone 4 (#3279 by @nayohan)
MiMo & Plano (#3332 by @NanoCode012)

New Experiment Tracking Integrations

SwanLab: You can now use SwanLab for experiment tracking. (#3334 by @PraMamba)
Trackio: Added Trackio validation integration. (#3253 by @abidlabs)

Training & PEFT Improvements

Liger Kernel for DPO: Added Liger Kernel support for DPO training. (#3302 by @ved1beta)
Distributed Muon: Added support for the distributed Muon optimizer. (#3264 by @salmanmohammadi)
Weight Tying Safety: Added the peft_ensure_weight_tying option to ensure tied weights are handled correctly in PEFT. (#3278 by @NanoCode012)
Adapter Dtypes: Added the peft_autocast_adapter_dtype config option to control whether PEFT autocasts adapter weights to a different dtype. (#3311 by @xzuyn)
Cheap PPL Metric: A new, less computationally expensive way to calculate perplexity. (#3317 by @xzuyn)
Scaled Softmax: Scales the softmax inputs by a factor of s * log(n) + b. (#3338 by @ved1beta)
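The Scaled Softmax formula can be illustrated with a small pure-Python sketch. It assumes the factor s * log(n) + b multiplies the logits before normalization and that n is the number of positions being normalized over (for attention, the number of keys a query attends to). The function name and the default values of `s` and `b` are hypothetical and are not Axolotl's configuration or API.

```python
import math


def scaled_softmax(logits, s=1.0, b=0.0):
    """Softmax of `logits` after multiplying them by s * log(n) + b.

    Hypothetical helper: n is assumed to be len(logits); s and b are
    illustrative defaults, not Axolotl settings.
    """
    n = len(logits)
    scale = s * math.log(n) + b
    scaled = [scale * z for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# s=0, b=1 gives a scale of exactly 1, i.e. an ordinary softmax.
print(scaled_softmax([1.0, 2.0, 3.0], s=0.0, b=1.0))
```

Because log(n) grows with n, a positive s makes the multiplier larger for longer inputs, which sharpens a distribution that a plain softmax would otherwise spread more thinly across more positions.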

⚠️ Deprecations & Warnings

PyTorch 2.7.1 Deprecation

Support for PyTorch 2.7.1 has been deprecated. We recommend upgrading to a newer supported version, such as PyTorch 2.9.1, which is supported as of this release.

Contributed by @winglian in #3339.

🔧 Fixes & Improvements

Quantization & CLI

Save Processor: The quantizer CLI now properly saves the processor alongside the model. (#3290 by @salmanmohammadi)
FP8 Checks: Fixed checks for FP8 capability and load_in_8bit configurations. (#3324, #3327)
NVFP4 Configs: Added QAT NVFP4 configs for reference. (#3280 by @salmanmohammadi)

Logging & Metrics

Metric Rounding: You can now set the AXOLOTL_METRIC_PRECISION environment variable (default: 5) to control the rounding of logged metrics. (#3325 by @ved1beta)
Token Logging: Fixed the logging of total and trainable token counts. (#3344, #3293 by @ved1beta)
Evaluation Loss: Fixed evaluation loss calculation in the KD trainer. (#3271 by @roycho96)
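To illustrate what AXOLOTL_METRIC_PRECISION controls, the sketch below reads the variable and rounds float metrics to that many decimal places before they would be logged. It assumes the value is a number of decimal places passed to Python's built-in `round`; the helper name `round_metrics` is hypothetical, and this is not Axolotl's actual logging code.

```python
import os


def round_metrics(metrics, default_precision=5):
    """Round float values in a metrics dict to AXOLOTL_METRIC_PRECISION places.

    Hypothetical helper: assumes the env var holds a number of decimal places,
    falling back to 5 (the default stated in the release notes).
    """
    precision = int(os.environ.get("AXOLOTL_METRIC_PRECISION", default_precision))
    return {
        # Only floats are rounded; ints such as step or epoch pass through.
        key: round(value, precision) if isinstance(value, float) else value
        for key, value in metrics.items()
    }


# Example: keep three decimal places for this process.
os.environ["AXOLOTL_METRIC_PRECISION"] = "3"
print(round_metrics({"loss": 1.23456789, "epoch": 2}))  # {'loss': 1.235, 'epoch': 2}
```

In practice the variable would be exported in the shell before launching training rather than set from Python.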

Data Processing

Long Sequence Handling: Axolotl can now raise an error when long sequences would be dropped, preventing silent data loss. (#3321 by @kallewoof)
Qwen3 Tokenization: Fixed an off-by-a-few-tokens issue in Qwen3 Jinja tokenization. (#3295 by @NanoCode012)
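The long-sequence change is about making drops visible instead of silent. A minimal sketch of the two behaviors, assuming samples are lists of token IDs and using a hypothetical `raise_on_drop` flag (these notes do not name the actual config key):

```python
def filter_long_sequences(samples, max_len, raise_on_drop=False):
    """Drop token-ID sequences longer than max_len, or raise instead.

    Hypothetical helper: `raise_on_drop` stands in for whatever setting
    Axolotl uses; it is not a documented Axolotl option.
    """
    kept = [ids for ids in samples if len(ids) <= max_len]
    dropped = len(samples) - len(kept)
    if dropped and raise_on_drop:
        # Fail loudly instead of silently shrinking the dataset.
        raise ValueError(
            f"{dropped} of {len(samples)} samples exceed max_len={max_len}"
        )
    return kept


samples = [[1, 2, 3], [4, 5, 6, 7, 8], [9]]
print(filter_long_sequences(samples, max_len=4))  # [[1, 2, 3], [9]]
```

Raising trades convenience for safety: a misconfigured sequence length surfaces immediately rather than as an unexplained drop in dataset size.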

📦 Dependency & Infrastructure Updates

PyTorch 2.9.1: Added base images and support for PyTorch 2.9.x and xformers wheels. (#3268, #3273, #3308 by @winglian)
CUDA 13.0: Added initial test matrices and wheel support for CUDA 13.0. (#3343, #3342 by @winglian)
Liger Kernel: Upgraded to 0.6.4. (#3289 by @NanoCode012)
Pydantic: Upgraded to 2.12. (#3328 by @winglian)
General Deps: End-of-year dependency upgrades. (#3299 by @winglian)
Scikit-learn: Removed unused dependency scikit-learn==1.4.2. (#3277 by @ved1beta)
Transformers and PEFT: Upgraded to 4.57.6 and 0.17.1, respectively. (#3358, #3361 by @winglian)

Others

fix bin size by @ved1beta in #3307
pre-commit hooks update. (#3287 by @github-actions[bot])
PYTORCH_CUDA_ALLOC_CONF deprecation fix to ensure compatibility with future PyTorch versions. (#3313 by @NanoCode012)

fix: Fix evaluation loss in KD trainer by @roycho96 in #3271
Fix typo in densemixer RuntimeError by @bethrezen in #3327
fix preview docs failing due to running out of disk by @winglian in #3326
feature: raise on long sequence drop by @kallewoof in #3321
feat: cleanup old flex mask patch, suppress Matmul bnb warn, and misc by @NanoCode012 in #3330
docs for checkpoint saving by @ved1beta in #3335
fix: gemma3_text model loading vision config by @NanoCode012 in #3354
fix syntax for secrets in gha yaml by @winglian in #3355
Update PR template by @salmanmohammadi in #3349
fix amd64 and set 2.9.1 as latest cloud image by @winglian in #3356
don't install deepspeed in arm64 images by @winglian in #3357
don't install xformers for arm64 by @winglian in #3359
set version to v0.13.1 by @winglian in #3363

New Contributors

@nayohan made their first contribution in #3279
@roycho96 made their first contribution in #3271
@bethrezen made their first contribution in #3327
@abidlabs made their first contribution in #3253
@PraMamba made their first contribution in #3334
@1sand0s made their first contribution in #3346

Full Changelog: v0.13.0...v0.13.1
