Release v0.9.1 · axolotl-ai-cloud/axolotl

Efficiency & Performance

Axolotl delivers demonstrably faster training due to improved packing and kernels, allowing you to accomplish more in less time. Our benchmarks show that we outperform the next faster trainer by 30% on real world workloads that translate directly to increased productivity and faster time-to-results.

Axolotl delivers this performance advantage while consuming less VRAM and eliminating resource spikes throughout your training runs.

Get started on Google Colab

Fine-tune your own Qwen3-14B for free on Google Colab: https://colab.research.google.com/drive/1EscYgLM38dWMcG5IyJz1qHl3VO7a2hZz

🚨 Breaking Changes & Deprecations

PyTorch 2.4.1 Support Removed

Support for torch==2.4.1 has been officially removed. Please upgrade to a newer version of PyTorch. (by @winglian in #2582)

🎉 New Features

Greatly Improved Sample Packing

We've implemented an improved Parallel Bin Packing algorithm that achieves ~99% packing efficiency on most datasets. This can improve your workload throughput by up to 10%. (by @winglian in #2631)
pad_to_sequence_len is now automatically enabled when using sample packing for better performance and stability. (by @winglian in #2607)

Xformers Attention for Packed Sequences

Added support for using xformers optimized attention with packed sequences in fp16, boosting training speed even further. (by @winglian in #2619)

Support fine-tuning Text-to-Speech model with LLM backbone

Axolotl now supports training a Text-to-Speech (TTS) model on top of an LLM. (by @mhenrichsen in #2614)

Automatic LoRA Kernel Activation

LoRA kernels are now automatically enabled where possible, providing a hands-free performance boost for LoRA training. This feature is automatically disabled for RL training to ensure stability. (by @djsaunde in #2589 and @winglian in #2600)

CAME Optimizer Support

You can now use the CAME optimizer, a memory-efficient optimizer designed for large language models. (by @xzuyn in #2385)

General User Experience

DeepSpeed Config Logging: Your DeepSpeed configuration is now automatically saved to Weights & Biases, making your runs easier to reproduce and debug. (by @winglian in #2593)
Automatic Reasoning Dataset Splitting: Axolotl can now automatically split existing reasoning datasets to leverage new chat templates with reasoning/tool-use turns. (by @winglian in #2591)

📦 Dependency Updates

liger-kernel bumped to 0.5.9. (by @winglian in #2640)
vllm bumped to 0.8.5 for Qwen2 support. (by @winglian in #2583)
datasets and other Hugging Face libraries have been updated.

🔧 Major Fixes

Model and Training Stability

Qwen2 Models: Fixed multiple issues with packing and kernel support for the Qwen2 and Qwen2-MoE model families, ensuring they train correctly. (by @winglian in #2588, #2612, #2622 and @NanoCode012 in #2596)
Evaluation: Fixed a bug that could cause evaluation runs to fail. (by @djsaunde in #2586)
DeepSpeed: Resolved an issue where the learning rate was passed as a tensor instead of a float, causing errors. (by @winglian in #2595). This was also pushed upstream at huggingface/transformers#37704 by @NanoCode012 and huggingface/transformers#37881 by @winglian.
DPO Trainer: Fixed an issue where evaluation steps were incorrectly overridden. (by @winglian in #2628)

Other Improvements

📚 Documentation & Examples

Added documentation for the new split_thinking feature. (by @NanoCode012 in #2613)
Corrected documentation for multimodal datasets and Llama 4 delinearization. (by @NanoCode012 in #2575, #2644)

⚙️ Internal & Plugin Enhancements

Plugins can now provide their own custom lr_scheduler. (by @alexdremov in #2584)
Plugins are now able to return their own fully processed datasets. (by @winglian in #2617)
Improved error messaging when a dataset fails to load. (by @NanoCode012 in #2637)

Full changelog

remove torch 2.4.1 CI as part of support deprecation by @winglian in #2582
Post release fixes by @winglian in #2581
set config on the PluginManager for callback access by @winglian in #2587
Fix eval + add smoke test by @djsaunde in #2586
support for qwen3 with lora kernels by @winglian in #2588
bump vllm==0.8.5 for qwen3 support by @winglian in #2583
fix(doc): update key used to point to url in multimodal doc by @NanoCode012 in #2575
auto-enable lora kernels where possible by @djsaunde in #2589
Plugins create_lr_scheduler support by @alexdremov in #2584
patch to convert LR from tensor to float when using DS by @winglian in #2595
feat: add qwen3 moe block for ds3 by @NanoCode012 in #2596
upload the deepspeed json to wandb by @winglian in #2593
Handle other reasoning trace dataset formats by @winglian in #2591
only import vllm serve cli if its being called by @winglian in #2597
don't automatically enable lora kernels for RL training by @winglian in #2600
ensure we pass axolotl extras to the Dockerfile so vllm is included in shipped images by @winglian in #2599
replace zero_only with simpler if statement by @winglian in #2592
additional args for grpo config/trainer by @winglian in #2598
use latest hf-xet and don't install vllm for torch 2.7.0 by @winglian in #2603
Add num_completions_to_print for trl and grpo by @dhruvmullick in #2604
add missing init for lr monkeypatch fix by @winglian in #2609
Logging config for colab by @winglian in #2611
fix: run preview-docs only when md/qmd changes by @NanoCode012 in #2606
automatically set pad_to_sequence_len when use packing by @winglian in #2607
remove keys to incoporate changes for the trl update by @aitechguy0105 in #2616
qwen3 and qwen3_moe support for liger kernels by @winglian in #2612
setup hf transfer too and fix auto bf16 when fp16 enabled by @winglian in #2620
include multipack support for qwen3 family by @winglian in #2622
Fix logging deprecation warnings by @emmanuel-ferdman in #2623
Adds example for training a TTS model on top of a LLM. by @mhenrichsen in #2614
repop cache by @winglian in #2639
make sure gc_steps is used for all trainers by @winglian in #2638
fix dpo eval override to call grandparent instead of the broken super by @winglian in #2628
Print axolotl art if train is called outside of cli by @winglian in #2627
Update lr_scheduler options in config.qmd to include additional sched… by @mhenrichsen in #2636
bump liger dep to 0.5.9 by @winglian in #2640
feat(doc): add split_thinking docs by @NanoCode012 in #2613
allow plugins to return their own dataset by @winglian in #2617
Multipack parallel bin packing by @winglian in #2631
xformers attention with packing by @winglian in #2619
Add missing init.py file to cut_cross_entropy integration by @BitPhinix in #2642
Configurable embeddings upcast by @winglian in #2621
Fix: improve error message on failed dataset load by @NanoCode012 in #2637
fix(doc): clarify instruction to delinearize llama4 similar to cli doc by @NanoCode012 in #2644
Add CAME Optimizer by @xzuyn in #2385
swap tinymodels that have safetensors for some ci tests by @winglian in #2641

New Contributors

@alexdremov made their first contribution in #2584
@rahul-tuli made their first contribution in #2479
@aitechguy0105 made their first contribution in #2616
@emmanuel-ferdman made their first contribution in #2623
@BitPhinix made their first contribution in #2642

Full Changelog: v0.9.0...v0.9.1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v0.9.1

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Efficiency & Performance

Get started on Google Colab

🚨 Breaking Changes & Deprecations

PyTorch 2.4.1 Support Removed

🎉 New Features

Greatly Improved Sample Packing

Xformers Attention for Packed Sequences

Support fine-tuning Text-to-Speech model with LLM backbone

Automatic LoRA Kernel Activation

CAME Optimizer Support

General User Experience

📦 Dependency Updates

🔧 Major Fixes

Model and Training Stability

Other Improvements

📚 Documentation & Examples

⚙️ Internal & Plugin Enhancements

Full changelog

New Contributors

Contributors

Uh oh!