v0.9.1

github-actions released this 08 May 11:31
· 764 commits to main since this release

Efficiency & Performance

Axolotl trains measurably faster thanks to improved sample packing and kernels. In our benchmarks, Axolotl outperforms the next-fastest trainer by 30% on real-world workloads, which translates directly into shorter time-to-results.

Axolotl delivers this performance advantage while consuming less VRAM and eliminating resource spikes throughout your training runs.

Get started on Google Colab

Fine-tune your own Qwen3-14B for free on Google Colab: https://colab.research.google.com/drive/1EscYgLM38dWMcG5IyJz1qHl3VO7a2hZz

🚨 Breaking Changes & Deprecations

PyTorch 2.4.1 Support Removed

  • Support for torch==2.4.1 has been officially removed. Please upgrade to a newer version of PyTorch. (by @winglian in #2582)

🎉 New Features

Greatly Improved Sample Packing

  • We've implemented an improved Parallel Bin Packing algorithm that achieves ~99% packing efficiency on most datasets. This can improve your workload throughput by up to 10%. (by @winglian in #2631)
  • pad_to_sequence_len is now automatically enabled when using sample packing for better performance and stability. (by @winglian in #2607)
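
To make "packing efficiency" concrete, here is a minimal sketch of sample packing using plain first-fit decreasing. It is not Axolotl's Parallel Bin Packing implementation; the function names and the sample lengths are made up for illustration. Efficiency is the fraction of token slots that hold real tokens rather than padding.

```python
def pack_first_fit_decreasing(lengths, max_seq_len):
    """Pack sample lengths into bins of capacity max_seq_len.

    Illustrative first-fit decreasing heuristic, NOT Axolotl's algorithm.
    """
    bins = []  # each bin is a list of sample lengths packed together
    free = []  # remaining capacity of each bin, parallel to `bins`
    for length in sorted(lengths, reverse=True):
        if length > max_seq_len:
            raise ValueError(
                f"sample of length {length} exceeds max_seq_len={max_seq_len}"
            )
        # Place the sample in the first bin that still has room for it.
        for i, room in enumerate(free):
            if length <= room:
                bins[i].append(length)
                free[i] -= length
                break
        else:
            # No existing bin fits, so open a new one.
            bins.append([length])
            free.append(max_seq_len - length)
    return bins


def packing_efficiency(bins, max_seq_len):
    """Fraction of token slots holding real tokens instead of padding."""
    used = sum(sum(b) for b in bins)
    return used / (len(bins) * max_seq_len)


# Hypothetical token counts for eight samples.
lengths = [1800, 1200, 900, 700, 600, 500, 300, 144]
bins = pack_first_fit_decreasing(lengths, max_seq_len=2048)
print(len(bins), packing_efficiency(bins, 2048))
```

With only a handful of samples a greedy heuristic leaves visible gaps; efficiency generally improves as the pool of samples to choose from grows, which is where a stronger packing algorithm pays off.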

Xformers Attention for Packed Sequences

Added support for using xformers optimized attention with packed sequences in fp16, boosting training speed even further. (by @winglian in #2619)
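
Packed sequences need an attention mask that stops tokens from attending across sample boundaries. The sketch below builds such a block-diagonal causal mask in plain Python, purely to illustrate the idea; it is not how Axolotl or xformers implement it, and `packed_causal_mask` is a hypothetical helper.

```python
def packed_causal_mask(seq_lens):
    """Build a block-diagonal causal mask for samples packed into one row.

    mask[q][k] is True when query position q may attend to key position k:
    both must belong to the same sample, and k must not come after q.
    """
    total = sum(seq_lens)
    mask = [[False] * total for _ in range(total)]
    start = 0
    for n in seq_lens:
        for q in range(start, start + n):
            for k in range(start, q + 1):
                mask[q][k] = True
        start += n
    return mask


# Two samples of length 2 and 3 packed into one sequence of length 5.
mask = packed_causal_mask([2, 3])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Optimized kernels avoid materializing this dense matrix and work from the sequence boundaries instead, but the attention pattern they compute is the same.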

Fine-Tuning Text-to-Speech Models with an LLM Backbone

Axolotl now supports training a Text-to-Speech (TTS) model on top of an LLM. (by @mhenrichsen in #2614)

Automatic LoRA Kernel Activation

LoRA kernels are now automatically enabled where possible, providing a hands-free performance boost for LoRA training. This feature is automatically disabled for RL training to ensure stability. (by @djsaunde in #2589 and @winglian in #2600)

CAME Optimizer Support

You can now use the CAME optimizer, a memory-efficient optimizer designed for large language models. (by @xzuyn in #2385)

General User Experience

  • DeepSpeed Config Logging: Your DeepSpeed configuration is now automatically saved to Weights & Biases, making your runs easier to reproduce and debug. (by @winglian in #2593)
  • Automatic Reasoning Dataset Splitting: Axolotl can now automatically split existing reasoning datasets to leverage new chat templates with reasoning/tool-use turns. (by @winglian in #2591)
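
As a rough illustration of what splitting a reasoning dataset involves, the sketch below separates a reasoning trace from the final answer in an assistant message. It assumes the reasoning is wrapped in `<think>...</think>` tags and uses `reasoning_content` and `content` as output field names; both are assumptions made for this example, not a description of Axolotl's internals.

```python
import re

# Assumed tag format for embedded reasoning; real datasets may differ.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Split an assistant message into a reasoning trace and a final answer."""
    match = THINK_RE.search(text)
    if match is None:
        # No reasoning block found: the whole message is the answer.
        return {"reasoning_content": None, "content": text.strip()}
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block is treated as the answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return {"reasoning_content": reasoning, "content": answer}


example = split_reasoning("<think>2 + 2 is 4</think>\nThe answer is 4.")
print(example)
```

Keeping the two parts in separate fields lets a chat template decide how, or whether, to render the reasoning turn.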

📦 Dependency Updates

  • liger-kernel bumped to 0.5.9. (by @winglian in #2640)
  • vllm bumped to 0.8.5 for Qwen3 support. (by @winglian in #2583)
  • datasets and other Hugging Face libraries have been updated.

🔧 Major Fixes

Model and Training Stability

Other Improvements

📚 Documentation & Examples

  • Added documentation for the new split_thinking feature. (by @NanoCode012 in #2613)
  • Corrected documentation for multimodal datasets and Llama 4 delinearization. (by @NanoCode012 in #2575, #2644)

⚙️ Internal & Plugin Enhancements

  • Plugins can now provide their own custom lr_scheduler. (by @alexdremov in #2584)
  • Plugins are now able to return their own fully processed datasets. (by @winglian in #2617)
  • Improved error messaging when a dataset fails to load. (by @NanoCode012 in #2637)

New Contributors

Full Changelog: v0.9.0...v0.9.1