
v0.13.1


Released by @github-actions on 20 Jan 13:59
6e42def

This release brings support for PyTorch 2.9.1, expands our ecosystem with new experiment trackers (SwanLab and Trackio), and introduces support for a wide range of new models including Olmo3, Ministral 3, InternVL 3.5, and Kimi. We’ve also included significant improvements to quantization workflows and metrics logging.

🎉 New Features

Expanded Model Support

We’ve added support for more models, including Olmo3, Ministral 3, InternVL 3.5, and Kimi.

New Experiment Tracking Integrations

  • SwanLab: You can now use SwanLab for experiment tracking. (#3334 by @PraMamba)
  • Trackio: Added Trackio validation integration. (#3253 by @abidlabs)

Training & PEFT Improvements

  • Liger Kernel for DPO: Added Liger kernel support for DPO training. (#3302 by @ved1beta)
  • Distributed Muon: Added support for distributed Muon optimizer. (#3264 by @salmanmohammadi)
  • Weight Tying Safety: Added peft_ensure_weight_tying to ensure correct parameter handling in PEFT. (#3278 by @NanoCode012)
  • Adapter Dtypes: Added peft_autocast_adapter_dtype config option for fine-grained control. (#3311 by @xzuyn)
  • Cheap PPL Metric: Added a less computationally expensive way to calculate the perplexity (PPL) metric. (#3317 by @xzuyn)
  • Scaled Softmax: Added scaled softmax, which multiplies the softmax inputs by s * log(n) + b. (#3338 by @ved1beta)
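As a rough illustration of the scaled-softmax formula, the sketch below multiplies the logits by s * log(n) + b and then applies an ordinary, numerically stable softmax. It assumes n is the number of positions being normalized over and that s and b are plain scalars; the function name scaled_softmax is hypothetical, and this is plain Python for readability, not Axolotl's implementation.

```python
import math
from typing import List


def scaled_softmax(logits: List[float], s: float, b: float = 0.0) -> List[float]:
    """Hypothetical helper: softmax of logits after scaling them by s * log(n) + b.

    Assumes n is the number of logits being normalized over.
    """
    n = len(logits)
    scale = s * math.log(n) + b
    scaled = [scale * z for z in logits]
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the normalized result.
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With b = 0 and s = 1 / log(n), the scale is 1 and this reduces to an ordinary softmax; larger scales sharpen the distribution toward the largest logit.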

⚠️ Deprecations & Warnings

PyTorch 2.7.1 Deprecation

Support for PyTorch 2.7.1 has been deprecated. We recommend upgrading to newer supported versions.

🔧 Fixes & Improvements

Quantization & CLI

  • Save Processor: The quantizer CLI now properly saves the processor alongside the model. (#3290 by @salmanmohammadi)
  • FP8 Checks: Fixed checks for FP8 capability and load_in_8bit configurations. (#3324, #3327)
  • NVFP4 Configs: Added QAT NVFP4 configs for reference. (#3280 by @salmanmohammadi)

Logging & Metrics

  • Metric Rounding: You can now set the AXOLOTL_METRIC_PRECISION environment variable (default: 5) to control the rounding precision of logged metrics. (#3325 by @ved1beta)
  • Token Logging: Fixed the logic for logging total and trainable token counts. (#3344, #3293 by @ved1beta)
  • Evaluation Loss: Fixed evaluation loss calculation in the KD trainer. (#3271 by @roycho96)
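A minimal sketch of how an environment variable like AXOLOTL_METRIC_PRECISION can drive metric rounding. It assumes the value is the number of decimal places passed to Python's built-in round(); the round_metrics helper is hypothetical and is not Axolotl's code.

```python
import os
from typing import Dict


def round_metrics(metrics: Dict[str, float]) -> Dict[str, float]:
    # Hypothetical helper: read the precision from the environment (default 5),
    # assuming it means "decimal places" for round().
    precision = int(os.environ.get("AXOLOTL_METRIC_PRECISION", "5"))
    return {name: round(value, precision) for name, value in metrics.items()}


# Usage: request 3 decimal places for logged metrics.
os.environ["AXOLOTL_METRIC_PRECISION"] = "3"
print(round_metrics({"loss": 0.123456789}))  # {'loss': 0.123}
```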

Data Processing

  • Long Sequence Handling: Added the ability to raise an error when long sequences would be dropped, preventing silent data loss. (#3321 by @kallewoof)
  • Qwen3 Tokenization: Fixed an off-by-a-few-tokens issue in Qwen3 Jinja tokenization. (#3295 by @NanoCode012)
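To show the difference between silently dropping overlong samples and failing loudly, here is a small sketch. The function name, the raise_on_drop flag, the "longer than max_len" cutoff, and treating samples as lists of token IDs are all assumptions for illustration; the actual Axolotl option may be named and wired differently.

```python
from typing import List


def drop_long_sequences(
    sequences: List[List[int]], max_len: int, raise_on_drop: bool = False
) -> List[List[int]]:
    """Hypothetical helper: keep sequences with at most max_len tokens.

    When raise_on_drop is True, fail instead of silently discarding data.
    """
    kept = [seq for seq in sequences if len(seq) <= max_len]
    dropped = len(sequences) - len(kept)
    if dropped and raise_on_drop:
        raise ValueError(
            f"{dropped} sequence(s) exceed max_len={max_len} and would be dropped"
        )
    return kept
```

Raising by default would stop a run the moment data would be lost, while the silent path keeps training going at the cost of quietly shrinking the dataset.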

📦 Dependency & Infrastructure Updates

Others

  • Fixed bin size. (#3307 by @ved1beta)
  • Updated pre-commit hooks. (#3287 by @github-actions[bot])
  • Addressed the PYTORCH_CUDA_ALLOC_CONF deprecation to ensure compatibility with future PyTorch versions. (#3313 by @NanoCode012)

New Contributors

Full Changelog: v0.13.0...v0.13.1