v0.5.1 — Fix: `save_pretrained_merged` on a quantized base (issue #15)

ARahim3 released this 31 May 09:37

Patch release.

If you fine-tuned a 4-bit base (e.g. Llama-3.2-1B-Instruct-4bit) and saved with `model.save_pretrained_merged(...)`, the reloaded merged model could behave like the base model: the fine-tune was lost, even though inference right after training looked correct. This is now fixed. If you hit it, re-merge after upgrading; your adapters were fine all along.

Upgrade

uv pip install -U mlx-tune

No API changes. The default `save_method="merged_16bit"` now keeps the fine-tune when the base is quantized.

Affected before fix

Merging a LoRA adapter into a quantized base with `save_pretrained_merged`. Light fine-tunes were the most likely to be lost. A strong fine-tune (higher LR / more steps) often survived the 4-bit round-trip, which is why the bug showed up only intermittently.
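A toy sketch of why a light fine-tune can vanish in a 4-bit round-trip while a strong one survives. It is not mlx-tune's merge code: the fixed grid spacing `STEP`, the helper names, and the example weights and deltas are all hypothetical. Real quantizers use per-group scales and offsets rather than one fixed step.

```python
# Toy model: 16 evenly spaced levels (codes -8..7), spacing STEP.
# STEP, to_codes, merge and all values below are illustrative assumptions.
STEP = 0.1


def to_codes(weights, step=STEP):
    """Snap each weight to the nearest grid level; return the integer level indices."""
    return [round(w / step) for w in weights]


def merge(base, delta):
    """Add an adapter delta onto the base weights, element by element."""
    return [b + d for b, d in zip(base, delta)]


base = [-0.8, -0.3, -0.1, 0.2, 0.4, 0.7]

# Light fine-tune: every update is smaller than half a grid step.
light_delta = [0.01, -0.02, 0.03, -0.01, 0.02, -0.03]
# Strong fine-tune: every update is larger than half a grid step.
strong_delta = [0.12, -0.09, 0.16, -0.11, 0.08, -0.14]

base_codes = to_codes(base)
light_codes = to_codes(merge(base, light_delta))
strong_codes = to_codes(merge(base, strong_delta))

# The light merge rounds back onto the base levels, so the update is erased.
print("light merge == base after re-quantizing:", light_codes == base_codes)
# The strong merge moves weights onto different levels, so it is kept.
print("strong merge == base after re-quantizing:", strong_codes == base_codes)
```

In this toy, any update below half a grid step is rounded away. That matches the description above, where small adapter updates disappear and larger ones tend to survive.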

Unaffected

Adapter-only saves (`save_pretrained` + `load_adapter`), merging into a non-quantized (bf16/fp16) base, and the VLM / STT / Embedding merge paths, which use separate code.
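To check which category a local base model falls into, one option is to inspect its `config.json`. This is a minimal sketch under a stated assumption: that quantized MLX checkpoints record a `"quantization"` entry (for example with `bits` and `group_size`) in `config.json`. The helper name `is_quantized_base` is hypothetical and not part of mlx-tune.

```python
import json
from pathlib import Path


def is_quantized_base(model_dir):
    """Return True if config.json in model_dir declares a quantization section.

    Assumption: quantized MLX checkpoints store a "quantization" entry in
    config.json; non-quantized (bf16/fp16) checkpoints omit it.
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open() as f:
        config = json.load(f)
    return bool(config.get("quantization"))
```

If this returns `True` for the base you merged into before upgrading, re-merging from your saved adapters is the safe choice.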

Full Changelog: https://github.com/arahim3/mlx-tune/commits/v0.5.1