v0.3.3
[0.3.3] - 2026-03-12
Added
- Self-contained binary: mlx.metallib is now gzip-compressed and embedded into the pmetal binary at build time via build.rs + include_bytes!. On first run it extracts to ~/.cache/pmetal/lib/ if not already present. cargo install pmetal-cli now produces a fully self-contained binary with no external metallib dependency (~31MB added to binary, 70% smaller than the raw 102MB metallib)
- Adaptive LR rollback: When divergence is detected and rollback_enabled = true, the adaptive LR controller emits LrEvent::RollbackTriggered; the training loop restores LoRA weights from the best in-memory EMA snapshot, resets optimizer momentum, and continues with a halved LR multiplier
- Early-stop on repeated divergence: After max_rollbacks exhausted rollbacks, the controller emits LrEvent::EarlyStop; the training loop saves a final checkpoint and exits cleanly instead of spiraling deeper into loss divergence
- In-memory LoRA snapshot: TrainingLoop holds the best LoRA weight snapshot in RAM via snapshot_best_weights() / restore_best_weights(). LoRA params are typically 1–20 MB, making this negligible overhead vs checkpoint I/O
- AdaptiveAction enum: apply_adaptive_lr() now returns AdaptiveAction::Continue | Rollback | EarlyStop so training loops can react to controller decisions without re-parsing event strings
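The rollback and early-stop behaviour described in the Added entries can be sketched roughly as follows. This is a minimal illustration, not the pmetal implementation: ToyController, ToyLoop, on_divergence and react are hypothetical names, the real AdaptiveAction variants may carry data, and halving the multiplier on plain reduction is an assumption (the notes only state the halving for rollback).

```rust
/// Mirrors the three outcomes named in the notes; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AdaptiveAction {
    Continue,
    Rollback,
    EarlyStop,
}

/// Hypothetical stand-in for the controller's divergence path.
struct ToyController {
    rollback_enabled: bool,
    max_rollbacks: u32,
    rollbacks: u32,
    lr_multiplier: f32,
}

impl ToyController {
    fn new(rollback_enabled: bool, max_rollbacks: u32) -> Self {
        Self { rollback_enabled, max_rollbacks, rollbacks: 0, lr_multiplier: 1.0 }
    }

    /// Decide what to do once divergence has already been detected.
    fn on_divergence(&mut self, has_best_snapshot: bool) -> AdaptiveAction {
        if self.rollback_enabled && has_best_snapshot {
            if self.rollbacks >= self.max_rollbacks {
                // Rollback budget is used up: ask the loop to stop.
                return AdaptiveAction::EarlyStop;
            }
            self.rollbacks += 1;
            self.lr_multiplier *= 0.5; // halved LR multiplier after rollback
            AdaptiveAction::Rollback
        } else {
            // Assumption: plain LR reduction also halves the multiplier.
            self.lr_multiplier *= 0.5;
            AdaptiveAction::Continue
        }
    }
}

/// Hypothetical stand-in for the training loop's in-memory snapshot.
struct ToyLoop {
    weights: Vec<f32>,
    momentum: Vec<f32>,
    best: Option<Vec<f32>>,
}

impl ToyLoop {
    /// Keep a copy of the current weights in RAM.
    fn snapshot_best_weights(&mut self) {
        self.best = Some(self.weights.clone());
    }

    /// Restore the snapshot; returns false when none exists.
    fn restore_best_weights(&mut self) -> bool {
        match &self.best {
            Some(best) => {
                self.weights = best.clone();
                true
            }
            None => false,
        }
    }

    /// React to the controller's decision; returns false when training should stop.
    fn react(&mut self, action: AdaptiveAction) -> bool {
        match action {
            AdaptiveAction::Continue => true,
            AdaptiveAction::Rollback => {
                self.restore_best_weights();
                // Reset optimizer momentum alongside the weights.
                self.momentum.iter_mut().for_each(|m| *m = 0.0);
                true
            }
            AdaptiveAction::EarlyStop => false,
        }
    }
}
```

In this sketch, a controller built with ToyController::new(true, 2) answers the first two divergences with Rollback and the third with EarlyStop. Matching on an enum rather than an event string lets the compiler check that every outcome is handled.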
Fixed
- apply_adaptive_lr return type: Previously returned (), discarding rollback/early-stop events, so callers had no way to react. Now returns AdaptiveAction
- Divergence rollback vs plain reduction ambiguity: Divergence path now checks rollback_enabled and has_best_snapshot before deciding between rollback and plain LR reduction, which prevents silent rollback when no snapshot exists
- EMA state reset on rollback: Spike EMA and variance are reset alongside LR multiplier on rollback so z-score anomaly detection re-stabilizes correctly after weight restoration
- total_steps in metrics: run_standard() and run_jit_compiled() computed total_steps: max_steps.unwrap_or(0); this now estimates from dataset.len() / batch_size * epochs when max_steps is None, giving accurate progress in the TUI
- stats_summary missing rollback count: AdaptiveLrController::stats_summary() now includes rollbacks=N in its output string
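As a rough illustration of the total_steps fix, the fallback estimate can be written as below. The function name estimate_total_steps and its parameter list are hypothetical, and the guard against a zero batch_size is an addition of this sketch, not something the notes describe.

```rust
/// Sketch of the progress estimate: use max_steps when set, otherwise
/// dataset_len / batch_size * epochs (integer division, as in the notes).
fn estimate_total_steps(
    max_steps: Option<usize>,
    dataset_len: usize,
    batch_size: usize,
    epochs: usize,
) -> usize {
    match max_steps {
        Some(n) => n,
        // Assumption: clamp batch_size to at least 1 to avoid dividing by zero.
        None => dataset_len / batch_size.max(1) * epochs,
    }
}
```

With a dataset of 1000 examples, batch size 8 and 3 epochs, this yields 125 * 3 = 375 steps, where the old unwrap_or(0) path would have reported 0.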
Improved
- Rollback tests: Four new unit tests: test_rollback_triggered_on_divergence, test_early_stop_after_max_rollbacks, test_rollback_disabled_falls_through_to_divergence, test_should_snapshot_best_tracks_ema_improvement
Full Changelog: v0.3.2...v0.3.3