v0.2.0 - EXL3 Converter on Apple Silicon
Inference
- MiniCPM5-1B EXL3 support (model_type: llama)
- ~152 tok/s greedy decode on M5 Max; ~0.9 GB resident
Converter (ponyexl3-convert)
- HF → EXL3 conversion on Metal: trellis search, Hessian/LDLQ, regularization, calibration, allocation
- Full-model MiniCPM5-1B in ~7 min (direct path)
- KLD vs bf16 matches turboderp/MiniCPM5-1B-exl3 4.00bpw (KLD 0.0422 vs 0.0428)
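
For readers unfamiliar with the KLD figure, here is a minimal sketch of how such a number could be computed. It assumes KLD means the mean per-token KL divergence between the bf16 reference model's next-token distribution and the quantized model's, measured in nats; the release notes do not state the exact procedure. The function names and the toy logits are hypothetical and are not part of the ponyexl3 API.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one row of logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def mean_kld(ref_logits, quant_logits):
    """Mean KL(P_ref || P_quant) over token positions, in nats.

    ref_logits / quant_logits: one row of vocabulary logits per token
    position, from the reference (e.g. bf16) and quantized models.
    """
    total = 0.0
    for ref_row, quant_row in zip(ref_logits, quant_logits):
        ref_lp = log_softmax(ref_row)
        quant_lp = log_softmax(quant_row)
        # KL(P || Q) = sum_i P_i * (log P_i - log Q_i)
        total += sum(math.exp(p) * (p - q) for p, q in zip(ref_lp, quant_lp))
    return total / len(ref_logits)

# Toy logits (made up for illustration): 2 positions, vocabulary of 3.
ref = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
quant = [[1.9, 0.6, -1.1], [0.1, 0.25, 0.28]]

print(mean_kld(ref, ref))    # identical distributions give zero divergence
print(mean_kld(ref, quant))  # small positive value for a close approximation
```

Lower is better: a value near zero means the quantized model's output distribution closely tracks the reference.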
Install
pip install "ponyexl3 @ git+https://github.com/beamivalice/PonyExl3.git@v0.2.0"

Full changelog: CHANGELOG.md