Summary
- One-command converter: ponyexl3-convert --in-dir SOURCE --out-dir OUT --bits 4.15 runs the full pipeline (plan → calibration → measured bit allocation → LDLQ → resumable HF shards)
- Self-converted Qwen3.6-27B @ 4.15bpw: KLD parity vs bf16; better ΔPPL (+0.015 vs +0.169) and p99 (0.548 vs 0.592) than UnstableLlama 4.15bpw
- ponyexl3-convert-advanced: low-level/oracle path; ponyexl3-convert-e2e is a deprecated alias
- GPU-residency: MLX LDLQ, sibling batching, parallel measurement, layer reuse
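For readers who want to sanity-check numbers like the ΔPPL and KLD figures above, here is a minimal sketch of how such metrics are commonly computed from reference (bf16) and quantized logits. It is not the evaluation code used in this PR. The function names are hypothetical, and it assumes that "p99" refers to the 99th percentile of per-token KLD and that ΔPPL is the quantized perplexity minus the bf16 perplexity.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_kld(ref_logits, quant_logits):
    """KL(ref || quant) in nats for a single token position."""
    lp = log_softmax(ref_logits)
    lq = log_softmax(quant_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def perplexity(logits_per_token, target_ids):
    """exp of the mean negative log-likelihood of the target tokens."""
    nll = 0.0
    for logits, target in zip(logits_per_token, target_ids):
        nll -= log_softmax(logits)[target]
    return math.exp(nll / len(target_ids))


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def compare(ref_logits, quant_logits, target_ids):
    """Summarize how far the quantized model drifts from the reference."""
    klds = [token_kld(r, q) for r, q in zip(ref_logits, quant_logits)]
    return {
        "mean_kld": sum(klds) / len(klds),
        "p99_kld": percentile(klds, 99),  # assumed meaning of "p99"
        "delta_ppl": perplexity(quant_logits, target_ids)
        - perplexity(ref_logits, target_ids),
    }
```

In a real evaluation the logits would come from running both models over the same held-out text; the pure-Python lists here only keep the sketch dependency-free.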
Install
pip install "ponyexl3 @ git+https://github.com/beamivalice/PonyExl3.git@v0.3.0"
Convert
ponyexl3-convert --in-dir /path/to/Qwen3.6-27B \
--out-dir /path/to/Qwen3.6-27B-PonyExl3-4.15bpw --bits 4.15
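If the conversion needs to be driven from a script, a thin wrapper along these lines may be enough. It is a sketch that uses only the three flags shown above; the helper names build_convert_cmd and run_convert are hypothetical and not part of the package.

```python
import shutil
import subprocess


def build_convert_cmd(in_dir, out_dir, bits):
    """Build the argv for ponyexl3-convert using only the flags shown above."""
    return [
        "ponyexl3-convert",
        "--in-dir", str(in_dir),
        "--out-dir", str(out_dir),
        "--bits", str(bits),
    ]


def run_convert(in_dir, out_dir, bits):
    """Run the converter, failing early if the CLI is not installed."""
    cmd = build_convert_cmd(in_dir, out_dir, bits)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("ponyexl3-convert is not on PATH; install the package first")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.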
Test plan