v0.1.7

beamivalice released this 15 Jun 17:27

32a09c4

Changelog

All notable changes to PonyExl3 are documented here.

[0.1.7] — 2026-06-13

Gemma4-26B-A4B EXL3 support (model_type gemma4 / gemma4_text)
Gemma4 MoE: EXL3Gemma4MoEBlock (compiled router + stacked experts; shared MLP separate)
Gemma4 routed experts use GeGLU (gelu_approx) in MoE kernels, matching exllamav3
Gemma4 sibling fusion: attn qkv + full-layer qk (40 MB threshold; MLP gate+up unfused)
Fusion parity test vs unfused logits (tests/test_gemma4_model.py)
Fix Gemma4 generation stop: merge top-level + text_config eos_token_id (honors <turn|>)
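
The stop fix above merges two sources of `eos_token_id`. A minimal sketch of that kind of merge follows, assuming a Hugging Face style config dict in which either field may be a single integer or a list. The function name and the token ids in the usage note are hypothetical and not taken from PonyExl3.

```python
def merge_eos_token_ids(config: dict) -> list[int]:
    """Collect eos_token_id from the top level and from text_config.

    Hypothetical helper: returns a de-duplicated list, top-level ids first.
    """
    merged: list[int] = []
    # Look at the top-level config first, then the nested text_config (if any).
    for source in (config, config.get("text_config") or {}):
        value = source.get("eos_token_id")
        if value is None:
            continue
        # The field may be a single id or a list of ids.
        ids = value if isinstance(value, (list, tuple)) else [value]
        for token_id in ids:
            token_id = int(token_id)
            if token_id not in merged:
                merged.append(token_id)
    return merged
```

With illustrative ids, `merge_eos_token_ids({"eos_token_id": 1, "text_config": {"eos_token_id": [1, 106]}})` yields `[1, 106]`, so generation would stop on either token rather than only the top-level one.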

[0.1.6] — 2026-06-13

CLI validation: model dir, Metal, context limits, empty prompts, spec-flag warnings
ponyexl3-generate-bench: text-repeat prefill padding, cache clear between rows
Generation guards for prefill_chunk, num_draft, and max_position_embeddings
CLI edge-case tests (tests/test_cli.py, tests/test_generate_validation.py)

[0.1.5] — 2026-06-13

ponyexl3-generate-bench: prefill sweep (1k–32k) with 128-token decode per row
Shared generate CLI setup (--mtp, --dflash, --eagle3, --lookup, engines, etc.)
Default prompt file: README.md (--prompt-file to override)

[0.1.3] — 2026-06-13

MTP speculative decoding: temperature-aware verify (Leviathan–Chen rejection sampling)
README benchmark tables (M5 Max, M1 Max, RTX 4090 comparison)
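
For readers unfamiliar with the verify step named above, here is a minimal sketch of the standard speculative-sampling acceptance rule for one drafted token, with temperature applied before the probabilities are compared. It is pure Python for clarity and is not PonyExl3's implementation; all function names are hypothetical, and temperature is assumed to be strictly positive.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities at the given temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def verify_draft_token(token, p_target, q_draft, rng):
    """Accept the drafted token with probability min(1, p/q).

    On rejection, resample from the normalized residual max(p - q, 0).
    Returns (accepted, token).
    """
    p, q = p_target[token], q_draft[token]
    if q > 0.0 and rng.random() < min(1.0, p / q):
        return True, token
    residual = [max(pt - qd, 0.0) for pt, qd in zip(p_target, q_draft)]
    total = sum(residual)
    if total <= 0.0:
        # Degenerate case (p == q everywhere): fall back to the target distribution.
        residual, total = list(p_target), sum(p_target)
    threshold = rng.random() * total
    cumulative = 0.0
    last_positive = 0
    for index, weight in enumerate(residual):
        if weight > 0.0:
            last_positive = index
        cumulative += weight
        if threshold < cumulative:
            return False, index
    return False, last_positive  # guard against floating-point round-off
```

The residual resampling is what keeps the output distribution equal to the target model's at non-zero temperature; a greedy comparison of argmax tokens would only be correct at temperature 0.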

[0.1.2] — 2026-06-13

Fix transient load-time memory and MLX buffer-cache growth on 32 GB Macs
Wired-memory cap via PONYEXL3_MEM_LIMIT_GB (92% of device recommended working set)
M1 Max benchmark numbers in README
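
One way such a cap could be resolved is sketched below. It assumes, without confirmation from the changelog, that `PONYEXL3_MEM_LIMIT_GB` acts as an explicit override in GiB and that 92% of the device's recommended working set is the fallback. The function name and parameters are hypothetical.

```python
import os


def resolve_wired_limit_bytes(recommended_working_set_bytes, env=None, fraction=0.92):
    """Return a wired-memory cap in bytes.

    Assumed behavior (not confirmed by the changelog): an explicit
    PONYEXL3_MEM_LIMIT_GB value wins; otherwise the cap is a fraction
    of the device's recommended working set.
    """
    env = os.environ if env is None else env
    override = env.get("PONYEXL3_MEM_LIMIT_GB")
    if override:
        # Interpret the override as GiB and convert to bytes.
        return int(float(override) * 1024**3)
    return int(recommended_working_set_bytes * fraction)
```

For example, `resolve_wired_limit_bytes(recommended, env={"PONYEXL3_MEM_LIMIT_GB": "24"})` returns 24 GiB in bytes regardless of `recommended`. On a real device the recommended working set would have to be read from the Metal/MLX device information, which this sketch leaves to the caller.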

[0.1.1] — 2026-06-13

First memory fix for 32 GB Macs (load peak ~27.5 GB for a 27B model at 4.15 bpw)

[0.1.0] — 2026-06-13

Initial public release: EXL3 inference on Apple Silicon via MLX
CPU ref/ golden codec + MLX Metal runtime
Model loader for Qwen3.5 / Qwen3.6 dense and MoE
Speculative decoding: MTP, DFlash, EAGLE-3, n-gram lookup (verify-gated)
CLIs: ponyexl3-generate, ponyexl3-compare-layer, ponyexl3-compare-engines
Cross-platform reference export/compare scripts (ponyexl3/reference/)
