turboquant-vllm v1.3.0 — Seven validated model families #58
Alberto-Codes
announced in
Announcements
v1.3.0 expands model support from Molmo2-only to seven validated families. Fused paged kernels (v1.2.0) provide the performance foundation; v1.3.0 adds the kernel-level changes for architectural diversity.
What's new since v1.1.0
This release covers v1.2.0 through v1.3.0 — fused kernels, model expansion, and two production hotfixes.
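With more model families in scope, it can help to script the release's verification module (python -m turboquant_vllm.verify) across several models. The following is a minimal sketch, not part of the package: the helper names build_verify_cmd and verify_models are hypothetical, only the --model and --bits flags come from this announcement, and treating exit code 0 as "passed" is an assumption about the CLI.

```python
import subprocess
import sys


def build_verify_cmd(model: str, bits: int = 4) -> list[str]:
    """Build the argv for the verify module.

    Only --model and --bits are taken from the announcement;
    no other flags are assumed to exist.
    """
    return [
        sys.executable, "-m", "turboquant_vllm.verify",
        "--model", model,
        "--bits", str(bits),
    ]


def verify_models(models, bits: int = 4, runner=subprocess.run) -> dict[str, bool]:
    """Run the verify module once per model and collect pass/fail.

    ASSUMPTION: a zero exit code means the check passed.
    `runner` is injectable so the loop can be exercised without a GPU.
    """
    results = {}
    for model in models:
        proc = runner(build_verify_cmd(model, bits))
        results[model] = proc.returncode == 0
    return results


# Example usage (model names are placeholders, not validated families):
# print(verify_models(["<model-a>", "<model-b>"], bits=4))
```

Passing a stub as runner lets the loop be tested in CI without loading any model.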
Running python -m turboquant_vllm.verify --model <name> --bits 4 checks any model in ~30 seconds.
Benchmarks
Install / Upgrade
pip install "turboquant-vllm[vllm]>=1.3.0"
What's Next
Full changelog: v1.2.0 | v1.2.1 | v1.2.2 | v1.3.0
Blog post: From one model to seven — making TurboQuant model-portable
Docs: alberto-codes.github.io/turboquant-vllm