turboquant-vllm 1.0.0 — First stable release #4
Alberto-Codes announced in Announcements
First open-source TurboQuant implementation as a vLLM plugin. Paper to PyPI in 72 hours.
Google published TurboQuant at ICLR 2026 on March 24. By March 27, turboquant-vllm was serving compressed video inference from a stock vLLM container.
What shipped
vllm.general_plugins, serves through the OpenAI-compatible API
Benchmarks
Molmo2-4B on RTX 4090 — 11K visual tokens + 256 generation tokens:
This is the first TurboQuant implementation validated on vision-language models with video input.
Install
What's next
If you're working on KV cache compression or VLM inference optimization, I'd love to hear from you — especially if you've hit precision issues at long sequences.
Full changelog: v1.0.0
Deep-dive blog post: Paper to PyPI in 72 hours