Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Updated Mar 27, 2025 · Python
Based on the SparkTTS model, provides high-quality Chinese speech synthesis and voice cloning.
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
AI-based search engine done right
A guide to structured generation using constrained decoding
DeepSeek-V3, R1 671B on 8xH100 Throughput Benchmarks
SGLang vs. vLLM comparison
Examples of serving LLM on Modal.
llmd is an LLM daemonset: it provides model management and gets large language models up and running, using llama.cpp, vLLM, or SGLang as the serving backend.