
nod-ai/sglang

 
 



| Blog | Documentation | Join Slack | Join Bi-Weekly Development Meeting | Slides |

News

  • [2025/01] 🔥 SGLang provides day-one support for DeepSeek V3/R1 models on NVIDIA and AMD GPUs with DeepSeek-specific optimizations. (instructions, AMD blog)
  • [2024/12] 🔥 v0.4 Release: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs (blog).
  • [2024/09] v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision (blog).
  • [2024/07] v0.2 Release: Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) (blog).
More
  • [2024/10] The First SGLang Online Meetup (slides).
  • [2024/02] SGLang enables 3x faster JSON decoding with compressed finite state machine (blog).
  • [2024/01] SGLang provides up to 5x faster inference with RadixAttention (blog).
  • [2024/01] SGLang powers the serving of the official LLaVA v1.6 release demo (usage).

About

SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language. The core features include:

  • Fast Backend Runtime: Provides efficient serving with RadixAttention for prefix caching, a zero-overhead CPU scheduler, continuous batching, token attention (paged attention), speculative decoding, tensor parallelism, chunked prefill, structured outputs, quantization (FP8/INT4/AWQ/GPTQ), and multi-LoRA batching.
  • Flexible Frontend Language: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
  • Extensive Model Support: Supports a wide range of generative models (Llama, Gemma, Mistral, Qwen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse), and reward models (Skywork), and is easy to extend with new models.
  • Active Community: SGLang is open-source and backed by an active community with industry adoption.
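
To give a rough sense of what prefix caching buys, here is a toy sketch in plain Python. It is not SGLang's RadixAttention implementation: the `ToyPrefixCache` class, its methods, and the token IDs are all hypothetical, and a real radix tree would compress runs of tokens into single edges and store KV-cache references rather than bare nodes. The sketch only shows the idea that a new request can skip the part of its prompt that an earlier request already processed.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Children keyed by token ID (one token per edge, for simplicity).
    children: dict = field(default_factory=dict)


class ToyPrefixCache:
    """Hypothetical illustration only; not SGLang's actual data structure."""

    def __init__(self):
        self.root = TrieNode()

    def match_prefix(self, tokens):
        """Return how many leading tokens of `tokens` are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            nxt = node.children.get(tok)
            if nxt is None:
                break
            node, matched = nxt, matched + 1
        return matched

    def insert(self, tokens):
        """Record a token sequence so later requests can reuse its prefix."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())


cache = ToyPrefixCache()
system_prompt = [101, 7, 7, 42]         # made-up token IDs for a shared system prompt
cache.insert(system_prompt + [5, 6])    # first request: system prompt plus question A
request_b = system_prompt + [8, 9, 10]  # second request shares only the system prompt
reused = cache.match_prefix(request_b)
print(f"reused {reused} of {len(request_b)} tokens")
```

In this toy setup the second request would only need to compute the tokens after the shared system prompt; how SGLang actually manages eviction, memory, and scheduling around the cache is described in the release blogs.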

Getting Started

Benchmark and Performance

Learn more in the release blogs: v0.2 blog, v0.3 blog, v0.4 blog

Roadmap

Development Roadmap (2025 H1)

Adoption and Sponsorship

The project is supported by (alphabetically): AMD, Baseten, Cursor, DataCrunch, Etched, Hyperbolic, Jam & Tea Studios, LinkedIn, LMSYS.org, Meituan, Novita AI, NVIDIA, RunPod, Stanford, UC Berkeley, UCLA, xAI, 01.AI.

Acknowledgment and Citation

We learned from the design of, and reused code from, the following projects: Guidance, vLLM, LightLLM, FlashInfer, Outlines, and LMQL. If you find the project useful, please cite the paper, SGLang: Efficient Execution of Structured Language Model Programs.

Languages

  • Python 85.5%
  • Cuda 6.5%
  • C++ 5.9%
  • Rust 1.7%
  • Shell 0.2%
  • HIP 0.1%
  • Other 0.1%