SGLang is a fast serving framework for large language models and vision language models.
Updated Aug 20, 2025 - Python
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
RTX 5090 & RTX 5060 Docker container with PyTorch + TensorFlow. First fully-tested Blackwell GPU support for ML/AI. CUDA 12.8, Python 3.11, Ubuntu 24.04. Works with RTX 50-series (5090/5080/5070/5060) and RTX 40-series.
Prebuilt DeepSpeed wheels for Windows with NVIDIA GPU support. Supports GTX 10 through RTX 50 series. Compiled with PyTorch 2.7 and 2.8 and CUDA 12.8.
Repository for Campbells-Luggs-Blackwells family history web site