Stars
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Official inference repo for FLUX.1 models
Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥
Official implementation of "Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance" (NeurIPS 2024)
Code release for RICA^2: Rubric-Informed, Calibrated Assessment of Actions (ECCV 2024)
Demo code for "Revolutionizing Collection Service: How AI-Driven Personalization is Transforming Payment Recovery"
Code release for "Deep Learning to Quantify Care Manipulation Activities in Neonatal Intensive Care Units"
Official Implementation for "ReNoise: Real Image Inversion Through Iterative Noising"
Official PyTorch Implementation for "Splicing ViT Features for Semantic Appearance Transfer" presenting "Splice" (CVPR 2022 Oral)
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
High-speed Large Language Model Serving for Local Deployment
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
The Triton TensorRT-LLM Backend
Structured state space sequence models
Mamba-Chat: A chat LLM based on the state-space model architecture 🐍
✨✨Latest Advances on Multimodal Large Language Models
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Official implementation of CVPR 2024 paper: "FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition"
Make huge neural nets fit in memory
Official implementation for "Break-A-Scene: Extracting Multiple Concepts from a Single Image" [SIGGRAPH Asia 2023]
HF-NeuS: Improved Surface Reconstruction Using High-Frequency Details (NeurIPS 2022)