Organizations

@vllm-project


A Datacenter Scale Distributed Inference Serving Framework

Rust · 3,325 stars · 224 forks · Updated Mar 28, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python · 1,754 stars · 114 forks · Updated Mar 27, 2025

FlashMLA: Efficient MLA decoding kernels

C++ · 11,383 stars · 811 forks · Updated Mar 1, 2025

How to optimize some algorithms in CUDA.

Cuda · 2,047 stars · 183 forks · Updated Mar 26, 2025

Entropy Based Sampling and Parallel CoT Decoding

Python · 3,343 stars · 319 forks · Updated Nov 13, 2024

Documentation for content creation

HTML · 188 stars · 19 forks · Updated Feb 13, 2025

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI

Python · 987 stars · 42 forks · Updated Mar 18, 2025

CUDA Templates for Linear Algebra Subroutines

C++ · 7,186 stars · 1,182 forks · Updated Mar 21, 2025

A fast multimodal LLM for real-time voice

Python · 3,771 stars · 275 forks · Updated Feb 14, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 42,926 stars · 6,515 forks · Updated Mar 28, 2025

Manages vllm-nccl dependency

Python · 17 stars · 3 forks · Updated Jun 3, 2024

CUDA/Metal accelerated language model inference

C · 532 stars · 23 forks · Updated Mar 9, 2025