💻 AI Infra Engineer @JD.com
⚡ LLM Inference: Speculative Decoding, KV Cache Compression, Quantization & Distillation
🎵 Prev: AI Music Training & Inference
@JD.com · Beijing
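Speculative decoding, listed in the bio above, accelerates LLM inference by letting a cheap draft model propose several tokens that the expensive target model then verifies in one pass. A minimal greedy sketch follows; the two toy "models" and all function names are hypothetical stand-ins, not any real system's API:

```python
def draft_model(prefix):
    # Hypothetical toy draft model: cheaply proposes the next token.
    return (prefix[-1] + 1) % 10 if prefix else 0

def target_model(prefix):
    # Hypothetical toy target model: agrees with the draft except
    # every 4th step, to exercise the rejection path.
    nxt = (prefix[-1] + 1) % 10 if prefix else 0
    return nxt if len(prefix) % 4 != 0 else (nxt + 5) % 10

def speculative_step(prefix, k=4):
    """One round of greedy speculative decoding: draft k tokens, then
    verify them against the target model, keeping the longest agreeing
    prefix plus one corrected token from the target."""
    draft = list(prefix)
    proposed = []
    for _ in range(k):
        t = draft_model(draft)
        proposed.append(t)
        draft.append(t)
    # Verification: in a real system the k target evaluations run as
    # a single batched forward pass, which is where the speedup comes from.
    accepted = list(prefix)
    for t in proposed:
        target_t = target_model(accepted)
        if target_t == t:
            accepted.append(t)
        else:
            accepted.append(target_t)  # replace the first mismatch
            break
    return accepted

seq = [0]
for _ in range(3):
    seq = speculative_step(seq)
print(seq)
```

Each round here accepts three drafted tokens plus one corrected token, so the sequence grows by four tokens per target-model pass instead of one. Production implementations use probabilistic acceptance over the two models' distributions rather than this exact-match rule.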
Popular repositories
- smoothquant (Python, forked from mit-han-lab/smoothquant)
  SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
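The core SmoothQuant trick is to migrate quantization difficulty from activations (which have outlier channels) to weights, by scaling each input channel: X' = X / s, W' = s * W, so the product X @ W is unchanged. A minimal sketch of that smoothing step, with a toy 1x4 activation, a 4x2 weight, and the common alpha = 0.5 (all values and names here are illustrative, not the repo's actual code):

```python
def smooth(X, W, alpha=0.5):
    """Per-channel smoothing: divide activations by s, multiply weights
    by s, leaving X @ W mathematically identical but with flatter
    activation magnitudes that quantize more accurately."""
    n_in = len(W)
    # Per-channel max magnitudes, as from a calibration batch.
    act_max = [max(abs(row[c]) for row in X) for c in range(n_in)]
    w_max = [max(abs(v) for v in W[c]) for c in range(n_in)]
    s = [(a ** alpha) / (w ** (1 - alpha)) for a, w in zip(act_max, w_max)]
    X_s = [[row[c] / s[c] for c in range(n_in)] for row in X]
    W_s = [[s[c] * v for v in W[c]] for c in range(n_in)]
    return X_s, W_s, s

def matmul(X, W):
    return [[sum(row[c] * W[c][j] for c in range(len(W)))
             for j in range(len(W[0]))] for row in X]

X = [[20.0, 0.1, 1.0, 0.5]]  # channel 0 is an activation outlier
W = [[0.5, -0.2], [1.0, 0.3], [0.2, 0.4], [-0.1, 0.6]]
X_s, W_s, s = smooth(X, W)
orig = matmul(X, W)          # product before smoothing
smoothed = matmul(X_s, W_s)  # identical product after smoothing
```

After smoothing, the outlier channel shrinks (here from 20.0 to about 3.16) while the output is unchanged, so both X' and W' can then be quantized to INT8 with far less clipping error.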
- FasterTransformer (C++, forked from void-main/FasterTransformer)
  Transformer-related optimization, including BERT and GPT
- sglang (Python, forked from jessiewei7/sglang)
  SGLang is a high-performance serving framework for large language models and multimodal models.