
Starred repositories
Vector (and Scalar) Quantization, in Pytorch
[SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference
Awesome LLM compression research papers and tools.
[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache (a sketch of the underlying asymmetric quantization idea follows the list)
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (a sketch of activation-aware scaling also follows the list)
A fast inference library for running LLMs locally on modern consumer-class GPUs
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
A Python package that extends official PyTorch to unlock additional performance on Intel platforms
Mixed-precision inference with TensorRT-LLM
Machine Learning Engineering Open Book
PyTorch domain library for recommendation systems
NVIDIA curated collection of educational resources related to general purpose GPU programming.
A Datacenter Scale Distributed Inference Serving Framework
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version (forked from deepseek-ai/DeepGEMM)
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Puzzles for learning Triton; play them with minimal environment configuration!
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
My learning notes and code for ML systems (MLSys).
A highly optimized LLM inference acceleration engine for Llama and its variants.
MTEB: Massive Text Embedding Benchmark
How to optimize some algorithms in CUDA.
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
FlashInfer: Kernel Library for LLM Serving
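
To make the KIVI entry above concrete, here is a minimal PyTorch sketch of asymmetric (zero-point) uniform quantization, the general technique that 2-bit KV-cache quantization builds on. The function names, the 2-bit default, and the per-dimension min/max grouping are illustrative assumptions, not the repo's actual API; KIVI itself additionally quantizes keys per-channel and values per-token and ships fused kernels.

import torch

def asymmetric_quantize(x: torch.Tensor, n_bits: int = 2, dim: int = -1):
    # Asymmetric uniform quantization along one dimension (illustrative
    # sketch, not KIVI's API): map [min, max] onto {0, ..., 2^n_bits - 1}.
    qmax = 2 ** n_bits - 1
    x_min = x.amin(dim=dim, keepdim=True)
    x_max = x.amax(dim=dim, keepdim=True)
    scale = (x_max - x_min).clamp(min=1e-8) / qmax
    zero_point = x_min
    q = ((x - zero_point) / scale).round().clamp(0, qmax)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Reconstruct an approximation of the original tensor.
    return q * scale + zero_point

# Usage: quantize a toy KV tensor to 2 bits and check the error.
kv = torch.randn(4, 8, 16)
q, scale, zp = asymmetric_quantize(kv, n_bits=2, dim=-1)
kv_hat = dequantize(q, scale, zp)
print((kv - kv_hat).abs().mean())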
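
Similarly, for the AWQ entries, a toy sketch of activation-aware scaling under stated assumptions: scale up the weight columns of input channels with large average activation magnitude before group-wise 4-bit quantization, and fold the inverse scale into the preceding op. The function name, alpha, and group_size here are hypothetical; the actual repos use a calibrated scale search and fused INT4 kernels.

import torch

def awq_style_quantize(w, act_mean_abs, alpha=0.5, n_bits=4, group_size=64):
    # w: [out_features, in_features]; act_mean_abs: [in_features] mean |x|
    # per input channel, assumed to come from a calibration pass.
    # Assumes in_features is divisible by group_size.
    s = act_mean_abs.clamp(min=1e-8).pow(alpha)
    s = s / s.mean()                      # per-channel scale, mean ~ 1
    w_scaled = w * s                      # protect salient input channels

    # Symmetric group-wise quantization of the scaled weights.
    qmax = 2 ** (n_bits - 1) - 1
    out_f, in_f = w_scaled.shape
    wg = w_scaled.reshape(out_f, in_f // group_size, group_size)
    step = wg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    q = (wg / step).round().clamp(-qmax - 1, qmax)
    w_deq = (q * step).reshape(out_f, in_f)
    return w_deq, s                       # fold 1/s into the prior layer

# Usage: the effective forward pass becomes (x / s) @ w_deq.T ~ x @ w.T,
# since the column scaling and the input rescaling cancel exactly.
w = torch.randn(128, 256)
act = torch.rand(256) + 0.1
w_deq, s = awq_style_quantize(w, act)
x = torch.randn(4, 256)
print(((x / s) @ w_deq.T - x @ w.T).abs().mean())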