yzhaiustc

Yujia Zhai yzhaiustc

161 followers · 15 following

@NVIDIA
Santa Clara, California
https://yzhaiustc.github.io/

Stars

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust 3,384 233 Updated Mar 28, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,112 536 Updated Mar 28, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 7,332 685 Updated Mar 28, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 23,495 2,141 Updated Mar 30, 2025

deepseek-ai / DeepSeek-R1

87,771 11,336 Updated Feb 24, 2025

SiriusNEO / Triton-Puzzles-Lite

Puzzles for learning Triton, play it with minimal environment configuration!

Python 267 25 Updated Dec 3, 2024

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 15,022 1,890 Updated Mar 30, 2025

ChenLiu-1996 / CitationMap

A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.

Python 516 43 Updated Feb 25, 2025

mit-han-lab / omniserve

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ 620 42 Updated Mar 6, 2025

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 815 53 Updated Mar 19, 2025

tinygrad / tinygrad

You like pytorch? You like micrograd? You love tinygrad! ❤️

Python 28,475 3,266 Updated Mar 30, 2025

xai-org / grok-1

Grok open release

Python 50,250 8,360 Updated Aug 30, 2024

volcengine / veScale

A PyTorch Native LLM Training Framework

Python 763 42 Updated Dec 27, 2024

IST-DASLab / marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 783 63 Updated Sep 4, 2024

google / heir

A compiler for homomorphic encryption

C++ 413 72 Updated Mar 30, 2025

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 10,041 1,272 Updated Mar 29, 2025

iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ 3,075 679 Updated Mar 30, 2025

AlibabaResearch / flash-llm

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Cuda 203 16 Updated Sep 24, 2023

tlc-pack / libflash_attn

Standalone Flash Attention v2 kernel without libtorch dependency

C++ 108 15 Updated Sep 10, 2024

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 16,615 1,574 Updated Mar 29, 2025

intel / xetla

C++ 61 20 Updated Dec 18, 2024

raulbehl / 100DaysOfRTL

100 Days of RTL

SystemVerilog 357 105 Updated Aug 15, 2024

vosen / ZLUDA

CUDA on non-NVIDIA GPUs

Rust 11,053 708 Updated Mar 17, 2025

hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible

Python 40,695 4,488 Updated Mar 28, 2025

eth-cscs / spla

Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceleration.

C++ 28 7 Updated Jun 26, 2024

icl-utk-edu / slate

SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Ene…

C++ 110 23 Updated Jan 11, 2025

syclsparklers / XeHE

PostScript 3 Updated Apr 5, 2023

syclsparklers / directory

1 2 Updated Apr 3, 2023

twitter / the-algorithm

Source code for Twitter's Recommendation Algorithm

Scala 63,058 12,167 Updated Jul 10, 2024

NVIDIA / cuda-quantum

C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows

C++ 657 225 Updated Mar 28, 2025
