StudyingShao

😅

NVJiangShao StudyingShao

5 followers · 7 following

NVIDIA

Popular repositories

TensorRT-LLM (public)

Forked from NVIDIA/TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ · 3

grouped_gemm (public)

Forked from fanshiqing/grouped_gemm

PyTorch bindings for CUTLASS grouped GEMM.

C++

marlin (public)

Forked from IST-DASLab/marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python

TransformerEngine (public)

Forked from NVIDIA/TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python

QuaRot (public)

Forked from spcl/QuaRot

Code for QuaRot, an end-to-end 4-bit inference of large language models.

Python

AutoGPTQ (public)

Forked from AutoGPTQ/AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Python