Popular repositories
-
TensorRT-LLM (Public, forked from NVIDIA/TensorRT-LLM)
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
C++ 3
-
grouped_gemm (Public, forked from fanshiqing/grouped_gemm)
PyTorch bindings for CUTLASS grouped GEMM.
C++
-
marlin (Public, forked from IST-DASLab/marlin)
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Python
-
TransformerEngine (Public, forked from NVIDIA/TransformerEngine)
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
Python
-
QuaRot (Public, forked from spcl/QuaRot)
Code for QuaRot, an end-to-end 4-bit inference scheme for large language models.
Python
-
AutoGPTQ (Public, forked from AutoGPTQ/AutoGPTQ)
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Python