Bruce-Lee-LY

LLM Infer, AI Infra, CUDA

98 followers · 0 following

Pinned

cuda_hgemm (Public)

Several optimization methods for half-precision general matrix multiplication (HGEMM) on Tensor Cores, using the WMMA API and MMA PTX instructions.

Cuda · 203 stars · 45 forks

cuda_hook (Public)

Hooks CUDA-related dynamic libraries using automated code generation tools.

C · 100 stars · 28 forks

cuda_hgemv (Public)

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

Cuda · 20 stars · 4 forks

matrix_multiply (Public)

Several common matrix multiplication methods implemented on CPU and NVIDIA GPU using C++11 and CUDA.

C++ · 12 stars · 2 forks

cuda_back2back_hgemm (Public)

Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.

Cuda · 10 stars · 2 forks

memory_pool (Public)

A simple and efficient memory pool implemented in C++11.

C++ · 4 stars · 4 forks