gemm
Here are 64 public repositories matching this topic...
Some common CUDA kernel implementations (not the fastest).
-
Updated
Jun 13, 2024 - Cuda
Course Programming on new Architecture-1 (GPU), autumn 2021
-
Updated
Dec 5, 2021 - C++
Fast SGEMM emulation on Tensor Cores
-
Updated
Nov 20, 2023 - Cuda
A lightweight matrix computation library aimed at MCUs or embedded systems
-
Updated
Feb 24, 2022 - C
Development of deep learning inference code using OpenCL kernel functions.
-
Updated
Jun 1, 2022 - C++
Low Precision Arithmetic for Convolutional Neural Network Inference
-
Updated
Oct 29, 2017 - C++
My attempt at making a GEMM kernel...
-
Updated
Jun 16, 2023 - Cuda
XNOR-Net with binary conv2d kernels using an XNOR GEMM op; supports both CPU and GPU.
-
Updated
Oct 25, 2022 - C
Manually optimizing the GEMM (GEneral Matrix Multiply) operation. There is still a long way to go.
-
Updated
Aug 22, 2021 - C++
-
Updated
Feb 4, 2018 - C++
A Flexible and Energy Efficient Accelerator For Sparse Convolution Neural Network
-
Updated
Jun 7, 2024 - Verilog