Where can I see examples of WMMA GEMM usage for INT1 (bit 1)? #34
You can see an example in the perf tests at https://github.com/NVIDIA/cutlass/blob/master/tools/test/perf/gemm/wmma_binary_gemm.cu.
The implementation is modeled in `cutlass/tools/util/reference/detail/inner_product.h` (lines 51 to 61 at commit ed2ed4d).
If anyone is interested, I implemented a neural network for object detection, an XNOR-Yolo model (1-bit precision), on the Darknet framework with Tensor Cores: AlexeyAB/darknet#2365 (comment)
@d-k-b Hi, is there an approximate date for when a device-wide bin1_t GEMM function that uses Tensor Cores will appear in CUTLASS?
Does the CUTLASS 1.2 library really support INT1 (1-bit) GEMM using Tensor Cores, so that we can use it for XNOR neural networks?

1. Does it perform XNOR `!(a^b)` operations instead of multiply? Does it compute `C[j][i] = popcnt( A_i_row[x] XNOR B_j_col[x] )`?
2. Should we pack each 32 bits into a `uint32_t` (A along rows, B along columns), in the same manner as in cuDNN, where we use `CUDNN_DATA_INT8x32` and `CUDNN_TENSOR_NCHW_VECT_C` to run INT8 on Tensor Cores with `CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM`? https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#tensor-ops-speedup-tips
3. Where can I read more about this, and where can I see examples of Warp-Level Matrix Operations (WMMA) GEMM usage for INT1 (1 bit)?
I can see only tests for INT8 and INT4: https://github.com/NVIDIA/cutlass/blob/master/tools/test/unit/gemm/wmma_integer_gemm.cu
As written here, we can achieve 2088 TOPS for INT1 (1 bit) on a GeForce RTX 2080 Ti (TU102): http://on-demand.gputechconf.com/gtc-il/2018/pdf/sil8140-optimizing-cuda-applications-for-the-volta-turing-gpu-architecture.pdf
From the last newsletter: https://github.com/NVIDIA/cutlass#whats-new-in-cutlass-11