
Where can I see examples of WMMA GEMM usage for INT1 (bit 1)? #34

Closed
AlexeyAB opened this issue Nov 9, 2018 · 4 comments


AlexeyAB commented Nov 9, 2018

  • Does the CUTLASS 1.2 library really support INT1 (1-bit) GEMM using Tensor Cores, so that we can use it for XNOR neural networks?

  • Does it perform XNOR (!(a ^ b)) operations instead of multiplication?

  • Does it perform C[j][i] = popcnt( A_i_row[x] XNOR B_j_col[x] )? (A reference sketch of this computation follows this list.)

  • Should we pack each 32 bits into a uint32_t (A along rows, B along columns), in the same manner as in cuDNN, where we should use CUDNN_DATA_INT8x32 and CUDNN_TENSOR_NCHW_VECT_C to use INT8 on Tensor Cores with CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM? https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#tensor-ops-speedup-tips

  • Where can I read more about this, and where can I see examples of Warp-Level Matrix Operations (WMMA) GEMM usage for INT1 (1 bit)?
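
For concreteness, here is a minimal host-side reference sketch of the packing and popcnt(XNOR) product asked about above. It is not CUTLASS code; the function and variable names are illustrative only.

#include <cstdint>
#include <vector>

// Popcount of the XNOR of two 32-bit packed words (32 binary values each).
// __builtin_popcount is a GCC/Clang builtin.
inline int xnor_popcount(uint32_t a, uint32_t b) {
  return __builtin_popcount(~(a ^ b));
}

// C[i][j] = sum_k popcnt( A_row_i[k] XNOR B_col_j[k] ), with A packed row-major,
// B packed column-major, 32 bits per uint32_t, and k_words = K / 32.
void binary_gemm_reference(const std::vector<uint32_t>& A,  // M x k_words
                           const std::vector<uint32_t>& B,  // N x k_words
                           std::vector<int>& C,             // M x N
                           int M, int N, int k_words) {
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      int acc = 0;
      for (int k = 0; k < k_words; ++k) {
        acc += xnor_popcount(A[i * k_words + k], B[j * k_words + k]);
      }
      C[i * N + j] = acc;
    }
  }
}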

I can see only tests for INT8 and INT4: https://github.com/NVIDIA/cutlass/blob/master/tools/test/unit/gemm/wmma_integer_gemm.cu


As written here, we can achieve 2088 TOPS for INT1 (1 bit) on a GeForce RTX 2080 Ti (TU102): http://on-demand.gputechconf.com/gtc-il/2018/pdf/sil8140-optimizing-cuda-applications-for-the-volta-turing-gpu-architecture.pdf

https://github.com/NVIDIA/cutlass#whats-new-in-cutlass-11

WMMA GEMM targeting TensorCores - INT8, INT4, 1-bit https://github.com/NVIDIA/cutlass/blob/master/tools/test/unit/gemm/wmma_integer_gemm.cu

From the last newsletter:

CUTLASS 1.2, the latest version of the CUDA template library for linear algebra subroutines, includes the following key updates:

  • Support for Turing Tensor Cores that significantly speedup matrix computations for deep learning inference
  • Tensor Core optimized WMMA GEMMs for the new INT8, INT4, and INT1 precision modes introduced in Turing
  • Support for batched strided GEMMs, parallelized GEMM-K reductions, enhanced utilities, and samples

d-k-b (Collaborator) commented Nov 9, 2018

You can see an example in the perf tests at https://github.com/NVIDIA/cutlass/blob/master/tools/test/perf/gemm/wmma_binary_gemm.cu.

d-k-b (Collaborator) commented Nov 9, 2018

The implementation is modeled here:

// Model of the 1-bit WMMA inner product: XOR the packed bits of a and b and
// accumulate the resulting population count into the integer accumulator.
int inner_product<Vector<bin1_t, 32>, Vector<bin1_t, 32>, int>(
    Vector<bin1_t, 32> a,
    Vector<bin1_t, 32> b,
    int c) {
  int accum = 0;
  for (int bit = 0; bit < 32; bit++) {
    accum += a[bit] ^ b[bit];
  }
  return accum + c;
}
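
CUDA (10.0 and later) exposes this model at warp level through the experimental 1-bit WMMA fragments and bmma_sync. Below is a minimal sketch of a single 8x8x128 binary MMA step; the leading-dimension units and pointer conventions are assumptions that should be verified against the CUDA Programming Guide and the linked perf test.

#include <mma.h>
using namespace nvcuda;

// Sketch of one warp-level 8x8x128 binary MMA (experimental sub-byte WMMA API).
// Launch with a single warp (32 threads).
// A: 8 x 128 bits, row-major, packed 32 bits per unsigned word (pointer type assumed).
// B: 128 x 8 bits, column-major, packed the same way.
// C: 8 x 8 int accumulators, row-major.
__global__ void bmma_8x8x128(const unsigned* A, const unsigned* B, int* C) {
  wmma::fragment<wmma::matrix_a, 8, 8, 128,
                 wmma::experimental::precision::b1, wmma::row_major> a_frag;
  wmma::fragment<wmma::matrix_b, 8, 8, 128,
                 wmma::experimental::precision::b1, wmma::col_major> b_frag;
  wmma::fragment<wmma::accumulator, 8, 8, 128, int> c_frag;

  wmma::fill_fragment(c_frag, 0);
  wmma::load_matrix_sync(a_frag, A, 128);  // ldm assumed to be in elements (bits)
  wmma::load_matrix_sync(b_frag, B, 128);
  // Performs the XOR + popcount accumulation modeled by inner_product above.
  wmma::bmma_sync(c_frag, a_frag, b_frag, c_frag);
  wmma::store_matrix_sync(C, c_frag, 8, wmma::mem_row_major);
}

Since the accumulation is popcnt(a XOR b), an XNOR count over K bits can be recovered afterwards as K - xor_count (here K = 128 per bmma step), which matters when porting XNOR-net layers.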

AlexeyAB (Author) commented

If anyone is interested, I implemented a neural network for object detection, an XNOR-Yolo model (1-bit precision), on the Darknet framework with Tensor Cores: AlexeyAB/darknet#2365 (comment)

| Model | RTX 2070 CUDNN_HALF=0, ms | RTX 2070 CUDNN_HALF=1, ms | Speedup, X times |
| --- | --- | --- | --- |
| yolov3-spp.cfg 608x608, Float-32/16 bit precision | 40.9 | 27.2 (Tensor Cores for floats) | 1.5x |
| yolov3-spp_xnor_obj.cfg.txt 608x608, CC7.5 (Tensor Cores for XNOR), Bit-1 precision | 13.5 | 13.2 | 1.0x |
| Speedup, X times | 3.0x | 2.0x | - |

XNOR-net training process:
[training chart image: chart_yolov3-spp_xnor_obj]

AlexeyAB (Author) commented

@d-k-b Hi,

Are there any approximate dates for when a device-wide bin1_t GEMM function that uses Tensor Cores will appear in CUTLASS?
