INT8 version of GEMM? #202
I haven't done any research on INT8 yet, so I don't know of any other GEMM implementations with INT8. Nevertheless, I think INT8 is an interesting topic for CLBlast. Having tackled FP16 already, I'd be willing to spend time on implementing such a feature, but I don't think it's easy: on both the host and the device side, many things will have to change going from floating-point to fixed-point. Also, what kind of hardware would you run this on? Hardware with native INT8 support? Does ARM Mali support this (given that it's in ARM's compute library)? Or do they pack 4 values together in a 32-bit integer? I'll have to read up on this topic a bit more in order to give you a proper answer.
Thanks for the response.
"Or do they pack 4 values together in a 32-bit integer?" - In the TensorFlow documentation, they highlight the range for mapping float to unsigned char based on experimentation. If my understanding is correct, INT8 is not a special datatype; rather, it's just an unsigned char value.
"Also, what kind of hardware would you run this on? Hardware with native INT8 support?"
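For reference, here is a minimal sketch of the affine float-to-uint8 mapping that the TensorFlow quantization docs describe, where the [min, max] range is found experimentally per layer. All names here are illustrative, not an existing API:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Minimal sketch of the float -> uint8 affine mapping from the TensorFlow
// quantization docs: a real value r is represented as
// q = round(r / scale) + zero_point. Assumes max > min.
struct QuantParams {
  float scale;        // (max - min) / 255
  int32_t zero_point; // uint8 value that represents real 0.0
};

QuantParams ChooseQuantParams(float min, float max) {
  // The range must contain 0 so that real 0.0 maps exactly to an integer.
  min = std::min(min, 0.0f);
  max = std::max(max, 0.0f);
  QuantParams p;
  p.scale = (max - min) / 255.0f;
  p.zero_point = static_cast<int32_t>(std::round(-min / p.scale));
  return p;
}

uint8_t QuantizeValue(float r, const QuantParams& p) {
  int32_t q = static_cast<int32_t>(std::round(r / p.scale)) + p.zero_point;
  return static_cast<uint8_t>(std::min(255, std::max(0, q)));
}

float DequantizeValue(uint8_t q, const QuantParams& p) {
  return p.scale * (static_cast<int32_t>(q) - p.zero_point);
}
```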
@SAT8 Edit: I have to mention you won't have the greatest time performance-wise.
Do you have any update on this issue now, or a roadmap? Thanks.
No, not really. Not sure if I will ever work on this; other things have priority. But contributors are free to work on this of course. What hardware would you run it on? What use-case do you have?
Hi, I worked on one kind of miner algorithm; it needs batches of 256 by 256 int8-to-int16 matrix multiplications. For NVIDIA CUDA this is already done, but for AMD OpenCL there doesn't seem to be a solution yet.
Well, you could try naibaf7's implementation as mentioned above. But as he says, there is not much support for INT8 multiplications in hardware, so you probably won't gain much (or will actually lose) compared to FP32.
@CNugteren Thanks for your info, much appreciated.
INT8 GEMM is usually done as s8s8s32, i.e. signed 8-bit inputs with 32-bit signed integer accumulation and output.
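To make that concrete, here is a minimal unoptimized scalar sketch of such an s8s8s32 GEMM. It is purely illustrative and not a routine that CLBlast provides:

```cpp
#include <cstdint>
#include <vector>

// Minimal scalar sketch of an s8s8s32 GEMM: C = A * B with int8 inputs and
// int32 accumulation/output. Row-major, no transposes, no scaling factors.
void GemmS8S8S32(int m, int n, int k,
                 const std::vector<int8_t>& a,   // m x k
                 const std::vector<int8_t>& b,   // k x n
                 std::vector<int32_t>& c) {      // m x n
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      int32_t acc = 0;
      for (int p = 0; p < k; ++p) {
        // Widen to int32 explicitly; the product of two int8 values
        // needs more than 8 bits.
        acc += static_cast<int32_t>(a[i * k + p]) *
               static_cast<int32_t>(b[p * n + j]);
      }
      c[i * n + j] = acc;
    }
  }
}
```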
Hi
I am looking for an INT8 version of GEMM in OpenCL. If I am correct, CLBlast does not yet support it. Please correct me if I am wrong and comment on the usage (perhaps a sample app, etc.).
Supposing an INT8 variant is not yet present in CLBlast, have you come across any other work that you may recommend? I did run into this repo https://github.com/strin/gemm-android and ARM's compute library https://github.com/ARM-software/ComputeLibrary/blob/master/src/core/CL/cl_kernels/gemm.cl
My goal is to extend my project https://github.com/sat8/YoloOCLInference to support INT8 models during inference. I have gathered a few initial details on how to go about quantization from TensorFlow (https://www.tensorflow.org/performance/quantization) and would like to implement it in my project, but I am in need of an INT8 version of GEMM. TensorFlow refers to https://github.com/google/gemmlowp, which is a CPU- and NEON-optimized GEMM, i.e. a CPU-only library.
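For concreteness, my understanding of the integer arithmetic a gemmlowp-style quantized GEMM performs is roughly the following: uint8 inputs with per-matrix zero-point offsets and int32 accumulation. This is only an illustrative sketch with made-up names, not gemmlowp's actual API:

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch (not gemmlowp's API) of a quantized u8u8s32 GEMM:
// each uint8 value q represents a real value r = scale * (q - zero_point),
// so the zero points are subtracted before multiplying.
void QuantizedGemmU8U8S32(int m, int n, int k,
                          const std::vector<uint8_t>& a, int32_t a_zero_point,
                          const std::vector<uint8_t>& b, int32_t b_zero_point,
                          std::vector<int32_t>& c) {
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      int32_t acc = 0;
      for (int p = 0; p < k; ++p) {
        acc += (static_cast<int32_t>(a[i * k + p]) - a_zero_point) *
               (static_cast<int32_t>(b[p * n + j]) - b_zero_point);
      }
      // The caller still rescales acc by the input scales and requantizes
      // to uint8 for the next layer; that step is omitted here.
      c[i * n + j] = acc;
    }
  }
}
```

An OpenCL INT8 GEMM for inference would need to produce the same result, however the kernels are organized.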
Any thoughts or comments would be appreciated.