Add tensor core abstractions #1346
A quick Google search revealed this to me: https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/ So it looks like programmatic access to tensor cores is provided via special API calls, and they can essentially only do FMA on small matrices in single and half precision. That sounds very limited to me, but hey, that's what special-purpose hardware is all about! I see potential use for linear algebra. 4x4 matrices are also heavily used in 3D graphics and computational geometry. Still, although these fields were the prime target of GPUs, the need for tensor cores only appeared much later with deep learning.

I have not found the corresponding APIs in HIP, nor in OpenCL or SYCL, so I don't know how AMD exposes them. For CPU targets I guess you would have to model these small matrix FMAs with plain floats. There is also a new BF16 float type, but that is very new: https://stackoverflow.com/a/49997863/2406044

I think access to tensor cores and reduced-precision FP operations is too vendor-specific for the moment to design a meaningful API. But please prove me wrong! :)
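For context, here is a minimal sketch of what that programmatic access looks like on the CUDA side, using the `nvcuda::wmma` warp matrix functions from `<mma.h>` (the API described in the blog post linked above). The kernel name, the 16x16x16 tile shape, and the layout choices are only illustrative; the API exposes warp-level tiles rather than the individual 4x4 hardware operations, and a real kernel would loop over tiles of a larger matrix.

```cuda
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// Illustrative sketch: one warp computes a single 16x16 tile
// C = A * B with half-precision inputs and float accumulation.
__global__ void wmma_tile_example(const half* a, const half* b, float* c)
{
    // Fragments for one 16x16x16 tile.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    // Initialize the accumulator to zero.
    wmma::fill_fragment(c_frag, 0.0f);

    // Load the input tiles from global memory (leading dimension 16).
    wmma::load_matrix_sync(a_frag, a, 16);
    wmma::load_matrix_sync(b_frag, b, 16);

    // The tensor-core fused multiply-accumulate.
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);

    // Write the result tile back to global memory.
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}
```

The kernel has to be launched with at least one full warp (e.g. `<<<1, 32>>>`) and compiled for `sm_70` or newer, which illustrates the point about vendor specificity: an alpaka abstraction would have to map this warp-collective model onto very different backends.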
AMD calls them
Isn't it also documented in the CUDA Programming Guide under 7.24 Warp Matrix Functions?
In the meeting on 25 May 2021 we discussed having an alpaka abstraction for the various tensor core APIs found in recent versions of CUDA and ROCm. Opening this issue for broader discussion (and to avoid forgetting this wish).