Register BFloat16 #1092
This will probably require some logic like what we have for CUBLAS (CUDA.jl/lib/cublas/wrappers.jl, lines 807 to 877 in 92622ed).
For non-mutating APIs, we may want to extend this (both for matrix multiplication and for the DNN APIs you want to wrap) so that it also figures out an appropriate output type (e.g. depending on the CUDA math mode). Doing all this ad hoc for every mixed-mode API seems bad, though, so we probably need a more systematic solution.
Right. I was thinking of following what NVIDIA suggests for accumulation etc., since those are likely the best-tested versions of these kernels. I'm still a bit unsure how to choose the math mode. I'm assuming there's a corresponding math mode for BFloat16 as there is for Float32 and Float64?
I'm not sure what you mean. We have a CUDA.jl math mode (lines 18 to 30 in 92622ed).
When performing API calls, we either convert that math mode to the library-specific one (for old APIs), or use it to determine which compute type to request (for new APIs, which express the math mode in terms of the compute type you want the API to use).
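As a rough illustration of the new-style approach described above, here is a minimal, dependency-free sketch of deriving a compute type from a global math mode and the input element type. Everything in it is an assumption made for illustration: the `MathMode` enum, the `BF16` placeholder (standing in for `BFloat16s.BFloat16`), the `compute_type` function, and the mapping table itself are hypothetical and are not taken from CUDA.jl. Only the returned names correspond to real cuBLAS compute-type enumerators, and they are returned as plain `Symbol`s.

```julia
# Hypothetical stand-in for a package-wide math mode (not CUDA.jl's definition).
@enum MathMode PEDANTIC_MATH DEFAULT_MATH FAST_MATH

# Placeholder for BFloat16s.BFloat16 so the sketch needs no packages.
struct BF16 end

# Pick a cuBLAS-style compute type (as a Symbol) from the math mode and the
# input element type. The table is illustrative and deliberately incomplete.
function compute_type(mode::MathMode, ::Type{T}) where {T}
    if T === Float64
        return mode == PEDANTIC_MATH ? :CUBLAS_COMPUTE_64F_PEDANTIC : :CUBLAS_COMPUTE_64F
    elseif T === Float32
        mode == PEDANTIC_MATH && return :CUBLAS_COMPUTE_32F_PEDANTIC
        mode == FAST_MATH && return :CUBLAS_COMPUTE_32F_FAST_TF32
        return :CUBLAS_COMPUTE_32F
    elseif T === Float16 || T === BF16
        # In this sketch, 16-bit inputs always accumulate in Float32.
        return mode == PEDANTIC_MATH ? :CUBLAS_COMPUTE_32F_PEDANTIC : :CUBLAS_COMPUTE_32F
    end
    return nothing  # unsupported element type: the caller should fall back
end
```

A caller would consult this once per API call, e.g. `compute_type(FAST_MATH, Float32)`, and treat `nothing` as "no accelerated path, use a generic fallback". Centralising the table in one function is one way to avoid repeating ad hoc choices in every mixed-mode wrapper.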
I meant the likes of
That's the 'new-style' math mode, specified per API via the compute type. For older CUBLAS APIs we need to set the per-handle math mode (lines 31 to 65 in 92622ed).
Force-pushed from 73ed8f9 to 5d585c4.
This isn't sufficient to run BFloat16 kernels yet, but it's a start towards getting CUDNN's BFloat16 type recognised. The type is currently mapped from BFloat16s.jl, which is already a dependency of CUDA.jl, but it would hopefully be replaced by the language's own version when that is added.