axpby! support for BFloat16 #1399

axpby! with BFloat16 falls back to an iterative method. I was hoping CuTensor might support it, but no joy, and it seems slower for Float32 anyway. So I wrote a dumb array-interface method for axpby! that supports BFloat16, which is competitively fast. Would this code (see the method definition near the bottom of the REPL transcript below), or a better version of it that perhaps calls a new scal! and axpy! to mimic the CUBLAS dataflow, fit in anywhere? Happy to submit a PR with some guidance.
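For illustration only, here is a minimal sketch of the kind of array-interface method described above, assuming CUDA.jl and BFloat16s.jl and using a single fused broadcast; it is not the definition from the REPL transcript.

```julia
using CUDA, LinearAlgebra
using BFloat16s: BFloat16

# Hypothetical sketch: axpby! for BFloat16 CuArrays as one fused broadcast,
# so the whole update runs as a single GPU kernel instead of scalar iteration.
function LinearAlgebra.axpby!(alpha::Number, x::CuArray{BFloat16},
                              beta::Number, y::CuArray{BFloat16})
    length(x) == length(y) ||
        throw(DimensionMismatch("x and y must have the same length"))
    y .= alpha .* x .+ beta .* y
    return y
end
```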
Comments
This would be a good fallback definition, and a great addition to GPUArrays.jl in https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/host/linalg.jl.
FWIW, benchmarking a method that does scalar iteration is not really interesting, as it's known to be slow and just a fallback for correctness. Hence the warning.
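As a rough sketch of what such a generic definition could look like (assuming GPUArrays.jl's exported AbstractGPUArray type; this is not the code that ultimately landed in linalg.jl):

```julia
using LinearAlgebra, GPUArrays

# Sketch of a generic fallback in the spirit of src/host/linalg.jl:
# scale y in place, then accumulate alpha*x, mirroring the scal!-then-axpy!
# dataflow used for the BLAS-supported element types. Because it relies only
# on broadcasting, it works for any element type the backend can compile,
# BFloat16 included.
function LinearAlgebra.axpby!(alpha::Number, x::AbstractGPUArray,
                              beta::Number, y::AbstractGPUArray)
    axes(x) == axes(y) ||
        throw(DimensionMismatch("x and y must have matching axes"))
    y .*= beta
    y .+= alpha .* x
    return y
end
```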
Great! I'll work on a PR to GPUArrays. By "fallback", do you mean that the proper way to do this would be to add a kernel to CUDA.jl?
If CUBLAS or so provides an accelerated version for some types, we prefer that one (because it's likely to be faster). In that sense, a generic definition like yours is a fallback for when we can't use vendor libraries.
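In a session, that preference might play out like this, assuming CUDA.jl with BFloat16s.jl and a generic fallback such as the sketch above (which methods actually get selected is an assumption here):

```julia
using CUDA, LinearAlgebra
using BFloat16s: BFloat16

# Float32 is a CUBLAS-supported element type, so axpby! can take the
# vendor-accelerated path.
x32 = CUDA.rand(Float32, 1024)
y32 = CUDA.rand(Float32, 1024)
axpby!(2.0f0, x32, 3.0f0, y32)

# BFloat16 is not covered by that path, so these arrays would rely on a
# generic broadcast-based fallback (or, absent one, scalar iteration).
xb = CuArray(BFloat16.(rand(Float32, 1024)))
yb = CuArray(BFloat16.(rand(Float32, 1024)))
axpby!(BFloat16(2.0f0), xb, BFloat16(3.0f0), yb)
```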