Implement cuda kernels for unary ops #341
Comments
Currently the test for

I fixed it by changing line 27 in

```diff
- float dx = inp[i] == 0.0 ? 0.0 : (signbit(inp[i]) ? 1.0 : -1.0);
+ float dx = inp[i] == 0.0 ? 0.0 : (signbit(inp[i]) ? -1.0 : 1.0);
```

You could also write it a bit less confusingly like this:

```c
float dx = inp[i] == 0.0 ? 0.0 : copysignf(1.0, inp[i]);
```
If I understand correctly, the current kernels should be written for single-precision floating-point numbers. Here's a list of the available functions: Single Precision Mathematical Functions. What are your thoughts on using intrinsics?
Oooh, good catch about the single precision stuff, I had no idea, so glad y'all know 😀 What is the difference between intrinsics and the non-intrinsics?
… and #334 (#346)

* Add cuda implementations for unary and binary tensor operations
* Add cuda kernel for powi; use fewer 64-bit functions
* Use copysign in abs kernel, as suggested in #341 (comment)
Thanks for the PR @nkoppel
Intrinsics are faster than their non-intrinsic counterparts, at the cost of lower precision. Here is documentation for the standard mathematical functions, and here are their intrinsic counterparts. Note that ULP error stands for units in the last place. It's probably best to avoid intrinsics for now, because we can't guarantee that they will be worth it for every use case, and because these functions usually take only a small fraction of the runtime anyway.
Makes sense, it also seems like the
It may be best to expose
One big downside of intrinsics, other than reduced accuracy, is that they might handle special values (NaNs, infs, subnormals, etc.) differently. I think it definitely should not be enabled by default, but instead put behind a feature flag.
These can mirror abs & exp which are already implemented: