Implement cuda kernels for unary ops #341

Closed
14 tasks done
Tracked by #278
coreylowman opened this issue Jan 7, 2023 · 8 comments
Labels
gpu Related to GPU support

Comments

@coreylowman
Owner

coreylowman commented Jan 7, 2023

These can mirror abs & exp, which are already implemented (a sketch of the general kernel pattern follows the list):

  • abs
  • clamp
  • cos
  • exp
  • ln
  • nans_to
  • negate
  • powi
  • powf
  • relu
  • sigmoid
  • sin
  • square
  • tanh
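
As a rough illustration, here is a minimal sketch of the forward/backward kernel pair each item above could follow (the names, signatures, and gradient-accumulation convention are assumptions for the sketch, not the crate's actual kernel ABI):

extern "C" __global__ void sin_forward(const size_t numel, const float *inp, float *out) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numel) return;
    out[i] = sinf(inp[i]);
}

extern "C" __global__ void sin_backward(const size_t numel, const float *inp, float *grad_inp, const float *grad_out) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numel) return;
    // d/dx sin(x) = cos(x); the chain rule multiplies by the incoming gradient.
    grad_inp[i] += cosf(inp[i]) * grad_out[i];
}

Each remaining op then differs only in the forward expression and its derivative.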
coreylowman added the gpu (Related to GPU support) label on Jan 7, 2023
coreylowman mentioned this issue on Jan 7, 2023 (47 tasks)
@ViliamVadocz
Contributor

Currently the test for abs fails on my machine. There is a small issue in the abs_backward implementation.

running 1 test
test tensor_ops::abs::tests::test_abs ... FAILED

failures:

---- tensor_ops::abs::tests::test_abs stdout ----
thread 'tensor_ops::abs::tests::test_abs' panicked at 'assertion failed: `(left == right)`
  left: `[0.2, 0.2, 0.0, -0.2, -0.2]`,
 right: `[-0.2, -0.2, 0.0, 0.2, 0.2]`', src\tensor_ops\abs\mod.rs:53:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I fixed it by changing line 27 in abs.cu; signbit(x) is true when x is negative.

-    float dx = inp[i] == 0.0 ? 0.0 : (signbit(inp[i]) ? 1.0 : -1.0);
+    float dx = inp[i] == 0.0 ? 0.0 : (signbit(inp[i]) ? -1.0 : 1.0);

You could also write it a bit less confusingly like this:

float dx = inp[i] == 0.0 ? 0.0 : copysignf(1.0, inp[i]);

@ViliamVadocz
Contributor

If I understand correctly, the current kernels should be written for single-precision floating-point numbers (float or f32). If so, it might be appropriate to use functions specific to floats, such as fabsf or expf, to avoid any accidental conversions to double and back.

Here's a list of the available functions: Single Precision Mathematical Functions
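
As an illustrative (non-project) snippet, the f-suffixed functions together with float literals keep the whole expression in single precision:

__device__ float sigmoid_f32(float x) {
    // expf and the 1.0f literals avoid the float -> double -> float round trip
    // that double literals like 1.0 (or the unsuffixed math names, depending on
    // overload resolution) can introduce.
    return 1.0f / (1.0f + expf(-x));
}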


What are your thoughts on using intrinsics?

@coreylowman
Owner Author

Oooh, good catch about the single-precision stuff, I had no idea, so glad y'all know 😀 What is the difference between the intrinsics and the non-intrinsic functions?

coreylowman pushed a commit that referenced this issue Jan 10, 2023
… and #334 (#346)

* Add cuda implementations for unary and binary tensor operations

* Add cuda kernel for powi; Use fewer 64-bit functions

* use copysign in abs kernel, as suggested in #341 (comment)
@coreylowman
Owner Author

Thanks for the PR @nkoppel

@nkoppel
Contributor

nkoppel commented Jan 10, 2023

Intrinsics are faster than their non-intrinsic counterparts, at the cost of lower precision. Here is documentation for the standard mathematical functions, and here are their intrinsic counterparts. Note that ulp error stands for units in the last place.

It's probably best to avoid using intrinsics for now, because we can't guarantee that intrinsics will be worth it for every use case, and because these functions will usually only take a small fraction of the runtime anyway.
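
As a quick illustration of the trade-off (names only; the exact ulp bounds are in the linked documentation):

__device__ float exp_accurate(float x) {
    return expf(x);    // standard single-precision function, small documented ulp error
}

__device__ float exp_fast(float x) {
    return __expf(x);  // intrinsic: faster, but with a larger, input-dependent error bound
}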

@coreylowman
Owner Author

Makes sense. It also seems like the -use_fast_math flag forces them to compile to intrinsics? So we should use that flag instead of hardcoding intrinsics in the future if we want to move to them.
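
For reference, an illustrative snippet of what the flag changes (per the nvcc documentation, -use_fast_math also implies flush-to-zero and less precise division and square root):

__device__ float scaled_sine(float x, float y) {
    // Compiled normally, these are the standard sinf and a precise division.
    // Under -use_fast_math, nvcc lowers them to the __sinf and __fdividef intrinsics.
    return sinf(x) / y;
}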

@nkoppel
Contributor

nkoppel commented Jan 10, 2023

It may be best to expose -use_fast_math to users through a cargo feature, so that those who need the speed can opt in while everyone else keeps full accuracy and avoids any potentially odd behavior from intrinsics.

@ViliamVadocz
Contributor

One big downside of intrinsics, other than reduced accuracy, is that they might handle special values (NaNs, infs) and subnormals differently. I think it definitely should not be enabled by default but instead put behind a feature flag.
