The current implementation of joint_exclusive_scan (at least HIP-like; SSCP does not support them yet) does not seem to support in-place operations, even though the standard requires it ("Note that first may be equal to result.")
Also, it allocates __shared__ scratch storage for the operation (inside __hipsycl_inclusive_scan_over_group) even if the output is itself in __shared__ memory and could safely be used as scratch. Not a bug, just an inefficiency.
$ acpp -O3 --acpp-targets='omp;cuda:sm_86;hip:gfx1034' test_scan.cpp && ACPP_VISIBILITY_MASK="cuda;hip" ./a.out
[a bunch of "loop not vectorized" warnings]
4 warnings generated when compiling for host.
Running on NVIDIA GeForce RTX 3060
Running inplace...
Got 8191 failues!
Running with a temp buffer...
Got 0 failues!
Running on hipSYCL OpenMP host device
Running inplace...
Got 8191 failues!
Running with a temp buffer...
Got 0 failues!
Running on AMD Radeon RX 6400
Running inplace...
Got 8191 failues!
Running with a temp buffer...
Got 0 failues!
Expected behavior
A clear and concise description of what you expected to happen.
And NVIDIA Compute Sanitizer reports a potential race on a scratch buffer in __hipsycl_inclusive_scan_over_group. Still not enough.
Both fixes above are in d41f958 for anyone interested.
Upd 2: And the host implementation seems to be pretty broken in the "in-place" case too, but apparently for independent reasons since the implementation is quite different.
Upd 3: Ah, yes, we pass the input and output to __hipsycl_inclusive_scan_over_group shifted by one, so we have a nasty overlap there :( Off the top of my head, I don't see an easy way to fix the current code (i.e., while keeping the exclusive scan implemented on top of the internal inclusive scan). The easiest would perhaps be to template the functions on whether the scan is inclusive or exclusive. Or just duplicate the code.
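To illustrate the overlap, here is a minimal sequential sketch in plain C++. It is not the AdaptiveCpp code: all function names are hypothetical, the operation is fixed to integer addition with an initial value of 0, and the real implementation is parallel, so there the hazard would show up as a race rather than as a fixed ordering problem. The sketch only shows why writing the inclusive result one slot ahead clobbers inputs that have not been read yet when the input and output ranges coincide, and how a dedicated exclusive loop avoids that.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an internal inclusive scan helper:
// writes the running sum of in[0..n) to out[0..n).
void inclusive_scan_sketch(const int* in, int* out, std::size_t n) {
    int acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += in[i];
        out[i] = acc;
    }
}

// Exclusive scan layered on the inclusive helper, with the output
// shifted by one element. If in == out, the write to out[i + 1]
// overwrites in[i + 1] before it is read.
void exclusive_via_shifted_inclusive(const int* in, int* out, std::size_t n) {
    if (n == 0) return;
    inclusive_scan_sketch(in, out + 1, n - 1);
    out[0] = 0;  // identity for addition
}

// Dedicated exclusive scan: each input is read into a local
// before its slot is overwritten, so in == out is safe.
void exclusive_scan_direct(const int* in, int* out, std::size_t n) {
    int acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const int value = in[i];
        out[i] = acc;
        acc += value;
    }
}

// Helpers that run each variant in place on a copy of the data.
std::vector<int> run_shifted_in_place(std::vector<int> data) {
    exclusive_via_shifted_inclusive(data.data(), data.data(), data.size());
    return data;
}

std::vector<int> run_direct_in_place(std::vector<int> data) {
    exclusive_scan_direct(data.data(), data.data(), data.size());
    return data;
}

// Reference result computed with a separate output buffer.
std::vector<int> run_shifted_with_temp(const std::vector<int>& data) {
    std::vector<int> out(data.size());
    exclusive_via_shifted_inclusive(data.data(), out.data(), data.size());
    return out;
}
```

With the input {1, 2, 3, 4}, the shifted variant gives the expected {0, 1, 3, 6} when it writes to a separate buffer, but produces {0, 1, 2, 4} in place, while the direct variant gives {0, 1, 3, 6} in place.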