Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe to change streams on rocblas_handle without synchronization? #1253

Closed
msimberg opened this issue Jun 2, 2022 · 3 comments
Closed

Safe to change streams on rocblas_handle without synchronization? #1253

msimberg opened this issue Jun 2, 2022 · 3 comments

Comments

@msimberg
Copy link

msimberg commented Jun 2, 2022

I'm wondering if it is safe to change streams on rocblas_handles without intermediate synchronization? We have a library where we use thread-local handles but more than one stream per thread for concurrency. For each successive use of a handle we go round-robin through the streams and set the stream on the handle. However, we're seeing errors in the results with this setup. Browsing through the rocblas source code and the documentation (e.g. https://rocblas.readthedocs.io/en/latest/API_Reference_Guide.html#device-memory-allocation-in-rocblas) everything seems to point at the fact that it's not safe to change the stream because then concurrent work on different streams may end up using the workspace simultaneously.

Can you confirm if this is the case? If yes, is this planned to change? Will the behaviour be explicitly documented? The library is currently already using cuBLAS where it is safe to change the stream to launch concurrent work on the same handle.

@amcamd
Copy link
Contributor

amcamd commented Jun 2, 2022

The following link has documentation on Handle, Stream, and Device management. https://docs.amd.com/bundle/rocBLAS-User-Guide---rocBLAS-documentation/page/Programmers_Guide_15.html#handle-stream-and-device-management

A paragraph from the documentation is below:

When a user changes the stream from one non-default stream to another non-default stream, it is the user’s responsibility to synchronize the old stream before setting the new stream.

Below is sample code from the documentation:

// Synchronize the old stream
if(hipStreamSynchronize(old_stream) != hipSuccess) return EXIT_FAILURE;

// Destroy the old stream (this step is optional but must come after synchronization)
if(hipStreamDestroy(old_stream) != hipSuccess) return EXIT_FAILURE;

// Create a new stream (this step can be done before the steps above)
if(hipStreamCreate(&new_stream) != hipSuccess) return EXIT_FAILURE;

// Set the handle to use the new stream (must come after synchronization)
if(rocblas_set_stream(handle, new_stream) != rocblas_status_success) return EXIT_FAILURE;

Is your request that we remove the requirement for the hipStreamSynchronize in the above sample code?

@msimberg
Copy link
Author

msimberg commented Jun 3, 2022

Thanks @amcamd for the response! I wasn't aware of that documentation, and it answers my question perfectly. Do you think it'd be worth lifting some of that information into more user-facing documentation (e.g. the documentation for rocblas_set_stream?). If I understand correctly, that "Programmer's Guide" looks to me like a guide for someone working on rocBLAS, not using rocBLAS. At least I'm not sure that would've been the first place I would think to look for this kind of information (given other more user-facing sections are available). It seems like all the ROCm documentation is still quite some flux at the moment so maybe things will improve anyway soon.

Is your request that we remove the requirement for the hipStreamSynchronize in the above sample code?

You've answered my primary request, which was just finding out if our understanding of the behaviour is correct.

For consistency with CUDA it would be nice if rocBLAS behaved the same (or at least hipBLAS, but I suppose making hipBLAS behave the same practically means making rocBLAS behave the same...), but I understand that this would likely not be a trivial change. Also, with this knowledge we can already work around this behaviour (with one stream per handle or manually allocated workspaces) so for the time being we're good. From my perspective this issue can be closed (since you've answered my documentation question).

@amcamd
Copy link
Contributor

amcamd commented Jun 3, 2022

Thanks @msimberg for comments on improving documentation. We will plan to move some of the content of the "Programmer's Guide" to make it more accessible by users.

@amcamd amcamd closed this as completed Jun 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants