Building CuPy with PTDS enabled #3755

Closed
jakirkham opened this issue Aug 10, 2020 · 9 comments
Labels
cat:enhancement (Improvements to existing features), pr-ongoing

Comments

@jakirkham
Member

jakirkham commented Aug 10, 2020

Is it possible to build CuPy with PTDS enabled? What issues (if any) would one encounter when trying this?

ref: https://developer.nvidia.com/blog/gpu-pro-tip-cuda-7-streams-simplify-concurrency/

@kmaehashi
Member

Hint for future readers: PTDS = per-thread default stream

Haven't tested it, but it seems to conflict with CuPy's default stream mechanism?
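To see where the conflict could come from, here is a small pure-Python model of how CUDA interprets stream handles. It relies on documented CUDA behavior: `cudaStreamLegacy` is the handle `0x1`, `cudaStreamPerThread` is `0x2`, and handle `0` (the one CuPy's `Stream.null` wraps) means the legacy default stream unless the calling code was compiled with PTDS, in which case it means the calling host thread's default stream. The function `default_stream_kind` is a hypothetical illustration and makes no CUDA calls.

```python
# Documented special stream handles (runtime: cudaStreamLegacy /
# cudaStreamPerThread; driver: CU_STREAM_LEGACY / CU_STREAM_PER_THREAD).
NULL_STREAM = 0x0
STREAM_LEGACY = 0x1
STREAM_PER_THREAD = 0x2


def default_stream_kind(handle: int, compiled_with_ptds: bool) -> str:
    """Hypothetical helper: which stream a CUDA call would use for `handle`.

    `compiled_with_ptds` models whether the *calling* code was built with
    CUDA_API_PER_THREAD_DEFAULT_STREAM (or nvcc --default-stream per-thread).
    """
    if handle == STREAM_LEGACY:
        return "legacy"  # always the legacy default stream
    if handle == STREAM_PER_THREAD:
        return "per-thread"  # always this host thread's default stream
    if handle == NULL_STREAM:
        # The meaning of 0 is fixed when the calling code is compiled.
        return "per-thread" if compiled_with_ptds else "legacy"
    return "explicit"  # a stream created with cudaStreamCreate and friends
```

In other words, code that silently passes `0` can change meaning depending on how it was built, whereas passing `0x1` or `0x2` explicitly is unambiguous.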

@jakirkham
Member Author

Is it possible, in general, to override which stream CuPy uses?

@leofang
Member

leofang commented Aug 12, 2020

I guess if we can locate all the places that default the stream to 0 and change the Stream constructor's default to PTDS, we might have a chance to make it work. The get_current_stream() mechanism we have does not seem to prevent this change. But I'd want this to be an opt-in change, which means packaging will be a headache? It seems we would need two versions of the wheels (with and without PTDS), am I right?
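A minimal sketch of that opt-in idea, assuming a thread-local "current stream" registry in the spirit of get_current_stream(): when no stream has been set, it falls back to either handle `0` or the per-thread handle `0x2`, depending on a switch. `CurrentStreamRegistry` and `use_ptds` are hypothetical names, not CuPy's API.

```python
import threading

# Documented value of cudaStreamPerThread / CU_STREAM_PER_THREAD.
STREAM_PER_THREAD = 0x2


class CurrentStreamRegistry:
    """Hypothetical stand-in for a get_current_stream()-style mechanism."""

    def __init__(self, use_ptds: bool) -> None:
        # Opt-in switch: which handle to use when no stream has been set,
        # instead of hard-coding 0 at every call site.
        self._default = STREAM_PER_THREAD if use_ptds else 0
        self._local = threading.local()  # one "current stream" per host thread

    def set_current(self, ptr: int) -> None:
        self._local.ptr = ptr

    def get_current(self) -> int:
        # Threads that never called set_current() see the default handle.
        return getattr(self._local, "ptr", self._default)


registry = CurrentStreamRegistry(use_ptds=True)
registry.set_current(0x1234)  # main thread picks its own stream
seen = []
worker = threading.Thread(target=lambda: seen.append(registry.get_current()))
worker.start()
worker.join()
```

Here the worker thread sees `0x2` while the main thread sees `0x1234`. Passing the explicit `0x2` handle, rather than relying on how `0` is interpreted, keeps the choice independent of compile flags for calls that take a stream argument.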

@leofang
Member

leofang commented Aug 12, 2020

ref: numba/numba#5137

@kmaehashi
Member

kmaehashi commented Aug 12, 2020

But, I'd hope this to be an opt-in change, which means we will have a headache when packaging? It seems we will need two versions of wheels (without and with PTDS), am I right?

Right for Thrust and CUB, but the other kernels are compiled at runtime with NVRTC, and NVRTC does not seem to support a PTDS option.

@leofang
Member

leofang commented Aug 12, 2020

Right for thrust and cub

I thought we would also need to pass -DCUDA_API_PER_THREAD_DEFAULT_STREAM to the other Python modules, even those compiled by the host compiler? Especially driver and runtime?

but other kernels are compiled at runtime using NVRTC but it seems it does not support PTDS option.

Good point... It would be nice to get confirmation of this.
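For the build-time side discussed here, a sketch of what an opt-in build might add. The two flags are real: the `CUDA_API_PER_THREAD_DEFAULT_STREAM` macro (which must be defined before any CUDA header is included, hence the command line) and nvcc's `--default-stream per-thread` option. The helper name `ptds_build_flags` and the host/nvcc split are assumptions for illustration, not CuPy's setup code. These flags only affect ahead-of-time compiled code; kernels built at runtime with NVRTC are launched through the driver API, where the stream is passed with each launch.

```python
from typing import Dict, List


def ptds_build_flags(use_ptds: bool) -> Dict[str, List[str]]:
    """Hypothetical helper: extra compiler flags for a PTDS-enabled build.

    - "host": flags for host-compiled extension modules (e.g. the driver and
      runtime wrappers); the macro must precede every CUDA header include.
    - "nvcc": flags for code compiled by nvcc (e.g. Thrust/CUB wrappers).
    """
    if not use_ptds:
        return {"host": [], "nvcc": []}
    return {
        "host": ["-DCUDA_API_PER_THREAD_DEFAULT_STREAM"],
        "nvcc": ["--default-stream", "per-thread"],
    }
```

Because the result differs per build, shipping both variants would indeed mean two sets of binaries, which is the packaging concern raised earlier.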

@jakirkham
Member Author

I asked offline and was told PTDS should work with NVRTC.

https://docs.nvidia.com/cuda/cuda-driver-api/stream-sync-behavior.html#stream-sync-behavior

@jakirkham
Member Author

cc @pentschev (for visibility)

kmaehashi added the cat:enhancement (Improvements to existing features) and pr-ongoing labels and removed the issue-checked label on Feb 3, 2021
@leofang
Member

leofang commented Feb 13, 2021

@jakirkham Can this be closed now that #4322 is merged?


3 participants