Providing an option to use a non-default stream #209

Closed
michaeleisel opened this issue Mar 8, 2024 · 4 comments · Fixed by #224

Comments

@michaeleisel

I'm trying to improve the performance of my candle models via CUDA streams, which I've benchmarked to be helpful. However, I want to be sure that this is safe to do, at least with the restrictions I've imposed on myself (cudarc Device and candle Tensor objects may only be accessed from the thread in which they were created, and candle Tensors may only perform operations with other Tensors of the same Device instance). The problem is that cudarc always uses the default legacy stream. I'd love to instead have an option to use the old behavior of creating a new stream for each new device, if it'd be safe to do so.

@coreylowman
Owner

This should already be possible via CudaStream, though you have to manage the additional stream yourself instead of cudarc managing it for you.

@michaeleisel
Author

If I want to run, say, CudaDevice::memset_zeros(), I don't see any way to do it without going through the default legacy stream, because I don't see any option to customize the stream.

@coreylowman
Owner

Ah gotcha, you're right. Would you want an option on creation of the CudaDevice to use a non-null stream?

@michaeleisel
Author

I think that could be an interesting way to do it, yeah. My one concern would be whether any APIs, like candle, assume that if two devices have the same ordinal, then their operations are on the same stream. But maybe that would be considered overreliance on implementation details. For candle, it looks like they use a stricter form of checking, with a generated unique ID rather than the ordinal, so that approach wouldn't cause any problems for that library.
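To make the ordinal-versus-unique-ID distinction concrete, here is a minimal sketch in plain Rust. It is only an illustration under assumptions: `DeviceHandle`, `same_ordinal`, and `same_handle` are hypothetical names, not cudarc's or candle's actual types or methods, and the counter-based ID is just one plausible way to generate a unique ID.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Process-wide counter that hands out a fresh ID for every handle created.
static NEXT_ID: AtomicUsize = AtomicUsize::new(1);

/// Hypothetical stand-in for a device handle (not a real cudarc or candle type).
#[derive(Debug)]
pub struct DeviceHandle {
    /// GPU index; shared by every handle opened on the same physical GPU.
    pub ordinal: usize,
    /// Unique per handle, so two handles on one GPU never compare equal.
    id: usize,
}

impl DeviceHandle {
    /// Creates a new handle for the given GPU index with a fresh unique ID.
    pub fn new(ordinal: usize) -> Self {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        Self { ordinal, id }
    }

    /// Loose check: same GPU, but the handles may own different streams.
    pub fn same_ordinal(&self, other: &Self) -> bool {
        self.ordinal == other.ordinal
    }

    /// Strict check: the very same handle, and therefore the same stream.
    pub fn same_handle(&self, other: &Self) -> bool {
        self.id == other.id
    }
}
```

With this model, two handles created with `DeviceHandle::new(0)` pass `same_ordinal` but fail `same_handle`, so a library that gates operations on the strict check would never mix work from handles that might sit on different streams.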
