Multi stream support #6
Comments
A possible approach to this is to use type states on CudaRc:

```rust
struct OffStream;
struct OnStream<const I: usize>;

struct CudaRc<T, State> { ... }

impl<T> CudaRc<T, OffStream> {
    fn on_stream<const I: usize>(self) -> CudaRc<T, OnStream<I>> { ... }
}

impl<T, const I: usize> CudaRc<T, OnStream<I>> {
    fn sync(self) -> CudaRc<T, OffStream> { ... }
}
```

Then, to ensure data can only be used on its current stream or moved onto a stream, IntoKernelParam and LaunchCudaFunction should have a stream const generic associated with them:

```rust
trait LaunchCudaFunction<const I: usize> { ... }
trait IntoKernelParam<const I: usize> { ... }

impl<T, const I: usize> IntoKernelParam<I> for CudaRc<T, OffStream> { ... }
impl<T, const I: usize> IntoKernelParam<I> for CudaRc<T, OnStream<I>> { ... }
```

As far as which stream to actually use when launching, it could be
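A fuller, compilable version of that sketch, using placeholder types (the `ptr` field, the empty trait bodies, and the `PhantomData` plumbing stand in for the real cudarc internals):

```rust
use std::marker::PhantomData;

// Type states: data is either not bound to any stream, or bound to stream I.
struct OffStream;
struct OnStream<const I: usize>;

// Placeholder stand-in for cudarc's CudaRc; `ptr` mimics the device pointer.
struct CudaRc<T, State> {
    ptr: u64,
    _marker: PhantomData<(T, State)>,
}

impl<T> CudaRc<T, OffStream> {
    // Moving data onto a stream changes its type, so the compiler tracks
    // which stream it currently belongs to.
    fn on_stream<const I: usize>(self) -> CudaRc<T, OnStream<I>> {
        CudaRc { ptr: self.ptr, _marker: PhantomData }
    }
}

impl<T, const I: usize> CudaRc<T, OnStream<I>> {
    // Synchronizing the stream hands the data back in the off-stream state.
    // (A real version would also call the driver's stream synchronize here.)
    fn sync(self) -> CudaRc<T, OffStream> {
        CudaRc { ptr: self.ptr, _marker: PhantomData }
    }
}

// A launch on stream I accepts params that are off-stream (free to be moved
// onto I) or already on stream I; anything bound to a different stream is a
// compile error.
trait IntoKernelParam<const I: usize> {}
impl<T, const I: usize> IntoKernelParam<I> for CudaRc<T, OffStream> {}
impl<T, const I: usize> IntoKernelParam<I> for CudaRc<T, OnStream<I>> {}

fn main() {
    // Hypothetical usage: start off-stream, move onto stream 0, then sync.
    let data: CudaRc<f32, OffStream> = CudaRc { ptr: 0, _marker: PhantomData };
    let on_stream_0: CudaRc<f32, OnStream<0>> = data.on_stream::<0>();
    let _back_off: CudaRc<f32, OffStream> = on_stream_0.sync();
}
```

This way, using data on the wrong stream becomes a type error at compile time rather than a potential data race at runtime. |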
Interesting details here in the streams & freeing memory section
https://zdevito.github.io/2022/08/04/cuda-caching-allocator.html |
Another thing to think about: slices of tensors on different streams. e.g. if i have a batch of data, each item in the batch could be computed on a different stream |
Why would that be advantageous? Wouldn’t this cause some synchronization issues, as one would have to synchronize the device and not only a stream? |
Wouldn’t this also require CudaSlices to be on the same device (in general)? So some const generic with the device ordinal is required anyway. |
Ooooh that is a great call out. I think that could even apply to CudaSlice already? Imagine I create a slice on device 0 and then try to use that slice on a different device. I have no idea if that is even valid. Will open a separate issue for that! |
I don't think so, but there might be some virtual memory mapping that nvidia does 😅 |
Also, I can't test this; maybe someone with 2 CUDA GPUs can? When adding these const generics, we could probably add const generics for streams too (basically the same, I guess).
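Something like this rough sketch, with a hypothetical CudaSlice carrying the device ordinal as a const generic (field names here are placeholders, not the actual cudarc layout):

```rust
use std::marker::PhantomData;

// Hypothetical: a slice tagged with the device ordinal it was allocated on.
struct CudaSlice<T, const DEVICE: usize> {
    ptr: u64,   // placeholder for the device pointer
    len: usize,
    _marker: PhantomData<T>,
}

// An op only compiles when both inputs live on the same device: mixing a
// device-0 slice with a device-1 slice is rejected by the type checker.
fn elementwise_add<T, const DEVICE: usize>(
    a: &CudaSlice<T, DEVICE>,
    b: &CudaSlice<T, DEVICE>,
) -> CudaSlice<T, DEVICE> {
    assert_eq!(a.len, b.len);
    // Placeholder: a real implementation would launch a kernel on DEVICE.
    CudaSlice { ptr: a.ptr, len: a.len, _marker: PhantomData }
}
```

The same pattern would extend to a `const STREAM: usize` parameter if streams end up being tracked at the type level as well. |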
Okay, a different direction for this: don't use type states, as they complicate things a bit too much and are probably hard to get right. Instead:
|
Currently CudaDevice only supports a single stream. Look into how multiple streams should be supported.