
Mixing cublas and rustacuda ? #28

Closed
zeroexcuses opened this issue Jan 21, 2019 · 2 comments

Comments

@zeroexcuses

Can we please have sample code that

  1. allocates some memory

  2. calls A = B * C

  3. calls some kernel on A

  4. calls sgemm D = E * A

?
I have some tensor code that runs great in CPU mode but fails in GPU mode, so the algorithm itself is correct. All CPU-vs-GPU unit tests pass, so it seems I am running into a synchronization issue.

I call stream.synchronize() after all kernel calls, so the remaining culprit seems to be that my kernels run on stream A while cublas runs on stream B, and it's not clear to me how to synchronize the two.
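One general way to order work across two streams without a full device sync is a CUDA event: record it on the kernel stream, then make the cublas stream wait on it. A minimal sketch via the raw driver-API bindings; the exact `cuda_sys` paths and signatures here are assumptions, and `order_streams` is a hypothetical helper:

```rust
use std::ptr;
use cuda_sys::cuda::{cuEventCreate, cuEventRecord, cuStreamWaitEvent, CUevent, CUstream};

/// Make `blas_stream` wait for all work already enqueued on `kernel_stream`.
unsafe fn order_streams(kernel_stream: CUstream, blas_stream: CUstream) {
    let mut event: CUevent = ptr::null_mut();
    // Flag 0x2 (CU_EVENT_DISABLE_TIMING) skips timing bookkeeping.
    cuEventCreate(&mut event, 0x2);
    // The event completes once everything currently on kernel_stream is done...
    cuEventRecord(event, kernel_stream);
    // ...and blas_stream will not run later work until the event completes.
    cuStreamWaitEvent(blas_stream, event, 0);
}
```

This keeps both streams asynchronous with respect to the host, unlike `stream.synchronize()`, which blocks the CPU.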

@rusch95
Contributor

rusch95 commented Jan 21, 2019 via email

@zeroexcuses
Author

I think I got it working via the following changes:

  1. I made Stream's inner CUstream pub:

```rust
#[derive(Debug)]
pub struct Stream {
    pub inner: CUstream,
}
```

  2. I initialize the cublas handle by calling:

```rust
unsafe {
    cublas::cublasSetStream_v2(
        gblas_handle.handle,
        stream.inner as *mut cuda_sys::cudart::CUstream_st,
    );
}
```

This appears to cause the blas to run on the same stream as the kernels.

However, I'm a bit uneasy, as I'm brute-force casting a sys::cuda::CUstream* to a sys::cudart::CUstream*.

I'm not sure about the difference between the two.
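For what it's worth, the cast is likely benign: in bindings generated from the CUDA headers, both the driver-API and runtime-API stream types are typically thin aliases for a pointer to the same opaque struct, so the cast only renames the type. A sketch of that assumption (the exact alias definitions in `cuda_sys` are not verified here, and `as_runtime_stream` is a hypothetical helper):

```rust
// Assumed shape of the generated bindings:
//   pub type CUstream     = *mut CUstream_st;  // driver API  (cuda_sys::cuda)
//   pub type cudaStream_t = *mut CUstream_st;  // runtime API (cuda_sys::cudart)
// If both are pointers to the same opaque CUstream_st, this cast is a
// compile-time rename with no runtime reinterpretation.
fn as_runtime_stream(s: cuda_sys::cuda::CUstream) -> *mut cuda_sys::cudart::CUstream_st {
    s as *mut cuda_sys::cudart::CUstream_st
}
```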
