Enabling runtime choice for mallocAsync or sync version. #174

Narsil · 2023-08-10T13:57:30Z

Basically, cudarc fails on Windows, even with recent cards, and it also fails on older cards.

The core reason is that cuMemAllocAsync is not supported on those devices/OSes.
This PR fixes that by choosing between the sync and async versions at runtime, which is the way CUDA intends it to be used: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY__POOLS.html (see the supported platforms).
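A minimal sketch of what "choosing at runtime" means, assuming the capability is queried once per device and cached. Every name here (`Device`, `AllocKind`, `alloc_kind`, the `pools_attr` parameter) is hypothetical and not cudarc's actual API; the integer passed to `Device::new` simulates what the CUDA driver's `cuDeviceGetAttribute` would report for `CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED`.

```rust
/// Which CUDA driver allocation entry point to use.
/// (Hypothetical type for this sketch; not part of cudarc.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocKind {
    /// Stream-ordered allocation: cuMemAllocAsync / cuMemFreeAsync.
    Async,
    /// Plain synchronous allocation: cuMemAlloc / cuMemFree.
    Sync,
}

/// Hypothetical stand-in for a device handle.
pub struct Device {
    /// Cached result of the memory-pools capability query, computed once.
    memory_pools_supported: bool,
}

impl Device {
    /// `pools_attr` simulates the integer cuDeviceGetAttribute would write for
    /// CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED; non-zero means supported.
    pub fn new(pools_attr: i32) -> Self {
        Device {
            memory_pools_supported: pools_attr != 0,
        }
    }

    /// The runtime choice: a single branch on the cached flag per allocation.
    pub fn alloc_kind(&self) -> AllocKind {
        if self.memory_pools_supported {
            AllocKind::Async
        } else {
            AllocKind::Sync
        }
    }
}
```

With this shape, `Device::new(1).alloc_kind()` selects the async path and `Device::new(0).alloc_kind()` falls back to the sync path, so the same binary works on devices/OSes with and without memory-pool support.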

Alternatives:

I would have preferred a compile-time flag for this, since it always seems preferable to use async where available and sync elsewhere.

Previous work along those lines: #106

I think we could have enabled a feature from build.rs (rust-lang/cargo#5499) for this particular check.
However, I was unable to find an easy way to detect that capability using nvidia-smi (or any other commonly available CLI).
That means doing it at compile time would require first compiling a small program that calls cudaDeviceGetAttribute, just so build.rs could run it. This seemed overengineered, so I didn't do it.

I haven't measured the performance impact of adding the `if` everywhere.

Another variant of this PR would be to keep track of whether the Slice was allocated with the async version or not, to remove the `if` within Drop. (We could also store the flag directly on CudaDevice to remove the `if` within alloc.)
I considered this a performance optimization and therefore didn't do it in this PR.
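A minimal sketch of the variant described above, where each slice records at allocation time how it was allocated, so `Drop` picks the matching free call without consulting the device. `Slice`, `FreeCall`, and the shared `freed` log are hypothetical illustrations, not cudarc's types; the log simply makes the otherwise invisible driver call observable.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Which driver free call a buffer must be released with (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FreeCall {
    /// Stands in for cuMemFreeAsync.
    FreeAsync,
    /// Stands in for cuMemFree.
    Free,
}

/// Hypothetical slice: it remembers how it was allocated, so Drop needs no
/// device query and cannot pair an async allocation with a sync free.
pub struct Slice {
    allocated_async: bool,
    /// Shared log standing in for the real driver calls.
    freed: Rc<RefCell<Vec<FreeCall>>>,
}

impl Slice {
    pub fn new(allocated_async: bool, freed: Rc<RefCell<Vec<FreeCall>>>) -> Self {
        Slice { allocated_async, freed }
    }
}

impl Drop for Slice {
    fn drop(&mut self) {
        // The branch now reads a field on the slice itself.
        let call = if self.allocated_async {
            FreeCall::FreeAsync
        } else {
            FreeCall::Free
        };
        self.freed.borrow_mut().push(call);
    }
}
```

The per-slice flag costs a byte (plus padding) per slice, but it also guarantees that each buffer is freed with the call matching the one that allocated it.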

Narsil · 2023-08-10T14:07:04Z

@coreylowman

coreylowman · 2023-08-10T15:03:08Z

This looks fine to me; I like the runtime detection better than adding an additional compiler flag TBH.

@Narsil it looks like this deletes some examples, can you restore those? They are probably just from an outdated fork.

src/driver/safe/alloc.rs

src/driver/safe/core.rs

Narsil · 2023-08-10T15:54:28Z

@coreylowman The example was actually removed because it always fails to build. I've started just removing it whenever I run the tests.

I'll restore it for this PR, but it would be nice to have a real fix.
Same goes for the 2 multi-device tests: they fail on single-GPU machines, which does not feel super great :)

src/driver/safe/core.rs

coreylowman

lgtm

coreylowman · 2023-08-10T16:41:49Z

@Narsil new version up on crates.io

Narsil added 2 commits August 10, 2023 15:36

Tmp.

afa7f44

Runtime flag.

c548f79

Narsil added 4 commits August 10, 2023 16:10

Fmt + clippy

9652b43

Other alloc location.

fbe66a6

Slight factorization.

812e80c

htod too.

6f85686

All async ops ?

6f845a0

coreylowman reviewed Aug 10, 2023

src/driver/safe/alloc.rs (outdated, resolved)

coreylowman reviewed Aug 10, 2023

src/driver/safe/core.rs (outdated, resolved)

Narsil added 2 commits August 10, 2023 17:57

Intern CudaDevice attribute for async check

0563f65

Update safety.

97818e1

Narsil commented Aug 10, 2023

src/driver/safe/core.rs (outdated, resolved)

Update src/driver/safe/core.rs

a6fee0b

coreylowman approved these changes Aug 10, 2023

coreylowman merged commit 7fd4ba7 into coreylowman:main Aug 10, 2023
3 checks passed

coreylowman mentioned this pull request Aug 10, 2023

Adding opt-out sync alloc for older cards. #106

Closed
