-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
experimental wrapper types for cudaEvent_t
that provide a modern C++ interface.
#2017
Conversation
* `cuda::experimental::event_ref` is a non-owning wrapper around a `cudaEvent_t`. * `cuda::experimental::event` is an owning wrapper around a `cudaEvent_t`. * `cuda::experimental::timed_event` is a `cuda::experimental::event` that also records the time at which it was recorded.
marking this as a draft because i haven't yet finished writing tests, but i want input on the design. |
🟨 CI finished in 2h 22m: Pass: 96%/56 | Total: 2h 35m | Avg: 2m 47s | Max: 12m 49s | Hits: 83%/1855
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 56)
# | Runner |
---|---|
41 | linux-amd64-cpu16 |
9 | linux-amd64-gpu-v100-latest-1 |
4 | linux-arm64-cpu16 |
2 | windows-amd64-cpu16 |
🟩 CI finished in 11m 09s: Pass: 100%/56 | Total: 2h 34m | Avg: 2m 45s | Max: 11m 09s | Hits: 84%/1913
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 56)
# | Runner |
---|---|
41 | linux-amd64-cpu16 |
9 | linux-amd64-gpu-v100-latest-1 |
4 | linux-arm64-cpu16 |
2 | windows-amd64-cpu16 |
…d of microsoconds according to the CUDA docs for `cudaEventElapsedTime`, the elapsed time has sub-microsecond resolution, so it is more appropriate to represent it in nanoseconds.
🟩 CI finished in 10m 32s: Pass: 100%/56 | Total: 2h 34m | Avg: 2m 45s | Max: 10m 32s | Hits: 89%/1913
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 56)
# | Runner |
---|---|
41 | linux-amd64-cpu16 |
9 | linux-amd64-gpu-v100-latest-1 |
4 | linux-arm64-cpu16 |
2 | windows-amd64-cpu16 |
🟩 CI finished in 10m 22s: Pass: 100%/56 | Total: 2h 31m | Avg: 2m 42s | Max: 10m 22s | Hits: 96%/1913
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 56)
# | Runner |
---|---|
41 | linux-amd64-cpu16 |
9 | linux-amd64-gpu-v100-latest-1 |
4 | linux-arm64-cpu16 |
2 | windows-amd64-cpu16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nitpicks from my side, mostly regarding style
We will need to add current device manipulation and probably add a device taking constructor, but its probably a future PR |
🟩 CI finished in 3h 50m: Pass: 100%/56 | Total: 2h 45m | Avg: 2m 57s | Max: 11m 12s | Hits: 74%/1913
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 56)
# | Runner |
---|---|
41 | linux-amd64-cpu16 |
9 | linux-amd64-gpu-v100-latest-1 |
4 | linux-arm64-cpu16 |
2 | windows-amd64-cpu16 |
This comment was marked as spam.
This comment was marked as spam.
@pciolkosz @miscco i prototyped the lazy design we discussed, but i didn't like it. instead i removed default constructibility. in the latest design,
|
I like the changes, I think its a better way of expressing what I tried to achieve with the creation on record I guess the only pattern I can see where its not working as well is:
It would require either an if around |
🟩 CI finished in 13h 25m: Pass: 100%/56 | Total: 4h 50m | Avg: 5m 11s | Max: 12m 33s | Hits: 84%/1913
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 56)
# | Runner |
---|---|
41 | linux-amd64-cpu16 |
9 | linux-amd64-gpu-v100-latest-1 |
4 | linux-arm64-cpu16 |
2 | windows-amd64-cpu16 |
🟨 CI finished in 3h 55m: Pass: 82%/56 | Total: 2h 37m | Avg: 2m 48s | Max: 12m 10s | Hits: 83%/1575
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 56)
# | Runner |
---|---|
41 | linux-amd64-cpu16 |
9 | linux-amd64-gpu-v100-latest-1 |
4 | linux-arm64-cpu16 |
2 | windows-amd64-cpu16 |
TEST_CASE("can construct an event_ref from a cudaEvent_t", "[event]") | ||
{ | ||
::cudaEvent_t ev; | ||
CUDAX_REQUIRE(::cudaEventCreate(&ev) == ::cudaSuccess); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: There is a CUDART macro in testing_common (I need to move that header) that check if a runtime calls was successful
CUDAX_REQUIRE(ref.get() == ev); | ||
CUDAX_REQUIRE(!!ref); | ||
// test implicit converstion from cudaEvent_t: | ||
cudax::event_ref ref2 = ::test::fn_takes_event_ref(ev); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: I don't know if its better, but when I have a pattern of thing, thing2, thing3 in sections starting with a comment, I started to put them into catch2 SECTION("comment") { thing...} instead. This way the comment ends up being displayed on error and there is no need to number things since these are different scopes
🟨 CI finished in 3h 01m: Pass: 96%/56 | Total: 2h 42m | Avg: 2m 53s | Max: 15m 40s | Hits: 85%/1855
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 56)
# | Runner |
---|---|
41 | linux-amd64-cpu16 |
9 | linux-amd64-gpu-v100-latest-1 |
4 | linux-arm64-cpu16 |
2 | windows-amd64-cpu16 |
🟨 CI finished in 5h 00m: Pass: 99%/421 | Total: 5d 21h | Avg: 20m 13s | Max: 1h 05m | Hits: 75%/521796
|
Project | |
---|---|
+/- | CCCL Infrastructure |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
+/- | CCCL Infrastructure |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 421)
# | Runner |
---|---|
305 | linux-amd64-cpu16 |
65 | linux-amd64-gpu-v100-latest-1 |
28 | linux-arm64-cpu16 |
23 | windows-amd64-cpu16 |
🟨 CI finished in 6h 59m: Pass: 94%/56 | Total: 2h 33m | Avg: 2m 44s | Max: 11m 36s | Hits: 95%/1808
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 56)
# | Runner |
---|---|
41 | linux-amd64-cpu16 |
9 | linux-amd64-gpu-v100-latest-1 |
4 | linux-arm64-cpu16 |
2 | windows-amd64-cpu16 |
🟩 CI finished in 9h 53m: Pass: 100%/56 | Total: 2h 32m | Avg: 2m 43s | Max: 11m 36s | Hits: 95%/1913
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
Thrust | |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 56)
# | Runner |
---|---|
41 | linux-amd64-cpu16 |
9 | linux-amd64-gpu-v100-latest-1 |
4 | linux-arm64-cpu16 |
2 | windows-amd64-cpu16 |
…+ interface. (NVIDIA#2017) * Wrapper types for `cudaEvent_t` that provide a modern C++ interface. * `cuda::experimental::event_ref` is a non-owning wrapper around a `cudaEvent_t`. * `cuda::experimental::event` is an owning wrapper around a `cudaEvent_t`. * `cuda::experimental::timed_event` is a `cuda::experimental::event` that also records the time at which it was recorded. * apparently `__event` is a word of power for msvc * represent the elapsed time between two events with nanoseconds instead of microsoconds according to the CUDA docs for `cudaEventElapsedTime`, the elapsed time has sub-microsecond resolution, so it is more appropriate to represent it in nanoseconds. * prune unused headers, switch to rst-friendly doxygen comment style * add class synopsis comments * construct with a stream_ref and record the event on construction * review feedback * tests for `cudax::event` and `cudax::timed_event` * change `event_ref::wait` to use `cudaEventSynchronize` * Use a struct for windows instead * Do not include superfluous config header * Add clang-format rule for cudax * Spell `cudax_add_catch2_test` correctly * Fix formatting issues --------- Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
…+ interface. (NVIDIA#2017) * Wrapper types for `cudaEvent_t` that provide a modern C++ interface. * `cuda::experimental::event_ref` is a non-owning wrapper around a `cudaEvent_t`. * `cuda::experimental::event` is an owning wrapper around a `cudaEvent_t`. * `cuda::experimental::timed_event` is a `cuda::experimental::event` that also records the time at which it was recorded. * apparently `__event` is a word of power for msvc * represent the elapsed time between two events with nanoseconds instead of microsoconds according to the CUDA docs for `cudaEventElapsedTime`, the elapsed time has sub-microsecond resolution, so it is more appropriate to represent it in nanoseconds. * prune unused headers, switch to rst-friendly doxygen comment style * add class synopsis comments * construct with a stream_ref and record the event on construction * review feedback * tests for `cudax::event` and `cudax::timed_event` * change `event_ref::wait` to use `cudaEventSynchronize` * Use a struct for windows instead * Do not include superfluous config header * Add clang-format rule for cudax * Spell `cudax_add_catch2_test` correctly * Fix formatting issues --------- Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
This commit adds some simple, modern C++ wrappers for
cudaEvent_t
, namely:cuda::experimental::event_ref
is a non-owning wrapper around acudaEvent_t
.cuda::experimental::event
is an owning wrapper around acudaEvent_t
.cuda::experimental::timed_event
is acuda::experimental::event
that also records the time at which it was recorded.It also adds a constructor tag
cuda::experimental::uninit
that can be used to defer initialization of the object.