Use checked allocators in CUB catch2 tests #1271

alliepiper · 2024-01-10T20:34:06Z

This adds new header-only utilities to the c2h testing library:

c2h::checked_cuda_allocator<T>: New allocator that checks free device memory before calling cudaMalloc and throws bad_alloc if (alloc_bytes + 16MiB) > free_bytes. This avoids issues on Tegra and Windows where over-allocating device memory causes slowdowns or even system hangs.
c2h::checked_host_allocator<T>: New host allocator using new/delete, but checks available device memory prior to allocating memory on systems with integrated host/device memory.
c2h::device_policy: Thrust execution policy that uses c2h::checked_cuda_allocator<char> for temporary storage allocations.
c2h::device_vector<T>: Device vector that uses c2h::checked_cuda_allocator<T>.
c2h::host_vector<T>: Host vector that uses c2h::checked_host_allocator<T>.
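The free-memory check described for `c2h::checked_cuda_allocator` and `c2h::checked_host_allocator` can be sketched in host-only C++. This is an illustration under stated assumptions, not the c2h implementation: `fake_free_bytes` is a hypothetical stand-in for a device free-memory query (in CUDA code this would presumably be `cudaMemGetInfo`), `fake_is_integrated` stands in for the device's integrated-memory property, and `std::malloc` stands in for `cudaMalloc`. Only the 16 MiB padding and the `bad_alloc` behavior come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace sketch {

// Padding from the PR description: throw if (alloc_bytes + 16MiB) > free_bytes.
constexpr std::size_t padding_bytes = std::size_t{16} * 1024 * 1024;

// Hypothetical stand-ins for device queries, so the sketch runs without CUDA.
inline std::size_t& fake_free_bytes()
{
  static std::size_t value = std::size_t{64} * 1024 * 1024;
  return value;
}

inline bool& fake_is_integrated()
{
  static bool value = false;
  return value;
}

// Throws std::bad_alloc when the request plus padding exceeds free memory.
// Equivalent to (alloc_bytes + padding_bytes) > free_bytes, but written so
// that the arithmetic cannot overflow.
inline void check_free_memory(std::size_t alloc_bytes, std::size_t free_bytes)
{
  if (free_bytes < padding_bytes || alloc_bytes > free_bytes - padding_bytes)
  {
    throw std::bad_alloc{};
  }
}

// Device-style allocation: always checked. std::malloc stands in for cudaMalloc.
inline void* checked_device_alloc(std::size_t bytes)
{
  check_free_memory(bytes, fake_free_bytes());
  void* ptr = std::malloc(bytes);
  if (ptr == nullptr && bytes != 0)
  {
    throw std::bad_alloc{};
  }
  return ptr;
}

inline void checked_device_free(void* ptr)
{
  std::free(ptr);
}

// Host allocation via new/delete: checked against free device memory only
// when host and device share memory (integrated systems).
inline void* checked_host_alloc(std::size_t bytes)
{
  if (fake_is_integrated())
  {
    check_free_memory(bytes, fake_free_bytes());
  }
  return ::operator new(bytes);
}

inline void checked_host_free(void* ptr)
{
  ::operator delete(ptr);
}

} // namespace sketch
```

With 64 MiB reported free, a 49 MiB request fails the check (49 + 16 > 64) before any allocation is attempted, which is what lets a test skip a case rather than push the system into swap.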

Description

Checklist

New or existing tests cover these changes.

Several ADL functions for `thrust::detail::vector_base` were defined in the `thrust::` namespace, but should be in `thrust::detail`, otherwise custom aliases / subclasses of `vector_base` outside of the `thrust::` namespace will not find them. The `thrust::host_vector` and `thrust::device_vector` classes would find them by happenstance from pulling `thrust::` namespace functions into the ADL overload set. This commit moves these `vector_base` ADL functions (operator==, operator!=, swap) into the appropriate `thrust::detail::` namespace so they can be found reliably.
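The lookup behavior behind this change can be reproduced without Thrust. The sketch below uses hypothetical names throughout (`lib`, `user`, `same_contents`, `same_contents_outer`, `finds_outer`) and named functions instead of `operator==`, `operator!=`, and `swap`, purely to make the detection easy to write. It shows that a function declared beside the base class is found by ADL for a subclass in an unrelated namespace, while one declared in the enclosing namespace is found only for types that happen to live there.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace lib {
namespace detail {

template <typename T>
struct vector_base
{
  T value;
  explicit vector_base(T v) : value(v) {}
};

// Declared in the same namespace as vector_base: ADL finds this for any
// type that derives from vector_base, wherever that type is defined.
template <typename T>
bool same_contents(const vector_base<T>& a, const vector_base<T>& b)
{
  return a.value == b.value;
}

} // namespace detail

// Declared one namespace up: ADL finds this only when `lib` itself is an
// associated namespace of the argument type.
template <typename T>
bool same_contents_outer(const detail::vector_base<T>& a, const detail::vector_base<T>& b)
{
  return a.value == b.value;
}

// A vector type inside `lib`, analogous to a library-provided vector.
template <typename T>
struct host_vector : detail::vector_base<T>
{
  explicit host_vector(T v) : detail::vector_base<T>(v) {}
};

} // namespace lib

namespace user {

// A custom subclass outside `lib`, analogous to a test-library wrapper.
template <typename T>
struct my_vector : lib::detail::vector_base<T>
{
  explicit my_vector(T v) : lib::detail::vector_base<T>(v) {}
};

} // namespace user

// Detects whether an unqualified call to same_contents_outer resolves via ADL.
template <typename V, typename = void>
struct finds_outer : std::false_type
{};

template <typename V>
struct finds_outer<V,
                   decltype(void(same_contents_outer(std::declval<const V&>(),
                                                     std::declval<const V&>())))>
    : std::true_type
{};

// `lib` is an associated namespace of lib::host_vector, so the call resolves.
static_assert(finds_outer<lib::host_vector<int>>::value, "found for lib types");
// For user::my_vector only `user` and `lib::detail` are associated.
static_assert(!finds_outer<user::my_vector<int>>::value, "not found for user types");
```

An unqualified call such as `same_contents(user::my_vector<int>(3), user::my_vector<int>(3))` compiles from any namespace, which is the reliability the move into the `detail` namespace is after.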

gevtushenko

I tried running the batched memcpy test on a Jetson Orin Nano Developer Kit (6.3 GB). Some test cases were skipped, which indicates that the new logic works. But then the test went 800 MB into swap, ran for another 20 minutes, and was killed by the OS. This is huge progress compared to the initial hang, and the code looks much better than what I initially suggested. I'd still like to get the test to run to completion, though. My intuition is that this would require checking available device memory when allocating host vectors (at least on Tegra). Since it's a small code change, I think it makes sense to try it as part of this PR, but if you'd rather experiment in a follow-up PR, please file an issue.

alliepiper · 2024-02-13T17:22:15Z

Interesting, I was unable to repro hangs from host allocations on Orin. I'll add that check for integrated systems and we can retry.

alliepiper · 2024-02-13T21:34:12Z

@gevtushenko I added a device mem check for host allocations on integrated systems and cleaned up those API examples. Can you test this on your small Orin board again?

gevtushenko

The batched memcpy and copy tests now run to completion.

alliepiper requested review from a team as code owners January 10, 2024 20:34

alliepiper requested review from elstehle and wmaxey January 10, 2024 20:34

alliepiper marked this pull request as draft January 10, 2024 20:34

alliepiper removed request for elstehle and wmaxey January 10, 2024 20:34

alliepiper force-pushed the c2h_checked_allocator branch 3 times, most recently from 35c1c73 to a6ba0ba January 12, 2024 19:29

alliepiper force-pushed the c2h_checked_allocator branch from 2036a23 to 393a950 January 24, 2024 17:48

alliepiper added 12 commits January 25, 2024 07:13

Replace thrust vectors with c2h::*vector wrappers.

6c951b7

These replace the device vector allocator with a custom version that checks the amount of free device memory before calling cudaMalloc. Ref issue NVIDIA#1212.

Add missing includes to test utility header.

6b07e6d

Refactor test_device_batch_copy to use c2h vectors

d3dac43

Improve diagnostic when batch copy runs out of memory.

480b76f

Convert final std::vector usage in batch copy test to c2h

032623f

Port batch memcpy test to use c2h vectors

f2fab04

Also removed benchmarking code rather than porting, since benchmarks are now handled separately from tests.

Add a checked-allocator exec policy for CUB tests.

ced3cdc

Use new policy in batch memcpy test.

c719d1a

Use c2h::device_policy in catch2 tests.

3102e94

WAR MSVC/nvcc bug by changing try-block syntax.

38f5b09

WAR spurious failure when CUB_SEPARATE_CATCH2=OFF.

df9f779

alliepiper force-pushed the c2h_checked_allocator branch from 393a950 to 3bbd937 January 26, 2024 16:54

alliepiper added 2 commits January 26, 2024 17:19

Split checked allocator utilities into distinct headers.

726926c

Add some padding to the free-memory checks.

bc0c868

alliepiper force-pushed the c2h_checked_allocator branch from 3bbd937 to bc0c868 January 26, 2024 17:20

alliepiper added 3 commits January 26, 2024 18:30

Port recent changes to use checked allocators.

8276a6b

Make c2h::device_policy a global static.

5bb2fdc

The function-scope static approach resulted in cudaErrorSymbolNotFound.

Don't use an alias for checked_cuda_allocator.

d8e712d

This WARs another batch of cudaErrorSymbolNotFound.

alliepiper added 2 commits January 28, 2024 20:57

Update copyright dates on new files.

f2b1f95

Add tests for checked allocator utilities.

e22ab8e

alliepiper changed the title ~~EXPERIMENTAL: Replace thrust vectors with c2h::*vector wrappers in new CUB tests.~~ Replace thrust vectors with c2h::*vector wrappers in new CUB tests. Jan 29, 2024

alliepiper marked this pull request as ready for review January 29, 2024 17:52

alliepiper requested a review from a team as a code owner January 29, 2024 17:52

alliepiper requested review from miscco, elstehle and gevtushenko January 29, 2024 17:52

alliepiper changed the title ~~Replace thrust vectors with c2h::*vector wrappers in new CUB tests.~~ Use checked allocators in CUB catch2 tests Jan 29, 2024

gevtushenko reviewed Jan 31, 2024

alliepiper added 5 commits February 2, 2024 15:21

Remove unneeded stream.

7e4158d

Use thrust vectors in tests that are used for doc examples,

8f3cf24

Remove unneeded stream.

717607c

Remove outdated warning about test header order.

c08a803

Update test overview docs with info on new c2h utils.

5472f84

alliepiper requested a review from gevtushenko February 2, 2024 18:43

gevtushenko approved these changes Feb 13, 2024

cub/test/catch2_test_device_for_api.cu (outdated, resolved)

alliepiper added 3 commits February 13, 2024 20:18

Revert to thrust vectors in API examples.

44d428f

Make thrust::mr::new_delete_resource customizable.

6f68422

Check free device mem before host allocs on integ systems.

d739447

Add missing header.

4fd323a

gevtushenko approved these changes Feb 14, 2024

alliepiper added 2 commits February 14, 2024 14:50

Merge remote-tracking branch 'origin/main' into c2h_checked_allocator

1bd6c4a

Update merge sort tests to use checked allocators.

0162cba

alliepiper merged commit 2fd3b8c into NVIDIA:main Feb 15, 2024
538 checks passed

alliepiper deleted the c2h_checked_allocator branch February 15, 2024 00:58

gevtushenko mentioned this pull request Feb 21, 2024

Fix batch copy / memcpy tests on Orin #529

Closed

2 tasks
