Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Overhaul CUB test suite. #403

Merged
merged 2 commits into from
Dec 15, 2021
Merged

Conversation

alliepiper
Copy link
Collaborator

@alliepiper alliepiper commented Nov 9, 2021

Missing test cases

Issue NVIDIA/cccl#881 reported that we were missing several test cases in CUB
that were ifdef'd out. This patch enables most of those tests, though
CDP tests are not added here.

Some other deficiencies were addressed as they were noticed, for instance,
adding value_types other than unsigned char to test_block_histogram.

After these changes, the test suite build took >12 hours to build...prompting the rest of the changes in this PR, which lowered the serial build time to about 1h20m.

Removed MINIMAL and BENCHMARK variants

The way we split up tests into "BENCHMARK", "MINIMAL", and "THOROUGH"
variants wasn't well suited for regression testing, as a lot of
redundant code paths were generated between the various test
executables. These have been removed, leaving only the "THOROUGH" tests,
which should capture all test cases.

Benchmarks should go into the new thrust_benchmark project.

New %PARAM% mechanism

This patch adds a new mechanism that allows a test to include a comment such as:

// %PARAM% TEST_FOO foo 0:1:2
// %PARAM% TEST_BAR bar 4:8

CMake will parse these out and generate multiple test executables for
each combination of parameters, e.g:

cub.test.baz.foo_0.bar_4 -DTEST_FOO=0 -DTEST_BAR=4
cub.test.baz.foo_0.bar_8 -DTEST_FOO=0 -DTEST_BAR=8
cub.test.baz.foo_1.bar_4 -DTEST_FOO=1 -DTEST_BAR=4
cub.test.baz.foo_1.bar_8 -DTEST_FOO=1 -DTEST_BAR=8
cub.test.baz.foo_2.bar_4 -DTEST_FOO=2 -DTEST_BAR=4
cub.test.baz.foo_2.bar_8 -DTEST_FOO=2 -DTEST_BAR=8

This can be used to quickly split up problematically large tests. See
the new cub/test/README.md file for more details. The
PrintNinjaBuildTimes.cmake file from Thrust was used to identify
tests that needed to be split.

Remove non-CUB test code

Several tests were testing Thrust APIs. This isn't necessary, as Thrust
has it's own test suite. These tests have been removed.

Removed g_repeat options

This isn't needed for regression testing and has been removed. Some
of the other command line options could also be removed now that
benchmarking isn't handled by these regression tests, but this is a start.

@alliepiper alliepiper added P0: must have Absolutely necessary. Critical issue, major blocker, etc. only: gpuci Changes to gpuCI only. Doesn't need internal NVIDIA CI. labels Nov 9, 2021
@alliepiper alliepiper added this to the 1.16.0 milestone Nov 9, 2021
@alliepiper alliepiper self-assigned this Nov 9, 2021
@alliepiper alliepiper marked this pull request as draft November 9, 2021 21:37
alliepiper added a commit to alliepiper/thrust that referenced this pull request Nov 9, 2021
@alliepiper alliepiper added the testing: gpuCI in progress Started gpuCI testing. label Nov 9, 2021
@alliepiper
Copy link
Collaborator Author

gpuCI: NVIDIA/thrust#1563

@alliepiper
Copy link
Collaborator Author

DVS CL: 30772865
gpuCI: NVIDIA/thrust#1564

@alliepiper alliepiper added testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). testing: gpuCI passed Passed gpuCI testing. and removed testing: gpuCI in progress Started gpuCI testing. testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). labels Dec 13, 2021
@alliepiper alliepiper added the testing: internal ci passed Passed internal NVIDIA CI (DVS). label Dec 14, 2021
@alliepiper alliepiper marked this pull request as ready for review December 14, 2021 18:21
Copy link
Collaborator

@gevtushenko gevtushenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! I've measured 18% build time speedup (MT) and about 2x runtime speedup.

test/CMakeLists.txt Show resolved Hide resolved
Issue #399 reported that we were missing several test cases in CUB
that were ifdef'd out. This patch enables most of those tests, though
CDP tests are not added here.

Some other deficiencies were addressed as they were noticed, for instance,
adding value_types other than unsigned char to test_block_histogram.

The way we split up tests into "BENCHMARK", "MINIMAL", and "THOROUGH"
variants wasn't well suited for regression testing, as a lot of
redundant code paths were generated between the various test
executables. These have been removed, leaving only the "THOROUGH" tests,
which should capture all test cases.

Benchmarks should go into the new `thrust_benchmark` project.

Some tests also took an excessively long time to build, especially after
enabling the missing test cases from #399. This patch adds a new
mechanism that allows a test to include a comment such as:

```
// %PARAM% TEST_FOO foo 0:1:2
// %PARAM% TEST_BAR bar 4:8
```

CMake will parse these out, and generate multiple test executables for
each combination of parameters, e.g:

```
cub.test.baz.foo_0.bar_4 -DTEST_FOO=0 -DTEST_BAR=4
cub.test.baz.foo_0.bar_8 -DTEST_FOO=0 -DTEST_BAR=8
cub.test.baz.foo_1.bar_4 -DTEST_FOO=1 -DTEST_BAR=4
cub.test.baz.foo_1.bar_8 -DTEST_FOO=1 -DTEST_BAR=8
cub.test.baz.foo_2.bar_4 -DTEST_FOO=2 -DTEST_BAR=4
cub.test.baz.foo_2.bar_8 -DTEST_FOO=2 -DTEST_BAR=8
```

This can be used to quickly split up problematically large tests. See
the note at the top of cub/test/CMakeLists.txt for more details.

The PrintNinjaBuildTimes.cmake file from Thrust was used to identify
tests that needed to be split.

Several tests were testing Thrust APIs. This isn't necessary, as Thrust
has it's own test suite. These tests have been removed.

This isn't needed for regression testing and has been removed. Some
of the other command line options could also be removed now that
benchmarking isn't handled by these regression tests, but this is a start.

Extended testing revealed that the cub::BlockHistogram algorithm's
behavior is undefined when input values are outside of [0, BINS). Added
this info to the algorithm docs.

* test_device_histogram from 15m -> 35s.
* test_device_run_length_encode from 7m -> 3s.
* test_device_scan tests from <3m  -> <4s.

# Conflicts:
#	test/test_device_radix_sort.cu

# Conflicts:
#	test/test_device_radix_sort.cu
@alliepiper
Copy link
Collaborator Author

alliepiper commented Dec 15, 2021

DVS CL: 30781720

gpuCI: NVIDIA/thrust#1564

@alliepiper alliepiper added testing: gpuCI in progress Started gpuCI testing. testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). and removed testing: gpuCI passed Passed gpuCI testing. testing: internal ci passed Passed internal NVIDIA CI (DVS). labels Dec 15, 2021
@alliepiper alliepiper merged commit 78d5579 into NVIDIA:main Dec 15, 2021
@alliepiper alliepiper deleted the gh.399/test_overhaul branch December 15, 2021 22:18
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
only: gpuci Changes to gpuCI only. Doesn't need internal NVIDIA CI. P0: must have Absolutely necessary. Critical issue, major blocker, etc. testing: gpuCI in progress Started gpuCI testing. testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants