This repository was archived by the owner on Mar 21, 2024. It is now read-only.

OpenMP and TBB tests should run serially #1168

@alliepiper

Issue

Running the OpenMP (GCC 7.5) tests in parallel using the ctest -jXX option is dramatically slower than running them one at a time: in the measurements below, -j2 takes roughly 37x the walltime and about 56x the CPU time of -j1. The OpenMP thread scheduling in this implementation doesn't seem to take other processes into account.

| CTest Parallelism (cpp.omp.cpp14) | CPU Time (s) | Walltime (mm:ss) |
|---|---|---|
| -j1 | 597 | 1:16 |
| -j2 | 33574 | 46:42 |

Fix incoming

I've set the RUN_SERIAL test property (https://cmake.org/cmake/help/v3.10/prop_test/RUN_SERIAL.html) on the OMP tests in my changes for #1159.
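For reference, a minimal sketch of setting this property on a single test. The test and executable names are hypothetical placeholders, not the names used in the repository or in #1159:

```cmake
# Hypothetical test and executable names, for illustration only.
add_test(NAME example.omp.test COMMAND example_omp_test)

# RUN_SERIAL tells CTest not to run this test concurrently with any
# other test, regardless of the -j value passed to ctest.
set_tests_properties(example.omp.test PROPERTIES RUN_SERIAL ON)
```

With this set, ctest -jXX still schedules the remaining tests in parallel but runs the marked test by itself.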

Other backends

TBB

TBB has some scaling issues, but doesn't fall off a cliff at -j2. On a 6-core x 2-SMT CPU, TBB scales well for a small number of processes:

| CTest Parallelism (cpp.tbb.cpp14) | Walltime (s) |
|---|---|
| -j1 | 81 |
| -j2 | 55 |
| -j4 | 45 |
| -j6 | 50 |
| -j8 | 60 |
| -j12 | 68 |

Initial plan (superseded by the update below): Since CMake doesn't offer parallelism properties with finer control than RUN_SERIAL and all of the parallel configs are faster than -j1, we should just leave this as-is, and these tests would continue to run at the requested parallelism.

Update: After discussion with @griwes and @brycelelbach, TBB tests should also be marked RUN_SERIAL. The increased runtime is worth it to ensure that each individual test process runs at full threaded parallelism.
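One way the OMP and TBB decision could be expressed in CMake is sketched below. The function name and the device_system argument are assumptions made for illustration and are not taken from the repository's build scripts:

```cmake
# Hypothetical helper: marks a test RUN_SERIAL when its device backend
# is OMP or TBB, and leaves other backends (e.g. CUDA) untouched.
function(example_configure_test_parallelism test_name device_system)
  if ("${device_system}" STREQUAL "OMP" OR
      "${device_system}" STREQUAL "TBB")
    # These backends run multithreaded inside each test process, so
    # the test should not share the machine with other tests.
    set_tests_properties(${test_name} PROPERTIES RUN_SERIAL ON)
  endif()
endfunction()

# Usage, assuming the test was already registered with add_test():
# example_configure_test_parallelism(example.tbb.test "TBB")
```

CUDA tests passed through such a helper would keep running at whatever parallelism is requested with -j.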

CUDA

CUDA scales very favorably with more CPUs, at least in the range I can test. On the same CPU as above while running tests on both GV100 and GP100:

| CTest Parallelism (cpp.cuda.cpp14) | Walltime (s) |
|---|---|
| -j1 | (a "Very Long Time") |
| -j6 | 208 |
| -j8 | 199 |
| -j10 | 197 |
| -j12 | 176 |

These tests will continue to run at the requested parallelism.

Labels: only: cmake (CMake changes only. Doesn't need internal NVIDIA CI.)
