Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ctest parallel #2367

Merged
merged 4 commits into from
Oct 1, 2021
Merged

ctest parallel #2367

merged 4 commits into from
Oct 1, 2021

Conversation

dschwoerer
Copy link
Contributor

I am not sure it is actually working as expected.

I tried with set(BOUT_TEST_OPTIONS_PROCESSORS 2) and ctest -j 2 and it never the less run 2 tests in parallel.

Might have been a caching issue - not sure ...

Copy link
Member

@ZedThree ZedThree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we're also missing PROCESSORS in test-drift-instability*

cmake/BOUT++functions.cmake Outdated Show resolved Hide resolved
cmake/BOUT++functions.cmake Outdated Show resolved Hide resolved
@ZedThree
Copy link
Member

ZedThree commented Jul 5, 2021

I'm seeing similar behaviour, even after adding PROCESSOR_AFFINITY. Changing the number of jobs (ctest -j <jobs>) does change the number of tests that run at the same time, but they seem to only ever use the first four cores on my machine.

This feels like a bug, the documentation explicitly says:

The set of processors chosen will be disjoint from the processors assigned to other concurrently running tests that also have the PROCESSOR_AFFINITY property enabled.

@ZedThree
Copy link
Member

ZedThree commented Jul 5, 2021

Ok, think I understand this a bit better now. Here, "affinity" means a set of logical cores that a process is restricted to running on.

The problem is that each test gets run with the affinity set to the correct set of processors, but this only applies to the parent process -- which is runtest in most cases. mpirun basically ignores the parent affinity and all the MPI processes launched from the individual runtest processes just use the first few cores. As far as I can tell, the parallel job implementation in test_suite suffers from the same problem: mpirun isn't bound to the affinity of its parent process.

It's pretty trivial to get the affinity of the current process: os.sched_getaffinity(0) in boututils.run_wrapper.launch, but then I had to set the cpu-list argument to mpirun, which I'm pretty sure is non-portable:

    procs = ",".join(map(str, os.sched_getaffinity(0)))
    cmd = f"{runcmd} {nproc} -cpu-list {procs} {command}"

I also needed to add --use-hwthread-cpus to my MPIRUN command in order to use the hyperthreads, otherwise mpirun just fails silently if the cpu-list includes any of the non-physical core IDs.

Another complication: if you use a system cmake, then it's almost certainly built against the system libuv, which likely doesn't include the affinity-setting stuff (I don't understand the libuv release cycle, it's in master but not v1.x which appears to be their stable branch). That means that setting the affinity doesn't work at all.

Sorting all of these issues out let me run the tests in parallel and actually use all the cores on my machine, meaning I could run everything in about 4 minutes.

PROCESSORS only takes one argument
We need to pass PROCESSORS as a property and need to enable processor
afinity
@dschwoerer
Copy link
Contributor Author

Some timings (in seconds, on raven, 72 physical cores)

#cores 32 16 1
test_suite 96 120 507
ctest 170 201 682
cmake + make check 588 ? 586
cmake + test_suite 184 ? ?
ctest + -cpu-list 116 133 697

cmake seems to be generally a bit slower, probably due to -O0 rather then -O2 by default.

Running with 144 threads doesn't help a lot:

32/41 Test #38: MMS-diffusion ......................   Passed   31.62 sec
33/41 Test #15: test-initial .......................   Passed   32.47 sec
34/41 Test #30: test-squash ........................   Passed   32.97 sec
35/41 Test #39: MMS-spatial-fci ....................   Passed   35.55 sec
36/41 Test  #4: test-command-args ..................   Passed   37.70 sec
37/41 Test  #7: test-cyclic ........................   Passed   37.74 sec
38/41 Test #17: test-interpolate ...................   Passed   40.87 sec
39/41 Test  #8: test-delp2 .........................   Passed   47.22 sec
40/41 Test #20: test-io ............................   Passed   86.14 sec
41/41 Test #16: test-interchange-instability .......   Passed   94.61 sec

98% tests passed, 1 tests failed out of 41

Total Test time (real) =  94.62 sec

@dschwoerer dschwoerer marked this pull request as ready for review September 21, 2021 16:05
bendudson
bendudson previously approved these changes Sep 30, 2021
Enables these tests to be run in parallel.
Might be safe to allow more processors, but probably
not going to work for arbitrary number (e.g 3)
@bendudson bendudson merged commit 23743b5 into next Oct 1, 2021
@bendudson bendudson deleted the ctest-parallel branch October 1, 2021 03:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants