
CMake: add_test(): allow concurrent runs of tests with shared targets #14713

Merged
6 commits merged into dealii:master from the parallelize_tests branch on Jan 26, 2023

Conversation

@tamiko (Member) commented Jan 24, 2023:

To make this possible, we define an additional test (and test target) that depends on the executable target and ensures that the executable is in place before the tests that share it are invoked concurrently.
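
For readers unfamiliar with ctest fixtures, here is a minimal sketch of the mechanism (the target and test names are made up for illustration; this is not the verbatim macro code):

add_test(NAME test_dependency/my_test.release.executable
  COMMAND "${CMAKE_COMMAND}" --build "${CMAKE_BINARY_DIR}" --target my_test.release
  )
set_tests_properties(test_dependency/my_test.release.executable PROPERTIES
  FIXTURES_SETUP test_dependency/my_test.release.executable
  )

add_test(NAME category/my_test.mpirun=2.release
  COMMAND mpirun -np 2 $<TARGET_FILE:my_test.release>
  )
set_tests_properties(category/my_test.mpirun=2.release PROPERTIES
  FIXTURES_REQUIRED test_dependency/my_test.release.executable
  )

Because every run test requires the fixture that the "executable" test sets up, ctest builds the shared target once before any of the concurrent run tests start.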

I have also removed the interrupt_guard.cc logic. I think it worked around a CMake bug where an interrupted run would nevertheless leave a partial output file behind, causing a subsequent diff to fail. In any case, I was not able to trigger this issue any more, so I have removed the logic.

Fixes #14648

@tamiko (Member Author) commented Jan 24, 2023:

@blaisb Would you mind giving this pull request a try?

@tamiko (Member Author) commented Jan 24, 2023:

All: I want to run the entire testsuite against this pull request to see whether the increased number of targets (for parallel tests) affects its overall running speed too much.

Edit: I see no change in execution speed for the testsuite - actually it seems to be minimally faster with this patch.

I have also removed the MPI oversubscription - this is not needed any more. Let's see how this affects the testsuite.

Edit: Even faster. Not overcommitting improves performance slightly.

@blaisb (Member) commented Jan 25, 2023:

@tamiko I tested the PR on the Lethe tests. Everything works perfectly fine. It's even faster than before (slightly).
Thank you so much. I owe you a beer next time we meet.

Comment on lines +415 to +442
#
# Determine whether the test shares a common executable target. This
# involves tests with .threads=N. and .mpirun=N. annotation, as well
# as tests with parameter files (that might share a common executable
# target).
#
# In this case we have to make sure that concurrently invoking the
# test does not accidentally trigger a concurrent build of the
# executable target. We ensure this by declaring an additional test
# that only builds the shared target / ensures the shared target is
# present. All run tests then require this test target as a "setup
# fixture", see
# https://cmake.org/cmake/help/latest/prop_test/FIXTURES_REQUIRED.html#prop_test:FIXTURES_REQUIRED
#
set(_shared_target FALSE)
if(NOT "${_n_cpu}${_n_threads}" STREQUAL "00" OR "${_source_file}" MATCHES "(prm|json)$")
  set(_shared_target TRUE)

  #
  # Build system-internal target name and final test name for the
  # "executable" test. We have to make sure that the target and test
  # names stay the same independent of test name and test category,
  # thus the rather funny name:
  #
  set(_test_executable_target "test_dependency.${_target}.executable")
  set(_test_executable_full "test_dependency/${_target}.executable")
endif()

Member:

Why this complexity? Why don't you just break any test into the executable and the execution?

Member Author:

@bangerth Because make cannot handle 20k more top level targets. Otherwise I wouldn't add this complexity.

@tamiko (Member Author) commented Jan 25, 2023:

Some more context: make does not scale well with the number of top-level targets (at least not in the way CMake generates them); the cost grows at least superlinearly. For reference, invoking make on any of the top-level targets in some of the larger test subdirectories takes about 2-4 seconds. Multiply this by the number of tests and we get a lot of wasted CPU hours on our testers.

In addition, we are at the limit of what CDash can handle and process. I think that when we hit the 20k test mark we will have to think about another long-term solution, so immediately increasing the number of test artifacts to 30k is an issue...

One possible solution to the make problem would be to move the run and comparison parts entirely into shell scripts. Then, only building the target would be handled in the build system and we could do the above split unconditionally.

The downside of this is that tests would be rerun unconditionally.

Member:

OK. Would you add a comment to the top of the section I marked up that explains, in two sentences, the issue with the number of targets?

Member Author:

Let me address this in a separate PR.

Comment on lines +439 to +440
set(_test_executable_target "test_dependency.${_target}.executable")
set(_test_executable_full "test_dependency/${_target}.executable")

Member:

That's hard to keep apart, with the difference only being whether you use a dot or a slash. Could we indicate the difference in a better way?

Member Author:

This is directly adapted from what is already used as a naming convention in this file:

  set(_test_target    ${_category}.${_test_name}) # diff target name
  set(_test_full      ${_category}/${_test_name}) # full test name
  set(_test_directory ${CMAKE_CURRENT_BINARY_DIR}/${_test_name}.${_build_lowercase}) # directory to run the test in

I can replace _full by _full_name in the variable name if you want to. (_test_full_test_name sounds a bit silly.)

But in general I need a name for a top level target in the build system (that does not contain a slash /) and I need a name for the test in the form category/test.

That is, if we want to keep the category/test naming convention. I am happy to accommodate whatever you prefer, but I suggest we make this part of a separate discussion and a separate pull request.

Member:

Could we at least mark up the name of the executable as such then? Maybe exe.${_category}.${_test_name} or some such?

Comment on lines +502 to +505
set_tests_properties(${_test_executable_full} PROPERTIES
  LABEL "test_dependency"
  TIMEOUT ${TEST_TIME_LIMIT}
  FIXTURES_SETUP ${_test_executable_full}

Member:

Is this right? This suggests that the test is its own setup?

Member Author:

This is correct. A "fixture" is an arbitrary name used to declare ordering requirements. The important bit is that the same name appears in FIXTURES_SETUP and in FIXTURES_REQUIRED.
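
For illustration, a minimal generic example (names are made up and the commands are placeholders):

add_test(NAME build_widget COMMAND "${CMAKE_COMMAND}" -E echo "building widget")
set_tests_properties(build_widget PROPERTIES FIXTURES_SETUP widget)

add_test(NAME run_widget COMMAND "${CMAKE_COMMAND}" -E echo "running widget")
set_tests_properties(run_widget PROPERTIES FIXTURES_REQUIRED widget)

Here "widget" is just a label; ctest guarantees that build_widget runs (successfully) before run_widget. In this pull request the same string happens to be used both as the test name and as the fixture name, which is perfectly legal.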

Comment on lines -546 to -556
# Limit concurrency of mpi tests. We can only set concurrency for
# the entire test, which includes the compiling and linking stages
# that are purely sequential. There is no good way to model this
# without unnecessarily restricting concurrency. Consequently, we
# just choose to model an "average" concurrency as one half of the
# number of MPI jobs.
#
if(_n_cpu GREATER 2)
  math(EXPR _slots "${_n_cpu} / 2")
  set_tests_properties(${_test_full} PROPERTIES PROCESSORS ${_slots})
endif()

Member:

Why remove this?

Member Author:

Because, previously, an MPI test consisted of first compiling (serial) and then running the test (parallel). The issue was that we had to limit resources so that we do not overload the testing machine, but at the same time we had to be a bit aggressive so that the serial compilation did not block off too many execution slots of ctest -jX.

But now we make sure that the target is compiled in a separate test artifact, so we can simply reserve the resources that the run itself will actually need.

Member:

Right, but don't you still want the execution of the test to be marked up with a specific number of processors, even though the compilation of the test requires only one processor?

Member Author:

But the else() statement does! 😃
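
For reference, a sketch of what such a property setting looks like (assumed for illustration, not a verbatim quote of the else() branch):

set_tests_properties(${_test_full} PROPERTIES PROCESSORS ${_n_cpu})

Since compilation now happens in the separate test_dependency fixture test, the run test can claim exactly the number of slots it actually occupies.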

cmake/macros/macro_deal_ii_add_test.cmake (outdated review thread, resolved)
@tamiko added this to the Release 9.5 milestone on Jan 26, 2023

@tamiko (Member Author) commented Jan 26, 2023:

@bangerth What if I simply split this pull request up into three separate changesets and we restart the review? I know it's CMake; I know the syntax is annoying and hard to read.

@bangerth (Member):

I don't want to stand in the way either, though. If you're confident that it works, I'm ok with just merging. My questions really are minor.

@drwells (Member) left a review comment:

Bruno and our CI are happy so I approve!

@bangerth merged commit 5f5bdd0 into dealii:master on Jan 26, 2023

@bangerth (Member):

Let's do it then!

@tamiko deleted the parallelize_tests branch on July 7, 2023.

Successfully merging this pull request may close the following issue:

Tests stop running in parallel after a certain number of tests have been carried out (#14648)

4 participants