Parallel tests #2000
Parallel builds work for me on Marconi. Definite improvement 👍
It seems as though the tests still run consecutively though, and it looks like that is not the intention (#cores:8, etc.)? If I do `make -j 32 check-integrated-tests`, the sum of the times for running the individual tests is equal to the total run time. I haven't tried to dig into why...
When I run `make -j 32 check` the unit tests, integrated tests and MMS tests all start building at the same time, and their output overlaps. That's what always used to happen - just wondering if the jobserver takes that into account, or might it be more efficient to run something like `make -j 32 check-unit-tests && make -j 32 check-integrated-tests && make -j 32 check-mms-tests`?
Would you mind sharing instructions on how to run recent BOUT++ on marconi?
The tests should run in parallel. That sounds like a bug if the total runtime equals the sum of the runs, especially on 32 cores. I'll investigate ...
Yes, that is not nice. I have been wondering whether it would be nicer to implement the job server on a higher level, so it can run all tests in parallel. That would produce nicer output. Also, the user probably doesn't care which tests are run ... Running one after another is what I thought about; that could be achieved by making the different targets sequential in a recipe. However, the unit tests run in serial, so it makes sense to run them in parallel if you have many cores. With few cores (e.g. travis) it does not, as then we might schedule some expensive tests (i.e. tests requiring many cores) while other things are running, and it would be better to wait until the other tests are done. However, in that case every test_suite gets only one job, so it doesn't know about the other tests. Moving the script to
I tried to sum the output (running with
So maybe something is wrong on marconi? Can you provide more details? |
@dschwoerer I'm compiling on Marconi with gcc-7.3.0 and OpenMPI-4.0.1, which is fiddly because Marconi don't provide modules for most of the libraries. Compiling with intel should be simpler, but I haven't tried that in ages. I've described the current setup here https://gitlab.com/CCFE_SOL_Transport/STORM/-/wikis/setup/Compiling%20BOUT%20and%20external%20libraries%20on%20Marconi Running
(then I actually cancelled it because I got impatient).
and many outputs appeared more or less simultaneously. I tried running all the tests - |
Running |make -j 32 check-integrated-tests| and |make -j 32
build-check-integrated-tests| seem to behave quite differently. |make -j
32 check-integrated-tests| seems to be building in serial - it printed
|======= Making 57 integrated tests ======== test-include S - all_tests
=> False test-laplace-petsc3d ✓ 49.267 s test-compile-examples S -
all_tests => False |
(then I actually cancelled it because I got impatient). |make -j 32
build-check-integrated-tests| printed
Yes, the current setup only prints once the thing (test/compile) is
finished. Before that, it doesn't say anything.
So this might be confusing. I am thinking of replacing it with a
progress bar and only showing all results (ordered by name?) at the end.
That might be better. Combined with a timeout, this should also solve
issues where a test doesn't finish, which we handled in the past by
printing the name as we start.
and many outputs appeared more or less simultaneously.
Yes, because that prints as it starts, not once it is finished.
I tried running all the tests - |make -j 1 check-integrated-tests|
reported |All tests passed in 358.86 seconds| for the run, and your awk
command gave 494.902. |make -j 20 check-integrated-tests| reported |All
tests passed in 356.17 seconds| and your awk command gave 491.777.
This is odd.
I tried -j 1 as well, and got a sum of 287.971 and the total time
288.51, so what I expected.
What python version are you using?
Will try on marconi ...
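
The comparison being made here (sum of the per-test times versus the reported total) can be sketched in Python. This is not the awk command from the thread, which is not preserved on this page. It assumes result lines look like the `test-laplace-petsc3d ✓ 49.267 s` line quoted above; the helper name `sum_test_times` and the test name `test-foo` are made up for illustration.

```python
import re

# Assumed line format: "<test-name> ✓ <seconds> s". Skipped tests
# (marked "S") carry no time and are ignored.
TIME_RE = re.compile(r"^\s*(\S+)\s+✓\s+([0-9]+(?:\.[0-9]+)?)\s*s\b")


def sum_test_times(lines):
    """Add up the run times of all passed tests in the given output lines."""
    total = 0.0
    for line in lines:
        match = TIME_RE.match(line)
        if match:
            total += float(match.group(2))
    return total


sample = [
    "test-include S - all_tests => False",
    "test-laplace-petsc3d ✓ 49.267 s",
    "test-foo ✓ 1.5 s",  # hypothetical test name
]
print(sum_test_times(sample))
```

If the tests really overlap in time, this sum should come out noticeably larger than the wall-clock total that the run reports.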
Am using Python-3.6.4. |
On my Marconi setup, after a
doesn't seem like the builds are going in parallel... |
Very strange. I installed python-3.6.4 from source on marconi and am still unable to reproduce this. I am not able to run the tests, because I cannot get netcdf to work and the tests need that, but don't have a
I did a Running
Watching those results come in, they do seem to be running in parallel. |
Those times I just posted included compiling. If I start with the tests already compiled, running |
Thanks, I found the issue. Old Will investigate ... |
Also make sure we read non-blocking
93120b1 should add support for GNU Make 3.82 (marconi) |
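
For readers unfamiliar with the jobserver: GNU Make passes the jobserver pipe to sub-processes through the `MAKEFLAGS` environment variable. Older releases such as 3.82 spell the option `--jobserver-fds=R,W`, while newer ones use `--jobserver-auth=R,W`. The following is a minimal sketch of parsing both spellings, not the code from commit 93120b1; the function name `parse_jobserver_fds` is an assumption made for this example.

```python
import re


def parse_jobserver_fds(makeflags):
    """Return (read_fd, write_fd) from a MAKEFLAGS string, or None.

    Accepts both the old "--jobserver-fds" and the newer
    "--jobserver-auth" spelling. Non-numeric forms (for example a
    named-pipe path) are not handled and yield None.
    """
    match = re.search(r"--jobserver-(?:fds|auth)=(\d+),(\d+)", makeflags)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_jobserver_fds(" --jobserver-fds=3,4 -j"))
print(parse_jobserver_fds("-j8 --jobserver-auth=5,6"))
```

In a real script the string would come from `os.environ.get("MAKEFLAGS", "")`, and a `None` result would mean falling back to running without a jobserver.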
Can confirm this works for me now. Thanks @dschwoerer! |
Thanks @dschwoerer ! Just some minor fixes. I think it's quite important you try to describe how this works so that future developers can maintain it. I'm sure it's quite clear and simple once you've read the |
- Add comments
- Rename some variables, make the distinction between jobs as tests and jobs as threads clearer
- Ensure we do not permanently change the status of the pipes. Also try to ensure no one else is trying to read before we set the pipe to non-blocking. This can otherwise cause issues with old gmake implementations.
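
The point about not permanently changing the pipe status can be illustrated with a small sketch. File status flags such as `O_NONBLOCK` belong to the open file description, so they are shared with every process that inherited the pipe, including make itself. The sketch below sets the flag only for one read and then restores it. It assumes a POSIX system and is not the code from this commit; `read_tokens_nonblocking` is a hypothetical name.

```python
import fcntl
import os


def read_tokens_nonblocking(fd, max_tokens):
    """Read up to max_tokens bytes from fd without blocking.

    O_NONBLOCK is set only for the duration of the read; the original
    flags are restored afterwards, even if the read fails.
    """
    old_flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, old_flags | os.O_NONBLOCK)
    try:
        return os.read(fd, max_tokens)
    except BlockingIOError:
        # Nothing available right now.
        return b""
    finally:
        fcntl.fcntl(fd, fcntl.F_SETFL, old_flags)


# Demonstration on a private pipe holding three one-byte tokens.
r, w = os.pipe()
os.write(w, b"+++")
flags_before = fcntl.fcntl(r, fcntl.F_GETFL)
first = read_tokens_nonblocking(r, 2)
second = read_tokens_nonblocking(r, 2)
third = read_tokens_nonblocking(r, 2)
flags_after = fcntl.fcntl(r, fcntl.F_GETFL)
print(first, second, third, flags_before == flags_after)
```

Each byte read stands for one job slot, so a real caller would have to write the same bytes back to the write end once its jobs are done.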
# We can run in parallel. As some checks require several threads to
# run, we take this into account when we schedule tests. Given a
# certain amount of threads that we are told to use, we try to run
# so many tests in parallel, that the total core count is not higher
# than this number. If this isn't possible because some jobs require
# more cores than we have threads, we run the remaining tests in
# serial. The number of parallel threads is called cost of a test.
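
The scheduling idea described in that comment can be sketched as follows. This is a simplified, batch-based illustration (first-fit over tests sorted by decreasing cost), not the scheduler in `test_suite`, which may well assign tests dynamically as threads become free. The function name `schedule` and the test names are assumptions made for this example.

```python
def schedule(costs, max_threads):
    """Group tests into batches whose total cost fits into max_threads.

    costs maps a test name to its cost (number of threads it needs).
    Returns (batches, serial): each batch can run concurrently, and the
    tests in serial need more threads than we have, so they are run one
    at a time afterwards.
    """
    parallel = sorted(
        (name for name, cost in costs.items() if cost <= max_threads),
        key=lambda name: costs[name],
        reverse=True,
    )
    serial = sorted(name for name, cost in costs.items() if cost > max_threads)

    batches = []
    for name in parallel:
        # Put the test into the first batch that still has room for it.
        for batch in batches:
            if sum(costs[n] for n in batch) + costs[name] <= max_threads:
                batch.append(name)
                break
        else:
            batches.append([name])
    return batches, serial


batches, serial = schedule({"a": 4, "b": 2, "c": 2, "d": 1, "e": 16}, 4)
print(batches, serial)
```

With four threads, test `a` fills a batch on its own, `b` and `c` share one, and `e` is too expensive to fit and is left for the serial pass.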
This is great, thanks @dschwoerer !
tests/integrated/test_suite
Outdated
self.local.req_met, self.local.req_expr = requirements.check(self.name+"/runtest")
self.local.start_time = time.time()
if not self.local.req_met:
    print(output % self.name +
I can't see where this is defined? Also, I would maybe not mix % and format
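
As an illustration of the second point, a status line can be built with a single formatting mechanism rather than combining `%` with `str.format`. The helper below is hypothetical and is not taken from `test_suite`; the message layout only mimics the skipped-test line quoted earlier in this conversation.

```python
def report_skipped(name, reason):
    """Build a one-line status message for a skipped test using only str.format."""
    return "{} S - {}".format(name, reason)


print(report_skipped("test-include", "all_tests => False"))
```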
Thanks @dschwoerer ! It seems to work pretty well, went from 139s to 32s with 32 cores. Happy for this to go in now, I'd just like to understand where that |
Resolves #1451
It now works to run `make -j 4 check` to build and run tests in parallel.
There are still further improvements possible, e.g. the jobserver isn't available to `test-compile-examples` (which isn't parallel, but should be easy to fix).
There are some drawbacks:
Because other jobs are running, the info that the job started is only printed once the job is finished.
This could be worked around by adding a time-out. Maybe 30 to 120 minutes should be safe? 10 minutes is not enough, as compiling all examples takes longer.
Also the speed-up isn't awesome, because many tests already run in parallel, so there is nothing to gain.
(`time make -j <n> check` on 12 cores, 24 threads, 6 GB RAM)
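
The time-out workaround mentioned in the drawbacks could look roughly like the sketch below, which uses the standard `subprocess.run` timeout. The helper name `run_with_timeout` is an assumption, and the limits used here are tiny so the example finishes quickly; the discussion above suggests something in the range of 30 to 120 minutes for real runs.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, returncode).

    finished is False if the command was killed because it exceeded
    timeout_s seconds; returncode is None in that case.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, None
    return True, result.returncode


# A command that finishes at once, and one that would hang for ten seconds.
quick = run_with_timeout([sys.executable, "-c", "pass"], 30)
stuck = run_with_timeout([sys.executable, "-c", "import time; time.sleep(10)"], 0.5)
print(quick, stuck)
```

With a time-out in place, a test that hangs shows up as a failure with its name attached, so the name no longer has to be printed when the test starts.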