
Parallel tests #2000

Merged 13 commits into next on May 26, 2020

Conversation

dschwoerer (Contributor)

Resolves #1451

Running make -j 4 check now builds and runs the tests in parallel.

There are still further improvements possible, e.g. the jobserver isn't available to test-compile-examples (which therefore isn't parallel, but should be easy to fix).

There are some drawbacks:
Because other jobs are running, the message that a job has started is only printed once the job has finished.

This could be worked around by adding a time-out. Maybe 30 to 120 minutes should be safe? 10 minutes is too short, as compiling all the examples takes longer.

Also, the speed-up isn't dramatic, because many tests already run in parallel internally, so there is less to gain.

jobs   time (real)
  1    5m27.523s
  2    4m19.255s
  4    3m44.568s
  8    2m25.634s
 12    2m11.765s
 16    1m48.893s
 24    2m22.265s

(time make -j <n> check on 12 cores, 24 threads, 6 GB ram)

@johnomotani (Contributor) left a comment

Parallel builds work for me on Marconi. Definite improvement 👍

It seems as though the tests still run consecutively though, and it looks like that is not the intention (#cores:8, etc.)? If I do make -j 32 check-integrated-tests, the sum of the times for running the individual tests is equal to the total run time. I haven't tried to dig into why...

When I run make -j 32 check the unit tests, integrated tests and MMS tests all start building at the same time, and their output overlaps. That's what always used to happen - just wondering if the jobserver takes that into account, or might it be more efficient to run something like make -j 32 check-unit-tests && make -j 32 check-integrated-tests && make -j 32 check-mms-tests?

@dschwoerer (Contributor, Author)

> Parallel builds work for me on Marconi. Definite improvement +1

Would you mind sharing instructions on how to run recent BOUT++ on marconi?

> It seems as though the tests still run consecutively though, and it looks like that is not the intention (#cores:8, etc.)? If I do make -j 32 check-integrated-tests, the sum of the times for running the individual tests is equal to the total run time. I haven't tried to dig into why...

The tests should run in parallel. That sounds like a bug if the total runtime equals the sum of the runs, especially on 32 cores. I'll investigate ...

> When I run make -j 32 check the unit tests, integrated tests and MMS tests all start building at the same time, and their output overlaps. That's what always used to happen - just wondering if the jobserver takes that into account, or might it be more efficient to run something like make -j 32 check-unit-tests && make -j 32 check-integrated-tests && make -j 32 check-mms-tests?

Yes, that is not nice. I have been wondering whether it would be nicer to implement the jobserver at a higher level, so it can run all tests in parallel. That would produce nicer output. Also, the user probably doesn't care which tests are run ...

Running one suite after another is what I had in mind; that could be achieved by making the different targets sequential in a recipe. However, the unit tests run in serial, so it makes sense to run them in parallel with the other suites if you have many cores. With few cores (e.g. Travis) it does not, as we might schedule an expensive test (i.e. one requiring many cores) while other things are running, when it would be better to wait until the other tests are done. In that case, however, each test_suite only gets one job, so it doesn't know about the other tests. Moving the script to tests/ and scheduling all jobs there would help in that case ...
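The scheduling idea being discussed here — treat the number of cores a test needs as its "cost", launch tests while the total cost stays within a thread budget, and fall back to serial for tests that are too wide to ever fit — can be sketched roughly like this. This is a hypothetical illustration, not the PR's actual implementation; `schedule`, `budget` and `run` are made-up names:

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def schedule(tests, budget, run):
    """tests: list of (name, cores) pairs; budget: total threads we may
    use at once; run: callable that actually runs one test by name."""
    cond = threading.Condition()
    in_use = 0  # cores currently claimed by running tests

    def launch(name, cores):
        nonlocal in_use
        with cond:
            # Wait until this test's cost fits into the remaining budget.
            while in_use + cores > budget:
                cond.wait()
            in_use += cores
        try:
            run(name)
        finally:
            with cond:
                in_use -= cores
                cond.notify_all()  # wake tests waiting for free cores

    parallel = [t for t in tests if t[1] <= budget]
    serial = [t for t in tests if t[1] > budget]
    with ThreadPoolExecutor(max_workers=max(len(parallel), 1)) as pool:
        for name, cores in parallel:
            pool.submit(launch, name, cores)
    # Tests needing more cores than the whole budget run one at a time.
    for name, _cores in serial:
        run(name)
```

With a budget of 4, two 2-core tests can run concurrently, while an 8-core test is deferred to the serial phase at the end.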

@dschwoerer (Contributor, Author)

I tried to sum the output (running with time make check-integrated-tests -j 2):
cat output | awk '{print $3}' | grep '^[0-9]' | awk '{sum += $1} END {print sum}' gives 399.704, while the output prints:


======= All tests passed in 303.58 seconds =======

real    6m18.751s

So maybe something is wrong on Marconi? Can you provide more details?
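For readers less comfortable with awk, the pipeline above sums the third whitespace-separated field of every output line whose third field starts with a digit, i.e. the per-test timings. A rough Python equivalent (`sum_times` is a made-up name):

```python
def sum_times(lines):
    """Rough equivalent of:
    awk '{print $3}' | grep '^[0-9]' | awk '{sum += $1} END {print sum}'
    Sums the third field of lines where that field starts with a digit
    (the per-test timings); skip lines (S - ...) are ignored."""
    total = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[2][:1].isdigit():
            total += float(fields[2])
    return total

# Example, on lines in the test_suite's output format:
report = [
    "test-io        \u2713  23.742 s",
    "test-include   S -  all_tests => False",
    "test-vec       \u2713   1.409 s",
]
# sum_times(report) gives about 25.151
```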

@johnomotani (Contributor)

@dschwoerer I'm compiling on Marconi with gcc-7.3.0 and OpenMPI-4.0.1, which is fiddly because Marconi doesn't provide modules for most of the libraries. Compiling with Intel should be simpler, but I haven't tried that in ages. I've described the current setup here: https://gitlab.com/CCFE_SOL_Transport/STORM/-/wikis/setup/Compiling%20BOUT%20and%20external%20libraries%20on%20Marconi

Running make -j 32 check-integrated-tests and make -j 32 build-check-integrated-tests seem to behave quite differently. make -j 32 check-integrated-tests seems to be building in serial - it printed

======= Making 57 integrated tests ========
test-include                     S -  all_tests => False
test-laplace-petsc3d             ✓  49.267 s
test-compile-examples            S -  all_tests => False

(then I actually cancelled it because I got impatient). make -j 32 build-check-integrated-tests printed

/usr/bin/gmake --no-print-directory -C tests/integrated
gmake[2]: Nothing to be done for `all'.
  Compiling  test_invpar.cxx
  Compiling  test_multigrid_laplace.cxx
  Compiling  test-twistshift.cxx
  Compiling  2fluid.cxx
  Compiling  testVec.cxx
  Compiling  test_stopCheck.cxx
  Compiling  2fluid.cxx
  Compiling  test_restarting.cxx
  Compiling  test_griddata.cxx
  Compiling  test-laplacexy.cxx
  Compiling  command-args.cxx
  Compiling  test_smooth.cxx
  Compiling  test_yupdown.cxx
  Compiling  test_interpolate.cxx
  Compiling  test-twistshift.cxx
  Compiling  test_cyclic.cxx
  Compiling  test-coordinates-initialization.cxx
  Compiling  invertable_operator.cxx
  Compiling  test_fieldfactory.cxx
  Compiling  test_interpolate.cxx
  Compiling  test_griddata.cxx
  Compiling  test_solver.cxx
  Compiling  test-restart-io.cxx
  Compiling  test-communications.cxx
  Compiling  test_yupdown_weights.cxx
  Compiling  test-laplacexy.cxx
  Compiling  test_delp2.cxx
  Linking test_snb
  Compiling  test_io.cxx
  Compiling  test_naulin_laplace.cxx
...

and many outputs appeared more or less simultaneously.

I tried running all the tests - make -j 1 check-integrated-tests reported All tests passed in 358.86 seconds for the run, and your awk command gave 494.902. make -j 20 check-integrated-tests reported All tests passed in 356.17 seconds and your awk command gave 491.777.

@dschwoerer (Contributor, Author)

dschwoerer commented Apr 1, 2020 via email

@johnomotani (Contributor)

I am using Python-3.6.4.

@johnomotani (Contributor)

On my Marconi setup, after a make clean-integrated-tests, I get this output

$ make -j 32 check-integrated-tests
======= Making 57 integrated tests ========
test-include                     S -  all_tests => False
test-laplace-petsc3d             ✓  34.837 s
test-compile-examples            S -  all_tests => False
test-snb                         ✓  31.665 s
test-command-args                ✓  22.199 s
test-io_hdf5                     S -  all_tests => False
test-cyclic                      ✓  30.190 s
test-yupdown-weights             ✓  24.587 s
test-interpolate                 ✓  29.607 s
test-naulin-laplace              ✓  23.102 s
test-drift-instability-staggered ✓  32.444 s
test-stopCheck                   ✓  22.653 s
test-laplacexy                   S -  all_tests => False
test-restart-io_hdf5             S -  hdf5 => False
test-attribs                     S -  all_tests => False
test-options-netcdf              S -  False => False
test-restarting                  ✓  24.015 s
test-interchange-instability     ✓  33.026 s
test-communications              ✓  24.237 s
test-petsc_laplace               S -  all_tests => False
test-restart-io                  ✓  24.010 s
test-slepc-solver                S -  slepc => False
test-multigrid_laplace           ✓  28.703 s
test-io                          ✓  23.742 s
test-vec                         ✓  25.654 s
test-griddata-yboundary-guards   ✓  22.541 s
test-invpar                      ✓  28.900 s
test-laplacexy-short             ✓  21.792 s
test-petsc_laplace_MAST-grid     S -  all_tests => False
test-laplacexy-fv                ✓  21.208 s
test-twistshift-staggered        ✓  21.749 s
test-solver                      ✓  28.679 s
test-initial                     S -  not make => False
test-fieldgroupComm              S -  all_tests => False
test-subdir                      ✓  38.511 s
test-smooth                      ✓  22.450 s
test-squash                      S -  all_tests => False
test-invertable-operator         ✓  30.373 s
test-boutcore/mms-ddz            S -  boutcore => False
test-boutcore/legacy-model       S -  boutcore => False
test-boutcore/collect            S -  boutcore => False
test-boutcore/collect-staggered  S -  boutcore => False
test-boutcore/simple-model       S -  boutcore => False
test-drift-instability           S -  all_tests => False
test-twistshift                  ✓  22.066 s
test-interpolate-z               ✓  28.551 s
test-coordinates-initialization  ✓  28.074 s
test-delp2                       ✓  27.126 s
test-yupdown                     ✓  20.680 s
test-griddata                    ✓  26.906 s
test-fieldfactory                ✓  22.239 s
test-code-style                  S -  not make => False
test-stopCheck-file              ✓  26.933 s
test-gyro                        ✓  23.595 s
test-laplace                     ✓  28.667 s
test-compile-examples-petsc      S -  all_tests => False
test-region-iterator             ✓  26.693 s


======= All tests passed in 952.45 seconds =======
======= Running 57 integrated tests ========
test-include                     S -  all_tests => False
test-laplace-petsc3d             ✓  12.945 s
test-compile-examples            S -  all_tests => False
test-snb                         ✓   1.059 s
test-command-args                ✓  17.485 s
test-yupdown-weights             ✓   1.574 s
test-interpolate                 ✓  16.042 s
test-drift-instability-staggered ✓  18.014 s
test-stopCheck                   ✓   0.092 s
test-attribs                     S -  all_tests => False
test-options-netcdf              S -  False => False
test-restarting                  ✓   7.399 s
test-slepc-solver                S -  slepc => False
test-twistshift-staggered        ✓   1.553 s
test-solver                      ✓   5.873 s
test-subdir                      ✓  21.799 s
test-boutcore/mms-ddz            S -  boutcore => False
test-boutcore/legacy-model       S -  boutcore => False
test-boutcore/collect            S -  boutcore => False
test-boutcore/collect-staggered  S -  boutcore => False
test-boutcore/simple-model       S -  boutcore => False
test-drift-instability           S -  all_tests => False
test-twistshift                  ✓   1.551 s
test-interpolate-z               ✓   5.438 s
test-yupdown                     ✓   2.690 s
test-griddata                    ✓   1.550 s
test-code-style                  ✓   3.799 s
test-stopCheck-file              ✓   8.895 s
test-compile-examples-petsc      S -  all_tests => False
test-communications              ✓  10.074 s
test-laplacexy                   S -  all_tests => False
test-laplacexy-short             ✓  11.827 s
test-laplacexy-fv                ✓   6.791 s
test-griddata-yboundary-guards   ✓   7.658 s
test-io_hdf5                     S -  all_tests => False
test-cyclic                      ✓  12.826 s
test-restart-io_hdf5             S -  hdf5 => False
test-petsc_laplace               S -  all_tests => False
test-restart-io                  ✓   8.555 s
test-io                          ✓  25.374 s
test-vec                         ✓   1.409 s
test-invpar                      ✓  24.754 s
test-petsc_laplace_MAST-grid     S -  all_tests => False
test-initial                     ✓  15.363 s
test-fieldgroupComm              S -  all_tests => False
test-smooth                      ✓   5.316 s
test-squash                      S -  all_tests => False
test-delp2                       ✓  19.691 s
test-fieldfactory                ✓   4.275 s
test-gyro                        ✓   4.085 s
test-laplace                     ✓   7.797 s
test-naulin-laplace              ✓   4.096 s
test-multigrid_laplace           ✓   5.995 s
test-coordinates-initialization  ✓   0.681 s
test-interchange-instability     ✓  12.548 s
test-invertable-operator         ✓   2.931 s
test-region-iterator             ✓   2.922 s


======= All tests passed in 322.78 seconds =======

doesn't seem like the builds are going in parallel...

@dschwoerer (Contributor, Author)

Very strange. I cannot reproduce this: I installed python-3.6.4 from source on Marconi and am still unable to reproduce.
What does ./test_suite -j 20 print? Have you used the most recent version?

I am not able to run the tests myself, because I cannot get netcdf to work; the tests need it, but don't have a #require netcdf ...

@johnomotani (Contributor)

I did a git pull to check I was up to date with parallel-tests but I had the latest already.

Running test_suite directly looks more sensible...

./test_suite -j 20
======= Running 57 integrated tests ========
test-communications              ✓  50.655 s
test-laplacexy                   S -  all_tests => False
test-interchange-instability     ✓  54.001 s
test-region-iterator             ✓  65.280 s
test-include                     S -  all_tests => False
test-compile-examples            S -  all_tests => False
test-invertable-operator         ✓  68.990 s
test-laplacexy-fv                ✓  72.366 s
test-laplacexy-short             ✓  81.444 s
test-io_hdf5                     S -  all_tests => False
test-restart-io_hdf5             S -  hdf5 => False
test-petsc_laplace               S -  all_tests => False
test-yupdown-weights             ✓  73.469 s
test-snb                         ✓  74.982 s
test-naulin-laplace              ✓  82.358 s
test-restart-io                  ✓  68.155 s
test-griddata-yboundary-guards   ✓  81.185 s
test-attribs                     S -  all_tests => False
test-laplace-petsc3d             ✓  91.606 s
test-options-netcdf              S -  False => False
test-cyclic                      ✓ 106.522 s
test-stopCheck                   ✓  67.461 s
test-slepc-solver                S -  slepc => False
test-multigrid_laplace           ✓  72.637 s
test-vec                         ✓  70.897 s
test-petsc_laplace_MAST-grid     S -  all_tests => False
test-restarting                  ✓  78.598 s
test-drift-instability-staggered ✓ 107.265 s
test-twistshift-staggered        ✓  40.290 s
test-boutcore/mms-ddz            S -  boutcore => False
test-boutcore/legacy-model       S -  boutcore => False
test-boutcore/collect            S -  boutcore => False
test-boutcore/collect-staggered  S -  boutcore => False
test-boutcore/simple-model       S -  boutcore => False
test-drift-instability           S -  all_tests => False
test-command-args                ✓ 123.657 s
test-interpolate                 ✓ 123.846 s
test-coordinates-initialization  ✓  47.766 s
test-code-style                  ✓   0.899 s
test-compile-examples-petsc      S -  all_tests => False
test-initial                     ✓  60.832 s
test-fieldgroupComm              S -  all_tests => False
test-io                          ✓ 137.474 s
test-squash                      S -  all_tests => False
test-solver                      ✓  58.852 s
test-invpar                      ✓ 142.620 s
test-interpolate-z               ✓  62.913 s
test-twistshift                  ✓  65.972 s
test-yupdown                     ✓  74.790 s
test-subdir                      ✓  93.082 s
test-griddata                    ✓  93.041 s
test-stopCheck-file              ✓  93.856 s
test-smooth                      ✓  80.320 s
test-gyro                        ✓  54.582 s
test-fieldfactory                ✓  57.715 s
test-delp2                       ✓ 109.269 s
test-laplace                     ✓  38.851 s


======= All tests passed in 457.48 seconds =======

Watching those results come in, they do seem to be running in parallel.

@johnomotani (Contributor)

Those times I just posted included compiling. If I start with the tests already compiled, running make -j 20 check-integrated-tests takes ~330s, but running ./test_suite -j 20 takes just 100s.

@dschwoerer (Contributor, Author)

Thanks, I found the issue. An old make version seems to be causing the problems -.-

Will investigate ...

@dschwoerer (Contributor, Author)

93120b1 should add support for GNU Make 3.82 (marconi)

@dschwoerer dschwoerer mentioned this pull request Apr 2, 2020
@johnomotani (Contributor)

Can confirm this works for me now. Thanks @dschwoerer!

@ZedThree (Member)

ZedThree commented Apr 6, 2020

Thanks @dschwoerer ! Just some minor fixes. I think it's quite important you try to describe how this works so that future developers can maintain it. I'm sure it's quite clear and simple once you've read the make jobserver docs, but it could do with some explanation here.

- Add comments
- Rename some variables to make the distinction between jobs as tests
  and jobs as threads clearer
- Ensure we do not permanently change the status of the pipes. Also
  try to ensure no one else is trying to read before we set the pipe
  to non-blocking. This can otherwise cause issues with old gmake
  implementations.
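For future maintainers, the jobserver interaction being described can be sketched as follows. This is a simplified, hypothetical illustration rather than the PR's actual code; it assumes the fd-pipe form of MAKEFLAGS (`--jobserver-fds` for old make, `--jobserver-auth=R,W` for make >= 4.2) and does not handle the fifo form of newer make:

```python
import fcntl
import os
import re

def jobserver_fds(makeflags):
    """Parse the jobserver pipe FDs out of a MAKEFLAGS string, if any."""
    m = re.search(r"--jobserver-(?:auth|fds)=(\d+),(\d+)", makeflags)
    return (int(m.group(1)), int(m.group(2))) if m else None

def acquire_token(read_fd):
    """Try to take one job token without blocking, restoring the FD's
    original flags afterwards so other readers see the pipe unchanged
    (the 'do not permanently change the status of the pipes' point)."""
    old = fcntl.fcntl(read_fd, fcntl.F_GETFL)
    fcntl.fcntl(read_fd, fcntl.F_SETFL, old | os.O_NONBLOCK)
    try:
        return os.read(read_fd, 1)  # one byte == permission for one job
    except BlockingIOError:
        return None  # no free job slots right now
    finally:
        fcntl.fcntl(read_fd, fcntl.F_SETFL, old)  # undo O_NONBLOCK

def release_token(write_fd, token):
    os.write(write_fd, token)  # hand the token back to make
```

Every token taken from the read pipe must eventually be written back, otherwise make's pool of job slots shrinks for the rest of the build.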
Comment on lines +70 to +76
# We can run in parallel. As some checks require several threads to
# run, we take this into account when we schedule tests. Given a
# certain amount of threads that we are told to use, we try to run
# so many tests in parallel, that the total core count is not higher
# than this number. If this isn't possible because some jobs require
# more cores than we have threads, we run the remaining tests in
# serial. The number of parallel threads is called cost of a test.
Member

This is great, thanks @dschwoerer !

self.local.req_met, self.local.req_expr = requirements.check(self.name+"/runtest")
self.local.start_time = time.time()
if not self.local.req_met:
    print(output % self.name +
Member

I can't see where this is defined? Also, I would maybe not mix % and format.

@ZedThree (Member)

Thanks @dschwoerer ! It seems to work pretty well, went from 139s to 32s with 32 cores.

Happy for this to go in now, I'd just like to understand where that output comes from.

@dschwoerer dschwoerer requested a review from ZedThree May 23, 2020 17:56
@ZedThree ZedThree merged commit 053bcf5 into next May 26, 2020
@ZedThree ZedThree deleted the parallel-tests branch May 26, 2020 09:07