
Parallel tests #2000

Merged 13 commits into next on May 26, 2020

Conversation

dschwoerer (Contributor)

Resolves #1451

Running make -j 4 check now builds and runs the tests in parallel.

There are still further improvements possible, e.g. the jobserver isn't available to test-compile-examples (which therefore isn't parallel, but should be easy to fix).

There are some drawbacks:
Because other jobs are running, the message that a job has started is only printed once the job has finished.

This could be worked around by adding a time-out. Maybe 30 to 120 minutes should be safe? 10 minutes is too short, as compiling all the examples takes longer.

Also, the speed-up isn't dramatic, because many tests already run in parallel internally, so there is less to gain.

jobs   time (real)
  1    5m27.523s
  2    4m19.255s
  4    3m44.568s
  8    2m25.634s
 12    2m11.765s
 16    1m48.893s
 24    2m22.265s

(time make -j <n> check on 12 cores, 24 threads, 6 GB ram)

@johnomotani (Contributor) left a comment

Parallel builds work for me on Marconi. Definite improvement 👍

It seems as though the tests still run consecutively though, and it looks like that is not the intention (#cores:8, etc.)? If I do make -j 32 check-integrated-tests, the sum of the times for running the individual tests is equal to the total run time. I haven't tried to dig into why...

When I run make -j 32 check the unit tests, integrated tests and MMS tests all start building at the same time, and their output overlaps. That's what always used to happen - just wondering if the jobserver takes that into account, or might it be more efficient to run something like make -j 32 check-unit-tests && make -j 32 check-integrated-tests && make -j 32 check-mms-tests?

@dschwoerer (Contributor, Author)

> Parallel builds work for me on Marconi. Definite improvement +1

Would you mind sharing instructions on how to run recent BOUT++ on marconi?

> It seems as though the tests still run consecutively though, and it looks like that is not the intention (#cores:8, etc.)? If I do make -j 32 check-integrated-tests, the sum of the times for running the individual tests is equal to the total run time. I haven't tried to dig into why...

The tests should run in parallel. That sounds like a bug if the total runtime equals the sum of the runs, especially on 32 cores. I'll investigate ...

> When I run make -j 32 check the unit tests, integrated tests and MMS tests all start building at the same time, and their output overlaps. That's what always used to happen - just wondering if the jobserver takes that into account, or might it be more efficient to run something like make -j 32 check-unit-tests && make -j 32 check-integrated-tests && make -j 32 check-mms-tests?

Yes, that is not nice. I have been wondering whether it would be nicer to implement the jobserver at a higher level, so it can run all tests in parallel. That would produce nicer output. Also, the user probably doesn't care which tests are run ...

Running one suite after another is what I had in mind; that could be achieved by making the different targets sequential in a recipe. However, the unit tests run in serial, so it makes sense to run them in parallel with the other suites if you have many cores. With few cores (e.g. Travis) it does not, as we might schedule an expensive test (i.e. one requiring many cores) while other things are running, when it would be better to wait until the other tests are done. In that case, however, each test_suite only gets one job, so it doesn't know about the other tests. Moving the script to tests/ and scheduling all jobs there would help in that case ...
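The scheduling idea being discussed here — treat the number of cores a test needs as its "cost", launch tests while the total cost stays within a thread budget, and fall back to serial for tests that are too wide to ever fit — can be sketched roughly like this. This is a hypothetical illustration, not the PR's actual implementation; `schedule`, `budget` and `run` are made-up names:

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def schedule(tests, budget, run):
    """tests: list of (name, cores) pairs; budget: total threads we may
    use at once; run: callable that actually runs one test by name."""
    cond = threading.Condition()
    in_use = 0  # cores currently claimed by running tests

    def launch(name, cores):
        nonlocal in_use
        with cond:
            # Wait until this test's cost fits into the remaining budget.
            while in_use + cores > budget:
                cond.wait()
            in_use += cores
        try:
            run(name)
        finally:
            with cond:
                in_use -= cores
                cond.notify_all()  # wake tests waiting for free cores

    parallel = [t for t in tests if t[1] <= budget]
    serial = [t for t in tests if t[1] > budget]
    with ThreadPoolExecutor(max_workers=max(len(parallel), 1)) as pool:
        for name, cores in parallel:
            pool.submit(launch, name, cores)
    # Tests needing more cores than the whole budget run one at a time.
    for name, _cores in serial:
        run(name)
```

With a budget of 4, two 2-core tests can run concurrently, while an 8-core test is deferred to the serial phase at the end.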

@dschwoerer (Contributor, Author)

I tried to sum the output (running with time make check-integrated-tests -j 2):
cat output | awk '{print $3}' | grep '^[0-9]' | awk '{sum += $1} END {print sum}' gives 399.704, while the output prints:


======= All tests passed in 303.58 seconds =======

real    6m18.751s

So maybe something is wrong on Marconi? Can you provide more details?
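For readers less comfortable with awk, the pipeline above sums the third whitespace-separated field of every output line whose third field starts with a digit, i.e. the per-test timings. A rough Python equivalent (`sum_times` is a made-up name):

```python
def sum_times(lines):
    """Rough equivalent of:
    awk '{print $3}' | grep '^[0-9]' | awk '{sum += $1} END {print sum}'
    Sums the third field of lines where that field starts with a digit
    (the per-test timings); skip lines (S - ...) are ignored."""
    total = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[2][:1].isdigit():
            total += float(fields[2])
    return total

# Example, on lines in the test_suite's output format:
report = [
    "test-io        \u2713  23.742 s",
    "test-include   S -  all_tests => False",
    "test-vec       \u2713   1.409 s",
]
# sum_times(report) gives about 25.151
```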

@johnomotani (Contributor)

@dschwoerer I'm compiling on Marconi with gcc-7.3.0 and OpenMPI-4.0.1, which is fiddly because Marconi doesn't provide modules for most of the libraries. Compiling with Intel should be simpler, but I haven't tried that in ages. I've described the current setup here: https://gitlab.com/CCFE_SOL_Transport/STORM/-/wikis/setup/Compiling%20BOUT%20and%20external%20libraries%20on%20Marconi

Running make -j 32 check-integrated-tests and make -j 32 build-check-integrated-tests seem to behave quite differently. make -j 32 check-integrated-tests seems to be building in serial - it printed

======= Making 57 integrated tests ========
test-include                     S -  all_tests => False
test-laplace-petsc3d             ✓  49.267 s
test-compile-examples            S -  all_tests => False

(then I actually cancelled it because I got impatient). make -j 32 build-check-integrated-tests printed

/usr/bin/gmake --no-print-directory -C tests/integrated
gmake[2]: Nothing to be done for `all'.
  Compiling  test_invpar.cxx
  Compiling  test_multigrid_laplace.cxx
  Compiling  test-twistshift.cxx
  Compiling  2fluid.cxx
  Compiling  testVec.cxx
  Compiling  test_stopCheck.cxx
  Compiling  2fluid.cxx
  Compiling  test_restarting.cxx
  Compiling  test_griddata.cxx
  Compiling  test-laplacexy.cxx
  Compiling  command-args.cxx
  Compiling  test_smooth.cxx
  Compiling  test_yupdown.cxx
  Compiling  test_interpolate.cxx
  Compiling  test-twistshift.cxx
  Compiling  test_cyclic.cxx
  Compiling  test-coordinates-initialization.cxx
  Compiling  invertable_operator.cxx
  Compiling  test_fieldfactory.cxx
  Compiling  test_interpolate.cxx
  Compiling  test_griddata.cxx
  Compiling  test_solver.cxx
  Compiling  test-restart-io.cxx
  Compiling  test-communications.cxx
  Compiling  test_yupdown_weights.cxx
  Compiling  test-laplacexy.cxx
  Compiling  test_delp2.cxx
  Linking test_snb
  Compiling  test_io.cxx
  Compiling  test_naulin_laplace.cxx
...

and many outputs appeared more or less simultaneously.

I tried running all the tests - make -j 1 check-integrated-tests reported All tests passed in 358.86 seconds for the run, and your awk command gave 494.902. make -j 20 check-integrated-tests reported All tests passed in 356.17 seconds and your awk command gave 491.777.

@dschwoerer (Contributor, Author)

dschwoerer commented Apr 1, 2020 via email

@johnomotani (Contributor)

I am using Python-3.6.4.

@johnomotani (Contributor)

On my Marconi setup, after a make clean-integrated-tests, I get this output

$ make -j 32 check-integrated-tests
======= Making 57 integrated tests ========
test-include                     S -  all_tests => False
test-laplace-petsc3d             ✓  34.837 s
test-compile-examples            S -  all_tests => False
test-snb                         ✓  31.665 s
test-command-args                ✓  22.199 s
test-io_hdf5                     S -  all_tests => False
test-cyclic                      ✓  30.190 s
test-yupdown-weights             ✓  24.587 s
test-interpolate                 ✓  29.607 s
test-naulin-laplace              ✓  23.102 s
test-drift-instability-staggered ✓  32.444 s
test-stopCheck                   ✓  22.653 s
test-laplacexy                   S -  all_tests => False
test-restart-io_hdf5             S -  hdf5 => False
test-attribs                     S -  all_tests => False
test-options-netcdf              S -  False => False
test-restarting                  ✓  24.015 s
test-interchange-instability     ✓  33.026 s
test-communications              ✓  24.237 s
test-petsc_laplace               S -  all_tests => False
test-restart-io                  ✓  24.010 s
test-slepc-solver                S -  slepc => False
test-multigrid_laplace           ✓  28.703 s
test-io                          ✓  23.742 s
test-vec                         ✓  25.654 s
test-griddata-yboundary-guards   ✓  22.541 s
test-invpar                      ✓  28.900 s
test-laplacexy-short             ✓  21.792 s
test-petsc_laplace_MAST-grid     S -  all_tests => False
test-laplacexy-fv                ✓  21.208 s
test-twistshift-staggered        ✓  21.749 s
test-solver                      ✓  28.679 s
test-initial                     S -  not make => False
test-fieldgroupComm              S -  all_tests => False
test-subdir                      ✓  38.511 s
test-smooth                      ✓  22.450 s
test-squash                      S -  all_tests => False
test-invertable-operator         ✓  30.373 s
test-boutcore/mms-ddz            S -  boutcore => False
test-boutcore/legacy-model       S -  boutcore => False
test-boutcore/collect            S -  boutcore => False
test-boutcore/collect-staggered  S -  boutcore => False
test-boutcore/simple-model       S -  boutcore => False
test-drift-instability           S -  all_tests => False
test-twistshift                  ✓  22.066 s
test-interpolate-z               ✓  28.551 s
test-coordinates-initialization  ✓  28.074 s
test-delp2                       ✓  27.126 s
test-yupdown                     ✓  20.680 s
test-griddata                    ✓  26.906 s
test-fieldfactory                ✓  22.239 s
test-code-style                  S -  not make => False
test-stopCheck-file              ✓  26.933 s
test-gyro                        ✓  23.595 s
test-laplace                     ✓  28.667 s
test-compile-examples-petsc      S -  all_tests => False
test-region-iterator             ✓  26.693 s


======= All tests passed in 952.45 seconds =======
======= Running 57 integrated tests ========
test-include                     S -  all_tests => False
test-laplace-petsc3d             ✓  12.945 s
test-compile-examples            S -  all_tests => False
test-snb                         ✓   1.059 s
test-command-args                ✓  17.485 s
test-yupdown-weights             ✓   1.574 s
test-interpolate                 ✓  16.042 s
test-drift-instability-staggered ✓  18.014 s
test-stopCheck                   ✓   0.092 s
test-attribs                     S -  all_tests => False
test-options-netcdf              S -  False => False
test-restarting                  ✓   7.399 s
test-slepc-solver                S -  slepc => False
test-twistshift-staggered        ✓   1.553 s
test-solver                      ✓   5.873 s
test-subdir                      ✓  21.799 s
test-boutcore/mms-ddz            S -  boutcore => False
test-boutcore/legacy-model       S -  boutcore => False
test-boutcore/collect            S -  boutcore => False
test-boutcore/collect-staggered  S -  boutcore => False
test-boutcore/simple-model       S -  boutcore => False
test-drift-instability           S -  all_tests => False
test-twistshift                  ✓   1.551 s
test-interpolate-z               ✓   5.438 s
test-yupdown                     ✓   2.690 s
test-griddata                    ✓   1.550 s
test-code-style                  ✓   3.799 s
test-stopCheck-file              ✓   8.895 s
test-compile-examples-petsc      S -  all_tests => False
test-communications              ✓  10.074 s
test-laplacexy                   S -  all_tests => False
test-laplacexy-short             ✓  11.827 s
test-laplacexy-fv                ✓   6.791 s
test-griddata-yboundary-guards   ✓   7.658 s
test-io_hdf5                     S -  all_tests => False
test-cyclic                      ✓  12.826 s
test-restart-io_hdf5             S -  hdf5 => False
test-petsc_laplace               S -  all_tests => False
test-restart-io                  ✓   8.555 s
test-io                          ✓  25.374 s
test-vec                         ✓   1.409 s
test-invpar                      ✓  24.754 s
test-petsc_laplace_MAST-grid     S -  all_tests => False
test-initial                     ✓  15.363 s
test-fieldgroupComm              S -  all_tests => False
test-smooth                      ✓   5.316 s
test-squash                      S -  all_tests => False
test-delp2                       ✓  19.691 s
test-fieldfactory                ✓   4.275 s
test-gyro                        ✓   4.085 s
test-laplace                     ✓   7.797 s
test-naulin-laplace              ✓   4.096 s
test-multigrid_laplace           ✓   5.995 s
test-coordinates-initialization  ✓   0.681 s
test-interchange-instability     ✓  12.548 s
test-invertable-operator         ✓   2.931 s
test-region-iterator             ✓   2.922 s


======= All tests passed in 322.78 seconds =======

doesn't seem like the builds are going in parallel...

@dschwoerer (Contributor, Author)

Very strange. I cannot reproduce this: I installed python-3.6.4 from source on Marconi and am still unable to reproduce.
What does ./test_suite -j 20 print? Have you used the most recent version?

I am not able to run the tests myself, because I cannot get netcdf to work; the tests need it, but don't have a #require netcdf ...

@johnomotani (Contributor)

I did a git pull to check I was up to date with parallel-tests but I had the latest already.

Running test_suite directly looks more sensible...

./test_suite -j 20
======= Running 57 integrated tests ========
test-communications              ✓  50.655 s
test-laplacexy                   S -  all_tests => False
test-interchange-instability     ✓  54.001 s
test-region-iterator             ✓  65.280 s
test-include                     S -  all_tests => False
test-compile-examples            S -  all_tests => False
test-invertable-operator         ✓  68.990 s
test-laplacexy-fv                ✓  72.366 s
test-laplacexy-short             ✓  81.444 s
test-io_hdf5                     S -  all_tests => False
test-restart-io_hdf5             S -  hdf5 => False
test-petsc_laplace               S -  all_tests => False
test-yupdown-weights             ✓  73.469 s
test-snb                         ✓  74.982 s
test-naulin-laplace              ✓  82.358 s
test-restart-io                  ✓  68.155 s
test-griddata-yboundary-guards   ✓  81.185 s
test-attribs                     S -  all_tests => False
test-laplace-petsc3d             ✓  91.606 s
test-options-netcdf              S -  False => False
test-cyclic                      ✓ 106.522 s
test-stopCheck                   ✓  67.461 s
test-slepc-solver                S -  slepc => False
test-multigrid_laplace           ✓  72.637 s
test-vec                         ✓  70.897 s
test-petsc_laplace_MAST-grid     S -  all_tests => False
test-restarting                  ✓  78.598 s
test-drift-instability-staggered ✓ 107.265 s
test-twistshift-staggered        ✓  40.290 s
test-boutcore/mms-ddz            S -  boutcore => False
test-boutcore/legacy-model       S -  boutcore => False
test-boutcore/collect            S -  boutcore => False
test-boutcore/collect-staggered  S -  boutcore => False
test-boutcore/simple-model       S -  boutcore => False
test-drift-instability           S -  all_tests => False
test-command-args                ✓ 123.657 s
test-interpolate                 ✓ 123.846 s
test-coordinates-initialization  ✓  47.766 s
test-code-style                  ✓   0.899 s
test-compile-examples-petsc      S -  all_tests => False
test-initial                     ✓  60.832 s
test-fieldgroupComm              S -  all_tests => False
test-io                          ✓ 137.474 s
test-squash                      S -  all_tests => False
test-solver                      ✓  58.852 s
test-invpar                      ✓ 142.620 s
test-interpolate-z               ✓  62.913 s
test-twistshift                  ✓  65.972 s
test-yupdown                     ✓  74.790 s
test-subdir                      ✓  93.082 s
test-griddata                    ✓  93.041 s
test-stopCheck-file              ✓  93.856 s
test-smooth                      ✓  80.320 s
test-gyro                        ✓  54.582 s
test-fieldfactory                ✓  57.715 s
test-delp2                       ✓ 109.269 s
test-laplace                     ✓  38.851 s


======= All tests passed in 457.48 seconds =======

Watching those results come in, they do seem to be running in parallel.

@johnomotani (Contributor)

Those times I just posted included compiling. If I start with the tests already compiled, running make -j 20 check-integrated-tests takes ~330s, but running ./test_suite -j 20 takes just 100s.

@dschwoerer (Contributor, Author)

Thanks, I found the issue. An old make version seems to be causing the problems -.-

Will investigate ...

@dschwoerer (Contributor, Author)

93120b1 should add support for GNU Make 3.82 (marconi)

@dschwoerer dschwoerer mentioned this pull request Apr 2, 2020
@johnomotani (Contributor)

Can confirm this works for me now. Thanks @dschwoerer!

@ZedThree (Member)

ZedThree commented Apr 6, 2020

Thanks @dschwoerer ! Just some minor fixes. I think it's quite important you try to describe how this works so that future developers can maintain it. I'm sure it's quite clear and simple once you've read the make jobserver docs, but it could do with some explanation here.

- Add comments
- Rename some variables to make the distinction between jobs as tests
  and jobs as threads clearer
- Ensure we do not permanently change the status of the pipes. Also
  try to ensure no one else is trying to read before we set the pipe
  to non-blocking. This can otherwise cause issues with old gmake
  implementations.
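For future maintainers, the jobserver interaction being described can be sketched as follows. This is a simplified, hypothetical illustration rather than the PR's actual code; it assumes the fd-pipe form of MAKEFLAGS (`--jobserver-fds` for old make, `--jobserver-auth=R,W` for make >= 4.2) and does not handle the fifo form of newer make:

```python
import fcntl
import os
import re

def jobserver_fds(makeflags):
    """Parse the jobserver pipe FDs out of a MAKEFLAGS string, if any."""
    m = re.search(r"--jobserver-(?:auth|fds)=(\d+),(\d+)", makeflags)
    return (int(m.group(1)), int(m.group(2))) if m else None

def acquire_token(read_fd):
    """Try to take one job token without blocking, restoring the FD's
    original flags afterwards so other readers see the pipe unchanged
    (the 'do not permanently change the status of the pipes' point)."""
    old = fcntl.fcntl(read_fd, fcntl.F_GETFL)
    fcntl.fcntl(read_fd, fcntl.F_SETFL, old | os.O_NONBLOCK)
    try:
        return os.read(read_fd, 1)  # one byte == permission for one job
    except BlockingIOError:
        return None  # no free job slots right now
    finally:
        fcntl.fcntl(read_fd, fcntl.F_SETFL, old)  # undo O_NONBLOCK

def release_token(write_fd, token):
    os.write(write_fd, token)  # hand the token back to make
```

Every token taken from the read pipe must eventually be written back, otherwise make's pool of job slots shrinks for the rest of the build.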
Comment on lines +70 to +76
# We can run in parallel. As some checks require several threads to
# run, we take this into account when we schedule tests. Given a
# certain amount of threads that we are told to use, we try to run
# so many tests in parallel, that the total core count is not higher
# than this number. If this isn't possible because some jobs require
# more cores than we have threads, we run the remaining tests in
# serial. The number of parallel threads is called cost of a test.
Member

This is great, thanks @dschwoerer !

self.local.req_met, self.local.req_expr = requirements.check(self.name+"/runtest")
self.local.start_time = time.time()
if not self.local.req_met:
    print(output % self.name +
Member

I can't see where this is defined? Also, I would maybe not mix % and format.

@ZedThree (Member)

Thanks @dschwoerer ! It seems to work pretty well, went from 139s to 32s with 32 cores.

Happy for this to go in now, I'd just like to understand where that output comes from.

@dschwoerer dschwoerer requested a review from ZedThree May 23, 2020 17:56
@ZedThree ZedThree merged commit 053bcf5 into next May 26, 2020
@ZedThree ZedThree deleted the parallel-tests branch May 26, 2020 09:07