Should make test use jobserver parallelism? #808

Closed
LiberalArtist opened this issue Feb 27, 2024 · 6 comments

@LiberalArtist
Contributor

The BUILDING document says:

ChezScheme/BUILDING

Lines 244 to 245 in 57f92bb

To run test sets in parallel, use `zuo` instead of `make`, and
supply the `-j` flag between `.` and `test`.

Should testing use Zuo's support for the GNU Make jobserver so that make -j <jobs> test also runs tests in parallel?

It doesn't seem to be happening automatically. While working on updating the Guix package of Chez Scheme to version 10, I changed the "check phase" of the build recipe to use Zuo instead of GNU Make. The BUILDING document warns that make test "can take on the order of an hour, depending on the speed of the machine," and indeed, on my 16-core laptop, an old log for a build of 9.9.9-pre-release.18 reported that the tests took 3165.6 seconds. Changing to Zuo for version 10.0.0 got the time down to 496.5 seconds, a huge improvement!
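
To put those two timings in perspective, here is a small sketch of the arithmetic. The two runs are of different versions (9.9.9-pre-release.18 and 10.0.0), and treating all 16 cores as available to the test run is an assumption, so the numbers are only a rough indication.

```python
# Timings reported in this issue, in seconds.
MAKE_TEST_SECONDS = 3165.6  # old log, 9.9.9-pre-release.18
ZUO_TEST_SECONDS = 496.5    # Zuo-based check phase, 10.0.0
CORES = 16                  # assumption: every core was usable by the tests


def speedup(baseline: float, improved: float) -> float:
    """Return how many times faster `improved` is than `baseline`."""
    if baseline <= 0 or improved <= 0:
        raise ValueError("timings must be positive")
    return baseline / improved


ratio = speedup(MAKE_TEST_SECONDS, ZUO_TEST_SECONDS)
# Parallel efficiency: the fraction of an ideal 16x speedup that was achieved.
efficiency = ratio / CORES

print(f"speedup: {ratio:.2f}x, efficiency on {CORES} cores: {efficiency:.0%}")
```

That works out to a speedup of roughly 6.4x, well short of 16x, which is not surprising if the test sets differ a lot in how long each one takes.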

@mflatt
Contributor

mflatt commented Feb 27, 2024

Now that Zuo's build uses jobserver support by default, it doesn't need to be specifically enabled. When I try on my machines (macOS and Linux), make -j 4 test runs with 4 jobs in a shell where make test doesn't (i.e., where I've unset the ZUO_JOBS environment variable that I normally have set).

Could it be an issue with Zuo and the particular variant of GNU make that you're using?
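
One way to check whether a given GNU make is offering a jobserver at all is to look at the MAKEFLAGS it hands to child processes. The sketch below parses the option spellings documented by GNU make; it does not claim to mirror how Zuo itself reads them, and the function name `describe_jobserver` is made up for illustration.

```python
import os
import re
from typing import Optional

# Jobserver option spellings that GNU make places in MAKEFLAGS:
#   --jobserver-auth=R,W        pipe file descriptors (GNU make 4.2 and 4.3)
#   --jobserver-auth=fifo:PATH  named pipe (GNU make 4.4 and later)
#   --jobserver-fds=R,W         older spelling (before 4.2)
_JOBSERVER_RE = re.compile(r"--jobserver-(?:auth|fds)=(\S+)")


def describe_jobserver(makeflags: str) -> Optional[dict]:
    """Return jobserver details found in a MAKEFLAGS string, or None."""
    matches = _JOBSERVER_RE.findall(makeflags)
    if not matches:
        return None
    value = matches[-1]  # GNU make documents that only the last instance counts
    if value.startswith("fifo:"):
        return {"kind": "fifo", "path": value[len("fifo:"):]}
    read_fd, _, write_fd = value.partition(",")
    try:
        return {"kind": "pipe", "read_fd": int(read_fd), "write_fd": int(write_fd)}
    except ValueError:
        return None  # malformed value; treat as no usable jobserver


if __name__ == "__main__":
    # Run from inside a make recipe to see what that make passes along.
    print(describe_jobserver(os.environ.get("MAKEFLAGS", "")))
```

With GNU Make 4.3, a `make -j 16` invocation would be expected to show the pipe form; a `None` result would suggest that no jobserver is being advertised to the recipe.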

@LiberalArtist
Contributor Author

I will test that and find out!

@LiberalArtist
Contributor Author

I tried commenting out my customized "check phase", and not only does the make-based one not seem to run in parallel, it also failed! I've uploaded the log as a7kk3sqx1vw0fwqsj65pp38g425kz7wv-chez-scheme-10.0.0.drv.log.gz; here's the tail:

-------- o=3 hci=101 eval=interpret rmg=2 hci-rmg --------
some output differs from expected
 in build-one
 in loop
 in module->hash
make: *** [Makefile:40: test] Error 1

Test suite failed, dumping logs.
error: in phase 'check': uncaught exception:
%exception #<&invoke-error program: "make" arguments: ("test" "-j" "16") exit-status: 2 term-signal: #f stop-signal: #f> 
phase `check' failed after 669.9 seconds
command "make" "test" "-j" "16" failed with status 2

I don't know what's going on. If it helps, make --version reports:

make --version
GNU Make 4.3
Built for x86_64-unknown-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

@mflatt
Contributor

mflatt commented Feb 27, 2024

I don't know if there's a way to tell definitively from the log that it's sequential or parallel, but that log seems to report 669.9s, which is closer to 496.5s than to 3165.6s. I forget how long ago pre-release.18 was, but maybe that was before jobserver support was enabled by default.
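
The "closer to" comparison can be spelled out as a nearest-reference check. This is only a heuristic sketch: labelling the 3165.6s run as sequential is an assumption rather than something the logs confirm, and a run that fails partway through also reports a short time.

```python
# Reference timings from earlier in this issue, in seconds.
# The "sequential" label is an assumption, not something the logs confirm.
REFERENCE_RUNS = {
    "parallel (Zuo, 10.0.0)": 496.5,
    "sequential? (9.9.9-pre-release.18)": 3165.6,
}


def closest_reference(elapsed, references):
    """Return (label, absolute difference) for the nearest reference timing."""
    label = min(references, key=lambda name: abs(references[name] - elapsed))
    return label, abs(references[label] - elapsed)


label, gap = closest_reference(669.9, REFERENCE_RUNS)
print(f"669.9s is closest to {label}, off by {gap:.1f}s")
```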

Meanwhile, the test failure appears to be a crash in the test related to signal handling. I have seen this on occasion, and I suspect an issue with the test, but I didn't figure it out the last time I tried.

@LiberalArtist
Contributor Author

> I don't know if there's a way to tell definitively from the log that it's sequential or parallel, but that log seems to report 669.9s, which is closer to 496.5s than 3165.6s.

True: my hypothesis had been that it had failed early but was on its way to a higher number.

> I forget how long ago pre-release.18 was, but maybe that was before jobserver support was enabled by default.

That's the version from the Racket 8.11.1 release, built with the Zuo 1.7 from the same release. Just in case it's useful, here's that log:
lwa63x3g6pa4x2xvddsq3wflfmnmz3rd-chez-scheme-for-racket-9.9.9-pre-release.18.drv.log.gz

> Meanwhile, the test failure appears to be a crash in the test related to signal handling. I have seen this on occasion, and I suspect an issue with the test, but I didn't figure it out the last time I tried.

If it's an inconsistent failure, I can try again later (once I've got my local tree back in the right state). I can also try building 9.9.9-pre-release.18 with the latest Zuo: I should test that anyway, as a consequence of the way Guix likes changes to be organized.

@LiberalArtist
Contributor Author

> I can also try building 9.9.9-pre-release.18 with the latest Zuo: I should test that anyway, as a consequence of the way Guix likes changes to be organized.

This succeeded in 526.6 seconds, so it does seem to work! For posterity,
16p8jj8dx9pv33aqsd6s0as7w814w53k-chez-scheme-for-racket-9.9.9-pre-release.18.drv.log.gz is the log from building commit 5f7cbdcd81 in my fork of Guix.
