Add make version 4.3 to speedup parallel build #72

Closed
dileks opened this issue Feb 21, 2020 · 12 comments
Labels: enhancement (New feature or request), wontfix (This will not be worked on)

Comments

@dileks
Contributor

dileks commented Feb 21, 2020

Hi,

As pointed out at the first ClangBuiltLinux Meetup in Zurich, I suggest trying make version 4.3 to check whether we see speedups in parallel builds.

The official release announcement [1] and a review in German [2] are linked below.

While dealing with GNU make I stumbled over this commit ("pipe: use exclusive waits when reading or writing") in Linus' tree [3], which reports a significant speedup of parallel make jobs when building a Linux kernel.

On the Debian side, 'make (4.2.1-1.2)' is available in buster/testing/unstable.

[1] https://lists.gnu.org/archive/html/info-gnu/2020-01/msg00004.html
[2] https://www.heise.de/developer/meldung/Build-Tool-GNU-Make-4-3-verbessert-die-Performance-4641700.html
[3] https://git.kernel.org/linus/0ddad21d3e99c743a3aa473121dc5561679e26bb

@dileks dileks added the enhancement New feature or request label Feb 21, 2020
@nathanchance
Member

I think this is an exercise best left up to the user. We only use make when doing kernel builds which only happens when using --pgo, so I would argue it probably won’t be that big of a difference. I can try to write a hyperfine benchmark that evaluates make 4.2.1 versus 4.3 and see what kind of difference it makes but I think building make as a part of the script is out of scope.

@nathanchance
Member

So I wired up a benchmark with GCC and the initial results for make 4.3 don't seem good.

$ hyperfine -w 1 -r 25 -p 'rm -rf out.aarch64' '/tmp/tmp.Ms0KgyWriW/make-4.2.1/bin/make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=${HOME}/toolchains/gcc/9.2.0/bin/aarch64-linux- O=out.aarch64 defconfig all' '/tmp/tmp.Ms0KgyWriW/make-4.3/bin/make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=${HOME}/toolchains/gcc/9.2.0/bin/aarch64-linux- O=out.aarch64 defconfig all'
Benchmark #1: /tmp/tmp.Ms0KgyWriW/make-4.2.1/bin/make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=${HOME}/toolchains/gcc/9.2.0/bin/aarch64-linux- O=out.aarch64 defconfig all
  Time (mean ± σ):     134.735 s ±  1.756 s    [User: 3966.322 s, System: 410.203 s]
  Range (min … max):   133.331 s … 141.069 s    25 runs

Benchmark #2: /tmp/tmp.Ms0KgyWriW/make-4.3/bin/make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=${HOME}/toolchains/gcc/9.2.0/bin/aarch64-linux- O=out.aarch64 defconfig all
  Time (mean ± σ):     143.395 s ±  5.753 s    [User: 3946.455 s, System: 407.553 s]
  Range (min … max):   135.598 s … 160.649 s    25 runs

Summary
  '/tmp/tmp.Ms0KgyWriW/make-4.2.1/bin/make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=${HOME}/toolchains/gcc/9.2.0/bin/aarch64-linux- O=out.aarch64 defconfig all' ran
    1.06 ± 0.04 times faster than '/tmp/tmp.Ms0KgyWriW/make-4.3/bin/make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=${HOME}/toolchains/gcc/9.2.0/bin/aarch64-linux- O=out.aarch64 defconfig all'

I am building clang now to see if the results line up. This is on a machine with Ubuntu 18.04 so it does not have Linus' pipe commit but it should not need it since 4.3 is fixing the bug that it was running into.
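The "1.06 ± 0.04" in the summary can be reproduced from the two means and standard deviations above. This is a minimal sketch that assumes first-order error propagation for a ratio of two independent measurements; `ratio_with_error` is a hypothetical helper name, not part of hyperfine.

```shell
#!/bin/sh
# ratio_with_error MEAN_FAST SD_FAST MEAN_SLOW SD_SLOW
# Prints "ratio +/- error" for slow/fast, assuming independent measurements
# and first-order error propagation (an assumption, not taken from the thread).
ratio_with_error() {
    awk -v m1="$1" -v s1="$2" -v m2="$3" -v s2="$4" 'BEGIN {
        r = m2 / m1
        e = r * sqrt((s1 / m1) ^ 2 + (s2 / m2) ^ 2)
        printf "%.2f +/- %.2f\n", r, e
    }'
}

# make 4.2.1: 134.735 s +/- 1.756 s, make 4.3: 143.395 s +/- 5.753 s
ratio_with_error 134.735 1.756 143.395 5.753
```

The error term is almost as large as the 0.06 difference itself, which is one reason to repeat the measurement on other machines before drawing conclusions.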

@dileks
Contributor Author

dileks commented Feb 22, 2020

Numbers talk, bullshit walks. (Linus Torvalds)

Thanks for the numbers @nathanchance.

BTW, I see the latest make in Debian includes the SV 51159 fix:

  * Fix potential hangs (Closes: #890309):
    - [SV 51159] Use a non-blocking read with pselect to avoid hangs.
    - [SV 51400] Only unblock fatal signals after child invocation
    - Treat -Otarget and -Orecurse as -Oline, to avoid a hang

I agree that make v4.3 is probably of more interest when building the Linux kernel.

UPDATE: Add Link to Debian bug #890309.

[1] https://sources.debian.org/src/make-dfsg/4.2.1-1.2/debian/changelog/#L13
[2] https://bugs.debian.org/890309

@nathanchance
Member

Clang doesn’t seem to be any better:

Summary
  '/tmp/tmp.Ms0KgyWriW/make-4.2.1/bin/make -j$(nproc) -s ARCH=arm64 CC=clang CROSS_COMPILE=aarch64-linux-gnu- LD=ld.lld O=out.aarch64 defconfig all' ran
    1.03 ± 0.12 times faster than '/tmp/tmp.Ms0KgyWriW/make-4.3/bin/make -j$(nproc) -s ARCH=arm64 CC=clang CROSS_COMPILE=aarch64-linux-gnu- LD=ld.lld O=out.aarch64 defconfig all'

Maybe Linus’ pipe changes make 4.3 better but on this machine (stock Ubuntu 18.04 with the stock 4.15 kernel), it is a regression and I don’t think that should be shipped to users.

@nathanchance
Member

This is outside the scope of the script. If you want a newer version of make, install it through your package manager or build it from source and install it somewhere in PATH, where it will naturally be picked up by the script.
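To check which make a script would pick up from PATH, something like the following could be used. It is only a sketch: `version_ge` is a hypothetical helper, it relies on `sort -V` (available in GNU coreutils), and it assumes the first line of `make --version` has the form "GNU Make 4.3".

```shell
#!/bin/sh
# version_ge A B: succeed if version A is greater than or equal to version B.
# Assumes a sort(1) that supports -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v make >/dev/null 2>&1; then
    # Third field of the first line, e.g. "GNU Make 4.3" -> "4.3".
    found=$(make --version | awk 'NR == 1 { print $3 }')
    if version_ge "$found" 4.3; then
        echo "$(command -v make) is version $found (4.3 or newer)"
    else
        echo "$(command -v make) is version $found (older than 4.3)"
    fi
fi
```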

@nathanchance nathanchance added the wontfix This will not be worked on label Mar 31, 2020
@dileks
Contributor Author

dileks commented May 22, 2020

Update 2020-05-22:

make version 4.3-1 entered Debian/unstable:

# LC_ALL=C make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Numbers please:

[ make version 4.2.1-2+b1 (Debian/testing AMD64) ]
Start: 08:27
End: 13:29
Total: 05:02

[ make version 4.3-1 (Debian/unstable AMD64) ]
Start: 13:23
End: 18:36
Total: 05:13

[1] https://packages.debian.org/make
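The two totals above can be checked with a little shell arithmetic. The sketch assumes the times are HH:MM on a 24-hour clock and that each build started and ended on the same day; `to_minutes` and `elapsed_minutes` are hypothetical helper names.

```shell
#!/bin/sh
# to_minutes HH:MM: convert a clock time to minutes since midnight.
to_minutes() {
    tm_h=${1%%:*}
    tm_m=${1##*:}
    # Drop one leading zero so "08" is not read as an invalid octal number.
    echo $(( ${tm_h#0} * 60 + ${tm_m#0} ))
}

# elapsed_minutes START END: minutes between two times on the same day.
elapsed_minutes() {
    echo $(( $(to_minutes "$2") - $(to_minutes "$1") ))
}

old=$(elapsed_minutes 08:27 13:29)   # make 4.2.1-2+b1
new=$(elapsed_minutes 13:23 18:36)   # make 4.3-1
awk -v o="$old" -v n="$new" 'BEGIN {
    printf "4.2.1: %d min, 4.3: %d min, difference: %.1f%%\n", o, n, (n - o) * 100 / o
}'
```

These are single runs, so a difference of a few percent may well be within run-to-run noise.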

@nivedita76

> So I wired up a benchmark with GCC and the initial results for make 4.3 don't seem good.
>
> $ hyperfine -w 1 -r 25 -p 'rm -rf out.aarch64' '/tmp/tmp.Ms0KgyWriW/make-4.2.1/bin/make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=${HOME}/toolchains/gcc/9.2.0/bin/aarch64-linux- O=out.aarch64 defconfig all' '/tmp/tmp.Ms0KgyWriW/make-4.3/bin/make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=${HOME}/toolchains/gcc/9.2.0/bin/aarch64-linux- O=out.aarch64 defconfig all'
> Benchmark #1: /tmp/tmp.Ms0KgyWriW/make-4.2.1/bin/make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=${HOME}/toolchains/gcc/9.2.0/bin/aarch64-linux- O=out.aarch64 defconfig all
>   Time (mean ± σ):     134.735 s ±  1.756 s    [User: 3966.322 s, System: 410.203 s]
>   Range (min … max):   133.331 s … 141.069 s    25 runs
>
> Benchmark #2: /tmp/tmp.Ms0KgyWriW/make-4.3/bin/make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=${HOME}/toolchains/gcc/9.2.0/bin/aarch64-linux- O=out.aarch64 defconfig all
>   Time (mean ± σ):     143.395 s ±  5.753 s    [User: 3946.455 s, System: 407.553 s]
>   Range (min … max):   135.598 s … 160.649 s    25 runs
>
> Summary
>   '/tmp/tmp.Ms0KgyWriW/make-4.2.1/bin/make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=${HOME}/toolchains/gcc/9.2.0/bin/aarch64-linux- O=out.aarch64 defconfig all' ran
>     1.06 ± 0.04 times faster than '/tmp/tmp.Ms0KgyWriW/make-4.3/bin/make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=${HOME}/toolchains/gcc/9.2.0/bin/aarch64-linux- O=out.aarch64 defconfig all'

Both user and system times are better with 4.3 but elapsed time is worse?

> I am building clang now to see if the results line up. This is on a machine with Ubuntu 18.04 so it does not have Linus' pipe commit but it should not need it since 4.3 is fixing the bug that it was running into.

This is confusing? The dependency is the other way: I thought Linus' pipe commit isn't really a bug fix but a speedup, which happens to trigger a bug in 4.2.1, which 4.3 is fixing.
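One way to look at the user/system versus elapsed question is to divide total CPU time by wall-clock time, which gives the average number of cores kept busy. The sketch below does that for the numbers quoted above; `cpu_parallelism` is a hypothetical helper name, and the interpretation is a reading of the figures, not something measured in this thread.

```shell
#!/bin/sh
# cpu_parallelism USER SYSTEM ELAPSED: average number of busy cores,
# computed as (user + system) CPU seconds divided by wall-clock seconds.
cpu_parallelism() {
    awk -v u="$1" -v s="$2" -v e="$3" 'BEGIN { printf "%.1f\n", (u + s) / e }'
}

echo "make 4.2.1: $(cpu_parallelism 3966.322 410.203 134.735) cores busy on average"
echo "make 4.3:   $(cpu_parallelism 3946.455 407.553 143.395) cores busy on average"
```

With these inputs the result is about 32.5 for make 4.2.1 and about 30.4 for make 4.3. Slightly less CPU work spread over more wall-clock time would be consistent with cores sitting idle more often under 4.3 on that machine, although it does not say why.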

@dileks
Contributor Author

dileks commented Jun 20, 2020

I have no numbers for you.

But as you say, and as I understand it, Linus' pipe improvements in Linux v5.7 are a speedup when used together with make version 4.3.

If you have a vanilla make version 4.2.1 without the "[SV 51159] Use a non-blocking read with pselect to avoid hangs." fix, then you should see a performance loss.

See the Debian bug [1] below, which was also noted earlier in this conversation.

Jonathan Derrick reports "Parallel compilation performance regression" in [2] with make -j72.
His environment is not clearly described.
Did you read that thread and come here :-)?
Unsure how big the number for -j has to be to see a performance benefit or loss.
In my case I use make -j3.

As said, we are talking here about tc-build performance when building an LLVM toolchain, without building a Linux kernel in my case.

According to @nathanchance the build-host is Ubuntu 18.04 LTS with Linux v4.15.
Unsure what system environment is used these days in CBL eco-systems.

[1] https://bugs.debian.org/890309
[2] https://marc.info/?t=159252454100001&r=1&w=2

@nivedita76

> I have no numbers for you.
>
> But as you say and that is what I have understood Linus' pipe improvements in Linux v5.7 is a speedup when used together with make version 4.3.
>
> If you have a vanilla make version 4.2.1 without the [SV 51159] Use a non-blocking read with pselect to avoid hangs. fix then you should see performance lost.
>
> See Debian bug below [1] and noted in previous conversation here.

Yes, but I'd put the emphasis slightly differently. SV51159 is fixing a bug in Make 4.2.1 which exists even before Linus' pipe patches. After those patches, the Make bug is much more likely to trigger, thus those patches are effectively a performance regression if you don't have SV51159.

> Jonathan Derrick reports "Parallel compilation performance regression" in [2] with make -j72.
> His environment is not clearly described.
> Did you read that thread and came in here :-)?

yes :)

> Unsure how big the number for -j has to be to see a performance benefit or lost.
> In my case I use make -j3.
>
> As said we are talking here about tc-build performance in building a llvm-toolchain - without building a Linux-kernel in my case.

i.e. you mostly use cmake, not make, and cmake doesn't use pipes for implementing parallelism?

> According to @nathanchance the build-host is Ubuntu 18.04 LTS with Linux v4.15.
> Unsure what system environment is used these days in CBL eco-systems.
>
> [1] https://bugs.debian.org/890309
> [2] https://marc.info/?t=159252454100001&r=1&w=2
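For readers unfamiliar with why pipes matter here: GNU make limits parallel jobs with a jobserver, which hands out tokens through a pipe. The following is a toy model of that idea in plain shell, not make's actual code; `run_with_tokens` and the `rwt_` variable names are hypothetical, and the sketch assumes a system where a FIFO can be opened read/write (as on Linux).

```shell
#!/bin/sh
# Toy token pool illustrating the pipe-of-tokens idea only.
set -eu

# run_with_tokens SLOTS CMD...: run each CMD in the background,
# with at most SLOTS of them running at any time.
run_with_tokens() {
    rwt_slots=$1
    shift
    rwt_dir=$(mktemp -d)
    mkfifo "$rwt_dir/tokens"
    # Open read/write so that opening the FIFO does not block.
    exec 9<>"$rwt_dir/tokens"
    rm -rf "$rwt_dir"

    # Preload one token (one line) per allowed job.
    rwt_i=0
    while [ "$rwt_i" -lt "$rwt_slots" ]; do
        echo token >&9
        rwt_i=$((rwt_i + 1))
    done

    for rwt_cmd in "$@"; do
        # Blocks here until a running job hands its token back.
        read -r rwt_token <&9
        (
            eval "$rwt_cmd" || true
            echo token >&9
        ) &
    done
    wait
    exec 9>&-
}

run_with_tokens 2 'echo job 1' 'echo job 2' 'echo job 3'
```

Every job start is a blocking read on the shared pipe, so changes to how the kernel wakes up pipe readers, or to how make reads from the pipe, can plausibly affect highly parallel builds.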

@nathanchance
Member

> Both user and system times are better with 4.3 but elapsed time is worse?
>
> > I am building clang now to see if the results line up. This is on a machine with Ubuntu 18.04 so it does not have Linus' pipe commit but it should not need it since 4.3 is fixing the bug that it was running into.
>
> This is confusing? The dependency is the other way: I thought Linus' pipe commit isn't really a bug fix but a speedup, which happens to trigger a bug in 4.2.1, which 4.3 is fixing.

I must have misunderstood then. I can't explain the regression in compile time but I can try to run the benchmarks on a few different machines to see if those measurements were inaccurate or an outlier.

@nathanchance
Member

Maybe it was just an Ubuntu 18.04 fluke or something since I see no regression with 4.3 on two machines running Ubuntu 20.04. Kernel version on both machines:

$ uname -r
5.4.0-26-generic

c3.medium.x86

Clang 11.0.0:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| make 4.2.1 | 124.912 ± 0.233 | 124.520 | 125.331 | 1.02 ± 0.00 |
| make 4.3 | 122.696 ± 0.177 | 122.425 | 123.124 | 1.00 |

GCC 10.1.0:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| make 4.2.1 | 103.251 ± 0.157 | 102.960 | 103.598 | 1.02 ± 0.00 |
| make 4.3 | 101.411 ± 0.064 | 101.299 | 101.525 | 1.00 |

x2.xlarge.x86

Clang 11.0.0 (do note this one had an outlier, I just don't want to re-run it):

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| make 4.2.1 | 151.990 ± 2.299 | 150.320 | 157.917 | 1.02 ± 0.02 |
| make 4.3 | 149.202 ± 0.718 | 148.654 | 152.099 | 1.00 |

GCC 10.1.0:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| make 4.2.1 | 131.453 ± 0.269 | 131.038 | 132.069 | 1.01 ± 0.00 |
| make 4.3 | 130.063 ± 0.109 | 129.895 | 130.340 | 1.00 |

If you are curious about doing your own benchmarks, this is the script I used. I recommend running it in a /tmp directory or something that is easy to clean up after (I used mktemp -d -p .). Really just requires hyperfine to be installed along with some dependencies for building clang.

$ cat test.sh
#!/usr/bin/env bash

set -eux

BASE=$(dirname "$(readlink -f "${0}")")
cd "${BASE}"

# Grab some sources/toolchains
[[ -d make-4.2.1 ]] || curl -LSs http://ftp.gnu.org/gnu/make/make-4.2.1.tar.gz | tar -xzf -
[[ -d make-4.3 ]] || curl -LSs http://ftp.gnu.org/gnu/make/make-4.3.tar.gz | tar -xzf -
[[ -d linux-5.7.4 ]] || curl -LSs https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.7.4.tar.xz | tar -xJf -
[[ -d tc-build ]] || git clone https://github.com/ClangBuiltLinux/tc-build
[[ -x gcc-10.1.0-nolibc/aarch64-linux/bin/aarch64-linux-gcc ]] || curl -LSs https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/10.1.0/x86_64-gcc-10.1.0-nolibc-aarch64-linux.tar.xz | tar -xJf -

# Build make binaries
for MAKE_VER in 4.2.1 4.3; do
    cd "${BASE}"/make-"${MAKE_VER}"
    if [[ ! -x make ]]; then
        # Fix error due to definition of __alloca
        [[ ${MAKE_VER} = "4.2.1" ]] && sed -i '211d;232d' glob/glob.c
        ./configure
        make -j"$(nproc)"
    fi
done

# Build clang + aarch64 binutils
TC_BLD_BIN=${BASE}/tc-build/install/bin
[[ -x ${TC_BLD_BIN}/aarch64-linux-gnu-as ]] || "${BASE}"/tc-build/build-binutils.py -t aarch64
[[ -x ${TC_BLD_BIN}/clang ]] || "${BASE}"/tc-build/build-llvm.py \
    --check-targets clang lld llvm \
    --projects "clang;lld" \
    --targets "AArch64;X86"

# Move into kernel source
cd "${BASE}"/linux-5.7.4

GCC_MAKE_COMMAND="make -skj$(nproc) ARCH=arm64 CROSS_COMPILE=${BASE}/gcc-10.1.0-nolibc/aarch64-linux/bin/aarch64-linux- O=out defconfig all"
hyperfine \
    --export-markdown "${BASE}"/make-gcc-results.md \
    --prepare 'rm -rf out' \
    --runs 25 \
    --style basic \
    --warmup 1 \
    "${BASE}/make-4.2.1/${GCC_MAKE_COMMAND}" \
    "${BASE}/make-4.3/${GCC_MAKE_COMMAND}"

export PATH=${TC_BLD_BIN}:${PATH}
CLANG_MAKE_COMMAND="make -skj$(nproc) ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- LLVM=1 O=out defconfig all"
hyperfine \
    --export-markdown "${BASE}"/make-clang-results.md \
    --prepare 'rm -rf out' \
    --runs 25 \
    --style basic \
    --warmup 1 \
    "${BASE}/make-4.2.1/${CLANG_MAKE_COMMAND}" \
    "${BASE}/make-4.3/${CLANG_MAKE_COMMAND}"

@dileks
Contributor Author

dileks commented Jun 22, 2020

My observations on Debian/testing AMD64 were similar: no big differences in build time between Debian's make v4.2.1 and v4.3.
