LLVM Clang is 20% slower than it should be #77975

Closed
2 tasks done
carlocab opened this issue May 25, 2021 · 3 comments
Labels: bug, help wanted, outdated

Comments

@carlocab
Member

carlocab commented May 25, 2021

brew gist-logs <formula> link OR brew config AND brew doctor output

❯ brew config
HOMEBREW_VERSION: 3.1.9-7-ge2febdf
ORIGIN: https://github.com/Homebrew/brew
HEAD: e2febdfd0796ab04fb87b8b93ce2aed74225dad8
Last commit: 27 minutes ago
Core tap ORIGIN: https://github.com/Homebrew/homebrew-core
Core tap HEAD: 2143f2525c1454d09d89f29998e111417969d5ab
Core tap last commit: 42 minutes ago
Core tap branch: master
HOMEBREW_PREFIX: /usr/local
HOMEBREW_BAT: set
HOMEBREW_BOOTSNAP: set
HOMEBREW_CASK_OPTS: []
HOMEBREW_COLOR: set
HOMEBREW_DEVELOPER: set
HOMEBREW_EDITOR: nvim
HOMEBREW_FORCE_BREWED_CURL: set
HOMEBREW_FORCE_BREWED_GIT: set
HOMEBREW_GITHUB_PACKAGES_TOKEN: set
HOMEBREW_GITHUB_PACKAGES_USER: carlocab
HOMEBREW_GIT_EMAIL: 30379873+carlocab@users.noreply.github.com
HOMEBREW_GIT_NAME: Carlo Cabrera
HOMEBREW_MAKE_JOBS: 4
HOMEBREW_NO_AUTO_UPDATE: set
HOMEBREW_PRY: set
Homebrew Ruby: 2.6.3 => /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/bin/ruby
CPU: quad-core 64-bit icelake
Clang: 12.0.5 build 1205
Git: 2.31.1 => /usr/local/opt/git/bin/git
Curl: 7.76.1 => /usr/local/opt/curl/bin/curl
macOS: 11.3.1-x86_64
CLT: 12.5.0.22.9
Xcode: N/A

❯ brew doctor
Please note that these warnings are just used to help the Homebrew maintainers
with debugging if you file an issue. If everything you use Homebrew for is
working fine: please don't worry or file an issue; just ignore this. Thanks!

Warning: Putting non-prefixed coreutils in your path can cause GMP builds to fail.

Warning: Putting non-prefixed findutils in your path can cause python builds to fail.

  • I ran brew update and am still able to reproduce my issue.
  • I have resolved all warnings from brew doctor and that did not fix my problem.

What were you trying to do (and why)?

Build formulae using LLVM Clang instead of system Clang. This is needed to resolve build failures on Mojave for certain formulae that use C++17 features that haven't been implemented in older versions of Clang.

See, for example: #77819, #77505, #77504, #74765

What happened (include all command output)?

The Mojave nodes systematically took a longer time to complete their runs than the other Intel nodes:

| PR | Big Sur | Catalina | Mojave |
| --- | --- | --- | --- |
| #77819 | 31m 41s | 29m 32s | 34m 59s |
| #77505 | 8m 33s | 8m 15s | 9m 16s |
| #77504 | 16m 1s | 15m 9s | 17m 48s |
| #74765 | 1h 1m 38s | 55m 44s | 40m 11s |

This pattern does not hold for #74765 (gRPC) because, at the time of that PR, several gRPC-dependent formulae (e.g. mavsdk, bear) declared depends_on macos: :catalina, so they were not built on the Mojave node.
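To put rough numbers on the pattern in the table, here is a small sketch that compares each Mojave run against the average of the Big Sur and Catalina runs. The durations are copied from the table; the helper names (`to_seconds`, `mojave_slowdown`) and the choice of averaging the other two nodes are mine, not anything CI reports.

```python
import re

def to_seconds(duration: str) -> int:
    """Convert a duration such as '1h 1m 38s' or '8m 33s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)\s*([hms])", duration))

# CI durations copied from the table: (Big Sur, Catalina, Mojave).
ci_runs = {
    "#77819": ("31m 41s", "29m 32s", "34m 59s"),
    "#77505": ("8m 33s", "8m 15s", "9m 16s"),
    "#77504": ("16m 1s", "15m 9s", "17m 48s"),
    "#74765": ("1h 1m 38s", "55m 44s", "40m 11s"),
}

def mojave_slowdown(big_sur: str, catalina: str, mojave: str) -> float:
    """Fractional extra time on Mojave relative to the mean of the other two nodes."""
    others = (to_seconds(big_sur) + to_seconds(catalina)) / 2
    return to_seconds(mojave) / others - 1.0

for pr, durations in ci_runs.items():
    print(f"{pr}: Mojave vs. other Intel nodes: {mojave_slowdown(*durations):+.1%}")
```

A positive value means the Mojave node took longer. The #74765 row should come out negative, for the reason given above.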

What did you expect to happen?

The Mojave nodes are typically 20-30% faster than the other Intel CI nodes, so I expected Mojave CI to be no slower than the other Intel CI nodes on those PRs.

I initially suspected this might just be a regression in LLVM 12, but it is not: doing the same exercise with llvm@11 shows an even larger performance gap (see the succeeding comments in this issue). I think there are still tweaks we can make to the formula to try to minimise this gap.

Step-by-step reproduction instructions (by running brew commands)

❯ hyperfine --prepare='brew uninstall hello || true' --warmup=1 --parameter-list compiler clang,gcc,llvm_clang 'brew install -s --cc={compiler} hello'
Benchmark #1: brew install -s --cc=clang hello
  Time (mean ± σ):     54.466 s ±  2.345 s    [User: 31.460 s, System: 23.392 s]
  Range (min … max):   50.691 s … 58.124 s    10 runs

Benchmark #2: brew install -s --cc=gcc hello
  Time (mean ± σ):     55.394 s ±  3.557 s    [User: 31.098 s, System: 23.556 s]
  Range (min … max):   50.850 s … 60.694 s    10 runs

Benchmark #3: brew install -s --cc=llvm_clang hello
  Time (mean ± σ):     65.602 s ±  4.668 s    [User: 35.053 s, System: 28.252 s]
  Range (min … max):   57.526 s … 74.649 s    10 runs

Summary
  'brew install -s --cc=clang hello' ran
    1.02 ± 0.08 times faster than 'brew install -s --cc=gcc hello'
    1.20 ± 0.10 times faster than 'brew install -s --cc=llvm_clang hello'
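For readers wondering where the "1.20 ± 0.10" in the summary comes from, the sketch below recomputes it from the means and standard deviations printed above. It uses standard first-order error propagation for a quotient. That hyperfine computes its summary exactly this way is my assumption, though the formula does reproduce the figures shown.

```python
import math

def relative_speed(mean, stddev, ref_mean, ref_stddev):
    """Ratio of two mean run times, with a propagated uncertainty.

    Uses first-order error propagation for a quotient:
    sigma_ratio = ratio * sqrt((sigma_a / a)^2 + (sigma_b / b)^2)
    """
    ratio = mean / ref_mean
    uncertainty = ratio * math.sqrt((stddev / mean) ** 2 + (ref_stddev / ref_mean) ** 2)
    return ratio, uncertainty

# (mean, sigma) in seconds, copied from the benchmark output above.
clang = (54.466, 2.345)
others = {
    "gcc": (55.394, 3.557),
    "llvm_clang": (65.602, 4.668),
}

for name, (mean, stddev) in others.items():
    ratio, err = relative_speed(mean, stddev, *clang)
    print(f"clang ran {ratio:.2f} ± {err:.2f} times faster than {name}")
```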
@carlocab carlocab added the bug Reproducible Homebrew/homebrew-core bug label May 25, 2021
@carlocab
Member Author

carlocab commented May 25, 2021

As an example for things that can be tweaked in the formula:

Building with ENV["HOMEBREW_OPTIMIZATION_LEVEL"] = "O3" leads to the following benchmark:

❯ hyperfine --prepare='brew uninstall hello || true' --warmup=1 --parameter-list compiler clang,gcc,llvm_clang 'brew install -s --cc={compiler} hello'
Benchmark #1: brew install -s --cc=clang hello
  Time (mean ± σ):     77.139 s ± 10.377 s    [User: 40.030 s, System: 31.192 s]
  Range (min … max):   63.561 s … 98.281 s    10 runs

Benchmark #2: brew install -s --cc=gcc hello
  Time (mean ± σ):     76.367 s ±  8.564 s    [User: 38.406 s, System: 30.673 s]
  Range (min … max):   67.180 s … 93.893 s    10 runs

Benchmark #3: brew install -s --cc=llvm_clang hello
  Time (mean ± σ):     87.162 s ± 21.648 s    [User: 40.989 s, System: 35.121 s]
  Range (min … max):   62.428 s … 122.935 s    10 runs

Summary
  'brew install -s --cc=gcc hello' ran
    1.01 ± 0.18 times faster than 'brew install -s --cc=clang hello'
    1.14 ± 0.31 times faster than 'brew install -s --cc=llvm_clang hello'

Not sure why everything is slower than in my original benchmark, but what's important is the relative difference between each compiler. (Maybe my laptop was tired after building LLVM...)
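Because this second run is much noisier than the first, it may help to look at how large the clang vs. llvm_clang difference is relative to the spread of the measurements. The sketch below computes Welch's t statistic from the summary numbers above. It assumes the σ values printed by hyperfine are sample standard deviations over the 10 runs, and it is only a rough indication, not a rigorous analysis.

```python
import math

def welch_t(mean_a, sd_a, mean_b, sd_b, n_a=10, n_b=10):
    """Welch's t statistic for two samples described by mean, stddev and size."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# llvm_clang vs. clang, values in seconds copied from the two benchmarks.
original_run = welch_t(65.602, 4.668, 54.466, 2.345)
o3_run = welch_t(87.162, 21.648, 77.139, 10.377)

print(f"original benchmark: t = {original_run:.2f}")
print(f"O3 benchmark:       t = {o3_run:.2f}")
```

A larger t means the gap stands out more clearly from the run-to-run noise, so the O3 figure should be read with more caution than the original one.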

Another option is perhaps to follow the suggestions here: https://llvm.org/docs/BuildingADistribution.html

I think the llvm formula installs a stage1 compiler, whereas gcc installs a compiler built after building a bootstrap stage1 compiler using the system compiler. If this is the case, then we have an under-optimised compiler, so it's not surprising it runs more slowly than it could.

@carlocab carlocab changed the title LLVM Clang is slow LLVM Clang is 20% slower than it should be May 25, 2021
@carlocab carlocab added the help wanted Task(s) needing PRs from the community or maintainers label May 25, 2021
@carlocab
Member Author

Just to make the comparison a bit more direct, we can do the same exercise with llvm@11 by making the $(brew --prefix)/opt/llvm symlink point to the llvm@11 keg:

❯ hyperfine --prepare='brew uninstall hello || true' --warmup=1 --parameter-list compiler llvm_clang,gcc,clang 'brew install -s --cc={compiler} hello' --export-markdown=benchmark-llvm11.md
Benchmark #1: brew install -s --cc=llvm_clang hello
  Time (mean ± σ):     66.180 s ±  8.543 s    [User: 37.319 s, System: 27.661 s]
  Range (min … max):   53.543 s … 82.929 s    10 runs

Benchmark #2: brew install -s --cc=gcc hello
  Time (mean ± σ):     46.629 s ±  0.238 s    [User: 26.621 s, System: 19.016 s]
  Range (min … max):   46.106 s … 46.952 s    10 runs

Benchmark #3: brew install -s --cc=clang hello
  Time (mean ± σ):     44.927 s ±  0.383 s    [User: 25.795 s, System: 18.500 s]
  Range (min … max):   44.437 s … 45.720 s    10 runs

Summary
  'brew install -s --cc=clang hello' ran
    1.04 ± 0.01 times faster than 'brew install -s --cc=gcc hello'
    1.47 ± 0.19 times faster than 'brew install -s --cc=llvm_clang hello'

@carlocab
Member Author

I built LLVM with PGO according to the guide in the upstream docs. It did improve on my original benchmark somewhat: Apple Clang is only 14% faster at building hello than LLVM Clang with PGO.

It did, however, produce more substantial improvements in some other benchmarks:


With PGO:

❯ hyperfine --prepare='~/homebrew/bin/brew uninstall luajit || true' --warmup=1 --parameter-list compiler clang,llvm_clang '~/homebrew/bin/brew install --HEAD --cc={compiler} luajit' --export-markdown=benchmark-llvm-PGO.md
Benchmark #1: ~/homebrew/bin/brew install --HEAD --cc=clang luajit
  Time (mean ± σ):     29.095 s ±  1.372 s    [User: 25.691 s, System: 6.977 s]
  Range (min … max):   27.524 s … 31.570 s    10 runs

Benchmark #2: ~/homebrew/bin/brew install --HEAD --cc=llvm_clang luajit
  Time (mean ± σ):     28.098 s ±  2.239 s    [User: 24.863 s, System: 6.773 s]
  Range (min … max):   26.080 s … 33.703 s    10 runs

Summary
  '~/homebrew/bin/brew install --HEAD --cc=llvm_clang luajit' ran
    1.04 ± 0.10 times faster than '~/homebrew/bin/brew install --HEAD --cc=clang luajit'

Without PGO:

❯ hyperfine --prepare='~/homebrew/bin/brew uninstall luajit || true' --warmup=1 --parameter-list compiler clang,llvm_clang '~/homebrew/bin/brew install --HEAD --cc={compiler} luajit' --export-markdown=benchmark-llvm-bottle.md
Benchmark #1: ~/homebrew/bin/brew install --HEAD --cc=clang luajit
  Time (mean ± σ):     25.827 s ±  0.778 s    [User: 23.915 s, System: 5.783 s]
  Range (min … max):   24.950 s … 27.218 s    10 runs

Benchmark #2: ~/homebrew/bin/brew install --HEAD --cc=llvm_clang luajit
  Time (mean ± σ):     32.664 s ±  1.762 s    [User: 29.682 s, System: 6.802 s]
  Range (min … max):   30.308 s … 36.366 s    10 runs

Summary
  '~/homebrew/bin/brew install --HEAD --cc=clang luajit' ran
    1.26 ± 0.08 times faster than '~/homebrew/bin/brew install --HEAD --cc=llvm_clang luajit'

With PGO:

❯ hyperfine --prepare='~/homebrew/bin/brew uninstall tmux || true' --warmup=1 --parameter-list compiler clang,llvm_clang '~/homebrew/bin/brew install -s --cc={compiler} tmux' --export-markdown=tmux-benchmark-llvm-PGO.md
Benchmark #1: ~/homebrew/bin/brew install -s --cc=clang tmux
  Time (mean ± σ):     40.649 s ±  1.390 s    [User: 62.938 s, System: 20.672 s]
  Range (min … max):   38.668 s … 42.188 s    10 runs

Benchmark #2: ~/homebrew/bin/brew install -s --cc=llvm_clang tmux
  Time (mean ± σ):     45.953 s ±  0.484 s    [User: 63.233 s, System: 23.870 s]
  Range (min … max):   45.463 s … 47.015 s    10 runs

Summary
  '~/homebrew/bin/brew install -s --cc=clang tmux' ran
    1.13 ± 0.04 times faster than '~/homebrew/bin/brew install -s --cc=llvm_clang tmux'

Without PGO:

❯ hyperfine --prepare='~/homebrew/bin/brew uninstall tmux || true' --warmup=1 --parameter-list compiler clang,llvm_clang '~/homebrew/bin/brew install -s --cc={compiler} tmux' --export-markdown=tmux-benchmark-llvm-bottle.md
Benchmark #1: ~/homebrew/bin/brew install -s --cc=clang tmux
  Time (mean ± σ):     40.904 s ±  2.824 s    [User: 61.855 s, System: 20.618 s]
  Range (min … max):   38.148 s … 48.078 s    10 runs

Benchmark #2: ~/homebrew/bin/brew install -s --cc=llvm_clang tmux
  Time (mean ± σ):     58.321 s ±  5.809 s    [User: 78.444 s, System: 26.163 s]
  Range (min … max):   48.843 s … 69.784 s    10 runs

Summary
  '~/homebrew/bin/brew install -s --cc=clang tmux' ran
    1.43 ± 0.17 times faster than '~/homebrew/bin/brew install -s --cc=llvm_clang tmux'
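Since the Apple Clang baseline itself differs between runs, one way to compare the four luajit and tmux benchmarks is to express each LLVM Clang result as overhead relative to Apple Clang within the same run. A minimal sketch, using only the mean times above (the dictionary layout and function name are mine):

```python
# (Apple clang mean, llvm_clang mean) in seconds, copied from the output above.
results = {
    ("luajit", "PGO"): (29.095, 28.098),
    ("luajit", "no PGO"): (25.827, 32.664),
    ("tmux", "PGO"): (40.649, 45.953),
    ("tmux", "no PGO"): (40.904, 58.321),
}

def llvm_overhead(clang_mean: float, llvm_mean: float) -> float:
    """Fractional extra wall time of LLVM Clang relative to Apple Clang in one run."""
    return llvm_mean / clang_mean - 1.0

for (formula, build), (clang_mean, llvm_mean) in results.items():
    overhead = llvm_overhead(clang_mean, llvm_mean)
    print(f"{formula:7s} {build:7s} LLVM Clang overhead: {overhead:+.1%}")
```

This ignores the standard deviations, so small differences (such as the luajit result with PGO) should not be over-interpreted.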

@carlocab carlocab mentioned this issue Jun 16, 2021
6 tasks
carlocab added a commit to carlocab/homebrew-core that referenced this issue Jun 20, 2021
This patch implements various improvements I found while working on Homebrew#77975.

A few are cosmetic, but the primary substantive change is the use of the
`C_INCLUDE_DIRS` CMake variable. [1]

This adds Homebrew's `include` directory along with Xcode's system
header path to Clang's default include search path. The former change
aligns our Clang with Apple's, which searches `/usr/local/include` by
default. [2] The latter change allows the bottle to be poured in
Xcode-only installs so that we no longer need the `pour_bottle?` check.

We also add `/Library/Developer/CommandLineTools/usr/include` to the
default header search path to align our build with Apple's.

[1] https://reviews.llvm.org/D69221
[2] From `/usr/bin/clang -E -xc -v /dev/null`:

    Apple clang version 12.0.5 (clang-1205.0.22.9)
    [snip]
    #include "..." search starts here:
    #include <...> search starts here:
     /usr/local/include
     /Library/Developer/CommandLineTools/usr/lib/clang/12.0.5/include
     /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include
     /Library/Developer/CommandLineTools/usr/include
     /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks (framework directory)
    End of search list.
    [snip]
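To compare the default search paths of two compilers, the listing in [2] can be parsed mechanically. The following is a sketch only: the function name is hypothetical and the sample text is the excerpt quoted above.

```python
from typing import List

def include_search_dirs(verbose_output: str) -> List[str]:
    """Return the <...> include directories listed in `clang -v` output."""
    suffix = " (framework directory)"
    dirs = []
    in_list = False
    for line in verbose_output.splitlines():
        stripped = line.strip()
        if stripped == "#include <...> search starts here:":
            in_list = True
        elif stripped == "End of search list.":
            break
        elif in_list and stripped:
            # Framework paths carry a trailing annotation; drop it.
            if stripped.endswith(suffix):
                stripped = stripped[: -len(suffix)]
            dirs.append(stripped)
    return dirs

# Excerpt quoted in the commit message above.
sample = """\
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /Library/Developer/CommandLineTools/usr/lib/clang/12.0.5/include
 /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include
 /Library/Developer/CommandLineTools/usr/include
 /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks (framework directory)
End of search list.
"""

for directory in include_search_dirs(sample):
    print(directory)
```

To run this against a real compiler, the text would come from the stderr of `clang -E -xc -v /dev/null`, since Clang writes its verbose output there.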
carlocab added a commit to carlocab/homebrew-core that referenced this issue Jul 4, 2021
This builds LLVM with profile-guided optimisations. I've adapted the
build from upstream documentation. [1, 2]

This does mean that builds will now take 2-3x longer than before. LLVM
currently takes about four hours to build at most (including recursive
dependents), so the additional time isn't as bad as some other formulae
that require long CI runs. Moreover, the collective time this saves
users of the LLVM formula should make the additional build time worth
it.

LTO is another potential optimisation that I haven't enabled here.
This appears to be enabled in Apple's default build [3], but is a little
complicated to implement for an LLVM distribution that includes static
archives [4].

Closes Homebrew#77975

[1] https://llvm.org/docs/HowToBuildWithPGO.html#building-clang-with-pgo
[2] https://github.com/llvm/llvm-project/blob/33ba8bd2/llvm/utils/collect_and_build_with_pgo.py
[3] https://github.com/apple/llvm-project/blob/swift-5.4-RELEASE/clang/cmake/caches/Apple-stage2.cmake#L30
[4] https://llvm.org/docs/BuildingADistribution.html#options-for-optimizing-llvm
@github-actions github-actions bot added the outdated PR was locked due to age label Aug 5, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 5, 2021