Expose lto and llvm folder compilation flags #8357

carlopi · 2023-07-25T07:50:51Z

This PR exposes two independent flags to make and cmake invocations: one enables link-time optimisation (LTO), the other allows relying on a specific LLVM binary folder.
Examples:

LTO=full make                                          // default options plus passing -flto=full to the compiler/linker
LTO=thin make                                          // default options plus passing -flto=thin to the compiler/linker
CMAKE_LLVM_PATH='~/llvm-project/build' make            // default options using clang++ / clang and llvm-ranlib found in the given folder
LTO=full CMAKE_LLVM_PATH='~/llvm-project/build' make   // default options using clang++ / clang and llvm-ranlib found in the given folder PLUS -flto=full

Options are composable, both with each other and with other options, but LTO requires the underlying compiler to support it.
clang supports both the thin and full options, gcc only the full option; passing an unsupported option will result in a compiler failure.
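
As a rough illustration of the support matrix just described (this is not the logic in the Makefile or CMakeLists.txt; the names `SUPPORTED_LTO` and `lto_supported` are made up for this sketch), one could check a compiler/mode pair up front instead of waiting for the compiler to fail:

```python
# Hypothetical helper, not part of DuckDB's build: it only encodes the
# support matrix stated in the PR description (clang: thin and full,
# gcc: full only).
SUPPORTED_LTO = {
    "clang": {"thin", "full"},
    "gcc": {"full"},
}


def lto_supported(compiler: str, mode: str) -> bool:
    """Return True if `compiler` is expected to accept LTO mode `mode`."""
    return mode in SUPPORTED_LTO.get(compiler, set())


if __name__ == "__main__":
    for compiler in ("clang", "gcc"):
        for mode in ("thin", "full"):
            status = "ok" if lto_supported(compiler, mode) else "unsupported"
            print(f"{compiler} LTO={mode}: {status}")
```

In the PR itself no such check exists: an unsupported combination simply surfaces as a compiler error.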

This PR does not add either LTO or an updated clang to any workflow, except for the self tests executed in NightlyTests.yml.
Possibly enabling this for distributed binaries is left to a follow-up PR.

LTO basics

LTO (Link Time Optimisation) basically trades slower compilation times for a somewhat more optimised binary.
clang also exposes a rebuild-friendly ThinLTO (http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html) that aims to be usable during development as well.

These are full rebuild times without LTO, with LTO=thin and with LTO=full on a Mac M2 using the default clang 14:

GEN=ninja make			489.60s user 30.12s system 678% cpu 1:16.57 total
GEN=ninja LTO=thin make		900.59s user 40.24s system 663% cpu 2:21.72 total
GEN=ninja LTO=full make		711.20s user 46.07s system 341% cpu 3:41.52 total
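
To put the wall-clock numbers above side by side, here is a small sketch that converts the `total` column to seconds and prints the slowdown relative to the build without LTO. The helper name `wall_seconds` is just for illustration; the timings are copied from the three lines above.

```python
def wall_seconds(total: str) -> float:
    """Convert a `time`-style total such as '1:16.57' into seconds."""
    minutes, seconds = total.split(":")
    return int(minutes) * 60 + float(seconds)


# Wall-clock totals copied from the measurements above.
timings = {
    "no LTO": "1:16.57",
    "LTO=thin": "2:21.72",
    "LTO=full": "3:41.52",
}

baseline = wall_seconds(timings["no LTO"])
for name, total in timings.items():
    secs = wall_seconds(total)
    print(f"{name:10s} {secs:7.2f}s  {secs / baseline:.2f}x")
```

On these numbers the thin build takes roughly 1.85x and the full build roughly 2.89x the wall-clock time of the plain build. Note that thin uses more total CPU time than full (900.59s vs 711.20s user) but keeps the cores busier (663% vs 341% cpu), which is why it still finishes sooner.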

I tried to estimate performance gains using our current benchmark suite, and the speed-up for full LTO seems to be (very unscientifically) about 3% (on the geometric mean of all tests).
More testing / benchmarking is probably required to put a serious number on that. But I'd say that, given there seem to be no regressions AND some workloads are optimised significantly (up to 70% faster), it would make sense to consider this for inclusion.

The idea of this PR is to allow, now or in the future, easy experimentation with this.

CMAKE_LLVM_PATH

Currently on most workflows we build DuckDB using the default system compiler, which for clang is version 14 both on macOS and on Ubuntu 22.04.
Current development is at clang 17 (used by duckdb-wasm), while clang 16 is already stable and packaged, for example via brew install llvm.
On my machine (Mac M2):

brew install llvm
locate llvm-ranlib
--- /opt/homebrew/Cellar/llvm/16.0.6/bin/llvm-ranlib
CMAKE_LLVM_PATH='/opt/homebrew/Cellar/llvm/16.0.6' make

allows building DuckDB using clang 16.0.6 instead of the stock clang 14.0.3.
More recent compilers allow more optimisation opportunities to be leveraged; here too the benchmarking has not been done very seriously, but it seems to point towards something like a 5% improvement in execution speed.
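
For anyone wanting to sanity-check a folder before passing it as CMAKE_LLVM_PATH, here is a minimal sketch. It assumes the tools live under `bin/` inside the given folder, which is inferred from the Homebrew path shown above rather than taken from the CMake changes, and the function name `resolve_llvm_tools` is hypothetical.

```python
from pathlib import Path

# Tool names mentioned in the PR description. The bin/ layout is an
# assumption inferred from the Homebrew example path.
LLVM_TOOLS = ("clang", "clang++", "llvm-ranlib")


def resolve_llvm_tools(llvm_path: str) -> dict:
    """Map each expected tool name to its path under <llvm_path>/bin.

    Raises FileNotFoundError if any of the tools is missing.
    """
    root = Path(llvm_path).expanduser()
    tools = {name: root / "bin" / name for name in LLVM_TOOLS}
    missing = [name for name, path in tools.items() if not path.is_file()]
    if missing:
        raise FileNotFoundError(
            f"missing under {root / 'bin'}: {', '.join(missing)}"
        )
    return tools
```

For example, `resolve_llvm_tools('/opt/homebrew/Cellar/llvm/16.0.6')` would return the three paths on a machine set up as above, and raise otherwise.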

I have done this only for LLVM/clang (since it required llvm-ranlib to be specified), but it could potentially be worth exploring performance gains for more recent gcc versions too.

How to roll this in

My idea is that having these options more easily available allows more experimenting with this, potentially moving some nightly binaries to use LTO and more recent compiler versions, and collecting feedback in order to decide whether this is worth turning on for proper releases as well.
But input is very welcome, and if someone wants to take this over and give it a critical look, you are very welcome!

Note on benchmarking

Benchmarking is hard, especially if you do that while trying to prove a point.
I added to the regression test runner a summary such as "new is roughly X% faster | old is roughly Y% faster | about the same". This is done by comparing geometric means. It's a very blunt simplification, so do take it with quite some distance; if it does more harm than good it should be removed.
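
To make the idea concrete, here is a minimal sketch of such a summary. It is not the code in script/regression_test_runner.py: the function names, the way the percentage is computed and the 1% "about the same" threshold are all assumptions made for illustration.

```python
import math


def geometric_mean(values):
    """Geometric mean of a non-empty sequence of positive timings."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(old_timings, new_timings, threshold=0.01):
    """Collapse two sets of benchmark timings into a one-line verdict.

    `threshold` is a hypothetical tolerance: differences smaller than
    this fraction are reported as "about the same".
    """
    old_gm = geometric_mean(old_timings)
    new_gm = geometric_mean(new_timings)
    if new_gm < old_gm * (1 - threshold):
        pct = (old_gm / new_gm - 1) * 100
        return f"new is roughly {pct:.0f}% faster"
    if old_gm < new_gm * (1 - threshold):
        pct = (new_gm / old_gm - 1) * 100
        return f"old is roughly {pct:.0f}% faster"
    return "about the same"
```

Because everything is collapsed into a single number, one heavily improved benchmark can hide several small regressions (and vice versa), which is exactly why the result should be read with caution.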

I also added 3 jobs to be executed nightly that reuse the benchmark runner to evaluate LTO gains (on clang and gcc) and performance differences between clang and gcc. The role of these tests is mostly to check that the LTO and CMAKE_LLVM_PATH options keep working over time.

The current invocation of the regression script involves lots of copy-pasting; the logic should probably be refactored, but at first I thought this was clearer.

carlopi · 2023-07-25T19:11:44Z

There is still an unrelated failure in the Linux job here: https://github.com/duckdb/duckdb/actions/runs/5659062468/job/15331786339?pr=8357#step:7:2949, and a few unrelated jobs that are still to be completed.

I ran a few experiments as part of the tests, using the geometric mean of the current regression tests.
Results are:

image	base	alternative	micro	tpch	tpcds	h2oai	imdb
macos-latest	clang 14	clang 14, LTO=full	-17%	-3%	-2%	-5%	-5%
macos-latest	clang 14	clang 16	-15%	same	-2%	same	same
ubuntu-latest	gcc 11.3	clang 14	-7%	+2%	-4%	-2%	-3%
ubuntu-latest	gcc 11.3	gcc 11.3, LTO=full	-1%	-2%	-1%	-3%	-1%

Unsure what should be read into this, something like:
clang marginally better than gcc, newer compiler versions marginally better than older compiler versions, gcc LTO marginally better than regular gcc, clang LTO visibly better than regular clang.
The takeaway is that the best combination is probably a recent clang with LTO enabled, but it is hard to put actual numbers on it, and this has to be balanced against bringing in additional dependencies or existing restrictions.

samansmink

Very cool work! I think we should carefully consider this. Running this on master builds seems viable CI-time wise for sure, and probably worth it. However, the question then would be how often we will run into CI failures that only occur on LTO builds. Having to debug issues that are only caught on master LTO builds seems like a potentially painful process that costs a lot of dev time.

Mytherin

Thanks for the PR! LGTM - one comment:

.github/workflows/NightlyTests.yml

carlopi · 2023-07-26T18:58:23Z

Thanks @samansmink and @Mytherin for the feedback.

I guess the hard choice is what to make of all this / what to turn on and under what conditions, and that would probably require some additional consideration. In this PR I took the easy road of just providing options (and avoiding having to rediscover the needed set of changes in the future) without making real choices.

This PR is having another round of CI since I moved the benchmarks to a separate workflow (to be run only via workflow dispatch or on changes to the workflow itself), but on my side it is ready to be merged.

If/when we want to experiment, I would consider it probably easier to do so on OSX-based workflows, given that clang is already the default there and we have an easier time testing them; potentially, moving from clang 14 to brew-installed clang 16 and enabling LTO can bring gains with lower risk.

To be enabled via `LTO=thin make` or `LTO=full make`. To opt in, the LTO variable has to be defined to something that clang will recognize. LTO or FullLTO implies running additional optimisations at link time, trading off time spent compiling for an improved compiled binary (both smaller and somewhat more performant). ThinLTO aims at reaching similar gains with a smaller footprint AND avoiding degenerate cases where recompilation times become similar to compiling from scratch each time. Here is some background on ThinLTO: http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html

Example of use: `CMAKE_LLVM_PATH=~/llvm-project/build make` or `CMAKE_LLVM_PATH=/opt/homebrew/Cellar/llvm/16.0.6/ make`. This is currently done only for LLVM/Clang, since executable names are hardcoded (e.g. llvm-ranlib). The same logic can be adapted to other compilers if we find it useful.

Link-time optimisation is also available in gcc, so turn it on there as well, via -flto. Additional details here: https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html, both on -flto and -whopr (somewhat similar to ThinLTO).

Mytherin · 2023-08-07T13:04:08Z

Thanks - we can merge this and leave the actual choice of whether/where we want to enable LTO for a later date.

carlopi force-pushed the lto_and_llvm_flags branch from 0c578fa to 756610a Compare July 25, 2023 07:53

github-actions bot marked this pull request as draft July 25, 2023 07:53

carlopi marked this pull request as ready for review July 25, 2023 09:06

carlopi requested review from samansmink and Mytherin July 25, 2023 09:06

github-actions bot marked this pull request as draft July 25, 2023 09:13

carlopi marked this pull request as ready for review July 25, 2023 09:13

carlopi force-pushed the lto_and_llvm_flags branch from 080f887 to 9a8b456 Compare July 25, 2023 15:59

github-actions bot marked this pull request as draft July 25, 2023 16:00

carlopi marked this pull request as ready for review July 25, 2023 16:00

carlopi force-pushed the lto_and_llvm_flags branch from 9a8b456 to ed8ade8 Compare July 26, 2023 05:57

github-actions bot marked this pull request as draft July 26, 2023 05:57

carlopi marked this pull request as ready for review July 26, 2023 05:58

samansmink reviewed Jul 26, 2023

Mytherin reviewed Jul 26, 2023

.github/workflows/NightlyTests.yml (outdated, resolved)

github-actions bot marked this pull request as draft July 26, 2023 18:35

carlopi marked this pull request as ready for review July 26, 2023 18:36

carlopi force-pushed the lto_and_llvm_flags branch from ad9873a to 8a56b9a Compare July 27, 2023 05:15

github-actions bot marked this pull request as draft July 27, 2023 05:15

carlopi marked this pull request as ready for review July 27, 2023 05:15

carlopi added 8 commits July 28, 2023 10:19

Add 2 nightly tests to test LTO and clang-custom folder options

ed913c0

This has two roles: check that these options keep working AND give a rough estimate of what can be gained by turning them on

Regression test runner: print also geometric mean of complete set of …

42af56f

…results

Regression test runner: Emit estimate on percentage change

22cd5fd

Geometric mean is very blunt, not to be taken too seriously, but at least it summarizes the result as a single verdict: "better / about the same / worse"

black to format script/regression_test_runner.py

ef6fc42

Regression test runner: option to avoid regression related failures

661e7b8

carlopi added 4 commits July 28, 2023 10:19

NightlyTests.yml: Fix path to benchmark_runner

7f74bc9

Remove verbose logging from benchmarks

6a1a274

NightlyTests: Add OSX benchmark on clang 14 vs clang 16

3295c36

CI: Move extended regression tests to ExtendedTests.yml

cff0a04

carlopi force-pushed the lto_and_llvm_flags branch from 8a56b9a to cff0a04 Compare July 28, 2023 08:20

github-actions bot marked this pull request as draft July 28, 2023 08:20

carlopi marked this pull request as ready for review July 28, 2023 08:20

Mytherin merged commit 538b9f9 into duckdb:master Aug 7, 2023
72 of 74 checks passed

carlopi deleted the lto_and_llvm_flags branch August 28, 2023 12:03
