Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add building and testing of PGO+LTO LLVM and Julia #345

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

Zentrik
Copy link
Contributor

@Zentrik Zentrik commented Mar 26, 2024

I ended up just using the profile from Julia being built for PGO, locally it seems to work fine. If anyone has better ideas for what to profile, please do say.

I don't think we need to run the full test suite but it seems good to run at least some stuff to make sure nothing got messed up. The testing seems to finish a fair bit quicker than the gnuassertrr tests and not that far behind the other test pipelines, so this shouldn't slow down CI much.

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

I don't think I have permissions to fix ERROR: Pipeline '.buildkite/pipelines/main/launch_upload_jobs.yml' fails treehash signature check! You may need to re-run cryptic/bin/sign_treehashes!.

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

Build currently failing as the bump to patchelf JuliaLang/julia#53845 introduced a check that the compiler supports C++ 17. Not sure why that check is failing on CI as it does seem to be using clang 16.

We run /cache/build/builder-amdci4-3/julialang/julia-buildkite/deps/srccache/patchelf-0.18.0//configure --prefix=/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage1.build/usr --build=x86_64-unknown-linux-gnu --libdir=/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage1.build/usr/lib --bindir=/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage1.build/usr/tools LDFLAGS="-fuse-ld=lld -flto=thin -fprofile-generate=/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/profiles -Wl,-rpath,'\$\$ORIGIN' -Wl,-z,origin -Wl,-rpath-link,/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage1.build/usr/lib -Wl,--enable-new-dtags " F77="gfortran -m64" CC="/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/clang " CXX="/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/clang++ " LD="/cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/ld.lld" LDFLAGS="" CPPFLAGS="" for the PGO build and the failure is checking whether /cache/build/builder-amdci4-3/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/clang++ supports C++17 features with -std=c++17... no.
For the non PGO build we run /cache/build/tester-amdci5-9/julialang/julia-buildkite/deps/srccache/patchelf-0.18.0//configure --prefix=/cache/build/tester-amdci5-9/julialang/julia-buildkite/usr --build=x86_64-linux-gnu --libdir=/cache/build/tester-amdci5-9/julialang/julia-buildkite/usr/lib --bindir=/cache/build/tester-amdci5-9/julialang/julia-buildkite/usr/tools LDFLAGS=" -Wl,-rpath,'\$\$ORIGIN' -Wl,-z,origin -Wl,-rpath-link,/cache/build/tester-amdci5-9/julialang/julia-buildkite/usr/lib -Wl,--enable-new-dtags " F77="gfortran -m64" CC="gcc -m64 " CXX="g++ -m64 " LD="ld" LDFLAGS="" CPPFLAGS="" and checking whether g++ -m64 supports C++17 features with -std=c++17... yes is the relevant check.

The line that runs the check is https://github.com/NixOS/patchelf/blob/99c24238981b7b1084313aca8f5c493bb46f302c/configure.ac#L35 and the macro defined in https://github.com/NixOS/patchelf/blob/master/m4/ax_cxx_compile_stdcxx.m4.

The only obvious difference I see is the -m64 but I really have no clue why this is failing on CI.

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

Maybe clang is picking the wrong gcc?

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

Ok so the actual error is

configure:6320: /cache/build/builder-amdci4-6/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/clang++ -m64  -std=c++17 -c -fprofile-generate=/cache/build/builder-amdci4-6/julialang/julia-buildkite/contrib/pgo-lto/profiles -Xclang -mllvm -Xclang -vp-counters-per-site=6  conftest.cpp >&5
conftest.cpp:437:10: fatal error: 'initializer_list' file not found
#include <initializer_list>

JuliaLang/julia#45641 (comment) mentions something about libstdc++ detection, so maybe something to do with that.
Clang does seem to be loading /usr/lib/gcc/x86_64-linux-gnu/10 the only gcc installation it finds whereas under 'Print software versions' we see gcc version 9.

@haampie
Copy link

haampie commented Mar 26, 2024

Is that a partial gcc 10 install? No g++10 / libstdc++-10-dev?

You can force clang to use gcc 9 if necessary with --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/9

Notice that --gcc-toolchain is entirely useless when multiple GCCs are installed in the same prefix /usr.

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

I'm quite out of my depth here but I'm not sure gcc is installed in /usr/lib/gcc/x86_64-linux-gnu/9 because this was printed which seems to suggest it's installed elsewhere

COLLECT_LTO_WRAPPER=/usr/local/bin/../libexec/gcc/x86_64-linux-gnu/9.1.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: /workspace/srcdir/gcc-9.1.0/configure --prefix=/workspace/destdir --with-build-sysroot=/workspace/destdir/x86_64-linux-gnu --with-sysroot=/workspace/destdir/x86_64-linux-gnu --with-gxx-include-dir=/workspace/destdir/x86_64-linux-gnu/include/c++/9.1.0 --target=x86_64-linux-gnu --host=x86_64-linux-gnu --build=x86_64-linux-gnu --disable-multilib --disable-werror --disable-bootstrap --enable-threads=posix --program-prefix=x86_64-linux-gnu- --with-arch=x86-64 --enable-languages=c,c++

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

It does seem to find the correct gcc version now. Yes, it does seem to get through patchelf's configure phase but now fails at making it

/cache/build/builder-amdci4-0/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/clang++ -DPACKAGE_NAME=\"patchelf\" -DPACKAGE_TARNAME=\"patchelf\" -DPACKAGE_VERSION=\"0.18.0\" -DPACKAGE_STRING=\"patchelf\ 0.18.0\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"patchelf\" -DVERSION=\"0.18.0\" -DHAVE_CXX17=1 -I. -I/cache/build/builder-amdci4-0/julialang/julia-buildkite/deps/srccache/patchelf-0.18.0/src    -Wall -Wextra -Wcast-qual -std=c++17 -D_FILE_OFFSET_BITS=64     -fprofile-generate=/cache/build/builder-amdci4-0/julialang/julia-buildkite/contrib/pgo-lto/profiles -Xclang -mllvm -Xclang -vp-counters-per-site=6 -MT patchelf.o -MD -MP -MF .deps/patchelf.Tpo -c -o patchelf.o /cache/build/builder-amdci4-0/julialang/julia-buildkite/deps/srccache/patchelf-0.18.0/src/patchelf.cc
/cache/build/builder-amdci4-0/julialang/julia-buildkite/deps/srccache/patchelf-0.18.0/src/patchelf.cc:19:10: fatal error: 'algorithm' file not found
#include <algorithm>

@haampie
Copy link

haampie commented Mar 26, 2024

The build environment likely contains just the C compiler part of GCC 10, and clang has poor detection & heuristics for the GCC install dir. If multiple are in the same prefix it picks the newest, and it will do so also when there are missing components like C++ stdlib header files. The --gcc-toolchain flag is useles cause it expects the prefix /usr, in which both GCC 9 and 10 live.

Use --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/9 or uninstall GCC 10.

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

/cache/build/builder-amdci5-3/julialang/julia-buildkite/contrib/pgo-lto/stage0.build/usr/tools/clang  -fprofile-generate=/cache/build/builder-amdci5-3/julialang/julia-buildkite/contrib/pgo-lto/profiles -Xclang -mllvm -Xclang -vp-counters-per-site=6 -O2 -m64 --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/9 -fPIC -std=c99 -Wsign-conversion -Wall -Wextra -pedantic -DUTF8PROC_EXPORTS  -c -o utf8proc.o utf8proc.c
clang-16: error: '/usr/lib/gcc/x86_64-linux-gnu/9' does not contain a GCC installation

On the previous commit I saw

Found candidate GCC installation: /usr/local/bin/../lib/gcc/x86_64-linux-gnu/9.1.0//../../../../lib/gcc/x86_64-linux-gnu/9.1.0
Selected GCC installation: /usr/local/bin/../lib/gcc/x86_64-linux-gnu/9.1.0//../../../../lib/gcc/x86_64-linux-gnu/9.1.0

So I presume the specified path was wrong.

@Zentrik
Copy link
Contributor Author

Zentrik commented Mar 26, 2024

Thanks, it seems to be working much better now. I just need to figure out why there's a hash mismatch, I had the same locally but I just refreshed it.

Now takes ~10 minutes.
Precompiling OrdinaryDiffEq takes 367s and Plots 135s. Actually running the script takes 1-2 minutes.
Not really sure why this works, but it seems to.
I think 0f8bae0 might have slowed down the binary. At least locally this seems to be roughly the same as using pgo_script prior to 0f8bae0.

I have 7 binaries locally and cannot work out why some are slower than others but I think this should be good enough.
Buildkite was complaining about non-alphanumeric character
Hopefully this fixes the error as I have no idea how to debug this
This seems to be what the form source builds use, though they seem to have been failing for the last year. Not sure if I still need to specify `--gcc-install-dir`.
This gets passed to patch elf's configure step
@Zentrik Zentrik force-pushed the main branch 3 times, most recently from fc58388 to f2ffa42 Compare April 21, 2024 12:13
GCCFLAG should no longer be required now that we use the llvmpasses image
@Zentrik
Copy link
Contributor Author

Zentrik commented Apr 21, 2024

This looks good to me, testing was 15% faster. If we want -Werror I think we just need to merge or otherwise fix JuliaLang/julia#53897 and bump libwhich to vtjnash/libwhich@613b3f0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants