ci: build Valgrind (3.21) from source #27992

Closed
fanquake wants to merge 1 commit

fanquake
Member

I've been using this branch for some time, for working Valgrind CI jobs on aarch64. Benefits include:

  • Valgrind CI jobs work across x86_64 & aarch64.
  • Can use latest (hopefully less buggy) Valgrind, rather than whatever the distro happens to package.
  • No need to "bless" a specific compiler, (current discussion includes switching from Clang to GCC as a workaround).
  • Valgrind built from source runs significantly faster than the system package, e.g. when fuzzing under Valgrind:

Master:

asmap_direct with args
Done 646 runs in 155 second(s)
....
addrman_deserialize with args
Done 2944 runs in 2875 second(s)

vs running this branch:

asmap_direct with args
Done 646 runs in 23 second(s)
...
addrman_deserialize with args
Done 2944 runs in 413 second(s)

This is also being seen in the qa-assets repo: bitcoin-core/qa-assets#136 (comment).

For example, the tx_pool_standard target under Valgrind currently takes > 10 hours to complete:

Run tx_pool_standard with args ['valgrind', '--quiet', '--error-exitcode=1', '/tmp/cirrus-build/bitcoin-core/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz', '-runs=1', PosixPath('/tmp/cirrus-build/bitcoin-core/ci/scratch/qa-assets/fuzz_seed_corpus/tx_pool_standard')]INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 242510469
INFO: Loaded 1 modules   (248538 inline 8-bit counters): 248538 [0x27fd278, 0x2839d52), 
INFO: Loaded 1 PC tables (248538 PCs): 248538 [0x2839d58,0x2c04af8), 
INFO:     3775 files found in /tmp/cirrus-build/bitcoin-core/ci/scratch/qa-assets/fuzz_seed_corpus/tx_pool_standard
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 1048576 bytes
INFO: seed corpus: files: 3775 min: 1b max: 2090089b total: 88593805b rss: 321Mb
#16	pulse  cov: 4536 ft: 4537 corp: 1/1b exec/s: 5 rss: 323Mb
#32	pulse  cov: 4538 ft: 4544 corp: 4/10b exec/s: 6 rss: 323Mb
#64	pulse  cov: 4540 ft: 4553 corp: 8/34b exec/s: 6 rss: 323Mb
#128	pulse  cov: 6319 ft: 9483 corp: 21/196b exec/s: 4 rss: 327Mb
#256	pulse  cov: 6339 ft: 13188 corp: 104/1621b exec/s: 2 rss: 327Mb
#512	pulse  cov: 8952 ft: 24180 corp: 262/6023b exec/s: 2 rss: 335Mb
#1024	pulse  cov: 9924 ft: 36577 corp: 575/23Kb exec/s: 1 rss: 343Mb
<snip>
#2048	pulse  cov: 10161 ft: 56438 corp: 1218/244Kb exec/s: 0 rss: 371Mb
<snip>
#3776	INITED cov: 10988 ft: 65398 corp: 1933/10331Kb exec/s: 0 rss: 430Mb
#3776	DONE   cov: 10988 ft: 65398 corp: 1933/10331Kb lim: 1048576 exec/s: 0 rss: 430Mb
Done 3776 runs in 37778 second(s)

However, with this branch it takes roughly 1.4 hours (5163 seconds):

Run tx_pool_standard with args ['valgrind', '--quiet', '--error-exitcode=1', '/tmp/cirrus-build-1174734651/bitcoin-core/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz', '-runs=1', PosixPath('/tmp/cirrus-build-1174734651/bitcoin-core/ci/scratch/qa-assets/fuzz_seed_corpus/tx_pool_standard')]INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 350811728
INFO: Loaded 1 modules   (366489 inline 8-bit counters): 366489 [0x1c106d0, 0x1c69e69), 
INFO: Loaded 1 PC tables (366489 PCs): 366489 [0x1c69e70,0x2201800), 
INFO:     3775 files found in /tmp/cirrus-build-1174734651/bitcoin-core/ci/scratch/qa-assets/fuzz_seed_corpus/tx_pool_standard
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 1048576 bytes
INFO: seed corpus: files: 3775 min: 1b max: 2090089b total: 88593805b rss: 302Mb
#64	pulse  cov: 1172 ft: 1186 corp: 11/47b exec/s: 32 rss: 304Mb
#128	pulse  cov: 1793 ft: 2253 corp: 32/285b exec/s: 32 rss: 304Mb
#256	pulse  cov: 1862 ft: 3792 corp: 99/1399b exec/s: 19 rss: 305Mb
#512	pulse  cov: 3074 ft: 7764 corp: 221/4862b exec/s: 15 rss: 308Mb
#1024	pulse  cov: 3767 ft: 12721 corp: 498/20Kb exec/s: 10 rss: 314Mb
#2048	pulse  cov: 4141 ft: 22302 corp: 1101/224Kb exec/s: 5 rss: 341Mb
<snip>
#3776	INITED cov: 4573 ft: 26452 corp: 1737/6505Kb exec/s: 0 rss: 400Mb
#3776	DONE   cov: 4573 ft: 26452 corp: 1737/6505Kb lim: 698384 exec/s: 0 rss: 400Mb
Done 3776 runs in 5163 second(s)
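
As a rough worked example, the speedups implied by the timings quoted above can be computed directly from the `Done N runs in S second(s)` lines. This is a minimal sketch in Python; the helper names (`parse_seconds`, `summarize`) are made up for illustration and are not part of the CI scripts. The ratios compare whole CI runs, so on their own they do not isolate Valgrind as the cause of the difference.

```python
import re
from typing import Dict, Tuple

# Matches the summary line printed at the end of each fuzz target run.
DONE_RE = re.compile(r"Done (\d+) runs in (\d+) second\(s\)")


def parse_seconds(line: str) -> int:
    """Extract the wall-clock seconds from a 'Done N runs in S second(s)' line."""
    match = DONE_RE.search(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    return int(match.group(2))


def summarize(timings: Dict[str, Tuple[str, str]]) -> Dict[str, float]:
    """Return the master/branch time ratio per target, rounded to one decimal."""
    return {
        name: round(parse_seconds(master) / parse_seconds(branch), 1)
        for name, (master, branch) in timings.items()
    }


# (master line, branch line), copied from the logs above.
TIMINGS = {
    "asmap_direct": ("Done 646 runs in 155 second(s)", "Done 646 runs in 23 second(s)"),
    "addrman_deserialize": ("Done 2944 runs in 2875 second(s)", "Done 2944 runs in 413 second(s)"),
    "tx_pool_standard": ("Done 3776 runs in 37778 second(s)", "Done 3776 runs in 5163 second(s)"),
}

if __name__ == "__main__":
    for name, ratio in summarize(TIMINGS).items():
        print(f"{name}: {ratio}x faster")
```

For the three targets shown, the ratio comes out at about 7x in each case.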

Running the native_valgrind CI (master, aarch64):

test/sighash_tests.cpp(120): Entering test case "sighash_test"
==21957== Source and destination overlap in memcpy(0x871e4b0, 0x871e4b0, 36)
==21957==    at 0x488CFA0: __GI_memcpy (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==21957==    by 0x8F7F63: CTxIn::operator=(CTxIn const&) (transaction.h:74)
==21957==    by 0x93F96B: SignatureHashOld(CScript, CTransaction const&, unsigned int, int) (sighash_tests.cpp:76)
==21957==    by 0x93EF1F: sighash_tests::sighash_test::test_method() (sighash_tests.cpp:138)
==21957==    by 0x93EB73: sighash_tests::sighash_test_invoker() (sighash_tests.cpp:120)
==21957==    by 0x36CF47: boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:117)
==21957==    by 0x25B367: boost::function0<void>::operator()() const (function_template.hpp:763)
==21957==    by 0x2D6647: boost::detail::forward::operator()() (execution_monitor.ipp:1388)
==21957==    by 0x2D627F: boost::detail::function::function_obj_invoker0<boost::detail::forward, int>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:137)
==21957==    by 0x2D0393: boost::function0<int>::operator()() const (function_template.hpp:763)
==21957==    by 0x234A6B: int boost::detail::do_invoke<boost::shared_ptr<boost::detail::translator_holder_base>, boost::function<int ()> >(boost::shared_ptr<boost::detail::translator_holder_base> const&, boost::function<int ()> const&) (execution_monitor.ipp:301)
==21957==    by 0x1F7277: boost::execution_monitor::catch_signals(boost::function<int ()> const&) (execution_monitor.ipp:903)
==21957==
{
   <insert_a_suppression_name_here>
   Memcheck:Overlap
   fun:__GI_memcpy
   fun:_ZN5CTxInaSERKS_
   fun:_ZL16SignatureHashOld7CScriptRK12CTransactionji
   fun:_ZN13sighash_tests12sighash_test11test_methodEv
   fun:_ZN13sighash_testsL20sighash_test_invokerEv
   fun:_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE
   fun:_ZNK5boost9function0IvEclEv
   fun:_ZN5boost6detail7forwardclEv
   fun:_ZN5boost6detail8function21function_obj_invoker0INS0_7forwardEiE6invokeERNS1_15function_bufferE
   fun:_ZNK5boost9function0IiEclEv
   fun:_ZN5boost6detail9do_invokeINS_10shared_ptrINS0_22translator_holder_baseEEENS_8functionIFivEEEEEiRKT_RKT0_
   fun:_ZN5boost17execution_monitor13catch_signalsERKNS_8functionIFivEEE
}

vs running this branch:

real	118m55.057s

Disadvantages include:

  • Becoming slightly more of a package manager in the CI.

Related to the discussion in #27444. See also bitcoin-core/qa-assets#136.

@DrahtBot
Contributor

DrahtBot commented Jun 28, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

| Type | Reviewers |
| ---- | --------- |
| ACK | dergoegge |
| Concept NACK | MarcoFalke |
| Concept ACK | TheCharlatan |

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28071 (ci: Add missing -O2 to valgrind tasks by MarcoFalke)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@maflcko
Member

maflcko commented Jun 28, 2023

> No need to "bless" a specific compiler, (current discussion includes switching from Clang to GCC as a workaround).

I think this is the same as point one, which is about the "Source and destination overlap in memcpy" false positive? If so, it could make sense to combine it into one point.

> Can use latest (hopefully less buggy) Valgrind, rather than whatever the distro happens to package.

Does it mean we drop support for distro-shipped valgrind? Might be good to clarify, and also might be good to report both issues upstream.

@dergoegge
Member

Concept ACK

Speeding up the valgrind jobs seems like a nice win! (assuming our self compiled valgrind isn't skipping more expensive checks).

@fanquake fanquake marked this pull request as ready for review June 30, 2023 18:29
@TheCharlatan
Contributor

Re #27992 (comment)

> (assuming our self compiled valgrind isn't skipping more expensive checks)

I checked the debian/rules file and did not see anything that we might be missing there in terms of build configuration.

Concept ACK

@fanquake
Member Author

fanquake commented Jul 3, 2023

> Does it mean we drop support for distro-shipped valgrind?

I think they should remain supported, similar to what we do now: we roughly support recent Valgrind versions on recent distro releases, combined with recent compilers etc., and we still see occasional issues, e.g. #27741.

> I checked the debian/rules file and did not see anything that we might be missing there in terms of build configuration.

I posted a little more about this here: bitcoin-core/qa-assets#136 (comment). The only option would be --enable-tls (HAVE_TLS), but that is only used in Valgrind's own tests (valgrind/*/tests/). Outside of that, I can't see anything obvious.
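
For readers unfamiliar with a source build, the sketch below lists the usual autotools steps for Valgrind as a sequence of commands. It is an illustration only, not the script from this branch: the download URL, the working directory, the install prefix, and the function name `valgrind_build_commands` are all assumptions. No extra configure options are passed, in line with the observation above that nothing beyond the defaults appears to be needed.

```python
from pathlib import Path
from typing import List, Tuple

Command = Tuple[Path, List[str]]  # (working directory, argv)


def valgrind_build_commands(
    version: str = "3.21.0",
    workdir: Path = Path("/tmp/valgrind-build"),  # assumed scratch directory
    prefix: Path = Path("/usr/local"),            # assumed install prefix
    jobs: int = 4,
) -> List[Command]:
    """Return the commands for a plain configure/make/install build of Valgrind."""
    tarball = f"valgrind-{version}.tar.bz2"
    # Assumed download location; verify it against the Valgrind website before use.
    url = f"https://sourceware.org/pub/valgrind/{tarball}"
    src = workdir / f"valgrind-{version}"
    return [
        (workdir, ["curl", "--location", "--fail", "--remote-name", url]),
        (workdir, ["tar", "-xjf", tarball]),
        (src, ["./configure", f"--prefix={prefix}"]),
        (src, ["make", f"-j{jobs}"]),
        (src, ["make", "install"]),
    ]


if __name__ == "__main__":
    # Only print the commands; running them is left to the caller.
    for cwd, argv in valgrind_build_commands():
        print(f"(cd {cwd} && {' '.join(argv)})")
```

Returning the commands instead of executing them keeps the sketch easy to inspect; a CI script would more likely run the same steps directly in shell.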

I've been using this branch for some time, for working Valgrind CI jobs
on aarch64. Benefits include:
* Valgrind CI jobs work across x86_64 & aarch64.
* Can use latest (hopefully less buggy) Valgrind, rather than whatever
  the distro happens to package.
* No need to "bless" a specific compiler for use with Valgrind, (current
  discussion includes switching from Clang to GCC).
* Valgrind from source seems to run significantly faster compared to running
  the system package, e.g. when fuzzing under Valgrind:

Master:
```bash
asmap_direct with args
Done 646 runs in 155 second(s)
....
addrman_deserialize with args
Done 2944 runs in 2875 second(s)
```

vs running this branch:
```bash
asmap_direct with args
Done 646 runs in 23 second(s)
...
addrman_deserialize with args
Done 2944 runs in 413 second(s)
```

This is also being seen in the qa-assets repo: bitcoin-core/qa-assets#136 (comment).
For example, the `descriptor_parse` target under Valgrind currently takes:
`Done 6304 runs in 12971 second(s)`
however [with this branch, it takes](https://cirrus-ci.com/task/4623075795271680?):
`Done 6304 runs in 4609 second(s)`.

Running the native_valgrind CI (master, aarch64):
```bash
test/sighash_tests.cpp(120): Entering test case "sighash_test"
==21957== Source and destination overlap in memcpy(0x871e4b0, 0x871e4b0, 36)
==21957==    at 0x488CFA0: __GI_memcpy (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==21957==    by 0x8F7F63: CTxIn::operator=(CTxIn const&) (transaction.h:74)
==21957==    by 0x93F96B: SignatureHashOld(CScript, CTransaction const&, unsigned int, int) (sighash_tests.cpp:76)
==21957==    by 0x93EF1F: sighash_tests::sighash_test::test_method() (sighash_tests.cpp:138)
==21957==    by 0x93EB73: sighash_tests::sighash_test_invoker() (sighash_tests.cpp:120)
==21957==    by 0x36CF47: boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:117)
==21957==    by 0x25B367: boost::function0<void>::operator()() const (function_template.hpp:763)
==21957==    by 0x2D6647: boost::detail::forward::operator()() (execution_monitor.ipp:1388)
==21957==    by 0x2D627F: boost::detail::function::function_obj_invoker0<boost::detail::forward, int>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:137)
==21957==    by 0x2D0393: boost::function0<int>::operator()() const (function_template.hpp:763)
==21957==    by 0x234A6B: int boost::detail::do_invoke<boost::shared_ptr<boost::detail::translator_holder_base>, boost::function<int ()> >(boost::shared_ptr<boost::detail::translator_holder_base> const&, boost::function<int ()> const&) (execution_monitor.ipp:301)
==21957==    by 0x1F7277: boost::execution_monitor::catch_signals(boost::function<int ()> const&) (execution_monitor.ipp:903)
==21957==
{
   <insert_a_suppression_name_here>
   Memcheck:Overlap
   fun:__GI_memcpy
   fun:_ZN5CTxInaSERKS_
   fun:_ZL16SignatureHashOld7CScriptRK12CTransactionji
   fun:_ZN13sighash_tests12sighash_test11test_methodEv
   fun:_ZN13sighash_testsL20sighash_test_invokerEv
   fun:_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE
   fun:_ZNK5boost9function0IvEclEv
   fun:_ZN5boost6detail7forwardclEv
   fun:_ZN5boost6detail8function21function_obj_invoker0INS0_7forwardEiE6invokeERNS1_15function_bufferE
   fun:_ZNK5boost9function0IiEclEv
   fun:_ZN5boost6detail9do_invokeINS_10shared_ptrINS0_22translator_holder_baseEEENS_8functionIFivEEEEEiRKT_RKT0_
   fun:_ZN5boost17execution_monitor13catch_signalsERKNS_8functionIFivEEE
}
```

vs running this branch:
```bash
real	118m55.057s
```

Disadvantages include:
* Becoming slightly more of a package manager in the CI.

Related to the discussion in
bitcoin#27444.
Member

@dergoegge dergoegge left a comment

utACK 83266cd

@maflcko
Member

maflcko commented Jul 10, 2023

Also, I don't think this is a fix for https://github.com/bitcoin/bitcoin/pull/27444/files#r1165491936. This will still fail outside the CI. Here, you are only modifying the CI to use a different version of valgrind.

@fanquake
Member Author

> This will still fail outside the CI.

Yes, but outside the CI, the only route to solving that would be with more (architecture-specific) suppressions? If we don't control anything else, we can't make any guarantees about anything passing or not.

At least in this PR, it no longer fails inside the CI, and that's the difference between the CI working, and not, on aarch64.

Happy to follow up with further improvements to add more architecture-specific suppressions or similar, for use outside the CI.
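
To make the suppression route concrete: Valgrind prints a ready-made suppression block (as in the log in the description), which is typically given a name and trimmed to the innermost frames before being added to a suppressions file. A minimal sketch, assuming a hypothetical suppression name and an arbitrary cut-off of two frames; whether such a suppression would be acceptable for this project is not settled in this thread.

```python
from typing import List


def make_suppression(name: str, kind: str, frames: List[str], max_frames: int = 2) -> str:
    """Format a Valgrind suppression entry, keeping only the innermost frames."""
    lines = ["{", f"   {name}", f"   {kind}"]
    lines += [f"   fun:{frame}" for frame in frames[:max_frames]]
    lines.append("}")
    return "\n".join(lines)


# Mangled frame names copied from the generated suppression in the description.
FRAMES = [
    "__GI_memcpy",
    "_ZN5CTxInaSERKS_",
    "_ZL16SignatureHashOld7CScriptRK12CTransactionji",
]

if __name__ == "__main__":
    # The name below is hypothetical; any descriptive label works.
    print(make_suppression("sighash_test_ctxin_memcpy_overlap", "Memcheck:Overlap", FRAMES))
```

Keeping fewer frames makes the suppression less brittle across compilers and architectures, at the cost of possibly hiding other reports that share the same innermost frames.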

@maflcko
Member

maflcko commented Jul 11, 2023

Well, it would be good to explain exactly what this change is trying to solve, and then explain why it solves the issue.

If you are trying to fix the slowness, a better fix may be to just apply -O3 instead of -O0 to Bitcoin Core and leave valgrind alone, as failing to apply optimizations is not a valgrind bug.

If you are trying to fix something else, it may be good to explain as well.

@maflcko
Member

maflcko commented Jul 11, 2023

NACK. I don't think compiling from source should be used to unintentionally and accidentally fix an unrelated bug.

@dergoegge
Member

Looks like this is implicitly doing the same as #28071? From the qa-assets test run: https://cirrus-ci.com/task/4623075795271680?logs=ci#L1952-L1955

Options used to compile and link:
  external signer = no
  multiprocess    = no
  with experimental syscall sandbox support = no
  with libs       = no
  with wallet     = yes
    with sqlite   = yes
    with bdb      = no
  with gui / qt   = no
  with zmq        = no
  with test       = not building test_bitcoin because fuzzing is enabled
  with fuzz binary = yes
  with bench      = no
  with upnp       = no
  with natpmp     = no
  use asm         = yes
  USDT tracing    = no
  sanitizers      = fuzzer
  debug enabled   = no
  gprof enabled   = no
  werror          = yes
  LTO             = no
  target os       = linux-gnu
  build os        = linux-gnu
  CC              = /usr/bin/ccache clang
  CFLAGS          = -pthread -g -O2
  CPPFLAGS        =  -DABORT_ON_FAILED_ASSUME -fmacro-prefix-map=$(abs_top_srcdir)=.  -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3  -DHAVE_BUILD_INFO 
  CXX             = /usr/bin/ccache clang++ -std=c++17
  CXXFLAGS        =   -fdebug-prefix-map=$(abs_top_srcdir)=.  -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection  -Wall -Wextra -Wgnu -Wformat -Wformat-security -Wvla -Wshadow-field -Wthread-safety -Wloop-analysis -Wredundant-decls -Wunused-member-function -Wdate-time -Wconditional-uninitialized -Woverloaded-virtual -Wsuggest-override -Wunreachable-code-loop-increment -Wimplicit-fallthrough -Wdocumentation  -Wno-unused-parameter -Wno-self-assign -Werror   -g -O2
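
The relevant detail in this summary is the `-O2` at the end of `CFLAGS` and `CXXFLAGS`. A small sketch of checking that mechanically, assuming the summary is available as text; `parse_summary` and `optimization_level` are hypothetical helpers, and the sample is abbreviated from the output above.

```python
import re
from typing import Dict, Optional


def parse_summary(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, splitting on the first '=' only."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            result[key.strip()] = value.strip()
    return result


def optimization_level(flags: str) -> Optional[str]:
    """Return the last -O flag in a flag string (the last one wins), or None."""
    levels = re.findall(r"(?<!\S)-O\S*", flags)
    return levels[-1] if levels else None


# Abbreviated sample; the real CXXFLAGS line carries many more warning flags.
SAMPLE = """\
  debug enabled   = no
  CFLAGS          = -pthread -g -O2
  CXXFLAGS        = -Wall -Wextra -Werror -g -O2
"""

if __name__ == "__main__":
    summary = parse_summary(SAMPLE)
    for key in ("CFLAGS", "CXXFLAGS"):
        print(key, optimization_level(summary[key]))
```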

@fanquake
Member Author

I'll follow up with just adding more suppressions until things work on aarch64.

@fanquake fanquake closed this Jul 14, 2023
@maflcko
Member

maflcko commented Jul 14, 2023

I haven't tried, but is this still an issue on aarch64 on current master?

@maflcko
Member

maflcko commented Jul 14, 2023

Just ran both CI configs locally on aarch64 and they passed.
