New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci: build Valgrind (3.21) from source #27992
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
I think this is the same as point one, which is about the "Source and destination overlap in memcpy" false positive? If so, it could make sense to combine it into one point.
Does it mean we drop support for distro-shipped valgrind? Might be good to clarify, and also might be good to report both issues upstream. |
Concept ACK Speeding up the valgrind jobs seems like a nice win! (assuming our self compiled valgrind isn't skipping more expensive checks). |
I checked the debian/rules file and did not see anything that we might be missing there in terms of build configuration. Concept ACK |
I think they should remain supported similar to what we do now, where we roughly support recent Valgrind versions on recent distro releases, when combined with recent compilers etc, and we still see occasional issues i.e #27741.
I posted a little more about this here: bitcoin-core/qa-assets#136 (comment). The only option would be |
I've been using this branch for some time, for working Valgrind CI jobs on aarch64. Benefits include: * Valgrind CI jobs work across x86_64 & aarch64. * Can use latest (hopefully less buggy) Valgrind, rather than whatever the distro happens to package. * No need to "bless" a specific compiler for use with Valgrind, (current discussion includes switching from Clang to GCC). * Valgrind from source seems to run significantly faster compared to running the system package. i.e, when fuzzing under Valgrind: Master: ```bash asmap_direct with args Done 646 runs in 155 second(s) .... addrman_deserialize with args Done 2944 runs in 2875 second(s) ``` vs running this branch: ```bash asmap_direct with args Done 646 runs in 23 second(s) ... addrman_deserialize with args Done 2944 runs in 413 second(s) ``` This is also being seen in the qa-assets repo: bitcoin-core/qa-assets#136 (comment). For example, the `descriptor_parse` target under Valgrind currently takes: `Done 6304 runs in 12971 second(s)` however [with this branch, it takes](https://cirrus-ci.com/task/4623075795271680?): `Done 6304 runs in 4609 second(s)`. Running the native_valgrind CI (master, aarch64): ```bash test/sighash_tests.cpp(120): Entering test case "sighash_test" ==21957== Source and destination overlap in memcpy(0x871e4b0, 0x871e4b0, 36) ==21957== at 0x488CFA0: __GI_memcpy (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so) ==21957== by 0x8F7F63: CTxIn::operator=(CTxIn const&) (transaction.h:74) ==21957== by 0x93F96B: SignatureHashOld(CScript, CTransaction const&, unsigned int, int) (sighash_tests.cpp:76) ==21957== by 0x93EF1F: sighash_tests::sighash_test::test_method() (sighash_tests.cpp:138) ==21957== by 0x93EB73: sighash_tests::sighash_test_invoker() (sighash_tests.cpp:120) ==21957== by 0x36CF47: boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:117) ==21957== by 0x25B367: boost::function0<void>::operator()() const (function_template.hpp:763) ==21957== by 0x2D6647: boost::detail::forward::operator()() (execution_monitor.ipp:1388) ==21957== by 0x2D627F: boost::detail::function::function_obj_invoker0<boost::detail::forward, int>::invoke(boost::detail::function::function_buffer&) (function_template.hpp:137) ==21957== by 0x2D0393: boost::function0<int>::operator()() const (function_template.hpp:763) ==21957== by 0x234A6B: int boost::detail::do_invoke<boost::shared_ptr<boost::detail::translator_holder_base>, boost::function<int ()> >(boost::shared_ptr<boost::detail::translator_holder_base> const&, boost::function<int ()> const&) (execution_monitor.ipp:301) ==21957== by 0x1F7277: boost::execution_monitor::catch_signals(boost::function<int ()> const&) (execution_monitor.ipp:903) ==21957== { <insert_a_suppression_name_here> Memcheck:Overlap fun:__GI_memcpy fun:_ZN5CTxInaSERKS_ fun:_ZL16SignatureHashOld7CScriptRK12CTransactionji fun:_ZN13sighash_tests12sighash_test11test_methodEv fun:_ZN13sighash_testsL20sighash_test_invokerEv fun:_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE fun:_ZNK5boost9function0IvEclEv fun:_ZN5boost6detail7forwardclEv fun:_ZN5boost6detail8function21function_obj_invoker0INS0_7forwardEiE6invokeERNS1_15function_bufferE fun:_ZNK5boost9function0IiEclEv fun:_ZN5boost6detail9do_invokeINS_10shared_ptrINS0_22translator_holder_baseEEENS_8functionIFivEEEEEiRKT_RKT0_ fun:_ZN5boost17execution_monitor13catch_signalsERKNS_8functionIFivEEE } ``` vs running this branch: ```bash real 118m55.057s ``` Disadvantages includes: * Becoming slightly more of a package manager in the CI. Related to the discussion in bitcoin#27444.
685a684
to
83266cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 83266cd
Also, I don't think this is a fix for https://github.com/bitcoin/bitcoin/pull/27444/files#r1165491936. This will still fail outside the CI. Here, you are only modifying the CI to use a different version of valgrind. |
Yes, but outside the CI, the only route to solving that would be with more (achitecture specific) suppressions? If we don't control anything else, we can't make any guarantees about anything passing or not. At least in this PR, it no-longer fails inside the CI, and that's difference between the CI working, and not, on aarch64. Happy to followup with further improvements to add more architecture specific supressions or similar, for use outside the CI. |
Well, it would be be good to exactly explain what this change is trying to solve and then explain why it solves the issue. If you are trying to fix the slowness, a better fix may be to just apply If you are trying to fix something else, it may be good to explain as well. |
NACK. I don't think compiling from source should be used to unintentionally and accidentally fix an unrelated bug. |
Looks like this is implicitly doing the same as #28071? from the qa-assets test run: https://cirrus-ci.com/task/4623075795271680?logs=ci#L1952-L1955
|
I'll follow up with just adding more suppressions until things work on aarch64. |
I haven't tried, but is this still an issue on aarch64 on current master? |
Just ran both CI configs locally on aarch64 and they passed. |
I've been using this branch for some time, for working Valgrind CI jobs on aarch64. Benefits include:
Master:
vs running this branch:
This is also being seen in the qa-assets repo: bitcoin-core/qa-assets#136 (comment).
For example, the
tx_pool_standard
target under Valgrind currently takes > 10 hours to complete:however with this branch, it takes 1.5 hours:
Running the native_valgrind CI (master, aarch64):
vs running this branch:
Disadvantages includes:
Related to the discussion in #27444. See also bitcoin-core/qa-assets#136.