Replace non-standard CLZ builtins with c++20's bit_width #29057
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage
For detailed information about the code coverage, see the test coverage report.
Reviews
See the guideline for information on the review process. If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Code-review ACK 273e4dc
Unit tests succeeded locally ✔️
Seems reasonable to keep `CountBits` in a first step, with its tests and fuzzers for reassurance, and then replace all of its call-sites. Happy to re-review this PR or a follow-up, depending on where you decide to put the next commit.
Concept ACK.
I'm convinced :)
Concept ACK. We can also check the benchmarks in https://corecheck.dev/bitcoin/bitcoin/pulls/29057 when they become available.
I think the changes here should be straightforward enough that you could push another commit doing the cleanup.
I don't think the benchmarks will come. I guess the machine has been preempted by AWS? @aureleoules
I've re-run the job. I actually found a way to re-run preempted jobs automatically!
So the benchmark results are available now, but they are pessimistic on a few benchmarks (WalletCreateTxUseOnlyPresetInputs, WalletAvailableCoins and PrevectorClearNontrivial). I ran the benchmarks locally without cachegrind and there does not seem to be any variation between master and the pull request. This may be due to a limitation of cachegrind: it cannot emulate the L2 cache, so the results may be pessimistic? I'm not an expert in this field though, so I'll let you interpret the results 😅
I can reproduce a slight slowdown here. I added a bench that demonstrates the difference, then the original commit, then the removal. Both libc++ and libstdc++ implement this in terms of the same built-ins we were using before, so I find this surprising. I'd hate to find that the C++-isms have a cost. Going to convert this to a draft while I look into it.
bit_width is a drop-in replacement with an exact meaning in c++, so there is no need to continue testing/fuzzing/benchmarking.
I ran the first commit via
@maflcko I'm thinking we should just take the c++20 code and assume it will be more performant going forward. That makes sense to me anyway. I'm going to go ahead and close this and include it in another PR which gathers the serialization c++20 changes. We can do another round of benches there.
86b7f28 serialization: use internal endian conversion functions (Cory Fields)
432b18c serialization: detect byteswap builtins without autoconf tests (Cory Fields)
297367b crypto: replace CountBits with std::bit_width (Cory Fields)
52f9bba crypto: replace non-standard CLZ builtins with c++20's bit_width (Cory Fields)

Pull request description:

This replaces #28674, #29036, and #29057. Now ready for testing and review.

Replaces platform-specific endian and byteswap functions. This is especially useful for kernel, as it means that our deep serialization code no longer requires bitcoin-config.h. I apologize for the size of the last commit, but it's hard to avoid making those changes at once.

All platforms now use our internal functions rather than libc or platform-specific ones, with the exception of MSVC. Sadly, benchmarking showed that not all compilers are capable of detecting and optimizing byteswap functions, so compiler builtins are instead used where possible. However, they're now detected via macros rather than autoconf checks. This [matches how libc++ implements std::byteswap for c++23](https://github.com/llvm/llvm-project/blob/main/libcxx/include/__bit/byteswap.h#L26).

I suggest we move/rename `compat/endian.h`, but I left that out of this PR to avoid bikeshedding.

#29057 pointed out some irregularities in benchmarks. After messing with various compilers and configs for a few weeks with these changes, I'm of the opinion that we can't win on every platform every time, so we should take the code that makes sense going forward. That said, if any real-world slowdowns are caused here, we should obviously investigate.

ACKs for top commit:
maflcko: ACK 86b7f28 📘
fanquake: ACK 86b7f28 - we can finish pruning out the __builtin_clz* checks/usage once the minisketch code has been updated. This is more good cleanup pre-CMake & for the kernel.
Tree-SHA512: 715a32ec190c70505ffbce70bfe81fc7b6aa33e376b60292e801f60cf17025aabfcab4e8c53ebb2e28ffc5cf4c20b74fe3dd8548371ad772085c13aec8b7970e
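
For readers unfamiliar with the "detected via macros rather than autoconf checks" approach described in the merge message, here is a minimal sketch of the general idea. It is not the code from the merged commits: the function names `ByteSwap32Sketch` and `CheckByteSwapRoundTripSketch` are hypothetical, and the sketch assumes a compiler that either provides `__has_builtin` (recent GCC and Clang do) or falls through to the portable shift-and-mask path.

```cpp
#include <cassert>
#include <cstdint>

// Compilers without __has_builtin simply take the portable fallback below.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Hypothetical helper, for illustration only: reverse the byte order of a
// 32-bit value, preferring the compiler builtin when the preprocessor
// reports that it exists.
constexpr uint32_t ByteSwap32Sketch(uint32_t x)
{
#if __has_builtin(__builtin_bswap32)
    return __builtin_bswap32(x);
#else
    return ((x & 0xff000000U) >> 24) | ((x & 0x00ff0000U) >> 8) |
           ((x & 0x0000ff00U) << 8) | ((x & 0x000000ffU) << 24);
#endif
}

static_assert(ByteSwap32Sketch(0x01020304U) == 0x04030201U);

// Swapping twice must give back the original value.
inline void CheckByteSwapRoundTripSketch()
{
    const uint32_t v = 0xDEADBEEFU;
    assert(ByteSwap32Sketch(ByteSwap32Sketch(v)) == v);
}
```

The point of the pattern is that the decision is made by the preprocessor at compile time, so no configure-time probe (and no generated config header) is needed for this particular choice.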
Split out of #28674
Note that we can't yet drop our configure checks because we pass the results on to minisketch. I've opened a PR for that upstream here: sipa/minisketch#80
fanquake suggested that we simply replace our `CountBits` call-sites with uses of `std::bit_width` directly and just drop the tests and fuzzers. I agree with that, but I wanted to allow our tests/fuzzers to run with this change first to help convince reviewers that `std::bit_width` is a drop-in replacement for `CountBits`.

I can either add a commit on top of this PR to do the switch after the CI has run or do it as a follow-up PR; I have no preference.
I was curious to see what would happen under the hood with this change. Fortunately libc++ (as an example) does exactly what one would expect, using the built-ins: https://github.com/llvm/llvm-project/blob/main/libcxx/include/__bit/countl.h#L56