-
Notifications
You must be signed in to change notification settings - Fork 36.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pool based memory resource #25325
Add pool based memory resource #25325
Conversation
6de57f7
to
d3c437f
Compare
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsNo conflicts as of last run. |
850fe38
to
ba38016
Compare
6932541
to
b56c223
Compare
98bb268
to
d196da1
Compare
d196da1
to
9205b60
Compare
rebased to 9205b60 with minor fixes in |
Posthumous utACK 9f947fc |
Wohoo 🎉 Thanks everyone for making this happen! |
Summary: as many of the unit tests don't use this code This is a backport of [[bitcoin/bitcoin#26940 | core#26940]] (partial) and [[bitcoin/bitcoin#27425 | core#27425]] bitcoin/bitcoin@81f5ade This is a depency of [[bitcoin/bitcoin#25325 | core#25325]] Test Plan: `ninja check` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D16155
Summary: A memory resource similar to std::pmr::unsynchronized_pool_resource, but optimized for node-based containers. Co-Authored-By: Pieter Wuille <pieter@wuille.net> This is a partial backport of [[bitcoin/bitcoin#25325 | core#25325]] bitcoin/bitcoin@b8401c3 with ARM fixes from bitcoin/bitcoin@ce881bf Test Plan: `ninja all check-all bench-bitcoin` ``` | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 26.04 | 38,408,682.21 | 1.3% | 0.02 | `PoolAllocator_StdUnorderedMap` | 10.29 | 97,207,783.17 | 0.3% | 0.01 | `PoolAllocator_StdUnorderedMapWithPoolResource` ``` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D16175
…cator Summary: Extracts the resource from a PoolAllocator and uses it for calculation of the node's memory usage. This is a partial backport of [[bitcoin/bitcoin#25325 | core#25325]] bitcoin/bitcoin@e19943f with an ARM fix and additional test to make sure the pool is actually used by the nodes from [[bitcoin/bitcoin#28913 | core#28913]] Depends on D16175 Test Plan: `ninja check` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D16176
Summary: This frees up all associated memory with the map, not only the nodes. This is necessary in preparation for using the PoolAllocator for CCoinsMap, which does not actually free any memory on clear(). This is a partial backport of [[bitcoin/bitcoin#25325 | core#25325]] bitcoin/bitcoin@5e4ac5a Depends on D16176 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D16177
Summary: > In my benchmarks, using this pool allocator for CCoinsMap gives about 20% faster `-reindex-chainstate` with -dbcache=5000 with practically the same memory usage. The change in max RSS changed was 0.3%. > > The `validation_flush_tests` tests need to be updated because memory allocation is now done in large pools instead of one node at a time, so the limits need to be updated accordingly. This is a partial backport of [[bitcoin/bitcoin#25325 | core#25325]] bitcoin/bitcoin@9f947fc with a fix from [[bitcoin/bitcoin#28913 | core#28913]] Depends on D16177 Benchmark on a 12 cores AMD CPU, with the data on a spinning harddrive, using the default dbcache (1024 MB): ``` time src/bitcoind -reindex-chainstate -stopatheight=400000 -assumevalid=000000000000000004ec466ce4732fe6f1ed1cddc2ed4b328fff5224276e3f6f ``` Before: ``` real 96m2s user 27m27s sys 2min58s ``` After: ``` real 74m22s user 23m39s sys 2min23s ``` Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Subscribers: Fabien Differential Revision: https://reviews.bitcoinabc.org/D16178
, bitcoin#24021, bitcoin#24543, bitcoin#26844, bitcoin#25325, bitcoin#28165, partial bitcoin#20524, bitcoin#26036, bitcoin#27981 (networking backports: part 7) 76a458e fmt: apply formatting suggestions from `clang-format-diff.py` (Kittywhiskers Van Gogh) 63962ec merge bitcoin#28165: transport abstraction (Kittywhiskers Van Gogh) c6b9186 merge bitcoin#25325: Add pool based memory resource (Kittywhiskers Van Gogh) 8c986d6 partial bitcoin#27981: Fix potential network stalling bug (Kittywhiskers Van Gogh) 13f6dc1 merge bitcoin#26844: Pass MSG_MORE flag when sending non-final network messages (Kittywhiskers Van Gogh) caaa0fd net: use `std::deque` for `vSendMsg` instead of `std::list` (Kittywhiskers Van Gogh) 2ecba6b partial bitcoin#26036: add NetEventsInterface::g_msgproc_mutex (Kittywhiskers Van Gogh) f6c9439 merge bitcoin#24543: Move remaining globals into PeerManagerImpl (Kittywhiskers Van Gogh) dbe41ea refactor: move object request logic to `PeerManagerImpl` (Kittywhiskers Van Gogh) 112c4e0 merge bitcoin#24021: Rename and move PoissonNextSend functions (Kittywhiskers Van Gogh) 6d690ed merge bitcoin#23970: Remove pointless and confusing shift in RelayAddress (Kittywhiskers Van Gogh) 87205f2 merge bitcoin#21327: ignore transactions while in IBD (Kittywhiskers Van Gogh) 51ad8e4 merge bitcoin#21148: Split orphan handling from net_processing into txorphanage (Kittywhiskers Van Gogh) cbff29a partial bitcoin#20524: Move MIN_VERSION_SUPPORTED to p2p.py (Kittywhiskers Van Gogh) Pull request description: ## Additional Information * Dependent on #6098 * Dependent on #6233 * `p2p_ibd_txrelay.py` was first introduced in [bitcoin#19423](bitcoin#19423) to test feefilter logic but on account of Dash not having feefilter capabilities, that backport was skipped over but on account of the tests introduced in [bitcoin#21327](bitcoin#21327) that test capabilities present in Dash, a minimal version of `p2p_ibd_txrelay.py` has been committed in. * `vSendMsg` is originally a `std::deque` and as an optimization, was changed to a `std::list` in 027a852 ([dash#3398](#3398)) but this renders us unable to backport [bitcoin#26844](bitcoin#26844) as it introduces build failures. The optimization has been reverted to make way for the backport. <details> <summary>Compile failure:</summary> ``` net.cpp:959:20: error: invalid operands to binary expression ('iterator' (aka '_List_iterator<std::vector<unsigned char, std::allocator<unsigned char>>>') and 'int') if (it + 1 != node.vSendMsg.end()) { ~~ ^ ~ /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_bvector.h:303:3: note: candidate function not viable: no known conversion from 'iterator' (aka '_List_iterator<std::vector<unsigned char, std::allocator<unsigned char>>>') to 'ptrdiff_t' (aka 'long') for 1st argument operator+(ptrdiff_t __n, const _Bit_iterator& __x) [...] 1 error generated. make[2]: *** [Makefile:11296: libbitcoin_server_a-net.o] Error 1 make[2]: *** Waiting for unfinished jobs.... make[2]: Leaving directory '/src/dash/src' make[1]: *** [Makefile:19171: all-recursive] Error 1 make[1]: Leaving directory '/src/dash/src' make: *** [Makefile:799: all-recursive] Error 1 ``` </details> * The collection of `CNode` pointers in `CConnman::SocketHandlerConnected` has been changed to a `std::set` to allow for us to erase elements from `vReceivableNodes` if the node is _also_ in the set of sendable nodes and the send hasn't entirely succeeded to avoid a deadlock (i.e. backport [bitcoin#27981](bitcoin#27981)) * When backporting [bitcoin#28165](bitcoin#28165), `denialofservice_tests` has been modified to still check with `vSendMsg` instead of `Transport::GetBytesToSend()` as changes in networking code to support LT-SEMs (level-triggered socket events mode) mean that the message doesn't get shifted from `vSendMsg` to `m_message_to_send`, as the test expects. * Specifically, the changes made for LT-SEM support result in the function responsible for making that shift (`Transport::SetMessageToSend()` through `CConnman::SocketSendData()`), not being called during the test runtime. * As checking `vSendMsg` (directly or through `nSendMsgSize`) isn't enough to determine if the queue is empty, we now also check with `to_send` from `Transport::GetBytesToSend()` to help us make that determination. This mirrors the change present in the upstream backport ([source](https://github.com/bitcoin/bitcoin/pull/28165/files#diff-00021eed586a482abdb09d6cdada1d90115abe988a91421851960e26658bed02R1324-R1327)). ## Breaking Changes * `bandwidth.message.*.bytesSent` will no longer include overhead and will now only report message size as specifics that let us calculate the overhead have been abstracted away. ## Checklist: - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas **(note: N/A)** - [x] I have added or updated relevant unit/integration/functional/e2e tests - [x] I have made corresponding changes to the documentation **(note: N/A)** - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_ ACKs for top commit: PastaPastaPasta: utACK 76a458e Tree-SHA512: 2e47c207c1f854cfbd5b28c07dd78e12765ddb919abcd7710325df5d253cd0ba4bc30aa21545d88519e8acfe65638a57db4ca66853aca82fc355542210f4b394
A memory resource similar to
std::pmr::unsynchronized_pool_resource
, but optimized for node-based containers. The goal is to be able to cache more coins with the same memory usage, and allocate/deallocate faster.This is a reimplementation of #22702. The goal was to implement it in a way that is simpler to review & test
There is now a generic
PoolResource
for allocating/deallocating memory. This has practically the same API asstd::pmr::memory_resource
. (Unfortunately I cannot use std::pmr because libc++ simply doesn't implement that API).Thanks to sipa there is now a fuzzer for PoolResource! On a fast machine I ran it for ~770 million executions without finding any issue.
The estimation of the correct node size is now gone, PoolResource now has multiple pools and just needs to be created large enough to have space for the unordered_map nodes.
I ran benchmarks with #22702, mergebase, and this PR. Frequency locked Intel i7-8700, clang++ 13.0.1 to reindex up to block 690000.
The performance is practically identical with #22702, just 0.4% slower. It's ~21% faster than master:
Note that on cache drops mergebase's memory doesnt go so far down because it does not free the
CCoinsMap
bucket array.