New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch BlockMap to use an unordered_set under the hood #19677
Conversation
@@ -1875,8 +1876,8 @@ static unsigned int GetBlockScriptFlags(const CBlockIndex* pindex, const Consens | |||
// mainnet and testnet), so for simplicity, always leave P2SH | |||
// on except for the one violating block. | |||
if (consensusparams.BIP16Exception.IsNull() || // no bip16 exception on this chain | |||
pindex->phashBlock == nullptr || // this is a new candidate block, eg from TestBlockValidity() | |||
*pindex->phashBlock != consensusparams.BIP16Exception) // this block isn't the historical exception | |||
pindex->m_hash_block.IsNull() || // this is a new candidate block, eg from TestBlockValidity() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this kosher 100% of the time as a replacement for nullptr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should fix itself in the next rebase?
c61c74e
to
e3d661e
Compare
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
fe727c3
to
c80841e
Compare
c80841e
to
99a1274
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK!
This is a bit of a peculiar choice because we then store a pointer to the uint256 from the pair inside of the CBlockIndex
Indeed.
99a1274
to
0aa7e94
Compare
Thanks for the review! Converted GetBlockHash to be a const ref, good suggestion! I've removed the WIP status here as I eliminated the earlier bug. |
0aa7e94
to
714aea5
Compare
714aea5
to
fdc2f15
Compare
🐙 This pull request conflicts with the target branch and needs rebase. Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft". |
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
if there is a concept ACK from someone who would merge it on this approach I will rebase it, but since it touches a lot of places I don't have time to rebase it continually. |
Yeah, it is non-trivial to review so hard to tell. I wonder if this tiny layers can be peeled off of this or if this needs to be one chunk. Maybe the |
hmmm... it's possible? but why? I see two split-up paths: Make phashblock m_hash_block and keep the map structure firstThis uses 24 bytes more memory at first, but doesn't change the lookup data structure. Next, we could change the data structure to an indirect_map type and change the key to a pointer. Then, we could change to a set. Make the data structure a set first and keep phashblockThis changes the data structure, and changes all the lookup sites. No extra storage, no savings. Uses of phashblock remain consistent, although it's just a self-pointer. phashblock could become an inline method too, so you don't change the memory usage, but do need to add (). Eventually switch to using m_hash_block. |
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
Is this still relevant after #24050? |
yes, it's completely orthogonal to #24050 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK. I started making the same change after reviewing #24050 and #24199. As noted in the PR description, C++20 will allow a simpler implementation of this because it adds unordered_set
find
and count
method overloads that can accept block hashes, instead of CBlockIndex pointers (heterogenous lookups, https://www.cppstories.com/2021/heterogeneous-access-cpp20/). But since the find method is mostly wrapped anyway and not called directly, this is not a big deal.
size_t operator()(CBlockIndex* const& ptr) const { return ReadLE64(ptr->GetBlockHash().begin()); } | ||
// Helper for querying by hash | ||
// e.g., map.find(map.hash_function()(h)) | ||
CBlockIndex* operator()(const uint256& hash) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Refactor BlockMap to use an unordered_set instead of an unordered_map" (fdc2f15)
I was confused by the idx.hash_function()(hash)
stuff initially, and think it could be clearer if you gave this method a different name like FakeBlockIndex
instead of operator()
. idx.hash_function().FakeBlockIndex(hash)
would give a clue that the method is returning a fake block index pointer, not taking the hash of a hash, and also avoid overloading the same operator in two ways that do different things conceptually.
Also I guess you are using a shared mock CBlockIndex instance rather than creating temporary CBlockIndex objects for performance reasons, but it it would be interesting to know if taking a more straightforward with temporary mock objects would actually hurt performance:
struct FakeBlockIndex : public CBlockIndex
{
FakeBlockIndex(const uint256& hash) { m_hash_block = hash; }
FakeBlockIndex* pointer() { return this; }
};
Since it would allow simplifying:
- BlockMap::const_iterator it = m_blockman.m_block_index.find(m_blockman.m_block_index.hash_function()(hashAssumeValid));
+ BlockMap::const_iterator it = m_blockman.m_block_index.find(FakeBlockIndex(hashAssumeValid).pointer());
and:
struct BlockHasher
{
// this used to call `GetCheapHash()` in uint256, which was later moved; the
// cheap hash function simply calls ReadLE64() however, so the end result is
// identical
- mutable CBlockIndex mock;
- BlockHasher() : mock() {};
size_t operator()(CBlockIndex* const& ptr) const { return ReadLE64(ptr->GetBlockHash().begin()); }
- // Helper for querying by hash
- // e.g., map.find(map.hash_function()(h))
- CBlockIndex* operator()(const uint256& hash) {
- mock.m_hash_block = hash;
- return &mock;
- }
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would since CBlockIndex is a very big struct?
but maybe we can make CBlockIndex have an abstract base class with just m_block_hash and then inherit from that for both? but then you have overhead everywhere...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would since CBlockIndex is a very big struct?
but maybe we can make CBlockIndex have an abstract base class with just m_block_hash and then inherit from that for both? but then you have overhead everywhere...
I would probably not do the abstract base thing, since the reason for my suggestion was to simplify code, and trying to overload find that way would seem to add more complexity. Current approach here does seems fine. If dropping the hasher method would be bad for performance, I'd just rename it from operator()
to FakeBlockIndex
or something for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i probably won't have time to work on this until mid april FYI, but i can pick it up sometime then.
if you'd like to pick it off and rebase it and push it through i would review it + ack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know. I'm working on some other CBlockIndex changes so might want to do this. No hurry though!
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
Closing as this has needed rebase for more than 2 years. Feel free to reopen if you get a chance to work on this again in the future, thanks! |
Mark up for grabs? |
Currently we use an
unordered_map<uint256, CBlockIndex*>
for our BlockMap. This is a bit of a peculiar choice because we then store a pointer to the uint256 from the pair inside of the CBlockIndex. It would be conceptually simpler if we used aunordered_map<CBlockIndex>
and modifiedCBlockIndex
to directly own the uint256 for the phashBlock.That's what this patch attempts doing. The only additional complexity is where we do want to query the map by key, the key equivalence stuff is only in C++20, so we add a helper that lets us construct a mock element easily for using with find.
The result is nicer since we get rid of an additional indirection for accessing the phashblock and we get rid of a lot of std::pair de-structuring. We also save a bit of memory per index (I haven't computed it precisely, but my guess is we save something like 24 bytes total per index by eliminating 1 pointer, which saves 8 bytes, and then could save us something like 16 bytes of padding). We also no longer can be in an inconsistent state where phashblock does not point to the entries own hash (although modifying phashblock once it is inserted into the map would be invalid -- it must be removed, modified, and reinserted).
Future work can likely further improve on this by making CBlockIndex owned* by the unordered_set directly, which will eliminate even more indirection/pointer chasing and make a lot of the code using BlockMap simpler (I don't think we ever rely on CBlockIndex being non-owned anywhere, nor should we have reason to in the future).