Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch BlockMap to use an unordered_set under the hood #19677

Closed

Conversation

JeremyRubin
Copy link
Contributor

@JeremyRubin JeremyRubin commented Aug 7, 2020

Currently we use an unordered_map<uint256, CBlockIndex*> for our BlockMap. This is a bit of a peculiar choice because we then store a pointer to the uint256 from the pair inside of the CBlockIndex. It would be conceptually simpler if we used a unordered_map<CBlockIndex> and modified CBlockIndex to directly own the uint256 for the phashBlock.

That's what this patch attempts doing. The only additional complexity is where we do want to query the map by key, the key equivalence stuff is only in C++20, so we add a helper that lets us construct a mock element easily for using with find.

The result is nicer since we get rid of an additional indirection for accessing the phashblock and we get rid of a lot of std::pair de-structuring. We also save a bit of memory per index (I haven't computed it precisely, but my guess is we save something like 24 bytes total per index by eliminating 1 pointer, which saves 8 bytes, and then could save us something like 16 bytes of padding). We also no longer can be in an inconsistent state where phashblock does not point to the entries own hash (although modifying phashblock once it is inserted into the map would be invalid -- it must be removed, modified, and reinserted).

Future work can likely further improve on this by making CBlockIndex owned* by the unordered_set directly, which will eliminate even more indirection/pointer chasing and make a lot of the code using BlockMap simpler (I don't think we ever rely on CBlockIndex being non-owned anywhere, nor should we have reason to in the future).

     References and pointers to data stored in the container are only invalidated by erasing that element, even when the corresponding iterator is invalidated. 

@@ -1875,8 +1876,8 @@ static unsigned int GetBlockScriptFlags(const CBlockIndex* pindex, const Consens
// mainnet and testnet), so for simplicity, always leave P2SH
// on except for the one violating block.
if (consensusparams.BIP16Exception.IsNull() || // no bip16 exception on this chain
pindex->phashBlock == nullptr || // this is a new candidate block, eg from TestBlockValidity()
*pindex->phashBlock != consensusparams.BIP16Exception) // this block isn't the historical exception
pindex->m_hash_block.IsNull() || // this is a new candidate block, eg from TestBlockValidity()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this kosher 100% of the time as a replacement for nullptr?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should fix itself in the next rebase?

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 7, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@JeremyRubin JeremyRubin force-pushed the refactor-blockmap-blockset branch 3 times, most recently from fe727c3 to c80841e Compare August 7, 2020 10:27
This was referenced Aug 7, 2020
Copy link
Member

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK!

This is a bit of a peculiar choice because we then store a pointer to the uint256 from the pair inside of the CBlockIndex

Indeed.

src/chain.h Outdated Show resolved Hide resolved
@JeremyRubin JeremyRubin changed the title WIP: Switch BlockMap to use an unordered_set under the hood Switch BlockMap to use an unordered_set under the hood Aug 10, 2020
@JeremyRubin
Copy link
Contributor Author

Thanks for the review! Converted GetBlockHash to be a const ref, good suggestion!

I've removed the WIP status here as I eliminated the earlier bug.

src/bench/rpc_blockchain.cpp Outdated Show resolved Hide resolved
src/chain.h Outdated Show resolved Hide resolved
src/chain.h Show resolved Hide resolved
src/test/fuzz/chain.cpp Outdated Show resolved Hide resolved
src/test/fuzz/chain.cpp Outdated Show resolved Hide resolved
@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 7, 2020

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".

@DrahtBot
Copy link
Contributor

There hasn't been much activity lately and the patch still needs rebase. What is the status here?
  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@JeremyRubin
Copy link
Contributor Author

if there is a concept ACK from someone who would merge it on this approach I will rebase it, but since it touches a lot of places I don't have time to rebase it continually.

@maflcko
Copy link
Member

maflcko commented Dec 15, 2021

Yeah, it is non-trivial to review so hard to tell.

I wonder if this tiny layers can be peeled off of this or if this needs to be one chunk. Maybe the phashBlock change can be split out?

@JeremyRubin
Copy link
Contributor Author

hmmm...

it's possible? but why?

I see two split-up paths:

Make phashblock m_hash_block and keep the map structure first

This uses 24 bytes more memory at first, but doesn't change the lookup data structure.

Next, we could change the data structure to an indirect_map type and change the key to a pointer.

Then, we could change to a set.

Make the data structure a set first and keep phashblock

This changes the data structure, and changes all the lookup sites.

No extra storage, no savings.

Uses of phashblock remain consistent, although it's just a self-pointer.

phashblock could become an inline method too, so you don't change the memory usage, but do need to add ().

Eventually switch to using m_hash_block.

@DrahtBot
Copy link
Contributor

There hasn't been much activity lately and the patch still needs rebase. What is the status here?
  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@maflcko
Copy link
Member

maflcko commented Mar 21, 2022

Is this still relevant after #24050?

@JeremyRubin
Copy link
Contributor Author

yes, it's completely orthogonal to #24050

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK. I started making the same change after reviewing #24050 and #24199. As noted in the PR description, C++20 will allow a simpler implementation of this because it adds unordered_set find and count method overloads that can accept block hashes, instead of CBlockIndex pointers (heterogenous lookups, https://www.cppstories.com/2021/heterogeneous-access-cpp20/). But since the find method is mostly wrapped anyway and not called directly, this is not a big deal.

size_t operator()(CBlockIndex* const& ptr) const { return ReadLE64(ptr->GetBlockHash().begin()); }
// Helper for querying by hash
// e.g., map.find(map.hash_function()(h))
CBlockIndex* operator()(const uint256& hash) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Refactor BlockMap to use an unordered_set instead of an unordered_map" (fdc2f15)

I was confused by the idx.hash_function()(hash) stuff initially, and think it could be clearer if you gave this method a different name like FakeBlockIndex instead of operator(). idx.hash_function().FakeBlockIndex(hash) would give a clue that the method is returning a fake block index pointer, not taking the hash of a hash, and also avoid overloading the same operator in two ways that do different things conceptually.

Also I guess you are using a shared mock CBlockIndex instance rather than creating temporary CBlockIndex objects for performance reasons, but it it would be interesting to know if taking a more straightforward with temporary mock objects would actually hurt performance:

struct FakeBlockIndex : public CBlockIndex
{
    FakeBlockIndex(const uint256& hash) { m_hash_block = hash; }
    FakeBlockIndex* pointer() { return this; }
};

Since it would allow simplifying:

-        BlockMap::const_iterator  it = m_blockman.m_block_index.find(m_blockman.m_block_index.hash_function()(hashAssumeValid));
+        BlockMap::const_iterator  it = m_blockman.m_block_index.find(FakeBlockIndex(hashAssumeValid).pointer());

and:

 struct BlockHasher
 {
     // this used to call `GetCheapHash()` in uint256, which was later moved; the
     // cheap hash function simply calls ReadLE64() however, so the end result is
     // identical
-    mutable CBlockIndex mock;
-    BlockHasher() : mock() {};
     size_t operator()(CBlockIndex* const& ptr) const { return ReadLE64(ptr->GetBlockHash().begin()); }
-    // Helper for querying by hash
-    // e.g., map.find(map.hash_function()(h))
-    CBlockIndex* operator()(const uint256& hash) {
-        mock.m_hash_block = hash;
-        return &mock;
-    }
 };

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would since CBlockIndex is a very big struct?

but maybe we can make CBlockIndex have an abstract base class with just m_block_hash and then inherit from that for both? but then you have overhead everywhere...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would since CBlockIndex is a very big struct?

but maybe we can make CBlockIndex have an abstract base class with just m_block_hash and then inherit from that for both? but then you have overhead everywhere...

I would probably not do the abstract base thing, since the reason for my suggestion was to simplify code, and trying to overload find that way would seem to add more complexity. Current approach here does seems fine. If dropping the hasher method would be bad for performance, I'd just rename it from operator() to FakeBlockIndex or something for clarity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i probably won't have time to work on this until mid april FYI, but i can pick it up sometime then.

if you'd like to pick it off and rebase it and push it through i would review it + ack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know. I'm working on some other CBlockIndex changes so might want to do this. No hurry though!

@DrahtBot
Copy link
Contributor

There hasn't been much activity lately and the patch still needs rebase. What is the status here?
  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@glozow
Copy link
Member

glozow commented Sep 26, 2022

Closing as this has needed rebase for more than 2 years. Feel free to reopen if you get a chance to work on this again in the future, thanks!

@glozow glozow closed this Sep 26, 2022
@Rspigler
Copy link
Contributor

Mark up for grabs?

@bitcoin bitcoin locked and limited conversation to collaborators Sep 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants