-
Notifications
You must be signed in to change notification settings - Fork 35.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prune, rpc: Check undo data when finding pruneheight #29668
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
892d6fc
to
296243c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 296243c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
I think this approach works well, but I think I'd prefer passing a mask instead of a bool, which prevents combinatorial explosion of params if we want to add more filtering options in the future. I personally find that a bit easier to quickly understand too. What do you think about something like this (didn't add static asserts and doc updates etc yet):
git diff on 296243c
diff --git a/src/node/blockstorage.cpp b/src/node/blockstorage.cpp
index 74a233f43b..eaefa602d3 100644
--- a/src/node/blockstorage.cpp
+++ b/src/node/blockstorage.cpp
@@ -595,16 +595,12 @@ bool BlockManager::IsBlockPruned(const CBlockIndex& block)
return m_have_pruned && !(block.nStatus & BLOCK_HAVE_DATA) && (block.nTx > 0);
}
-const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& upper_block, const CBlockIndex* lower_block, const bool check_undo)
+const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& upper_block, const CBlockIndex* lower_block, uint32_t status_mask)
{
AssertLockHeld(::cs_main);
- uint32_t check_status{BLOCK_HAVE_DATA};
- if (check_undo) {
- check_status |= BLOCK_HAVE_UNDO;
- }
const CBlockIndex* last_block = &upper_block;
- assert((last_block->nStatus & check_status) == check_status); // 'upper_block' must have data
- while (last_block->pprev && ((last_block->pprev->nStatus & check_status) == check_status)) {
+ assert((last_block->nStatus & status_mask) == status_mask); // 'upper_block' must have data
+ while (last_block->pprev && ((last_block->pprev->nStatus & status_mask) == status_mask)) {
if (lower_block) {
// Return if we reached the lower_block
if (last_block == lower_block) return lower_block;
@@ -617,7 +613,7 @@ const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& upper_bl
assert(last_block != nullptr);
// In the special case that all blocks up to the Genesis block are not
// pruned, we return the Genesis block instead of block 1, even though
- // it can not have any undo data, since it is properly handled as an
+ // it may be missing (e.g. undo) data, since it is properly handled as an
// exception everywhere.
if (last_block->nHeight == 1) {
return last_block->pprev;
diff --git a/src/node/blockstorage.h b/src/node/blockstorage.h
index 33d78259e6..766a170946 100644
--- a/src/node/blockstorage.h
+++ b/src/node/blockstorage.h
@@ -338,7 +338,7 @@ public:
//! Find the first stored ancestor of start_block immediately after the last
//! pruned ancestor. Return value will never be null. Caller is responsible
//! for ensuring that start_block has data.
- const CBlockIndex* GetFirstStoredBlock(const CBlockIndex& start_block LIFETIMEBOUND, const CBlockIndex* lower_block=nullptr, const bool check_undo=false) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
+ const CBlockIndex* GetFirstStoredBlock(const CBlockIndex& start_block LIFETIMEBOUND, const CBlockIndex* lower_block=nullptr, uint32_t status_mask=BLOCK_HAVE_DATA) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
/** True if any block files have ever been pruned. */
bool m_have_pruned = false;
diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
index 3cd9421343..88ed7ed964 100644
--- a/src/rpc/blockchain.cpp
+++ b/src/rpc/blockchain.cpp
@@ -836,7 +836,8 @@ static RPCHelpMan pruneblockchain()
PruneBlockFilesManual(active_chainstate, height);
const CBlockIndex& block{*CHECK_NONFATAL(active_chain.Tip())};
- return block.nStatus & BLOCK_HAVE_MASK ? active_chainstate.m_blockman.GetFirstStoredBlock(block, nullptr, true)->nHeight - 1 : block.nHeight;
+ const uint32_t has_undo{BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO};
+ return block.nStatus & BLOCK_HAVE_MASK ? active_chainstate.m_blockman.GetFirstStoredBlock(block, nullptr, has_undo)->nHeight - 1 : block.nHeight;
},
};
}
@@ -1290,7 +1291,8 @@ RPCHelpMan getblockchaininfo()
obj.pushKV("pruned", chainman.m_blockman.IsPruneMode());
if (chainman.m_blockman.IsPruneMode()) {
bool has_tip_data = tip.nStatus & BLOCK_HAVE_MASK;
- obj.pushKV("pruneheight", has_tip_data ? chainman.m_blockman.GetFirstStoredBlock(tip, nullptr, true)->nHeight : tip.nHeight + 1);
+ const uint32_t has_undo{BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO};
+ obj.pushKV("pruneheight", has_tip_data ? chainman.m_blockman.GetFirstStoredBlock(tip, nullptr, has_undo)->nHeight : tip.nHeight + 1);
const bool automatic_pruning{chainman.m_blockman.GetPruneTarget() != BlockManager::PRUNE_TARGET_MANUAL};
obj.pushKV("automatic_pruning", automatic_pruning);
Code LGTM 296243c, and I verified that the (updated) feature_pruning.py
test fails on master as expected. I'm happy to proceed with your approach too if you don't like mine.
296243c
to
bc36604
Compare
@stickies-v thanks for the feedback! I don't have strong feelings between the approaches but I agree yours is more flexible and maybe a bit more expressive, so I applied that. I also addressed the rest of the feedback and added some more detail to the docs. |
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
bc36604
to
23ac561
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q: doesn't this mean that if we generate a chain, prune half of it, then fetch the pruned blocks through getblockfrompeer
, any fresh indexes sync will throw a different init error after this changes? (StartIndexBackgroundSync()
should fail at CheckBlockDataAvailability()
now).
If this is correct, it would be nice to introduce a test presenting the behavior change.
Good question but no, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for incorporating the mask suggestion. Had a deeper look at the code, I think this logic maybe warrants a bit more cleaning up, because with the status_mask, using GetFirstStoredBlock
is not very intuitive anymore and potentially a bit footgunny.
GetFirstStoredBlock
: rename toGetFirstBlock
since theStored
bit depends on thestatus_mask
now. Consequently, makestatus_mask
required to avoid footguns.- let callsites handle genesis block exceptions, because it depends on the status_mask. As a bonus, everything is a bit more explicit?
- some other tidying up, e.g. make parameter naming consistent between header and implementation, improve documentation
Wdyt about this further iteration? I understand it's quite a bit of overhaul compared to your first proposal for what should be a relatively simple change. I'm just hesitant about introducing new boolean parameters when we can avoid it, and I think it's worth improving overall code robustness?
git diff on 23ac561
diff --git a/src/init.cpp b/src/init.cpp
index 1a4fce4678..a8a5ff0dfd 100644
--- a/src/init.cpp
+++ b/src/init.cpp
@@ -2032,7 +2032,7 @@ bool StartIndexBackgroundSync(NodeContext& node)
if (indexes_start_block) {
LOCK(::cs_main);
const CBlockIndex* start_block = *indexes_start_block;
- if (!start_block) start_block = chainman.ActiveChain().Genesis();
+ if (!start_block) start_block = chainman.ActiveChain()[1];
if (!chainman.m_blockman.CheckBlockDataAvailability(*index_chain.Tip(), *Assert(start_block))) {
return InitError(strprintf(Untranslated("%s best block of the index goes beyond pruned data. Please disable the index or reindex (which will download the whole blockchain again)"), older_index_name));
}
diff --git a/src/node/blockstorage.cpp b/src/node/blockstorage.cpp
index eaefa602d3..8556504453 100644
--- a/src/node/blockstorage.cpp
+++ b/src/node/blockstorage.cpp
@@ -595,11 +595,12 @@ bool BlockManager::IsBlockPruned(const CBlockIndex& block)
return m_have_pruned && !(block.nStatus & BLOCK_HAVE_DATA) && (block.nTx > 0);
}
-const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& upper_block, const CBlockIndex* lower_block, uint32_t status_mask)
+const CBlockIndex* BlockManager::GetFirstBlock(const CBlockIndex& upper_block, uint32_t status_mask,
+ const CBlockIndex* lower_block)
{
AssertLockHeld(::cs_main);
const CBlockIndex* last_block = &upper_block;
- assert((last_block->nStatus & status_mask) == status_mask); // 'upper_block' must have data
+ assert((last_block->nStatus & status_mask) == status_mask); // 'upper_block' must satisfy the status mask
while (last_block->pprev && ((last_block->pprev->nStatus & status_mask) == status_mask)) {
if (lower_block) {
// Return if we reached the lower_block
@@ -610,21 +611,13 @@ const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& upper_bl
}
last_block = last_block->pprev;
}
- assert(last_block != nullptr);
- // In the special case that all blocks up to the Genesis block are not
- // pruned, we return the Genesis block instead of block 1, even though
- // it may be missing (e.g. undo) data, since it is properly handled as an
- // exception everywhere.
- if (last_block->nHeight == 1) {
- return last_block->pprev;
- }
- return last_block;
+ return Assert(last_block);
}
bool BlockManager::CheckBlockDataAvailability(const CBlockIndex& upper_block, const CBlockIndex& lower_block)
{
if (!(upper_block.nStatus & BLOCK_HAVE_DATA)) return false;
- return GetFirstStoredBlock(upper_block, &lower_block) == &lower_block;
+ return GetFirstBlock(upper_block, BLOCK_HAVE_DATA, &lower_block) == &lower_block;
}
// If we're using -prune with -reindex, then delete block files that will be ignored by the
diff --git a/src/node/blockstorage.h b/src/node/blockstorage.h
index 8f83b68558..8f309c31b6 100644
--- a/src/node/blockstorage.h
+++ b/src/node/blockstorage.h
@@ -335,13 +335,29 @@ public:
//! (part of the same chain).
bool CheckBlockDataAvailability(const CBlockIndex& upper_block LIFETIMEBOUND, const CBlockIndex& lower_block LIFETIMEBOUND) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
- //! Find the first stored ancestor of start_block immediately after the last
- //! ancestor that does not match the status mask. Return value will never be
- //! null. Caller is responsible for ensuring that start_block has the status
- //! mask data. If the whole chain matched the status mask the genesis block
- //! will be returned regardless of it's status match because, for example,
- //! it can not have undo data by nature.
- const CBlockIndex* GetFirstStoredBlock(const CBlockIndex& start_block LIFETIMEBOUND, const CBlockIndex* lower_block=nullptr, uint32_t status_mask=BLOCK_HAVE_DATA) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
+ /**
+ * @brief Finds the furthest away ancestor of `upper_block` of which the nStatus matches the `status_mask`
+ *
+ * Starts from `upper_block` and iterates backwards through its ancestors for as long as the block's
+ * nStatus keeps matching the `status_mask` mask, or until it encounters `lower_block`.
+ *
+ * @pre `upper_block` must match the `status_mask`.
+ *
+ * @param upper_block The block from which to start the search, must meet the `status_mask` criteria.
+ * @param status_mask A bitmask representing the conditions to check against each block's nStatus.
+ * @param lower_block An optional pointer to the `CBlockIndex` that serves as the lower boundary of
+ * the search (inclusive). If `nullptr`, the search includes up to the genesis
+ * block.
+ * @return A (non-null) pointer to the `CBlockIndex` of the furthest away ancestor (including
+ * `upper_block` itself) that matches the `status_mask`, up to and including
+ * `lower_block` if it is part of the ancestry.
+ */
+ const CBlockIndex* GetFirstBlock(
+ const CBlockIndex& upper_block LIFETIMEBOUND,
+ uint32_t status_mask,
+ const CBlockIndex* lower_block = nullptr
+ ) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
+
/** True if any block files have ever been pruned. */
bool m_have_pruned = false;
diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
index fa83d2e397..15d6f9e9b9 100644
--- a/src/rpc/blockchain.cpp
+++ b/src/rpc/blockchain.cpp
@@ -94,6 +94,18 @@ double GetDifficulty(const CBlockIndex& blockindex)
return dDiff;
}
+//! Return height of highest block that has been pruned, or -1 if no blocks have been pruned
+static int GetPruneHeight(const Chainstate& active_chainstate) EXCLUSIVE_LOCKS_REQUIRED(!cs_main) {
+ AssertLockHeld(cs_main);
+
+ const CBlockIndex& chain_tip{*CHECK_NONFATAL(active_chainstate.m_chain.Tip())};
+ if (!(chain_tip.nStatus & BLOCK_HAVE_MASK)) return chain_tip.nHeight;
+ const auto first_pruned{active_chainstate.m_blockman.GetFirstBlock(chain_tip, BLOCK_HAVE_MASK)->nHeight - 1};
+
+ // special case: genesis block is never pruned
+ return first_pruned == 0 ? -1 : first_pruned;
+}
+
static int ComputeNextBlockAndDepth(const CBlockIndex& tip, const CBlockIndex& blockindex, const CBlockIndex*& next)
{
next = tip.GetAncestor(blockindex.nHeight + 1);
@@ -835,9 +847,7 @@ static RPCHelpMan pruneblockchain()
}
PruneBlockFilesManual(active_chainstate, height);
- const CBlockIndex& block{*CHECK_NONFATAL(active_chain.Tip())};
- const uint32_t has_undo{BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO};
- return block.nStatus & BLOCK_HAVE_MASK ? active_chainstate.m_blockman.GetFirstStoredBlock(block, /*lower_block=*/nullptr, /*status_mask=*/has_undo)->nHeight - 1 : block.nHeight;
+ return GetPruneHeight(active_chainstate);
},
};
}
@@ -1290,10 +1300,7 @@ RPCHelpMan getblockchaininfo()
obj.pushKV("size_on_disk", chainman.m_blockman.CalculateCurrentUsage());
obj.pushKV("pruned", chainman.m_blockman.IsPruneMode());
if (chainman.m_blockman.IsPruneMode()) {
- bool has_tip_data = tip.nStatus & BLOCK_HAVE_DATA;
- const uint32_t has_undo{BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO};
- obj.pushKV("pruneheight", has_tip_data ? chainman.m_blockman.GetFirstStoredBlock(tip, /*lower_block=*/nullptr, /*status_mask=*/has_undo)->nHeight : tip.nHeight + 1);
-
+ obj.pushKV("pruneheight", GetPruneHeight(active_chainstate) + 1);
const bool automatic_pruning{chainman.m_blockman.GetPruneTarget() != BlockManager::PRUNE_TARGET_MANUAL};
obj.pushKV("automatic_pruning", automatic_pruning);
if (automatic_pruning) {
diff --git a/src/test/blockmanager_tests.cpp b/src/test/blockmanager_tests.cpp
index d7ac0bf823..12f3257700 100644
--- a/src/test/blockmanager_tests.cpp
+++ b/src/test/blockmanager_tests.cpp
@@ -2,6 +2,7 @@
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
+#include <chain.h>
#include <chainparams.h>
#include <clientversion.h>
#include <node/blockstorage.h>
@@ -113,7 +114,7 @@ BOOST_FIXTURE_TEST_CASE(blockmanager_block_data_availability, TestChain100Setup)
};
// 1) Return genesis block when all blocks are available
- BOOST_CHECK_EQUAL(blockman.GetFirstStoredBlock(tip), chainman->ActiveChain()[0]);
+ BOOST_CHECK_EQUAL(blockman.GetFirstBlock(tip, BLOCK_HAVE_DATA), chainman->ActiveChain()[0]);
BOOST_CHECK(blockman.CheckBlockDataAvailability(tip, *chainman->ActiveChain()[0]));
// 2) Check lower_block when all blocks are available
@@ -127,7 +128,7 @@ BOOST_FIXTURE_TEST_CASE(blockmanager_block_data_availability, TestChain100Setup)
func_prune_blocks(last_pruned_block);
// 3) The last block not pruned is in-between upper-block and the genesis block
- BOOST_CHECK_EQUAL(blockman.GetFirstStoredBlock(tip), first_available_block);
+ BOOST_CHECK_EQUAL(blockman.GetFirstBlock(tip, BLOCK_HAVE_DATA), first_available_block);
BOOST_CHECK(blockman.CheckBlockDataAvailability(tip, *first_available_block));
BOOST_CHECK(!blockman.CheckBlockDataAvailability(tip, *last_pruned_block));
}
23ac561
to
c77e847
Compare
@stickies-v Thank you, I mostly took your suggestions with some minor tweaks and made you co-author of the refactoring commit. I was a bit torn about letting the caller handle the Genesis block issue but since this was actually the previous behavior I guess it's better to not change it if you like it the way it is. I have adapted EDIT: I had forgotten that |
928141d
to
edbb6a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think the last 3 commits should be squashed
src/rpc/blockchain.cpp
Outdated
const auto prune_height{GetPruneHeight(active_chainstate)}; | ||
obj.pushKV("pruneheight", prune_height ? prune_height.value() + 1 : 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: more consistent with the (suggested) approach in pruneblockchain
, but then "plus one (only present if pruning is enabled)" as per the docs in this RPC
const auto prune_height{GetPruneHeight(active_chainstate)}; | |
obj.pushKV("pruneheight", prune_height ? prune_height.value() + 1 : 0); | |
const auto prune_height{GetPruneHeight(active_chainstate).value_or(-1)}; | |
obj.pushKV("pruneheight", prune_height + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to keep this one as is because I find it more intuitive.
edbb6a3
to
697d0a2
Compare
Adressed feedback from @stickies-v , thanks again!
The doc change is only tangentially related so I will keep it separate. But I squashed the 2nd and 3rd. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review 697d0a2. Approach seems good and first commit seems ok in its current form but I think version of GetPruneHeight in the second commit is not handling the genesis block correctly.
GetFirstStoredBlock is generalized to check for any data status with a status mask that needs to be passed as a parameter. To reflect this the function is also renamed to GetFirstBlock. Co-authored-by: stickies-v <stickies-v@protonmail.com>
697d0a2
to
0501ec1
Compare
Addressed comments from @ryanofsky , thank you for the thoughtful review! |
The function
GetFirstStoredBlock()
helps us find the first block for which we have data. So far this function only looked for a block withBLOCK_HAVE_DATA
. However, this doesn't mean that we also have the undo data of that block, and undo data might be required for what a user would like to do with those blocks. One example of how this might happen is if some blocks were fetched using thegetblockfrompeer
RPC. Blocks fetched from a peer will have data but no undo data.The first commit here allows
GetFirstStoredBlock()
to check for undo data as well by passing a parameter. This alone is useful for #29553 and I would use it there.In the second commit I am applying the undo check to the RPCs that report
pruneheight
to the user. I find this much more intuitive because I think the user expects to be able to do all operations on blocks up until thepruneheight
but that is not the case if undo data is missing. I personally ran into this once before and now again when testing for assumeutxo when I had usedgetblockfrompeer
. The following commit adds test coverage for this change of behavior.The last commit adds a note in the docs of
getblockfrompeer
that undo data will not be available.