New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
validation: Stricter assumeutxo error handling when renaming chainstates #27862
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
src/node/chainstate.cpp
Outdated
AbortNode("Background chainstate cleanup failed unexpectedly."); | ||
const auto error = Untranslated("Background chainstate cleanup failed unexpectedly."); | ||
AbortNode(error.original); | ||
return {ChainstateLoadStatus::FAILURE, error}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an observation on the behavior change here. In the current behavior, the function continues execution until it reaches CompleteChainstateInitialization
further below. There it calls options.check_interrupt()
, which will lead it to return ChainstateLoadStatus::INTERRUPTED
. I think making it return a ChainstateLoadStatus::FAILURE
now makes more sense, since it is not stopped due to an external interrupt.
7564b87
to
1ac09b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated 7564b87 -> 1ac09b9 (pr/assumeabort.1
-> pr/assumeabort.2
, compare) splitting this into two commits and making more changes to InvalidateCoinsDBOnDisk
to normalize its error handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 1ac09b9
That windows build failure is weird though. How can there be a conflict between std::format
and tfm::format
, if we are only in the latter's namespace? Why is this only popping up now?
For ref, the msvc error is:
|
No idea if #27892 fixes the compile error, but the best way to find out would be to rebase this pull on top of that? |
The other possible solution could be as follows: --- a/src/util/translation.h
+++ b/src/util/translation.h
@@ -63,8 +63,8 @@ inline T const& TranslateArg(const T& arg, bool translated)
template <typename... Args>
bilingual_str format(const bilingual_str& fmt, const Args&... args)
{
- return bilingual_str{format(fmt.original, TranslateArg(args, false)...),
- format(fmt.translated, TranslateArg(args, true)...)};
+ return bilingual_str{tinyformat::format(fmt.original, TranslateArg(args, false)...),
+ tinyformat::format(fmt.translated, TranslateArg(args, true)...)};
}
} // namespace tinyformat
|
1ac09b9
to
3984b7c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated 1ac09b9 -> 3984b7c (pr/assumeabort.2
-> pr/assumeabort.3
, compare) pulling in #27892 to try to fix the MSVC error, and dropping the AbortNode call in the first commit to make the shutdown sequence more straightforward
src/node/chainstate.cpp
Outdated
@@ -207,7 +207,7 @@ ChainstateLoadResult LoadChainstate(ChainstateManager& chainman, const CacheSize | |||
} else if (snapshot_completion == SnapshotCompletionResult::SUCCESS) { | |||
LogPrintf("[snapshot] cleaning up unneeded background chainstate, then reinitializing\n"); | |||
if (!chainman.ValidatedSnapshotCleanup()) { | |||
AbortNode("Background chainstate cleanup failed unexpectedly."); | |||
return {ChainstateLoadStatus::FAILURE, Untranslated("Background chainstate cleanup failed unexpectedly.")}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be FAILURE_FATAL
now?
…en formatting, Fix ADL violation fa8ef7d refactor: Avoid copy of bilingual_str when formatting, Fix ADL violation (MarcoFalke) Pull request description: This refactor shouldn't change behavior, but may fix compile errors such as bitcoin/bitcoin#27862 (comment) ACKs for top commit: achow101: ACK fa8ef7d ryanofsky: Code review ACK fa8ef7d. Looks great! Thanks for updating hebasto: ACK fa8ef7d, I have reviewed the code and it looks OK. Tree-SHA512: 903019962f27b5432b8e3af052b472238ef68d3ee165148c9d2232bf290309075f9f17d8d06c9b5c7fddb89c1a9c3a4c09c6310af01e8561adc0244a30db0857
Make LoadChainstate return an explicit error when snapshot validation succeeds, but there is an error trying to replace the background chainstate with the snapshot chainstate. Previously in this case LoadChainstate would trigger a shutdown and return INTERRUPTED, now it will return an actual error code. There's no real change to behavior other than error message being formatted a little differently. Motivation for this change is to replace error handling via callbacks with error handling via return value ahead of bitcoin#27861
…Disk Currently InvalidateCoinsDBOnDisk is calling AbortNode without an error to the caller if it fails. Change it to return just return util::Result, and update the caller to handle the error itself. This causes the secondary error to be shown below the main error instead of the other way around.
3984b7c
to
1c7d08b
Compare
Rebased 3984b7c -> 1c7d08b ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 1c7d08b
ACK 1c7d08b As someone slightly unfamiliar with init and shutdown, |
…tting, Fix ADL violation fa8ef7d refactor: Avoid copy of bilingual_str when formatting, Fix ADL violation (MarcoFalke) Pull request description: This refactor shouldn't change behavior, but may fix compile errors such as bitcoin#27862 (comment) ACKs for top commit: achow101: ACK fa8ef7d ryanofsky: Code review ACK fa8ef7d. Looks great! Thanks for updating hebasto: ACK fa8ef7d, I have reviewed the code and it looks OK. Tree-SHA512: 903019962f27b5432b8e3af052b472238ef68d3ee165148c9d2232bf290309075f9f17d8d06c9b5c7fddb89c1a9c3a4c09c6310af01e8561adc0244a30db0857
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 1c7d08b (jamesob/ackr/27862.1.ryanofsky.validation_stricter_assu
)
Good, common sense changes. Thanks for this cleanup. Reviewed and built each commit locally.
…hen renaming chainstates 1c7d08b validation: Stricter assumeutxo error handling in InvalidateCoinsDBOnDisk (Ryan Ofsky) 9047337 validation: Stricter assumeutxo error handling in LoadChainstate (Ryan Ofsky) Pull request description: There are two places in assumeutxo code where it is calling `AbortNode` to trigger asynchronous shutdowns without returning errors to calling functions. One case, in `LoadChainstate`, happens when snapshot validation succeeds, and there is an error trying to replace the background chainstate with the snapshot chainstate. The other case, in `InvalidateCoinsDBOnDisk`, happens when snapshot validatiion fails, and there is an error trying to remove the snapshot chainstate. In both cases the node is being forced to shut down, so it makes sense for these functions to raise errors so callers can know that an error happened without having to infer it from the shutdown state. Noticed these cases while reviewing bitcoin#27861, which replaces the `AbortNode` function with a `FatalError` function. ACKs for top commit: achow101: ACK 1c7d08b TheCharlatan: ACK 1c7d08b jamesob: ACK 1c7d08b ([`jamesob/ackr/27862.1.ryanofsky.validation_stricter_assu`](https://github.com/jamesob/bitcoin/tree/ackr/27862.1.ryanofsky.validation_stricter_assu)) Tree-SHA512: fb1dcde3fa0e77b4ba0c48507d289552b939c2866781579c8e994edc209abc3cd29cf81c89380057199323a8eec484956abb1fd3a43c957ecd0e7f7bbfd63fd8
d8041d4 blockstorage: Return on fatal undo file flush error (TheCharlatan) f0207e0 blockstorage: Return on fatal block file flush error (TheCharlatan) 5671c15 blockstorage: Mark FindBlockPos as nodiscard (TheCharlatan) Pull request description: The goal of this PR is to establish that fatal blockstorage flush errors should be treated as errors at their call site. Prior to this patch `FlushBlockFile` may have failed without returning in `Chainstate::FlushStateToDisk`, leading to a potential write from `WriteBlockIndexDB` that may refer to a block that is not fully flushed to disk yet. By returning if either `FlushUndoFile` or `FlushBlockFile` fail, we ensure that no further write operations take place that may lead to an inconsistent database when crashing. Add `[[nodiscard]]` annotations to them such that they are not ignored in future. Functions that call either `FlushUndoFile` or `FlushBlockFile`, need to handle these extra abort cases properly. Since `Chainstate::FlushStateToDisk` already produces an abort error in case of `WriteBlockIndexDB` failing, no extra logic for functions calling `Chainstate::FlushStateToDisk` is required. Besides `Chainstate::FlushStateToDisk`, `FlushBlockFile` is also called by `FindBlockPos`, while `FlushUndoFile` is only called by `FlushBlockFile` and `WriteUndoDataForBlock`. For both these cases, the flush error is not further bubbled up. Instead, the error is logged and a comment is provided why bubbling up an error would be less desirable in these cases. --- This pull request is part of a larger effort towards improving the shutdown / abort / fatal error handling in validation code. It is a first step towards implementing proper fatal error return type enforcement similar as proposed by theuni in this pull request [comment](#27711 (comment)). For ease of review of these critical changes, a first step would be checking that `AbortNode` leads to early and error-conveying returns at its call site. Further work for enforcing returns when `AbortNode` is called is done in #27862. ACKs for top commit: stickies-v: re-ACK d8041d4 ryanofsky: Code review ACK d8041d4 Tree-SHA512: 47ade9b873b15e567c8f60ca538d5a0daf32163e1031be3212a3a45eb492b866664b225f2787c9e40f3e0c089140157d8fd1039abc00c7bdfeec1b52ecd7e219
d8041d4 blockstorage: Return on fatal undo file flush error (TheCharlatan) f0207e0 blockstorage: Return on fatal block file flush error (TheCharlatan) 5671c15 blockstorage: Mark FindBlockPos as nodiscard (TheCharlatan) Pull request description: The goal of this PR is to establish that fatal blockstorage flush errors should be treated as errors at their call site. Prior to this patch `FlushBlockFile` may have failed without returning in `Chainstate::FlushStateToDisk`, leading to a potential write from `WriteBlockIndexDB` that may refer to a block that is not fully flushed to disk yet. By returning if either `FlushUndoFile` or `FlushBlockFile` fail, we ensure that no further write operations take place that may lead to an inconsistent database when crashing. Add `[[nodiscard]]` annotations to them such that they are not ignored in future. Functions that call either `FlushUndoFile` or `FlushBlockFile`, need to handle these extra abort cases properly. Since `Chainstate::FlushStateToDisk` already produces an abort error in case of `WriteBlockIndexDB` failing, no extra logic for functions calling `Chainstate::FlushStateToDisk` is required. Besides `Chainstate::FlushStateToDisk`, `FlushBlockFile` is also called by `FindBlockPos`, while `FlushUndoFile` is only called by `FlushBlockFile` and `WriteUndoDataForBlock`. For both these cases, the flush error is not further bubbled up. Instead, the error is logged and a comment is provided why bubbling up an error would be less desirable in these cases. --- This pull request is part of a larger effort towards improving the shutdown / abort / fatal error handling in validation code. It is a first step towards implementing proper fatal error return type enforcement similar as proposed by theuni in this pull request [comment](bitcoin#27711 (comment)). For ease of review of these critical changes, a first step would be checking that `AbortNode` leads to early and error-conveying returns at its call site. Further work for enforcing returns when `AbortNode` is called is done in bitcoin#27862. ACKs for top commit: stickies-v: re-ACK d8041d4 ryanofsky: Code review ACK d8041d4 Tree-SHA512: 47ade9b873b15e567c8f60ca538d5a0daf32163e1031be3212a3a45eb492b866664b225f2787c9e40f3e0c089140157d8fd1039abc00c7bdfeec1b52ecd7e219
Summary: > validation: Stricter assumeutxo error handling in LoadChainstate > > Make LoadChainstate return an explicit error when snapshot validation succeeds, > but there is an error trying to replace the background chainstate with the > snapshot chainstate. Previously in this case LoadChainstate would trigger a > shutdown and return INTERRUPTED, now it will return an actual error code. > > There's no real change to behavior other than error message being formatted a > little differently. > > Motivation for this change is to replace error handling via callbacks with > error handling via return value ahead of > bitcoin/bitcoin#27861 bitcoin/bitcoin@1c7d08b > validation: Stricter assumeutxo error handling in InvalidateCoinsDBOnDisk > > Currently InvalidateCoinsDBOnDisk is calling AbortNode without an error to the > caller if it fails. Change it to return just return util::Result, and update > the caller to handle the error itself. > > This causes the secondary error to be shown below the main error instead of the > other way around. bitcoin/bitcoin@1c7d08b This is a backport of [[bitcoin/bitcoin#27862 | core#27862]] Test Plan: The only change in behavior is the order of the error messages in logs, but there are currently no tests for these error messages, and there is currently no easy way to test this. Just check that the code still compiles and does not break existing tested behavior. `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Subscribers: Fabien Differential Revision: https://reviews.bitcoinabc.org/D15624
There are two places in assumeutxo code where it is calling
AbortNode
to trigger asynchronous shutdowns without returning errors to calling functions.One case, in
LoadChainstate
, happens when snapshot validation succeeds, and there is an error trying to replace the background chainstate with the snapshot chainstate.The other case, in
InvalidateCoinsDBOnDisk
, happens when snapshot validatiion fails, and there is an error trying to remove the snapshot chainstate.In both cases the node is being forced to shut down, so it makes sense for these functions to raise errors so callers can know that an error happened without having to infer it from the shutdown state.
Noticed these cases while reviewing #27861, which replaces the
AbortNode
function with aFatalError
function.