Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validation: Stricter assumeutxo error handling when renaming chainstates #27862

Merged
merged 2 commits into from Jun 22, 2023

Conversation

ryanofsky
Copy link
Contributor

There are two places in assumeutxo code where it is calling AbortNode to trigger asynchronous shutdowns without returning errors to calling functions.

One case, in LoadChainstate, happens when snapshot validation succeeds, and there is an error trying to replace the background chainstate with the snapshot chainstate.

The other case, in InvalidateCoinsDBOnDisk, happens when snapshot validatiion fails, and there is an error trying to remove the snapshot chainstate.

In both cases the node is being forced to shut down, so it makes sense for these functions to raise errors so callers can know that an error happened without having to infer it from the shutdown state.

Noticed these cases while reviewing #27861, which replaces the AbortNode function with a FatalError function.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 12, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK TheCharlatan, achow101, jamesob

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #27861 (kernel: Add interrupt class by TheCharlatan)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

AbortNode("Background chainstate cleanup failed unexpectedly.");
const auto error = Untranslated("Background chainstate cleanup failed unexpectedly.");
AbortNode(error.original);
return {ChainstateLoadStatus::FAILURE, error};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an observation on the behavior change here. In the current behavior, the function continues execution until it reaches CompleteChainstateInitialization further below. There it calls options.check_interrupt(), which will lead it to return ChainstateLoadStatus::INTERRUPTED. I think making it return a ChainstateLoadStatus::FAILURE now makes more sense, since it is not stopped due to an external interrupt.

src/validation.cpp Outdated Show resolved Hide resolved
Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 7564b87 -> 1ac09b9 (pr/assumeabort.1 -> pr/assumeabort.2, compare) splitting this into two commits and making more changes to InvalidateCoinsDBOnDisk to normalize its error handling

src/validation.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 1ac09b9

That windows build failure is weird though. How can there be a conflict between std::format and tfm::format, if we are only in the latter's namespace? Why is this only popping up now?

@maflcko
Copy link
Member

maflcko commented Jun 15, 2023

For ref, the msvc error is:

  validation.cpp
C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\util\translation.h(66,26): error C2666: 'tinyformat::format': 2 overloads have similar conversions [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\tinyformat.h(1151,13): message : could be 'std::string tinyformat::format<std::string,std::string>(const std::string &,const std::string &,const std::string &)' [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.32.31326\include\format(3103,20): message : or       'std::wstring std::format<std::string,std::string>(const std::_Basic_format_string<wchar_t,std::basic_string<char,std::char_traits<char>,std::allocator<char>>,std::basic_string<char,std::char_traits<char>,std::allocator<char>>>,std::string &&,std::string &&)' [found using argument-dependent lookup] [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.32.31326\include\format(3098,19): message : or       'std::string std::format<std::string,std::string>(const std::_Basic_format_string<char,std::basic_string<char,std::char_traits<char>,std::allocator<char>>,std::basic_string<char,std::char_traits<char>,std::allocator<char>>>,std::string &&,std::string &&)' [found using argument-dependent lookup] [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\util\translation.h(65,1): message : while trying to match the argument list '(const std::string, std::string, std::string)' [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\util\translation.h(65,1): message : note: qualification adjustment (const/volatile) may be causing the ambiguity [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\validation.cpp(5433): message : see reference to function template instantiation 'bilingual_str tinyformat::format<bilingual_str,bilingual_str>(const bilingual_str &,const bilingual_str &,const bilingual_str &)' being compiled [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\util\translation.h(67,26): error C2666: 'tinyformat::format': 2 overloads have similar conversions [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\tinyformat.h(1151,13): message : could be 'std::string tinyformat::format<std::string,std::string>(const std::string &,const std::string &,const std::string &)' [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.32.31326\include\format(3103,20): message : or       'std::wstring std::format<std::string,std::string>(const std::_Basic_format_string<wchar_t,std::basic_string<char,std::char_traits<char>,std::allocator<char>>,std::basic_string<char,std::char_traits<char>,std::allocator<char>>>,std::string &&,std::string &&)' [found using argument-dependent lookup] [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.32.31326\include\format(3098,19): message : or       'std::string std::format<std::string,std::string>(const std::_Basic_format_string<char,std::basic_string<char,std::char_traits<char>,std::allocator<char>>,std::basic_string<char,std::char_traits<char>,std::allocator<char>>>,std::string &&,std::string &&)' [found using argument-dependent lookup] [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\util\translation.h(65,1): message : while trying to match the argument list '(const std::string, std::string, std::string)' [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\util\translation.h(65,1): message : note: qualification adjustment (const/volatile) may be causing the ambiguity [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
  rawtransaction.cpp

@maflcko
Copy link
Member

maflcko commented Jun 15, 2023

No idea if #27892 fixes the compile error, but the best way to find out would be to rebase this pull on top of that?

@hebasto
Copy link
Member

hebasto commented Jun 15, 2023

No idea if #27892 fixes the compile error, but the best way to find out would be to rebase this pull on top of that?

It works.

@hebasto
Copy link
Member

hebasto commented Jun 15, 2023

For ref, the msvc error is:

The other possible solution could be as follows:

--- a/src/util/translation.h
+++ b/src/util/translation.h
@@ -63,8 +63,8 @@ inline T const& TranslateArg(const T& arg, bool translated)
 template <typename... Args>
 bilingual_str format(const bilingual_str& fmt, const Args&... args)
 {
-    return bilingual_str{format(fmt.original, TranslateArg(args, false)...),
-                         format(fmt.translated, TranslateArg(args, true)...)};
+    return bilingual_str{tinyformat::format(fmt.original, TranslateArg(args, false)...),
+                         tinyformat::format(fmt.translated, TranslateArg(args, true)...)};
 }
 } // namespace tinyformat
 

Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 1ac09b9 -> 3984b7c (pr/assumeabort.2 -> pr/assumeabort.3, compare) pulling in #27892 to try to fix the MSVC error, and dropping the AbortNode call in the first commit to make the shutdown sequence more straightforward

@@ -207,7 +207,7 @@ ChainstateLoadResult LoadChainstate(ChainstateManager& chainman, const CacheSize
} else if (snapshot_completion == SnapshotCompletionResult::SUCCESS) {
LogPrintf("[snapshot] cleaning up unneeded background chainstate, then reinitializing\n");
if (!chainman.ValidatedSnapshotCleanup()) {
AbortNode("Background chainstate cleanup failed unexpectedly.");
return {ChainstateLoadStatus::FAILURE, Untranslated("Background chainstate cleanup failed unexpectedly.")};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be FAILURE_FATAL now?

achow101 added a commit to bitcoin-core/gui that referenced this pull request Jun 15, 2023
…en formatting, Fix ADL violation

fa8ef7d refactor: Avoid copy of bilingual_str when formatting, Fix ADL violation (MarcoFalke)

Pull request description:

  This refactor shouldn't change behavior, but may fix compile errors such as bitcoin/bitcoin#27862 (comment)

ACKs for top commit:
  achow101:
    ACK fa8ef7d
  ryanofsky:
    Code review ACK fa8ef7d. Looks great! Thanks for updating
  hebasto:
    ACK fa8ef7d, I have reviewed the code and it looks OK.

Tree-SHA512: 903019962f27b5432b8e3af052b472238ef68d3ee165148c9d2232bf290309075f9f17d8d06c9b5c7fddb89c1a9c3a4c09c6310af01e8561adc0244a30db0857
Make LoadChainstate return an explicit error when snapshot validation succeeds,
but there is an error trying to replace the background chainstate with the
snapshot chainstate. Previously in this case LoadChainstate would trigger a
shutdown and return INTERRUPTED, now it will return an actual error code.

There's no real change to behavior other than error message being formatted a
little differently.

Motivation for this change is to replace error handling via callbacks with
error handling via return value ahead of
bitcoin#27861
…Disk

Currently InvalidateCoinsDBOnDisk is calling AbortNode without an error to the
caller if it fails. Change it to return just return util::Result, and update
the caller to handle the error itself.

This causes the secondary error to be shown below the main error instead of the
other way around.
@ryanofsky
Copy link
Contributor Author

Rebased 3984b7c -> 1c7d08b (pr/assumeabort.3 -> pr/assumeabort.4, compare) on top of #27892 and switched to ChainstateLoadStatus::FAILURE_FATAL in the first commit to avoid triggering reindex suggestion

Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 1c7d08b

@achow101
Copy link
Member

ACK 1c7d08b

As someone slightly unfamiliar with init and shutdown, AbortNode not actually killing the node when it is called is surprising to me. The changes in this PR make these snapshot failure cases line up better with what I had expected.

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jun 19, 2023
…tting, Fix ADL violation

fa8ef7d refactor: Avoid copy of bilingual_str when formatting, Fix ADL violation (MarcoFalke)

Pull request description:

  This refactor shouldn't change behavior, but may fix compile errors such as bitcoin#27862 (comment)

ACKs for top commit:
  achow101:
    ACK fa8ef7d
  ryanofsky:
    Code review ACK fa8ef7d. Looks great! Thanks for updating
  hebasto:
    ACK fa8ef7d, I have reviewed the code and it looks OK.

Tree-SHA512: 903019962f27b5432b8e3af052b472238ef68d3ee165148c9d2232bf290309075f9f17d8d06c9b5c7fddb89c1a9c3a4c09c6310af01e8561adc0244a30db0857
Copy link
Member

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 1c7d08b (jamesob/ackr/27862.1.ryanofsky.validation_stricter_assu)

Good, common sense changes. Thanks for this cleanup. Reviewed and built each commit locally.

@achow101 achow101 merged commit 6a47337 into bitcoin:master Jun 22, 2023
15 checks passed
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jun 22, 2023
…hen renaming chainstates

1c7d08b validation: Stricter assumeutxo error handling in InvalidateCoinsDBOnDisk (Ryan Ofsky)
9047337 validation: Stricter assumeutxo error handling in LoadChainstate (Ryan Ofsky)

Pull request description:

  There are two places in assumeutxo code where it is calling `AbortNode` to trigger asynchronous shutdowns without returning errors to calling functions.

  One case, in `LoadChainstate`, happens when snapshot validation succeeds, and there is an error trying to replace the background chainstate with the snapshot chainstate.

  The other case, in `InvalidateCoinsDBOnDisk`, happens when snapshot validatiion fails, and there is an error trying to remove the snapshot chainstate.

  In both cases the node is being forced to shut down, so it makes sense for these functions to raise errors so callers can know that an error happened without having to infer it from the shutdown state.

  Noticed these cases while reviewing bitcoin#27861, which replaces the `AbortNode` function with a `FatalError` function.

ACKs for top commit:
  achow101:
    ACK 1c7d08b
  TheCharlatan:
    ACK 1c7d08b
  jamesob:
    ACK 1c7d08b ([`jamesob/ackr/27862.1.ryanofsky.validation_stricter_assu`](https://github.com/jamesob/bitcoin/tree/ackr/27862.1.ryanofsky.validation_stricter_assu))

Tree-SHA512: fb1dcde3fa0e77b4ba0c48507d289552b939c2866781579c8e994edc209abc3cd29cf81c89380057199323a8eec484956abb1fd3a43c957ecd0e7f7bbfd63fd8
ryanofsky added a commit that referenced this pull request Sep 29, 2023
d8041d4 blockstorage: Return on fatal undo file flush error (TheCharlatan)
f0207e0 blockstorage: Return on fatal block file flush error (TheCharlatan)
5671c15 blockstorage: Mark FindBlockPos as nodiscard (TheCharlatan)

Pull request description:

  The goal of this PR is to establish that fatal blockstorage flush errors should be treated as errors at their call site.

  Prior to this patch `FlushBlockFile` may have failed without returning in `Chainstate::FlushStateToDisk`, leading to a potential write from `WriteBlockIndexDB` that may refer to a block that is not fully flushed to disk yet. By returning if either `FlushUndoFile` or `FlushBlockFile` fail, we ensure that no further write operations take place that may lead to an inconsistent database when crashing. Add `[[nodiscard]]` annotations to them such that they are not ignored in future.

  Functions that call either `FlushUndoFile` or `FlushBlockFile`, need to handle these extra abort cases properly. Since `Chainstate::FlushStateToDisk` already produces an abort error in case of `WriteBlockIndexDB` failing, no extra logic for functions calling `Chainstate::FlushStateToDisk` is required.

  Besides `Chainstate::FlushStateToDisk`, `FlushBlockFile` is also called by `FindBlockPos`, while `FlushUndoFile` is only called by `FlushBlockFile` and `WriteUndoDataForBlock`. For both these cases, the flush error is not further bubbled up. Instead, the error is logged and a comment is provided why bubbling up an error would be less desirable in these cases.

  ---

  This pull request is part of a larger effort towards improving the shutdown / abort / fatal error handling in validation code. It is a first step towards implementing proper fatal error return type enforcement similar as proposed by theuni in this pull request [comment](#27711 (comment)). For ease of review of these critical changes, a first step would be checking that `AbortNode` leads to early and error-conveying returns at its call site. Further work for enforcing returns when `AbortNode` is called is done in #27862.

ACKs for top commit:
  stickies-v:
    re-ACK d8041d4
  ryanofsky:
    Code review ACK d8041d4

Tree-SHA512: 47ade9b873b15e567c8f60ca538d5a0daf32163e1031be3212a3a45eb492b866664b225f2787c9e40f3e0c089140157d8fd1039abc00c7bdfeec1b52ecd7e219
Frank-GER pushed a commit to syscoin/syscoin that referenced this pull request Oct 5, 2023
d8041d4 blockstorage: Return on fatal undo file flush error (TheCharlatan)
f0207e0 blockstorage: Return on fatal block file flush error (TheCharlatan)
5671c15 blockstorage: Mark FindBlockPos as nodiscard (TheCharlatan)

Pull request description:

  The goal of this PR is to establish that fatal blockstorage flush errors should be treated as errors at their call site.

  Prior to this patch `FlushBlockFile` may have failed without returning in `Chainstate::FlushStateToDisk`, leading to a potential write from `WriteBlockIndexDB` that may refer to a block that is not fully flushed to disk yet. By returning if either `FlushUndoFile` or `FlushBlockFile` fail, we ensure that no further write operations take place that may lead to an inconsistent database when crashing. Add `[[nodiscard]]` annotations to them such that they are not ignored in future.

  Functions that call either `FlushUndoFile` or `FlushBlockFile`, need to handle these extra abort cases properly. Since `Chainstate::FlushStateToDisk` already produces an abort error in case of `WriteBlockIndexDB` failing, no extra logic for functions calling `Chainstate::FlushStateToDisk` is required.

  Besides `Chainstate::FlushStateToDisk`, `FlushBlockFile` is also called by `FindBlockPos`, while `FlushUndoFile` is only called by `FlushBlockFile` and `WriteUndoDataForBlock`. For both these cases, the flush error is not further bubbled up. Instead, the error is logged and a comment is provided why bubbling up an error would be less desirable in these cases.

  ---

  This pull request is part of a larger effort towards improving the shutdown / abort / fatal error handling in validation code. It is a first step towards implementing proper fatal error return type enforcement similar as proposed by theuni in this pull request [comment](bitcoin#27711 (comment)). For ease of review of these critical changes, a first step would be checking that `AbortNode` leads to early and error-conveying returns at its call site. Further work for enforcing returns when `AbortNode` is called is done in bitcoin#27862.

ACKs for top commit:
  stickies-v:
    re-ACK d8041d4
  ryanofsky:
    Code review ACK d8041d4

Tree-SHA512: 47ade9b873b15e567c8f60ca538d5a0daf32163e1031be3212a3a45eb492b866664b225f2787c9e40f3e0c089140157d8fd1039abc00c7bdfeec1b52ecd7e219
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Apr 19, 2024
Summary:
> validation: Stricter assumeutxo error handling in LoadChainstate
>
> Make LoadChainstate return an explicit error when snapshot validation succeeds,
> but there is an error trying to replace the background chainstate with the
> snapshot chainstate. Previously in this case LoadChainstate would trigger a
> shutdown and return INTERRUPTED, now it will return an actual error code.
>
> There's no real change to behavior other than error message being formatted a
> little differently.
>
> Motivation for this change is to replace error handling via callbacks with
> error handling via return value ahead of
> bitcoin/bitcoin#27861

bitcoin/bitcoin@1c7d08b

> validation: Stricter assumeutxo error handling in InvalidateCoinsDBOnDisk
>
> Currently InvalidateCoinsDBOnDisk is calling AbortNode without an error to the
> caller if it fails. Change it to return just return util::Result, and update
> the caller to handle the error itself.
>
> This causes the secondary error to be shown below the main error instead of the
> other way around.

bitcoin/bitcoin@1c7d08b

This is a backport of [[bitcoin/bitcoin#27862 | core#27862]]

Test Plan:
The only change in behavior is the order of the error messages in logs, but there are currently no tests for these error messages, and there is currently no easy way to test this.
Just check that the code still compiles and does not break existing tested behavior.
`ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D15624
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done or Closed or Rethinking
Development

Successfully merging this pull request may close these issues.

None yet

7 participants