wallet: Fix wallet loading race during node start #19876
Thanks for looking at this, will review.
7f34859 to dc729f8
We currently track what wallets are currently loading (see |
Just needs a call to syncwithvalidationinterfacequeue before getbalance?
Yes, this fixes the test. Initially, I was hoping I could fix the test and the underlying race issue in one PR, but since the race fix is a bit more complicated, I think it makes sense to split it from the test fix so the CI failures stop as fast as possible. I opened it in #19887 and made you co-author.
Is this really a bug? The wallet is just "catching up" (?)
Yes, it's catching up but while it's doing that it responds with a false balance to a |
The balance is fine I think, the wallet isn't aware of the mempool yet, right? So the wallet itself is loaded, but not synced.
As a user, I would expect that the wallet would report my own change output in the balance even if the mempool is still syncing. But I will have to review that code and maybe I am off on that. The many reports in the CI failure issue in the span of 2 days suggest that this is hit pretty frequently on startup, and I think we should minimize the potential of reporting unexpected balances to users. But if others agree that this is not a big deal, I will close it.
56b018c test: Fix flaky wallet_basic test (Fabian Jahr) Pull request description: Fixes bitcoin#19853 I investigated the issue in bitcoin#19876 and I still intend to fix the underlying issue of a race when using wallet RPCs right after starting a node in that PR. However, since that is a bit more complicated than I initially thought it makes sense to merge the fix of the test so the intermittent test failures stop. This fix in the test is going to be needed, either way, bitcoin#19876 will only provide an error where before it was reporting a false balance. Top commit has no ACKs. Tree-SHA512: 52bb2388a3e77aa20d26ab0fd45796bc1781483b1cffe49cbb44e2488a72e76998edfb1198495373f9c6fd2ec26064d4176bd1a64dd59806622d5e50a4f4e870
When this bug is fixed, it would be good to remove the temporary workarounds.
This is used to prevent race conditions when using wallet RPCs right after starting a node. A wallet that is not synced with the mempool could temporarily report false balances because its own transactions with change outputs do not appear to be in the mempool and are therefore not trusted.
I think this is ready for review now. It got harder to reproduce the issue with my script for some reason I cannot explain. But through detailed logging I am certain that the issue is that We only want to ensure that the mempool sync runs once on startup. A flag indicating that it has in fact run still seems a reasonable solution after thinking about it a bit more, but I am happy to consider other approaches. Otherwise, I have improved the implementation a lot by no longer failing when the issue is encountered, and I also gave the flag a more reasonable name. I have combined this with a refactor which let me remove a lot of repetitive code. I have also removed the temporary fixes for the tests which used |
Why not ensure wallet is ready on load/startup? No changes would be needed on RPC code.
public:
    /*
     * Main wallet lock.
     * This lock protects all the fields added by CWallet.
     */
    mutable RecursiveMutex cs_wallet;

    /** Indicated the wallet was initially synced with the mempool. */
    void setMempoolSynced() { m_mempool_synced = true; };
Should be private?
    pwallet->BlockUntilSyncedToCurrentChain();

    // Ensure that the mempool has been synced at least once on startup
    if (!pwallet->mempoolSynced()) pwallet->syncMempool();
Just call pwallet->syncMempool() and early return there - no need to expose mempool synced flag.
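The suggestion above could look roughly like the following. This is a minimal sketch, not the PR's code: `IdempotentSyncWallet`, `m_mutex`, `m_sync_runs` and `syncRuns()` are hypothetical names used only for illustration, and the counter increment stands in for the actual mempool scan.

```cpp
#include <mutex>

// Hypothetical stand-in for CWallet; not the real Bitcoin Core class.
class IdempotentSyncWallet
{
public:
    // Callers invoke this unconditionally. Only the first call does any work,
    // so the "already synced" flag never has to be exposed.
    void syncMempool()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_mempool_synced) return; // early return on every later call
        ++m_sync_runs;                // placeholder for the actual mempool sync
        m_mempool_synced = true;
    }

    // Illustration only: how many times the sync body actually ran.
    int syncRuns() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_sync_runs;
    }

private:
    mutable std::mutex m_mutex;
    bool m_mempool_synced{false}; // private: no getter or setter needed
    int m_sync_runs{0};
};
```

With this shape the RPC side reduces to a plain `pwallet->syncMempool();` call, and the check lives in one place.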
@@ -725,13 +725,22 @@ class CWallet final : public WalletStorage, public interfaces::Chain::Notificati

    bool CreateTransactionInternal(const std::vector<CRecipient>& vecSend, CTransactionRef& tx, CAmount& nFeeRet, int& nChangePosInOut, bilingual_str& error, const CCoinControl& coin_control, bool sign);

    /** Wallet was initially synced with the mempool. */
    bool m_mempool_synced{false};
Make it atomic? Currently accessed in multiple threads.
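One way to read this suggestion is sketched below, assuming the flag is replaced by a `std::atomic<bool>`. `MempoolSyncFlag`, `tryClaim()` and `isSet()` are hypothetical names, not part of the PR. The compare-and-exchange makes sure exactly one caller wins the right to run the sync, even if several RPC threads arrive at the same time.

```cpp
#include <atomic>

// Hypothetical helper; the PR itself uses a plain bool member.
struct MempoolSyncFlag {
    // Returns true for exactly one caller, who is then responsible for
    // running the sync. Every later (or concurrent) caller gets false.
    bool tryClaim()
    {
        bool expected = false;
        return m_synced.compare_exchange_strong(expected, true);
    }

    // Data-race-free read from any thread.
    bool isSet() const { return m_synced.load(); }

private:
    std::atomic<bool> m_synced{false};
};
```

Note that this only guarantees a single run: a caller that loses the race sees the flag set before the winner has finished syncing, so code that must wait for completion would still need a lock or a second "done" flag.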
@@ -77,6 +77,20 @@ static bool ParseIncludeWatchonly(const UniValue& include_watchonly, const CWall
 }

static CWallet* GetReadyWallet(std::shared_ptr<CWallet> const wallet) {
NAK the dereferencing because the wallet can be destroyed after the call, like
CWallet* const pwallet = GetReadyWallet(GetWalletForJSONRPCRequest(request));
Also it's not clear the function does the dereference.
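The lifetime concern can be illustrated in isolation. The sketch below is an assumption-laden toy, not Bitcoin Core code: `WalletSketch` and both helper functions are hypothetical, and the temporary `shared_ptr` is made the sole owner, which is the worst case (in the real node other owners may keep the wallet alive unless it is unloaded concurrently). A `std::weak_ptr` is used to observe whether the object is still alive without dereferencing a dangling pointer.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for CWallet.
struct WalletSketch {
    int balance{42};
};

// Same shape as the criticized helper: takes ownership by value and hands
// back a raw pointer, so the caller keeps no owner.
WalletSketch* GetRawFromTemporary(std::shared_ptr<WalletSketch> wallet)
{
    return wallet.get();
}

// Reports whether the wallet is still alive after the call above when the
// argument was the only owner.
bool WalletSurvivesTemporary()
{
    auto owner = std::make_shared<WalletSketch>();
    std::weak_ptr<WalletSketch> observer = owner;
    WalletSketch* raw = GetRawFromTemporary(std::move(owner));
    (void)raw; // dangling at this point; must not be dereferenced
    return !observer.expired();
}

// Safer shape: the caller holds the shared_ptr for as long as it uses the
// wallet and only borrows a reference from it.
bool WalletSurvivesWithOwner()
{
    std::shared_ptr<WalletSketch> owner = std::make_shared<WalletSketch>();
    std::weak_ptr<WalletSketch> observer = owner;
    WalletSketch& wallet = *owner;
    return !observer.expired() && wallet.balance == 42;
}
```

Keeping the `shared_ptr` in a named local at the call site also makes the dereference visible to the reader, which addresses the second point of the comment.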
Maybe the following signature might be better?
static CWallet& GetReadyWallet(CWallet& wallet)
{
🐙 This pull request conflicts with the target branch and needs rebase.
Summary:
> I investigated the issue in [[bitcoin/bitcoin#19876 | core#19876]] and I still intend to fix the underlying issue of a race when using wallet RPCs right after starting a node in that PR. However, since that is a bit more complicated than I initially thought it makes sense to merge the fix of the test so the intermittent test failures stop. This fix in the test is going to be needed, either way, #19876 will only provide an error where before it was reporting a false balance

Co-authored-by: João Barbosa <joao.paulo.barbosa@gmail.com>

This is a backport of [[bitcoin/bitcoin#19887 | core#19887]]

Test Plan: `ninja check-functional`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D10183
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
Closing this as it has not had any activity in a while. If you are interested in continuing work on this, please leave a comment so that it can be reopened.
Intended to fix #19853.

There appears to be a race condition when using the wallet RPCs right after starting a node and that causes the tests to be flaky: in wallet_basic.py a node is restarted and getbalance gets called right after. getbalance then sometimes reports a balance of 0 because the wallet is not loaded yet. It looks like this only appeared after the zapwallettxs tests were removed from this test but it should not directly be caused by the removal, it rather hides the issue less often. I wouldn't be surprised if this is a known issue that has popped up in a different context already but I didn't find anything on it yet.

My naive solution is to track a ready state for CWallet but I am very much looking for concept feedback on that. Returning Null seems to be the default behavior for wallet RPCs in early failure cases like this. An explicit error might be more helpful for users to figure out what happened but I am opting for consistency for now. Either option is much better than reporting a balance of 0 that is incorrect but that could be real and could cause alarm systems to go off because all of a sudden a wallet balance is 0 after a node restarts.

Whether this ends up being the final solution or not, after the conceptual discussion I think it will make sense to add this check to other wallet RPCs and I will also check if there are other open flaky wallet test issues that might be caused by the same bug.

To reproduce the issue see the commented out block of code that I left in the test file. This reproduces the issue on ~4 out of 5 tries on my machine.

EDIT: The actual error that appears in the CI is hit here where node0_balance is 0 and then sendtoaddress is called with a negative amount.
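The difference between "no answer yet" and "a balance of 0" described above can be made concrete with a small sketch. This is not the PR's implementation: `ReadyStateWallet`, `markReady()` and the use of `std::optional` (C++17) as a stand-in for a Null RPC result are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a wallet that tracks a ready state.
class ReadyStateWallet
{
public:
    // Called once loading has finished and the real balance is known.
    void markReady(int64_t balance)
    {
        m_balance = balance;
        m_ready = true;
    }

    // An empty optional models a Null response: the caller can tell
    // "not ready" apart from a genuine balance of 0.
    std::optional<int64_t> getBalance() const
    {
        if (!m_ready) return std::nullopt;
        return m_balance;
    }

private:
    bool m_ready{false};
    int64_t m_balance{0};
};
```

A monitoring script that receives an empty result can retry, whereas a 0 that merely reflects an unfinished load is indistinguishable from an emptied wallet.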