Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reverse cs_main, cs_wallet lock order and reduce cs_main locking #16426

Open
wants to merge 13 commits into
base: master
from

Conversation

@ariard
Copy link
Contributor

ariard commented Jul 20, 2019

Follow-up in the set of Chain interface refactoring, it builds on top of #15713 and #15931 to finally remove Chain::Lock interface.

Chain::Lock reason was mostly a temporary interface to help separation of Wallet from Node (#10973) without modifying behavior of legacy wallet code relying on chain state. It was a relief of review burden in the aforementioned separation PR, at the cost of keeping synchronicity between Wallet and Node execution.

Caching more state in CWallet (#15931), let us remove the more used calls like getBlockDepth and move toward a more asynchronous interface. The current PR 1) move all remaining Chain::Lock methods to simple one and ensure cs_main lock is taken inside the interface 2) remove locking of cs_main inside any wallet code, which is another asynchronicity gain.
Lock order has to be swapped atomically at once to avoid potential deadlock issues raised by thread sync verification stuff.

Some future works which would enhance Chain interface:

  • move ScanForWalletTransactions and most we can of rescanning logic on the node side, would let us remove directly methods like guessVerificationProgress or getBlockHeight, meanwhile relieving of the the hardship of multiple simultaneous rescanning processing
  • split listsinceblock into a wallet part and node one, #15931 caching block height in tx, I think that's possible
  • move fee estimation queries to a notification interface and let the wallet cache it, fee estimation being driven by node mempool, wallet should be the consumer
  • obviously remove the RPC specific methods by letting the wallet processing RPC requests without going through the node, need to add --server and --rpcserver options

Prior PRs should be reviewed first before this one. Then to review the present PR, most of getting right the move is ensuring any LockAssertion in Chain::Lock method is amended as a LOCK(cs_main). And in final commit, check that any wallet code which was previously locking the chain is now calling a method, enforcing the lock taking job. So far the only exception I found is handleNotifications, which should be corrected.

@DrahtBot

This comment has been minimized.

Copy link
Contributor

DrahtBot commented Jul 20, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #17579 ([refactor] Merge getreceivedby tally into GetReceived function by andrewtoth)
  • #17564 (rpc: Use mempool from node context instead of global by MarcoFalke)
  • #17484 (wallet: add cached m_is_ibd to remove Chain::isInitialBlockDownload by ariard)
  • #17443 (Use Median Time Past to check finality of wallet transactions by ariard)
  • #17261 (Make ScriptPubKeyMan an actual interface and the wallet to have multiple by achow101)
  • #16963 (wallet: Fix unique_ptr usage in boost::signals2 by promag)
  • #16224 (gui: Bilingual GUI error messages by hebasto)
  • #15606 ([experimental] UTXO snapshots by jamesob)
  • #14707 ([RPC] Include coinbase transactions in receivedby RPCs by andrewtoth)
  • #9381 (Remove CWalletTx merging logic from AddToWallet by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@ryanofsky

This comment has been minimized.

Copy link
Contributor

ryanofsky commented Jul 20, 2019

Great work! I'd suggest changing the PR title to something like "Reverse cs_main, cs_wallet lock order and reduce cs_main locking" to motivate it better and describe the change.

The current title "Remove Chain::Lock interface" and starting sentence "Follow-up in the set of Chain interface refactoring" make this sound mostly like a refactoring. But this is more of a locking change, and a change to make the wallet more asynchronous and event driven than a refactoring change.

@ariard ariard changed the title Remove Chain::Lock interface Reverse cs_main, cs_wallet lock order and reduce cs_main locking Jul 20, 2019
@practicalswift

This comment has been minimized.

Copy link
Member

practicalswift commented Jul 22, 2019

Concept ACK

Excellent work!

@MarcoFalke

This comment has been minimized.

Copy link
Member

MarcoFalke commented Jul 22, 2019

Concept ACK. This might help IBD, because cs_main had to be acquired due to the lock order requirement. If it doesn't help IBD, the change will hopefully still speed up the msghand thread because the wallets take the main lock less.

@jnewbery

This comment has been minimized.

Copy link
Member

jnewbery commented Jul 22, 2019

Big concept ACK! Thanks @ariard

@promag

This comment has been minimized.

Copy link
Member

promag commented Jul 25, 2019

Concept ACK, mother of god, not locking cs_main!

{
// Quick answer in most cases
if (!locked_chain.checkFinalTx(*tx)) {
if (!pwallet->chain().checkFinalTx(*tx)) {

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Nov 5, 2019

Contributor

Instead of adding an interfaces::Chain::checkFinalTx method, it might be better to call IsFinalTx directly with the wallet's height and time, avoiding going through CheckFinalTx.

This could perform better since it wouldn't require locking cs_main, and could make wallet calls return more internally consistent information when the wallet is catching up to the chain.

@ariard ariard force-pushed the ariard:2019-07-remove-more-locking-chain branch from a052c37 to f057aed Nov 11, 2019
@DrahtBot DrahtBot removed the Needs rebase label Nov 11, 2019
@ariard

This comment has been minimized.

Copy link
Contributor Author

ariard commented Nov 11, 2019

Finally rebased after merge of #15931, PR should be ready for review.

Apart of 3efe38d which use the new m_last_block_processed_height to avoid querying the chainstate and introduce some modifications, other commits are pretty straight-forward. It's just taking cs_main inside the Chain implementation instead of using Chain::lock. Lock order is reversed only in last commit f057aed to avoid any deadlock issue.

If you still feel PR is hard to review, I can subsplit easily in another PR.

@ryanofsky

This comment has been minimized.

Copy link
Contributor

ryanofsky commented Nov 11, 2019

Approach ACK. Code changes here are very simple after #15931.

All the changes before the last commit f057aed seem straightforward and don't change behavior other than locking cs_main in new places recursively, so all the new locks are no-ops.

Only the last commit f057aed is the big scary change that removes cs_main locks all over the wallet, leaving us to hope that remaining locking is sufficient and that stretches of wallet code that used to run under cs_main aren't making assumptions about the tip not changing and other wallet threads not running.


I'll review this PR more closely this week. It might be nice to simplify the PR description now that #15931 is merged. I think the description just basically needs to say that interfaces::Chain::Lock is a wrapper around around cs_main, and that this PR changes wallet code to not lock and unlock cs_main directly anymore, and not keep cs_main locked while it locks cs_wallet and does other work. Instead, after this PR, wallet code only locks cs_main indirectly and intermittently when it needs to look up bits of chain information, and never holds onto cs_main while it does other work.

@jnewbery

This comment has been minimized.

Copy link
Member

jnewbery commented Nov 11, 2019

Restating my concept ACK. I plan to review this fully soon.

Thanks for rebasing this so quickly

@ariard ariard force-pushed the ariard:2019-07-remove-more-locking-chain branch from f057aed to 8aea39a Nov 11, 2019
@ariard

This comment has been minimized.

Copy link
Contributor Author

ariard commented Nov 11, 2019

Only the last commit f057aed is the big scary change that removes cs_main locks all over the wallet, leaving us to hope that remaining locking is sufficient and that stretches of wallet code that used to run under cs_main aren't making assumptions about the tip not changing and other wallet threads not running.

Most of the code making assumptions about the tip is confined in the rescan logic, so I think this one should get the focus, you can grep for all methods fetching tip like getHeight, getBlockHeight getX and check no decision is made on return value of 2 different calls of them. Concerning other wallet threads, it should be cover by wallet lock in itself.

@ariard ariard force-pushed the ariard:2019-07-remove-more-locking-chain branch 5 times, most recently from 7ef1044 to 0bdbc43 Nov 12, 2019
@ariard ariard force-pushed the ariard:2019-07-remove-more-locking-chain branch from 0bdbc43 to f59c5c2 Nov 21, 2019
@meshcollider

This comment has been minimized.

Copy link
Member

meshcollider commented Nov 22, 2019

Concept ACK

@@ -565,7 +565,7 @@ UniValue importwallet(const JSONRPCRequest& request)
if (!file.is_open()) {
throw JSONRPCError(RPC_INVALID_PARAMETER, "Cannot open wallet dump file");
}
Optional<int> tip_height = locked_chain->getHeight();
Optional<int> tip_height = pwallet->chain().getHeight();

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Dec 2, 2019

Contributor

Why use chain height instead of wallet height here and other places? Anywhere the wallet is locked and the chain isn't locked it would seem safer to use the wallet height.

This comment has been minimized.

Copy link
@ariard

ariard Dec 3, 2019

Author Contributor

I think I can switch for the one in CreateTransaction, all left are tied to the rescan logic which doesn't call BlockConnected and so doesn't update m_last_block_processed accurately. I can rework a bit the rescan logic to make it work on m_last_block_processed_height but I felt it was a bit of scope and an increased burden for reviewers.

As you know I plan to rework the rescan logic after this PR. At term it would remove all getHeight callsites`.

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Dec 3, 2019

Contributor

Thanks, chain height seems fine for rescan logic, and is what I'd expect there. I think the other places should use wallet height if there isn't an explicit reason not to.

This comment has been minimized.

Copy link
@ariard

ariard Dec 3, 2019

Author Contributor

Corrected on 24f40fc, there was only one occurrence in CreateTransaction. All others are tied to rescan logic, including ones in rpcdump/rpcwallet.

@practicalswift

This comment has been minimized.

Copy link
Member

practicalswift commented Dec 4, 2019

Would be interesting to see the impact (if any) on IBD :)

@ariard

This comment has been minimized.

Copy link
Contributor Author

ariard commented Dec 4, 2019

@jonatack that's a compiler warning and --enable-debug is about turning on our deadlock detection with conditional compilation they are not related.

@jkczyz

This comment has been minimized.

Copy link
Contributor

jkczyz commented Dec 4, 2019

I'm getting the warning at df508be on MacOS using clang as before.

@ariard ariard force-pushed the ariard:2019-07-remove-more-locking-chain branch from df508be to df17e36 Dec 4, 2019
ariard added 13 commits Jul 10, 2019
Instead of calling getHeight, we rely on CWallet::m_last_block
processed_height where it can.

Apart of tests, we avoid adding new locks and using
m_last_block_processed_height in rescan logic to avoid to
increase burden review.
Add HaveChain to assert chain acces for wallet-tool in LoadToWallet.
Refactor rescanblockchain rpc to avoid unitialization warning
Lock cs_main in handleNotifications as it was relaying on its caller
to do so before.

Remove LockChain method from CWallet.

Finally remove Chain::Lock interface.
@ariard

This comment has been minimized.

Copy link
Contributor Author

ariard commented Dec 4, 2019

@jkczyz at 3773b41, add a AssertLockHeld in CreateTransaction after taking cs_wallet, it should avoid clang false positive warning according to documentation "utility class for indicating to compiler thread analysis that a mutex is locked (when it couldn't be determined otherwise)"

@ariard ariard force-pushed the ariard:2019-07-remove-more-locking-chain branch from df17e36 to 3773b41 Dec 4, 2019
@jkczyz

This comment has been minimized.

Copy link
Contributor

jkczyz commented Dec 4, 2019

@jkczyz at 3773b41, add a AssertLockHeld in CreateTransaction after taking cs_wallet, it should avoid clang false positive warning according to documentation "utility class for indicating to compiler thread analysis that a mutex is locked (when it couldn't be determined otherwise)"

Looks like I had a problem on my end. I don't get the warning with df508be after all. Sorry about the confusion!

@jonatack

This comment has been minimized.

Copy link
Member

jonatack commented Dec 4, 2019

@jonatack that's a compiler warning and --enable-debug is about turning on our deadlock detection with conditional compilation they are not related.

Thanks. Agreed. Verified locally: warning present on 24f40fc with Debian+Clang without passing --enable-debug.

  CC            = /usr/bin/ccache clang
  CFLAGS        = -g -O2
  CPPFLAGS      =   -U_FORTIFY_SOURCE  -D_FORTIFY_SOURCE=2  -Qunused-arguments
                    -DHAVE_BUILD_INFO  -D__STDC_FORMAT_MACROS
  CXX           = /usr/bin/ccache clang++ -std=c++11
  CXXFLAGS      =   -Wstack-protector -fstack-protector-all  -Wall -Wextra -Wformat
                    -Wvla -Wswitch -Wformat-security -Wthread-safety-analysis
                    -Wrange-loop-analysis -Wredundant-decls  -Wno-unused-parameter
                    -Wno-self-assign -Wno-unused-local-typedef -Wno-deprecated-register
                    -Wno-implicit-fallthrough   -g -O2
@jonatack

This comment has been minimized.

Copy link
Member

jonatack commented Dec 4, 2019

@ariard what changed from df508be to df17e36 to 3773b41?

@ariard

This comment has been minimized.

Copy link
Contributor Author

ariard commented Dec 4, 2019

@jonatack Adding the AssertLockHeld to be sure to not raise false positive warning from the compiler.

@jonatack

This comment has been minimized.

Copy link
Member

jonatack commented Dec 5, 2019

@ariard 3773b41 builds without warnings or errors. Where exactly did you add the AssertLockHeld? I don't see it in CreateTransaction, nor in the diff of the last commit, and I'm not getting git range-diff to help me. Help a dumb reviewer out :D

@laanwj laanwj added this to Blockers in High-priority for review Dec 5, 2019
@ariard

This comment has been minimized.

Copy link
Contributor Author

ariard commented Dec 5, 2019

@jonatack src/wallet/wallet.cpp L2553 on head commit. But you should really recompile latest git to get git range-diff it changes your life!

@instagibbs

This comment has been minimized.

Copy link
Member

instagibbs commented Dec 5, 2019

could you update the OP? It's slightly outdated due to dependent PRs being merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
You can’t perform that action at this time.