Call wallet notify callbacks in scheduler thread (without cs_main) #10286

Contributor

TheBlueMatt commented Apr 27, 2017

Based on #10179, this effectively reverts #9583, regaining most of the original speedups of #7946.

This concludes the work of #9725, #10178, and #10179.

See individual commit messages for more information.

@ryanofsky

utACK 6fb5719.

Code changes mostly seem great, though as you can tell from my comments I have a somewhat hazy understanding of the semantics and assumptions being made. A little more documentation would make everything clear, I think.

src/txmempool.h
@@ -511,6 +511,9 @@ class CTxMemPool
// to track size/count of descendant transactions. First version of
// addUnchecked can be used to have it call CalculateMemPoolAncestors(), and
// then invoke the second version.
+ // Note that addUnchecked is ONLY called from ATMP during normal operation,
@ryanofsky

ryanofsky May 2, 2017

Contributor

In commit "Add a CValidationInterface::TransactionRemovedFromMempool"

Unclear to me what a normal operation is. The comment might be clearer if it mentioned a counterexample of non-normal operation.

src/txmempool.h
@@ -511,6 +511,9 @@ class CTxMemPool
// to track size/count of descendant transactions. First version of
// addUnchecked can be used to have it call CalculateMemPoolAncestors(), and
// then invoke the second version.
+ // Note that addUnchecked is ONLY called from ATMP during normal operation,
+ // and any other callers may break wallet's in-mempool tracking (due to
+ // lack of CValidationInterface::TransactionAddedToMempool callbacks).
@ryanofsky

ryanofsky May 2, 2017

Contributor

In commit "Add a CValidationInterface::TransactionRemovedFromMempool"

What does this imply? Just that if there are any new calls to addUnchecked, the caller also needs to signal TransactionAddedToMempool not to break the wallet? Would say this in the comment explicitly if this is the case.

@TheBlueMatt

TheBlueMatt May 3, 2017

Contributor

Updated the comment to mention that addUnchecked is only called from ATMP outside of tests period. I think the implication is that we need to fix the strong-coupling here.

src/wallet/wallet.cpp
+ if (this->lastBlockProcessed == chainActive.Tip()) {
+ return true;
+ }
+ // If the user called invalidatechain some things might block
@ryanofsky

ryanofsky May 2, 2017

Contributor

In commit "Add CWallet::BlockUntilSyncedToCurrentChain()"

Does "some things might block forever" just mean this wait might block forever? If so, maybe be more concrete and say something like "lastBlockProcessed will not be rewound back to chainActive.Tip()." Otherwise it would be good to clarify what some things is referring to.

src/wallet/wallet.h
+ * Blocks until the wallet state is up-to-date to /at least/ the current
+ * chain at the time this function is entered
+ * Obviously holding cs_main/cs_wallet when going into this call may cause
+ * deadlock
@ryanofsky

ryanofsky May 2, 2017

Contributor

In commit "Add CWallet::BlockUntilSyncedToCurrentChain()"

Stray tab here

src/wallet/rpcwallet.cpp
@@ -2648,6 +2712,10 @@ UniValue fundrawtransaction(const JSONRPCRequest& request)
RPCTypeCheck(request.params, boost::assign::list_of(UniValue::VSTR));
+ // Make sure the results are valid at least up to the most recent block
@ryanofsky

ryanofsky May 2, 2017

Contributor

In commit "Add calls to CWallet::BlockUntilSyncedToCurrentChain() in RPCs"

Can you give an example of a specific bug that could occur without these BlockUntilSynced calls and is prevented by adding them? I looked at some of the old issues (#9584, #9148, etc), but they're confusing and I don't know how much of the information is up to date.

It would be great if BlockUntilSyncedToCurrentChain had a comment that made it clearer when it does and doesn't need to be called, and what consistency issues it is and isn't supposed to solve.

Maybe there should also be a bullet point in the new RPC interface guidelines about what kind of consistency wallet RPCs are expected to have.

test/functional/zmq_test.py
+ if topic == b"hashblock":
+ blkhash = bytes_to_hex_str(body)
+ else:
+ assert_equal(topic, b"hashtx")
msg = self.zmqSubSocket.recv_multipart()
topic = msg[0]
@ryanofsky

ryanofsky May 2, 2017

Contributor

In commit "Fix zmq tests now that txn/blocks are unordered"

Maybe assert msg[0] != topic above this line to confirm we actually receive distinct hashtx and hashblock messages (not two hashblocks).

Contributor

TheBlueMatt commented May 3, 2017

Rebased and fixed @ryanofsky's mostly-comment nits :).

@mchrostowski

mchrostowski suggested changes May 3, 2017 edited

Overall appears to be on a good track. It looks to me like the global lock (cs_main) is causing some serious confusion/issues and there is some general mistake in the pattern of locks or their encapsulation that makes this all difficult.

I reviewed everything pretty closely aside from the ZMQ test changes, that went over my head.

Please don't overlook the outdated validationinterface.cpp comments, those took some time to put together.

src/qt/test/rpcnestedtests.cpp
@@ -12,6 +12,7 @@
#include "rpc/server.h"
@mchrostowski

mchrostowski May 3, 2017

These changes do not appear to be related to the rest. Am I missing something or should this be in its own PR?
I believe @sipa made a similar comment on #10179

@TheBlueMatt

TheBlueMatt May 4, 2017

Contributor

Without them test_bitcoin-qt segfaults.

src/validationinterface.cpp
@@ -104,7 +104,9 @@ void UnregisterAllValidationInterfaces() {
void CMainSignals::MempoolEntryRemoved(CTransactionRef ptx, MemPoolRemovalReason reason) {
if (reason != MemPoolRemovalReason::BLOCK && reason != MemPoolRemovalReason::CONFLICT) {
- internals->TransactionRemovedFromMempool(ptx);
+ internals->schedulerClient.AddToProcessQueue([ptx, this] {
@mchrostowski

mchrostowski May 3, 2017

This is a fine solution. All these lock inversion concerns make me wonder if there isn't a more serious issue regarding lack of proper encapsulation with some of these locks. I'm sure global locks (cs_main) don't help either, can't imagine actually needing a global lock.

@TheBlueMatt

TheBlueMatt May 4, 2017

Contributor

This is being run in a background thread, so there are no possible lock inversions? We've had great success with DEBUG_LOCKORDER and haven't had any serious deadlock issues, AFAIR, since the 0.3.X era.

src/validationinterface.cpp
static CMainSignals g_signals;
+CMainSignals::CMainSignals() {
@mchrostowski

mchrostowski May 3, 2017

This would be safer/faster/cleaner with : internals(new CMainSignalsInstance()) {} instead of the body.

Initializer lists guarantee proper cross-thread visibility, otherwise you might init twice and have sharing issues.

@TheBlueMatt

TheBlueMatt May 4, 2017

Contributor

Does that compile? CMainSignalsInstance() is not defined at that time, only declared.

@mchrostowski

mchrostowski May 4, 2017

Since it is only a declaration I believe it should have been fine. I see this code isn't present in the final commit so I suppose it doesn't matter either way.

@TheBlueMatt

TheBlueMatt May 4, 2017

Contributor

Ahh, yes, since the rebase on the latest version of #10179 this code no longer exists, as the scheduler has to be passed into the creation of the internals object.

src/validationinterface.cpp
+ void MaybeScheduleProcessQueue() {
+ {
+ LOCK(cs_callbacksPending);
+ // Try to avoid scheduling too many copies here, but if we
@mchrostowski

mchrostowski May 3, 2017

This comment and issue can be avoided entirely if you move line 56 up to 46.
After that lines 54 and 55 (which will be 55 and 56) can be removed.

@ryanofsky

ryanofsky May 4, 2017

Contributor

This comment and issue can be avoided entirely if you move line 56 up to 46.
After that lines 54 and 55 (which will be 55 and 56) can be removed.

I think this is right (line numbers apply to commit 8daf243). It also seems like you could eliminate the fCallbacksRunning variable if you change ProcessQueue to call pop_front after running the callback and condition the AddToProcessQueue schedule() call on the queue being previously empty.

@TheBlueMatt

TheBlueMatt May 4, 2017

Contributor

Looks like you commented on an outdated version and github wont show me full context, so I have no idea what those line numbers refer to :/

@mchrostowski

mchrostowski May 4, 2017

@ryanofsky I believe the point is to avoid duplicate calls to the scheduler since it may be multi-threaded. So a call to AddToProcessQueue should not schedule() anything if we're already scheduled; it should only schedule() if our previously scheduled function has completed execution (at least beyond the point of it calling schedule() again). Can't see implementing that without knowing if fCallbacksRunning is true.

Still, the code does not guarantee single threaded execution in its current form due to the fCallbacksRunning state being set AFTER the call to schedule(). Being that all the locks are placed appropriately we might as well make this guarantee or else it seems like a bug because the SingleThreadedClient won't be.

I'll see if I can't get a test showing this behavior.

@ryanofsky

ryanofsky May 4, 2017 edited

Contributor

@ryanofsky I believe the point is to avoid duplicate calls to the scheduler since it may be multi-threaded.

I know, this is why the second half of my suggestion was "condition the AddToProcessQueue schedule() call on the queue being previously empty." Anyway, I don't think Matt's particularly interested in these simplifications, and it's easier to communicate these changes as patches rather than english descriptions, so I'd rather just leave any simplifications to followup PRs.

Still, the code does not guarantee single threaded execution in its current form due to the fCallbacksRunning state being set AFTER the call to schedule()

The reason it works in its current form is because of the if (fCallbacksRunning) return; line at the top of ProcessQueue()

Again, I don't think the code in its current form is the simplest it could be, but it seems safe and easy to clean up later in a followup PR. Also this whole discussion really should be moved to #10179. #10286 is only building on the changes in #10179.

@mchrostowski

mchrostowski May 4, 2017 edited

@ryanofsky I see that now, the extra check does prevent the execution.

Being that this is new code I wouldn't call it a simplification. Here's a patch of the proposed change, less logic with the same function:
scheduler.patch.txt

which reads better if you rename fCallbacksRunning to fCallbacksScheduled
and this patch:
scheduler.patch2.txt
which can be argued reduces code reuse but I think the readability is improved.

@mchrostowski

mchrostowski May 4, 2017

@TheBlueMatt If you're open to these changes in a PR to your branch I can do that, I assume they'll be squashed so either way works.

@TheBlueMatt

TheBlueMatt May 4, 2017

Contributor

@mchrostowski hmm, really, I find that it decreases readability (though that may be NIH). It looks harder to reason about whether some callbacks might accidentally get missed to me.

(Other random note: we don't use tabs in our codebase, and your patch added some.)

@mchrostowski

mchrostowski May 4, 2017

@TheBlueMatt Well, in that case I feel like either patch gets funky, especially since the use of fCallbacksRunning becomes inconsistent if you apply the first patch without the second (unless some alternative name for fCallbacksRunning works).

The extra safety check and inconsistency of scheduling bothers me but I wouldn't expect it to actually cause issues so I have no grounds for objection.

I think my inquiry stemmed from it not being immediately apparent that ProcessQueue() only runs once and the extra check is just an extra check. Perhaps the "not a big deal" part of the comment could be "because ProcessQueue() already checks" for clarity, I would not have looked so deeply into the code except that I thought "not a big deal" meant "sometimes interweaving calls is okay."

src/validationinterface.cpp
+ callbacksPending.pop_front();
+ }
+
+ // RAII the setting of fCallbacksRunning and calling MaybeScheduleProcessQueue
@mchrostowski

mchrostowski May 3, 2017

RAII is great and all but exists for the acquisition of resources. Why not try{} catch{}?
try { callback(); } catch(...) { { LOCK(cs_callbacksPending); fCallbacksRunning = false; } MaybeScheduleProcessQueue(); }

@ryanofsky

ryanofsky May 4, 2017

Contributor

Why not try{} catch{}?

My guess about this was that it allows the processqueue to take advantage of whatever error handling or reporting cscheduler provides, and to not have to repeat the finalization logic both inside and after the catch clause. Either approach seems fine to me, though.

@TheBlueMatt

TheBlueMatt May 4, 2017

Contributor

@laanwj previously requested that any exceptions be thrown all the way up, so this was an easier way to do that. That request seemed reasonable.

@mchrostowski

mchrostowski May 4, 2017

Understood, makes perfect sense. Didn't occur to me we'd have to duplicate the logic, spoiled by finally.

src/validationinterface.cpp
+ LOCK(instance->cs_callbacksPending);
+ instance->fCallbacksRunning = false;
+ }
+ instance->MaybeScheduleProcessQueue();
@mchrostowski

mchrostowski May 3, 2017

Is this really what we want? A scheduler call for each callback?
It does prevent starving any other scheduled tasks in case of a long queue, but it also generates a lot of lock contention which can be a performance killer.

Unless there is evidence of this queue processing messing with other scheduling I feel strongly we should avoid this design. It will be much harder to detect performance issues from lock contention than performance issues from the processing of a long queue.

I recommend replacing line 45 with a size() query, decrementing the size (to account for your pop), and putting the entire thing in a do {} while (size > 0);. You can then avoid calling MaybeScheduleProcessQueue() inside of ProcessQueue() itself.

@ryanofsky

ryanofsky May 4, 2017

Contributor

Is this really what we want? A scheduler call for each callback?

I'm guessing the work done in any of these callbacks far outweighs the cost of scheduling one of them but I could be wrong.

@TheBlueMatt

TheBlueMatt May 4, 2017

Contributor

Right now we only have one scheduler thread, so there should be limited lock contention, not to mention anything running is gonna take much longer than any locks held (locks are only held to push/pop/whatever, which should be very, very fast).

@mchrostowski

mchrostowski May 4, 2017

Okay, sounds like blocking/starving is the biggest risk. Can't disagree with that.

Feels like a bit of wasted work to call schedule on things you know are intended to execute now, but that's just a performance (not important) concern.
A bit surprising there is only one scheduler thread when the scheduler counts threads, but indeed it is only one.

src/scheduler.h
@@ -43,6 +43,9 @@ class CScheduler
// Call func at/after time t
void schedule(Function f, boost::chrono::system_clock::time_point t);
@mchrostowski

mchrostowski May 3, 2017

You can accomplish this entire commit by changing this line to
void schedule(Function f, boost::chrono::system_clock::time_point t = boost::chrono::system_clock::now());

@TheBlueMatt

TheBlueMatt May 4, 2017

Contributor

Took this on #10179, will be here when I next rebase.

src/scheduler.h
+ * to be executed on the same thread, but no two jobs will be executed
+ * at the same time.
+ */
+class CSingleThreadedSchedulerClient {
@mchrostowski

mchrostowski May 3, 2017

This is no longer a scheduler. It has one public method, void AddToProcessQueue(std::function<void (void)> func);, which does not take any 'schedule' information.

This class is neat, more of a SingleThreadedExecutor that happens to use a scheduler to execute. Really it's treating the scheduler as a thread pool.

I'm all for keeping this if it's not named 'scheduler' and if a thread pool abstraction can be extracted from CScheduler then both this class and CScheduler can use that pool for execution. Also to consider, is this used anywhere else yet or is it expected to be used?

@TheBlueMatt

TheBlueMatt May 4, 2017

Contributor

It's unlikely to be used elsewhere, but as it is more intimate with the CScheduler than the validation interface, it was abstracted out and put here. We should probably tweak up how it all works in a later PR (as we move off of the big boost threadGroup in init), but for now I'll leave it.

src/wallet/wallet.h
+ /**
+ * Blocks until the wallet state is up-to-date to /at least/ the current
+ * chain at the time this function is entered
+ * Obviously holding cs_main/cs_wallet when going into this call may cause
@mchrostowski

mchrostowski May 3, 2017

Odd comment, "Obviously... may..." is concerning enough that we should have a comment explaining how to avoid a deadlock rather than this vagueness or perhaps removing the statement altogether?

@TheBlueMatt

TheBlueMatt May 4, 2017

Contributor

Any suggestions? The statement indicates that "holding cs_main/cs_wallet may cause deadlock", this is true, deadlock is not guaranteed, but may appear, thus you should obviously never call with cs_main or cs_wallet held.

@mchrostowski

mchrostowski May 4, 2017

I'd just drop "Obviously" now that I'm more familiar with the method.

src/wallet/wallet.cpp
@@ -1147,6 +1153,50 @@ void CWallet::BlockDisconnected(const std::shared_ptr<const CBlock>& pblock) {
+void CWallet::BlockUntilSyncedToCurrentChain() {
@mchrostowski

mchrostowski May 3, 2017

This method is concerning. It may be that it is being used in a safe manner but the method itself is quite dangerous. Preliminary observation suggests this can be called from both the command line RPC and JSON RPC at the same time but I don't know how true this is.

Calling it from two different threads appears to be not okay, so it is "Not thread safe" and should likely be labeled as such (though I don't see this as a standard in the project codebase so maybe that's going a bit far).

@TheBlueMatt

TheBlueMatt May 4, 2017

Contributor

I'm not sure how it is not thread safe? It blocks the current thread, not any other work.

@mchrostowski

mchrostowski May 4, 2017

I thought unsafe due to the "AssertLockNotHeld" that I misunderstood. My above comment is totally wrong and can be disregarded.

src/wallet/wallet.cpp
+ initialChainTip = chainActive.Tip();
+ }
+ AssertLockNotHeld(cs_main);
+ AssertLockNotHeld(cs_wallet);
@mchrostowski

mchrostowski May 3, 2017

If these assertions need to be held for this method to execute correctly then the method cannot be thread safe as itself being called twice, in two threads, is enough to cause a failure.

@TheBlueMatt

TheBlueMatt May 4, 2017

Contributor

The AssertLockNotHeld call only fails if the current thread holds the lock, not any thread.

@mchrostowski

mchrostowski May 4, 2017

boost::thread_specific_ptr<LockStack> lockstack;
Didn't see that, makes perfect sense then.

So far looks good to me, I'm going to poke around wallet<->blockchain interaction so I can better understand the wallet.cpp changes you made before I comment further.

That said I feel like there is something fundamentally wrong with the interaction between CWallet and the blockchain (I don't even know where that code lives yet). This feels like a solution to current issues but I would hope a refactor prevents needing such frequently occurring checks and thread specific execution.

That said, it would be interesting if a single "blockchain operations" thread is needed to get the code working in a reasonable manner. GUI systems tend to need such threads because of the nature of problem which is events coming from both ends of the system (rendering thread vs input thread) which would have to, in any logical design, invert lock ordering or else excessively block.

@ryanofsky

utACK 9432174.

Changes since previous ACK were rebasing, adding developer notes, tweaking some comments, adding a check to the zmq test

doc/developer-notes.md
+
+- Wallet RPCs call BlockUntilSyncedToCurrentChain to maintain consistency with
+ `getblockchaininfo`'s state immediately prior to the call's execution. Wallet
+ RPCs who's behavior does *not* depend on the current chainstate may omit this
@ryanofsky

ryanofsky May 4, 2017

Contributor

In commit "Add a dev notes document describing the new wallet RPC blocking"

s/who's/whose

doc/developer-notes.md
+ - *Rationale*: In previous versions of Bitcoin Core, the wallet was always
+ in-sync with the chainstate (by virtue of them all being updated in the
+ same cs_main lock). In order to maintain the behavior that wallet RPCs
+ return restults as of at least the highest best-known-block an RPC
@ryanofsky

ryanofsky May 4, 2017

Contributor

In commit "Add a dev notes document describing the new wallet RPC blocking"

s/restults/results, s/best-known-block/best-known block,/

Contributor

TheBlueMatt commented May 4, 2017

@mchrostowski thanks for the review! Generally, wallet and blockchain (essentially validation.cpp's stuff) have historically been pretty tightly coupled (updated all under the same cs_main lock). This PR is a step, however small, towards decoupling that a bit. Because the wallet still relies on "is it in our mempool?" as a proxy for "is this possibly going to get confirmed/is it spendable with the result making it into my mempool", the chainstate and mempool still need to be updated in-sync and the wallet notified of the updates in a single notification, which come in in the order they happened. Still, this is much better than the wallet actually querying the mempool/validation logic directly instead of tracking the stuff it cares about out of them.

Contributor

TheBlueMatt commented May 4, 2017

Rebased on latest #10179, current master, and fixed @ryanofsky's english corrections.

Contributor

ryanofsky commented May 5, 2017 edited

utACK 2c306d7. Changes since previous were some documentation tweaks and new block calls in "Add calls to CWallet::BlockUntilSyncedToCurrentChain()", along with the rebase.

@TheBlueMatt The stated purpose of this PR is to reduce locking on cs_main so as to reduce code coupling. I see one change in this PR that actually deletes a LOCK(cs_main), which is in CWallet::InMempool(). This looks like a step in the right direction.

That said, the remaining changes seem to be all about getting the signals into a background thread. What does this gain us for decoupling?

If the purpose is to not hold cs_main from whatever call sites hit GetMainSignals() then I don't see a benefit in using new threads when we can release the lock before making the call. That is, running in another thread is not decoupling synchronization or interface dependencies. The goal of "move wallet updates out of cs_main into a background thread" seems unrelated to decoupling because "using a background thread" and "not holding cs_main" are not dependent on each other, at least in the cases I observed.

Of particular concern is the re-ordering of calls, which you're avoiding with the single threaded scheduler, but if this isn't required for decoupling then it is just added complexity and overhead.

Contributor

ryanofsky commented May 30, 2017

This needs rebase due to a minor conflict in listunspent.

src/wallet/wallet.cpp
+ // This should be exceedingly rare in regular usage, so potentially
+ // eating 100ms to retry this lock should be fine (not TRY_LOCKing
+ // here would be a lock inversion against lastBlockProcessedMutex)
+ TRY_LOCK(cs_main, mainLocked);
@ryanofsky

ryanofsky May 31, 2017

Contributor

In commit "Add CWallet::BlockUntilSyncedToCurrentChain()"

Maybe consider dropping the try-lock and replacing it with lastBlockProcessedMutex.unlock(); LOCK(cs_main); lastBlockProcessedMutex.lock();. Maybe this would be a little slower in the average case where this code runs (which is rare to begin with), but it would avoid the 100ms worst case, and make the code simpler because you could also replace the while loop and timeout with a plain cv.wait(lock, pred) call.

src/wallet/wallet.cpp
+ // If the user called invalidateblock our wait here might block
+ // forever, so we check if we're ahead of the tip (a state
+ // which should otherwise never be exposed outside of validation)
+ return this->lastBlockProcessed->nChainWork > chainActive.Tip()->nChainWork;
@ryanofsky

ryanofsky May 31, 2017

Contributor

In commit "Add CWallet::BlockUntilSyncedToCurrentChain()"

I don't understand why we would wait forever without this check. Does InvalidateBlock not trigger notifications that would lead to lastBlockProcessed being updated? And if it doesn't, shouldn't this just be fixed so the right notifications are sent?

Contributor

TheBlueMatt commented Jun 8, 2017

Rebased and rewrote CWallet::BlockUntilSyncedToCurrentChain(). Instead of the complicated fallback logic, it now just tests if it is caught up, and if it is not, it puts a callback into the CValidationInterface queue and waits for it to trigger. I wanted to avoid having this function previously, but I ended up needing it for a different branch which moves more CValidationInterface callbacks to the background and the logic is so simple, that I went ahead with it.

@ryanofsky

utACK 91aad9f. Changes since last review were rebase, style guide fixes, BlockUntilSyncedToCurrentChain rewrite.

src/validationinterface.cpp
@@ -102,6 +102,10 @@ void UnregisterAllValidationInterfaces() {
g_signals.m_internals->NewPoWValidBlock.disconnect_all_slots();
}
+void CallFunctionInValidationInterfaceQueue(std::function<void ()> func) {
+ g_signals.m_internals->m_schedulerClient.AddToProcessQueue(func);
@ryanofsky

ryanofsky Jun 8, 2017

Contributor

In commit "Add CallFunctionInQueue to wait on validation interface queue drain"

Would std::move(func)

src/wallet/wallet.cpp
+ LOCK(cs_main);
+ const CBlockIndex* initialChainTip = chainActive.Tip();
+
+ if (m_last_block_processed == initialChainTip) {
@ryanofsky

ryanofsky Jun 8, 2017

Contributor

In commit "Add CWallet::BlockUntilSyncedToCurrentChain()"

Maybe drop this check. Seems to be a special case of the check below which isn't actually more expensive.

src/wallet/wallet.cpp
+ }
+ }
+
+ std::condition_variable callbacks_done_cv;
@ryanofsky

ryanofsky Jun 8, 2017 edited

Contributor

In commit "Add CWallet::BlockUntilSyncedToCurrentChain()"

I think all the mutex/cv/lambda/looping stuff below could be replaced by:

std::promise<void> promise;
CallFunctionInValidationInterfaceQueue([&promise]() { promise.set_value(); });
promise.get_future().wait();
@ryanofsky

utACK ae0c833. Same as previous except the last 3 review suggestions are now incorporated. There is a minor conflict now with master, but since this depends on #10179 anyway, there's probably no hurry to rebase.

src/rpc/rawtransaction.cpp
+ // where a user might call sendrawtransaction with a transaction
+ // to/from their wallet, immediately call some wallet RPC, and get
+ // a stale result because callbacks have not yet been processed.
+ pwalletMain->TransactionAddedToMempool(tx);
@ryanofsky

ryanofsky Jun 19, 2017

Contributor

In commit "Fix wallet RPC race by informing wallet of tx in sendrawtransaction"

Seems like this will result in the wallet getting two TransactionAddedToMempool notifications, which is fine but might be worth noting in the comment.

Also, not asking for this change, but would another way to do this without referencing the wallet here be to release cs_main and then wait for the other notification to be processed? (Maybe using CallFunctionInValidationInterfaceQueue like in BlockUntilSyncedToCurrentChain?)

@TheBlueMatt

TheBlueMatt Jun 21, 2017

Contributor

Yes, much better to CallFunctionInValidationInterfaceQueue. Done.

@ryanofsky

utACK db1d712. Changes since last review: rebase, sendraw change, naming style updates, added std::function includes.

src/sync.cpp
@@ -156,6 +156,15 @@ void AssertLockHeldInternal(const char* pszName, const char* pszFile, int nLine,
abort();
}
+void AssertLockNotHeldInternal(const char* pszName, const char* pszFile, int nLine, void* cs)
+{
+ BOOST_FOREACH (const PAIRTYPE(void*, CLockLocation) & i, *lockstack)
@ryanofsky

ryanofsky Jun 21, 2017

Contributor

In commit "Add ability to assert a lock is not held in DEBUG_LOCKORDER"

Looks like there are travis errors compiling this code (undefined FOREACH/PAIRTYPE).

src/rpc/rawtransaction.cpp
+ CallFunctionInValidationInterfaceQueue([&promise] {
+ promise.set_value();
+ });
+ promise.get_future().wait();
@ryanofsky

ryanofsky Jun 21, 2017

Contributor

In commit "Fix wallet RPC race by waiting for callbacks in sendrawtransaction"

I think you might need to release cs_main before waiting for the promise, because the wallet handler in the notification thread will want to acquire it.

FWIW, the way I implemented this in my wallet ipc branch was to call CallFunctionInValidationInterfaceQueue the same way you are doing here, but release cs_main before the PushInventory calls, and wait for the promise after the PushInventory calls. I also updated the comment.

--- a/src/rpc/rawtransaction.cpp
+++ b/src/rpc/rawtransaction.cpp
@@ -29,6 +29,7 @@
 #include "wallet/wallet.h"
 #endif
 
+#include <future>
 #include <stdint.h>
 
 #include <univalue.h>
@@ -846,7 +847,6 @@ UniValue sendrawtransaction(const JSONRPCRequest& request)
             + HelpExampleRpc("sendrawtransaction", "\"signedhex\"")
         );
 
-    LOCK(cs_main);
     RPCTypeCheck(request.params, {UniValue::VSTR, UniValue::VBOOL});
 
     // parse hex string from parameter
@@ -860,6 +860,9 @@ UniValue sendrawtransaction(const JSONRPCRequest& request)
     if (request.params.size() > 1 && request.params[1].get_bool())
         nMaxRawTxFee = 0;
 
+    std::promise<void> sent_notifications;
+    {
+    LOCK(cs_main);
     CCoinsViewCache &view = *pcoinsTip;
     bool fHaveChain = false;
     for (size_t o = 0; !fHaveChain && o < tx->vout.size(); o++) {
@@ -881,15 +884,6 @@ UniValue sendrawtransaction(const JSONRPCRequest& request)
                 }
                 throw JSONRPCError(RPC_TRANSACTION_ERROR, state.GetRejectReason());
             }
-#ifdef ENABLE_WALLET
-        } else {
-            // If wallet is enabled, ensure that the wallet has been made aware
-            // of the new transaction prior to returning. This prevents a race
-            // where a user might call sendrawtransaction with a transaction
-            // to/from their wallet, immediately call some wallet RPC, and get
-            // a stale result because callbacks have not yet been processed.
-            pwalletMain->TransactionAddedToMempool(tx);
-#endif
         }
     } else if (fHaveChain) {
         throw JSONRPCError(RPC_TRANSACTION_ALREADY_IN_CHAIN, "transaction already in block chain");
@@ -897,11 +891,22 @@ UniValue sendrawtransaction(const JSONRPCRequest& request)
     if(!g_connman)
         throw JSONRPCError(RPC_CLIENT_P2P_DISABLED, "Error: Peer-to-peer functionality missing or disabled");
 
+    CallFunctionInValidationInterfaceQueue([&sent_notifications] { sent_notifications.set_value(); });
+    } // LOCK(cs_main)
+
     CInv inv(MSG_TX, hashTx);
     g_connman->ForEachNode([&inv](CNode* pnode)
     {
         pnode->PushInventory(inv);
     });
+
+    // Wait for any TransactionAddedToMempool notifications sent above to be
+ // processed by the wallet. This prevents a race where a user might call
+    // sendrawtransaction with a transaction to/from their wallet, immediately
+    // call some wallet RPC, and get a stale result because callbacks have not
+    // yet been processed.
+    sent_notifications.get_future().wait();
+
     return hashTx.GetHex();
 }
@TheBlueMatt

TheBlueMatt Jun 21, 2017

Contributor

Heh, beat you to it.

@ryanofsky

utACK e361774. Only change since last review was sendraw locking fix.

src/rpc/rawtransaction.cpp
+ CallFunctionInValidationInterfaceQueue([&promise] {
+ promise.set_value();
+ });
+ promise.get_future().wait();
@ryanofsky

ryanofsky Jun 21, 2017

Contributor

In commit "Fix wallet RPC race by waiting for callbacks in sendrawtransaction"

This seems right. Possible tweaks:

  • should_wait_on_validationinterface seems like it is always true at this point, maybe the variable is not needed.
  • Might be better to call CallFunctionInValidationInterfaceQueue before releasing cs_main, to avoid waiting for notifications that might be queued in the meantime that we don't care about.
  • Might be cheaper / simpler just to hold on to the tx reference (remove std::move) instead of copying the hash to a temporary variable.
  • Maybe release cs_main before the PushInventory loop.
@ryanofsky

utACK 68201ec. Just AssertLockNotHeldInternal foreach fix and sendrawtransaction tweaks since last review.

+
+ } // cs_main
+
+ promise.get_future().wait();
@ryanofsky

ryanofsky Jun 22, 2017

Contributor

In commit "Fix wallet RPC race by waiting for callbacks in sendrawtransaction"

Might be more efficient to wait for the promise after the PushInventory calls so they aren't blocked waiting for wallets.

@TheBlueMatt

TheBlueMatt Jun 22, 2017

Contributor

I figured I'd put it above the !g_connman check to make sure we block even if, at some point in the future, we support running without net/connman. Don't feel strongly either way, though

laanwj added the Wallet label Jun 28, 2017

TheBlueMatt added some commits Jan 20, 2017

@TheBlueMatt TheBlueMatt Add a CValidationInterface::TransactionRemovedFromMempool
This is currently unused, but will be used by the wallet to cache
whether transactions are in the mempool, obviating the need for calls
to the mempool from CWalletTx::InMempool()
3e3dfec
@TheBlueMatt TheBlueMatt Add ability to assert a lock is not held in DEBUG_LOCKORDER a000c66
@TheBlueMatt TheBlueMatt Add CWallet::BlockUntilSyncedToCurrentChain()
This blocks until the wallet has synced up to the current height.
71f168a
@TheBlueMatt TheBlueMatt Use callbacks to cache whether wallet transactions are in mempool
This avoids calling out to mempool state during coin selection,
balance calculation, etc. In the next commit we ensure all wallet
callbacks from CValidationInterface happen in the same queue,
serialized with each other. This helps to avoid re-introducing one
of the issues described in #9584 [1] by further disconnecting the
wallet from current chain/mempool state.

Thanks to @morcos for the suggestion to do this.

Note that there are several race conditions introduced here:

 * If a user calls sendrawtransaction from RPC, adding a
   transaction which is "trusted" (i.e. from them) and pays them
   change, it may not be immediately used by coin selection until
   the notification callbacks finish running. No such race is
   introduced in normal transaction-sending RPCs as this case is
   explicitly handled.

 * Until BlockConnectedDisconnected and TransactionAddedToMempool
   calls also run in the CScheduler background thread, there is a
   race where TransactionAddedToMempool might be called after a
   BlockConnectedDisconnected call happens.

 * Wallet will write a new best chain from the SetBestChain
   callback prior to having processed the transaction from that
   block.

[1] "you could go to select coins, need to use 0-conf change, but
such 0-conf change may have been included in a block whose
callbacks have not yet been processed - resulting in thinking they
are not in mempool and, thus, not selectable."
b9bd729
@TheBlueMatt TheBlueMatt Call TransactionRemovedFromMempool in the CScheduler thread
This is both good practice (we want to move all such callbacks
into a background thread eventually) and prevents a lock inversion
when we go to use this in wallet (mempool.cs->cs_wallet and
cs_wallet->mempool.cs would otherwise both be used).
0320be0
@TheBlueMatt TheBlueMatt Add CallFunctionInQueue to wait on validation interface queue drain c83ae72
@TheBlueMatt TheBlueMatt Add calls to CWallet::BlockUntilSyncedToCurrentChain() in RPCs
This prevents the wallet-RPCs-return-stale-info issue from being
re-introduced when new-block callbacks no longer happen under the
block-connection cs_main lock
15e5d05
@TheBlueMatt TheBlueMatt Fix zmq tests now that txn/blocks are unordered 8e1d03b
@TheBlueMatt TheBlueMatt Add a dev notes document describing the new wallet RPC blocking 964f376
@TheBlueMatt TheBlueMatt Also call other wallet notify callbacks in scheduler thread
This runs BlockConnectedDisconnected, TransactionAddedToMempool,
SetBestChain, and Inventory on the background scheduler thread.

This partially reverts #9583, re-enabling some of the gains from
 #7946. This does not, however, re-enable the gains achieved by
repeatedly releasing cs_main between each transaction processed.
62e416d
@TheBlueMatt TheBlueMatt Fix wallet RPC race by waiting for callbacks in sendrawtransaction 5d4a81a
Contributor

TheBlueMatt commented Jul 11, 2017

Rebased. Would be nice to get this in early in 16 to let it simmer on master.

@ryanofsky

utACK 964f376. No changes since last review, only rebase.
