Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add balances cache / GUI: use a signal instead of a poll thread #10251

Closed
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
6 participants
@jonasschnelli
Copy link
Member

jonasschnelli commented Apr 21, 2017

There are two major parts in this PR.

A) wallet: introduce a per-balance-type cache

Additionally to the per CWalletTx debit, etc. caches, this adds an atomic cache for each balance type (available, immature, watchonly, etc.).
If the balance has been cached, no lock will be acquired when calling Get*Balance().
As always with caches, the problematic parts is where to invalidate it (that is why there are some calls to MarkBalancesDirty()).

B) Signal for balance changes

This PR exposes a new wallet signal that will signal the GUI when a balance change had happened (instead of the 250ms polling timer).

@laanwj

This comment has been minimized.

Copy link
Member

laanwj commented Apr 21, 2017

Concept ACK, nice!

src/wallet/wallet.cpp Outdated

CAmount CWallet::GetBalance() const
{
CAmount nTotal = 0;
if (!fBalancesDirty) {
return nBalanceCache;

This comment has been minimized.

@ryanofsky

ryanofsky Apr 21, 2017

Contributor

I think think you need to acquire cs_wallet lock here, otherwise this could return a partial sum if GetBalance got called from two threads at the same time and one thread returned nBalanceCache while the other thread was in the middle of the nBalanceCache += loop.

Alternately, you could bring back the nTotal local variable and just set nBalanceCache = nTotal atomically.

This comment has been minimized.

@jonasschnelli

jonasschnelli Apr 21, 2017

Author Member

Oh.. good point.
I'll re-add the temp variable then.

src/wallet/wallet.cpp Outdated
return nBalanceCache;
}

nBalanceCache = 0;

This comment has been minimized.

@ryanofsky

ryanofsky Apr 21, 2017

Contributor

I think you need to move this line after the LOCK2(cs_main, cs_wallet); line. Otherwise if this method is called from two different threads simultaneously, this line could zero out a balance which is in the middle of being added up by the other thread.

Or alternately, delete this line and and set nBalanceCache = nTotal atomically after the loop.

src/wallet/wallet.h Outdated
std::atomic<bool> fBalancesDirty;

/** Notification for balance changes */
boost::signals2::signal<void ()> BalancesDidChange;

This comment has been minimized.

@ryanofsky

ryanofsky Apr 21, 2017

Contributor

Maybe call it BalancesChanged instead of BalancesDidChange to be consistent with the naming of other signals (StatusChanged, AddressBookChanged, etc).

This comment has been minimized.

@jonasschnelli

jonasschnelli Apr 21, 2017

Author Member

Yes. I can do that. I kinda like the signal naming convention from apple. XYWillChange, XYDidChange, etc.: because it allows the listener to know (without reading to much code) if the event was synchronous or has triggered a process with a later update. But lets not overdo it here,.. will change therefore.

@jonasschnelli jonasschnelli force-pushed the jonasschnelli:2017/04/gui_rm_locks branch Apr 21, 2017

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented Apr 21, 2017

Force pushed fixes for issues found by @ryanofsky.

@ryanofsky
Copy link
Contributor

ryanofsky left a comment

See other comments, but it looks like the caching here is broken. The Qt changes seem fine but I think would be easier to understand as 1 commit instead of 3.

src/wallet/wallet.h Outdated
@@ -780,6 +780,7 @@ class CWallet : public CCryptoKeyStore, public CValidationInterface
nRelockTime = 0;
fAbortRescan = false;
fScanningWallet = false;
fBalancesDirty = true;

This comment has been minimized.

@ryanofsky

ryanofsky May 1, 2017

Contributor

In commit "Add atomic cache for balances"

The whole caching portion of this PR actually seems broken because fBalancesDirty is initialized to true but never set to false anywhere. I think this can be fixed, but maybe simplest thing to do would be to drop the caching, and just keep the signalling part of this PR.

This comment has been minimized.

@jonasschnelli

jonasschnelli May 4, 2017

Author Member

Oh. Yes indeed. I now switched to a cache-invalidation-atomic per balance type and set it to false once the cache has been built.

@@ -137,12 +136,6 @@ void WalletModel::updateBalance()
}
}

void WalletModel::updateTransaction()

This comment has been minimized.

@ryanofsky

ryanofsky May 1, 2017

Contributor

In commit "[Qt] remove unused polling code"

I think it would be good to squash this commit into "[Qt] use the BalancesChanged signal instead of a 250ms poll timer." It's confusing to see code that checks the fForceCheckBalanceChanged variable and some code that sets it removed, while other code that sets it sticks around.

This comment has been minimized.

@jonasschnelli

jonasschnelli May 4, 2017

Author Member

Good point. Squashed those two commits together.

@jonasschnelli jonasschnelli force-pushed the jonasschnelli:2017/04/gui_rm_locks branch May 4, 2017

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented May 4, 2017

Force push fixed @ryanofsky points.

@ryanofsky
Copy link
Contributor

ryanofsky left a comment

utACK a6903788aa105bd26dc6bd5a2784db5109467848

src/wallet/wallet.cpp Outdated
@@ -1914,13 +1935,18 @@ CAmount CWallet::GetBalance() const
if (pcoin->IsTrusted())
nTotal += pcoin->GetAvailableCredit();
}
nBalanceCache = nTotal;
fBalancesDirty = false;

This comment has been minimized.

@ryanofsky

ryanofsky May 4, 2017

Contributor

In commit "Add atomic cache for balances"

Maybe rename fBalancesDirty to fBalanceDirty since this only applies to the balance returned from CWallet::GetBalance and not the other balances.

@jonasschnelli jonasschnelli force-pushed the jonasschnelli:2017/04/gui_rm_locks branch May 4, 2017

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented May 4, 2017

Force push fixed @ryanofsky's nit (and also added the "other balances" atomics to SetNull()).

@ryanofsky

This comment has been minimized.

Copy link
Contributor

ryanofsky commented May 5, 2017

ACK 84904a9d8ac5e98558b366bc69338d4ad07dc2f7.

Changes since last review were those Jonas mentioned. Seeing the missing initializations in SetNull() was a little disconcerting, because I would have expected tests to catch errors they would cause. Actually one test did fail (wallet_tests/rescan) but I guess I didn't notice it.

@jonasschnelli jonasschnelli force-pushed the jonasschnelli:2017/04/gui_rm_locks branch Jun 14, 2017

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented Jun 14, 2017

Needed a trivial rebase (BOOST_FOREACH -> for change).

@kallewoof
Copy link
Member

kallewoof left a comment

utACK 9ca1140a19d3d39b86d7a6d6dc7ed907fcea2a88 (my QT skills are limited though)

src/wallet/wallet.cpp Outdated
}
MarkBalancesDirty();

LOCK(cs_wallet);

This comment has been minimized.

@kallewoof

kallewoof Jun 14, 2017

Member

Am I mistaken when I think that the lock on cs_wallet will actually happen before the call to MarkBalancesDirty() due to scope? If so I would swap these lines for clarity.

This comment has been minimized.

@jonasschnelli

jonasschnelli Jun 14, 2017

Author Member

I guess I should move the LOCK() up to the top. Because its RAII, I think it doesn't matter in this case (no extra block). The lock should take affect when we enter the function/scope (before MarkBalancesDirty()). But maybe I'm wrong.

@sipa

This comment has been minimized.

Copy link
Member

sipa commented Jun 14, 2017

@kallewoof No, the lock on cs_wallet will happen after MarkBalancesDirty

@sipa

This comment has been minimized.

Copy link
Member

sipa commented Jun 14, 2017

But maybe I'm wrong

You're wrong.

jonasschnelli added some commits Apr 21, 2017

@jonasschnelli jonasschnelli force-pushed the jonasschnelli:2017/04/gui_rm_locks branch to 187b3ee Jun 14, 2017

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented Jun 14, 2017

But maybe I'm wrong

You're wrong.

Okay. Then I'll better be quite.
Moved up the LOCK().

@ryanofsky
Copy link
Contributor

ryanofsky left a comment

utACK 187b3ee. Only change since last review was acquiring wallet lock before the one markdirty call.

@laanwj

This comment has been minimized.

Copy link
Member

laanwj commented Jun 15, 2017

I'm trying to understand whether this will introduce a hang in the GUI thread in some cases.

Right now, the recurrent poll timer (not thread! it's a qt timer which runs in the GUI thread) updates the balance if it's possible to get the locks using TRY_LOCK. Then it will get them, once, and process all balances.

After this change the locks are no longer TRY, so the BalancesChanged signal will always cause the GUI thread to take the wallet locks (through CWallet::get*Balance()). Not once, but up to once per type of balance, if they all turn out to be dirty (which is likely).
Taking the cs_main and cs_wallet lock up to 6 times can take ages when the cs_main lock is contended: during initial sync, catching up. This all happens in the GUI thread (at the receiving end of updateBalance signal).

Maybe I'm misunderstanding the code, but if I'm right about this, I'm not convinced that this is an improvement in user experience.

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented Jun 16, 2017

Some of the current heavy "freezes" are happening because of the balance poll thread... it seems to be a significant CPU eater.
Right now, we TRY_LOCK every 250ms (which often LOCKS successfully and then iterates (multiple times) through the complete mapWallet).

bildschirmfoto 2017-06-16 um 14 00 59

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented Jun 16, 2017

Here the complete profiling (master):

It seems like that the pollBalanceThread took 50% of the complete CPU time for the measured run.
(30 seconds, opening debug window and 10-20 times sendtoaddress)

bildschirmfoto 2017-06-16 um 14 09 00

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented Jun 16, 2017

I testes again with this PR and the responsiveness is much better. @laanwj points make sense to me.

Could it be, that the TRY_LOCK every 250ms will result in getting the LOCK very often and then do the heavy balance calculation on the main thread because QTimer runs on the Main/GUI thread?

@ryanofsky

This comment has been minimized.

Copy link
Contributor

ryanofsky commented Jun 16, 2017

Could it be, that the TRY_LOCK every 250ms will result in getting the LOCK very often and then do the heavy balance calculation on the main thread because QTimer runs on the Main/GUI thread?

That's what I would expect. Have you thought about keeping the cache but bringing back the polling and try lock to address laanwj's point?

@jonasschnelli jonasschnelli force-pushed the jonasschnelli:2017/04/gui_rm_locks branch from 187b3ee to 8688671 Jun 16, 2017

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented Jun 16, 2017

@ryanofsky I profiled this PR with only the first commit (atomic caches, keep the polling).
The results are better, but not as good as with the signal/non-polling approach:

Test run: Startup, call sendtoaddress(getnewaddress(), 1) 20 times manually

Master with only the first commit (pure atomic caches):

The polling thread gets measured (through significants) in my profiler with ~33% (very high IMO)
bildschirmfoto 2017-06-16 um 14 58 42

Master with this PR

The balance updates seems no longer be significant:
bildschirmfoto 2017-06-16 um 15 04 14

@ryanofsky

This comment has been minimized.

Copy link
Contributor

ryanofsky commented Jun 21, 2017

Unclear to me if @laanwj's concern about losing the try lock is sufficiently addressed with the new testing and discussion at https://botbot.me/freenode/bitcoin-core-dev/msg/87346236/. If it is, maybe this PR is close to ready for merge (so far two utACKs from kallewoof and me). If not, it seems like there is some advantage (50% -> 30% poll thread cpu decrease) and no downside to just merging the first commit for now?

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented Jun 21, 2017

I guess it would be great if someone with (g)perf experience on Linux/Ubuntu and maybe on Windows could benchmark this PR against master.

@ryanofsky

This comment has been minimized.

Copy link
Contributor

ryanofsky commented Jun 21, 2017

Linux and windows performance tests seem like they would be a good sanity check on the numbers you collected (50% cpu usage spent on balance computation on master, 33% with caching commit, and ~0% with full pr), but is there some reason to think this behavior would be dependent on the platform?

I guess the main thing I'm not clear on is whether the results you collected (assuming they do hold up on other platforms) address laanwj's concern above, or if there is some kind of different testing that needs to be done, or some other safeguard that should be added to prevent the possibility of new hangs.

@promag

This comment has been minimized.

Copy link
Member

promag commented Jul 20, 2017

@jonasschnelli if still relevant I can do it.

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented Jul 20, 2017

@promag: A benchmark would be very good to have.
Ideally you compare master against this PR and against only the first commit in this PR (3 versions).
Thanks!

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented Oct 5, 2017

Closing for now...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.