Remove UTXO cache entries when the tx they were added for is removed/does not enter mempool #6872

Merged

merged 6 commits into bitcoin:master Dec 2, 2015

9 participants
@TheBlueMatt
Contributor

TheBlueMatt commented Oct 22, 2015

No description provided.

@pstratem pstratem referenced this pull request Oct 23, 2015

Closed

Apparent memory leak #6876

@sipa


src/coins.cpp
+void CCoinsViewCache::Uncache(const uint256& hash)
+{
+ CCoinsMap::iterator it = cacheCoins.find(hash);
+ if (it != cacheCoins.end() && it->second.flags == 0)

@sipa

sipa Oct 26, 2015

Member

You can do slightly better:

if (it != cacheCoins.end() && (((it->second.flags & CCoinsCacheEntry::DIRTY) == 0) || ((it->second.flags & CCoinsCacheEntry::FRESH) && it->second.coins.IsPruned())))

This will also allow removing entries that were created and spent entirely within a cache.
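The refined condition can be modeled in isolation. The sketch below uses stand-in types (`Coins` and `CacheEntry` are simplified assumptions, not Bitcoin Core's real `CCoins`/`CCoinsCacheEntry`): an entry is safe to drop when it is not DIRTY (the parent view already holds identical data), or when it is FRESH and fully pruned (it never existed outside this cache).

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for CCoins: pruned means every output has been spent.
struct Coins {
    bool pruned = false;
    bool IsPruned() const { return pruned; }
};

// Stand-in for CCoinsCacheEntry with the two flags the condition inspects.
struct CacheEntry {
    enum Flags : uint8_t { DIRTY = 1, FRESH = 2 };
    uint8_t flags = 0;
    Coins coins;
};

// Mirrors the suggested condition: droppable if not DIRTY, or if
// FRESH and fully spent within this cache layer.
bool CanUncache(const CacheEntry& e) {
    return (e.flags & CacheEntry::DIRTY) == 0 ||
           ((e.flags & CacheEntry::FRESH) && e.coins.IsPruned());
}
```

Note the FRESH-and-pruned case is exactly the "created and spent entirely within a cache" entry: flushing it to the parent would be a no-op, so discarding it loses nothing.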

@TheBlueMatt

TheBlueMatt Oct 26, 2015

Contributor

Such an entry should probably be removed without a call to Uncache, no?

@sipa

sipa Oct 26, 2015

Member

Uh, right, and it already is... never mind!

@sipa


src/main.cpp
if (!view.HaveCoins(txin.prevout.hash)) {
if (pfMissingInputs)
*pfMissingInputs = true;
return false; // fMissingInputs and !state.IsInvalid() is used to detect this condition, don't set state.Invalid()
- }
+ } else if (fMaybeUncache)

@sipa

sipa Oct 26, 2015

Member

Even if HaveCoins returns false, the entry may have been pulled into the cache (the parent cache may have a pruned entry), so this probably works better outside of the else branch.

@sipa


src/main.cpp
+ size_t mempoolUsage = pool.DynamicMemoryUsage();
+ size_t coinsUsage = pcoinsTip->DynamicMemoryUsage();
+
+ while (mempoolUsage > limit || coinsUsage + mempoolUsage > nCoinCacheUsage + limit) {

@sipa

sipa Oct 26, 2015

Member

If coinsUsage is larger than nCoinCacheUsage + limit, this results in an infinite loop?

@TheBlueMatt

TheBlueMatt Oct 26, 2015

Contributor

Huh? I could have sworn I deleted the loop, as it is too easy for an attacker to fuck with your mempool, even if it did work.

@sipa

sipa Oct 26, 2015

Member

Yeah, I think that loop and the link with coincache size should go away. The coincache size shouldn't exceed its own limit, and the mempool shouldn't exceed its own limit. We may want to move to a model where the size for both is set using a single limit, but that will require other changes elsewhere too.
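The non-termination sipa flags can be seen in a toy model of the loop condition (function name and numbers are illustrative, not real byte counts): evicting mempool entries only ever shrinks `mempoolUsage`, so once `coinsUsage` alone exceeds `nCoinCacheUsage + limit`, the combined condition can never become false.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the questioned loop. Returns false when eviction alone
// cannot satisfy the exit condition (i.e. the real loop would spin forever).
bool TrimTerminates(size_t mempoolUsage, size_t coinsUsage,
                    size_t limit, size_t nCoinCacheUsage) {
    while (mempoolUsage > limit ||
           coinsUsage + mempoolUsage > nCoinCacheUsage + limit) {
        if (mempoolUsage == 0)
            return false;   // nothing left to evict, condition still true
        mempoolUsage -= 1;  // evict one mempool entry; coinsUsage untouched
    }
    return true;
}
```

With an oversized coins cache (say coinsUsage 300 against nCoinCacheUsage + limit of 180), `TrimTerminates` reports non-termination even after the mempool is emptied, which is why decoupling the two limits resolves the issue.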

@sipa

sipa Oct 26, 2015

Member

Do you feel like pulling in sipa@4186ac9 ?

@TheBlueMatt

TheBlueMatt Oct 26, 2015

Contributor

sipa@4186ac9 seems pretty nasty to me... completely clearing the cache when we add a transaction that pushes our cache over 90% usage seems like a really bad idea.

@sipa

sipa Oct 26, 2015

Member

@TheBlueMatt

TheBlueMatt Oct 26, 2015

Contributor

Wiping the cache sounds like a dirty hack to me. I'd rather evict random elements until the cache is reasonably-sized than that.

@TheBlueMatt

TheBlueMatt Oct 26, 2015

Contributor

Also, we should fix it to not dump the entire cache after each new block.

@sipa

sipa Oct 26, 2015

Member

@TheBlueMatt

TheBlueMatt Oct 26, 2015

Contributor

Sure, making the cache smarter would be nice, but let's not rush to fix it with a dirty hack when we're not even gonna ship the fix for a few months anyway.

@sipa

sipa Oct 26, 2015

Member

@morcos

morcos Oct 26, 2015

Member

Have you guys looked into how expensive all this work is?

@sipa Once we've made it so that you can't bloat memory usage with txs that aren't accepted (as accomplished by this pull), it seems to me the right approach is to just set your maxmempool size low enough that the unknown and variable multiple of it that is the size of your cache is also small enough for you. IMHO, the default dbcache size is too small. I'm with @TheBlueMatt that it seems bad to randomly wipe it unless we think that'll be very rare.

I'd actually argue we might not even need to remove cache entries that are due to evicted txs, because there is a relay cost that is paid to create those.

@sdaftuar


src/main.cpp
@@ -830,13 +842,17 @@ bool AcceptToMemoryPool(CTxMemPool& pool, CValidationState &state, const CTransa
view.SetBackend(viewMemPool);
// do we already have it?
- if (view.HaveCoins(hash))
+ if (view.HaveCoins(hash)) {
+ vHashTxnToUncache.push_back(hash);

@sdaftuar

sdaftuar Oct 26, 2015

Member

If I understand correctly, then we might remove this transaction from the cache even though there may be in-mempool spends of it, right? I think we want some kind of check like you have in TrimToSize, where you only uncache if there's nothing in the mempool that spends it.

(Alternatively, if HaveCoins somehow reported whether any new caching happened, that might be helpful, but that seems like a harder change to make.)
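The guard sdaftuar asks for can be sketched as a small predicate (names like `SafeToUncache` and `spentByMempool` are illustrative assumptions, not Bitcoin Core identifiers): only uncache a txid when no in-mempool transaction spends one of its outputs, mirroring the check in TrimToSize.

```cpp
#include <cassert>
#include <set>
#include <string>

using TxId = std::string;  // stand-in for uint256

// Hypothetical guard: a txid's cache entry may only be dropped if no
// transaction still in the mempool spends one of its outputs.
bool SafeToUncache(const TxId& txid, const std::set<TxId>& spentByMempool) {
    return spentByMempool.count(txid) == 0;
}
```

The alternative in the parenthetical (having HaveCoins report whether it caused a cache insertion) is what the eventual `HaveCoinsInCache` helper approximates: callers check cache residency before the lookup and only uncache entries they themselves pulled in.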

@TheBlueMatt

TheBlueMatt Oct 27, 2015

Contributor

Sure, if the eventual solution is to tweak FlushStateToDisk to be smarter (LRU or some priority metric based on what's in mempool), then that would be reasonable. I am, however, worried we'd merge it and forget to tweak the coins cache, making it even easier (maybe, though maybe there are easier ways anyway) for an attacker to cause lots of random disk I/O.

On 10/26/15 05:17, Pieter Wuille wrote:

> It's orthogonal. We need to enforce memory usage limits after each transaction, so let's call the function to do that at that time.
>
> How that function works is a different matter, and can be arbitrarily improved later.
>
> I don't understand your resistance here. I think it's the only solution.

@sipa

sipa Oct 27, 2015

Member

I don't think it's worse than what we have. The cache will already be flushed on the next block.

@TheBlueMatt

TheBlueMatt Oct 27, 2015

Contributor

Depends on which you find worse, memory usage or DB I/O. Because it's easy to optimize for loading lots of crap into the cache, an attacker can obviously tickle the cache to make it do more DB I/O with sipa@4186ac9, but can cause an equivalent increase in memory usage without it.

@sipa

sipa Oct 27, 2015

Member

I think we should move to a model of thinking where exceeding some reasonable but fixed memory limit is equivalent in badness to crashing. People need to be able to rely on the memory usage remaining bounded. So if it's a choice between more I/O and that, I think the first is far better.

@TheBlueMatt

TheBlueMatt Oct 27, 2015

Contributor

I think that's a fine idea... except that we're so far away from it, it seems a bit premature to be optimizing for it now.

@dcousens

dcousens Oct 27, 2015

Contributor

concept ACK

@sipa

sipa Oct 27, 2015

Member

Observation:

2015-10-27 20:49:07 dd310afee248d4699e462e5d3826e879cc19d1d2d34e9de0070811d1d5f6e64a from peer=15 was not accepted: rate limited free transaction (code 66)
2015-10-27 20:49:07 ATMT increases coincache by 159 (to 614694) KiB

So when ATMT rejects due to rate limiting, the coincache entries pulled in don't seem to be removed.

@sipa

sipa Oct 27, 2015

Member

Observation:

2015-10-27 20:52:57 a94894230e8c647d11dc67c2f7d56e91913aab8c6b3f7257f36d2cea30ddfbeb from peer=5 was not accepted: non-mandatory-script-verify-flag (No error) (code 64)
2015-10-27 20:52:57 ATMT increases coincache by 329 (to 991099) KiB
@sipa

sipa Oct 27, 2015

Member

My observations are likely explained by this patch not updating coinsCachedUsage, so pcoinsTip->DynamicMemoryUsage() goes off the charts.

@sipa

sipa Oct 27, 2015

Member

New observation:

2015-10-27 21:27:43 cd55b134090117cc7fbf4ad8d1039642f1dd63460daa27c59d1c0ae2aba3e582 from peer=14 was not accepted: mempool min fee not met, 15000 < 24788 (code 66)
2015-10-27 21:27:43 ATMT decreases coincache by 105 (to 67916) KiB

@laanwj laanwj added the Mempool label Oct 29, 2015

@sipa

sipa Oct 30, 2015

Member

Tested ACK.

@morcos

morcos Oct 30, 2015

Member

I'm a bit hesitant to merge this instead of a cleaner way of smart flushing, and I'd still like to see some performance analysis.

@gmaxwell gmaxwell added this to the 0.12.0 milestone Nov 5, 2015

@gmaxwell

gmaxwell Nov 8, 2015

Member

Without this, my mining node with an elevated relay fee (and running mempool limiting) and default dbcache is at 4GB RES. We do need to do more to control the size of the coins cache, if not precisely this change.

@morcos

morcos Nov 9, 2015

Member

I did some benchmarking on this pull. I built it on top of all the other improvements in #6976, except I tried it with and without the hot cache. It caused a significant performance boost to ConnectBlock. Unfortunately, on my test days (10/6 and 10/7), using a 100 MB mempool and 200 MB dbcache, the total CPU time shot up by over 10%. This seems to be entirely due to the 15 kB very-low-fee transaction spam and the increased time to reject those txs. I did not break down where the time was spent, but my guess is that it is not all in the uncaching code, and that part of the problem is that these large txs often spend from the same prev hashes. So by uncaching them we cause more time to be spent just calculating the fee of new txs before we can reject them.

It also had the desired effect of reducing cache size, and reduced cache flushes from 38 to 16 over the 2 days.

Here is a summary of the results. I haven't fully reviewed the code yet, but I think this might be a performance hit we could tolerate.

| Code | Avg time for tx rejected by ATMP (ms) | Avg time to connect block (ms) |
| --- | --- | --- |
| baseline (#6976 except #6936) | 1.7 | 134 |
| baseline + #6936 | 1.7 | 128 |
| baseline + this PR | 2.9 | 109 |
| baseline + this PR + #6936 | 2.9 | 104 |
@sipa

sipa Nov 28, 2015

Member

Needs rebase.

@laanwj

laanwj Dec 1, 2015

Member

Still needs rebase

@TheBlueMatt

TheBlueMatt Dec 1, 2015

Contributor

Yes, sorry, I thought the plan was to go forward with #6936 and skip this one entirely, but it seems there is some consensus to do this as an intermediate step until #6936 is fully ready to replace it (probably 0.13). Will try to rebase this tomorrow.

@TheBlueMatt

TheBlueMatt Dec 1, 2015

Contributor

Rebased, squashed, pulled in sipa@4186ac9 for 0.12.

@MarcoFalke

MarcoFalke Dec 1, 2015

Member

@TheBlueMatt Needs rebase. (Travis fail is unrelated: #6540)

@TheBlueMatt

TheBlueMatt Dec 2, 2015

Contributor

Rebased. Again.

@laanwj laanwj merged commit dd5862c into bitcoin:master Dec 2, 2015

1 check passed: continuous-integration/travis-ci/pr — The Travis CI build passed

laanwj added a commit that referenced this pull request Dec 2, 2015

Merge pull request #6872
dd5862c Flush coins cache also after transaction processing (Pieter Wuille)
bde953e Uncache input txn in utxo cache if a tx is not accepted to mempool (Matt Corallo)
97bf377 Add CCoinsViewCache::HaveCoinsInCache to check if a tx is cached (Matt Corallo)
677aa3d Discard txn cache entries that were loaded for removed mempool txn (Matt Corallo)
b2e74bd Get the set of now-uncacheable-txn from CTxMemPool::TrimToSize (Matt Corallo)
74d0f90 Add method to remove a tx from CCoinsViewCache if it is unchanged (Matt Corallo)
@laanwj

laanwj Dec 2, 2015

Member

Merged this because I want to branch off 0.12 soon; not happy it has no ACKs at all though (oh, it does, sorry @sipa). Would still be nice if some reviewers could posthumously ACK this.

@morcos

morcos Dec 2, 2015

Member

code review and tested for performance only ACK

(I think most of us tested this before #5967 was merged, but I can't think why that should cause any problems)

@arielgabizon

arielgabizon Dec 27, 2017

What was the justification for pulling in the flush commit? It seemed from the conversation that @TheBlueMatt and @morcos agreed it was not a good idea.

@TheBlueMatt

TheBlueMatt Dec 27, 2017

Contributor

@dagurval dagurval referenced this pull request in bitcoinxt/bitcoinxt May 7, 2018

Merged

Extend coins_tests #419

@dagurval dagurval referenced this pull request in bitcoinxt/bitcoinxt Jun 13, 2018

Merged

Flush coins cache ofter #453
