Keep every snapshot in memory and remove the concept of snapEvery #2459
Conversation
(force-pushed from 9c27ff7 to c3267f7)
Keep up the good work!
Review threads (resolved) on:

- ouroboros-consensus/src/Ouroboros/Consensus/Storage/ChainDB/Impl/LgrDB.hs
- ouroboros-consensus/src/Ouroboros/Consensus/Storage/LedgerDB/InMemory.hs
- ouroboros-consensus/test-storage/Test/Ouroboros/Storage/LedgerDB/InMemory.hs
- ouroboros-consensus/test-storage/Test/Ouroboros/Storage/LedgerDB/OnDisk.hs
(force-pushed from c3267f7 to 96ff405)
Just some minor things.
Can you set #2446 to be the base branch of this PR?
(force-pushed from 96ff405 to c4d4435)
```haskell
data LedgerDB l r = LedgerDB {
      -- | The ledger state at the tip of the chain
      ledgerDbCurrent :: !l
      -- | Older ledger states
```

Now that `ledgerDbCurrent` is no longer a field, we only have these "older ledger states", which include the current one. So rename the comment from `-- | Older ledger states` to:

```haskell
-- | Ledger state checkpoints
```
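To illustrate the suggestion, here is a sketch (not the real ouroboros-consensus definition; the types and names are simplified stand-ins): once every snapshot is kept, the current ledger state is just the newest checkpoint, so `ledgerDbCurrent` can become a derived function instead of a field.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- | Ledger state checkpoints, newest first.
-- Hypothetical simplification of the real LedgerDB type.
newtype LedgerDB l = LedgerDB { ledgerDbCheckpoints :: NonEmpty l }

-- | The ledger state at the tip of the chain: simply the newest checkpoint.
ledgerDbCurrent :: LedgerDB l -> l
ledgerDbCurrent = NE.head . ledgerDbCheckpoints
```

Using `NonEmpty` captures the invariant that there is always at least the current state, so no `Maybe` is needed.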
Just as a sanity check, I tried this out, using a database recently synced with mainnet, in the Shelley era. I disabled syncing. Very important: the VolatileDB must contain a chain of at least …

Trace output:

I looked at the memory usage.

Without this PR: 911 MB RAM

I repeated this a few times and always got similar results. Trying with …

This means we should definitely not merge this PR now. Thoughts:
As a sanity check, I repeated my measurements using mainnet at the end of the Byron era.

Trace output:

Without this PR: 305 MB RAM

Not the catastrophic increase we saw for Shelley, but still more than expected from #1936: that's 150 MB more instead of the expected <10 MB 🤔. I'm not sure how the Haskell heap grows; maybe that plays a role here? I'm wondering whether we're overlooking something 🤔.
With 8e7b5e2 and cherry-picked IntersectMBO/cardano-ledger#1775 I still see very large memory usage (maximum residency) on Shelley.

snapEvery = 1:

snapEvery = 100:

snapEvery = k:
It looks like what caused the issue was the delegationTransition fixed in IntersectMBO/cardano-ledger#1779. Creating a new reward Map (with over 40k entries) probably killed sharing between consecutive ledger states.

snapEvery = 1, without delegation fix:

With the fix cherry-picked, the graph looks much different.

snapEvery = 1, with delegation fix:

This looks similar with …
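The sharing point can be illustrated with plain `Data.Map` (a hedged sketch; the `Rewards` type and both function names are made up, not cardano-ledger code): a persistent update reallocates only the path to the changed key, so consecutive states share almost everything, whereas rebuilding the Map from scratch, as the pre-#1779 delegation transition effectively did, reallocates every node.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

type Rewards = Map String Integer  -- illustrative stand-in

-- Persistent update: the result shares all untouched branches with the
-- input, so keeping k consecutive states costs little extra memory.
sharingStep :: Rewards -> Rewards
sharingStep = Map.insert "pool1" 42

-- Rebuilding the whole Map from its elements: semantically an identity
-- here, but every node is freshly allocated, so the result shares
-- nothing with the input and each retained state pays the full size.
copyingStep :: Rewards -> Rewards
copyingStep = Map.fromList . Map.toList
```

Both produce equal Maps; only the heap layout differs, which is exactly why the regression was invisible to correctness tests and only showed up in residency measurements.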
Some points:
Great, thanks for this investigation! Also nice to see I unknowingly already fixed the issue 🙂
I believe it corresponds to this line: `{ _utxo = eval ((txins txb ⋪ utxo) ∪ txouts txb),`, which is the main expected contributor to the memory growth, i.e., UTxO changes. Unless that …
Which fixes do you mean? IntersectMBO/cardano-ledger#1779 has been merged, and IntersectMBO/cardano-ledger#1775 should not affect sharing, right? Do you mean IntersectMBO/cardano-ledger#1785? I would actually be interested in seeing the impact of an epoch transition on the memory usage (with IntersectMBO/cardano-ledger#1785 applied). I'm relieved that we can still go through with this simplification after all 😌. But I agree: let's first optimise the overlay schedule, so we can reuse the memory freed by that optimisation for this simplification.
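For reference, the quoted UTxO rule reads naturally in `Data.Map` terms. This is a sketch with made-up concrete types (the real ledger uses abstract set-algebra combinators and richer `TxIn`/`TxOut` types): domain exclusion (⋪) drops the spent inputs, and the union adds the produced outputs.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- Illustrative types; the real TxIn/TxOut are richer.
type TxIn  = (String, Int)  -- (tx id, output index)
type TxOut = Integer
type UTxO  = Map TxIn TxOut

-- _utxo = (txins txb ⋪ utxo) ∪ txouts txb, in Map terms:
-- remove the spent inputs (domain exclusion), add the produced outputs.
applyTxBody :: Set TxIn -> UTxO -> UTxO -> UTxO
applyTxBody spent produced utxo =
  produced `Map.union` Map.withoutKeys utxo spent
```

Each application touches only the spent and produced entries, so successive UTxO states still share the bulk of their structure; the steady growth is from the new outputs themselves.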
On epoch boundaries the results also seem similar. This is using db-analyser, starting from a snapshot at the very beginning of epoch 208 and validating up to a very recent block. Both cases report a maximum residency of 2,500,000,000 bytes, and I see at most 4 GB used by top/ps. (What's this PINNED memory?)

snapEvery = 100:

snapEvery = 1:
Good, that's what I was expecting/hoping: there can be at most one epoch boundary transition in a range of …

PINNED memory is typically the bytes in …
More things to traverse, so more time; that's expected. Note that profiling skews this result; running with …
(force-pushed from c4d4435 to ddb227e)
The results after the fix #2532 look pretty similar again.

I also tested what happens if we never call prune (keeping all ledger states in memory). This shows that sharing is pretty good, if we take into account that by the end there are ~90,000 ledger states stored.

I also tested times and memory with psrecord (no GHC profiling involved), and the PR doesn't create any visible difference.
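The pruning being compared above can be sketched in a few lines (hypothetical name and representation; the real implementation lives in `LedgerDB/InMemory.hs`): after pushing the state produced by each block, only the current state plus the most recent `k` past states are retained, and the "never prune" experiment corresponds to skipping the `take`.

```haskell
-- Push the ledger state produced by a new block, then prune so at most
-- the current state plus k past states stay in memory (newest first).
-- The name and list representation are illustrative, not the real API.
pushAndPrune :: Int -> l -> [l] -> [l]
pushAndPrune k st sts = take (k + 1) (st : sts)
```

Because the states share structure (as the measurements above show), keeping `k + 1` of them is far cheaper than `k + 1` independent copies.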
Great, let's merge!

bors merge

👎 Rejected by too few approved reviews
bors merge
bors merge
bors ping

pong

bors merge
Fixes #1935, #2506, #2559, and #2562.

Consider the following situation:

```
A -> B         -- current chain
 \
  > B' -> C'   -- fork
```

To validate the header of C', we need a ledger view that is valid for the slot of C'. We can't always use the current ledger to produce a ledger view, because our chain might include changes to the ledger view that are not present in the fork (they were activated after the intersection A). So we must get a ledger view at the intersection, A, and use that to forecast the ledger view at C'.

Previously, we only kept a ledger state in memory for every 100 blocks, so obtaining a ledger state at the intersection point might require reading blocks from disk and reapplying them. As this is expensive for us, but cheap to trigger by attackers (create headers and serve them to us), this would lead to DoS possibilities. For that reason, each ledger state stored snapshots of past ledger views. So to obtain a ledger view for A, we asked B for a past ledger view at the slot of A (and used that to forecast for C').

After #2459 we keep all `k` past ledgers in memory, which makes it cheap to ask for a past ledger and thus a past ledger view. This means we no longer need to store ledger view snapshots in each ledger state. This was awkward, because we had a double history: we stored snapshots of the ledger state, and each ledger state stored snapshots of the ledger view.

Both for Byron and Shelley we can remove the ledger view history from the ledger. To remain backwards binary compatible with existing ledger snapshots, we still allow the ledger view history in the decoders but ignore it, and don't encode it anymore.

Consequently, `ledgerViewForecastAtTip` no longer needs to specify *at* which slot (i.e., A's slot) to make the forecast (i.e., which past ledger view snapshot to use). Instead, we first get the right past ledger with `getPastLedger` and use its ledger view to make forecasts. This results in some simplifications in the ChainSyncClient.
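The lookup this description relies on can be sketched as follows (hedged: the real `getPastLedger` signature differs; this only shows why the operation became cheap). With all `k` past ledgers in memory, keyed by slot and ordered newest first, finding the state at the intersection A is a pure list search rather than disk reads plus block re-application.

```haskell
import Data.List (find)

type SlotNo = Word  -- illustrative

-- Find the most recent ledger state at or before the given slot.
-- States are (slot, ledger) pairs, newest first; a sketch of the
-- lookup that keeping all k past ledgers in memory makes cheap.
pastLedgerAt :: SlotNo -> [(SlotNo, l)] -> Maybe l
pastLedgerAt at = fmap snd . find (\(s, _) -> s <= at)
```

A `Nothing` result corresponds to the intersection being older than `k` blocks, i.e. beyond the rollback window, in which case the peer's fork would be rejected anyway.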
#2440

I tried to …, but it's possible I missed some of those.