
Keep every snapshot in memory and remove the concept of snapEvery #2459

Merged: 1 commit merged into master on Aug 25, 2020

Conversation

@kderme (Contributor) commented Jul 23, 2020

#2440

I tried to:

  • remove some tests that became trivial,
  • update the documentation,
  • simplify some constraints,

but it's possible I missed some of those.

@mrBliss added the "consensus" label (issues related to ouroboros-consensus) on Jul 23, 2020
@mrBliss linked an issue on Jul 23, 2020 that may be closed by this pull request
@mrBliss (Contributor) left a comment:

Keep up the good work!

@mrBliss (Contributor) left a comment:

Just some minor things.

Can you set #2446 to be the base branch of this PR?

data LedgerDB l r = LedgerDB {
      -- | The ledger state at the tip of the chain
      ledgerDbCurrent :: !l

      -- | Older ledger states

Review comment:

Now that `ledgerDbCurrent` is no longer a field, we only have these "older ledger states", which include the current one. So rename it to:

Suggested change:

    -      -- | Older ledger states
    +      -- | Ledger state checkpoints
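
To make the shape of the change concrete, here is a minimal sketch of the idea behind this PR, using hypothetical simplified types rather than the actual ouroboros-consensus definitions: the LedgerDB keeps a checkpoint for every one of the last k blocks, and the current ledger state becomes a function over those checkpoints instead of a separate field.

    -- Minimal sketch; hypothetical simplified types, not the real LedgerDB.
    import           Data.Sequence (Seq)
    import qualified Data.Sequence as Seq

    newtype LedgerDB l = LedgerDB
      { ledgerDbCheckpoints :: Seq l
        -- ^ Ledger state checkpoints, oldest first; the last element is the
        -- ledger state at the tip of the chain.
      }

    -- | The ledger state at the tip (previously the 'ledgerDbCurrent' field).
    ledgerDbCurrent :: LedgerDB l -> l
    ledgerDbCurrent (LedgerDB cps) = case Seq.viewr cps of
        _ Seq.:> l -> l
        Seq.EmptyR -> error "LedgerDB invariant: at least one checkpoint"

    -- | Add the ledger state obtained by applying one more block, keeping at
    -- most k + 1 states: the current one plus the k older checkpoints.
    ledgerDbPush :: Int -> l -> LedgerDB l -> LedgerDB l
    ledgerDbPush k l (LedgerDB cps)
      | Seq.length cps >= k + 1 = LedgerDB (Seq.drop 1 cps Seq.|> l)
      | otherwise               = LedgerDB (cps Seq.|> l)

With snapEvery gone there is no sampling parameter left: rolling back up to k blocks is always just dropping elements from the right of the sequence.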

@mrBliss (Contributor) commented Aug 10, 2020

Just as a sanity check, I tried this out using cardano-node:

cardano-node: e8fb486e44aceb4b9962d7b3b76849becd510820
ouroboros-network: ec74d1c, with and without cherry-picking c4d4435

I used a database recently synced with mainnet, in the Shelley era, and disabled syncing. Very important: the VolatileDB must contain a chain of at least k blocks, otherwise we won't have k snapshots in memory.

Trace output:

[desktop:cardano.node.ChainDB:Info:5] [2020-08-10 10:16:16.18 UTC] Opened imm db with immutable tip at (Point 4564640, "35b10144ee477ba9f80389fbcdbb4f5ebd7678e1750dbaa2583eeff00de69466") and epoch 211
[desktop:cardano.node.ChainDB:Info:5] [2020-08-10 10:16:16.46 UTC] Opened vol db
[desktop:cardano.node.ChainDB:Info:5] [2020-08-10 10:16:19.91 UTC] Replaying ledger from snapshot DiskSnapshot 28 at (Point 4563080, "2c85088fa88973c87cd0f2aca3aebab30ab9719e72e2b784d2395b7df692071e")
[desktop:cardano.node.ChainDB:Info:5] [2020-08-10 10:16:20.23 UTC] Replayed block: slot SlotNo {unSlotNo = 4563100} of At (SlotNo {unSlotNo = 4564640})
[desktop:cardano.node.ChainDB:Info:5] [2020-08-10 10:16:22.92 UTC] block replay progress (%) = 100.0
[desktop:cardano.node.ChainDB:Info:5] [2020-08-10 10:16:24.87 UTC] before next, messages elided = 4563119
[desktop:cardano.node.ChainDB:Info:5] [2020-08-10 10:16:24.87 UTC] Replayed block: slot SlotNo {unSlotNo = 4564640} of At (SlotNo {unSlotNo = 4564640})
[desktop:cardano.node.ChainDB:Info:5] [2020-08-10 10:16:24.87 UTC] Opened lgr db
[desktop:cardano.node.ChainDB:Info:5] [2020-08-10 10:19:51.55 UTC] Opened db with immutable tip at (Point 4564640, "35b10144ee477ba9f80389fbcdbb4f5ebd7678e1750dbaa2583eeff00de69466") and tip (Point 4608280, "64455962d499039d109e642aafce53408df6c9dcc0f8f726074a837b739682c7")

I looked at the memory usage of cardano-node with htop.

Without this PR: 911 MB RAM
With this PR: 7717 MB RAM (!!!)

I repeated this a few times and always got similar results. Trying with snapEvery = 1 instead of this PR gave similar results.

This means we should definitely not merge this PR now.

Thoughts:

  • The measurements in #1936 ("Keep every snapshot of the ledger state in memory") were done using the Byron ledger, not the Shelley (or Cardano) one.
  • I suspect that for some reason, there is much less sharing in the Shelley ledger than in the Byron one.
  • Recently there have been a lot of transactions on the chain, record numbers. This might explain some extra memory usage, but not this much.
  • This is without cardano-ledger#1722 ("Fix thunks"), which fixes some thunks. I'll measure again with that fix in, but I'm not expecting much of it. UPDATE: it doesn't improve things.

@mrBliss (Contributor) commented Aug 10, 2020

As a sanity check, I repeated my measurements using mainnet at the end of the Byron era:


Without this PR: 305 MB RAM
With this PR: 455-465 MB RAM

Not the catastrophic increase we saw for Shelley, but still more than expected from #1936: 150 MB more instead of the expected <10 MB 🤔. I'm not sure how the Haskell heap grows; maybe that plays a role here.

I'm wondering whether we're overlooking something 🤔.

@mrBliss (Contributor) commented Aug 11, 2020

@kderme Could you repeat the experiments from #1936 with the Shelley ledger?

@kderme (Contributor, Author) commented Aug 13, 2020

With 8e7b5e2 and IntersectMBO/cardano-ledger#1775 cherry-picked, I still see very high memory usage (maximum residency) on Shelley.

snapEvery = 1:

 4,105,555,200 bytes maximum residency (47 sample(s))
 104,871,740,160 bytes allocated in the heap
 92,482,133,216 bytes copied during GC

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     97207 colls,     0 par   38.548s  38.702s     0.0004s    0.0033s
  Gen  1        47 colls,     0 par   36.904s  40.847s     0.8691s    4.6105s

  MUT     time   60.596s  ( 62.544s elapsed)
  GC      time   75.452s  ( 79.549s elapsed)
  Total   time  136.048s  (142.094s elapsed)

snapEvery = 100:

 831,904,064 bytes maximum residency (120 sample(s))
 104,871,360,408 bytes allocated in the heap
 91,016,416,384 bytes copied during GC

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     97134 colls,     0 par   38.149s  38.292s     0.0004s    0.0030s
  Gen  1       120 colls,     0 par   20.708s  21.219s     0.1768s    0.6442s

  MUT     time   61.557s  ( 63.407s elapsed)
  GC      time   58.857s  ( 59.510s elapsed)
  Total   time  120.415s  (122.918s elapsed)

snapEvery = k:

 682,901,344 bytes maximum residency (127 sample(s))
 104,871,422,048 bytes allocated in the heap
 90,606,678,416 bytes copied during GC

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     97127 colls,     0 par   37.810s  37.952s     0.0004s    0.0035s
  Gen  1       127 colls,     0 par   18.535s  18.971s     0.1494s    0.4656s

  MUT     time   59.905s  ( 61.707s elapsed)
  GC      time   56.345s  ( 56.922s elapsed)
  Total   time  116.251s  (118.630s elapsed)

@kderme (Contributor, Author) commented Aug 16, 2020

It looks like what caused the issue was the delegationTransition fixed in IntersectMBO/cardano-ledger#1779. Creating a new reward Map (with over 40k entries) probably killed sharing between consecutive ledger states.
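
For intuition, here is a minimal illustration, with hypothetical stand-in types rather than the ledger's actual delegationTransition code, of why rebuilding a large Map matters once many consecutive ledger states are kept alive: an incremental update shares almost the whole tree with the previous version, while a rebuilt Map shares nothing, so every retained ledger state pays for a full copy.

    import           Data.Map.Strict (Map)
    import qualified Data.Map.Strict as Map

    type Credential = String   -- hypothetical stand-in for a staking credential
    type Rewards    = Map Credential Integer

    -- Shares all untouched subtrees with the old map: keeping the previous
    -- ledger state alive costs only O(log n) extra nodes.
    addRewardShared :: Credential -> Integer -> Rewards -> Rewards
    addRewardShared = Map.insertWith (+)

    -- Builds a brand-new tree (here only updating existing credentials, for
    -- simplicity): keeping old and new ledger states alive roughly doubles
    -- the memory for this map, for every retained state.
    addRewardRebuilt :: Credential -> Integer -> Rewards -> Rewards
    addRewardRebuilt cred amount rewards =
        Map.fromList
          [ (c, if c == cred then r + amount else r)
          | (c, r) <- Map.toList rewards ]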

snapEvery = 1, without delegation fix:
[heap profile graph: every-1-no-fix-2]

With the fix cherry-picked, the graph looks quite different.

snapEvery = 1, with delegation fix:

[heap profile graph: every-1-fix]

This looks similar to the graph for
snapEvery = 100, with delegation fix:

[heap profile graph: every-100-fix]

@kderme (Contributor, Author) commented Aug 16, 2020

Some points:

  • All these results are with cardano-ledger#1775 ("Avoid using Map.keysSet on nesOsched") cherry-picked.
  • The memory difference between snapEvery=1 vs 100 (with the fix) happens because of utxoInductive (dark green on the snapEvery=1 graph). Maybe it's something we can optimize, or we should check whether it's reasonable.
  • I'd suggest we delay this a bit more, until all fixes are in place (including the fixes around epoch boundaries), to make sure we don't cause big memory spikes.

@mrBliss (Contributor) commented Aug 17, 2020

It looks like what caused the issue was the delegationTransition fixed in input-output-hk/cardano-ledger-specs#1779. Creating a new reward Map (with over 40k entries) probably killed sharing between consecutive ledger states.

[..]

Great, thanks for this investigation! Also nice to see I unknowingly already fixed the issue 🙂

Some points:

  • All these results are with input-output-hk/cardano-ledger-specs#1775 (https://github.com/input-output-hk/cardano-ledger-specs/pull/1775) cherry-picked.

  • The memory difference between snapEvery=1 vs 100 (with the fix) happens because of `utxoInductive` (dark green on the snapEvery=1 graph). Maybe it's something we can optimize, or we should check whether it's reasonable.

I believe it corresponds to this line:

      { _utxo = eval ((txins txb ⋪ utxo) ∪ txouts txb),

which is the main expected contributor to the memory growth, i.e., UTxO changes. Unless that eval is doing something suboptimal, there's not much we can do.
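
For reference, a sketch in plain Data.Map terms, with hypothetical simplified types rather than the ledger's set-algebra DSL, of what that line computes: remove the spent inputs from the UTxO and add the transaction's outputs, which is exactly the per-block growth we expect to retain across the k in-memory states.

    import           Data.Map.Strict (Map)
    import qualified Data.Map.Strict as Map
    import           Data.Set (Set)

    type TxIn  = (String, Word)     -- hypothetical (transaction id, output index)
    type TxOut = (String, Integer)  -- hypothetical (address, amount)
    type UTxO  = Map TxIn TxOut

    -- (txins txb ⋪ utxo) ∪ txouts txb, spelled out with Data.Map: drop the
    -- spent inputs, then add the new outputs.
    applyTxBody :: Set TxIn -> Map TxIn TxOut -> UTxO -> UTxO
    applyTxBody spentInputs newOutputs utxo =
        Map.union newOutputs (Map.withoutKeys utxo spentInputs)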

  • I'd suggest we delay this a bit more, until all fixes are in place (including the fixes around epoch boundaries), to make sure we don't cause big memory spikes.

Which fixes do you mean? IntersectMBO/cardano-ledger#1779 has been merged and IntersectMBO/cardano-ledger#1775 should not affect sharing, right? Do you mean IntersectMBO/cardano-ledger#1785?

I would actually be interested in seeing the impact of an epoch transition on the memory usage (with IntersectMBO/cardano-ledger#1785 applied).

I'm relieved that we can still go through with this simplification after all 😌. But I agree, let's first optimise the overlay schedule so we can reuse the memory freed by that optimisation for this simplification.

@kderme (Contributor, Author) commented Aug 20, 2020

On epoch boundaries the results also seem similar. This is using the db-analyser, starting from a snapshot at the very beginning of epoch 208 and validating up to a very recent block:

snapEvery = 100:
[heap profile graph: every-100]

snapEvery = 1:
[heap profile graph: every-1-boundaries]

Both cases report a maximum residency of 2,500,000,000 bytes and I see at most 4 GB used by top/ps (what's this PINNED memory?).
Total time is slightly higher with snapEvery = 1 (around 10 secs). I believe this is because of more time spent on GC:

snapEvery = 100:

MUT     time   18.213s  ( 56.391s elapsed)
GC      time   64.126s  ( 34.119s elapsed)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4246 colls,  4246 par    9.601s   5.887s     0.0014s    0.0353s
  Gen  1       146 colls,   145 par   131.197s  66.556s     0.4559s    1.3432s

snapEvery = 1:

MUT     time   13.467s  ( 60.153s elapsed)
GC      time   83.366s  ( 44.040s elapsed)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4240 colls,  4240 par   11.467s   7.066s     0.0017s    0.0429s
  Gen  1       164 colls,   163 par   165.699s  83.906s     0.5116s    1.3530s


@mrBliss (Contributor) commented Aug 20, 2020

On epoch boundaries the results also seem similar. This is using the db-analyser, starting from a snapshot at the very beginning of epoch 208 and validating up to a very recent block:

snapEvery = 100:
[..]

snapEvery = 1:
[..]

Both cases report a maximum residency of 2,500,000,000 bytes and I see at most 4 GB used by top/ps (what's this PINNED memory?).

Good, that's what I was expecting/hoping: there can be at most one epoch boundary transition in a range of k blocks, so snapEvery should not matter.

PINNED memory is typically the bytes in ByteStrings. As explained on Slack, my hypothesis is that these might be deserialised Byron addresses.

Total time is slightly higher with snapEvery = 1 (around 10 secs). I believe this is because of more time spent on GC:

snapEvery = 100:

MUT     time   18.213s  ( 56.391s elapsed)
GC      time   64.126s  ( 34.119s elapsed)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4246 colls,  4246 par    9.601s   5.887s     0.0014s    0.0353s
  Gen  1       146 colls,   145 par   131.197s  66.556s     0.4559s    1.3432s

snapEvery = 1:

MUT     time   13.467s  ( 60.153s elapsed)
GC      time   83.366s  ( 44.040s elapsed)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4240 colls,  4240 par   11.467s   7.066s     0.0017s    0.0429s
  Gen  1       164 colls,   163 par   165.699s  83.906s     0.5116s    1.3530s

More things to traverse, so more time; that's expected. Note that profiling skews this result: running with -s but without profiling, the result should be better. Also, that epoch boundary transition generates tons of garbage, so it makes sense that GC is doing a lot of work.

@kderme (Contributor, Author) commented Aug 24, 2020

The results after the fix #2532 look pretty similar again:

before the PR:
[heap profile graph: every-100]

after the PR:
[heap profile graph: pr]

I also tested what happens if we never call prune (keeping all ledger states in memory). This shows that sharing is pretty good, taking into account that by the end there are ~90000 ledger states stored.
[heap profile graph: no-prune]

I also tested times and memory with psrecord (no GHC profiling involved) and the PR doesn't create any visible difference:

[psrecord plot: plot1]

@mrBliss (Contributor) commented Aug 24, 2020

Great, let's merge!

@mrBliss (Contributor) commented Aug 24, 2020

bors merge

@iohk-bors bot commented Aug 24, 2020

👎 Rejected by too few approved reviews

@mrBliss self-requested a review on August 24, 2020 at 15:04

@mrBliss (Contributor) left a comment:

bors merge


@mrBliss (Contributor) commented Aug 25, 2020

bors ping

@iohk-bors bot commented Aug 25, 2020

pong

@mrBliss (Contributor) commented Aug 25, 2020

bors merge

@iohk-bors bot commented Aug 25, 2020

@iohk-bors bot merged commit 04734c4 into master on Aug 25, 2020
@iohk-bors bot deleted the kderme/snapEvery-1 branch on August 25, 2020 at 06:55
mrBliss added a commit that referenced this pull request Aug 26, 2020
Fixes #1935.

Consider the following situation:

    A -> B          -- current chain
      \
       > B' -> C'   -- fork

To validate the header of C', we need a ledger view that is valid for the slot
of C'. We can't always use the current ledger to produce a ledger view, because
our chain might include changes (that applied after the intersection A) to the
ledger view that are not present in the fork. So we must get a ledger view at
the intersection, A, and use that to forecast the ledger view at C'.

Previously, we only kept a ledger state in memory for every 100 blocks, so
obtaining a ledger state at the intersection point might require reading blocks
from disk and reapplying them. As this is expensive for us, but cheap to trigger
by attackers (create headers and serve them to us), this would lead to DoS
possibilities.

For that reason, each ledger state stored snapshots of past ledger views. So to
obtain a ledger view for A, we could ask B for a past ledger view at the slot of
A.

After #2459, we keep all `k` past ledgers in memory, which makes it cheap to ask
for a past ledger and thus past ledger view. This means we no longer need to
store ledger view snapshots in each ledger state. Both for Byron and Shelley we
can remove the ledger view history from the ledger. To remain backwards binary
compatible with existing ledger snapshots, we still allow the ledger view
history in the decoders, but ignore it, and don't encode it anymore.

Consequently, `ledgerViewForecastAtTip` no longer needs to specify *at* which
slot (i.e., A's slot) to make the forecast (i.e., which past ledger view
snapshot to use). Instead, we first get the right past ledger with
`getPastLedger` and use that to make forecasts. This results in some
simplifications in the ChainSyncClient.
mrBliss added a commit that referenced this pull request Aug 30, 2020
Fixes #1935.

Consider the following situation:

    A -> B          -- current chain
      \
       > B' -> C'   -- fork

To validate the header of C', we need a ledger view that is valid for the slot
of C'. We can't always use the current ledger to produce a ledger view, because
our chain might include changes to the ledger view that are not present in the
fork (they were activated after the intersection A). So we must get a ledger
view at the intersection, A, and use that to forecast the ledger view at C'.

Previously, we only kept a ledger state in memory for every 100 blocks, so
obtaining a ledger state at the intersection point might require reading blocks
from disk and reapplying them. As this is expensive for us, but cheap to trigger
by attackers (create headers and serve them to us), this would lead to DoS
possibilities.

For that reason, each ledger state stored snapshots of past ledger views. So to
obtain a ledger view for A, we could ask B for a past ledger view at the slot of
A (and use that to forecast for C').

After #2459, we keep all `k` past ledgers in memory, which makes it cheap to ask
for a past ledger and thus past ledger view. This means we no longer need to
store ledger view snapshots in each ledger state. This was awkward, because we
had a double history: we stored snapshots of the ledger state and each ledger
state stored snapshots of the ledger view.

Both for Byron and Shelley we can remove the ledger view history from the
ledger. To remain backwards binary compatible with existing ledger snapshots, we
still allow the ledger view history in the decoders, but ignore it, and don't
encode it anymore.

Consequently, `ledgerViewForecastAtTip` no longer needs to specify *at* which
slot (i.e., A's slot) to make the forecast (i.e., which past ledger view
snapshot to use). Instead, we first get the right past ledger with
`getPastLedger` and use its ledger view to make forecasts. This results in some
simplifications in the ChainSyncClient.
mrBliss added a commit that referenced this pull request Aug 31, 2020
Fixes #1935, #2506, #2559, and #2562.

Consider the following situation:

    A -> B          -- current chain
      \
       > B' -> C'   -- fork

To validate the header of C', we need a ledger view that is valid for the slot
of C'. We can't always use the current ledger to produce a ledger view, because
our chain might include changes to the ledger view that are not present in the
fork (they were activated after the intersection A). So we must get a ledger
view at the intersection, A, and use that to forecast the ledger view at C'.

Previously, we only kept a ledger state in memory for every 100 blocks, so
obtaining a ledger state at the intersection point might require reading blocks
from disk and reapplying them. As this is expensive for us, but cheap to trigger
by attackers (create headers and serve them to us), this would lead to DoS
possibilities.

For that reason, each ledger state stored snapshots of past ledger views. So to
obtain a ledger view for A, we asked B for a past ledger view at the slot of
A (and used that to forecast for C').

After #2459 we keep all `k` past ledgers in memory, which makes it cheap to ask
for a past ledger and thus a past ledger view. This means we no longer need to
store ledger view snapshots in each ledger state. This was awkward, because we
had a double history: we stored snapshots of the ledger state and each ledger
state stored snapshots of the ledger view.

Both for Byron and Shelley we can remove the ledger view history from the
ledger. To remain backwards binary compatible with existing ledger snapshots, we
still allow the ledger view history in the decoders but ignore it, and don't
encode it anymore.

Consequently, `ledgerViewForecastAtTip` no longer needs to specify *at* which
slot (i.e., A's slot) to make the forecast (i.e., which past ledger view
snapshot to use). Instead, we first get the right past ledger with
`getPastLedger` and use its ledger view to make forecasts. This results in some
simplifications in the ChainSyncClient.
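
As a rough sketch of the flow described above, with hypothetical simplified types rather than the real ouroboros-consensus API: the LedgerDB, which after #2459 holds all of the last k ledger states, gives us the state at the intersection point A, and we forecast the ledger view for C' from that state.

    import Data.Word (Word64)

    newtype Point  = Point Word64  deriving (Eq, Show)   -- stand-in for a chain point
    newtype SlotNo = SlotNo Word64 deriving (Eq, Ord, Show)

    -- Stand-in for the LedgerDB: the last k ledger states keyed by their point.
    newtype LedgerDB l = LedgerDB [(Point, l)]

    -- | Cheap in-memory lookup, possible for any point within the last k
    -- blocks now that every snapshot is kept.
    getPastLedger :: Point -> LedgerDB l -> Maybe l
    getPastLedger pt (LedgerDB checkpoints) = lookup pt checkpoints

    -- | To validate the header of C' on a fork, forecast the ledger view at
    -- its slot from the ledger state at the intersection A, not from the tip.
    ledgerViewForHeader
      :: (l -> SlotNo -> Maybe view)  -- ^ forecast a view from a ledger state
      -> LedgerDB l
      -> Point                        -- ^ intersection point A
      -> SlotNo                       -- ^ slot of the header C'
      -> Maybe view
    ledgerViewForHeader forecast db intersection slot = do
        pastLedger <- getPastLedger intersection db
        forecast pastLedger slot
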
Successfully merging this pull request may close these issues.

Simplify LedgerDB after snapEvery changes to 1