Make sure we never keep more than 1 ledger state in memory #639
```haskell
-- Ledger states are growing to become very big in memory.
-- Before parsing the new ledger state we need to make sure the old ledger state
-- is or can be garbage collected.
writeLedgerState env Nothing
mst <- findStateFromPoint env point delFiles
```
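The ordering in the snippet above is the whole fix: empty the state variable first, then parse. The sketch below models it with a plain `TVar` from `GHC.Conc` (base) instead of the real `StrictTVar`; `LedgerState`, `loadStateAt` and `rollbackTo` are hypothetical stand-ins, not db-sync's API. It assumes no other code holds a reference to the old state.

```haskell
import Data.Maybe (isJust)
import GHC.Conc (TVar, atomically, newTVarIO, readTVarIO, writeTVar)

-- Hypothetical stand-in for CardanoLedgerState; the real value is far larger.
newtype LedgerState = LedgerState [Int]

-- Hypothetical stand-in for findStateFromPoint: "parses" a snapshot for the
-- given slot. Here it just builds a list.
loadStateAt :: Int -> IO (Maybe LedgerState)
loadStateAt slot = pure (Just (LedgerState [0 .. slot]))

-- Roll back to a slot. The variable is emptied *before* the new state is
-- parsed, so (assuming the caller holds no other reference) the old state is
-- unreachable and the GC may reclaim it while the new one is being built.
rollbackTo :: TVar (Maybe LedgerState) -> Int -> IO Bool
rollbackTo var slot = do
  atomically $ writeTVar var Nothing -- drop our reference to the old state
  mst <- loadStateAt slot            -- peak residency: roughly one state, not two
  atomically $ writeTVar var mst
  pure (isJust mst)
```

If the two writes were collapsed into a single `writeTVar var mst` after parsing, the old state would stay reachable through `var` for the whole parse, which is exactly the memory spike this PR removes.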
Maybe we want to force a GC at this point, since the old ledger state is a very big chunk of memory.
`performMajorGC`?
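A minimal sketch of what forcing the collection could look like, assuming a plain `TVar (Maybe a)` in place of the real `StrictTVar`; `clearAndCollect` is a hypothetical helper name. `performMajorGC` is the real function from `System.Mem` in base. It only helps if it runs *after* the reference is dropped, since a still-reachable state cannot be collected.

```haskell
import GHC.Conc (TVar, atomically, newTVarIO, readTVarIO, writeTVar)
import System.Mem (performMajorGC)

-- Hypothetical stand-in for the real ledger state type.
newtype LedgerState = LedgerState [Int]

-- Empty the state variable, then ask the RTS for a major collection so the
-- old state's heap is reclaimed before parsing starts, instead of whenever
-- the next major GC happens to run.
clearAndCollect :: TVar (Maybe a) -> IO ()
clearAndCollect var = do
  atomically $ writeTVar var Nothing
  performMajorGC
```

A forced major GC is a stop-the-world pause, which seems acceptable on a rare rollback but would not be something to do on every block.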
```haskell
readStateUnsafe env = do
  mState <- readTVar $ leStateVar env
  case mState of
    Nothing -> panic "ledger state is not found"
```
I realize this is still a work in progress, but the `panic` message should include the module name, so that if it is ever hit we know where it came from.
```haskell
  mState <- readTVar $ leStateVar env
  case mState of
    Nothing -> panic "ledger state is not found"
    Just st -> return st
```
The rest of the code uses `pure` instead of `return`, in preparation for https://gitlab.haskell.org/ghc/ghc/-/wikis/proposal/monad-of-no-return .
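With both review suggestions applied, the function might look like the sketch below. It is self-contained rather than the real module: `LedgerState` and `LedgerEnv` are simplified hypothetical stand-ins, a base `TVar` replaces `StrictTVar`, and `error` replaces the `panic` used in the original code (which comes from a custom prelude). The message text is the one from the revised version of this PR.

```haskell
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar)

-- Hypothetical, simplified stand-ins for the real types.
newtype LedgerState = LedgerState Int
newtype LedgerEnv = LedgerEnv { leStateVar :: TVar (Maybe LedgerState) }

-- The error message is prefixed with module and function name, and 'pure'
-- is used instead of 'return'.
readStateUnsafe :: LedgerEnv -> STM LedgerState
readStateUnsafe env = do
  mState <- readTVar (leStateVar env)
  case mState of
    Nothing -> error "LedgerState.readStateUnsafe: Ledger state is not found"
    Just st -> pure st
```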
```diff
@@ -128,7 +127,7 @@ data LedgerEnv = LedgerEnv
   { leProtocolInfo :: !(Consensus.ProtocolInfo IO CardanoBlock)
   , leDir :: !LedgerStateDir
   , leNetwork :: !Ledger.Network
-  , leStateVar :: !(StrictTVar IO CardanoLedgerState)
+  , leStateVar :: !(StrictTVar IO (Maybe CardanoLedgerState))
```
I wonder if we could have an empty or minimal ledger state there instead of having the `Maybe`.
`leStateVar` is read in only one place: `applyBlock`. If we had a minimal state at this point, it would simply lead to more cryptic errors.
We could try to make this type-safe. This would probably need to follow the approach of ouroboros-network with typed protocols and parametrise `LedgerEnv` over the `Nat n`. Probably a pretty big refactoring.
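To illustrate the type-safe idea, here is a small pure sketch. Instead of indexing over a `Nat n` as suggested, it swaps in a simpler two-valued promoted kind (`DataKinds` plus a GADT), which is enough to show the effect: `applyBlock` cannot be called on an environment without a state. `Env`, `Status`, `loadLedger`, `dropLedger` and `stateOf` are all hypothetical names. The sketch deliberately ignores the shared mutable `TVar`, which is where most of the refactoring cost would lie.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Hypothetical stand-in for the real ledger state.
newtype LedgerState = LedgerState Int

-- Whether the environment currently holds a ledger state.
data Status = Empty | Loaded

-- The status lives in the type, so there is no Maybe to match on and no
-- panic branch: an 'Env Empty' simply has no state inside it.
data Env (s :: Status) where
  EmptyEnv  :: Env 'Empty
  LoadedEnv :: LedgerState -> Env 'Loaded

-- Loading is the only way to go from Empty to Loaded.
loadLedger :: Int -> Env 'Empty -> Env 'Loaded
loadLedger slot EmptyEnv = LoadedEnv (LedgerState slot)

-- A rollback drops the old state first, returning to Empty.
dropLedger :: Env 'Loaded -> Env 'Empty
dropLedger (LoadedEnv _) = EmptyEnv

-- Only a loaded environment is accepted; passing an 'Env Empty' is a
-- compile-time error rather than a runtime panic.
applyBlock :: Int -> Env 'Loaded -> Env 'Loaded
applyBlock delta (LoadedEnv (LedgerState n)) = LoadedEnv (LedgerState (n + delta))

stateOf :: Env 'Loaded -> Int
stateOf (LoadedEnv (LedgerState n)) = n
```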
`leStateVar` can be empty in two cases: after initialisation, and when `loadLedgerAtPoint` returns `Left`. In both cases we send the node a `FindIntersect` message. The node replies with the point we should roll back to, and `loadLedgerAtPoint` is called again.
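The recovery path for the `Left` case can be modelled as a small pure function. This is a sketch under simplifying assumptions: the real `loadLedgerAtPoint` presumably runs in `IO` against snapshot files, while the version here is a hypothetical pure stand-in that fails for negative slots. `Point`, `Action` and `handleRollback` are likewise hypothetical.

```haskell
-- Hypothetical types modelling the flow described above.
newtype Point = Point Int deriving (Eq, Show)
newtype LedgerState = LedgerState Int deriving (Eq, Show)

-- What db-sync should do next.
data Action = SendFindIntersect | Continue deriving (Eq, Show)

-- Hypothetical stand-in for loadLedgerAtPoint: fails when there is no
-- matching snapshot (modelled here as a negative slot).
loadLedgerAtPoint :: Point -> Either String LedgerState
loadLedgerAtPoint (Point s)
  | s < 0 = Left "no snapshot for this point"
  | otherwise = Right (LedgerState s)

-- On Left the state stays empty and we ask the node for an intersection;
-- the node answers with a rollback point and this is called again.
handleRollback :: Point -> (Maybe LedgerState, Action)
handleRollback p =
  case loadLedgerAtPoint p of
    Left _ -> (Nothing, SendFindIntersect)
    Right st -> (Just st, Continue)
```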
```haskell
readStateUnsafe env = do
  mState <- readTVar $ leStateVar env
  case mState of
    Nothing -> panic "LedgerState.readStateUnsafe: Ledger state is not found"
```
👍
Ledger states have grown to take a lot of memory, and we must be very careful when handling them. On rollbacks, db-sync parses a new ledger state from a file. We have to make sure that we don't keep a pointer to the old ledger state while parsing the new one, or this can cause big memory spikes.
Rollbacks also happen on startup, which we have fixed in a slightly different way: since the first message is always a `MsgRollBackward`, we don't parse any ledger state before this message. After the fix, the spike disappears.
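The startup behaviour can be sketched as a fold over incoming messages that starts from `Nothing`, so no ledger state exists until the first `MsgRollBackward`. The message type below is a hypothetical simplification (a slot number on rollback, a numeric "block" on roll forward), not the real chain-sync protocol types, and "loading" a state is just constructing a value.

```haskell
import Control.Monad (foldM)

-- Hypothetical, simplified chain-sync messages.
data Msg = MsgRollBackward Int | MsgRollForward Int

newtype LedgerState = LedgerState Int deriving (Eq, Show)

-- Start from Nothing: nothing is parsed at startup, because the first
-- message is a rollback that would replace any pre-loaded state anyway.
step :: Maybe LedgerState -> Msg -> Either String (Maybe LedgerState)
step _ (MsgRollBackward slot) = Right (Just (LedgerState slot)) -- old state not kept
step (Just (LedgerState n)) (MsgRollForward b) = Right (Just (LedgerState (n + b)))
step Nothing (MsgRollForward _) = Left "roll forward before any rollback"

run :: [Msg] -> Either String (Maybe LedgerState)
run = foldM step Nothing
```

For example, `run [MsgRollBackward 3, MsgRollForward 2]` gives `Right (Just (LedgerState 5))`, while a stream that starts with `MsgRollForward` is rejected.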