
Spike: garbage collection in Forest #1708

Closed · LesnyRumcajs opened this issue Jul 22, 2022 · 2 comments · Fixed by #2638
Labels: Performance, Status: Needs Triage

Comments

LesnyRumcajs (Member) commented Jul 22, 2022

Issue summary

Currently, we can't just leave Forest running unattended: the database grows more or less linearly, at around 30 GB per day. After ~3 days, the volume holding the mainnet and calibnet databases is 208 GB:

```
197G    ./mainnet/db
197G    ./mainnet
12G     ./calibnet/db
12G     ./calibnet
293M    ./filecoin-proof-parameters
208G    .
```

[Image: chart of database size growing over time]

What we most likely need is a garbage collection mechanism along the lines of the one being introduced in Lotus. Links below.

No code here. This issue is meant to discover the state of the art (Lotus), what Forest is currently doing, and what we can improve.

The outcome of this task is a set of new issues, properly described and estimated. There may be some low-hanging fruit that would at least slow this growth to a reasonable level.

Other information and links

filecoin-project/lotus#6577
filecoin-project/lotus#9056
filecoin-project/lotus#6474

ZenGround0 commented Jul 22, 2022

In case it's useful, here is a quick rundown of the Lotus splitstore system.

History: this was designed and mostly finished by @vyzo one year ago. A big chunk of it has been available as an experimental feature you can enable through a non-default config. The final parts are being finished and should land in the default configuration in a month or two, if testing goes well.

Terminology: in this domain we talk about "blocks" all the time. These are not Filecoin blockchain blocks but hash-linked data blocks, i.e. a single chunk of data indexed by its hash.
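
For concreteness, here is a minimal sketch in Rust (Forest's language) of such a content-addressed blockstore. Everything here is illustrative, not Forest's or Lotus's actual API: a real implementation would use proper CIDs and a persistent backend rather than an in-memory map.

```rust
use std::collections::HashMap;

/// Stand-in for a real CID (content identifier) type.
type Cid = [u8; 32];

/// A content-addressed blockstore: keys are the hashes of the values.
/// These "blocks" are hash-linked data blocks, not blockchain blocks.
#[derive(Default)]
struct BlockStore {
    blocks: HashMap<Cid, Vec<u8>>,
}

impl BlockStore {
    fn get(&self, cid: &Cid) -> Option<&Vec<u8>> {
        self.blocks.get(cid)
    }

    fn put(&mut self, cid: Cid, data: Vec<u8>) {
        self.blocks.insert(cid, data);
    }
}
```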

The splitstore takes the part of the Lotus blockstore used for storing chain data (block headers, messages, state) and separates it into two parts: the hot store and the cold store. The reason for doing this was (IIRC) partly that the Lotus datastore (Badger) had bad scaling properties: datastore transactions grew slower as the datastore grew. The idea was to GC datastore blocks from a hot, temporary datastore into a persistent cold store to keep the hot-store size down.
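
Building on the `BlockStore` sketch above, the hot/cold split might look roughly like this: reads fall through from the hot store to the cold store, and writes always land in the hot store. Again a sketch of the concept, not the actual Lotus implementation.

```rust
/// Hot/cold split: the hot store holds recent chain data and is
/// kept small by GC; the cold store holds everything demoted out
/// of the hot store.
#[derive(Default)]
struct SplitStore {
    hot: BlockStore,
    cold: BlockStore,
}

impl SplitStore {
    /// Reads check the hot store first, then fall back to cold.
    fn get(&self, cid: &Cid) -> Option<&Vec<u8>> {
        self.hot.get(cid).or_else(|| self.cold.get(cid))
    }

    /// Writes always land in the hot store.
    fn put(&mut self, cid: Cid, data: Vec<u8>) {
        self.hot.put(cid, data);
    }
}
```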

The splitstore currently works in Lotus in two major modes (with some extra configuration available). In short, it can throw away all garbage collected from the hot store (discard mode), or it can store all of it in the cold store (universal mode). The last piece is a WIP third mode (prune mode) that GCs unwanted blocks from the cold store. The goal is a default splitstore that GCs from hot to cold and then GCs everything but message and block-header data from cold.
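
These modes could be captured in a config enum along these lines (a sketch of the concepts described above, not Lotus's actual configuration):

```rust
/// What happens to blocks swept out of the hot store depends on mode.
enum SplitStoreMode {
    /// Drop hot-store garbage entirely (small disk footprint).
    Discard,
    /// Move all hot-store garbage into the cold store (keeps everything).
    Universal,
    /// Like Universal, but also GC the cold store, retaining only
    /// message and block-header data (work in progress in Lotus).
    Prune,
}
```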

Currently, hot-store GC happens after a (configurable) number of epochs. GC works by (1) walking the chain (4 finalities back, for safety) and marking recent history that cannot be GCed, (2) optionally writing all of the unmarked garbage to the cold store, and (3) removing the garbage from the hot store. The actual implementation is a bit more complicated because it is quite smart: it (1) protects all blocks accessed or written during compaction from being removed, and (2) uses checkpointing and persistent remove sets so that GC can resume after a crash or shutdown in the middle of the process.
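
Putting the pieces together, the mark-and-sweep compaction described above might be sketched like this, reusing the types from the earlier snippets. `reachable_from_chain_head` is a hypothetical placeholder for the chain walk, and the crash-safety machinery (checkpointing, persistent remove sets) is omitted.

```rust
use std::collections::HashSet;

impl SplitStore {
    /// Mark-and-sweep compaction of the hot store:
    /// (1) walk recent chain state and mark reachable blocks,
    /// (2) optionally copy unmarked blocks to the cold store,
    /// (3) delete unmarked blocks from the hot store.
    fn compact(&mut self, mode: &SplitStoreMode) {
        // (1) Mark: everything reachable within the protected window
        // (e.g. the last 4 finalities) must survive compaction.
        let marked: HashSet<Cid> = self.reachable_from_chain_head();

        // Collect the unmarked (garbage) keys first, so we can
        // mutate the hot store afterwards.
        let garbage: Vec<Cid> = self
            .hot
            .blocks
            .keys()
            .filter(|cid| !marked.contains(*cid))
            .copied()
            .collect();

        // (2) + (3) Sweep: demote or drop everything unmarked.
        for cid in garbage {
            if let Some(data) = self.hot.blocks.remove(&cid) {
                if !matches!(mode, SplitStoreMode::Discard) {
                    self.cold.put(cid, data);
                }
            }
        }
    }

    /// Hypothetical placeholder: a real implementation walks block
    /// headers, messages, and state trees back 4 finalities, and also
    /// protects blocks accessed or written during compaction.
    fn reachable_from_chain_head(&self) -> HashSet<Cid> {
        HashSet::new()
    }
}
```

Note that discard mode simply skips the copy into the cold store, which is what keeps disk usage flat on development nodes (see the steady-state numbers below).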

Overall this design seems like a good approach, especially for supporting a variety of Lotus use cases (full archival of message data: prune mode; small datastore for development: discard mode). However, its design was largely driven by properties of the Lotus-specific datastore dependency. I would recommend understanding Forest's use cases and scaling issues thoroughly before adopting the same design; it's possible you will be able to go with something simpler. Assuming network traffic does not change drastically, you have plenty of time: we have been operating in the "delete the datastore and fetch a new snapshot" mode for going on two years, and it still hasn't become a burning fire.

--edit--

One last thing -- discard mode is quite nice for keeping disk usage constant on development nodes. Here is the steady state, weeks after starting up, when running in discard mode:

```
~/.lotus/datastore$ du --block-size=G
1G      ./client
88G     ./chain
1G      ./staging
1G      ./metadata
124G    ./splitstore/hot.badger.1658400793949152331
4G      ./splitstore/markset.badger/live
4G      ./splitstore/markset.badger
127G    ./splitstore
214G    .
```

LesnyRumcajs (Member, Author) commented:

@ZenGround0 Thanks for the insights! They'll definitely come in handy when we decide to go ahead with cracking this issue.

hanabi1224 linked a pull request (#2638) on Mar 16, 2023 that will close this issue.