
Spike: garbage collection in Forest #1708

Closed · LesnyRumcajs opened this issue Jul 22, 2022 · 2 comments · Fixed by #2638
Labels: Performance, Status: Needs Triage

Comments

LesnyRumcajs (Member) commented Jul 22, 2022

Issue summary

Currently, we can't just leave Forest running unattended: the database grows more or less linearly, at around 30 GB per day. After ~3 days, the volume holding the mainnet and calibnet databases is 208 GB:

```
197G    ./mainnet/db
197G    ./mainnet
12G     ./calibnet/db
12G     ./calibnet
293M    ./filecoin-proof-parameters
208G    .
```

[Image: chart of database size growing over time]

What we most likely need is a garbage collection mechanism along the lines of the one being introduced in Lotus. Links below.

No code here. This issue is meant to discover the state of the art (Lotus), what Forest is currently doing, and what we can improve.

The outcome of this task is a set of new issues, properly described and estimated. There may be some low-hanging fruit that would at least slow this growth to a reasonable level.

Other information and links

filecoin-project/lotus#6577
filecoin-project/lotus#9056
filecoin-project/lotus#6474

ZenGround0 commented Jul 22, 2022

In case it's useful, here is a quick rundown of the Lotus splitstore system.

History: this was designed and mostly finished by @vyzo one year ago. A big chunk of it has been available as an experimental feature you can enable through a non-default config. The final parts are being finished and should land in the default configuration in a month or two, if testing goes well.

Terminology: in this domain we talk about "blocks" all the time. These are not Filecoin blockchain blocks but hash-linked data blocks, i.e. a single chunk of data indexed by its hash.
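
For concreteness, here is a minimal sketch in Rust (Forest's language) of such a content-addressed blockstore. Everything here is illustrative, not Forest's or Lotus's actual API: a real implementation would use proper CIDs and a persistent backend rather than an in-memory map.

```rust
use std::collections::HashMap;

/// Stand-in for a real CID (content identifier) type.
type Cid = [u8; 32];

/// A content-addressed blockstore: keys are the hashes of the values.
/// These "blocks" are hash-linked data blocks, not blockchain blocks.
#[derive(Default)]
struct BlockStore {
    blocks: HashMap<Cid, Vec<u8>>,
}

impl BlockStore {
    fn get(&self, cid: &Cid) -> Option<&Vec<u8>> {
        self.blocks.get(cid)
    }

    fn put(&mut self, cid: Cid, data: Vec<u8>) {
        self.blocks.insert(cid, data);
    }
}
```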

The splitstore takes the part of the Lotus blockstore used for storing chain data (block headers, messages, state) and separates it into two parts: the hot store and the cold store. The reason for doing this was (IIRC) partly that the Lotus datastore (Badger) had bad scaling properties: datastore transactions grew slower as the datastore grew. The idea was to GC datastore blocks from a hot, temporary datastore into a persistent cold store to keep the hot-store size down.
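
Building on the `BlockStore` sketch above, the hot/cold split might look roughly like this: reads fall through from the hot store to the cold store, and writes always land in the hot store. Again a sketch of the concept, not the actual Lotus implementation.

```rust
/// Hot/cold split: the hot store holds recent chain data and is
/// kept small by GC; the cold store holds everything demoted out
/// of the hot store.
#[derive(Default)]
struct SplitStore {
    hot: BlockStore,
    cold: BlockStore,
}

impl SplitStore {
    /// Reads check the hot store first, then fall back to cold.
    fn get(&self, cid: &Cid) -> Option<&Vec<u8>> {
        self.hot.get(cid).or_else(|| self.cold.get(cid))
    }

    /// Writes always land in the hot store.
    fn put(&mut self, cid: Cid, data: Vec<u8>) {
        self.hot.put(cid, data);
    }
}
```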

The splitstore currently works in Lotus in two major modes (with some extra configuration available). In short, it can throw away all garbage collected from the hot store (discard mode), or it can store all of it in the cold store (universal mode). The last piece is a WIP third mode (prune mode) that GCs unwanted blocks from the cold store. The goal is a default splitstore that GCs from hot to cold and then GCs everything but message and block-header data from cold.
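
These modes could be captured in a config enum along these lines (a sketch of the concepts described above, not Lotus's actual configuration):

```rust
/// What happens to blocks swept out of the hot store depends on mode.
enum SplitStoreMode {
    /// Drop hot-store garbage entirely (small disk footprint).
    Discard,
    /// Move all hot-store garbage into the cold store (keeps everything).
    Universal,
    /// Like Universal, but also GC the cold store, retaining only
    /// message and block-header data (work in progress in Lotus).
    Prune,
}
```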

Currently, hot-store GC happens after a (configurable) number of epochs. GC works by (1) walking the chain (4 finalities back, for safety) and marking recent history that cannot be GCed, (2) optionally writing all of the unmarked garbage to the cold store, and (3) removing the garbage from the hot store. The actual implementation is a bit more complicated because it is quite smart: it (1) protects all blocks accessed or written during compaction from being removed, and (2) uses checkpointing and persistent remove sets so that GC can resume after a crash or shutdown in the middle of the process.
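
Putting the pieces together, the mark-and-sweep compaction described above might be sketched like this, reusing the types from the earlier snippets. `reachable_from_chain_head` is a hypothetical placeholder for the chain walk, and the crash-safety machinery (checkpointing, persistent remove sets) is omitted.

```rust
use std::collections::HashSet;

impl SplitStore {
    /// Mark-and-sweep compaction of the hot store:
    /// (1) walk recent chain state and mark reachable blocks,
    /// (2) optionally copy unmarked blocks to the cold store,
    /// (3) delete unmarked blocks from the hot store.
    fn compact(&mut self, mode: &SplitStoreMode) {
        // (1) Mark: everything reachable within the protected window
        // (e.g. the last 4 finalities) must survive compaction.
        let marked: HashSet<Cid> = self.reachable_from_chain_head();

        // Collect the unmarked (garbage) keys first, so we can
        // mutate the hot store afterwards.
        let garbage: Vec<Cid> = self
            .hot
            .blocks
            .keys()
            .filter(|cid| !marked.contains(*cid))
            .copied()
            .collect();

        // (2) + (3) Sweep: demote or drop everything unmarked.
        for cid in garbage {
            if let Some(data) = self.hot.blocks.remove(&cid) {
                if !matches!(mode, SplitStoreMode::Discard) {
                    self.cold.put(cid, data);
                }
            }
        }
    }

    /// Hypothetical placeholder: a real implementation walks block
    /// headers, messages, and state trees back 4 finalities, and also
    /// protects blocks accessed or written during compaction.
    fn reachable_from_chain_head(&self) -> HashSet<Cid> {
        HashSet::new()
    }
}
```

Note that discard mode simply skips the copy into the cold store, which is what keeps disk usage flat on development nodes (see the steady-state numbers below).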

Overall this design seems like a good approach, especially for supporting a variety of Lotus use cases (full archival of message data: prune mode; small datastore for development: discard mode). However, its design was largely driven by properties of the Lotus-specific datastore dependency. I would recommend understanding Forest's use cases and scaling issues thoroughly before adopting the same design; it's possible you will be able to go with something simpler. Assuming network traffic does not change drastically, you have plenty of time: we have been operating in the "delete the datastore and fetch a new snapshot" mode for going on two years, and it still hasn't become a burning fire.

--edit--

One last thing -- discard mode is quite nice for keeping disk usage constant on development nodes. Here is the steady state, weeks after starting up, when running in discard mode:

```
~/.lotus/datastore$ du --block-size=G
1G      ./client
88G     ./chain
1G      ./staging
1G      ./metadata
124G    ./splitstore/hot.badger.1658400793949152331
4G      ./splitstore/markset.badger/live
4G      ./splitstore/markset.badger
127G    ./splitstore
214G    .
```

LesnyRumcajs (Member, Author) commented:

@ZenGround0 Thanks for the insights! They'll definitely come in handy when we decide to go ahead with cracking this issue.

hanabi1224 linked a pull request (#2638) on Mar 16, 2023 that will close this issue.