
Implement a mark-and-sweep garbage collector #3072

Closed
lemmih opened this issue Jun 26, 2023 · 6 comments · Fixed by #3504 or #3689
Labels
Ready Issue is ready for work and anyone can freely assign it to themselves

Comments

@lemmih
Contributor

lemmih commented Jun 26, 2023

Issue summary

Context: We previously investigated the feasibility of using a mark-and-sweep garbage collector, but it was prohibitively expensive to scan values with RocksDB. New data shows that ParityDB can efficiently scan through all values. This opens the door for a more space-efficient mark-and-sweep GC.

Important caveats: We require state-roots going back 2 x chain_finality = 1800 epochs. Any data that is reachable within the last 1800 epochs must not be garbage collected.

The Forest database contains a persistent (i.e. immutable), directed acyclic graph. This makes implementing a mark-and-sweep garbage collector reasonably straightforward. The algorithm could look like this:

  1. Scan through all IPLD values in the database and mark their hash value in a hash set. Collisions don't affect correctness, so the hash size matters little.
  2. Wait for at least 1800 epochs after the scan.
  3. Traverse the reachable graph from HEAD (including at least the last 1800 epochs) and remove reachable nodes from the hash set.
  4. Scan the database again and delete every value whose hash is still in the hash set (a sketch in Rust follows this list).
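A minimal sketch of these four phases, assuming a hypothetical `Blockstore` trait whose `for_each_value`, `for_each_reachable` and `delete_value` methods are illustration-only stand-ins for the real Forest/ParityDB interfaces:

```rust
use std::collections::HashSet;

/// Hypothetical minimal view of the store, for illustration only.
trait Blockstore {
    /// Visit every raw IPLD value in the database.
    fn for_each_value(&self, f: &mut dyn FnMut(&[u8]));
    /// Visit every value reachable from HEAD within `depth` epochs.
    fn for_each_reachable(&self, depth: u64, f: &mut dyn FnMut(&[u8]));
    /// Delete a value (keyed by its hash) from the database.
    fn delete_value(&self, value: &[u8]);
}

/// Cheap, possibly-colliding 32-bit hash used only for marking.
fn short_hash(value: &[u8]) -> u32 {
    use std::hash::{Hash, Hasher};
    let mut hasher = std::collections::hash_map::DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish() as u32
}

fn mark_and_sweep(db: &impl Blockstore, depth: u64) {
    // 1. Mark: record a short hash of every value currently in the store.
    let mut candidates = HashSet::new();
    db.for_each_value(&mut |v| {
        candidates.insert(short_hash(v));
    });

    // 2. Wait at least `depth` (>= 1800) epochs here, so that anything
    //    written after the scan is never considered for deletion.

    // 3. Filter: unmark everything reachable from HEAD.
    db.for_each_reachable(depth, &mut |v| {
        candidates.remove(&short_hash(v));
    });

    // 4. Sweep: delete values whose short hash is still marked. A collision
    //    can only keep garbage alive; it can never delete reachable data.
    db.for_each_value(&mut |v| {
        if candidates.contains(&short_hash(v)) {
            db.delete_value(v);
        }
    });
}
```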

Notes:

  • If there is a hash collision, some data may not be garbage collected. Hash collisions will never cause reachable data to be garbage collected.
  • ParityDB does not store keys. This is fine in Forest since our keys are always the hash of the IPLD value.
  • Make sure we no longer use buffered writes and use this approach instead if applicable.

Other information and links

Some knowledge regarding CID construction (a reconstruction sketch follows the list):

  1. For the GraphDagCborBlake2b256 parityDB column we currently use this approach (V1, DAG_CBOR codec, Blake2b256 hash). We need to construct the CID manually because this column does not have a btree_index, so we have no way to iterate the keys. This is done for performance reasons, as DAG_CBOR is the most common type of key.
  2. For the GraphFull column we have the btree_index, so we can iterate the keys. Otherwise there would be no way to get the CID hash.
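A sketch of that manual CID reconstruction, assuming the `cid` and `multihash-codetable` crates (the exact crate names and APIs used in Forest may differ):

```rust
use cid::Cid;
use multihash_codetable::{Code, MultihashDigest};

/// DAG-CBOR multicodec code.
const DAG_CBOR: u64 = 0x71;

/// Rebuild the CID of a value stored in the GraphDagCborBlake2b256 column:
/// CIDv1, DAG_CBOR codec, Blake2b256 digest of the raw value.
fn reconstruct_cid(value: &[u8]) -> Cid {
    Cid::new_v1(DAG_CBOR, Code::Blake2b256.digest(value))
}
```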
@lemmih lemmih added the Ready Issue is ready for work and anyone can freely assign it to themselves label Jun 26, 2023
@ruseinov ruseinov self-assigned this Jun 27, 2023
@ruseinov
Contributor

After looking into this, am I correct to assume that:

  1. With this approach we can ditch the whole old and new db concept, as we can just operate on the current db, deleting only unreachable stuff from the previously collected hash set.
  2. We'll need to calculate hashes of IPLD values according to the specified algorithm, which is already implemented in forest.
  3. We need to traverse the graph from HEAD to detect all the reachable nodes within 1800 epochs. The rest can be removed. Are there cases when we need more than 1800 epochs of history?
  4. We aren't currently aiming at parallel DAG traversal; as far as I can see, the current implementation is a recursive function that takes care of visiting everything that's reachable.

Additionally, once the above has been implemented we could look into parallel DAG traversal if that'd make sense performance-wise.

Please correct me if I'm wrong.

@hanabi1224
Contributor

IIRC, the major blocker with parity-db is that it's impossible to retrieve the raw (CID) key with its iteration API; see paritytech/parity-db#187

@lemmih
Contributor Author

lemmih commented Jun 28, 2023

After looking into this, am I correct to assume that:

1. With this approach we can ditch the whole `old and new` db concept, as we can just operate on the current db, deleting only unreachable stuff from the previously collected hash set.

Yes.

2. We'll need to calculate hashes of IPLD values according to the specified algorithm, which is already implemented in forest.

Yes. In essence, a CID is the hash of an IPLD value. But the devil is in the details. We can talk through the details in slack.

3. We need to traverse the graph from HEAD to detect all the reachable nodes within 1800 epochs. The rest can be removed. Are there cases when we need more than 1800 epochs of history?

Yes, 1800 is the absolute minimum, but many node operators want more history than that. The exact number of epochs to keep should be configurable (but still with 1800 as the minimum).

4. We aren't currently aiming at parallel DAG traversal; as far as I can see, the current implementation is a recursive function that takes care of visiting everything that's reachable.

Indeed. When exporting a snapshot, the key-value pairs must be emitted in depth-first order. But for the GC, a parallel traversal would be great.
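A rough sketch of what such a level-parallel walk could look like, assuming `rayon` and treating `get` and `links_of` as placeholders for the blockstore read and DAG-CBOR link extraction:

```rust
use std::collections::HashSet;

use rayon::prelude::*;

/// Level-parallel (breadth-first) walk over the DAG: load each level from the
/// store in parallel, then gather the next level's links. The visit order is
/// unspecified, which is fine for GC marking but not for snapshot export.
fn walk_parallel(
    roots: Vec<Vec<u8>>,
    get: impl Fn(&[u8]) -> Option<Vec<u8>> + Sync,
    links_of: impl Fn(&[u8]) -> Vec<Vec<u8>> + Sync,
    mut visit: impl FnMut(&[u8]),
) {
    let mut seen: HashSet<Vec<u8>> = HashSet::new();
    let mut frontier = roots;
    while !frontier.is_empty() {
        // Drop keys that were already queued in an earlier level.
        frontier.retain(|key| seen.insert(key.clone()));
        // Fetch the whole level from the store in parallel.
        let values: Vec<Vec<u8>> = frontier.par_iter().filter_map(|key| get(key)).collect();
        // Visit sequentially and collect the links forming the next level.
        frontier = values
            .iter()
            .flat_map(|value| {
                visit(value);
                links_of(value)
            })
            .collect();
    }
}
```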

Additionally, once the above has been implemented we could look into parallel DAG traversal if that'd make sense performance-wise.

Please correct me if I'm wrong.

Looks like we're on the same page.

@lemmih
Contributor Author

lemmih commented Jun 28, 2023

IIRC, the major blocker with parity-db is that it's impossible to retrieve the raw (CID) key with its iteration API; see paritytech/parity-db#187

We don't need to retrieve them. We can re-generate them from the IPLD value. It costs a bit of CPU time but isn't too bad.

@ruseinov
Contributor

I have got two questions:

  1. How are we going to deal with the GC progress endpoint and manual runs? I have currently opted to have EITHER manual OR automatic GC; with the current approach there is little sense in allowing both at the same time. But we can if needed.
  2. We probably need to re-think the whole progress report thing too, as reporting GC progress now makes very little sense. What we could do is something like: 30% when we have marked all the keys, 60% after filtering, and 100% after sweeping, but it's pretty inaccurate. As far as I understand, the GC progress report is part of the Filecoin RPC spec; I would love some guidance here (a rough sketch of this staged mapping follows the list).
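For illustration only, the staged mapping suggested above could be as coarse as this (the percentages are the ones proposed in this comment, not taken from any spec):

```rust
/// Coarse GC progress: one bump per completed phase.
#[derive(Clone, Copy, Debug)]
enum GcPhase {
    Idle,
    Marked,
    Filtered,
    Swept,
}

fn progress_percent(phase: GcPhase) -> u8 {
    match phase {
        GcPhase::Idle => 0,
        GcPhase::Marked => 30,
        GcPhase::Filtered => 60,
        GcPhase::Swept => 100,
    }
}
```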

@LesnyRumcajs
Member

LesnyRumcajs commented Nov 7, 2023

Re-opening; the feature was temporarily reverted in #3682.
