
KUDU-3371 [fs] Use RocksDB to store LBM metadata #50

Closed

Conversation

acelyc111
Member

Since LogBlockContainer stores block records sequentially in a metadata file, the ratio of live blocks may become very low, which wastes disk space and makes bootstrap slow.

This patch uses RocksDB to store LBM metadata: a new item is Put() into RocksDB when a block is created in the LBM, and the item is Delete()d from RocksDB when the block is removed from the LBM. The data is maintained by RocksDB itself, i.e. deleted items are garbage-collected during compaction, so there is no need for the metadata rewriting that the current LBM performs.
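To make that lifecycle concrete, here is a minimal sketch of the Put-on-create / Delete-on-remove pattern against the plain RocksDB C++ API. This is not the patch's actual code: the BlockKey() helper and the function names are illustrative, though the '<container_id>.<block_id>' key layout matches the description below.

```cpp
// Minimal sketch of the Put-on-create / Delete-on-remove pattern.
// Not the patch's actual code: BlockKey() and the function names
// are illustrative; the key layout matches the patch description.
#include <string>

#include "rocksdb/db.h"

// Hypothetical helper: keys are "<container_id>.<block_id>".
std::string BlockKey(const std::string& container_id,
                     const std::string& block_id) {
  return container_id + "." + block_id;
}

// Called when a new block is created in the LBM.
rocksdb::Status CreateBlockRecord(rocksdb::DB* db,
                                  const std::string& container_id,
                                  const std::string& block_id,
                                  const std::string& serialized_record) {
  // Persist the serialized BlockRecordPB under the composite key.
  return db->Put(rocksdb::WriteOptions(),
                 BlockKey(container_id, block_id),
                 serialized_record);
}

// Called when the block is removed from the LBM.
rocksdb::Status RemoveBlockRecord(rocksdb::DB* db,
                                  const std::string& container_id,
                                  const std::string& block_id) {
  // RocksDB compaction eventually reclaims the deleted entry, so no
  // LBM-style metadata-file rewrite is needed.
  return db->Delete(rocksdb::WriteOptions(),
                    BlockKey(container_id, block_id));
}
```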

The implementation also reuses most of the LBM logic; the main difference is that block record metadata is stored in RocksDB:

  1. Make LogBlockManager a base (super) class.

  2. The former LBM, which stores metadata in an append-only file, is split out and given a new name, LogfBlockManager. Its behavior is unchanged.

  3. Introduce a new class LogrBlockManager that stores metadata in RocksDB. The main ideas (a code sketch for steps d-f follows this list):
     a. Create container: the data file is created as before; metadata is stored under keys formed from the container's id followed by the block id, e.g. <container_id>.<block_id>. Make sure no such keys exist in RocksDB before the container is created.
     b. Open container: make sure the data file is healthy.
     c. Destruct container: if the container is dead (full and with no live blocks), remove the data file and clean up the keys prefixed by the container's id.
     d. Load container (by ProcessRecords()): iterate over RocksDB in the key range [<container_id>, <next_container_id>); only live block records are populated, and they can be used as before.
     e. Create blocks in a container: Put() serialized BlockRecordPB records into RocksDB in a batch, with keys in the form '<container_id>.<block_id>' as mentioned above.
     f. Remove blocks from a container: construct the keys from the container's id and the block ids, then Delete() them from RocksDB in a batch.

  4. Some refactoring, such as creating and deleting blocks in batches to reduce the number of lock acquisitions.
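Steps d-f above can be sketched as follows, again against the plain RocksDB C++ API and under the assumed '<container_id>.<block_id>' key layout; the function names and signatures are hypothetical, not the patch's actual code.

```cpp
// Sketch of steps d-f under the assumed "<container_id>.<block_id>"
// key layout. Function names and signatures are hypothetical.
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/iterator.h"
#include "rocksdb/write_batch.h"

// (d) Load container: scan keys in [container_id, next_container_id).
void LoadContainer(rocksdb::DB* db,
                   const std::string& container_id,
                   const std::string& next_container_id,
                   std::vector<std::string>* records) {
  rocksdb::ReadOptions opts;
  rocksdb::Slice upper(next_container_id);
  opts.iterate_upper_bound = &upper;  // exclusive upper bound of the scan
  std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(opts));
  for (it->Seek(container_id); it->Valid(); it->Next()) {
    // Each value is a serialized BlockRecordPB; only records still
    // present in RocksDB (i.e. live blocks) are seen here.
    records->push_back(it->value().ToString());
  }
}

// (e)/(f) Create and remove blocks in one atomic batch.
rocksdb::Status ApplyBlockOps(
    rocksdb::DB* db,
    const std::vector<std::pair<std::string, std::string>>& puts,  // key -> record
    const std::vector<std::string>& deletes) {                     // keys to drop
  rocksdb::WriteBatch batch;
  for (const auto& kv : puts) {
    batch.Put(kv.first, kv.second);
  }
  for (const auto& key : deletes) {
    batch.Delete(key);
  }
  return db->Write(rocksdb::WriteOptions(), &batch);
}
```

Bounding the scan with iterate_upper_bound keeps the load step inside one container's key range, and grouping Put()/Delete() calls into a single WriteBatch makes each group of block operations one atomic write, matching the batching refactor in item 4.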

This patch contains the following changes:

  • Adds a new block manager type named 'logr', which uses RocksDB to store LBM metadata; like the existing types, it is selected via the '--block_manager' flag (see the usage example after this list).
  • block_manager-test supports testing LogrBlockManager
  • block_manager-stress-test supports testing LogrBlockManager
  • log_block_manager-test supports testing LogrBlockManager
  • tablet_server-test supports testing LogrBlockManager
  • dense_node-itest supports testing LogrBlockManager
  • kudu-tool-test supports testing LogrBlockManager
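As a usage example, selecting the new backend might look like the following; the '--block_manager' flag and its 'logr' value come from this patch, while the rest of the invocation is illustrative.

```
# Illustrative invocation: pick the RocksDB-backed log block manager.
# Only --block_manager and its 'logr' value come from this patch.
kudu-tserver --block_manager=logr <other flags>
```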

Using RocksDB is optional; the former LBM can be used as before, and tools to convert data between the two implementations can be introduced in the future.

The optimization is significant, as shown in JIRA KUDU-3371: the time cost of the reopen stage is reduced by up to 90%.

Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f

@acelyc111 acelyc111 closed this Feb 20, 2023