Maintain BlockHandle of meta blocks by BlockManager #5699

Merged (3 commits) on Dec 22, 2022


@Hzc492 Hzc492 commented Dec 14, 2022

This PR fixes #5698.

BlockManager now maintains a map from block_id to shared_ptr<BlockHandle> for meta blocks, so the BlockHandle of a meta block is not freed when its MetaBlockReader is freed and can be reused by later readers.

Running time of SingleFileStorageManager::LoadDatabase when recovering a TPC-H (SF=100) checkpoint (27GB):
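The lifetime effect can be illustrated with a minimal, self-contained sketch. It is not the DuckDB implementation: the `BlockHandle` struct, the `loads` counter, the weak-pointer `blocks` map, and the helper `LoadsAfterTwoReaders` are simplified assumptions made up for illustration. The only idea taken from this PR is that a second map holding `shared_ptr<BlockHandle>` keeps a meta block's handle alive after the reader that requested it is gone.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

using block_id_t = int64_t;

// Hypothetical stand-in for a buffer-managed block handle.
struct BlockHandle {
	explicit BlockHandle(block_id_t id) : block_id(id) {}
	block_id_t block_id;
};

// Simplified manager (assumption): tracks blocks weakly, meta blocks strongly.
class BlockManager {
public:
	// Returns the live handle for block_id, or creates a new one.
	// Creating a handle stands in for re-reading the block from disk.
	std::shared_ptr<BlockHandle> RegisterBlock(block_id_t block_id) {
		auto entry = blocks.find(block_id);
		if (entry != blocks.end()) {
			if (auto existing = entry->second.lock()) {
				return existing;
			}
		}
		++loads;
		auto handle = std::make_shared<BlockHandle>(block_id);
		blocks[block_id] = handle;
		return handle;
	}

	// Same as RegisterBlock, but also keeps a strong reference to the handle.
	std::shared_ptr<BlockHandle> RegisterMetaBlock(block_id_t block_id) {
		auto handle = RegisterBlock(block_id);
		meta_blocks[block_id] = handle;
		return handle;
	}

	int loads = 0; // number of simulated (re)loads

private:
	std::unordered_map<block_id_t, std::weak_ptr<BlockHandle>> blocks;
	std::unordered_map<block_id_t, std::shared_ptr<BlockHandle>> meta_blocks;
};

// Two short-lived "readers" request the same block one after the other.
// Returns how many simulated loads were needed.
int LoadsAfterTwoReaders(bool pin_meta_blocks) {
	BlockManager manager;
	for (int reader = 0; reader < 2; reader++) {
		auto handle = pin_meta_blocks ? manager.RegisterMetaBlock(7) : manager.RegisterBlock(7);
		assert(handle->block_id == 7);
		// The reader's handle goes out of scope at the end of each iteration.
	}
	return manager.loads;
}
```

In this sketch, `LoadsAfterTwoReaders(false)` needs two loads because the handle expires between readers, while `LoadsAfterTwoReaders(true)` needs only one because the map still owns the handle.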

| latest master | this PR |
| --- | --- |
| 5.4s | 0.47s |

This is a speedup of more than 10x. (Note that this experiment was performed with direct_io=false. With direct_io=true, the improvement is expected to be much greater.)


@Mytherin Mytherin left a comment

Thanks for the PR! Looks great. Nice investigation.

One comment below - otherwise could you just have a look at the failing CI? It looks like a few of the tests run into internal errors after this change.

@@ -268,6 +268,12 @@ shared_ptr<BlockHandle> BlockManager::RegisterBlock(block_id_t block_id) {
	return result;
}

shared_ptr<BlockHandle> BlockManager::RegisterMetaBlock(block_id_t block_id) {
	auto handle = RegisterBlock(block_id);
	meta_blocks[block_id] = handle;
This cache will always persist - perhaps we should clear it after the metadata is read (by calling meta_blocks.clear() after the initial load of a database is completed)?

@Hzc492 Hzc492 marked this pull request as draft December 16, 2022 04:43
@Hzc492 Hzc492 marked this pull request as ready for review December 16, 2022 11:07
Hzc492 commented Dec 16, 2022

Thanks for the comments! I made two changes:

  1. Clear meta_blocks once the load of a database completes; the unit tests now pass.
  2. Move updates of meta_blocks under the protection of blocks_lock.
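
The two changes above can be sketched as follows. This is an illustrative, simplified model rather than the merged code: `ClearMetaBlocks`, `MetaBlockCount`, and the free function `LoadDatabase` are hypothetical names, and the real `BlockManager` has more state. Only `meta_blocks`, `blocks_lock`, and the idea of clearing the map after the initial load come from this conversation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

using block_id_t = int64_t;

// Hypothetical stand-in for a buffer-managed block handle.
struct BlockHandle {
	explicit BlockHandle(block_id_t id) : block_id(id) {}
	block_id_t block_id;
};

class BlockManager {
public:
	// Every update of meta_blocks happens while blocks_lock is held.
	std::shared_ptr<BlockHandle> RegisterMetaBlock(block_id_t block_id) {
		std::lock_guard<std::mutex> guard(blocks_lock);
		auto &slot = meta_blocks[block_id];
		if (!slot) {
			slot = std::make_shared<BlockHandle>(block_id);
		}
		return slot;
	}

	// Hypothetical helper: drops the strong references once they are no longer needed.
	void ClearMetaBlocks() {
		std::lock_guard<std::mutex> guard(blocks_lock);
		meta_blocks.clear();
	}

	// Hypothetical helper used only to inspect the sketch.
	std::size_t MetaBlockCount() {
		std::lock_guard<std::mutex> guard(blocks_lock);
		return meta_blocks.size();
	}

private:
	std::mutex blocks_lock;
	std::unordered_map<block_id_t, std::shared_ptr<BlockHandle>> meta_blocks;
};

// Hypothetical load routine: reads metadata, then clears the cache.
// Returns a weak observer so callers can check whether the handle was released.
std::weak_ptr<BlockHandle> LoadDatabase(BlockManager &manager) {
	std::weak_ptr<BlockHandle> observer;
	{
		auto handle = manager.RegisterMetaBlock(1);
		observer = handle;
	} // the "reader" is gone, but the cache still owns the handle
	assert(!observer.expired());
	manager.ClearMetaBlocks(); // load finished: stop keeping meta blocks alive
	return observer;
}
```

After `LoadDatabase` returns in this sketch, the observer is expired and the map is empty, so the cache no longer holds meta blocks beyond the initial load.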

@Mytherin Mytherin merged commit ec4a460 into duckdb:master Dec 22, 2022
@Mytherin

Thanks!


Successfully merging this pull request may close these issues.

MetaBlocks in buffer cannot be reused