Implement a way to build an index without blocking concurrent data modifications #7967

Open
hvlad opened this issue Jan 16, 2024 · 3 comments

@hvlad
Member

hvlad commented Jan 16, 2024

Currently, while an index is being built, no modifications of the table data are allowed. This is required to create a correct index that does not miss new keys inserted during the build.

The time during which such a read lock is required could be significantly shortened.

@hvlad hvlad self-assigned this Jan 16, 2024
@aafemt
Contributor

aafemt commented Jan 16, 2024

The time can be shortened to zero, IMHO. Consider this:

  1. The index gets the state "active but unusable".
  2. DML operations work with it as usual, but SELECTs ignore it.
  3. A background table scan runs to add any missing nodes.
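
The three steps above can be sketched as a toy, single-threaded model. Everything here is an assumption made for illustration: the names `Table`, `Index` and `background_build` are hypothetical, a Python set stands in for the b-tree, and a generator `yield` marks the points where concurrent DML could run. It ignores latching, record versions and transactions, which a real engine would have to handle.

```python
class Index:
    def __init__(self):
        self.entries = set()   # (key, record_id) pairs; stands in for a b-tree
        self.usable = False    # "active but unusable" until the scan completes


class Table:
    def __init__(self):
        self.rows = {}         # record_id -> key
        self.index = None

    def insert(self, rid, key):
        self.rows[rid] = key
        if self.index is not None:            # step 2: DML maintains the index
            self.index.entries.add((key, rid))

    def delete(self, rid):
        key = self.rows.pop(rid)
        if self.index is not None:
            self.index.entries.discard((key, rid))

    def lookup(self, key):
        # Step 2: selects ignore the index until it is marked usable.
        if self.index is not None and self.index.usable:
            return sorted(rid for k, rid in self.index.entries if k == key)
        return sorted(rid for rid, k in self.rows.items() if k == key)


def background_build(table, batch=2):
    table.index = Index()                     # step 1: active but unusable
    pending = list(table.rows)                # records that existed at start
    while pending:
        chunk, pending = pending[:batch], pending[batch:]
        for rid in chunk:
            if rid in table.rows:             # skip rows deleted meanwhile
                table.index.entries.add((table.rows[rid], rid))
        yield                                 # concurrent DML may run here
    table.index.usable = True                 # step 3 done: selects may use it


t = Table()
for rid, key in enumerate(["a", "b", "c", "d"]):
    t.insert(rid, key)

build = background_build(t)
next(build)            # scan the first batch (records 0 and 1)
t.insert(4, "e")       # DML during the build goes straight into the index
t.delete(3)            # record 3 was not scanned yet; the scan will skip it
for _ in build:        # finish the scan
    pass
```

In this model the index ends up with exactly one entry per live row, without the table ever being locked against modifications.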

@hvlad
Member Author

hvlad commented Jan 17, 2024

This approach creates a less dense b-tree and could be much slower than our fast_load().
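
The density point can be illustrated with a toy model of leaf pages only. This is a sketch under stated assumptions, not a description of how fast_load() works internally: `bulk_load`, `incremental_insert`, `fill_factor` and `PAGE_CAPACITY` are hypothetical names, pages hold a fixed number of keys, and an overflowing page is split in half. It compares page fill only and says nothing about build speed.

```python
import bisect
import random

PAGE_CAPACITY = 8  # keys per leaf page (illustrative value)


def bulk_load(sorted_keys, capacity=PAGE_CAPACITY):
    # Pack already-sorted keys into pages left to right, filling each page.
    return [sorted_keys[i:i + capacity]
            for i in range(0, len(sorted_keys), capacity)]


def incremental_insert(keys, capacity=PAGE_CAPACITY):
    # Insert keys one at a time; split an overflowing page in half.
    leaves = [[]]
    for key in keys:
        firsts = [leaf[0] for leaf in leaves[1:]]   # separator keys
        pos = bisect.bisect_right(firsts, key)      # leaf that covers the key
        leaf = leaves[pos]
        bisect.insort(leaf, key)
        if len(leaf) > capacity:
            mid = len(leaf) // 2
            leaves[pos:pos + 1] = [leaf[:mid], leaf[mid:]]
    return leaves


def fill_factor(leaves, capacity=PAGE_CAPACITY):
    return sum(len(leaf) for leaf in leaves) / (len(leaves) * capacity)


keys = list(range(1000))
shuffled = keys[:]
random.Random(42).shuffle(shuffled)   # fixed seed for a repeatable run

dense = bulk_load(keys)
sparse = incremental_insert(shuffled)

print(f"bulk load:   {len(dense)} pages, fill {fill_factor(dense):.2f}")
print(f"incremental: {len(sparse)} pages, fill {fill_factor(sparse):.2f}")
```

Packing sorted input uses the minimum possible number of pages, while split-on-overflow insertion leaves pages partly empty, so the incrementally built structure can never come out denser in this model.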

Currently I am thinking about a combined approach:
at the 1st stage, the engine builds the "main" b-tree using a table snapshot and fast_load(), while user attachments maintain a separate ("small") b-tree through their usual DML activity;
at the 2nd stage, the "main" b-tree is "published" as the index b-tree and maintained by user attachments as usual, while the engine merges the "small" b-tree into the "main" b-tree;
after the merge finishes, the index may be used in SELECTs.

Not sure how to handle deletion of index keys during the 1st stage.
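
A minimal sketch of the two-stage idea, assuming sorted Python lists in place of b-trees and plain `sorted()` in place of fast_load(). The class `OnlineIndexBuild` and its method names are hypothetical. Only inserts are modelled; deletions are deliberately left out, since how to handle them during the 1st stage is still an open question here.

```python
import bisect


class OnlineIndexBuild:
    def __init__(self, snapshot_rows):
        self.snapshot = dict(snapshot_rows)  # record_id -> key, as of build start
        self.main = None       # "main" tree: sorted list of (key, record_id)
        self.small = []        # "small" tree fed by DML during stage 1
        self.published = False
        self.usable = False    # SELECTs may use the index only when True

    def on_insert(self, rid, key):
        # Stage 1: DML goes to the small tree. Stage 2: DML goes to main.
        target = self.main if self.published else self.small
        bisect.insort(target, (key, rid))

    def stage1_build_main(self):
        # Stand-in for a bulk load over the table snapshot.
        self.main = sorted((key, rid) for rid, key in self.snapshot.items())

    def publish(self):
        # Start of stage 2: main becomes the index tree seen by DML.
        self.published = True

    def merge_small(self):
        # Rest of stage 2: fold the small tree into main, skipping duplicates.
        for entry in self.small:
            pos = bisect.bisect_left(self.main, entry)
            if pos == len(self.main) or self.main[pos] != entry:
                self.main.insert(pos, entry)
        self.small = []
        self.usable = True


build = OnlineIndexBuild({1: "b", 2: "d"})
build.on_insert(3, "a")      # arrives during stage 1, lands in the small tree
build.stage1_build_main()
build.publish()
build.on_insert(4, "c")      # arrives during stage 2, lands in main directly
build.merge_small()
print(build.main)
```

The split between `publish` and `merge_small` is there to show that DML arriving during the merge is applied to the main tree directly, so nothing new accumulates in the small tree once stage 2 has started.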

@aafemt
Contributor

aafemt commented Jan 17, 2024

> This approach creates a less dense b-tree and could be much slower than our fast_load().

Yes, this is the price of uninterrupted DB operations, which may be acceptable.

> Not sure how to handle deletion of index keys during the 1st stage.

The 1st stage runs in snapshot mode, so garbage collection is blocked and node deletions shouldn't occur at all, no?

@dyemanov dyemanov changed the title from "Implement a way to build index without blocking of data modifications" to "Implement a way to build an index without blocking concurrent data modifications" on Jan 22, 2024