fixes #473 avoids scanning entire table metadata for bulk import #3336
Conversation
Noticed this issue again while working on #3337 and created a fix for it. I think this fix may be good for 2.1, since it could offer nice performance benefits for bulk imports into really large tables.
If you decide to do this against 2.1 instead and update the PR, then please also update the target version project from 3.0.0 to 2.1.1.
I asked for thoughts about merging this to 2.1 in Slack. Ed brought up a good point there: I need to investigate what impact, if any, the changes have on the bulk v1 code.
I looked at bulk import v1 in the 2.1 branch, and I don't think this optimization can apply to bulk v1 because there is no good way to know which files are imported to which tablets. In bulk v1, intermediate tservers do all of the work of figuring out which files go where, so those tservers know something about the metadata; the manager, however, does not know enough to limit the cleanup scans. We could make this change in 2.1, but it would only benefit bulk import v2. Bulk import v1 would continue to scan the entire table for each bulk import when doing cleanup.
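To make that distinction concrete, here is a small hypothetical sketch (plain Python, not Accumulo code; all names are invented) of the difference between a full metadata scan and a cleanup scan bounded to the contiguous range of tablets that actually received files, which is the information bulk v2 has and bulk v1 lacks:

```python
from bisect import bisect_left

def cleanup_scan(metadata_rows, loaded_range=None):
    """Return the metadata rows a cleanup pass would visit.

    metadata_rows: sorted list of tablet end rows for the table.
    loaded_range:  (first, last) tablet rows touched by the bulk import,
                   or None to emulate a full-table scan (bulk v1 style,
                   where the manager does not know which tablets were hit).
    """
    if loaded_range is None:
        # Bulk v1 style: no per-tablet knowledge, scan every tablet.
        return list(metadata_rows)
    # Bulk v2 style: restrict the scan to the tablets that received files.
    first, last = loaded_range
    lo = bisect_left(metadata_rows, first)
    hi = bisect_left(metadata_rows, last)
    return metadata_rows[lo:hi + 1]

# A table with 10,000 tablets, where one import touched only 11 of them.
tablets = [f"row{i:04d}" for i in range(10000)]
full = cleanup_scan(tablets)                             # visits 10000 rows
bounded = cleanup_scan(tablets, ("row4200", "row4210"))  # visits 11 rows
```

The savings grow with table size: the bounded scan is proportional to the number of tablets the import touched, not the total number of tablets in the table.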
I rebased this onto 2.1 |
The GitHub Actions QA checks didn't run on this PR; not sure why. Going to try adding an empty commit to the PR branch to force them to run.