@keith-turner
Contributor

No description provided.

@keith-turner
Contributor Author

Noticed this issue again while working on #3337 and created a fix for it. Thinking this fix may be good for 2.1, it could offer nice performance benefits for bulk imports into really large tables.

@ctubbsii
Member

> Noticed this issue again while working on #3337 and created a fix for it. Thinking this fix may be good for 2.1, it could offer nice performance benefits for bulk imports into really large tables.

If you decide to do this against 2.1 instead, and update the PR, then please also update the target version project to 2.1.1 from 3.0.0.

@keith-turner
Contributor Author

I asked for thoughts about merging this to 2.1 in Slack. Ed brought up a good point there. I need to investigate what impact, if any, the changes have on the bulkv1 code.

@keith-turner
Contributor Author

Looked at bulk import V1 in the 2.1 branch and I don't think this optimization could apply to it, because there is no good way to know which files are imported to which tablets. In bulk import V1, intermediate tservers do all of the work of figuring out which files go where, and these tservers know something about the metadata. However, the manager does not know enough to limit the cleanup scans.

We could make this change in 2.1, but it would only benefit bulk import V2. Bulk import V1 would continue to scan the entire table for each bulk import when doing cleanup.
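
To illustrate the difference described above, here is a minimal toy sketch, not the actual change in this PR. It assumes the metadata table can be modeled as a sorted map from tablet end row to a set of load flags. Every name in it (`CleanupScanSketch`, `buildTable`, `cleanRange`, `cleanWholeTable`, the `row0010` style end rows, the file name `f1.rf`) is hypothetical and not taken from the Accumulo code base. The point is only that when the range of tablets touched by an import is known, cleanup can visit just that slice, whereas without that knowledge it has to visit every tablet.

```java
import java.util.HashSet;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: none of these names come from the Accumulo code base.
final class CleanupScanSketch {

  // Hypothetical stand-in for the metadata table: tablet end row -> load flags (file names).
  static NavigableMap<String, Set<String>> buildTable(int tablets) {
    NavigableMap<String, Set<String>> meta = new TreeMap<>();
    for (int i = 0; i < tablets; i++) {
      meta.put(String.format("row%04d", i), new HashSet<>());
    }
    return meta;
  }

  // Cleanup without knowledge of where the file went: every tablet is visited.
  static int cleanWholeTable(NavigableMap<String, Set<String>> meta, String file) {
    return clean(meta, file);
  }

  // Cleanup when the first and last affected tablets are known: only that slice is visited.
  static int cleanRange(NavigableMap<String, Set<String>> meta, String file,
      String firstEndRow, String lastEndRow) {
    return clean(meta.subMap(firstEndRow, true, lastEndRow, true), file);
  }

  // Removes the flag from each tablet in the given view and returns how many were visited.
  private static int clean(NavigableMap<String, Set<String>> tablets, String file) {
    int visited = 0;
    for (Set<String> flags : tablets.values()) {
      flags.remove(file);
      visited++;
    }
    return visited;
  }

  public static void main(String[] args) {
    NavigableMap<String, Set<String>> meta = buildTable(1000);
    // Pretend a bulk import loaded one file into three adjacent tablets.
    for (String row : new String[] {"row0010", "row0011", "row0012"}) {
      meta.get(row).add("f1.rf");
    }
    System.out.println(cleanRange(meta, "f1.rf", "row0010", "row0012")); // prints 3
    System.out.println(cleanWholeTable(meta, "f1.rf")); // prints 1000
  }
}
```

In this toy model the ranged cleanup touches 3 of 1000 tablets, which is the kind of saving that matters for imports into very large tables.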

@keith-turner changed the base branch from main to 2.1 May 1, 2023 14:36
@keith-turner
Contributor Author

keith-turner commented May 1, 2023

I rebased this onto 2.1

@ctubbsii
Member

ctubbsii commented May 1, 2023

GitHub Actions QA checks didn't run on this PR; not sure why. Going to try to add an empty commit to the PR branch to force them to run.
EDIT: Never mind. Closing and reopening triggered the QA checks.

@ctubbsii closed this May 1, 2023
@ctubbsii reopened this May 1, 2023
@keith-turner merged commit e776715 into apache:2.1 May 9, 2023
@ctubbsii linked an issue May 10, 2023 that may be closed by this pull request
@ctubbsii added this to the 2.1.1 milestone Jul 12, 2024

Development

Successfully merging this pull request may close these issues.

Bulk import scans all table metadata when removing load flags.

3 participants