Bulk import scans all table metadata when removing load flags. #473

Closed
keith-turner opened this issue May 7, 2018 · 1 comment · Fixed by #3336
Labels
enhancement This issue describes a new feature, improvement, or optimization.

Comments

@keith-turner
Contributor

This is follow-up work to #436. When a bulk import completes, its load flags are removed. To do this, all of the table's metadata is scanned. Instead, it could scan only the metadata range that was bulk imported into.
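A minimal sketch of the difference between the two cleanup strategies, assuming a hypothetical in-memory model rather than Accumulo's real metadata schema or client API. Each tablet is keyed by its end row and holds the fate transaction ids of its load flags. `LoadFlagCleanup`, `removeAll`, `removeInRange`, and the `LAST` sentinel are illustrative names only.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory model, not Accumulo's metadata schema: each tablet is keyed
// by its end row and maps to the fate transaction ids of its load flags.
class LoadFlagCleanup {
  // Stands in for the last tablet's null end row; assumes real rows sort below it.
  static final String LAST = "\uffff";

  static TreeMap<String, Set<Long>> table(String... endRows) {
    TreeMap<String, Set<Long>> tablets = new TreeMap<>();
    for (String endRow : endRows) {
      tablets.put(endRow, new HashSet<>());
    }
    return tablets;
  }

  // Behavior described in the issue: visit every tablet of the table.
  static int removeAll(TreeMap<String, Set<Long>> tablets, long fateTxId) {
    int visited = 0;
    for (Set<Long> flags : tablets.values()) {
      flags.remove(fateTxId);
      visited++;
    }
    return visited;
  }

  // Proposed behavior: visit only tablets overlapping rows [minRow, maxRow].
  // The tablet containing row r is the first one whose end row is >= r.
  static int removeInRange(TreeMap<String, Set<Long>> tablets, long fateTxId,
      String minRow, String maxRow) {
    String firstEnd = tablets.ceilingKey(minRow);
    String lastEnd = tablets.ceilingKey(maxRow);
    int visited = 0;
    for (Set<Long> flags : tablets.subMap(firstEnd, true, lastEnd, true).values()) {
      flags.remove(fateTxId);
      visited++;
    }
    return visited;
  }
}
```

For example, with tablets ending at `c`, `f`, `m`, and `LAST`, a load into rows `d` through `g` touches only the `f` and `m` tablets, so `removeInRange` visits two tablets where `removeAll` visits four. This sketch does not handle the split edge case raised later in this thread.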

keith-turner added the v2.0.0 and enhancement labels May 7, 2018
@keith-turner
Contributor Author

keith-turner commented Oct 31, 2018

I looked into doing this. I was thinking I could just use the min and max rows from the load mapping to scan the metadata table. However, if a tablet containing the min or max row has split, the split may copy some load flags to a child tablet outside of that range. There are three possible ways to avoid this:

  1. Ignore this edge case because metadata compaction will eventually clean up load flags.
  2. Make the split code smarter about copying load flags. Currently, all load flags go to both children, even if the loaded file goes to only one of them. This may be impossible, though, for the case where the file was compacted away.
  3. Get the min and max row in the metadata table at the beginning of the fate operation and use these rows to limit the cleanup at the end. Since the min and max would be recorded before anything is loaded, splits occurring during or after the fate load step would fall within this range. This could be done in the first fate step, when it checks for merges.
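The split edge case and option 3 can be illustrated with a hypothetical in-memory model (not Accumulo's API; every name below is made up for illustration) in which tablets are keyed by end row and hold the fate transaction ids of their load flags. `recordBounds` captures tablet boundaries before anything is loaded, `split` copies all load flags to both children as option 2 describes the current behavior, and the two cleanup methods compare limiting by the load mapping's raw min/max rows against limiting by the recorded bounds.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model: end row -> fate tx ids of load flags. Not Accumulo's schema.
class SplitSafeCleanup {
  static final String LAST = "\uffff"; // stands in for the last tablet's null end row

  // Boundaries captured in the first fate step (option 3), before any loading.
  static final class Bounds {
    final String prevEndRow; // null means "start from the table's first tablet"
    final String endRow;

    Bounds(String prevEndRow, String endRow) {
      this.prevEndRow = prevEndRow;
      this.endRow = endRow;
    }
  }

  static TreeMap<String, Set<Long>> table(String... endRows) {
    TreeMap<String, Set<Long>> tablets = new TreeMap<>();
    for (String endRow : endRows) {
      tablets.put(endRow, new HashSet<>());
    }
    return tablets;
  }

  // Prev end row of the tablet holding minRow, end row of the tablet holding maxRow.
  static Bounds recordBounds(TreeMap<String, Set<Long>> tablets, String minRow, String maxRow) {
    String firstEnd = tablets.ceilingKey(minRow);
    return new Bounds(tablets.lowerKey(firstEnd), tablets.ceilingKey(maxRow));
  }

  // Splitting (prev, parentEnd] at splitRow: both children keep every load flag.
  static void split(TreeMap<String, Set<Long>> tablets, String splitRow) {
    String parentEnd = tablets.ceilingKey(splitRow);
    tablets.put(splitRow, new HashSet<>(tablets.get(parentEnd)));
  }

  // Cleanup limited by the load mapping's min/max rows, evaluated at cleanup time.
  static void cleanupByRows(TreeMap<String, Set<Long>> tablets, long tx,
      String minRow, String maxRow) {
    clear(tablets.subMap(tablets.ceilingKey(minRow), true,
        tablets.ceilingKey(maxRow), true), tx);
  }

  // Cleanup limited by bounds recorded up front; splits only subdivide this range.
  static void cleanupByBounds(TreeMap<String, Set<Long>> tablets, long tx, Bounds b) {
    clear(b.prevEndRow == null
        ? tablets.headMap(b.endRow, true)
        : tablets.subMap(b.prevEndRow, false, b.endRow, true), tx);
  }

  private static void clear(Map<String, Set<Long>> range, long tx) {
    for (Set<Long> flags : range.values()) {
      flags.remove(tx);
    }
  }

  // Number of tablets in the whole table still carrying the given load flag.
  static long remaining(TreeMap<String, Set<Long>> tablets, long tx) {
    return tablets.values().stream().filter(flags -> flags.contains(tx)).count();
  }
}
```

With tablets ending at `c`, `f`, `m`, and `LAST`, a load into rows `d` through `e` flags only the `(c, f]` tablet, and the recorded bounds are `(c, f]`. If that tablet splits at `e5` before cleanup, `cleanupByRows` clears only the `(c, e5]` child and leaves the flag on `(e5, f]`, while `cleanupByBounds` clears both children.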

Option 3 allows the final fate step to avoid opening the load mapping file to find the min and max row.

keith-turner added a commit to keith-turner/accumulo that referenced this issue Apr 24, 2023
keith-turner added a commit to keith-turner/accumulo that referenced this issue May 1, 2023
ctubbsii added this to the 2.1.1 milestone Jul 12, 2024