[HUDI-5345] Avoid fs.exists calls for metadata table in HFileBootstrapIndex #7404
Merged: codope merged 1 commit into apache:master from yihua:HUDI-5345-skip-fs-exist-mt-bootstrap-index on Dec 8, 2022
codope added the priority:critical (production down; pipelines stalled; need help asap), release-0.12.2 (patches targeted for 0.12.2), and metadata (metadata table) labels on Dec 7, 2022
codope approved these changes on Dec 8, 2022
nsivabalan pushed a commit that referenced this pull request on Dec 13, 2022
alexeykudinkin pushed 6 commits to onehouseinc/hudi that referenced this pull request on Dec 14, 2022
alexeykudinkin pushed a commit that referenced this pull request on Dec 14, 2022
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request on Apr 5, 2023
Change Logs
When instantiating the file system view of Hudi, the `HFileBootstrapIndex` is also instantiated, which includes two `fs.exists` calls to check if the bootstrap index is present. This can be completely avoided for the file system view built for reading the metadata table, as the metadata table never uses a bootstrap index.

This PR adds a check on the base path of the table in `HFileBootstrapIndex` and avoids the `fs.exists` calls if it is a metadata table.

Below is an example log from Presto showing the FS calls to S3 when instantiating `HFileBootstrapIndex`.

Impact
This PR avoids the `fs.exists` calls and reduces the latency of instantiating the file system view for the metadata table. With S3 as the storage, 3 requests are avoided, as shown above, which saves at least 40 ms.

This affects the file listing of partitions based on the metadata table in the Presto Hive and Hudi connectors. This performance fix shaves 10+ seconds off listing ~1800 partitions in a Presto query with the metadata table enabled.
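The idea behind the change can be illustrated with a minimal, self-contained Java sketch. It is not Hudi's actual code: `CountingFs`, `BootstrapIndexSketch`, the `.hoodie/metadata` suffix check, and the index file locations are all hypothetical stand-ins, chosen only to show how a base-path check lets the constructor skip the existence checks for a metadata table.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a file system client that counts exists() calls. */
class CountingFs {
    private final Set<String> paths = new HashSet<>();
    private int existsCalls = 0;

    void create(String path) {
        paths.add(path);
    }

    boolean exists(String path) {
        existsCalls++; // on remote storage, each call costs at least one request
        return paths.contains(path);
    }

    int existsCalls() {
        return existsCalls;
    }
}

/** Simplified model of a bootstrap index; an assumption, not Hudi's class. */
class BootstrapIndexSketch {
    // Assumed convention: a metadata table lives under <base>/.hoodie/metadata.
    static final String METADATA_TABLE_SUFFIX = ".hoodie/metadata";

    private final boolean present;

    BootstrapIndexSketch(CountingFs fs, String basePath) {
        if (isMetadataTable(basePath)) {
            // The metadata table never uses a bootstrap index: skip both checks.
            this.present = false;
        } else {
            this.present = fs.exists(partitionIndexPath(basePath))
                    && fs.exists(fileIdIndexPath(basePath));
        }
    }

    static boolean isMetadataTable(String basePath) {
        String p = basePath;
        while (p.endsWith("/")) { // tolerate trailing slashes
            p = p.substring(0, p.length() - 1);
        }
        return p.endsWith(METADATA_TABLE_SUFFIX);
    }

    // Illustrative (hypothetical) index file locations.
    static String partitionIndexPath(String basePath) {
        return basePath + "/.hoodie/.aux/.bootstrap/.partitions";
    }

    static String fileIdIndexPath(String basePath) {
        return basePath + "/.hoodie/.aux/.bootstrap/.fileids";
    }

    boolean isPresent() {
        return present;
    }
}

class BootstrapIndexDemo {
    public static void main(String[] args) {
        CountingFs fs = new CountingFs();

        // Metadata table: no exists() calls are issued.
        new BootstrapIndexSketch(fs, "s3://bucket/table/.hoodie/metadata");
        System.out.println("calls after metadata table: " + fs.existsCalls());

        // Data table with a bootstrap index: both index files are checked.
        String dataTable = "s3://bucket/table";
        fs.create(BootstrapIndexSketch.partitionIndexPath(dataTable));
        fs.create(BootstrapIndexSketch.fileIdIndexPath(dataTable));
        BootstrapIndexSketch idx = new BootstrapIndexSketch(fs, dataTable);
        System.out.println("calls after data table: " + fs.existsCalls()
                + ", present=" + idx.isPresent());
    }
}
```

In this sketch the saving comes purely from the early branch on the base path. On an object store, where an existence check can translate into more than one remote request, skipping the branch removes that cost from every file system view built for the metadata table.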
Risk level
low
Documentation Update
N/A
Contributor's checklist