[Fix](storage) Unify globList Implementation Using AWS SDK and Optimize S3TVF Handling #49596
Merged
CalvinKirs merged 4 commits into apache:branch-refactor_property on Mar 28, 2025
Merged commit 6f4568e into apache:branch-refactor_property; 11 of 12 checks passed.
### Background
Previously, the globList implementation used two different protocols for object storage access, which led to inconsistencies between the Frontend (FE) and Backend (BE). To resolve this, this PR migrates globList to the native AWS SDK so that FE and BE access object storage the same way. The change reduces protocol discrepancies, improves maintainability, and is expected to improve performance (to be validated via benchmarking).

Additionally, this PR adjusts how the S3 Table-Valued Function (S3TVF) handles region and endpoint. Instead of requiring these parameters to be specified explicitly, S3TVF now extracts them directly from the S3 URL. As a result, the previous commit that introduced explicit region and endpoint settings has been rolled back. Whether similar changes should be applied consistently across other parts of the system is still open for discussion.
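To illustrate what extracting region and endpoint from an S3 URL can look like, here is a minimal sketch. It is not the code from this PR: `S3UrlInfo` and `S3UrlParser` are hypothetical names, and the sketch assumes standard AWS hostnames of the form `<bucket>.s3.<region>.amazonaws.com` (virtual-hosted style) or `s3.<region>.amazonaws.com/<bucket>` (path style). Other S3-compatible providers would need their own host patterns.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value holder; not a class from the Doris code base.
final class S3UrlInfo {
    final String bucket;
    final String region;
    final String endpoint;
    final String key;

    S3UrlInfo(String bucket, String region, String endpoint, String key) {
        this.bucket = bucket;
        this.region = region;
        this.endpoint = endpoint;
        this.key = key;
    }
}

// Hypothetical parser; assumes AWS-style hostnames only.
final class S3UrlParser {
    // Path style: host is "s3.<region>.amazonaws.com", bucket is the first path segment.
    private static final Pattern PATH_STYLE =
            Pattern.compile("^s3[.-](?<region>[a-z0-9-]+)\\.amazonaws\\.com$");
    // Virtual-hosted style: host is "<bucket>.s3.<region>.amazonaws.com".
    private static final Pattern VIRTUAL_HOSTED =
            Pattern.compile("^(?<bucket>.+)\\.s3[.-](?<region>[a-z0-9-]+)\\.amazonaws\\.com$");

    static S3UrlInfo parse(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        String trimmed = path.startsWith("/") ? path.substring(1) : path;

        Matcher pathStyle = PATH_STYLE.matcher(host);
        if (pathStyle.matches()) {
            int slash = trimmed.indexOf('/');
            String bucket = slash < 0 ? trimmed : trimmed.substring(0, slash);
            String key = slash < 0 ? "" : trimmed.substring(slash + 1);
            String region = pathStyle.group("region");
            return new S3UrlInfo(bucket, region, endpointFor(uri, region), key);
        }

        Matcher virtualHosted = VIRTUAL_HOSTED.matcher(host);
        if (virtualHosted.matches()) {
            String region = virtualHosted.group("region");
            return new S3UrlInfo(virtualHosted.group("bucket"), region,
                    endpointFor(uri, region), trimmed);
        }
        throw new IllegalArgumentException("Unrecognized S3 host: " + host);
    }

    // Rebuilds the regional endpoint without the bucket prefix.
    private static String endpointFor(URI uri, String region) {
        return uri.getScheme() + "://s3." + region + ".amazonaws.com";
    }
}
```

With this sketch, `S3UrlParser.parse("https://my-bucket.s3.us-west-2.amazonaws.com/data/file.csv")` yields bucket `my-bucket`, region `us-west-2`, endpoint `https://s3.us-west-2.amazonaws.com`, and key `data/file.csv`, so the caller does not have to pass region or endpoint separately.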
### Changes
- Migrated globList to AWS SDK Native Implementation
  - Replaced the existing implementation with AWS SDK’s listObjectsV2 API to ensure consistency across object storage operations.
  - Eliminated the need to maintain two different protocols for listing objects.
  - Improved alignment between FE and BE storage access.

Fix S3 storage
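The listing approach described in the changes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `ObjectLister` and `ListPage` are hypothetical stand-ins for a paginated listObjectsV2 call (prefix plus continuation token), `GlobLister` is a made-up name, and only the `*` and `?` wildcards are handled, neither of which crosses a `/`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// One page of results, loosely modeled on a listObjectsV2 response (hypothetical type).
final class ListPage {
    final List<String> keys;
    final String nextToken; // null when there are no further pages

    ListPage(List<String> keys, String nextToken) {
        this.keys = keys;
        this.nextToken = nextToken;
    }
}

// Hypothetical stand-in for the SDK call: list keys under a prefix, one page at a time.
interface ObjectLister {
    ListPage list(String prefix, String continuationToken);
}

final class GlobLister {
    // Longest literal prefix before the first wildcard; used to narrow the listing request.
    static String literalPrefix(String glob) {
        int i = 0;
        while (i < glob.length() && glob.charAt(i) != '*' && glob.charAt(i) != '?') {
            i++;
        }
        return glob.substring(0, i);
    }

    // Translates a simple glob into a regex; '*' and '?' never match '/'.
    static Pattern toRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                sb.append("[^/]*");
            } else if (c == '?') {
                sb.append("[^/]");
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Pages through the listing and keeps only the keys that match the glob.
    static List<String> globList(ObjectLister lister, String glob) {
        String prefix = literalPrefix(glob);
        Pattern pattern = toRegex(glob);
        List<String> matches = new ArrayList<>();
        String token = null;
        do {
            ListPage page = lister.list(prefix, token);
            for (String key : page.keys) {
                if (pattern.matcher(key).matches()) {
                    matches.add(key);
                }
            }
            token = page.nextToken;
        } while (token != null);
        return matches;
    }
}
```

In a real integration, the `ObjectLister` implementation would wrap the SDK's listObjectsV2 request and pass the continuation token through. Narrowing the request to the literal prefix keeps the number of listed keys small, and the glob filter is then applied on the client side.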
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test
- Behavior changed:
- Does this need documentation?

### Check List (For Reviewer who merge this PR)