[fix](s3) Add anonymous credential fallback for S3 TVF and Broker Load on public buckets#60515
dataroaring wants to merge 2 commits into master
When Doris runs on an instance with an IAM role, the default AWS credential chain picks up instance profile credentials before reaching the anonymous fallback. If that role lacks s3:ListBucket on a public bucket, the S3 TVF query fails with 403.

Add retry-with-anonymous logic in S3TableValuedFunction: when parseFile() fails with 403 and no explicit credentials (access_key, secret_key, or role_arn) were provided, switch to ANONYMOUS credentials and retry. All three property maps (storageProperties, backendConnectProperties, processedParams) are updated so both FE listing and BE data reading use anonymous access.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
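The retry described in this commit message can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: the class AnonymousRetrySketch, the Lister interface, ListingException, the credentials_provider_type key, and the substring match on "403" are all assumptions made for the example. Only the overall flow (retry on 403 when no access_key, secret_key, or role_arn is set, and rethrow the original error if the retry also fails) comes from the PR text.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of retry-with-anonymous; not the actual Doris code. */
class AnonymousRetrySketch {
    /** Stand-in for the exception that parseFile() throws in the PR. */
    static class ListingException extends Exception {
        ListingException(String message) {
            super(message);
        }
    }

    /** Hypothetical listing call that returns a file count or throws. */
    interface Lister {
        int list(Map<String, String> props) throws ListingException;
    }

    // Assumed property key; the real Doris key may be spelled differently.
    static final String PROVIDER_KEY = "credentials_provider_type";

    static boolean hasExplicitCredentials(Map<String, String> props) {
        return props.containsKey("access_key")
                || props.containsKey("secret_key")
                || props.containsKey("role_arn");
    }

    /** Retry only on a 403 when the user supplied no credentials. */
    static boolean shouldRetryWithAnonymous(ListingException e, Map<String, String> props) {
        String message = e.getMessage();
        return message != null && message.contains("403") && !hasExplicitCredentials(props);
    }

    static int listWithFallback(Lister lister, Map<String, String> props) throws ListingException {
        try {
            return lister.list(props);
        } catch (ListingException original) {
            if (!shouldRetryWithAnonymous(original, props)) {
                throw original;
            }
            Map<String, String> anonymous = new HashMap<>(props);
            anonymous.put(PROVIDER_KEY, "ANONYMOUS");
            try {
                int count = lister.list(anonymous);
                // Keep the switch so later readers of props also go anonymous.
                props.put(PROVIDER_KEY, "ANONYMOUS");
                return count;
            } catch (ListingException retryFailure) {
                // Surface the first error; it describes what the user actually hit.
                throw original;
            }
        }
    }
}
```

The sketch only commits the anonymous setting to the caller's map after the retry succeeds, so a failed retry leaves the original configuration untouched.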
Pull request overview
This PR adds automatic fallback to anonymous S3 credentials when accessing public S3 buckets from Doris instances running with IAM roles. The issue occurs because the AWS credential chain picks up instance profile credentials before trying anonymous access, causing 403 errors on public buckets.
Changes:
- Adds retry-with-anonymous logic in S3TableValuedFunction that triggers on 403 errors when no explicit credentials are provided
- Updates all three property maps (storageProperties, backendConnectProperties, processedParams) to use anonymous credentials during retry
- Includes comprehensive unit tests covering all edge cases (explicit credentials, role ARN, retry failures, non-403 errors)
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| fe/fe-core/src/main/java/org/apache/doris/tablefunction/S3TableValuedFunction.java | Implements the anonymous credential fallback logic with retry mechanism and credential checks |
| fe/fe-core/src/test/java/org/apache/doris/tablefunction/S3TableValuedFunctionTest.java | Adds comprehensive test coverage for all anonymous fallback scenarios |
        parseFile();
    } catch (AnalysisException e) {
        if (shouldRetryWithAnonymous(e)) {
            LOG.info("S3 TVF got 403 with no explicit credentials, retrying with anonymous access");
The code uses LOG (at lines 67 and 71) which is inherited from the parent class ExternalFileTableValuedFunction. This means log messages will be associated with the parent class name rather than S3TableValuedFunction. For consistency with other table-valued functions in the codebase (e.g., HdfsTableValuedFunction defines its own LOG at line 38), consider adding a static LOG field specific to this class.
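The difference the reviewer points at can be illustrated with a small sketch. It uses java.util.logging so that it runs without dependencies, which is a substitution made for the example; Doris's own logging setup is not shown in this thread. The class names ParentTvfSketch, InheritingTvfSketch, and ChildTvfSketch are hypothetical.

```java
import java.util.logging.Logger;

/** Hypothetical parent class that owns a shared logger. */
class ParentTvfSketch {
    protected static final Logger LOG = Logger.getLogger(ParentTvfSketch.class.getName());

    String loggerName() {
        return LOG.getName();
    }
}

/** Inherits LOG, so its messages are attributed to the parent class. */
class InheritingTvfSketch extends ParentTvfSketch {
}

/** Declares its own LOG, which hides the inherited field inside this class. */
class ChildTvfSketch extends ParentTvfSketch {
    private static final Logger LOG = Logger.getLogger(ChildTvfSketch.class.getName());

    @Override
    String loggerName() {
        return LOG.getName();
    }
}
```

With a class-specific field, log lines emitted from the subclass carry the subclass name, which makes filtering by logger name more useful.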
run buildall
TPC-H: Total hot run time: 30972 ms
ClickBench: Total hot run time: 28.63 s
282d5f8 to f35e2df
run buildall
… S3 buckets

When Doris runs on an instance with an IAM role, the default AWS credential chain picks up instance profile credentials before reaching the anonymous fallback. If that role lacks s3:ListBucket on a public bucket, Broker Load fails with 403 during the pending task file listing.

Add retry-with-anonymous logic in BrokerLoadPendingTask.getAllFileStatus(): when BrokerUtil.parseFile() fails with 403 and no explicit credentials (access_key, secret_key, or role_arn) were provided, switch to ANONYMOUS credentials and retry. Both the pending task's and parent job's brokerDesc are updated so the BE scan phase also uses anonymous access.

Extract the shared 403 detection and anonymous BrokerDesc creation logic into BrokerDesc.isS3AccessDeniedWithoutExplicitCredentials() and BrokerDesc.withAnonymousCredentials(), consolidating the duplicate code from S3TableValuedFunction.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
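One way the two extracted helpers might look is sketched below. The method names isS3AccessDeniedWithoutExplicitCredentials() and withAnonymousCredentials() are taken from the commit message; everything else (the BrokerDescSketch, JobSketch, and TaskSketch classes, the credentials_provider_type key, the "403" substring check, and the choice to return a copy rather than mutate in place) is an assumption for illustration and may differ from the real BrokerDesc.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for BrokerDesc; property keys are assumptions. */
final class BrokerDescSketch {
    static final String PROVIDER_KEY = "credentials_provider_type";
    private final Map<String, String> properties;

    BrokerDescSketch(Map<String, String> properties) {
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
    }

    Map<String, String> getProperties() {
        return properties;
    }

    boolean hasExplicitCredentials() {
        return properties.containsKey("access_key")
                || properties.containsKey("secret_key")
                || properties.containsKey("role_arn");
    }

    /** True only for a 403 raised while the user supplied no credentials. */
    boolean isS3AccessDeniedWithoutExplicitCredentials(Exception e) {
        String message = e.getMessage();
        return message != null && message.contains("403") && !hasExplicitCredentials();
    }

    /** Returns a copy that requests anonymous access; this instance is unchanged. */
    BrokerDescSketch withAnonymousCredentials() {
        Map<String, String> copy = new HashMap<>(properties);
        copy.put(PROVIDER_KEY, "ANONYMOUS");
        return new BrokerDescSketch(copy);
    }
}

/** Hypothetical parent job holding the descriptor used by the scan phase. */
class JobSketch {
    BrokerDescSketch brokerDesc;
}

/** Hypothetical pending task that shares the descriptor with its job. */
class TaskSketch {
    BrokerDescSketch brokerDesc;
    JobSketch job;

    /** Updates both references so listing and scanning agree on credentials. */
    void switchToAnonymous() {
        BrokerDescSketch anonymous = brokerDesc.withAnonymousCredentials();
        this.brokerDesc = anonymous;
        this.job.brokerDesc = anonymous;
    }
}
```

Returning a new descriptor keeps the original available for error reporting if the anonymous retry also fails.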
f35e2df to 0babf23
TPC-H: Total hot run time: 30812 ms
ClickBench: Total hot run time: 28.03 s
run buildall
TPC-H: Total hot run time: 31330 ms
ClickBench: Total hot run time: 28.4 s
FE Regression Coverage Report: Increment line coverage
Summary
- Add anonymous credential fallback for S3TableValuedFunction (TVF) and BrokerLoadPendingTask (Broker Load): when parseFile() fails with 403 and no explicit S3 credentials were provided, switch to ANONYMOUS credentials and retry
- Update both the pending task's and the parent job's brokerDesc so the downstream BE scan phase also uses anonymous access
- Extract the shared 403 detection and anonymous BrokerDesc creation logic into BrokerDesc.isS3AccessDeniedWithoutExplicitCredentials() and BrokerDesc.withAnonymousCredentials(), consolidating logic used by both paths
Test plan
- S3TableValuedFunctionTest: TVF anonymous fallback tests (403 no creds, explicit creds, both fail, non-403)
- BrokerLoadPendingTaskTest.testAnonymousFallbackOn403NoCredentials: 403 with no credentials triggers anonymous fallback, both task and job brokerDesc updated
- BrokerLoadPendingTaskTest.testNoFallbackWhenExplicitCredentials: 403 with explicit access_key/secret_key does not trigger fallback
- BrokerLoadPendingTaskTest.testOriginalErrorThrownWhenBothAttemptsFail: when anonymous retry also fails, original error is thrown
- BrokerLoadPendingTaskTest.testNoFallbackOnNon403Error: non-403 errors (e.g. 404) do not trigger fallback

🤖 Generated with Claude Code