-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Enhancement](iceberg) Support AWS default credentials chain for S3 Tables and REST catalogs #59893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[Enhancement](iceberg) Support AWS default credentials chain for S3 Tables and REST catalogs #59893
Conversation
…ables and REST catalogs Currently, Doris requires explicit AWS credentials for both S3 Tables (iceberg.catalog.type=s3tables) and REST catalogs with SigV4 authentication. This prevents users from leveraging AWS default credentials chain which supports environment variables, EC2 instance profile, EKS IRSA, and ECS container credentials. This change: - Modifies CustomAwsCredentialsProvider to fall back to DefaultCredentialsProvider when explicit credentials are not provided - Adds support for session tokens (temporary credentials) - Updates IcebergRestProperties to only set credential properties when values are explicitly provided, allowing Iceberg to use the default credentials chain No breaking changes: explicit credentials continue to work as before. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 32054 ms |
TPC-DS: Total hot run time: 173870 ms |
ClickBench: Total hot run time: 26.84 s |
FE Regression Coverage ReportIncrement line coverage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks :)
We have this factory class org/apache/doris/datasource/property/common/AwsCredentialsProviderFactory.java.
In practice, we prefer users to explicitly configure how AWS credentials are resolved (e.g. via EC2 or other mechanisms).
Only when no credential-related configuration is provided should we fall back to the default provider chain.
This is consistent with the existing S3 behavior, where users can explicitly declare the credential provider through s3.credentials_provider_type.
|
Hi @CalvinKirs , |
…ing patterns Address PR review feedback by using Doris's AwsCredentialsProviderFactory instead of custom CustomAwsCredentialsProvider class. Changes: - Remove CustomAwsCredentialsProvider in favor of existing patterns - Add credentials-provider-type property for both REST and S3Tables catalogs - Add assume-role.arn and assume-role.external-id support for cross-account access - Use StaticCredentialsProvider for explicit credentials - Use AwsCredentialsProviderFactory.getV2ClassName() for other modes This aligns with the existing AWS credentials handling patterns in Doris and provides a consistent experience across different catalog types. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
@CalvinKirs Thank you for the review feedback! I've refactored the implementation to use Doris's existing Why we removed New approach:
New configuration properties:
Testing:
The implementation now follows the same patterns used elsewhere in Doris for AWS credentials handling, giving users explicit control while maintaining backward compatibility with existing configurations. |
|
run buildall |
TPC-H: Total hot run time: 31137 ms |
TPC-DS: Total hot run time: 174317 ms |
ClickBench: Total hot run time: 26.71 s |
FE UT Coverage ReportIncrement line coverage |
| @ConnectorProperty(names = {"iceberg.rest.credentials-provider-type"}, | ||
| required = false, | ||
| description = "The AWS credentials provider type for REST catalog authentication. " | ||
| + "Options: DEFAULT, ENV, SYSTEM_PROPERTIES, WEB_IDENTITY, CONTAINER, INSTANCE_PROFILE. " | ||
| + "When explicit credentials (access-key-id/secret-access-key) are provided, they take precedence. " | ||
| + "When no explicit credentials are provided, this determines how credentials are resolved.") | ||
| private String icebergRestCredentialsProviderType = ""; | ||
|
|
||
| @ConnectorProperty(names = {"iceberg.rest.assume-role.arn", "s3.role_arn"}, | ||
| required = false, | ||
| description = "The IAM role ARN to assume for cross-account access. " | ||
| + "When set, uses STS AssumeRole to get temporary credentials.") | ||
| private String icebergRestAssumeRoleArn = ""; | ||
|
|
||
| @ConnectorProperty(names = {"iceberg.rest.assume-role.external-id", "s3.external_id"}, | ||
| required = false, | ||
| description = "The external ID for STS AssumeRole, used for cross-account access security.") | ||
| private String icebergRestAssumeRoleExternalId = ""; | ||
|
|
||
| private AwsCredentialsProviderMode awsCredentialsProviderMode; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use a storage properties object (i.e., S3Properties) to receive these attributes, which allows us to unify property handling. For your reference, this is similar to how org.apache.doris.datasource.property.metastore.IcebergRestProperties#toS3FileIOProperties works. The same approach applies to S3Tables as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I've refactored IcebergRestProperties to use S3Properties for unified credential handling, following the same pattern as IcebergS3TablesMetaStoreProperties.
Changes made:
- Added
S3Properties s3Propertiesfield, initialized viaS3Properties.of(origProps)ininitNormalizeAndCheckProps() - Removed the dedicated
icebergRestAccessKeyIdandicebergRestSecretAccessKey@ConnectorPropertyfields - Updated
addGlueRestCatalogProperties()to get credentials froms3Properties.getAccessKey()ands3Properties.getSecretKey()
Benefits:
- Unified property handling across REST catalog and S3Tables catalog
- Users can now use either
iceberg.rest.access-key-idors3.access_key(and other S3Properties aliases) for credentials - Explicit credentials are optional - the code gracefully falls back to assume role or credentials provider
Updated tests to verify the new unified handling behavior.
…in IcebergRestProperties Refactor IcebergRestProperties to use S3Properties for AWS credential handling, following the same pattern as IcebergS3TablesMetaStoreProperties. This unifies property handling across both REST catalog and S3Tables catalog. Changes: - Add S3Properties field and initialize via S3Properties.of(origProps) - Remove dedicated icebergRestAccessKeyId/icebergRestSecretAccessKey fields - Get credentials from s3Properties.getAccessKey()/getSecretKey() - Update tests to reflect the new unified handling behavior This allows users to use either iceberg.rest.* or s3.* property names for credentials, and makes explicit credentials optional (can use assume role or credentials provider instead). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
run buildall |
TPC-H: Total hot run time: 31274 ms |
TPC-DS: Total hot run time: 173875 ms |
ClickBench: Total hot run time: 26.77 s |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
Proposed changes
Currently, Doris requires explicit AWS credentials for both:
iceberg.catalog.type = s3tables(S3 Tables native catalog)iceberg.catalog.type = restwith SigV4 authenticationThis prevents users from leveraging AWS's default credentials chain, which supports:
AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY)Root Cause
The Java Frontend was passing empty strings to the AWS SDK instead of letting Iceberg use the default credentials chain:
CustomAwsCredentialsProvideronly supported explicit credentialsIcebergRestPropertiesalways setrest.access-key-idandrest.secret-access-keyeven when emptySolution
For S3 Tables catalog:
Modified
CustomAwsCredentialsProviderto fall back to AWS SDK'sDefaultCredentialsProviderwhen explicit credentials are not provided.For REST catalog:
Modified
IcebergRestPropertiesto only setrest.access-key-idandrest.secret-access-keywhen explicitly provided, allowing Iceberg to use the default credentials chain.Changes
fe/fe-core/.../iceberg/s3tables/CustomAwsCredentialsProvider.javaDefaultCredentialsProviderfe/fe-core/.../property/metastore/IcebergRestProperties.javaUsage
S3 Tables (without explicit credentials):
CREATE CATALOG lakehouse PROPERTIES ( 'type'= 'iceberg', 'iceberg.catalog.type'= 's3tables', 'warehouse'= 'arn:aws:s3tables:eu-west-1:123456789:bucket/my-bucket', 's3.region'= 'eu-west-1' );REST catalog with SigV4 (without explicit credentials):
CREATE CATALOG lakehouse PROPERTIES ( 'type'= 'iceberg', 'iceberg.catalog.type'= 'rest', 'iceberg.rest.uri'= 'https://s3tables.eu-west-1.amazonaws.com/iceberg', 'iceberg.rest.sigv4-enabled'= 'true', 'iceberg.rest.signing-name'= 's3tables', 'iceberg.rest.signing-region'= 'eu-west-1', 's3.region'= 'eu-west-1' );Compatibility
Further comments
The C++ Backend already supports IAM roles for S3 file access. This PR only addresses the Java Frontend catalog initialization issue.
🤖 Generated with Claude Code