branch-4.1: [improvement](cloud) Support configurable S3 credentials providers #62788#63680
branch-4.1: [improvement](cloud) Support configurable S3 credentials providers #62788#63680wyxxxcat wants to merge 1 commit into
Conversation
…pache#62788) Support configuring the AWS S3 credentials provider type for cloud storage vaults and S3 storage parameters. This allows IAM role based access to use providers other than instance profile, such as environment, system properties, web identity, container, default chain, and anonymous credentials. The credential provider type is propagated through FE properties, thrift/proto definitions, meta service storage vault handling, and cloud recycler S3 accessor creation. ``` CREATE storage vault IF NOT EXISTS s3_vault_2 PROPERTIES ( "type"="S3", "s3.endpoint"="xxx", "s3.role_arn" = "xxx", "s3.region" = "us-west-1", "s3.root.path" = "xxx", "s3.bucket" = "xxx", "provider" = "S3", "use_path_style" = "false", "s3.credentials_provider_type" = "WEB_IDENTITY", "s3_validity_check" = "false" ); ``` ``` curl "127.0.0.1:5678/MetaService/http/get_obj_store_info ``` ``` { "id": "3", "name": "s3_vault_2", "obj_info": { "ctime": "1777011933", "mtime": "1777011933", "id": "3", "bucket": "xxx", "prefix": "wyx_test", "endpoint": "xxx", "region": "us-west-1", "provider": "S3", "external_endpoint": "", "sse_enabled": true, "use_path_style": false, "cred_provider_type": "WEB_IDENTITY", "role_arn": "xxx", "external_id": "" } } ```
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
/review |
FE Regression Coverage ReportIncrement line coverage |
1 similar comment
FE Regression Coverage ReportIncrement line coverage |
There was a problem hiding this comment.
Review summary:
I found blocking issues in the new configurable S3 credentials provider support. The AWS_* alias is only wired for the static cloud/thrift conversion path, so FE object-bound paths can ignore AWS_CREDENTIALS_PROVIDER_TYPE. Also, the newly exposed CONTAINER provider only handles AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, while ECS/Fargate can use AWS_CONTAINER_CREDENTIALS_FULL_URI plus AWS_CONTAINER_AUTHORIZATION_TOKEN; this affects both the cloud recycler path and the BE thrift path made reachable by this PR.
Critical checkpoints:
- Existing inline threads: none were present, so no duplicate comments were submitted.
- User focus: no additional focus was provided.
- Repository AGENTS/code-review guidance: the only applicable AGENTS.md found is for Nereids expression legality checks; this PR does not touch those files, so that checkpoint is not applicable.
- Backward compatibility / runtime behavior: request changes because valid documented AWS credential-provider configurations can be silently ignored or fail at runtime.
| @@ -468,6 +469,7 @@ public static class Env { | |||
|
|
|||
| public static final String ROLE_ARN = "AWS_ROLE_ARN"; | |||
| public static final String EXTERNAL_ID = "AWS_EXTERNAL_ID"; | |||
There was a problem hiding this comment.
This adds the AWS_* spelling for the static cloud/thrift conversion helpers, but the object-bound S3Properties path still does not bind that key because the @ConnectorProperty for awsCredentialsProviderType only lists s3.credentials_provider_type and glue.credentials_provider_type. A user creating S3 properties with AWS_ROLE_ARN plus AWS_CREDENTIALS_PROVIDER_TYPE=env goes through StorageProperties.createPrimary, leaves awsCredentialsProviderType at DEFAULT, and getBackendConfigProperties()/FE STS validation use the wrong provider even though getObjStoreInfoPB() would accept it. Please add this alias to the connector property binding (or normalize before binding) and cover the createPrimary/backend-config path with an AWS_* test.
| return std::make_shared<Aws::Auth::ProfileConfigFileAWSCredentialsProvider>(); | ||
| case CredProviderType::WebIdentity: | ||
| return std::make_shared<Aws::Auth::STSAssumeRoleWebIdentityCredentialsProvider>(); | ||
| case CredProviderType::Container: |
There was a problem hiding this comment.
The explicit CONTAINER provider only looks at AWS_CONTAINER_CREDENTIALS_RELATIVE_URI. ECS/Fargate can provide credentials via AWS_CONTAINER_CREDENTIALS_FULL_URI (with optional AWS_CONTAINER_AUTHORIZATION_TOKEN), and the existing CustomAwsCredentialsProviderChain in this repo already handles both forms. With this implementation, a vault configured with s3.credentials_provider_type=CONTAINER on a FULL_URI-only task constructs TaskRoleCredentialsProvider with an empty relative path and the recycler cannot obtain credentials. Please mirror the relative/full URI selection used by CustomAwsCredentialsProviderChain here.
| return CredProviderType::SystemProperties; | ||
| case TCredProviderType::WEB_IDENTITY: | ||
| return CredProviderType::WebIdentity; | ||
| case TCredProviderType::CONTAINER: |
There was a problem hiding this comment.
Exposing TCredProviderType::CONTAINER makes FE able to send this mode to BE, but BE's S3ClientFactory::_create_credentials_provider container branch only supports AWS_CONTAINER_CREDENTIALS_RELATIVE_URI and ignores AWS_CONTAINER_CREDENTIALS_FULL_URI/AWS_CONTAINER_AUTHORIZATION_TOKEN. That means the new thrift value fails for ECS/Fargate deployments that use the full-URI credential endpoint. Please update the BE provider creation before enabling this enum mapping, and add a full-URI test alongside the relative-URI case.
pick: #62788