[FLINK-39166] Add bucket-level configuration support to Native S3 FileSystem #27788

Open
Samrat002 wants to merge 1 commit into apache:master from Samrat002:FLINK-39166-multi-bucket-support

@Samrat002 Samrat002 commented Mar 20, 2026

What is the purpose of the change

Add bucket-level configuration support to the Native S3 FileSystem so that different S3 buckets can use different connection settings, e.g. path-style access, endpoint, credentials, SSE, and assume-role. This closes a gap between the native S3 file system and the Hadoop S3 FileSystem.

Brief change log

  • Added BucketConfigProvider to parse bucket-specific configuration from Flink config using the format s3.bucket.<bucket-name>.<property>
  • Added S3BucketConfig to hold bucket-level settings (endpoint, path-style, credentials, SSE, assume-role)
  • Added S3ClientProviderCache to cache S3 client providers per bucket configuration
  • Updated NativeS3FileSystem to use bucket-specific clients when configured, otherwise the default client
  • Updated NativeS3FileSystemFactory to create and pass BucketConfigProvider, and added CONNECTION_TIMEOUT and SOCKET_TIMEOUT options
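
As an illustration of the s3.bucket.<bucket-name>.<property> format described above, here is a minimal sketch of how such keys could be grouped by bucket. It is not the PR's BucketConfigProvider: the class name BucketKeyParser and the property names in KNOWN_PROPERTIES are hypothetical, and the suffix-matching strategy is only one possible way to handle bucket names that contain dots.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; not the BucketConfigProvider from this PR. */
public class BucketKeyParser {
    private static final String PREFIX = "s3.bucket.";

    // Hypothetical property names; the real option keys are defined by the PR.
    private static final List<String> KNOWN_PROPERTIES =
            List.of("endpoint", "path-style-access", "access-key", "secret-key");

    /** Groups keys of the form s3.bucket.<bucket-name>.<property> by bucket name. */
    public static Map<String, Map<String, String>> parse(Map<String, String> flinkConfig) {
        Map<String, Map<String, String>> byBucket = new HashMap<>();
        for (Map.Entry<String, String> entry : flinkConfig.entrySet()) {
            String key = entry.getKey();
            if (!key.startsWith(PREFIX)) {
                continue; // not a bucket-level key
            }
            String rest = key.substring(PREFIX.length());
            for (String property : KNOWN_PROPERTIES) {
                String suffix = "." + property;
                // Match the property as a suffix so dots inside the bucket name survive.
                if (rest.endsWith(suffix) && rest.length() > suffix.length()) {
                    String bucket = rest.substring(0, rest.length() - suffix.length());
                    byBucket.computeIfAbsent(bucket, b -> new HashMap<>())
                            .put(property, entry.getValue());
                    break;
                }
            }
        }
        return byBucket;
    }
}
```

Matching a known property as a suffix, rather than splitting on the first dot after the prefix, is what lets a key such as s3.bucket.my.dotted.bucket.endpoint resolve to the bucket my.dotted.bucket in this sketch.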

Verifying this change

  1. This change adds tests and can be verified as follows:
  • Added BucketConfigProviderTest with tests for:
    • Parsing bucket configs from the configuration
    • Bucket config with encryption (SSE-KMS)
    • Bucket config with assume-role
    • Bucket names containing dots
    • S3BucketConfig builder
  2. Running a stateful job that uses different buckets for checkpoints and savepoints:
[Screenshot 2026-03-20 at 9 02 51 AM]
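
The two-bucket scenario above can be pictured with a small sketch of per-bucket resolution. It assumes that bucket-level values override global defaults key by key and that unconfigured buckets fall back to the defaults. The PR may resolve settings differently, and the class BucketSettingsResolver, the bucket names, and the property names are hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; not the resolution logic from this PR. */
public class BucketSettingsResolver {
    private final Map<String, String> defaults;
    private final Map<String, Map<String, String>> perBucket;

    public BucketSettingsResolver(
            Map<String, String> defaults, Map<String, Map<String, String>> perBucket) {
        this.defaults = defaults;
        this.perBucket = perBucket;
    }

    /** Returns the effective settings for the bucket named in an s3:// URI. */
    public Map<String, String> resolve(URI path) {
        String bucket = path.getAuthority(); // "s3://my-bucket/key" -> "my-bucket"
        Map<String, String> effective = new HashMap<>(defaults);
        Map<String, String> overrides = bucket == null ? null : perBucket.get(bucket);
        if (overrides != null) {
            // Assumption: bucket-level values win over the global defaults.
            effective.putAll(overrides);
        }
        return effective;
    }
}
```

With defaults for the checkpoint bucket and an override map registered only for a savepoint bucket, resolve would return the plain defaults for the former and the merged settings for the latter.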

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: yes

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? README – new configuration options and bucket-level configuration format are documented in flink-s3-fs-native/README.md

@Samrat002

@gaborgsomogyi PTAL

flinkbot commented Mar 20, 2026

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@Samrat002 force-pushed the FLINK-39166-multi-bucket-support branch from 97c8b58 to 9445fe3 on March 20, 2026 06:15