
[FLINK-39482][filesystem][backport] Support configurable maxConnections in S3ClientProvider #27986

Merged
gaborgsomogyi merged 1 commit into apache:release-2.3 from Samrat002:release-2.3
Apr 22, 2026

Conversation

@Samrat002 (Contributor) commented Apr 21, 2026

What is the purpose of the change

During RocksDB state restore from S3, the NativeS3BulkCopyHelper can issue more concurrent download requests than the Netty HTTP connection pool can serve. When this happens, requests queue inside the pool and eventually hit the SDK's default 10-second connectionAcquisitionTimeout, causing "Acquire operation took longer than the configured maximum acquisition time" failures.

This PR fixes the issue with three changes:

  1. Expose s3.connection.max — new ConfigOption<Integer> (default 50) so operators can tune the Netty connection pool size independently of the Apache HTTP pool used for sync operations.

  2. Clamp maxConcurrentCopies to maxConnections — if the configured s3.bulk-copy.max-concurrent exceeds s3.connection.max, the helper automatically clamps it down and logs a WARN, following the same pattern used for READ_BUFFER_SIZE clamping.

  3. Raise connectionAcquisitionTimeout — align the Netty pool's acquisition timeout with the configured s3.client.connection-timeout (default 60s) instead of the SDK default of 10s, providing headroom under transient pressure.
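For illustration, the three options involved could be tuned together in flink-conf.yaml. The option names and defaults are taken from this PR's description; the values shown are illustrative, not recommendations:

```yaml
# Size of the Netty async HTTP connection pool (new option; default 50).
s3.connection.max: 100

# Concurrent bulk-copy downloads; clamped down to s3.connection.max if larger.
s3.bulk-copy.max-concurrent: 64

# Existing option (default 60s), now also reused as the Netty pool's
# connectionAcquisitionTimeout instead of the SDK default of 10s.
s3.client.connection-timeout: 60s
```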

Brief change log

  • NativeS3FileSystemFactory: add MAX_CONNECTIONS config option, factory-level validation for both MAX_CONNECTIONS and BULK_COPY_MAX_CONCURRENT
  • NativeS3BulkCopyHelper: 3-arg constructor with clamping logic, isConnectionPoolExhausted() detection, enriched error message in waitForCopies()
  • NativeS3FileSystem: add getBulkCopyHelper() package-private getter for test observability
  • S3ClientProvider: wire maxConnections and connectionAcquisitionTimeout into Netty async HTTP client builder
  • NativeS3BulkCopyHelperTest: parameterized isConnectionPoolExhausted tests, empty request list test
  • NativeS3FileSystemFactoryTest: validation tests for invalid configs and clamping behavior
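The clamping behavior described above can be sketched as a pure function. Class and method names here are hypothetical stand-ins, not the actual NativeS3BulkCopyHelper code:

```java
// Hypothetical sketch of the clamping rule: the requested bulk-copy
// concurrency may never exceed the connection pool size.
public class BulkCopyClamp {

    /** Returns the effective concurrency, clamped to the pool size. */
    static int clampConcurrentCopies(int requestedConcurrency, int maxConnections) {
        if (requestedConcurrency > maxConnections) {
            // The real helper logs a WARN here before clamping.
            System.out.printf(
                    "s3.bulk-copy.max-concurrent (%d) exceeds s3.connection.max (%d); clamping.%n",
                    requestedConcurrency, maxConnections);
            return maxConnections;
        }
        return requestedConcurrency;
    }

    public static void main(String[] args) {
        System.out.println(clampConcurrentCopies(128, 50)); // clamped to 50
        System.out.println(clampConcurrentCopies(16, 50));  // within limit: stays 16
    }
}
```

Clamping (rather than failing fast) keeps existing configurations working while guaranteeing the pool can never be oversubscribed by bulk copy alone.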

Verifying this change

This change is already verified by:

  • NativeS3BulkCopyHelperTest (29 tests) — clamping logic, pool exhaustion detection, URI parsing
  • NativeS3FileSystemFactoryTest (29 tests) — invalid s3.connection.max, invalid s3.bulk-copy.max-concurrent, clamping through factory, config preserved within limits
  • E2E validated on standalone cluster (3 TaskManagers, RocksDB, checkpoints to S3) with various config combinations
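The pool-exhaustion detection exercised by these tests can be sketched as a walk of the cause chain looking for the SDK's acquisition-timeout message quoted earlier. The class name and structure are assumptions for illustration:

```java
// Hypothetical sketch of isConnectionPoolExhausted(): inspect an exception's
// cause chain for the SDK's connection-acquisition timeout message.
public class PoolExhaustionCheck {

    static boolean isConnectionPoolExhausted(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && msg.contains("Acquire operation took longer")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable exhausted = new RuntimeException(
                new java.io.IOException(
                        "Acquire operation took longer than the configured"
                                + " maximum acquisition time"));
        System.out.println(isConnectionPoolExhausted(exhausted));            // true
        System.out.println(isConnectionPoolExhausted(new RuntimeException("other"))); // false
    }
}
```

Detecting this case separately lets waitForCopies() attach an actionable error message pointing at the pool-sizing options instead of surfacing a bare SDK timeout.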

Does this pull request potentially affect one of the following parts

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: yes (S3 connection pool sizing during state restore)
  • The (cluster) security: no

Documentation

  • Does this pull request introduce a new feature? yes (s3.connection.max config option)
  • If yes, how is the feature documented?

@Samrat002 changed the title from "[FLINK-39482][filesystem] Support configurable maxConnections in S3ClientProvider" to "[FLINK-39482][filesystem][backport] Support configurable maxConnections in S3ClientProvider" on Apr 21, 2026
@flinkbot (Collaborator) commented Apr 21, 2026

CI report:

Bot commands: the @flinkbot bot supports the following command:
  • @flinkbot run azure: re-run the last Azure build

@gaborgsomogyi gaborgsomogyi merged commit cbebd3f into apache:release-2.3 Apr 22, 2026
