[FLINK-39533][s3] Use abort() instead of drain on close/seek when remaining bytes exceed threshold in NativeS3InputStream by Samrat002 · Pull Request #28012 · apache/flink

Samrat002 · 2026-04-23T17:44:21Z

What is the purpose of the change

NativeS3InputStream calls ResponseInputStream.close() when releasing streams during seek(), skip(), and close() operations. Apache HttpClient's close() implementation drains all remaining bytes from the response body to enable HTTP connection reuse. For large S3 objects where only a small portion was read (e.g., checkpoint metadata from a multi-GB state file), this can drain gigabytes of data over the network, causing severe latency during checkpoint restore and seek-heavy read patterns.

The AWS SDK v2 ResponseInputStream JavaDoc explicitly recommends calling abort() when remaining data is not needed. This PR replaces close() with abort() in the stream release path.

Brief change log

- Added releaseStream() method to NativeS3InputStream that calls abort() instead of close() on the underlying ResponseInputStream, and drops the BufferedInputStream wrapper without closing it (closing would delegate to the drain path)
- openStreamAtCurrentPosition() and close() now use releaseStream() for stream cleanup
- Added NativeS3InputStreamTest with 8 tests covering abort lifecycle, data correctness, position tracking, and error paths
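
The release path described in the change log can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `AbortableStream` is a hypothetical stand-in for the SDK's ResponseInputStream (modelled so that close() drains and abort() does not), and `ReleaseSketch` with its fields is an assumed name.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for ResponseInputStream: close() drains, abort() does not. */
class AbortableStream extends InputStream {
    private final ByteArrayInputStream body;
    long drainedOnClose = 0; // bytes consumed only because close() was called
    boolean aborted = false;

    AbortableStream(byte[] data) {
        this.body = new ByteArrayInputStream(data);
    }

    @Override
    public int read() {
        return aborted ? -1 : body.read();
    }

    /** Models the drain-on-close behaviour: consumes whatever is left. */
    @Override
    public void close() {
        if (aborted) {
            return;
        }
        while (body.read() != -1) {
            drainedOnClose++;
        }
    }

    /** Models abort(): give up the connection without reading further. */
    void abort() {
        aborted = true;
    }
}

/** Assumed shape of the release path; names are illustrative only. */
class ReleaseSketch {
    private AbortableStream currentStream;
    private BufferedInputStream bufferedStream;

    AbortableStream open(byte[] data) {
        currentStream = new AbortableStream(data);
        bufferedStream = new BufferedInputStream(currentStream);
        return currentStream;
    }

    int read() {
        try {
            return bufferedStream.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Abort the raw stream and drop the wrapper without closing it. */
    void releaseStream() {
        if (currentStream != null) {
            currentStream.abort();
        }
        // Closing the wrapper would call close() on the raw stream, i.e. the drain path.
        bufferedStream = null;
        currentStream = null;
    }
}
```

In this model, releasing after a single read leaves the rest of the body unread, whereas calling close() on the raw stream would consume every remaining byte.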

Verifying this change

This change added tests and can be verified as follows:
- Unit tests
- Manually validated end-to-end on a local Flink 2.3-SNAPSHOT cluster with a stateful job writing checkpoints (up to 199MB) to S3, triggering a savepoint, restoring from it, and confirming checkpoints completed successfully after restore with zero S3/stream errors

Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with @Public(Evolving): no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
- The S3 file system connector: yes
Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable
Was generative AI tooling used to co-author this PR?
- Yes (please specify the tool below)
Took help from Claude Sonnet for the 2nd commit

Samrat002 · 2026-04-23T17:45:58Z

cc: @gaborgsomogyi

flinkbot · 2026-04-23T17:52:51Z

CI report:

26d217c Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

- @flinkbot run azure: re-run the last Azure build

gaborgsomogyi · 2026-04-27T09:32:27Z

I think the following 2 blocks can be flipped (lazyInitialize can come after the if condition) to cover an edge case. We can say with 100% certainty that there is nothing to read when position >= contentLength; we don't need a stream to tell us that, right? If you agree, the same pattern occurs in multiple places.

            lazyInitialize();
            if (position >= contentLength) {
                return -1;
            }
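
The flipped order suggested above might look like the following minimal model. It is a sketch, not the actual NativeS3InputStream code: `EofCheckSketch`, the in-memory `data` array standing in for the S3 object, and the `opens` counter are all assumptions made for illustration.

```java
/** Minimal model that counts how often a (hypothetical) S3 stream would be opened. */
class EofCheckSketch {
    private final byte[] data; // stands in for the S3 object body
    private long position;
    private boolean initialized;
    int opens; // number of simulated stream opens

    EofCheckSketch(byte[] data, long position) {
        this.data = data;
        this.position = position;
    }

    /** Simulates opening the underlying stream on first use. */
    private void lazyInitialize() {
        if (!initialized) {
            initialized = true;
            opens++;
        }
    }

    int read() {
        // EOF follows from position and content length alone, so check it first.
        if (position >= data.length) {
            return -1;
        }
        lazyInitialize();
        return data[(int) position++] & 0xFF;
    }
}
```

With the check placed first, a read at or past the end returns -1 without ever opening a stream, while a read before the end still initializes lazily.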

gaborgsomogyi · 2026-04-27T09:50:07Z

I think seek(contentLength) triggers S3 416 when a stream is open.

When seeking to exactly contentLength (a valid EOF-boundary seek) with an open stream, the current code calls openStreamAtCurrentPosition() unconditionally. This issues a range request bytes=contentLength-, which starts one byte past the last byte. Per RFC 7233, this range is unsatisfiable and S3 returns 416.

The fix: when seeking to contentLength, release the existing stream but do not reopen it. Any subsequent read() will return -1 immediately by the position check, so an open stream at EOF serves no purpose.

  if (desired != position) {
      position = desired;
      if (currentStream != null) {
          if (desired >= contentLength) {
              releaseStreams();
          } else {
              openStreamAtCurrentPosition();
          }
      }
  }

releaseStreams() is called directly (not via openStreamAtCurrentPosition()) because seek() already holds the lock. The >= guard instead of == is defensive — desired > contentLength is already rejected earlier, but this makes the branch self-contained against future refactoring.

The same root cause exists in skip() and in read()/read(byte[],int,int) where lazyInitialize() is called before the EOF position check - but those are separate fixes.
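
To make the EOF-boundary case concrete, here is a small self-contained model of the seek logic discussed above. It is only a sketch: `SeekSketch` is a hypothetical class, the 416 response is simulated with an exception, and the recorded Range header strings are illustrative rather than taken from the real implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of seek(): records the Range headers that would be sent. */
class SeekSketch {
    private final long contentLength;
    private long position;
    private boolean streamOpen;
    final List<String> rangeRequests = new ArrayList<>();

    SeekSketch(long contentLength) {
        this.contentLength = contentLength;
    }

    /** Simulates the ranged GET; a start offset at or past the end is unsatisfiable. */
    void openStreamAtCurrentPosition() {
        if (position >= contentLength) {
            throw new IllegalStateException(
                    "416 Range Not Satisfiable: bytes=" + position + "-");
        }
        rangeRequests.add("bytes=" + position + "-");
        streamOpen = true;
    }

    void releaseStreams() {
        streamOpen = false;
    }

    void seek(long desired) {
        if (desired < 0 || desired > contentLength) {
            throw new IllegalArgumentException("seek out of bounds: " + desired);
        }
        if (desired != position) {
            position = desired;
            if (streamOpen) {
                if (desired >= contentLength) {
                    // At EOF there is nothing to fetch: release, do not reopen.
                    releaseStreams();
                } else {
                    openStreamAtCurrentPosition();
                }
            }
        }
    }

    boolean isStreamOpen() {
        return streamOpen;
    }
}
```

For a 100-byte object the last valid byte offset is 99, so in this model seek(100) with an open stream releases it instead of issuing `bytes=100-`.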


Samrat002 force-pushed the FLINK-39533 branch from 5a70284 to 2badf0b Compare April 23, 2026 17:56

gaborgsomogyi reviewed Apr 24, 2026


Samrat002 force-pushed the FLINK-39533 branch from fea4a68 to f36be74 Compare April 24, 2026 13:33

Samrat002 requested a review from gaborgsomogyi April 24, 2026 13:36

gaborgsomogyi reviewed Apr 24, 2026


Comment thread ...stems/flink-s3-fs-native/src/main/java/org/apache/flink/fs/s3native/NativeS3InputStream.java Outdated

gaborgsomogyi reviewed Apr 24, 2026


Comment thread ...stems/flink-s3-fs-native/src/main/java/org/apache/flink/fs/s3native/NativeS3InputStream.java

Samrat002 force-pushed the FLINK-39533 branch from 5bfbbdd to 4abd92f Compare April 24, 2026 18:42

Samrat002 requested a review from gaborgsomogyi April 24, 2026 18:55

Samrat002 added 2 commits April 27, 2026 15:57

[FLINK-39533][s3] Use abort() instead of drain on close/seek when remaining bytes exceed threshold in NativeS3InputStream

97c574e

[FLINK-39533][s3] Address to review comments. Bug at the end of stream and addres to s3 returning 416

26d217c

Samrat002 force-pushed the FLINK-39533 branch from 4abd92f to 26d217c Compare April 27, 2026 12:37

gaborgsomogyi approved these changes Apr 27, 2026


gaborgsomogyi merged commit 3d6dff7 into apache:master Apr 27, 2026

Samrat002 mentioned this pull request Apr 27, 2026

[FLINK-39533][s3][backport] Use abort() instead of drain on close/seek when remaining bytes exceed threshold in NativeS3InputStream #28052

Merged

1 task

gaborgsomogyi merged 2 commits into apache:master from Samrat002:FLINK-39533
