
[FLINK-39484][filesystem/s3] NativeS3InputStream: graceful abort on early close; close() first for connection reuse #27965

Open

macdoor wants to merge 2 commits into apache:master from macdoor:FLINK-39484-native-s3-input-stream-abort

Conversation


@macdoor macdoor commented Apr 18, 2026

What is the purpose of the change

Fix ConnectionClosedException: Premature end of Content-Length delimited message body
when reading large objects (e.g. Parquet) through flink-s3-fs-native against
S3-compatible endpoints (e.g. MinIO), especially after seek(), skip(), or when the
stream is closed before all bytes are consumed.

Brief change log

  • Seek/skip: When discarding a partially-read GetObject body after seek() or skip(),
    call ResponseInputStream.abort() immediately so the SDK does not attempt to read and
    drain the remainder of the response.
  • Close (normal path): Attempt close() first to preserve HTTP connection reuse for
    well-behaved S3 servers. If the server closes the connection early (MinIO pattern),
    ConnectionClosedException is caught, treated as non-fatal, escalated to WARN, and
    abort() is called as fallback -- because the connection is already broken and cannot be
    reused anyway, so aborting carries no additional performance penalty.
  • isPrematureEndOfMessage(IOException): helper that identifies MinIO/S3-compatible
    early-connection-close by checking for Premature end of Content-Length,
    Connection closed, or ConnectionClosed in the exception message.
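The close-first logic in the change log above can be sketched as follows. This is a hypothetical illustration, not the actual Flink code: the method names mirror the PR description, and the AWS SDK's ResponseInputStream.abort() is stood in for by a plain Runnable so the sketch stays self-contained.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch of the close-first-then-abort pattern described above.
public class CloseFirstSketch {

    /** Heuristic from the change log: spot MinIO/S3-compatible early close. */
    static boolean isPrematureEndOfMessage(IOException e) {
        String msg = e.getMessage();
        return msg != null
                && (msg.contains("Premature end of Content-Length")
                        || msg.contains("Connection closed")
                        || msg.contains("ConnectionClosed"));
    }

    /** Try close() first (connection reuse); fall back to abort() on early close. */
    static void closeGracefully(Closeable in, Runnable abort) throws IOException {
        try {
            in.close(); // well-behaved servers: connection returns to the pool
        } catch (IOException e) {
            if (isPrematureEndOfMessage(e)) {
                // Connection is already broken; aborting costs nothing extra.
                abort.run();
            } else {
                throw e; // unrelated I/O failure: propagate
            }
        }
    }
}
```

The seek/skip path, by contrast, calls abort() unconditionally, since the remainder of the body is known to be unwanted.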

This is the optimal approach:

  • No connection-pool penalty for correct S3/AWS behavior -- close() reuses connections
  • No task failures when MinIO closes early -- exception caught, WARN logged, task survives
  • Seek/skip still aborts immediately -- correct semantics regardless of server behavior

Verifying this change

Verified against MinIO with large Parquet reads and repeated seek() operations: the
ConnectionClosedException is replaced by a clean WARN log and the job completes
successfully. Azure Pipelines for flink-s3-fs-native (see CI on this PR).

Does this pull request potentially affect one of the following parts?

  • Dependencies: no
  • Public API: no
  • Serializers: no
  • Runtime per-record code paths: no
  • Deployment or recovery: no
  • The S3 file system connector: yes

Documentation

  • New feature: no

JIRA

https://issues.apache.org/jira/browse/FLINK-39484

@flinkbot
Collaborator

flinkbot commented Apr 18, 2026

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

[FLINK-39484][filesystem/s3] NativeS3InputStream: abort unfinished GetObject on seek/reopen

Fix ConnectionClosedException: Premature end of Content-Length delimited message
body when reading large Parquet objects through flink-s3-fs-native against
S3-compatible endpoints (e.g. MinIO), especially after seek() / skip() or when
the stream is closed before all bytes are consumed.

When reopenStreamAtCurrentPosition() discards a partially-read GetObject body
after seek/skip, ResponseInputStream.abort() is used immediately so the SDK does
not attempt to read and drain the remainder of the body.

For the close() path: attempt normal close() first (preserving HTTP connection
reuse for well-behaved S3 servers).  If the server closed the connection early
(a pattern seen on MinIO where it terminates the TCP connection before all
Content-Length bytes are sent), the ConnectionClosedException is non-fatal and
escalated to WARN — the connection is already broken and cannot be reused, so
falling back to abort() carries no additional performance penalty.

This is the optimal approach:
- No connection-pool penalty for correct S3/genuine AWS behavior (close reuses)
- No task failures when MinIO closes early (exception caught, WARN logged)
- Seek/skip still aborts immediately (correct semantics regardless of server)

See: https://issues.apache.org/jira/browse/FLINK-39484
Made-with: Cursor
@macdoor macdoor force-pushed the FLINK-39484-native-s3-input-stream-abort branch from 91883a7 to 9e6576a on April 19, 2026 at 08:54
@macdoor macdoor changed the title [FLINK-39484][filesystem/s3] NativeS3InputStream: abort unfinished GetObject on seek/reopen [FLINK-39484][filesystem/s3] NativeS3InputStream: graceful abort on early close; close() first for connection reuse Apr 19, 2026
macdoor pushed a commit to macdoor/flink that referenced this pull request Apr 19, 2026
…euse; handle MinIO early close (release-2.3)

Backport of apache#27965 behavior: on seek/skip abort partial GetObject; on
close() attempt normal BufferedInputStream.close() first for HTTP connection reuse,
catch premature Content-Length / connection-closed from S3-compatible storage,
WARN and abort as non-fatal.

https://issues.apache.org/jira/browse/FLINK-39484
Made-with: Cursor
macdoor pushed a commit to macdoor/flink that referenced this pull request Apr 19, 2026
… PR apache#27965)

Align release-2.3 with apache#27965: try close() before abort on normal
close path when bytes remain; detect premature Content-Length / connection
closed; seek still aborts immediately. Preserves bucket-root fix in 0a01cbd.

Made-with: Cursor
@spuru9
Contributor

spuru9 commented Apr 19, 2026

@macdoor Spotless is failing for the build, and there are some changes in the license header which I believe are unintended.

…veS3InputStream

Align ASF license block with Flink convention; apply Spotless (Javadoc wrap).

Made-with: Cursor
@macdoor
Author

macdoor commented Apr 19, 2026

@spuru9 Thanks — Spotless is fixed (spotless:apply + spotless:check), and the license header in NativeS3InputStream.java is aligned with the standard ASF template.

@macdoor
Author

macdoor commented Apr 19, 2026

@flinkbot run azure

@Samrat002
Contributor

Thank you @macdoor for raising the patch.

I have a fundamental question associated with this change:

Is the problem reproducible against AWS S3?

As per my analysis, "Premature end of Content-Length delimited message body" can only occur when the server closes the TCP connection before delivering all Content-Length bytes. This is an HTTP/1.1 protocol violation.
AWS S3 does not do this. S3 always delivers exactly the number of bytes promised in Content-Length. The drain loop in Apache HttpClient's ContentLengthInputStream.close() will always succeed against real S3. It may be slow (reading GBs to discard), but it will never hit a premature EOF.

The only valid concern for real S3 is performance: the probable improvement is avoiding the cost of draining large amounts of unread data on close() just to reuse a connection.

The SDK JavaDoc says it plainly:

"If it is not desired to read remaining data from the stream, you can explicitly abort the connection via abort() instead. This will close the underlying connection and require establishing a new HTTP connection on subsequent requests which may outweigh the cost of reading the additional data."

reference : https://github.com/aws/aws-sdk-java-v2/blob/master/core/sdk-core/src/main/java/software/amazon/awssdk/core/ResponseInputStream.java#L52
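To make the drain cost in the comment above concrete, here is a minimal, hypothetical sketch (not SDK or Flink code): on a plain InputStream, a close-with-reuse strategy must first read and discard every remaining byte, which is exactly the work abort() skips at the price of a new connection.

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch of the drain-vs-abort tradeoff discussed above.
public class DrainVsAbort {

    /** Simulates drain-on-close: read and discard all remaining bytes. */
    static long drainOnClose(InputStream in) throws IOException {
        long drained = 0;
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            drained += n; // bytes read only to be thrown away
        }
        in.close();
        return drained; // against real S3 this can be GBs of unwanted data
    }
}
```

For a nearly fully consumed stream the drain is cheap and reusing the connection wins; for a freshly opened multi-GB object, abort() is the cheaper option, as the SDK JavaDoc quoted above notes.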
