Skip to content

[fix](fe) Fix broken pipe risk on stream load redirect with unconsumed request body#63381

Closed
wenzhenghu wants to merge 23 commits into
apache:branch-3.1from
HYDCP:branch-3.1-stream-load-drain-request-body
Closed

[fix](fe) Fix broken pipe risk on stream load redirect with unconsumed request body#63381
wenzhenghu wants to merge 23 commits into
apache:branch-3.1from
HYDCP:branch-3.1-stream-load-drain-request-body

Conversation

@wenzhenghu
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #63325

Problem Summary:

Problem

  • Starting from Doris 3.1.3, FE uses Jetty 12, and this introduced a compatibility change in the Stream Load redirect path.
  • When a Stream Load request is sent to FE, FE may return 307 Temporary Redirect before the request body is fully consumed. Under Jetty 12, this behavior is more likely to cause early connection close or reset while the client is still writing the request body.
  • As a result, some HTTP/1.1 streaming clients may observe errors such as BrokenPipeError or ConnectionResetError when sending Stream Load requests through FE.
  • The problem is more visible with chunked uploads, higher network latency, and clients that continue sending request body data before fully processing the redirect response.
  • In short, this is a compatibility regression introduced by the Jetty 12 upgrade in Doris 3.1.3 and later.

Fix

  • We keep the existing FE-to-BE redirect architecture unchanged, so FE still redirects Stream Load requests to BE instead of proxying the full request body.
  • We add a bounded request-body drain step on the FE Stream Load redirect path:
    • FE first writes the 307 Temporary Redirect response.
    • FE then drains and discards only a bounded amount of the remaining request body.
    • This provides a small compatibility window for in-flight client writes and reduces the chance of early connection reset.
  • We also apply the same handling to token-authenticated Stream Load requests, so both password-authenticated and token-authenticated paths behave consistently.
  • In addition, we expose Jetty's unconsumed request content read setting through FE configuration and apply it to HTTP connectors, so operators can tune Jetty behavior for redirect scenarios where the request body is not fully consumed.
  • To make the compatibility path effective out of the box, this PR also enables the bounded drain path by default with a 1GB drain limit and a 1000ms idle wait window.

New Configurations

  • jetty_server_max_unconsumed_request_content_reads

    • Controls how many extra reads Jetty performs for unconsumed request content.
    • -1 means unlimited, 0 use Jetty default value, and a positive value sets the maximum number of read attempts.
    • Default value in this PR: 0.
    • This helps tune Jetty behavior after the Jetty 12 upgrade when FE returns a response before the request body is fully consumed.
  • stream_load_redirect_bounded_drain_max_bytes

    • Controls the maximum number of request body bytes FE drains after returning 307 for a Stream Load redirect.
    • 0 disables this compatibility logic.
    • A positive value enables bounded draining and limits how much data FE will discard.
    • Default value in this PR: 1GB.
  • stream_load_redirect_bounded_drain_max_idle_time_ms

    • Controls how long FE waits for more readable request body data during the bounded drain process.
    • 0 disables the extra idle wait.
    • A positive value provides a small grace window for slow clients or delayed body chunks, helping absorb in-flight writes without keeping the connection open indefinitely.
    • Default value in this PR: 1000ms.

Test Result / Validation

  • Verified the behavior with the same Python HTTP/1.1 chunked Stream Load reproduction used during issue analysis.
  • Reproduced requests were sent to FE with Expect: 100-continue, Transfer-Encoding: chunked, and paced body streaming to maximize the redirect race window.
  • Baseline validation on Doris 3.0 (9030 / FE 8030):
    • payload_mb=1, chunk_kb=1, sleep_ms=0
    • payload_mb=8, chunk_kb=16, sleep_ms=10
    • Both requests returned normal 307 Temporary Redirect.
  • Validation on the fixed Doris 3.1.4 instance (9034 / FE 8034):
    • Before enabling the bounded drain config, the same reproduction still triggered BrokenPipeError.
    • After enabling the FE configs below:
      • jetty_server_max_unconsumed_request_content_reads = -1
      • stream_load_redirect_bounded_drain_max_bytes = 16777216
      • stream_load_redirect_bounded_drain_max_idle_time_ms = 1000
    • The same two reproduction requests both returned normal 307 Temporary Redirect.
    • No BrokenPipeError or ConnectionResetError was observed after the config took effect.
  • The PR now further updates the default bounded drain byte limit from 16MB to 1GB, while keeping the default idle wait at 1000ms, so the compatibility path is enabled by default with a more generous drain window.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. FE now drains a bounded amount of request body after Stream Load redirect by default, and the default drain byte limit is set to 1GB.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Ryan19929 and others added 23 commits January 30, 2026 16:56
* Fix BE UT core dump.

* Enhance comments for pointer assignment logic

Add comment explaining conditional check for pointer assignment

* Add comments.

---------

Co-authored-by: Wen Zhenghu <wenzhenghu.zju@gmail.com>
…are mixed in VFileScanner (#30)

* [fix](scan) Fix missing predicate filter when Native and JNI readers are mixed in VFileScanner

* code format

* fix be ut test
Add the missing java.net.URI import to fix FE compilation in RestBaseController.
Handle IOException in stream load forward redirect response generation.

Use a bounded idle time window for redirect drain and add delayed-arrival regression coverage.
Remove the RedirectView status getter assertion from LoadActionTest because the current Spring version does not expose getStatusCode().
Add a configurable idle wait window for stream load redirect drain.

Refactor the repeated idle wait path, return INTERRUPTED on sleep interruption, and update FE unit tests accordingly.
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Increase the default bounded drain byte limit to 1 GB and the default idle wait window to 1 second for stream load redirect handling, and keep FE tests aligned with the new defaults.

### Release note

Adjust the default FE stream load redirect drain limits to 1 GB and 1 second.

### Check List (For Author)

- Test: Manual check

    - No need to test (default value adjustment and FE test reset update only)

- Behavior changed: Yes (the default stream load redirect bounded drain limits are larger)

- Does this need documentation: No
@wenzhenghu wenzhenghu requested a review from morrySnow as a code owner May 19, 2026 03:10
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@wenzhenghu wenzhenghu closed this May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants