Skip to content

[fix](streaming-job) fix streaming job properties not parsed after FE restart#62298

Merged
JNSimba merged 2 commits intoapache:masterfrom
JNSimba:fix_streaming_job_props_after_restart_new
Apr 13, 2026
Merged

[fix](streaming-job) fix streaming job properties not parsed after FE restart#62298
JNSimba merged 2 commits intoapache:masterfrom
JNSimba:fix_streaming_job_props_after_restart_new

Conversation

@JNSimba
Copy link
Copy Markdown
Member

@JNSimba JNSimba commented Apr 9, 2026

Summary

  • After FE restart, StreamingJobProperties constructor was called via gsonPostProcess() without validate(). When properties map was non-empty (e.g. max_interval=1), the old code skipped the isEmpty() branch and left maxIntervalSecond=0, causing timeoutMs=0 and every streaming task to timeout immediately.
  • Fix: parse properties directly in the constructor with safe fallback to defaults, so properties are always correctly initialized regardless of whether validate() is called.
  • Added timeout detection logging in StreamingMultiTblTask.isTimeout() for easier future debugging.

Test plan

  • Unit test: StreamingJobPropertiesTest - covers empty properties, explicit properties, bad values, and validate() behavior
  • Regression test: test_streaming_mysql_job_restart_fe_with_props - creates job with max_interval=5, restarts FE, verifies new data is consumed after restart

🤖 Generated with Claude Code

… restart

After FE restart, StreamingJobProperties constructor was only called via
gsonPostProcess() without validate(). When properties map was non-empty
(e.g. max_interval=1), the old code skipped the isEmpty() branch and left
maxIntervalSecond=0, causing timeoutMs=0 and every task to timeout immediately.

Fix: parse properties directly in the constructor with safe fallback to
defaults, so properties are always correctly initialized regardless of
whether validate() is called.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Thearas
Copy link
Copy Markdown
Contributor

Thearas commented Apr 9, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 9, 2026

run buildalll

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 9, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found 2 issues in this patch.

  1. StreamingJobProperties now reconstructs persisted properties on FE restart, but the new fallback parser does not preserve the original validation contract for s3.max_batch_bytes.
  2. The new restart regression test compares SucceedTaskCount as strings, which makes the await flaky once the count reaches two digits.

Critical checkpoint conclusions:

  • Goal of the task: Partially achieved. The max_interval replay bug is addressed, but replay/default handling is still inconsistent for one persisted property, and the new regression test is brittle.
  • Modification size/focus: Yes, the code change is small and focused.
  • Concurrency: No new concurrency or lock-order issues found in the changed code.
  • Lifecycle/static initialization: No special lifecycle or static-init issues introduced.
  • Configuration items: No new config added.
  • Compatibility/incompatible changes: No protocol or storage-format incompatibility introduced.
  • Parallel code paths: Not fully covered. The constructor is also used for S3 replay, and that path now diverges from validate() semantics.
  • Special conditional checks: Timeout logging change is fine.
  • Test coverage: FE unit test coverage improved, but the new regression case is flaky and there is still no regression that exercises the S3 replay/default path.
  • Observability: Improved by the new timeout log.
  • Transaction/persistence: This patch changes restart/replay behavior; one replay fallback remains inconsistent with create-time validation.
  • Data writes/modifications: No direct write-path correctness issue found in this diff.
  • FE/BE variable passing: Not applicable.
  • Performance: No material performance issue found in the changed paths.
  • Other issues: None beyond the findings below.

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 10, 2026

run buildall

morrySnow
morrySnow previously approved these changes Apr 10, 2026
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 10, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Apr 10, 2026
@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 10, 2026

run buildall

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 10, 2026

run external

Copy link
Copy Markdown
Contributor

@liaoxin01 liaoxin01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 13, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@JNSimba JNSimba merged commit 5d8855d into apache:master Apr 13, 2026
30 of 32 checks passed
github-actions bot pushed a commit that referenced this pull request Apr 13, 2026
… restart (#62298)

## Summary
- After FE restart, `StreamingJobProperties` constructor was called via
`gsonPostProcess()` without `validate()`. When properties map was
non-empty (e.g. `max_interval=1`), the old code skipped the `isEmpty()`
branch and left `maxIntervalSecond=0`, causing `timeoutMs=0` and every
streaming task to timeout immediately.
- Fix: parse properties directly in the constructor with safe fallback
to defaults, so properties are always correctly initialized regardless
of whether `validate()` is called.
- Added timeout detection logging in `StreamingMultiTblTask.isTimeout()`
for easier future debugging.
github-actions bot pushed a commit that referenced this pull request Apr 13, 2026
… restart (#62298)

## Summary
- After FE restart, `StreamingJobProperties` constructor was called via
`gsonPostProcess()` without `validate()`. When properties map was
non-empty (e.g. `max_interval=1`), the old code skipped the `isEmpty()`
branch and left `maxIntervalSecond=0`, causing `timeoutMs=0` and every
streaming task to timeout immediately.
- Fix: parse properties directly in the constructor with safe fallback
to defaults, so properties are always correctly initialized regardless
of whether `validate()` is called.
- Added timeout detection logging in `StreamingMultiTblTask.isTimeout()`
for easier future debugging.
yiguolei pushed a commit that referenced this pull request Apr 15, 2026
…sed after FE restart #62298 (#62432)

Cherry-picked from #62298

Co-authored-by: wudi <wudi@selectdb.com>
yiguolei pushed a commit that referenced this pull request Apr 15, 2026
…sed after FE restart #62298 (#62431)

Cherry-picked from #62298

Co-authored-by: wudi <wudi@selectdb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.0.6-merged dev/4.1.1-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants