Skip to content

test(amber): drop stale Thread.sleep(1000) after COMPLETED state#4680

Merged
Yicong-Huang merged 1 commit into
apache:mainfrom
Yicong-Huang:fix/e2e-drop-stale-completed-sleep
May 2, 2026
Merged

test(amber): drop stale Thread.sleep(1000) after COMPLETED state#4680
Yicong-Huang merged 1 commit into
apache:mainfrom
Yicong-Huang:fix/e2e-drop-stale-completed-sleep

Conversation

@Yicong-Huang
Copy link
Copy Markdown
Contributor

@Yicong-Huang Yicong-Huang commented May 2, 2026

What changes were proposed in this PR?

Remove the Thread.sleep(1000) workaround inside the result-reading block of DataProcessingSpec and ReconfigurationSpec. The sleep was annotated with a TODO calling it a workaround for "the issue of reporting completed status too early" — the current engine path is synchronous end-to-end, so the workaround is dead code.

             .map(terminalOpId => {
-              //TODO: remove the delay after fixing the issue of reporting "completed" status too early.
-              Thread.sleep(1000)
               val uri = getResultUriByLogicalPortId(...)
               ...
             })

The synchronous path that emits COMPLETED:

  1. DataProcessor.outputOneTuple handles FinalizePortoutputManager.closeOutputStorageWriterIfNeeded(portId)
  2. closeOutputStorageWriterIfNeeded puts a terminate signal on the writer thread's queue and join()s it
  3. The writer thread, on terminate, calls IcebergTableWriter.close()flushBuffer()table.newAppend().appendFile(dataFile).commit()
  4. Only then does DataProcessor fire portCompleted to the controller
  5. After all ports complete, controller emits ExecutionStateUpdate(COMPLETED) to the client
  6. Read side does IcebergDocument.get()seekToUsableFile()table.refresh() before scan

Any related issues, documentation, discussions?

Closes #4679.

How was this PR tested?

Ran the two specs together with -T 1 (sequential) on a clean local checkout, 20 consecutive runs, 21/21 tests passing every run, 0 flakes:

sbt 'WorkflowExecutionService / Test / testOnly \
  org.apache.texera.amber.engine.e2e.DataProcessingSpec \
  org.apache.texera.amber.engine.e2e.ReconfigurationSpec'

Run wall-clock distribution: min 35.6 s, mean 39.5 s, max 45.9 s.

Compared to the same two specs with the sleep on the same machine:

spec before after delta
DataProcessingSpec 36.5 s 16.9 s -19.6 s
ReconfigurationSpec 27.2 s 18.7 s -8.5 s

DataProcessingSpec saves more than 16 × 1 s because the sleep was inside a per-terminal-port .map(...) block — workflows with multiple terminal ports paid the cost more than once per test.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7, 1M context)

@Yicong-Huang Yicong-Huang changed the title test(amber-e2e): drop stale Thread.sleep(1000) after COMPLETED state test(amber): drop stale Thread.sleep(1000) after COMPLETED state May 2, 2026
The sleep was a workaround for an early engine version where the
COMPLETED state could fire before the sink's iceberg commit landed.
Today's path is fully synchronous: closeOutputStorageWriterIfNeeded
joins the writer thread, the writer's close() commits the iceberg
table, and only then does the worker emit portCompleted. Local 5x
sequential runs of the two specs pass 21/21 with the sleep removed.

Closes apache#4679.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Yicong-Huang Yicong-Huang force-pushed the fix/e2e-drop-stale-completed-sleep branch from 63889d3 to 9f8ad9b Compare May 2, 2026 18:28
@Yicong-Huang Yicong-Huang enabled auto-merge (squash) May 2, 2026 18:29
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 53.02%. Comparing base (7811115) to head (9f8ad9b).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4680      +/-   ##
============================================
+ Coverage     46.17%   53.02%   +6.84%     
+ Complexity     1999     1995       -4     
============================================
  Files          1013      817     -196     
  Lines         38165    24630   -13535     
  Branches       3712     1921    -1791     
============================================
- Hits          17623    13060    -4563     
+ Misses        19769    10912    -8857     
+ Partials        773      658     -115     
Flag Coverage Δ
agent-service ?
frontend ?
python 85.05% <ø> (ø)
scala 38.17% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@aicam aicam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Yicong-Huang Yicong-Huang merged commit 598ed0f into apache:main May 2, 2026
15 checks passed
Yicong-Huang added a commit that referenced this pull request May 3, 2026
…#4874)

### What changes were proposed in this PR?

Backport [#4680](#4680) (commit
`598ed0f7f037b96ebe5549759158e46ba736b372` on `main`) onto
`release/v1.1.0-incubating`.

Removes the dead `Thread.sleep(1000)` + TODO inside the result-reading
block in `ReconfigurationSpec.scala` and `DataProcessingSpec.scala`,
matching what already landed on `main`.

### Any related issues, documentation, discussions?

Backports #4680. Unblocks the backport leg of #4871.

### How was this PR tested?

Cherry-pick of an already-reviewed and merged commit; applies cleanly.
CI on this PR exercises the change against the release branch's full
Python matrix.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-7)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Drop unneeded Thread.sleep(1000) workaround in DataProcessingSpec and ReconfigurationSpec

3 participants