fix(execution-api): fix multi-input operator's execution termination condition by bobbai00 · Pull Request #4615 · apache/texera

bobbai00 · 2026-05-01T20:30:32Z

What changes were proposed in this PR?

This PR fixes a race in SyncExecutionResource.allTargetsCompleted that causes the sync execution API (POST /api/execution/{wid}/{cuid}/run) to terminate before a HashJoin's probe phase produces output, returning an empty result.

Root cause. HashJoinOpDesc.getPhysicalPlan produces two PhysicalOps (build, probe) sharing one logical id, separated by a blocking edge. The scheduler places them in two regions and runs them sequentially. WorkflowExecution.getAllRegionExecutionsStats aggregates per-logical-op state by groupBy(_._1.logicalOpId.id) over only the registered RegionExecutions. Between "build region completed" and "probe region instantiated," only the build PhysicalOp is registered, so aggregateStates(Iterable(COMPLETED)) returns COMPLETED. The sync resource then takes the TargetResultsReady branch, calls killExecution, and reads the probe's still-empty Iceberg output. The same shape applies to any logical operator whose physical plan contains multiple PhysicalOps separated by a blocking edge (e.g., Aggregate). It does not surface in the regular WebSocket-driven frontend execution because the frontend waits for full workflow termination.
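
The aggregation behaviour described above can be illustrated with a small, self-contained model. This is only a sketch: `PhysicalOpStat`, `logicalStates`, and the simplified `aggregateStates` below are hypothetical stand-ins, not Texera's real classes, and the real aggregation handles more states than the two modelled here.

```scala
// Simplified, hypothetical model of per-logical-op state aggregation.
// None of these types are Texera's real classes.
object RootCauseSketch {
  sealed trait OpState
  case object Running extends OpState
  case object Completed extends OpState

  // One physical operator, tagged with the logical operator it belongs to.
  final case class PhysicalOpStat(logicalOpId: String, physicalName: String, state: OpState)

  // A logical operator is Completed when every *registered* physical operator
  // is Completed. Physical operators that are not registered yet are invisible.
  def aggregateStates(states: Iterable[OpState]): OpState =
    if (states.nonEmpty && states.forall(_ == Completed)) Completed else Running

  // Group the registered physical operators by logical id and aggregate each group.
  def logicalStates(registered: Seq[PhysicalOpStat]): Map[String, OpState] =
    registered.groupBy(_.logicalOpId).map { case (id, stats) =>
      id -> aggregateStates(stats.map(_.state))
    }

  // Race window: the build region has finished, the probe region is not instantiated yet.
  val buildOnly: Seq[PhysicalOpStat] =
    Seq(PhysicalOpStat("HashJoin-1", "build", Completed))

  // Shortly afterwards: the probe region is registered and running.
  val buildAndProbe: Seq[PhysicalOpStat] =
    buildOnly :+ PhysicalOpStat("HashJoin-1", "probe", Running)

  def main(args: Array[String]): Unit = {
    println(logicalStates(buildOnly))     // HashJoin-1 is reported as Completed
    println(logicalStates(buildAndProbe)) // HashJoin-1 is reported as Running
  }
}
```

In this model the build-only snapshot is indistinguishable from a finished HashJoin, which is the signal the sync resource acted on.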

Fix. Strengthen allTargetsCompleted to require, in addition to operatorState == COMPLETED, that every declared external input port of the target is already present in OperatorMetrics.operatorStatistics.inputMetrics. Port-1 metrics only appear after the probe actually consumes data, which closes the race window. Internal ports (e.g., HashJoin's build→probe internal edge) are filtered out on both sides of the comparison so the predicate matches what aggregateMetrics already exposes. Source operators (zero declared inputs) and single-input operators are unaffected; for empty-input edge cases, terminalStateObservable continues to provide the fallback signal.

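// Number of declared external (non-internal) input ports for each target logical operator.
val targetExpectedExternalInputs: Map[String, Int] = effectiveLogicalPlan.operators
  .filter(op => request.targetOperatorIds.contains(op.operatorIdentifier.id))
  .map(op =>
    op.operatorIdentifier.id -> op.operatorInfo.inputPorts.count(!_.id.internal)
  )
  .toMap

// A target counts as completed only when its aggregated state is COMPLETED and
// every declared external input port already reports metrics.
def allTargetsCompleted(stats: ExecutionStatsStore): Boolean = {
  request.targetOperatorIds.nonEmpty && request.targetOperatorIds.forall { opId =>
    stats.operatorInfo.get(opId).exists { metrics =>
      // Internal ports are excluded on both sides of the comparison.
      val externalInputPortsReporting =
        metrics.operatorStatistics.inputMetrics.count(!_.portId.internal)
      // Targets absent from the plan fall back to 0 expected inputs.
      val expectedExternalInputs = targetExpectedExternalInputs.getOrElse(opId, 0)
      metrics.operatorState == COMPLETED &&
      externalInputPortsReporting >= expectedExternalInputs
    }
  }
}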
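
To see how the strengthened predicate behaves for the operator shapes mentioned in this description, here is a condensed sketch that runs on its own. `PortId`, `Metrics`, and `targetCompleted` are hypothetical simplifications of the real types; only the counting logic mirrors the snippet above.

```scala
object PredicateSketch {
  // Hypothetical stand-ins for the real port and metrics types.
  final case class PortId(id: Int, internal: Boolean = false)
  final case class Metrics(completed: Boolean, inputPortsReporting: Seq[PortId])

  // True only when the operator is completed AND at least as many external
  // input ports report metrics as the logical operator declares.
  def targetCompleted(declaredInputs: Seq[PortId], m: Metrics): Boolean = {
    val expected  = declaredInputs.count(!_.internal)
    val reporting = m.inputPortsReporting.count(!_.internal)
    m.completed && reporting >= expected
  }

  // HashJoin declares two external inputs: port 0 (build side) and port 1 (probe side).
  val hashJoinInputs: Seq[PortId] = Seq(PortId(0), PortId(1))

  // Race window: state looks completed, but only port 0 has reported.
  val raceWindow: Metrics = Metrics(completed = true, Seq(PortId(0)))

  // An internal port reporting does not count toward the external total.
  val internalOnlyExtra: Metrics =
    Metrics(completed = true, Seq(PortId(0), PortId(0, internal = true)))

  // Probe has consumed data: both external ports report.
  val finished: Metrics = Metrics(completed = true, Seq(PortId(0), PortId(1)))

  // Source operator: zero declared inputs, nothing reporting.
  val sourceDone: Metrics = Metrics(completed = true, Seq.empty)

  def main(args: Array[String]): Unit = {
    println(targetCompleted(hashJoinInputs, raceWindow))        // false
    println(targetCompleted(hashJoinInputs, internalOnlyExtra)) // false
    println(targetCompleted(hashJoinInputs, finished))          // true
    println(targetCompleted(Seq.empty, sourceDone))             // true
  }
}
```

Under these assumptions the additional port check changes the outcome only for operators that declare more external inputs than have reported so far.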

Any related issues, documentation, discussions?

Closes #4576

How was this PR tested?

Manually reproduced and verified end-to-end against ComputingUnitMaster on port 8085 with a 3-operator DAG (CSVFileScan movies + CSVFileScan ratings → HashJoin on movieId) executed via POST /api/execution/{wid}/{cuid}/run with targetOperatorIds = [HashJoinId]. Inputs: movies.csv (1,000 rows) and ratings.csv (10,311 rows).

Steps to reproduce / verify:

# 1. Start the master
sbt "project WorkflowExecutionService" compile
java ... org.apache.texera.web.ComputingUnitMaster   # listens on :8085

# 2. Get a JWT
curl -s -X POST http://localhost:8080/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"<user>","password":"<pw>"}'

# 3. POST the request (CSV → CSV → HashJoin, target = HashJoin)
curl -s -X POST http://localhost:8085/api/execution/<wid>/<cuid>/run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  --data @sync-exec-request.json

Existing tests pass (sbt "project WorkflowExecutionService" compile succeeds). No new unit test was added because the failure is a timing race between the controller's region-registration sequence and the sync resource's observable. Reproducing it deterministically in a unit test would require either mocking ExecutionStatsStore to emit a build-only snapshot followed by a build+probe snapshot, or driving the full controller actor system; both are out of scope for this targeted fix. Before this fix, manual reproduction was reliable on every run because the race window is several hundred milliseconds wide and Observable.amb consistently selected the (incorrect) target-completion signal first.
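
For reference, the mocked-snapshot approach mentioned above could be approximated without the real ExecutionStatsStore by replaying a fixed sequence of snapshots against the old and new conditions. The following is a rough sketch under that assumption; `Snapshot`, `oldPredicate`, `newPredicate`, and `firstFiring` are illustrative names, not part of the codebase.

```scala
object SnapshotReplaySketch {
  // Hypothetical, flattened view of one stats emission for the HashJoin target.
  final case class Snapshot(label: String, completed: Boolean, externalInputsReporting: Int)

  // HashJoin declares two external input ports.
  val expectedExternalInputs = 2

  // Condition before the fix: aggregated state alone.
  def oldPredicate(s: Snapshot): Boolean = s.completed

  // Condition after the fix: state plus all external input ports reporting.
  def newPredicate(s: Snapshot): Boolean =
    s.completed && s.externalInputsReporting >= expectedExternalInputs

  // Emissions in the order the race produces them.
  val snapshots: Seq[Snapshot] = Seq(
    Snapshot("build running", completed = false, externalInputsReporting = 1),
    Snapshot("build done, probe not registered", completed = true, externalInputsReporting = 1),
    Snapshot("probe running", completed = false, externalInputsReporting = 2),
    Snapshot("probe done", completed = true, externalInputsReporting = 2)
  )

  // Label of the first snapshot on which the given condition fires, if any.
  def firstFiring(pred: Snapshot => Boolean): Option[String] =
    snapshots.find(pred).map(_.label)

  def main(args: Array[String]): Unit = {
    println(firstFiring(oldPredicate)) // Some(build done, probe not registered)
    println(firstFiring(newPredicate)) // Some(probe done)
  }
}
```

A real test would still need to feed such snapshots through the resource's observable, which is the part the description calls out of scope.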

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

`SyncExecutionResource.allTargetsCompleted` previously fired as soon as the target's aggregated `operatorState` reached `COMPLETED`. For logical operators that compile to multiple PhysicalOps separated by a blocking edge (HashJoin: build → probe; the same shape applies to Aggregate), the build region's terminal state propagates briefly before the probe region is added to `regionExecutions`. During that window the per-logical-op aggregation sees only the build PhysicalOp and reports `COMPLETED`; the resource then takes the `TargetResultsReady` branch, kills the execution, and reads the probe's still-empty output storage.

This change also requires every declared external input port to appear in the target's `inputMetrics` before the target is treated as completed. Port-1 stats only appear once the probe actually consumes data, which closes the race; source operators (no input) and single-input operators are unaffected.

aglinxinyuan · 2026-05-01T20:56:08Z

@Xiao-zhen-Liu, please review this PR. It's about the region lifecycle and hashjoin.

Xiao-zhen-Liu

LGTM.

Xiao-zhen-Liu · 2026-05-01T22:49:17Z

@bobbai00 But I think the issue is not about multi-input operators? It is about multiple physical operators for one logical operator.

…condition (#4615) (backported from commit 8383e19). Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>

github-actions Bot assigned bobbai00 May 1, 2026

github-actions Bot added the engine and fix labels May 1, 2026

bobbai00 requested a review from aglinxinyuan May 1, 2026 20:54

aglinxinyuan requested a review from Xiao-zhen-Liu May 1, 2026 20:55

bobbai00 added the release/v1.1.0-incubating label (back porting to release/v1.1.0-incubating) May 1, 2026

aglinxinyuan removed their request for review May 1, 2026 20:55

fmt

2638b6a

bobbai00 requested review from Xiao-zhen-Liu and aglinxinyuan and removed request for Xiao-zhen-Liu May 1, 2026 20:56

Merge branch 'main' into fix/hashjoin-sync-execution-race

9dc90a8

bobbai00 changed the title ~~fix(execution): close HashJoin sync-exec premature-termination race~~ fix(execution-api): fix multi-input operator's execution termination condition May 1, 2026

Merge branch 'main' into fix/hashjoin-sync-execution-race

2a13c4d

Xiao-zhen-Liu approved these changes May 1, 2026

bobbai00 enabled auto-merge (squash) May 1, 2026 22:47

bobbai00 merged commit 8383e19 into apache:main May 1, 2026
26 checks passed

bobbai00 merged 4 commits into apache:main from bobbai00:fix/hashjoin-sync-execution-race