
fix: race condition for SuspendException #46

Merged
maschnetwork merged 9 commits into main from fix/testing on Feb 4, 2026

@maschnetwork (Contributor) commented Feb 3, 2026

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Problem

Tests were failing intermittently on GitHub Actions but passing locally. Executions that should
return PENDING (waiting on callbacks/waits) were incorrectly returning FAILED.

Root Cause

The SDK uses SuspendExecutionException to unwind the call stack when execution should suspend.
The DurableExecutor relied on checking suspendFuture.isDone() to detect suspension. Because of a
race condition, the exception could propagate and be caught by generic exception handlers before
the future was visible as done, so the suspension was treated as a failure.

This race was more likely to occur on CI runners due to different thread scheduling and slower
execution compared to local machines.
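To make the ordering problem concrete, here is a minimal Java sketch that forces the losing side of the race deterministically: the suspension exception is thrown and caught before anything has completed the suspend future. SuspendExecutionException here is a local stand-in, and RaceSketch, Status, classifyByFuture and classifyByType are hypothetical names for illustration only; the SDK's real types and control flow are assumed to differ.

```java
import java.util.concurrent.CompletableFuture;

public class RaceSketch {
    // Hypothetical stand-in for the SDK's SuspendExecutionException.
    static class SuspendExecutionException extends RuntimeException {}

    enum Status { PENDING, FAILED }

    // Future-based check: the result depends on whether another thread has
    // already completed suspendFuture, i.e. on thread timing.
    static Status classifyByFuture(CompletableFuture<Void> suspendFuture) {
        return suspendFuture.isDone() ? Status.PENDING : Status.FAILED;
    }

    // Type-based check: the result depends only on the exception itself.
    static Status classifyByType(Throwable t) {
        return t instanceof SuspendExecutionException ? Status.PENDING : Status.FAILED;
    }

    public static void main(String[] args) {
        CompletableFuture<Void> suspendFuture = new CompletableFuture<>();
        try {
            // The suspending code throws first; nothing has completed
            // suspendFuture yet. This ordering is the race window.
            throw new SuspendExecutionException();
        } catch (RuntimeException e) {
            System.out.println("future-based: " + classifyByFuture(suspendFuture));
            System.out.println("type-based:   " + classifyByType(e));
        }
    }
}
```

Running it prints FAILED for the future-based check and PENDING for the type-based check. On a real runner the first result flips depending on scheduling, which is consistent with the intermittent FAILED results described above, while the second does not depend on timing at all.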

Changes

DurableExecutor.java

  • Simplified exception handling to explicitly check for SuspendExecutionException (including when
    wrapped by CompletableFuture)
  • Removed the redundant isCompletedExceptionally() check; this case is now handled by a single catch block
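A sketch of the single-catch approach described in the bullets above, assuming the executor waits on a CompletableFuture via get(): the exception is classified by walking its cause chain through the CompletionException and ExecutionException wrappers that CompletableFuture adds. SuspendUnwrapSketch, isSuspension, run and Status are hypothetical names, and SuspendExecutionException is a local stand-in rather than the SDK class; this is not the actual DurableExecutor code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

public class SuspendUnwrapSketch {
    // Hypothetical stand-in for the SDK's SuspendExecutionException.
    static class SuspendExecutionException extends RuntimeException {}

    enum Status { SUCCEEDED, PENDING, FAILED }

    // Walks the cause chain, unwrapping only the wrappers CompletableFuture adds.
    static boolean isSuspension(Throwable t) {
        Throwable current = t;
        while (current != null) {
            if (current instanceof SuspendExecutionException) {
                return true;
            }
            if (current instanceof CompletionException || current instanceof ExecutionException) {
                current = current.getCause();
            } else {
                return false;
            }
        }
        return false;
    }

    static Status run(CompletableFuture<String> handlerFuture) {
        try {
            handlerFuture.get();
            return Status.SUCCEEDED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Status.FAILED;
        } catch (Exception e) {
            // Single catch block: classification depends on the exception type,
            // not on whether some other future is visibly done yet.
            return isSuspension(e) ? Status.PENDING : Status.FAILED;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> suspended = CompletableFuture.supplyAsync(() -> {
            throw new SuspendExecutionException();
        });
        CompletableFuture<String> failed = CompletableFuture.supplyAsync(() -> {
            throw new IllegalStateException("boom");
        });
        System.out.println(run(suspended)); // PENDING
        System.out.println(run(failed)); // FAILED
        System.out.println(run(CompletableFuture.completedFuture("ok"))); // SUCCEEDED
    }
}
```

The unwrapping loop stops at any other exception type, so a SuspendExecutionException wrapped in an unrelated exception is still treated as a failure here; whether the real SDK needs broader unwrapping is left open.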

StepOperation.java

  • Catch SuspendExecutionException in the finally block of the worker thread to prevent a phaser deadlock
  • Without this fix, stepAsync().get() could block forever when the exception escaped the executor
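The deadlock can be illustrated with java.util.concurrent.Phaser. In this minimal sketch, assuming the worker is registered as a phaser party and the caller waits for the phase to advance, the worker arrives in a finally block and swallows SuspendExecutionException so the exception cannot escape the executor thread. StepPhaserSketch, startStep and awaitStep are hypothetical names, SuspendExecutionException is a local stand-in, and the timeout exists only so the demo cannot hang; the actual StepOperation structure may differ.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StepPhaserSketch {
    // Hypothetical stand-in for the SDK's SuspendExecutionException.
    static class SuspendExecutionException extends RuntimeException {}

    // Starts stepBody on a worker thread that is registered as a phaser party.
    static void startStep(ExecutorService executor, Phaser phaser, Runnable stepBody) {
        phaser.register(); // register the worker before it starts running
        executor.execute(() -> {
            try {
                stepBody.run();
            } catch (SuspendExecutionException e) {
                // Expected suspension signal: swallow it so it cannot escape
                // the worker thread.
            } finally {
                // Always arrive, so a caller waiting on this phase is released.
                phaser.arriveAndDeregister();
            }
        });
    }

    // Caller side: arrive, then wait (bounded, for this demo) for the worker.
    static boolean awaitStep(Phaser phaser) throws InterruptedException {
        int phase = phaser.arrive();
        try {
            phaser.awaitAdvanceInterruptibly(phase, 2, TimeUnit.SECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // an untimed wait would block forever here
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Phaser phaser = new Phaser(1); // one party: the calling thread
            startStep(executor, phaser, () -> {
                throw new SuspendExecutionException();
            });
            System.out.println("released: " + awaitStep(phaser)); // released: true
        } finally {
            executor.shutdown();
        }
    }
}
```

Without the finally block, a thrown SuspendExecutionException would skip arriveAndDeregister(), the phase would never advance, and an untimed wait would never return, which matches the stepAsync().get() hang described above.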

zhongkechen previously approved these changes Feb 3, 2026

    // Check if this is a suspension, not a real failure
    if (cause instanceof SuspendExecutionException || suspendFuture.isDone()) {
        return DurableExecutionOutput.pending();
    }

So this is a fix to how we handle suspend exceptions. The PR title/commit message might be confusing, as it implies an issue in the tests.

@maschnetwork changed the title from "fix: testing" to "fix: race condition for SuspendException" Feb 4, 2026
@maschnetwork requested a review from phipag February 4, 2026 08:03
@maschnetwork merged commit 114f749 into main Feb 4, 2026
7 checks passed
@maschnetwork deleted the fix/testing branch February 4, 2026 09:42

3 participants