@ParidelPooya
Contributor

Implement a three-layer defense mechanism to prevent a race condition when a parallel execution with minSuccessful completes while other branches still have pending callbacks.

Layer 1: Active Operations Tracking

  • Add ActiveOperationsTracker to track in-flight async operations
  • Wrap checkpoint operations with increment/decrement
  • Defer termination until active operations complete
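As a rough illustration of Layer 1, the sketch below counts in-flight operations and lets termination wait until the count returns to zero. It is a minimal sketch, not the SDK's code: the method names (increment, decrement, hasActiveOperations, waitForIdle) and the helpers withTracking and terminateWhenIdle are hypothetical and may not match the real ActiveOperationsTracker API.

```typescript
// Hypothetical counter-based tracker; the SDK's ActiveOperationsTracker may differ.
class ActiveOperationsTracker {
  private activeCount = 0;
  // Callbacks for callers waiting until no operation is in flight.
  private waiters: Array<() => void> = [];

  increment(): void {
    this.activeCount++;
  }

  decrement(): void {
    if (this.activeCount === 0) {
      throw new Error("decrement() called with no active operations");
    }
    this.activeCount--;
    if (this.activeCount === 0) {
      // Wake everyone who was waiting for the tracker to become idle.
      const waiters = this.waiters;
      this.waiters = [];
      waiters.forEach((wake) => wake());
    }
  }

  hasActiveOperations(): boolean {
    return this.activeCount > 0;
  }

  // Resolves immediately when idle, otherwise when the last operation finishes.
  waitForIdle(): Promise<void> {
    if (this.activeCount === 0) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }
}

// Hypothetical helper: counts an async operation (e.g. a checkpoint write) while it runs.
async function withTracking<T>(
  tracker: ActiveOperationsTracker,
  op: () => Promise<T>,
): Promise<T> {
  tracker.increment();
  try {
    return await op();
  } finally {
    tracker.decrement(); // runs on success and on failure
  }
}

// Hypothetical helper: defers termination until no tracked operation is in flight.
async function terminateWhenIdle(
  tracker: ActiveOperationsTracker,
  terminate: () => void,
): Promise<void> {
  // Re-check in a loop: a new operation could start before this continuation runs.
  while (tracker.hasActiveOperations()) {
    await tracker.waitForIdle();
  }
  terminate();
}
```

With this shape, a termination request that arrives while a checkpoint is being written simply waits for that write to finish instead of cutting it off.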

Layer 2: Pending Completions Check

  • Track operations that are being checkpointed in a pendingCompletions Set
  • Add hasPendingAncestorCompletion() to detect in-flight completions
  • Check both the completed status and pending completions
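Layer 2 can be pictured as two sets, one for completed operations and one for operations whose completion checkpoint is still in flight, plus a walk up the parent chain. The sketch assumes each operation stores its parent's id; CompletionRegistry, OperationInfo, checkpointCompletion and hasCompletedAncestor are hypothetical names, and only pendingCompletions and hasPendingAncestorCompletion() come from the description above.

```typescript
// Hypothetical data model: each operation records its parent's id.
interface OperationInfo {
  id: string;
  parentId?: string;
}

// Illustrative stand-in for the completion bookkeeping in CheckpointHandler.
class CompletionRegistry {
  private readonly operations = new Map<string, OperationInfo>();
  private readonly completed = new Set<string>();
  // Operations whose completion checkpoint has started but not yet finished.
  private readonly pendingCompletions = new Set<string>();

  register(op: OperationInfo): void {
    this.operations.set(op.id, op);
  }

  // Keeps `id` in pendingCompletions for the whole duration of the checkpoint write.
  async checkpointCompletion(id: string, persist: () => Promise<void>): Promise<void> {
    this.pendingCompletions.add(id);
    try {
      await persist();
      this.completed.add(id);
    } finally {
      this.pendingCompletions.delete(id);
    }
  }

  // Parent, grandparent, ... of `id`, nearest first.
  private ancestorIds(id: string): string[] {
    const result: string[] = [];
    let parentId = this.operations.get(id)?.parentId;
    while (parentId !== undefined) {
      result.push(parentId);
      parentId = this.operations.get(parentId)?.parentId;
    }
    return result;
  }

  hasCompletedAncestor(id: string): boolean {
    return this.ancestorIds(id).some((a) => this.completed.has(a));
  }

  // True while any ancestor's completion is still being checkpointed.
  hasPendingAncestorCompletion(id: string): boolean {
    return this.ancestorIds(id).some((a) => this.pendingCompletions.has(a));
  }
}
```

Checking only the completed set would miss the window in which the parallel operation's completion is being written; the pending set closes that window.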

Layer 3: Delayed Ancestor Check

  • Use setImmediate before checking ancestor status
  • Gives the checkpoint queue time to process
  • Ensures the ancestor status is current when it is checked
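The idea behind Layer 3 relies on Node.js ordering: setImmediate callbacks run only after the microtask queue has been drained, so promise callbacks already queued by the checkpoint logic get a chance to update ancestor status first. A minimal sketch, with yieldToEventLoop and checkAncestorAfterYield as hypothetical helper names:

```typescript
// Resolves in the "check" phase of the Node.js event loop. By then every promise
// callback (microtask) queued before this call, including chained ones, has run.
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => {
    setImmediate(() => resolve());
  });
}

// Hypothetical delayed check: yield first, then read the (now fresher) ancestor status.
async function checkAncestorAfterYield(isAncestorFinished: () => boolean): Promise<boolean> {
  await yieldToEventLoop();
  return isAncestorFinished();
}
```

A single setImmediate only covers work that is already queued as promise callbacks; a checkpoint still waiting on network I/O can remain in flight afterwards, which is why this layer is combined with Layers 1 and 2 rather than used alone.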

Changes:

  • Add ActiveOperationsTracker class and tests
  • Enhance terminate() with three-layer protection
  • Add hasPendingAncestorCompletion() to CheckpointHandler
  • Update ExecutionContext with activeOperationsTracker
  • Add example: parallel with 100 branches, minSuccessful:1
  • Add comprehensive documentation

Fixes a race where:

  • A branch completes and starts checkpointing
  • Parallel sees minSuccessful reached and returns
  • Other branches try to terminate
  • Termination interrupts the in-flight checkpoint
  • Parallel fails instead of succeeding

Now:

  • Pending branches detect that an ancestor has finished
  • They skip termination and return a never-resolving promise
  • Parallel succeeds correctly
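Putting the layers together, a pending branch's terminate path might look roughly like the following. The dependency interface and the names deferredTerminate, TerminationDeps, TerminationOutcome and neverResolve are assumptions for illustration; the SDK's terminate() presumably reads these checks from ExecutionContext and CheckpointHandler and may order or implement them differently.

```typescript
// Hypothetical dependencies standing in for the SDK's execution context.
interface TerminationDeps {
  waitForActiveOperations: () => Promise<void>; // Layer 1
  hasCompletedAncestor: () => boolean; // completed status
  hasPendingAncestorCompletion: () => boolean; // Layer 2
  terminate: () => void;
}

type TerminationOutcome = "skipped" | "terminated";

function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => {
    setImmediate(() => resolve());
  });
}

async function deferredTerminate(deps: TerminationDeps): Promise<TerminationOutcome> {
  // Layer 1: never cut off an in-flight checkpoint.
  await deps.waitForActiveOperations();
  // Layer 3: let already-queued checkpoint work update ancestor status.
  await yieldToEventLoop();
  // Layer 2 plus completed status: the parent finished or is finishing,
  // so terminating would only turn its success into a failure.
  if (deps.hasCompletedAncestor() || deps.hasPendingAncestorCompletion()) {
    return "skipped";
  }
  deps.terminate();
  return "terminated";
}

// A skipped branch awaits a promise that never settles, so its code
// never continues past this point.
function neverResolve<T>(): Promise<T> {
  return new Promise<T>(() => {});
}
```

In this sketch, a branch that gets "skipped" would return neverResolve() instead of throwing, matching the "return never-resolving promise" step above.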

Tests: all 691 SDK tests pass; the 100-branch example passes in 254ms

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@ParidelPooya marked this pull request as draft November 14, 2025 15:32
…e condition

    Implement three-layer defense mechanism to prevent race conditions when
    parallel execution with minSuccessful completes while branches have
    pending callbacks.

    Layer 1: Active Operations Tracking
    - Add ActiveOperationsTracker to track in-flight async operations
    - Wrap checkpoint operations with increment/decrement
    - Defer termination until active operations complete

    Layer 2: Pending Completions Check
    - Track operations being checkpointed in pendingCompletions Set
    - Add hasPendingAncestorCompletion() to detect in-flight completions
    - Check both completed status and pending completions

    Layer 3: Delayed Ancestor Check
    - Use setImmediate before checking ancestor status
    - Gives checkpoint queue time to process
    - Ensures ancestor status is current when checked

    Changes:
    - Add ActiveOperationsTracker class and tests
    - Enhance terminate() with three-layer protection
    - Add hasPendingAncestorCompletion() to CheckpointHandler
    - Update ExecutionContext with activeOperationsTracker
    - Add example: parallel with 3 branches, minSuccessful:1

    Fixes race where:
    - Branch completes, starts checkpointing
    - Parallel sees minSuccessful reached, returns
    - Other branches try to terminate
    - Termination interrupted checkpoint
    - Parallel failed instead of succeeded

    Now:
    - Pending branches detect ancestor finished
    - Skip termination, return never-resolving promise
    - Parallel succeeds correctly
@ParidelPooya force-pushed the feat/termination-deferral-race-condition-fix branch from 390a956 to b18956b November 14, 2025 21:28
@anthonyting
Contributor

Can we create an issue to follow up on investigating the problems with polling and with using setImmediate?

@ParidelPooya marked this pull request as ready for review November 14, 2025 23:08
@ParidelPooya merged commit 0419ca8 into development Nov 14, 2025
42 of 44 checks passed
@ParidelPooya deleted the feat/termination-deferral-race-condition-fix branch November 14, 2025 23:08