feat(nested-steps): Stop hardcoding Root context thread#9

Merged
phipag merged 10 commits into main from phipag/nested-steps
Jan 21, 2026

Conversation

@phipag (Contributor) commented Jan 19, 2026

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Issue Link, if available

#7

Description

This PR stops hardcoding the "Root" context thread name. The DurableContext thread now defaults to the name assigned by the managed executor. Each step runs in its own thread and sets its own thread name in sync with its identifier (e.g. stepTaskId). This lets us de-register the calling thread by thread name in .get().

We also removed ThreadType because it was only used in DEBUG logs. We can re-add it if we need it again in the future.
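
To illustrate the idea of de-registering a caller by thread name, here is a minimal sketch. It is not the actual ExecutionManager code: `ActiveThreadRegistry` and its methods are hypothetical names chosen for this example, and the only behaviour taken from this PR is that threads are tracked by name and a count of active threads is reported.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the active-thread tracking in ExecutionManager.
final class ActiveThreadRegistry {
    private final Set<String> activeThreads = ConcurrentHashMap.newKeySet();

    // Marks a thread as active and returns the number of active threads.
    synchronized int register(String threadName) {
        activeThreads.add(threadName);
        return activeThreads.size();
    }

    // Removes a thread by name and returns the number of active threads left.
    synchronized int deregister(String threadName) {
        activeThreads.remove(threadName);
        return activeThreads.size();
    }

    public static void main(String[] args) {
        ActiveThreadRegistry registry = new ActiveThreadRegistry();
        // The context thread keeps whatever name its executor assigned.
        registry.register(Thread.currentThread().getName());
        registry.register("1-step");
        // A caller that blocks in get() removes itself by its own thread name.
        int remaining = registry.deregister(Thread.currentThread().getName());
        System.out.println("Active threads: " + remaining);
    }
}
```

Because the key is the thread name, this only works if names are unique per logical unit of work, which is the assumption questioned later in the review.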

Demo/Screenshots

Added NestedStepExample:

Logs: note how step 2 is de-registered while it runs .get() on step 1 from inside step 2. This can be interpreted as handing off control from step 2 (running in the "Root" context) to step 1 (running inside step 2), and then back up the chain.

// step 1 obtaining control from step 2 (caller thread)
2026-01-19 16:06:13.039 [2-step] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Deregistered thread '2-step'. Active threads: 1

...

// Handing back control to step 2
2026-01-19 16:06:23.135 [2-step] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Registered thread '2-step' as active. Active threads: 2
// De-registering itself in step 1
2026-01-19 16:06:23.135 [1-step] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Deregistered thread '1-step'. Active threads: 1
SAM_CONTAINER_ID: c41077c4159180f8e08d217001199734e35c069b0b2c295e8edeb4fb0efb7ac7
START RequestId: 7ad356ed-ff9c-4cd4-bd62-0781d793c642 Version: $LATEST
2026-01-19 16:06:12.639 [main] DEBUG com.amazonaws.lambda.durable.DurableConfig - Creating default DurableExecutionClient
2026-01-19 16:06:12.983 [main] DEBUG com.amazonaws.lambda.durable.DurableConfig - Default DurableExecutionClient created for region: us-east-1
2026-01-19 16:06:12.984 [main] DEBUG com.amazonaws.lambda.durable.DurableConfig - Creating default ExecutorService
2026-01-19 16:06:12.986 [main] DEBUG com.amazonaws.lambda.durable.DurableHandler - Raw input from durable handler: {"DurableExecutionArn": "8a276f03-422a-4a4f-84f4-a9c07124be32", "CheckpointToken": "eyJhcm4iOiI4YTI3NmYwMy00MjJhLTRhNGYtODRmNC1hOWMwNzEyNGJlMzIiLCJzZXEiOjF9", "InitialExecutionState": {"Operations": [{"Id": "95df56c3-358e-4cdd-912b-af06b32cde7e", "Type": "EXECUTION", "Status": "STARTED", "StartTimestamp": "2026-01-19 16:06:12.319239+00:00", "ExecutionDetails": {"InputPayload": "{}"}}], "NextMarker": ""}}
2026-01-19 16:06:13.027 [main] DEBUG com.amazonaws.lambda.durable.DurableExecutor - DurableExecution.execute() called
2026-01-19 16:06:13.027 [main] DEBUG com.amazonaws.lambda.durable.DurableExecutor - DurableExecutionArn: 8a276f03-422a-4a4f-84f4-a9c07124be32
2026-01-19 16:06:13.028 [main] DEBUG com.amazonaws.lambda.durable.DurableExecutor - CheckpointToken: eyJhcm4iOiI4YTI3NmYwMy00MjJhLTRhNGYtODRmNC1hOWMwNzEyNGJlMzIiLCJzZXEiOjF9
2026-01-19 16:06:13.028 [main] DEBUG com.amazonaws.lambda.durable.DurableExecutor - Initial operations count: 1
2026-01-19 16:06:13.028 [main] DEBUG com.amazonaws.lambda.durable.DurableExecutor - EXECUTION operation found: 95df56c3-358e-4cdd-912b-af06b32cde7e
2026-01-19 16:06:13.032 [durable-exec-14] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Registered thread 'durable-exec-14' as active. Active threads: 1
2026-01-19 16:06:13.033 [durable-exec-14] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Started phaser for operation '1'
2026-01-19 16:06:13.033 [durable-exec-14] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Registered thread '1-step' as active. Active threads: 2
2026-01-19 16:06:13.034 [durable-exec-14] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Started phaser for operation '2'
2026-01-19 16:06:13.034 [durable-exec-14] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Registered thread '2-step' as active. Active threads: 3
2026-01-19 16:06:13.035 [durable-exec-14] DEBUG com.amazonaws.lambda.durable.operation.StepOperation - StepOperation.get() attempting to deregister thread: durable-exec-14
2026-01-19 16:06:13.035 [durable-exec-14] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Deregistered thread 'durable-exec-14'. Active threads: 2
2026-01-19 16:06:13.035 [durable-exec-14] DEBUG com.amazonaws.lambda.durable.operation.StepOperation - Waiting for operation to finish 2 (Phaser: java.util.concurrent.Phaser@527ccbcd[phase = 0 parties = 2 arrived = 0])
2026-01-19 16:06:13.039 [1-step] DEBUG com.amazonaws.lambda.durable.execution.CheckpointBatcher - Checkpoint request received: Action START
2026-01-19 16:06:13.039 [2-step] DEBUG com.amazonaws.lambda.durable.execution.CheckpointBatcher - Checkpoint request received: Action START
2026-01-19 16:06:13.039 [2-step] DEBUG com.amazonaws.lambda.durable.operation.StepOperation - StepOperation.get() attempting to deregister thread: 2-step
2026-01-19 16:06:13.039 [2-step] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Deregistered thread '2-step'. Active threads: 1
2026-01-19 16:06:13.039 [2-step] DEBUG com.amazonaws.lambda.durable.operation.StepOperation - Waiting for operation to finish 1 (Phaser: java.util.concurrent.Phaser@44755686[phase = 0 parties = 2 arrived = 0])
2026-01-19 16:06:13.042 [pool-2-thread-1] DEBUG com.amazonaws.lambda.durable.client.LambdaDurableFunctionsClient - Calling DAR backend with 2 updates: CheckpointDurableExecutionRequest(DurableExecutionArn=8a276f03-422a-4a4f-84f4-a9c07124be32, CheckpointToken=eyJhcm4iOiI4YTI3NmYwMy00MjJhLTRhNGYtODRmNC1hOWMwNzEyNGJlMzIiLCJzZXEiOjF9, Updates=[OperationUpdate(Id=1, Name=async-step, Type=STEP, Action=START), OperationUpdate(Id=2, Name=process-result, Type=STEP, Action=START)])
2026-01-19 16:06:13.060 [pool-2-thread-1] DEBUG com.amazonaws.lambda.durable.execution.CheckpointBatcher - DAR backend called: CheckpointDurableExecutionResponse(CheckpointToken=eyJhcm4iOiI4YTI3NmYwMy00MjJhLTRhNGYtODRmNC1hOWMwNzEyNGJlMzIiLCJzZXEiOjJ9, NewExecutionState=CheckpointUpdatedExecutionState(Operations=[Operation(Id=95df56c3-358e-4cdd-912b-af06b32cde7e, Type=EXECUTION, StartTimestamp=2026-01-19T16:06:12.319Z, Status=STARTED, ExecutionDetails=ExecutionDetails(InputPayload=*** Sensitive Data Redacted ***)), Operation(Id=1, Name=async-step, Type=STEP, StartTimestamp=2026-01-19T16:06:13.053Z, Status=STARTED, StepDetails=StepDetails(Attempt=0)), Operation(Id=2, Name=process-result, Type=STEP, StartTimestamp=2026-01-19T16:06:13.053Z, Status=STARTED, StepDetails=StepDetails(Attempt=0))])).
2026-01-19 16:06:23.060 [1-step] DEBUG com.amazonaws.lambda.durable.execution.CheckpointBatcher - Checkpoint request received: Action SUCCEED
2026-01-19 16:06:23.061 [pool-2-thread-1] DEBUG com.amazonaws.lambda.durable.client.LambdaDurableFunctionsClient - Calling DAR backend with 1 updates: CheckpointDurableExecutionRequest(DurableExecutionArn=8a276f03-422a-4a4f-84f4-a9c07124be32, CheckpointToken=eyJhcm4iOiI4YTI3NmYwMy00MjJhLTRhNGYtODRmNC1hOWMwNzEyNGJlMzIiLCJzZXEiOjJ9, Updates=[OperationUpdate(Id=1, Name=async-step, Type=STEP, Action=SUCCEED, Payload=*** Sensitive Data Redacted ***)])
2026-01-19 16:06:23.134 [pool-2-thread-1] DEBUG com.amazonaws.lambda.durable.execution.CheckpointBatcher - DAR backend called: CheckpointDurableExecutionResponse(CheckpointToken=eyJhcm4iOiI4YTI3NmYwMy00MjJhLTRhNGYtODRmNC1hOWMwNzEyNGJlMzIiLCJzZXEiOjN9, NewExecutionState=CheckpointUpdatedExecutionState(Operations=[Operation(Id=95df56c3-358e-4cdd-912b-af06b32cde7e, Type=EXECUTION, StartTimestamp=2026-01-19T16:06:12.319Z, Status=STARTED, ExecutionDetails=ExecutionDetails(InputPayload=*** Sensitive Data Redacted ***)), Operation(Id=1, Name=async-step, Type=STEP, StartTimestamp=2026-01-19T16:06:13.053Z, EndTimestamp=2026-01-19T16:06:23.121Z, Status=SUCCEEDED, StepDetails=StepDetails(Attempt=1, Result=*** Sensitive Data Redacted ***)), Operation(Id=2, Name=process-result, Type=STEP, StartTimestamp=2026-01-19T16:06:13.053Z, Status=STARTED, StepDetails=StepDetails(Attempt=0))])).
2026-01-19 16:06:23.135 [pool-2-thread-1] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Advancing phaser 0 -> 1: java.util.concurrent.Phaser@44755686[phase = 0 parties = 2 arrived = 1]
2026-01-19 16:06:23.135 [pool-2-thread-1] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Advancing phaser 1 -> 2: java.util.concurrent.Phaser@44755686[phase = 1 parties = 2 arrived = 0]
2026-01-19 16:06:23.135 [2-step] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Registered thread '2-step' as active. Active threads: 2
2026-01-19 16:06:23.135 [1-step] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Deregistered thread '1-step'. Active threads: 1
2026-01-19 16:06:23.137 [2-step] DEBUG com.amazonaws.lambda.durable.execution.CheckpointBatcher - Checkpoint request received: Action SUCCEED
2026-01-19 16:06:23.137 [pool-2-thread-1] DEBUG com.amazonaws.lambda.durable.client.LambdaDurableFunctionsClient - Calling DAR backend with 1 updates: CheckpointDurableExecutionRequest(DurableExecutionArn=8a276f03-422a-4a4f-84f4-a9c07124be32, CheckpointToken=eyJhcm4iOiI4YTI3NmYwMy00MjJhLTRhNGYtODRmNC1hOWMwNzEyNGJlMzIiLCJzZXEiOjN9, Updates=[OperationUpdate(Id=2, Name=process-result, Type=STEP, Action=SUCCEED, Payload=*** Sensitive Data Redacted ***)])
2026-01-19 16:06:23.150 [pool-2-thread-1] DEBUG com.amazonaws.lambda.durable.execution.CheckpointBatcher - DAR backend called: CheckpointDurableExecutionResponse(CheckpointToken=eyJhcm4iOiI4YTI3NmYwMy00MjJhLTRhNGYtODRmNC1hOWMwNzEyNGJlMzIiLCJzZXEiOjR9, NewExecutionState=CheckpointUpdatedExecutionState(Operations=[Operation(Id=95df56c3-358e-4cdd-912b-af06b32cde7e, Type=EXECUTION, StartTimestamp=2026-01-19T16:06:12.319Z, Status=STARTED, ExecutionDetails=ExecutionDetails(InputPayload=*** Sensitive Data Redacted ***)), Operation(Id=1, Name=async-step, Type=STEP, StartTimestamp=2026-01-19T16:06:13.053Z, EndTimestamp=2026-01-19T16:06:23.121Z, Status=SUCCEEDED, StepDetails=StepDetails(Attempt=1, Result=*** Sensitive Data Redacted ***)), Operation(Id=2, Name=process-result, Type=STEP, StartTimestamp=2026-01-19T16:06:13.053Z, EndTimestamp=2026-01-19T16:06:23.146Z, Status=SUCCEEDED, StepDetails=StepDetails(Attempt=1, Result=*** Sensitive Data Redacted ***))])).
2026-01-19 16:06:23.150 [pool-2-thread-1] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Advancing phaser 0 -> 1: java.util.concurrent.Phaser@44755686[phase = 2 parties = 1 arrived = 0]
2026-01-19 16:06:23.150 [pool-2-thread-1] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Advancing phaser 1 -> 2: java.util.concurrent.Phaser@44755686[phase = 3 parties = 1 arrived = 0]
2026-01-19 16:06:23.150 [pool-2-thread-1] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Advancing phaser 0 -> 1: java.util.concurrent.Phaser@527ccbcd[phase = 0 parties = 2 arrived = 1]
2026-01-19 16:06:23.150 [pool-2-thread-1] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Advancing phaser 1 -> 2: java.util.concurrent.Phaser@527ccbcd[phase = 1 parties = 2 arrived = 0]
2026-01-19 16:06:23.151 [durable-exec-14] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Registered thread 'durable-exec-14' as active. Active threads: 2
2026-01-19 16:06:23.153 [2-step] DEBUG com.amazonaws.lambda.durable.execution.ExecutionManager - Deregistered thread '2-step'. Active threads: 1
END RequestId: c91a4451-43d7-48f2-b8bf-5bf6c90d1b71
REPORT RequestId: c91a4451-43d7-48f2-b8bf-5bf6c90d1b71  Init Duration: 0.17 ms  Duration: 10796.30 ms   Billed Duration: 10797 ms       Memory Size: 512 MB     Max Memory Used: 512 MB
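
The ordering visible in the logs above (the waiting caller is registered again before the finished step is de-registered, so the active count does not drop to zero) can be reproduced with a two-phase `java.util.concurrent.Phaser` handshake. The following is only a sketch under that assumption. `HandOffSketch` and its counters are hypothetical and do not mirror the SDK's internals; in particular, the logs show the phaser being advanced from a checkpoint thread, which this sketch does not model.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a caller handing control to a step and back.
final class HandOffSketch {
    private int active = 0;
    private int lowest = Integer.MAX_VALUE;

    private synchronized void register() { active++; }

    private synchronized void deregister() {
        active--;
        lowest = Math.min(lowest, active);
    }

    synchronized int active() { return active; }

    // Lowest active count seen after any deregistration.
    synchronized int lowest() { return lowest; }

    void run() {
        Phaser handOff = new Phaser(2); // parties: the waiting caller and the step
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            register(); // caller
            register(); // step, registered before it starts running
            Future<?> step = pool.submit(() -> {
                // ... the step's work would run here ...
                handOff.arriveAndAwaitAdvance(); // phase 0 -> 1: result is ready
                handOff.arriveAndAwaitAdvance(); // phase 1 -> 2: caller is active again
                deregister();
            });
            deregister();                    // caller blocks, so it is no longer active
            handOff.arriveAndAwaitAdvance(); // wait until the step has finished
            register();                      // caller becomes active again
            handOff.arrive();                // now the step may deregister itself
            step.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        HandOffSketch sketch = new HandOffSketch();
        sketch.run();
        System.out.println("active=" + sketch.active() + " lowest=" + sketch.lowest());
    }
}
```

The second phase is what keeps the count from touching zero: the step only de-registers after the caller has re-registered.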

Checklist

  • I have filled out every section of the PR template
  • I have thoroughly tested this change

@phipag phipag requested a review from maschnetwork January 19, 2026 16:11
@phipag phipag linked an issue Jan 19, 2026 that may be closed by this pull request
@phipag phipag added the enhancement New feature or request label Jan 19, 2026
@maschnetwork (Contributor) left a comment

Thanks for working on this. Some comments to discuss.

@phipag phipag self-assigned this Jan 20, 2026
@maschnetwork (Contributor) left a comment

LGTM

this.serDes = serDes;
this.lambdaContext = lambdaContext;
this.operationCounter = new AtomicInteger(0);
this.uniqueThreadName = Thread.currentThread().getName(); // Auto-detect handler thread
Contributor

Is this guaranteed to be unique with all executors? I assume we're going to allow customers to use their own Executor implementation.

Contributor Author

This is a good point, @zhongkechen. I tested some edge cases, and we should avoid relying on keeping the thread name in sync with our tracking of active threads in ExecutionManager.

  • Users can provide their own executor with their own naming strategy, which we would overwrite (bad user experience).
  • Thread pools can reuse threads, which can lead to side effects in concurrent scenarios.
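
Both points can be seen with a plain `java.util.concurrent` pool. In this small demo (class and method names are made up for illustration), the first task renames its worker thread the way a step would, and the second task, which runs later on the same pooled thread, still sees that name:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustration only: a renamed pool thread keeps its name for later tasks.
final class ThreadReuseDemo {
    // Returns the thread names observed by two tasks run one after another.
    static String[] namesSeenByTwoTasks() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            String first = pool.submit(() -> {
                // Overwrites whatever name the user's executor assigned.
                Thread.currentThread().setName("1-step");
                return Thread.currentThread().getName();
            }).get(5, TimeUnit.SECONDS);
            String second = pool.submit(() -> {
                // Unrelated task on the reused worker thread.
                return Thread.currentThread().getName();
            }).get(5, TimeUnit.SECONDS);
            return new String[] {first, second};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] names = namesSeenByTwoTasks();
        System.out.println(names[0] + " / " + names[1]);
    }
}
```

Any bookkeeping keyed on the thread name would therefore confuse the second task with the first step.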

A better solution might be to track the currently active thread context with a thread local. Something like:

  • DurableContext -> enters "Context" thread
  • Step.execute() -> enters "Step" thread
  • Step.get() -> checks whether the currently active context is a "Context" or a "Step" thread
  • RunInChild context can create a new "Context" thread with another name
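
A rough sketch of what such thread-local tracking could look like follows. Everything here is hypothetical (`ExecutionScope`, `Kind`, `runAs`) and only illustrates the proposal in this comment, not the code that was eventually merged:

```java
import java.util.function.Supplier;

// Hypothetical sketch: records what kind of work the current thread is doing.
final class ExecutionScope {
    enum Kind { CONTEXT, STEP }

    private static final ThreadLocal<Kind> CURRENT = new ThreadLocal<>();

    // Runs body with the given kind set for this thread, then restores the old value.
    static <T> T runAs(Kind kind, Supplier<T> body) {
        Kind previous = CURRENT.get();
        CURRENT.set(kind);
        try {
            return body.get();
        } finally {
            if (previous == null) {
                CURRENT.remove(); // do not leak state into reused pool threads
            } else {
                CURRENT.set(previous);
            }
        }
    }

    // Returns the kind for the calling thread, or null outside any scope.
    static Kind current() {
        return CURRENT.get();
    }

    public static void main(String[] args) {
        Kind insideStep = runAs(Kind.CONTEXT, () -> {
            Kind nested = runAs(Kind.STEP, ExecutionScope::current);
            return nested;
        });
        System.out.println(insideStep + ", afterwards: " + current());
    }
}
```

Unlike the thread name, this state is owned by the library, so a user-provided executor's naming strategy stays untouched, and clearing the value in `finally` avoids carrying it over when a pool reuses the thread.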

Contributor Author

I fixed this problem in the newest revision: we now raise an exception when nested step calling happens.
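
For readers who want a feel for that behaviour, here is a minimal sketch of a guard that rejects nested step calls. The exception type, the message, and the detection mechanism (a thread local holding the running step's name) are assumptions made for this example; the merged revision may detect and report the situation differently.

```java
import java.util.function.Supplier;

// Hypothetical guard: fails fast when a step is started from inside another step.
final class NestedStepGuard {
    private static final ThreadLocal<String> RUNNING_STEP = new ThreadLocal<>();

    // Runs body as a step on the calling thread, rejecting nested calls.
    static <T> T step(String name, Supplier<T> body) {
        String outer = RUNNING_STEP.get();
        if (outer != null) {
            throw new IllegalStateException(
                    "Step '" + name + "' was called from inside step '" + outer + "'");
        }
        RUNNING_STEP.set(name);
        try {
            return body.get();
        } finally {
            RUNNING_STEP.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println(step("async-step", () -> "ok"));
        Supplier<String> nested = () -> step("async-step", () -> "never returned");
        try {
            step("process-result", nested);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```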

@phipag phipag merged commit fe12ac7 into main Jan 21, 2026
1 check passed
@phipag phipag deleted the phipag/nested-steps branch January 21, 2026 14:14
@phipag phipag mentioned this pull request Jan 26, 2026
2 tasks

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Add support for nested step calling

3 participants