fix(tests): stabilize flaky durable timeout tests#9035
Conversation
The ExecutionTimeout and wait_for_callback timeout integration tests were intermittently failing due to an upstream emulator/SDK issue where event payloads occasionally fail to propagate to the Lambda handler. When the input was lost, handlers fell back to short default durations and the executions completed successfully instead of timing out, causing the assertions to fail. Replace the event-driven handlers with dedicated handlers that hardcode the wait and callback timeout durations, removing the dependency on the event payload reaching the function and making the tests deterministic.
The :latest tag of public.ecr.aws/durable-functions/aws-durable-execution-emulator currently fails to persist execution state, causing GetDurableExecution to return 404 immediately after a successful start_durable_execution. This breaks every durable integration test (HelloWorld, NamedStep, callbacks, etc.), not just the ones modified in this PR. Pin the emulator image tag for the durable-functions CI matrix entry so the suite is reproducible until :latest is republished cleanly.
roger-zhangg
left a comment
There was a problem hiding this comment.
Coverage concern (finding 1 of 3): SAM CLI event-payload propagation to durable functions is no longer exercised by any test that would fail if the payload were silently dropped.
The two timeout tests being rewritten here were effectively the only durable integ tests whose pass/fail signal depended on event payload values reaching the handler. After this change:
test_local_invoke_callback_heartbeat(tests/integration/local/invoke/test_invoke_durable.py:181) passes{timeout_seconds: 60, heartbeat_timeout_seconds: 30}, but completes via callback success well before any timeout — it would still pass with the handler defaults (120/60).test_local_callback_cli_heartbeat(tests/integration/local/start_lambda/test_start_lambda_durable.py:339) follows the same pattern and has the same property.
The PR description attributes the flake to an upstream emulator/SDK issue. If that's confirmed, this is acceptable — but the description doesn't include emulator/SDK logs proving the propagation gap is purely upstream. If SAM CLI is partly to blame for invoke-event drops to durable handlers, this PR removes the only integ-level detector.
Suggestion: add (or restore) one durable test where the assertion explicitly depends on the event payload being honored — e.g., a wait-for-callback test that sends a short-timeout payload and asserts CallbackTimedOut, or an invoke test that varies wait_seconds between two values and asserts the resulting wait duration. Otherwise a real propagation regression would land silently regardless of where the underlying bug lives.
| # Assert invoke output shows timeout | ||
| execution_arn = self.assert_invoke_output( | ||
| stdout, input_data={"wait_seconds": 30}, execution_name=execution_name, expected_status="TIMED_OUT" | ||
| stdout, input_data={}, execution_name=execution_name, expected_status="TIMED_OUT" |
There was a problem hiding this comment.
Finding 2 of 3: this weakens the only durable invoke test that asserted a non-trivial input made it into the execution-history output.
Before: input_data={"wait_seconds": 30} — assert_invoke_output validated the input shown in the execution actually matched what was passed in.
After: input_data={} — there's nothing to validate against, so this branch of assert_invoke_output is effectively a no-op for propagation. Combined with the switch to --no-event, the timeout assertion now passes solely because the hardcoded handler does what it does, regardless of whether the event channel works at all.
If the goal is just stabilization, that's fine, but please consider keeping at least one durable test that asserts the input round-trips correctly through invoke → handler → execution history.
| function_name, event_path=event_path, durable_execution_name=execution_name | ||
| ) | ||
| command_list = self.get_invoke_command_list(function_name, no_event=True, durable_execution_name=execution_name) | ||
|
|
There was a problem hiding this comment.
Finding 3 of 3: the -e <event_path> code path for durable invokes is no longer exercised by the timeout test.
It's still covered by test_local_invoke_callback_heartbeat (line 181) which uses event_path=..., so it's not a total loss — flagging mainly because the timeout flow was the test most likely to fail on a regression in event-file handling specific to durable invoke (e.g., payload encoding, file read path differences when DurableExecutionName is also set). Worth a sanity check that the heartbeat test would actually catch such a regression, since it doesn't assert on the event contents being honored either.
|
Thanks for the careful review — these are fair findings. I traced the propagation path locally to evaluate the suggested coverage: What I tried: added an What I found: the test reproduces a real propagation drop. SAM CLI prints On scope for this PR: since this is upstream of the timeout-flake fix, and a working propagation integ test would currently fail in CI on develop today, I'm leaving #9035 narrow:
|
Summary
ExecutionTimeoutandwait_for_callbacktimeout integration tests have been intermittently failing on unrelated PRs (e.g. run 26182603010).wait_seconds=2,timeout_seconds=120) and executions completeSUCCEEDED/ stayRUNNINGinstead of timing out, so the assertions fail.Changes:
wait/wait_30_seconds.py(hardcoded 30s wait) wired up to theExecutionTimeoutfunction.wait_for_callback/wait_for_callback_short_timeout.py(hardcoded 5s/3s) wired up to a newWaitForCallbackShortTimeoutfunction.--no-eventand the new dedicated functions; obsoletetimeout_test_event.jsondeleted.Test plan
pytest tests/integration/local/invoke/test_invoke_durable.py::TestInvokeDurable::test_local_invoke_durable_function_timeoutpytest tests/integration/local/start_lambda/test_start_lambda_durable.py::TestStartLambdaDurable::test_local_start_lambda_invoke_timeoutpytest tests/integration/local/start_lambda/test_start_lambda_durable.py::TestStartLambdaDurable::test_local_start_lambda_invoke_wait_for_callback_timeoutHelloWorld,NamedStep,NamedWait,MapOperations,Parallel, callback success/failure/heartbeat) still pass and were not affected by template changes.