
fix: handle_end_invocation race condition when extracting payload #1024

Merged
duncanista merged 2 commits into main from pmartinez/race-condition
Mar 5, 2026

Conversation

@pablomartinezbernardo
Contributor

@pablomartinezbernardo pablomartinezbernardo commented Feb 12, 2026

Overview

  1. The first thing handle_end_invocation does is spawn a task; let's call it anonymousTask.
  2. handle_end_invocation then immediately returns 200 so the tracer can continue.
  3. Meanwhile, anonymousTask is still busy parsing a complex body in extract_request_body.
  4. Because the tracer has continued, PlatformRuntimeDone is eventually processed.
  5. Because the customer's function is not managed (initialization_type: SnapStart), PlatformRuntimeDone calls pair_platform_runtime_done_event, which yields None because anonymousTask is still busy with the body.
  6. We then jump to process_on_platform_runtime_done.
  7. The span and trace IDs are not available yet, and they are never checked again after this point.
  8. anonymousTask finally completes, but that is irrelevant: send_ctx_spans only runs on PlatformRuntimeDone, which assumes universal_instrumentation_end has already been sent.
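The sequence above can be reduced to a small model of the race. This is a sketch under stated assumptions, not the extension's code: `TraceContext`, `Slot`, and `handle_end_invocation_racy` are hypothetical names, std threads stand in for the extension's async tasks, and a channel stands in for the slow extract_request_body so the interleaving is deterministic.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the span/trace IDs carried in the end-invocation body.
#[derive(Clone, Debug, PartialEq)]
struct TraceContext {
    trace_id: u64,
    span_id: u64,
}

/// Hypothetical shared slot: the handler fills it, PlatformRuntimeDone reads it.
type Slot = Arc<Mutex<Option<TraceContext>>>;

/// Racy ordering: spawn the body extraction ("anonymousTask") and answer at once.
/// `body_ready` models extract_request_body still working on a large payload.
fn handle_end_invocation_racy(
    slot: Slot,
    body_ready: mpsc::Receiver<TraceContext>,
) -> (u16, thread::JoinHandle<()>) {
    let worker = thread::spawn(move || {
        // Blocks until the "body" has been parsed, then stores the context.
        if let Ok(ctx) = body_ready.recv() {
            *slot.lock().unwrap() = Some(ctx);
        }
    });
    // The tracer is unblocked before the slot has been filled.
    (200, worker)
}

/// Runs once, when PlatformRuntimeDone is processed; nothing re-checks the slot later.
fn process_on_platform_runtime_done(slot: &Slot) -> Option<TraceContext> {
    slot.lock().unwrap().clone()
}

fn main() {
    let slot: Slot = Arc::new(Mutex::new(None));
    let (tx, rx) = mpsc::channel();

    let (status, worker) = handle_end_invocation_racy(Arc::clone(&slot), rx);
    println!("handler returned {status}");

    // PlatformRuntimeDone arrives while the body is still being parsed.
    println!("at PlatformRuntimeDone: {:?}", process_on_platform_runtime_done(&slot));

    // The body parse finishes afterwards, but nobody looks at the slot again.
    tx.send(TraceContext { trace_id: 2742542901019652192, span_id: 1 }).unwrap();
    worker.join().unwrap();
    println!("after anonymousTask: {:?}", slot.lock().unwrap().clone());
}
```

The channel forces the worst-case interleaving from steps 3 to 8: the context only lands in the slot after the one-shot PlatformRuntimeDone check has already seen None.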

Why this looks likely

In the customer's logs we can see

  • 05:11:48.463 datadog.trace.agent.core.DDSpan - Finished span (WRITTEN): DDSpan [ t_id=2742542901019652192
  • 05:11:48.489 PlatformRuntimeDone received
  • 05:11:48.630 REPORT RequestId 1db22159-7200-43c8-bec1-11b89df4f099 (last log emitted in an execution)
  • 05:11:53.784 START RequestId: 8c801767-e21b-43f7-bd11-078bb64bc430 (new request id, 5s later)
  • 05:11:53.789 Received end invocation request from headers:{"x-datadog-trace-id": "2742542901019652192"... -> we are now trying to finish the span after the request is long gone 🙃

In this specific run, the Lambda function even had time to stop before the anonymous task spawned by handle_end_invocation continued.

Motivation

SLES-2666

Performance

This PR makes reading the body synchronous with the response. This delays handing execution back outside the extension until the body has been read. That cost is acceptable, because reading the body and sending universal_instrumentation_end before relinquishing control is a requirement anyway.
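A matching sketch of the ordering this PR moves to, again under stated assumptions: `TraceContext`, `Slot`, and both functions are hypothetical, and `extract_request_body` here is a toy parser for a `trace_id,span_id` string, not the real function. The point it illustrates is only that the body is read and stored before the handler produces its response, so the context is already present when PlatformRuntimeDone is processed.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the span/trace IDs carried in the end-invocation body.
#[derive(Clone, Debug, PartialEq)]
struct TraceContext {
    trace_id: u64,
    span_id: u64,
}

/// Hypothetical shared slot: the handler fills it, PlatformRuntimeDone reads it.
type Slot = Arc<Mutex<Option<TraceContext>>>;

/// Toy body parser for this sketch only: expects "trace_id,span_id".
fn extract_request_body(raw: &str) -> Option<TraceContext> {
    let mut parts = raw.split(',');
    let trace_id = parts.next()?.trim().parse().ok()?;
    let span_id = parts.next()?.trim().parse().ok()?;
    Some(TraceContext { trace_id, span_id })
}

/// Fixed ordering: finish reading the body, store the context, then answer the tracer.
fn handle_end_invocation(slot: &Slot, raw_body: &str) -> u16 {
    let ctx = extract_request_body(raw_body);
    *slot.lock().unwrap() = ctx;
    200 // only returned once the slot has been written
}

/// Runs once, when PlatformRuntimeDone is processed.
fn process_on_platform_runtime_done(slot: &Slot) -> Option<TraceContext> {
    slot.lock().unwrap().clone()
}

fn main() {
    let slot: Slot = Arc::new(Mutex::new(None));
    let status = handle_end_invocation(&slot, "2742542901019652192,1");
    println!("handler returned {status}");
    // By the time PlatformRuntimeDone is handled, the context is already stored.
    println!("at PlatformRuntimeDone: {:?}", process_on_platform_runtime_done(&slot));
}
```

In an async handler the same idea amounts to awaiting the body extraction inline instead of spawning it; the price is the extra latency before the tracer is released, as described above.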

Testing

Suggestions very welcome

@pablomartinezbernardo pablomartinezbernardo changed the title [SLES-2666] extract_request_body before exiting handle_end_invocation [SLES-2666] handle_end_invocation race condition Feb 12, 2026
@duncanista
Contributor

Updating branch and creating an RC from it

@duncanista duncanista force-pushed the pmartinez/race-condition branch from 7316b72 to 51d0f6d March 5, 2026 20:27
@duncanista duncanista marked this pull request as ready for review March 5, 2026 20:28
@duncanista duncanista requested a review from a team as a code owner March 5, 2026 20:28
@duncanista duncanista changed the title [SLES-2666] handle_end_invocation race condition fix: handle_end_invocation race condition when extracting payload Mar 5, 2026
@duncanista duncanista merged commit 7bc7e7c into main Mar 5, 2026
48 of 49 checks passed
@duncanista duncanista deleted the pmartinez/race-condition branch March 5, 2026 21:01