@lym953 lym953 commented Oct 21, 2025

This PR

Change the trace request limit from 2 MiB to 50 MiB.

Motivation

When the Node.js tracer layer sends a request between 2 MiB and 50 MiB to the Lambda extension, the extension closes the HTTP connection; the tracer then gets an EPIPE error and breaks. (Maybe the tracer should handle the error better, but that's out of scope for this PR.)

According to @rochdev:

the agent is supposed to have a limit of 50mb

So let's change the limit on the agent side to match that expectation.
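
To make the affected size range concrete, here is a minimal sketch of the two limits in bytes. The constant and function names are hypothetical and are not taken from the extension's code, and the assumption that a body exactly at the limit is accepted is mine.

```javascript
// Illustrative only: these names are hypothetical, not the extension's code.
const MiB = 1024 * 1024;
const OLD_TRACE_LIMIT_BYTES = 2 * MiB;  // 2,097,152 bytes
const NEW_TRACE_LIMIT_BYTES = 50 * MiB; // 52,428,800 bytes

// Assumption: a body is accepted when its size does not exceed the limit.
function fitsWithinLimit(bodyBytes, limitBytes) {
  return bodyBytes <= limitBytes;
}

// A 10 MiB trace payload falls in the range this PR is about.
const tenMiB = 10 * MiB;
console.log(fitsWithinLimit(tenMiB, OLD_TRACE_LIMIT_BYTES)); // false
console.log(fitsWithinLimit(tenMiB, NEW_TRACE_LIMIT_BYTES)); // true
```

Payloads above 50 MiB would still be rejected under the new limit.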

Testing

Tested with Node.js 22 Lambda with this handler:

import tracer from 'dd-trace';
import crypto from 'crypto';
tracer.init();

function randomGarbage(len) {
  // low-compressibility payload (random bytes -> base64)
  return crypto.randomBytes(len).toString('base64');
}

export const handler = async (event) => {
  const SPANS = 3000;
  const TAG_BYTES_PER_SPAN = 20_000; // 20,000 random bytes per span tag (~26.7 KB after base64 encoding)

  const root = tracer.startSpan('repro.root');
  root.setTag('dd.repro', 'true');

  for (let i = 0; i < SPANS; i++) {
    console.log(`Sending the ${i}-th span`);
    const span = tracer.startSpan('repro.child', { childOf: root });
    span.setTag('blob', randomGarbage(TAG_BYTES_PER_SPAN));
    span.finish();
  }
  root.finish();

  const response = {
    statusCode: 200,
    body: JSON.stringify('Hello from Lambda!'),
  };
  return response;
};
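
As a rough size check on the handler above, the following sketch estimates how much tag data it generates. It relies only on the standard base64 ratio (4 output characters per 3 input bytes) and Node's global `Buffer`; the helper name `base64Length` is hypothetical. How the tracer splits these spans into individual requests is not stated in this PR, so the total only indicates that the workload can go well past the old 2 MiB limit.

```javascript
// Base64 encodes every 3 input bytes as 4 output characters, padding the last group.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Same values as in the repro handler.
const SPANS = 3000;
const TAG_BYTES_PER_SPAN = 20_000;

const charsPerTag = base64Length(TAG_BYTES_PER_SPAN); // 26,668 ASCII characters
const totalTagBytes = SPANS * charsPerTag;            // 80,004,000 bytes
const totalMiB = totalTagBytes / (1024 * 1024);

// Cross-check the formula against Node's own encoder.
const actual = Buffer.alloc(TAG_BYTES_PER_SPAN).toString('base64').length;
console.log(charsPerTag === actual); // true

console.log(`${charsPerTag} chars per tag, ~${totalMiB.toFixed(1)} MiB of tag data in total`);
```

This counts only the `blob` tag values; span metadata and serialization overhead are not included.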

Before:

There are errors like:

Error: write EPIPE
at WriteWrap.onWriteComplete [as oncomplete] (node:internal/stream_base_commons:95:16)
at WriteWrap.callbackTrampoline (node:internal/async_hooks:130:17)
LAMBDA_RUNTIME Failed to post handler success response. Http response code: 403. {"errorMessage":"State transition from Ready to InvocationErrorResponse failed for runtime. Error: State transition is not allowed","errorType":"InvalidStateTransition"}

After:

When the Lambda's memory is 1024 MB, the error no longer happens.
When the Lambda's memory is 512 MB, the invocation can fail due to OOM, but I think that's a legitimate error. We can ask customers to increase the memory limit for high-volume workloads like this.

Notes

cc @astuyve who set a MAX_CONTENT_LENGTH of 10 MiB in #294. This PR increases it to 50 MiB as well.

Thanks @dougqh @duncanista @lucaspimentel @rochdev for discussion.

#899
Jira: https://datadoghq.atlassian.net/browse/SVLS-7777

@lym953 lym953 requested a review from a team as a code owner October 21, 2025 20:06
-        let info_router = Router::new().route(INFO_ENDPOINT_PATH, any(Self::info));
+        let info_router = Router::new()
+            .route(INFO_ENDPOINT_PATH, any(Self::info))
+            .layer(RequestBodyLimitLayer::new(DEFAULT_REQUEST_BODY_LIMIT));

this endpoint probably doesn't need it

@lym953 lym953 merged commit cf23e14 into main Oct 21, 2025
38 checks passed
@lym953 lym953 deleted the yiming.luo/inc-trace-limit branch October 21, 2025 21:10