
dd-trace with datadog-lambda-js in Lambda throws error #3699

Open
kasleet opened this issue Oct 10, 2023 · 0 comments
Labels: bug (Something isn't working), serverless

kasleet commented Oct 10, 2023

Expected behaviour
We are using dd-trace and datadog-lambda-js in our Lambdas for logging and tracing, which works quite well. However, we encountered the following scenario in which dd-trace throws an error:

  1. We have a Lambda that is triggered by a message in SQS. The timeout of the Lambda is set to 30 seconds; the visibility timeout of the queue is set to 3 minutes.
  2. The Lambda reads a new message from SQS, processes it, and fails with an error (which is expected in our use case).
  3. Once the message becomes visible again (~3 minutes later), the Lambda tries to process it again.
  4. This second attempt immediately fails with an exception inside the dd-trace lambda/handler.js crashFlush method. Somehow, tracer._processor is undefined, so the .killAll() call fails (https://github.com/DataDog/dd-trace-js/blob/8511189932652f362baaeab2323cb9f94ed79975/packages/dd-trace/src/lambda/handler.js#L56C3-L56C30)

After some debugging, we saw in the dd-trace logs that the Lambda RequestId and the traceId of the two requests are the same. We suspect that the old context from the first request is somehow reused by, or passed to, the second request inside dd-trace. This would also explain why crashFlush is called: dd-trace believes the Lambda has timed out. (It still doesn't explain why _processor is undefined, but that is probably not relevant for now.)
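To illustrate the suspicion above: dd-trace derives its timeout handling from the invocation context, so a stale context from a previous request, whose deadline is long past, would look as if the current invocation had already timed out. This is only a hypothetical sketch; the names below are stand-ins, not dd-trace's actual internals (the real logic lives in packages/dd-trace/src/lambda/handler.js):

```typescript
// Hypothetical illustration: why a reused (stale) Lambda context can look
// like a timeout to a tracer that derives its flush decision from the context.

interface LambdaContextLike {
  awsRequestId: string;
  // Milliseconds left before the function is killed, derived from a deadline.
  getRemainingTimeInMillis(): number;
}

// Build a context whose deadline is a fixed point in time.
const makeContext = (awsRequestId: string, deadlineMs: number): LambdaContextLike => ({
  awsRequestId,
  getRemainingTimeInMillis: () => deadlineMs - Date.now(),
});

// Stand-in for the tracer's decision: treat the invocation as timed out
// (and crash-flush) when no time is left on the context.
const looksTimedOut = (ctx: LambdaContextLike): boolean =>
  ctx.getRemainingTimeInMillis() <= 0;

// First invocation: 30 s budget, so plenty of time left.
const first = makeContext('req-1', Date.now() + 30_000);
console.log(looksTimedOut(first)); // false

// ~3 minutes later the stale context of request 1 is reused for request 2:
// its deadline is long past, so any timeout check fires immediately.
const stale = makeContext('req-1', Date.now() - 150_000);
console.log(looksTimedOut(stale)); // true
```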

Here is an obfuscated CloudWatch screenshot showing this behaviour; I have marked the important parts of both requests in different colors.

[Screenshot (2023-10-10): obfuscated CloudWatch logs of both requests, with the relevant lines highlighted in color]

I don't know whether this is a bug; it could well be that we are using or configuring dd-trace and datadog-lambda-js incorrectly.

Actual behaviour
The second request should be executed with its own, current context, without throwing an error.

Steps to reproduce
As described above. Here is a code snippet showing our setup (we built our own custom 'lambda middleware'):

import type { Context } from 'aws-lambda'

const middleware = <TEvent, TResult>(
  event: TEvent,
  context: Context,
  next: (event: TEvent) => Promise<TResult>,
): Promise<TResult> => {
  // Note: both modules are required and the handler is re-wrapped
  // on every invocation.
  const { datadog } = require('datadog-lambda-js')
  const { tracer } = require('dd-trace')

  const wrapped = datadog((e: TEvent) => {
    // tracer.init() also runs once per invocation here.
    tracer.init({})

    return tracer.trace(
      'init middleware',
      () => {
        return next(e)
      },
    )
  }, { logForwarding: true })

  return wrapped(event, context)
}
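Possibly relevant: our middleware calls datadog() and tracer.init() on every invocation, whereas the usual pattern is to initialize and wrap once at module load (cold start). A minimal sketch of that ordering, with stand-in stubs replacing datadog-lambda-js and dd-trace so the snippet is self-contained (the stubs are assumptions, not the libraries' real behaviour):

```typescript
// Stand-ins for datadog-lambda-js's datadog() and dd-trace's tracer.init(),
// counting calls so the wrapping order is observable.
let initCalls = 0
let wrapCalls = 0

const tracer = { init: (_opts: object) => { initCalls++ } }

const datadog = <E, R>(handler: (e: E) => Promise<R>) => {
  wrapCalls++ // the real datadog() also sets up per-handler state when called
  return (event: E) => handler(event)
}

// Initialize and wrap ONCE at module scope (cold start), not inside the
// middleware, so no wrapper state can carry over between invocations.
tracer.init({})
const handler = datadog(async (event: { id: string }) => `handled ${event.id}`)

const main = async () => {
  await handler({ id: 'req-1' })
  await handler({ id: 'req-2' })
  console.log(initCalls, wrapCalls) // 1 1 — init and wrap ran once for two invocations
}
main()
```

We are not sure whether the per-invocation wrapping in our middleware is actually what carries the stale context across requests, but it seemed worth pointing out.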

Environment

  • Operating system/Lambda: Runtime Version: nodejs:18.v13, Runtime Version ARN: arn:aws:lambda:eu-central-1::runtime:0229ff5ced939264450549058d8f267110e92677c27063e6dcd781a280f2462b
  • Tracer version: newest / arn:aws:lambda:eu-central-1:464622532012:layer:Datadog-Node18-x:98
@kasleet kasleet added the bug Something isn't working label Oct 10, 2023