High quantity of intermittent failures when using DD Trace v0.5 protocol #5314

Closed
thomwatkin opened this issue Jun 2, 2023 · 10 comments

@thomwatkin

Ever since the v1.15.0 release of dd-trace-java, the overwhelming majority of our traces have been failing across a number of services.

[dd.trace 2023-06-02 13:15:54:800 +0000] [dd-trace-processor] WARN 
  datadog.trace.agent.common.writer.ddagent.DDAgentApi - 
  msgp: attempted to decode type "map" with method for "str" while sending 1 (size=1KB) traces. 
  Total: 8401, Received: 8401, Sent: 1, Failed: 8400. Status: 400 Bad Request (Will not log errors for 5 minutes)

We occasionally see some successes in the logs, but none of them come through as traces in Datadog itself, and the logs still always show at least one failure.

[dd.trace 2023-06-01 16:16:20:158 +0000] [dd-trace-processor] INFO
  datadog.trace.agent.common.writer.ddagent.DDAgentApi - 
  Success while sending 25 (size=15KB) traces. 
  Total: 26, Received: 26, Sent: 25, Failed: 1.
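For anyone comparing failure rates across services, here is a minimal sketch that pulls the counters out of log entries shaped like the two quoted above. The function name `parse_counters` and the `failed_ratio` field are hypothetical helpers made up for this illustration; the regular expression only assumes the `Total: …, Received: …, Sent: …, Failed: …` summary exactly as it appears in these excerpts, and the ratio is simply Failed divided by Total as reported in that one entry.

```python
import re
from typing import Optional

# Matches the counter summary seen in the DDAgentApi log lines quoted above, e.g.
# "Total: 8401, Received: 8401, Sent: 1, Failed: 8400."
COUNTERS = re.compile(
    r"Total:\s*(?P<total>\d+),\s*Received:\s*(?P<received>\d+),"
    r"\s*Sent:\s*(?P<sent>\d+),\s*Failed:\s*(?P<failed>\d+)"
)


def parse_counters(log_text: str) -> Optional[dict]:
    """Return the counters from one log entry, or None if no summary is present."""
    match = COUNTERS.search(log_text)
    if match is None:
        return None
    counters = {name: int(value) for name, value in match.groupdict().items()}
    # Hypothetical convenience field: share of traces reported as failed.
    total = counters["total"]
    counters["failed_ratio"] = counters["failed"] / total if total else 0.0
    return counters
```

Feeding each log entry through `parse_counters` and watching `failed_ratio` gives a quick way to tell a service that is losing nearly everything from one that only drops the occasional trace.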

We're running our services in ECS Fargate, using the Datadog Agent as a sidecar container running the latest version.

We can get the traces working by setting DD_TRACE_AGENT_V0_5_ENABLED=false on our Java application, or by downgrading dd-trace-java to the previous release, where DD_TRACE_AGENT_V0_5_ENABLED defaults to false.
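As a rough sketch of the environment-variable workaround on ECS, the snippet below adds the setting to a container definition represented as a Python dict. It relies only on the `environment` list of `name`/`value` pairs that ECS container definitions use; the helper name `with_v05_disabled` and the container name in the usage example are hypothetical, and how the patched definition gets registered (console, CLI, IaC tool) is left out.

```python
import copy

ENV_NAME = "DD_TRACE_AGENT_V0_5_ENABLED"


def with_v05_disabled(container_def: dict) -> dict:
    """Return a copy of an ECS container definition with the v0.5 protocol turned off."""
    patched = copy.deepcopy(container_def)
    # Drop any existing entry for the variable so it is not defined twice.
    env = [e for e in patched.get("environment", []) if e.get("name") != ENV_NAME]
    env.append({"name": ENV_NAME, "value": "false"})
    patched["environment"] = env
    return patched


# Usage with a hypothetical application container:
app = {"name": "my-java-service", "environment": []}
patched_app = with_v05_disabled(app)
```

The helper returns a copy rather than mutating its argument, so the original definition stays available for comparison or rollback.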

That said, pinning to the old protocol version doesn't seem like an ideal long-term fix.

Any idea why the new protocol would be throwing these kinds of errors?
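For readers unfamiliar with the wording of that error: it is phrased in terms of MessagePack types, where the first byte of every value identifies its type. The sketch below is only an illustration of that kind of mismatch, not the agent's or the tracer's actual decoder, and it says nothing about the root cause here. The byte ranges come from the MessagePack format; the function names are hypothetical, and `read_str` handles only short (`fixstr`) strings to keep the example small.

```python
def msgpack_family(first_byte: int) -> str:
    """Classify a MessagePack value by its leading format byte."""
    if 0x80 <= first_byte <= 0x8F or first_byte in (0xDE, 0xDF):
        return "map"    # fixmap, map 16, map 32
    if 0xA0 <= first_byte <= 0xBF or first_byte in (0xD9, 0xDA, 0xDB):
        return "str"    # fixstr, str 8, str 16, str 32
    if 0x90 <= first_byte <= 0x9F or first_byte in (0xDC, 0xDD):
        return "array"  # fixarray, array 16, array 32
    return "other"


def read_str(payload: bytes) -> str:
    """Decode a leading fixstr, raising on a type mismatch like the error above."""
    family = msgpack_family(payload[0])
    if family != "str":
        raise TypeError(f'attempted to decode type "{family}" with method for "str"')
    if not 0xA0 <= payload[0] <= 0xBF:
        raise NotImplementedError("this sketch only handles fixstr")
    length = payload[0] & 0x1F  # low 5 bits of a fixstr header hold the byte length
    return payload[1:1 + length].decode("utf-8")
```

Calling `read_str` on bytes that begin with a map header (for example `b"\x81\xa1k\xa1v"`) raises the mismatch error, which is the shape of failure you get when the two sides of a connection disagree about what the payload should contain at that position.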

@kayman-mk

Relates to/Duplicate of #5313

@internetstaff

We lose all trace data with 1.15.0+, so if you have some, you're doing better than we are! :)

Same workaround for us.

@randomanderson
Contributor

@internetstaff @kayman-mk Are you also running in Fargate?

@internetstaff

I am indeed.

@kayman-mk

I am running on Fargate too, yes.

@randomanderson
Contributor

We're releasing 1.15.2 soon, which will revert the enabled-by-default behavior, and we will continue to look into this issue.

@nayeem-kamal nayeem-kamal self-assigned this Jun 5, 2023
@nayeem-kamal
Contributor

This should now be resolved with the release of 1.15.3.
Release notes: https://github.com/DataDog/dd-trace-java/releases/tag/v1.15.3

@kayman-mk

kayman-mk commented Jun 6, 2023

@nayeem-kamal Still not working. Upgraded to 1.15.3 and the error message is still there. It seems that almost nothing is transferred now to Datadog. APM shows a blank screen. Application runs on Fargate and uses the EU API endpoint.

[dd.trace 2023-06-06 06:33:23:554 +0000] [dd-trace-processor] WARN datadog.trace.agent.common.writer.ddagent.DDAgentApi - msgp: attempted to decode type "map" with method for "str" while sending 4 (size=2KB) traces. Total: 5318, Received: 5318, Sent: 2, Failed: 5316. Status: 400 Bad Request (Will not log errors for 5 minutes)

image

@smola smola added this to the 1.15.3 milestone Jun 6, 2023
@kayman-mk

@smola Could you please have a look at what's wrong with 1.15.3?

@fknrio

fknrio commented Jun 6, 2023

For me, the new version 1.15.3 works (whereas 1.15.0 did not work). I have a similar setup (Fargate with Datadog sidecar calling the EU API endpoint). The error messages are gone and the APM traces appear again.


7 participants