False "Failed to send traces to Datadog Agent" errors #1155

dreki · 2019-12-02T16:07:19Z

Which version of dd-trace-py are you using?

0.31.0

Which version of the libraries are you using?

contentful==1.12.3
ddtrace==0.31.0
ipython==7.1.1
python-box==3.4.0
python-json-logger==0.1.11
tornado==6.0.3
validators==0.12.4
tenacity==6.0.0

How can we reproduce your problem?

Here's our init code:

import os

import ddtrace

# this must be called before `tornado` packages are imported
if os.environ.get('DD_AGENT_HOST'): # noqa
    ddtrace.patch(tornado=True)  # noqa

import tornado.ioloop
import tornado.locks
import tornado.log
import tornado.options
import tornado.web

And our tornado application settings:

            datadog_trace={
                'default_service': self.name,
                'tags': {'debug': DEBUG},
                'analytics_enabled': True,
                'agent_hostname': os.environ.get('HOSTNAME', 'localhost'),
            },

What is the result that you get?

Every 7-15 minutes or so (unpredictable rate of incidence) we see a "Failed to send traces to Datadog Agent" entry in our Datadog logs, as an error.

What is the result that you expected?

If there is an error, more cause information. If it's a false negative (which we suspect), prevention of this false negative.


dreki · 2019-12-02T16:08:43Z

Here's a sample log message:

Time | Host
-----------------------------
14:45:47 UTC | gke-hostname-hidden
Failed to send traces to Datadog Agent at http://ourservice-deployment-xx-zz:8129:
  [Errno 104] Connection reset by peer
-----------------------------

majorgreys · 2019-12-06T20:46:53Z

This log record is created when the request to send traces throws an exception, which we log in this case along with its exception message.

@dreki Why do you think this is a false negative?

It could be a networking issue in your k8s environment or perhaps the agent restarting. Logs from either might shed light on what is happening when the tracer gets this exception upon a put request to the agent.
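
One quick way to rule out a basic connectivity problem is to probe the agent's address from inside the application pod. The following is a minimal sketch using only the standard library; the function name `agent_is_reachable` is hypothetical and not part of ddtrace, and a successful probe only shows that a TCP connection can be opened at that moment, so it would not by itself explain intermittent resets.

```python
import socket


def agent_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `agent_is_reachable("localhost", 8126)` checks the default trace agent port; substitute whatever host and port the tracer is actually configured to use.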

dreki · 2019-12-09T14:24:00Z

@majorgreys Discussion online made it sound like it was an acknowledged negative. If this is legit, that's great; would it be possible to log further exception information? We've found no other error messages anywhere to hint at what this might be.

JordanP · 2019-12-11T12:46:36Z

We have also been seeing this for a long time (it is not specific to 0.31), so much so that we had to put

logging.getLogger("ddtrace.writer").setLevel(logging.CRITICAL)

somewhere in our code.

It would be great to log more info here to help you debug this.
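
Raising the level to CRITICAL hides every other warning and error from that logger as well. A narrower alternative, sketched here with the standard `logging` module, is a filter that drops only the one message. The class name `DropSendFailures` is hypothetical, and the logger name `ddtrace.writer` is taken from the line above; it may differ between ddtrace versions.

```python
import logging


class DropSendFailures(logging.Filter):
    """Drop only the records whose message starts with a given prefix."""

    def __init__(self, prefix: str = "Failed to send traces to Datadog Agent"):
        super().__init__()
        self.prefix = prefix

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() applies %-style arguments, so the check sees the final text.
        return not record.getMessage().startswith(self.prefix)


# Logger name assumed from the comment above; adjust if your version differs.
logging.getLogger("ddtrace.writer").addFilter(DropSendFailures())
```

A filter attached to a logger only sees records logged directly on that logger, not on its children, so it has to be attached to the logger that actually emits the message.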

majorgreys · 2019-12-13T18:59:58Z

@dreki and @JordanP You might want to consider rate limiting the library logger using the DD_LOGGING_RATE_LIMIT environment variable, as described in internal/logger.py. This is not currently documented in our library documentation, but we plan to add it in an upcoming update.

I would encourage you to reach out to Datadog support, as we will need more specific information from your deployment of the agent and tracer to identify why these networking errors are occurring.
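
To illustrate what rate limiting a logger means in practice, here is a standalone sketch built on the standard `logging` module. It is not ddtrace's implementation, and it makes no claim about the units or default of DD_LOGGING_RATE_LIMIT; the class name `RateLimitFilter` and its parameters are hypothetical.

```python
import logging
import time


class RateLimitFilter(logging.Filter):
    """Allow at most one record per call site in each time window."""

    def __init__(self, window_seconds: float = 60.0, clock=time.monotonic):
        super().__init__()
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._last_emitted = {}  # (pathname, lineno) -> time of last emitted record

    def filter(self, record: logging.LogRecord) -> bool:
        key = (record.pathname, record.lineno)
        now = self.clock()
        last = self._last_emitted.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # still inside the window: suppress this record
        self._last_emitted[key] = now
        return True
```

Keying on the call site rather than the message text means that a message with varying arguments (such as a changing agent URL or exception string) is still treated as one event for rate-limiting purposes.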

JerryVerhoef · 2020-01-16T14:55:25Z

> It could be a networking issue in your k8s environment or perhaps the agent restarting. Logs from either might shed light on what is happening when the tracer gets this exception upon a put request to the agent.

We are also experiencing the same problem. However, in our k8s environment we also use the PHP client, and that client is not reporting any errors at all. So either the PHP client is buggy in how it reports errors, or this is not a networking issue in the k8s environment.

dreki · 2020-04-13T13:38:24Z

Just a note to say that this is still affecting us daily.

majorgreys · 2020-05-28T13:02:56Z

@dreki @JerryVerhoef @JordanP

The v0.37.0 release included a fix that potentially resolves a scenario where you might see such log events. Can you confirm whether this still is happening?

lu911 · 2020-06-06T08:33:09Z

@majorgreys It still happens.

dd-trace-py: 0.38.0
message: Failed to send traces to Datadog Agent at <ddtrace.api.API object at 0x7fa6e133f250>: ConnectionResetError(104, 'Connection reset by peer')

dreki · 2020-06-11T13:28:34Z

Thank you for checking, @YukSeungChan. Thank you for the attention @majorgreys.

FYI @peter-bertuglia, looks like this is getting more attention.

peter-bertuglia · 2020-06-11T14:02:17Z

With v0.37.1 we're still seeing the error with high frequency.

kamyar · 2020-06-11T17:16:35Z

We are also seeing this error. Can you give a timeline for when this will be fixed?

Kyle-Verhoog · 2020-06-11T18:11:42Z

👋 hi all, we suspect this is an issue with the agent: it has a limit to the number of connections it can handle by default. Please see: https://docs.datadoghq.com/tracing/troubleshooting/agent_rate_limits/

Let us know if you're seeing the error in your agent's logs and if bumping the limit helps at all. 🙂

peter-bertuglia · 2020-10-07T19:00:46Z

After upgrading to 0.42.0 and fixing an egregious mistake in our ddtrace_settings (wrong env var for DD_AGENT_HOST), we're now properly sending traces and we no longer receive the "Failed to send traces to Datadog Agent" error while services are running. However, the new startup logs added in 0.42.0 still complain that the agent is not reachable: "DATADOG TRACER DIAGNOSTIC - Agent not reachable. Exception raised: [Errno 111] Connection refused". This is logged on every startup without exception. Since we're no longer getting logs about failed tracing and all of our data is arriving in Datadog, it seems that we can ignore this, but we wouldn't mind understanding why it gets logged.
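
The thread does not say exactly which variable was wrong, but one way to avoid that class of mistake is to resolve the agent address in a single place and reuse it everywhere. This is a minimal sketch: `resolve_agent_address` is a hypothetical helper, DD_AGENT_HOST is the variable named above, and DD_TRACE_AGENT_PORT with a default of 8126 is assumed to be the port setting in use.

```python
import os
from typing import Mapping, Tuple


def resolve_agent_address(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Return the (host, port) pair for the trace agent, with defaults."""
    # Fall back to localhost:8126 when the variables are not set.
    host = env.get("DD_AGENT_HOST", "localhost")
    port = int(env.get("DD_TRACE_AGENT_PORT", "8126"))
    return host, port
```

The returned host could then be used for a setting such as `agent_hostname` in the application's trace configuration, so that the patch guard and the settings read the same variable.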

paulkarayan · 2020-10-13T18:36:03Z

@Kyle-Verhoog - that link isn't going where I think it should? Maybe I'm wrong. Is it: https://docs.datadoghq.com/tracing/troubleshooting/agent_rate_limits/ ?

Kyle-Verhoog · 2020-10-14T15:32:28Z

@peter-bertuglia yes, we're aware of this. It's a diagnostic log that we print on startup to help with onboarding. However, the check is done against the client's default hostname/port configuration, so if your agent is located somewhere else it creates noise. We're working to address this in #1715 and #1717.

@paulkarayan hmm yeah looks like my link is dead now. Thanks for posting the updated one!

dreki · 2020-10-15T01:04:29Z

@peter-bertuglia has been able to prevent this issue from occurring in our infrastructure. So this issue is 'resolved' from our end, in a manner of speaking. @Kyle-Verhoog, would y'all prefer to have this issue be closed, or leave it open since other folks may have a similar but different issue?

Kyle-Verhoog · 2020-10-15T15:18:11Z

@dreki since the initial issue is resolved let's close it for now. We can always re-open or open a new issue (since I think most folks have a similar but different issue). Thanks for following up! 🙂

luvpreetsingh mentioned this issue Mar 30, 2020

Datadog Agent sends traces to localhost even when agent address is changed #1316

Closed

Kyle-Verhoog mentioned this issue Apr 14, 2020

tracer: stop previous writer if a new one is created #1356

Merged

majorgreys self-assigned this May 28, 2020

Kyle-Verhoog closed this as completed Oct 15, 2020
