
Agent proxy error: Flush ## is taking more than 10s, stopping it #3474

Open
emmetog opened this issue Aug 14, 2017 · 0 comments
emmetog commented Aug 14, 2017

Problem

I'm using the agent as a proxy as described here. I have around 15 hosts all using one server as the "proxy", which relays to the Datadog public URL.
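For context, this is roughly how the setup looks (a minimal sketch; the hostname and port are assumptions, not my real values). On the proxy host, the agent 5 forwarder listens on port 17123 and must accept non-local traffic; each client points `dd_url` at it in `/etc/dd-agent/datadog.conf`:

```ini
# --- on the proxy host: /etc/dd-agent/datadog.conf (excerpt) ---
# allow the forwarder (port 17123) to accept connections from other hosts
non_local_traffic: yes

# --- on each of the ~15 client hosts: /etc/dd-agent/datadog.conf (excerpt) ---
# relay everything through the proxy instead of the Datadog public URL
# (proxy-host.internal is a placeholder)
dd_url: http://proxy-host.internal:17123
```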

Tailing the logs while restarting the datadog-agent service, I see this:

$ tail -f /var/log/datadog/forwarder.log
2017-08-14 13:22:40 BST | WARNING | dd.forwarder | transaction(transaction.py:138) | Removed transaction 3210486 from queue
2017-08-14 13:22:40 BST | WARNING | dd.forwarder | transaction(transaction.py:138) | Removed transaction 3210485 from queue
2017-08-14 13:22:40 BST | WARNING | dd.forwarder | transaction(transaction.py:138) | Removed transaction 3210484 from queue
2017-08-14 13:22:40 BST | WARNING | dd.forwarder | transaction(transaction.py:138) | Removed transaction 3210483 from queue
2017-08-14 13:22:40 BST | INFO | dd.forwarder | forwarder(ddagent.py:593) | caught sigterm. stopping
2017-08-14 13:22:40 BST | INFO | dd.forwarder | forwarder(ddagent.py:575) | Stopped
2017-08-14 13:22:40 BST | INFO | dd.forwarder | forwarder(ddagent.py:593) | caught sigterm. stopping
2017-08-14 13:22:46 BST | INFO | dd.forwarder | root(ddagent.py:146) | Done with custom emitters
2017-08-14 13:22:46 BST | INFO | dd.forwarder | forwarder(ddagent.py:542) | Listening on port 17123
2017-08-14 13:22:46 BST | INFO | dd.forwarder | transaction(transaction.py:177) | Flushing 1 transaction during flush #1
2017-08-14 13:22:47 BST | INFO | dd.forwarder | transaction(transaction.py:177) | Flushing 3 transactions during flush #2
2017-08-14 13:22:49 BST | INFO | dd.forwarder | transaction(transaction.py:177) | Flushing 7 transactions during flush #3
2017-08-14 13:22:53 BST | INFO | dd.forwarder | transaction(transaction.py:177) | Flushing 12 transactions during flush #4
2017-08-14 13:22:59 BST | INFO | dd.forwarder | transaction(transaction.py:177) | Flushing 17 transactions during flush #5
2017-08-14 13:22:59 BST | INFO | dd.forwarder | transaction(transaction.py:195) | First flushes done, next flushes will be logged every 20 flushes.
2017-08-14 13:23:18 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 6 is taking more than 10s, stopping it
2017-08-14 13:23:28 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 7 is taking more than 10s, stopping it
2017-08-14 13:23:39 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 8 is taking more than 10s, stopping it
2017-08-14 13:23:49 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 9 is taking more than 10s, stopping it
2017-08-14 13:24:00 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 10 is taking more than 10s, stopping it
2017-08-14 13:24:10 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 11 is taking more than 10s, stopping it
2017-08-14 13:24:20 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 12 is taking more than 10s, stopping it
2017-08-14 13:24:31 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 13 is taking more than 10s, stopping it
2017-08-14 13:24:41 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 14 is taking more than 10s, stopping it
2017-08-14 13:24:52 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 15 is taking more than 10s, stopping it
2017-08-14 13:25:02 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 16 is taking more than 10s, stopping it
2017-08-14 13:25:13 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 17 is taking more than 10s, stopping it
2017-08-14 13:25:23 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 18 is taking more than 10s, stopping it
2017-08-14 13:25:34 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 19 is taking more than 10s, stopping it
2017-08-14 13:25:34 BST | INFO | dd.forwarder | transaction(transaction.py:177) | Flushing 345 transactions during flush #20
2017-08-14 13:25:45 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 20 is taking more than 10s, stopping it
2017-08-14 13:25:55 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 21 is taking more than 10s, stopping it

This continues indefinitely. I still see data in the graphs on the Datadog web interface, but a process check from one of the servers never appears there. I'm wondering whether it's getting lost because the proxy doesn't pass it on.
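My reading of the warnings is that each flush has a ~10s budget: the forwarder sends queued transactions until the budget runs out, then stops and leaves the rest queued for the next flush, which is why the queue keeps growing (345 transactions by flush #20). A hypothetical sketch of that loop (not the actual `transaction.py` code; names and the injectable clock are my own):

```python
import time

FLUSH_TIMEOUT = 10  # seconds; per-flush budget suggested by the warnings

def flush_transactions(queue, timeout=FLUSH_TIMEOUT, now=time.monotonic):
    """Send queued transactions until the queue drains or the budget expires.

    Returns (flushed_count, stopped_early). Transactions left in `queue`
    are retried on the next flush, so a slow upstream makes the queue grow.
    """
    deadline = now() + timeout
    flushed = 0
    while queue:
        if now() >= deadline:
            # Corresponds to "Flush N is taking more than 10s, stopping it"
            return flushed, True
        send = queue.pop(0)   # in the real agent this POSTs to the intake
        send()
        flushed += 1
    return flushed, False
```

With a fake clock where each send "takes" 3 seconds, a 10-second budget flushes only 4 of 5 transactions and stops early, leaving one queued:

```python
clock = [0.0]
def fake_now():
    return clock[0]
def slow_send():
    clock[0] += 3.0  # simulate a 3s upstream round-trip

queue = [slow_send] * 5
flushed, stopped = flush_transactions(queue, timeout=10, now=fake_now)
# flushed == 4, stopped is True, 1 transaction left queued
```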

Output of the info page

$ sudo service datadog-agent info
2017-08-14 13:32:08,134 | WARNING | dd.collector | utils.dockerutil(dockerutil.py:94) | Error while detecting orchestrator: Error while fetching server API version: ('Connection aborted.', error(2, 'No such file or directory'))
====================
Collector (v 5.16.0)
====================

  Status date: 2017-08-14 13:31:52 (15s ago)
  Pid: 6878
  Platform: Linux-4.4.0-83-generic-x86_64-with-Ubuntu-16.04-xenial
  Python Version: 2.7.13, 64bit
  Logs: <stderr>, /var/log/datadog/collector.log, syslog:/dev/log

  Clocks
  ======
  
    NTP offset: 52.3057 s
    System UTC time: 2017-08-14 12:32:08.188551
  
  Paths
  =====
  
    conf.d: /etc/dd-agent/conf.d
    checks.d: /opt/datadog-agent/agent/checks.d
  
  Hostnames
  =========
  
    socket-hostname: <redacted>
    hostname: <redacted>
    socket-fqdn: <redacted>
  
  Checks
  ======
  
    network (5.16.0)
    ----------------
      - instance #0 [OK]
      - Collected 19 metrics, 0 events & 0 service checks
  
    process (5.16.0)
    ----------------
      - instance #0 [OK]
      - instance #1 [OK]
      - instance #2 [WARNING]
          Warning: No matching process was found
      - Collected 33 metrics, 0 events & 3 service checks
  
    ntp (5.16.0)
    ------------
      - Collected 0 metrics, 0 events & 0 service checks
  
    disk (5.16.0)
    -------------
      - instance #0 [OK]
      - Collected 32 metrics, 0 events & 0 service checks
  
    http_check (5.16.0)
    -------------------
      - instance #0 [OK]
      - instance #1 [OK]
      - instance #2 [OK]
      - instance #3 [OK]
      - instance #4 [OK]
      - instance #5 [OK]
      - instance #6 [OK]
      - instance #7 [OK]
      - instance #8 [OK]
      - Collected 9 metrics, 0 events & 9 service checks
  
  
  Emitters
  ========
  
    - http_emitter [OK]

====================
Dogstatsd (v 5.16.0)
====================

  Status date: 2017-08-14 13:32:06 (1s ago)
  Pid: 6875
  Platform: Linux-4.4.0-83-generic-x86_64-with-Ubuntu-16.04-xenial
  Python Version: 2.7.13, 64bit
  Logs: <stderr>, /var/log/datadog/dogstatsd.log, syslog:/dev/log

  Flush count: 56
  Packet Count: 952
  Packets per second: 1.7
  Metric count: 9
  Event count: 0
  Service check count: 0

====================
Forwarder (v 5.16.0)
====================

  Status date: 2017-08-14 13:32:05 (3s ago)
  Pid: 6874
  Platform: Linux-4.4.0-83-generic-x86_64-with-Ubuntu-16.04-xenial
  Python Version: 2.7.13, 64bit
  Logs: <stderr>, /var/log/datadog/forwarder.log, syslog:/dev/log

  Queue Size: 2340983 bytes
  Queue Length: 1166
  Flush Count: 57
  Transactions received: 2158
  Transactions flushed: 992
  Transactions rejected: 0
  API Key Status: API Key is valid
  

======================
Trace Agent (v 5.16.0)
======================

  Pid: 6873
  Uptime: 566 seconds
  Mem alloc: 2710240 bytes

  Hostname: <redacted>
  Receiver: 0.0.0.0:8126
  API Endpoint: https://trace.agent.datadoghq.com

  Bytes received (1 min): 0
  Traces received (1 min): 0
  Spans received (1 min): 0

  Bytes sent (1 min): 0
  Traces sent (1 min): 0
  Stats sent (1 min): 0

Additional environment details (Operating System, Cloud provider, etc):

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.2 LTS
Release:	16.04
Codename:	xenial

Steps to reproduce the issue:

  1. Configure an agent as a proxy as described here.
  2. Configure lots of hosts (around 15 in my case) to send stats to the proxy

Describe the results you received:

See above

Describe the results you expected:

See above

@hush-hush hush-hush removed their assignment Aug 15, 2019