Agent Architecture

Quentin Madec edited this page Oct 16, 2016 · 18 revisions

Introduction

This page gives an overview of the agent: what its components are, how they interact with each other, how data is collected on the machine, and how it is transmitted to Datadog HQ (https://app.datadoghq.com).

Components

The agent is composed of 4 major components, all written in Python. Each component runs in a separate process.

  1. The collector (agent.py), responsible for gathering system and application metrics from the machine.
  2. The forwarder (ddagent.py), responsible for buffering and communicating with Datadog HQ over SSL.
  3. dogstatsd (dogstatsd.py), responsible for aggregating local metrics sent from your code.
  4. supervisord, responsible for keeping the three processes above up and running.
graphite  ---(tcp/17124)--->
                           |
                           |
dogstatsd ---(http)------> |
                         | |
                         v v
collector ---(http)---> forwarder ---(https)---> datadoghq

Supervision, Privileges and Network Ports

supervisord runs a master process as dd-agent and forks all subprocesses as the user dd-agent. The agent configuration resides at /etc/dd-agent/datadog.conf and /etc/dd-agent/conf.d. All configuration must be readable by dd-agent. The recommended permissions are 0600, since configuration files contain your API key and other credentials needed to access metrics (e.g. for MySQL or PostgreSQL).
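As an illustration of the 0600 recommendation, here is a minimal sketch that checks whether a file is closed to group and other users and tightens it if not. The helper names is_private and lock_down are hypothetical and are not part of the agent. The sketch assumes a POSIX system.

```python
import os
import stat


def is_private(path):
    """Return True if no group or other permission bit is set (0600 or stricter)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def lock_down(path):
    """Set the file to owner read/write only (0600)."""
    os.chmod(path, 0o600)
```

Run as the dd-agent user (or as root), one could call is_private on /etc/dd-agent/datadog.conf and each file under /etc/dd-agent/conf.d, and call lock_down on any file that fails the check.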

The following ports are open for normal operations:

  • forwarder tcp/17123 for normal operations and tcp/17124 if graphite support is turned on
  • dogstatsd udp/8125

All listening processes are bound by default to 127.0.0.1 and/or ::1 in v3.4.1 and later of the agent. In earlier versions, they were bound to 0.0.0.0 (i.e. all interfaces).

For more advanced network information, see the Network & Proxy Configuration page.

The Collector

This is where all standard metrics are gathered, every 15 seconds. To do so, the collector uses a number of methods:

  1. Temporarily executing standard utilities such as vmstat, mpstat, iostat and varnishstat, and parsing their output returned over process pipes.
  2. Connecting to applications' monitoring interfaces over TCP and HTTP and parsing their results (e.g. Apache, MySQL, some JMX-based applications).
  3. Tailing open files to extract metrics (e.g. Nagios, Jenkins).
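The first method above can be sketched as follows. This is not the agent's actual parser: run_and_parse is a hypothetical helper, and it assumes the utility prints a header row followed by a row of numeric values as its last two non-empty lines. Real utilities vary in layout and need their own parsing.

```python
import subprocess


def run_and_parse(argv):
    """Run a command, read its stdout through a pipe, and map column names to values.

    Assumes the last two non-empty lines are a header row and a numeric value row.
    """
    output = subprocess.check_output(argv).decode("utf-8")
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[-2].split()
    values = lines[-1].split()
    return dict(zip(header, (float(v) for v in values)))
```

On a Linux machine, run_and_parse(["vmstat"]) would return a dictionary keyed by vmstat's column names, provided the output matches the assumed layout.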

The collector also supports the execution of Python-based, user-provided checks, stored in /opt/datadog-agent/agent/checks.d. User-provided checks must inherit from the AgentCheck abstract class defined in checks/__init__.py.
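A minimal sketch of a user-provided check is shown below. QueueDepthCheck, the metric name example.queue.depth and the instance key path are all hypothetical. The fallback AgentCheck class is only a stand-in so the sketch can run outside the agent. Inside the agent, the real class is imported from the checks package.

```python
try:
    from checks import AgentCheck  # provided by the agent at runtime
except ImportError:
    # Hypothetical stand-in so the sketch runs outside the agent.
    class AgentCheck(object):
        def __init__(self, name, init_config, agentConfig, instances=None):
            self.name = name
            self.init_config = init_config or {}
            self.agentConfig = agentConfig
            self.instances = instances or []
            self.submitted = []

        def gauge(self, metric, value, tags=None):
            # The real agent aggregates the value; the stand-in just records it.
            self.submitted.append((metric, value, tags))


class QueueDepthCheck(AgentCheck):
    """Report the number of lines in a file as a gauge."""

    def check(self, instance):
        # Called once per configured instance on every collector run.
        path = instance["path"]
        tags = instance.get("tags", [])
        with open(path) as handle:
            depth = sum(1 for _ in handle)
        self.gauge("example.queue.depth", depth, tags=tags)
```

In this sketch, each instance is assumed to come from a matching YAML file in /etc/dd-agent/conf.d, with one entry per file to watch.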

The Forwarder

The forwarder listens over HTTP for incoming requests, buffers them, and forwards them over HTTPS to Datadog HQ. Buffering keeps network splits from affecting metric reporting. Metrics are buffered in memory until a limit on either total size or the number of outstanding requests is reached. After that, the oldest metrics are discarded to keep the forwarder's memory footprint manageable.
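The buffering policy described above can be illustrated with a small sketch. It is not the forwarder's actual implementation: BoundedBuffer and its parameters max_items and max_bytes are hypothetical names, and the real limits and data structures may differ.

```python
from collections import deque


class BoundedBuffer(object):
    """Hold payloads in memory, discarding the oldest once a limit is exceeded."""

    def __init__(self, max_items, max_bytes):
        self.max_items = max_items
        self.max_bytes = max_bytes
        self._queue = deque()
        self._bytes = 0

    def add(self, payload):
        self._queue.append(payload)
        self._bytes += len(payload)
        # Evict from the oldest end until both limits are satisfied again.
        while self._queue and (
            len(self._queue) > self.max_items or self._bytes > self.max_bytes
        ):
            self._bytes -= len(self._queue.popleft())

    def flush(self, send):
        """Send oldest-first; stop at the first failure and keep the rest buffered."""
        while self._queue:
            if not send(self._queue[0]):
                break
            self._bytes -= len(self._queue.popleft())

    def pending(self):
        return list(self._queue)
```

Here send is any callable that returns True on success. A payload larger than max_bytes on its own is dropped immediately, which is a simplification made in this sketch.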

Interprocess communication is not authenticated or encrypted.

Note that the collector has the ability to send its results directly to Datadog HQ over HTTPS in case the forwarder is not present, but it cannot buffer results.

dogstatsd

Dogstatsd is a Python implementation of Etsy's StatsD metric aggregation daemon. It receives and rolls up arbitrary metrics over UDP, which allows custom code to be instrumented without adding latency.
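To make the UDP path concrete, here is a minimal sketch that builds a StatsD-style datagram and sends it to the default dogstatsd port, without using a client library. The helper names format_metric and send_metric and the metric name page.views are hypothetical. In practice, a maintained client library would normally be used instead.

```python
import socket


def format_metric(name, value, metric_type, tags=None):
    """Build a datagram of the form name:value|type, with optional |#tag1,tag2."""
    payload = "%s:%s|%s" % (name, value, metric_type)
    if tags:
        payload += "|#" + ",".join(tags)
    return payload.encode("utf-8")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Send one datagram over UDP; no reply is expected or awaited."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
```

For example, send_metric(format_metric("page.views", 1, "c", tags=["env:dev"])) increments a counter. Because UDP is connectionless, the call does not wait for a reply from dogstatsd.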

In the Agent, JMXFetch-based checks (e.g. Cassandra, Kafka) and go-metro both send the metrics they collect through Dogstatsd.

Learn more about dogstatsd in our documentation.

Note: A script to benchmark dogstatsd memory consumption is available at this gist