Agent Architecture

alq666 edited this page Dec 14, 2012 · 18 revisions

Introduction

This page gives you an overview of the agent: what its components are, how they interact with each other, how data is collected on the machine, and how it is transmitted to Datadog HQ (https://app.datadoghq.com).

Components

The agent is composed of 5 major components, all written in Python. Each component runs in a separate process.

  1. The collector (agent.py), responsible for gathering system and application metrics from the machine.
  2. The forwarder (ddagent.py), responsible for buffering metrics and communicating with Datadog HQ over SSL.
  3. dogstatsd (dogstatsd.py), responsible for aggregating local metrics sent from your code.
  4. pup (pup.py), a simple web-socket frontend that displays metrics being collected in real time.
  5. supervisord, responsible for keeping all of the previous processes up and running.
graphite  ---(tcp/17124)--->
                           |
                           |
dogstatsd ---(http)------> |
                         | |
                         v v
collector ---(http)---> forwarder ---(https)---> datadoghq
    |
    >--------(http)---> pup ---(http/17125)----> localhost browser

When running on platforms with Python <= 2.4, only the collector and dogstatsd are available.

Supervision, Privileges and Network Ports

supervisord runs a master process as root and forks all subprocesses as the dd-agent user. The agent configuration resides at /etc/dd-agent/datadog.conf and /etc/dd-agent/conf.d. All configuration must be readable by dd-agent. The recommended permissions are 0600, since the configuration files contain your API key and other credentials needed to access application metrics (e.g. MySQL, PostgreSQL).
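The recommended permissions can be applied along these lines (paths per the agent's default layout; adjust for your install):

```shell
# Make the configuration readable by the dd-agent user only.
chown dd-agent /etc/dd-agent/datadog.conf
chmod 0600 /etc/dd-agent/datadog.conf
# Apply the same mode to each file under conf.d (the directory itself
# must keep its execute bit so it can be traversed).
find /etc/dd-agent/conf.d -type f -exec chmod 0600 {} \;
```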

The following ports are open for normal operations:

  • forwarder tcp/17123 for normal operations and tcp/17124 if graphite support is turned on
  • dogstatsd udp/8125
  • pup tcp/17125
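For example, a metric can be sent to the local dogstatsd with a plain UDP datagram in the statsd wire format ("name:value|type"); the metric name below is illustrative:

```python
import socket

# Send a counter increment to dogstatsd listening on udp/8125.
# dogstatsd aggregates such samples locally before flushing them
# to the forwarder.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
payload = b"page.views:1|c"  # statsd wire format: name:value|type
sent = sock.sendto(payload, ("127.0.0.1", 8125))
sock.close()
```

Because the transport is UDP, the send succeeds whether or not dogstatsd is running, which keeps instrumented code decoupled from the agent.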

All listening processes are bound by default to 127.0.0.1 and/or ::1 in version 3.4.1 and greater of the agent. In earlier versions, they were bound to 0.0.0.0 (i.e. all interfaces).

The Collector

This is where all standard metrics are gathered, every 15 seconds. To do so, the collector uses a number of methods:

  1. Forking and executing standard utilities such as vmstat, mpstat, iostat and varnishstat, then parsing the output returned over process pipes.
  2. Connecting to applications' monitoring interfaces over tcp and http and parsing their results (e.g. Apache, MySQL, JMX-based applications)
  3. Tailing open files to extract metrics (e.g. nagios, jenkins)
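The first method can be sketched as follows, using df -k as a stand-in for the utilities named above; the parsing here is illustrative, not the collector's actual code:

```python
import subprocess

def disk_usage_metrics():
    # Exec the utility and read its output over a process pipe,
    # as the collector does for vmstat, iostat, etc.
    proc = subprocess.Popen(["df", "-k"], stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    metrics = {}
    # Skip the header line, then parse each filesystem row.
    for line in out.decode("utf-8", "replace").splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 6 and fields[1].isdigit() and fields[2].isdigit():
            # mount point -> (total 1K-blocks, used 1K-blocks)
            metrics[fields[-1]] = (int(fields[1]), int(fields[2]))
    return metrics

usage = disk_usage_metrics()
```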

The collector also supports the execution of Python-based, user-provided checks, stored in /usr/share/datadog/agent/checks.d. User-provided checks must inherit from the AgentCheck abstract class defined in checks/__init__.py.
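A user-provided check follows this shape. This is a hedged sketch: a minimal stand-in for the AgentCheck base class is defined here so the example is self-contained, and the check and metric names are hypothetical.

```python
import os

class AgentCheck(object):
    """Minimal stand-in for the agent's AgentCheck abstract class."""
    def __init__(self):
        self.metrics = []

    def gauge(self, name, value):
        # Record a gauge sample; the real class queues it for the forwarder.
        self.metrics.append((name, value))

    def check(self, instance):
        # Subclasses must implement the actual collection logic.
        raise NotImplementedError

class LoadCheck(AgentCheck):
    """Hypothetical check reporting the 1-minute load average."""
    def check(self, instance):
        load1, _, _ = os.getloadavg()
        self.gauge("system.load.1", load1)

check = LoadCheck()
check.check({})
```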

The Forwarder

The forwarder listens over HTTP for incoming requests to buffer and forward over HTTPS to Datadog HQ. Buffering allows metric reporting to survive network splits. Metrics are buffered in memory until a limit on size or on the number of outstanding requests is reached; beyond that point, the oldest metrics are discarded to keep the forwarder's memory footprint manageable.
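The discard-oldest behavior can be sketched with a bounded deque. This is a simplified model: the real forwarder limits both total size and the number of outstanding requests.

```python
from collections import deque

class MetricBuffer(object):
    """Keep at most max_payloads queued; overflow drops the oldest."""
    def __init__(self, max_payloads=30):
        self.queue = deque(maxlen=max_payloads)

    def push(self, payload):
        # When full, deque(maxlen=...) silently evicts from the left,
        # i.e. the oldest buffered payload is discarded.
        self.queue.append(payload)

    def flush(self):
        # Drain everything for transmission over HTTPS.
        drained = list(self.queue)
        self.queue.clear()
        return drained

buf = MetricBuffer(max_payloads=3)
for i in range(5):
    buf.push("payload-%d" % i)
```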

Note that the collector has the ability to send its results directly to Datadog HQ over HTTPS in case the forwarder is not present, but it cannot buffer results.