Agent Developer Mode

vagelim edited this page Jun 22, 2015 · 12 revisions

The Agent Developer Mode allows the user to collect a wide array of metrics concerning the performance of the agent itself. It provides visibility into bottlenecks when writing an AgentCheck and when making changes to the collector core.

Enabling Agent Developer Mode

Developer mode can be enabled by adding the following line to your datadog.conf file:

developer_mode: yes

Be sure to restart the agent after modifying the configuration file.

The datadog.conf setting can also be overridden with the --profile command-line flag (e.g. python agent.py start --profile). In developer mode, the agent enables the following functionality:

  1. Metrics for collection time, emit time and CPU used are sent to Datadog on every collector run.
  2. The collector loop is profiled using cProfile. At an interval specified by collector_profile_interval in the configuration file, the pstats output for the collector loop is dumped to log.debug as well as to the file ./collector-stats.dmp.
  3. An additional check, agent_metrics, is run at the end of every collector loop. This check collects a variety of metrics about the collector's performance and can be configured with the same interface used to configure regular AgentChecks. The source code for this check can be found under checks.d/agent_metrics.py.
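
As a rough illustration of item 2, the sketch below profiles a loop with cProfile and dumps the accumulated stats to a file at a fixed interval. It is not the agent's actual code: the function name run_collector_loop, its parameters, and the choice to count the interval in collector runs are all assumptions made for the example.

```python
import cProfile


def run_collector_loop(run_once, runs, profile_interval, dump_path):
    """Profile `run_once` and dump stats every `profile_interval` runs.

    Hypothetical helper: the names and the "interval counted in runs"
    behavior are assumptions, not taken from the agent source.
    """
    profiler = cProfile.Profile()
    dumps = 0
    for i in range(1, runs + 1):
        profiler.enable()
        try:
            run_once()
        finally:
            # Always stop profiling, even if the run raises.
            profiler.disable()
        if i % profile_interval == 0:
            # Write the stats in the binary format that pstats can load.
            profiler.dump_stats(dump_path)
            dumps += 1
            # Start a fresh profile for the next interval.
            profiler = cProfile.Profile()
    return dumps
```

A file written this way can be loaded later with pstats.Stats(dump_path) for inspection.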

Configuring the Agent Metrics Check

Here is an example configuration for the agent_metrics check:

init_config:
  process_metrics:
    - name: get_memory_info
      type: gauge
      active: yes
    - name: get_io_counters
      type: rate
      active: yes
    - name: get_connections
      type: gauge
      active: no

instances:
    [{}]

Each element in the process_metrics list represents a single psutil.Process method that will be executed against the running collector process. The name field specifies the name of the method, the type field specifies the metric type (currently only gauge and rate are supported), and the active field is a utility flag to activate or deactivate individual method calls during the check. Note that the method specified in name is executed only when:

  1. The method is available on the psutil.Process class as of psutil==2.1.1.
  2. The underlying OS supports the execution of that method (e.g. get_io_counters is not available for OS X processes).

If the agent_metrics check cannot execute a particular method, it logs a warning and continues with the remaining methods. For debugging, the list of metrics collected in this check is available in the log (grep for AGENT STATS).
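
The selection rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in checks.d/agent_metrics.py: the helper name call_process_methods and its return shape are hypothetical, and it accepts any object so that it can be tried without psutil installed.

```python
import logging

log = logging.getLogger(__name__)


def call_process_methods(process, process_metrics):
    """Call each active, available method on `process`.

    Hypothetical helper. Returns {method_name: (metric_type, result)}.
    `process_metrics` is the parsed list from init_config, where YAML
    yes/no values have already been loaded as booleans.
    """
    results = {}
    for entry in process_metrics:
        name = entry["name"]
        if not entry.get("active", False):
            continue  # deactivated in the configuration
        method = getattr(process, name, None)
        if method is None:
            # Rule 1: the method does not exist on this process object.
            log.warning("Method %s is not available; skipping", name)
            continue
        try:
            results[name] = (entry["type"], method())
        except Exception as exc:
            # Rule 2: the OS does not support the call; warn and move on.
            log.warning("Could not execute %s: %s", name, exc)
    return results
```

With psutil installed, the process argument would be something like psutil.Process() for the current process.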

Metrics collected via the psutil methods are parsed and aggregated in a namespace derived from the method name and its output. For example, get_memory_info is parsed to datadog.agent.collector.memory_info.rss and datadog.agent.collector.memory_info.vms. The logic for this parsing lives here and here. Once computed, these metrics are aggregated and forwarded to Datadog as with any other AgentCheck.
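
One way to picture the naming scheme is the sketch below, which strips the get_ prefix and appends each field of a namedtuple result. Only the two memory_info names come from this page; the helper metric_names, the PREFIX constant, and the handling of scalar results are assumptions for illustration.

```python
from collections import namedtuple

# Namespace prefix taken from the example metric names on this page.
PREFIX = "datadog.agent.collector"


def metric_names(method_name, result):
    """Map a method result to {metric_name: value} (hypothetical helper)."""
    # Drop the leading "get_" so get_memory_info becomes memory_info.
    if method_name.startswith("get_"):
        base = method_name[len("get_"):]
    else:
        base = method_name
    if hasattr(result, "_fields"):
        # namedtuple result: one metric per field.
        return dict(
            ("%s.%s.%s" % (PREFIX, base, field), getattr(result, field))
            for field in result._fields
        )
    # Assumption: a scalar result maps to a single metric.
    return {"%s.%s" % (PREFIX, base): result}


# Stand-in for the namedtuple that psutil returns for memory info.
MemInfo = namedtuple("MemInfo", ["rss", "vms"])
names = metric_names("get_memory_info", MemInfo(rss=1024, vms=4096))
```

Here names holds the keys datadog.agent.collector.memory_info.rss and datadog.agent.collector.memory_info.vms.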

Profiling an individual check

It is sometimes useful to profile individual checks to spot bottlenecks and critical paths in agent performance. When used with agent.py check, the --profile flag dumps profiling information to stdout. Presently this consists of the following:

  1. Check runtime.
  2. Memory use and disk I/O, if available.
  3. pstats output restricted to 20 calls.
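
For readers unfamiliar with pstats, the sketch below shows how a runtime and a report limited to 20 entries could be produced for a single callable. It is a standalone approximation, not the agent's implementation: profile_check is a hypothetical name, and sorting by cumulative time is an assumption.

```python
import cProfile
import io
import pstats
import time


def profile_check(check_fn, limit=20):
    """Run `check_fn` once; return (runtime_seconds, pstats_report)."""
    profiler = cProfile.Profile()
    start = time.time()
    profiler.enable()
    try:
        check_fn()
    finally:
        profiler.disable()
    runtime = time.time() - start

    # Render the report into a string instead of printing directly.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Assumption: sort by cumulative time; keep only `limit` entries.
    stats.sort_stats("cumulative").print_stats(limit)
    return runtime, out.getvalue()
```

Calling profile_check(some_check_function) and printing the second element of the result gives a report similar in shape to item 3.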

Here is an example of what you see when profiling the network check