
[core] Collect run-time metrics #819

Merged: 75 commits into 0.24-dev on Apr 11, 2019

Conversation

@Kyle-Verhoog (Member) commented on Feb 9, 2019:

Following DataDog/dd-trace-rb#677, this PR brings run-time metrics collection to the Python tracer.

Overview

Metrics are periodically polled from various source modules such as gc, platform and psutil and sent to Datadog (through the Agent) via dogstatsd. Some example metrics are:

  • runtime.python.thread_count
    • the number of threads in use by the Python process
  • runtime.python.mem.rss
    • the resident set size: the amount of non-swapped physical memory the process has used, in bytes
  • runtime.python.gc.gen1_count

Metrics are tagged with static values which are obtained on startup. These include values like:

  • runtime.python.lang_interpreter
    • the interpreter implementation: cpython, jython, or pypy, for example
  • runtime.python.lang_version
    • the language version (e.g. 3.7.2)
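The polling described above can be sketched roughly as follows. This is a minimal illustration of gathering these metric and tag values from `gc`, `platform`, `threading`, and (optionally) `psutil`, not the PR's actual implementation; the function names are hypothetical, and only the metric/tag names come from the PR description.

```python
import gc
import os
import platform
import threading


def collect_runtime_metrics():
    """Poll a few run-time metrics (metric names taken from the PR description)."""
    metrics = {
        "runtime.python.thread_count": threading.active_count(),
        # gc.get_count() returns per-generation counts; index 1 is generation 1
        "runtime.python.gc.gen1_count": gc.get_count()[1],
    }
    try:
        # psutil is an optional dependency; RSS is simply skipped if it is absent
        import psutil
        metrics["runtime.python.mem.rss"] = psutil.Process(os.getpid()).memory_info().rss
    except ImportError:
        pass
    return metrics


def collect_runtime_tags():
    """Static tag values gathered once at startup."""
    return {
        "runtime.python.lang_interpreter": platform.python_implementation(),
        "runtime.python.lang_version": platform.python_version(),  # e.g. 3.7.2
    }
```

In the real worker these values would be flushed periodically as dogstatsd gauges rather than returned as a dict.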

What this (roughly) looks like in Datadog (TODO: better data):
[screenshot: example run-time metrics graphs in Datadog, Feb 9, 2019]

TODOs

  • Add service name as tag to metrics
  • Support for forking or process spawning Python applications
  • Configuration
    • figure out when/how to enable
      • when datadog module detected
      • additional dependencies?
    • enabling and disabling tags
    • flush interval
    • agent details
  • Documentation
  • Some kind of sanity testing
    • should ensure collection never raises exceptions
    • various combinations of dependent libraries installed (i.e. with and without psutil)
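The "collection never raises exceptions" item in the TODOs can be illustrated with a small defensive wrapper: each collector runs inside a try/except so a single failing source (say, a psutil call on an unsupported platform) only produces a debug log instead of breaking the application. This is a hypothetical sketch; `safe_collect` and its list-of-callables interface are assumptions, not the PR's API.

```python
import logging

log = logging.getLogger(__name__)


def safe_collect(collectors):
    """Run each zero-arg collector, merging results; never propagate errors.

    A failing collector is logged at debug level and skipped, so metric
    collection can never crash the host application.
    """
    metrics = {}
    for collect in collectors:
        try:
            metrics.update(collect())
        except Exception:
            log.debug("collector %r failed", collect, exc_info=True)
    return metrics
```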

@brettlangdon (Member) commented:

Are context switch and CPU time just always going to be incrementing during runtime?

Is there an alert or check you would run on either, e.g. if context switches are above or below a certain value? (Same for CPU time.)

@brettlangdon (Member) left a comment:

Starting to dig in a little; I need to spend more time with this PR.

Resolved review threads (outdated): ddtrace/runtime_metrics.py (6 threads), tox.ini
@brettlangdon (Member) left a comment:

A bunch of minor comments.

Seems a little over-engineered, but the direction in general is good.

For logging, please use ddtrace.internal.logger.get_logger and only use log.warn, log.error, and log.debug.

Also, for the runtime metrics worker, we should get one worker per process, similar to our AgentWriter.
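The one-worker-per-process behavior requested here is commonly implemented by caching the worker alongside the pid that created it: after a fork, `os.getpid()` no longer matches the cached pid, so the child lazily builds its own worker. A minimal sketch of that pattern, assuming a hypothetical `RuntimeWorker` class (this is not the PR's or AgentWriter's actual code):

```python
import os
import threading


class RuntimeWorker:
    """One worker per process: re-created lazily after a fork."""
    _lock = threading.Lock()
    _instance = None
    _pid = None

    def __init__(self):
        self.flush_count = 0

    @classmethod
    def instance(cls):
        pid = os.getpid()
        with cls._lock:
            # A forked child inherits _instance but has a new pid,
            # so it falls into this branch and gets a fresh worker.
            if cls._instance is None or cls._pid != pid:
                cls._instance = cls()
                cls._pid = pid
            return cls._instance

    def flush(self):
        self.flush_count += 1
```

Within a single process, repeated `RuntimeWorker.instance()` calls return the same object; only a fork (pid change) produces a new one.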

Resolved review threads (outdated): ddtrace/runtime_metrics/__init__.py, tests/runtime_metrics/test_metrics.py (4 threads), tests/runtime_metrics/test_metric_collectors.py, ddtrace/utils/runtime.py, ddtrace/runtime_metrics/collector.py, ddtrace/runtime_metrics/runtime_metrics.py (2 threads)
@@ -97,6 +98,8 @@ def add_global_tags(tracer):
opts["port"] = int(port)
if priority_sampling:
opts["priority_sampling"] = asbool(priority_sampling)
if runtime_metrics_enabled:
Review comment (Member):

If we set a default, then we'll always have this, unless they do DD_RUNTIME_METRICS= and the empty string is falsey.

Also, we don't really use "enabled" as a value for other things, do we? We should just use True as the default.

We should be able to change to:

opts['collect_metrics'] = asbool(get_env('runtime_metrics', True))

Review comment (Member):

We need to verify if we want this to be True or False by default.

Follow-up:

Never mind, I completely forgot how get_env works; it is get_env(<integration>, <name>), so this is DD_RUNTIME_METRICS_ENABLED and the default is None.

So this is totally fine to keep as-is! Sorry for any confusion!
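The naming convention the reviewer describes, where get_env(<integration>, <name>) maps to a DD_<INTEGRATION>_<NAME> environment variable, can be sketched as below. The exact signature and default handling here are assumptions based on this thread, not ddtrace's actual helper:

```python
import os


def get_env(integration, name, default=None):
    """Sketch of the env helper described in the thread:
    get_env('runtime_metrics', 'enabled') reads DD_RUNTIME_METRICS_ENABLED.
    """
    key = "DD_{}_{}".format(integration, name).upper()
    return os.environ.get(key, default)


# With the variable unset, the default (None here) is returned, which is
# why the original code was fine to keep as-is.
os.environ["DD_RUNTIME_METRICS_ENABLED"] = "true"
assert get_env("runtime_metrics", "enabled") == "true"
assert get_env("runtime_metrics", "other_setting") is None
```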

Resolved review threads (outdated): ddtrace/internal/runtime/constants.py (4 threads)
@@ -239,8 +262,66 @@ def start_span(self, name, child_of=None, service=None, resource=None, span_type
# add it to the current context
context.add_span(span)

if service and service not in self._services:
Review comment (Member):

This needs to be indented under the else from L226

Review comment (Member):

self._services is a set, we can do:

if service:
    self._services.add(service)

def test_worker_flush(self):
self.worker.start()
self.worker.flush()

Review comment (Member):

Move self.worker.stop() here to keep from double flushing.

Resolved review thread (outdated): tests/internal/runtime/test_runtime_metrics.py

# expect all metrics in default set are received
# DEV: dogstatsd gauges in form "{metric_name}:{value}|g"
self.assertSetEqual(
Review comment (Member):

Add check here to confirm values match what is cached by the collections in the worker.
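The wire format quoted in the test comment, "{metric_name}:{value}|g", is the standard dogstatsd gauge packet. A small parser like the one below is a handy way to assert on what the worker actually sent in tests; this helper is an illustrative sketch, not part of the PR:

```python
def parse_gauge(packet):
    """Parse a dogstatsd gauge packet: '{metric_name}:{value}|g'.

    Tags, when present, are appended as '|#tag1,tag2'; this sketch
    only extracts the name and value and checks the type marker.
    """
    name, rest = packet.split(":", 1)
    fields = rest.split("|")
    value, metric_type = fields[0], fields[1]
    assert metric_type == "g", "not a gauge packet"
    return name, float(value)
```

For example, parse_gauge("runtime.python.thread_count:12|g") yields ("runtime.python.thread_count", 12.0), so a test can compare received values against the collector's cached ones.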

@@ -3,6 +3,6 @@
from ddtrace import tracer

if __name__ == '__main__':
assert tracer.dogstatsd.host == "172.10.0.1"
assert tracer.dogstatsd.port == 8120
assert tracer._dogstatsd_client.host == "172.10.0.1"
Review comment (Member):

Use single quotes.

brettlangdon
brettlangdon previously approved these changes Apr 11, 2019
@brettlangdon (Member) left a comment:

Looks good; shipping behind a flag too, which is nice.

I noticed some double-quote usage... I need to get a flake8 plugin set up for that.

Resolved review thread (outdated): ddtrace/bootstrap/sitecustomize.py
Co-Authored-By: majorgreys <tahir@tahirbutt.com>
@majorgreys majorgreys dismissed stale reviews from brettlangdon via 077cad9 April 11, 2019 17:59
@majorgreys majorgreys added this to the 0.24.0 milestone Apr 11, 2019
@majorgreys majorgreys merged commit 2051e04 into 0.24-dev Apr 11, 2019
@majorgreys majorgreys deleted the kyle-verhoog/metrics branch April 11, 2019 18:35
@majorgreys majorgreys mentioned this pull request Apr 12, 2019
5 participants