
[core] Collect run-time metrics #819

Merged: 75 commits into 0.24-dev on Apr 11, 2019

Conversation

@Kyle-Verhoog (Member) commented on Feb 9, 2019:

Following DataDog/dd-trace-rb#677, this PR brings run-time metrics collection to the Python tracer.

Overview

Metrics are periodically polled from various source modules such as gc, platform and psutil and sent to Datadog (through the Agent) via dogstatsd. Some example metrics are:

  • runtime.python.thread_count
    • the number of threads in use by the Python process
  • runtime.python.mem.rss
    • the resident set size: the amount of non-swapped physical memory the process has used, in bytes
  • runtime.python.gc.gen1_count

Metrics are tagged with static values which are obtained on startup. These include values like:

  • runtime.python.lang_interpreter
    • the interpreter implementation: cpython, jython, or pypy, for example
  • runtime.python.lang_version
    • the language version (e.g. 3.7.2)
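The polling described above can be sketched roughly as follows. This is a minimal illustration of gathering these metric and tag values from `gc`, `platform`, `threading`, and (optionally) `psutil`, not the PR's actual implementation; the function names are hypothetical, and only the metric/tag names come from the PR description.

```python
import gc
import os
import platform
import threading


def collect_runtime_metrics():
    """Poll a few run-time metrics (metric names taken from the PR description)."""
    metrics = {
        "runtime.python.thread_count": threading.active_count(),
        # gc.get_count() returns per-generation counts; index 1 is generation 1
        "runtime.python.gc.gen1_count": gc.get_count()[1],
    }
    try:
        # psutil is an optional dependency; RSS is simply skipped if it is absent
        import psutil
        metrics["runtime.python.mem.rss"] = psutil.Process(os.getpid()).memory_info().rss
    except ImportError:
        pass
    return metrics


def collect_runtime_tags():
    """Static tag values gathered once at startup."""
    return {
        "runtime.python.lang_interpreter": platform.python_implementation(),
        "runtime.python.lang_version": platform.python_version(),  # e.g. 3.7.2
    }
```

In the real worker these values would be flushed periodically as dogstatsd gauges rather than returned as a dict.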

What this (roughly) looks like in Datadog (TODO: better data):
[screenshot: example run-time metrics graphs in Datadog, Feb 9, 2019]

TODOs

  • Add service name as tag to metrics
  • Support for forking or process spawning Python applications
  • Configuration
    • figure out when/how to enable
      • when datadog module detected
      • additional dependencies?
    • enabling and disabling tags
    • flush interval
    • agent details
  • Documentation
  • Some kind of sanity testing
    • should ensure collection never raises exceptions
    • various combinations of dependent libraries installed (i.e. with and without psutil)
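The "collection never raises exceptions" item in the TODOs can be illustrated with a small defensive wrapper: each collector runs inside a try/except so a single failing source (say, a psutil call on an unsupported platform) only produces a debug log instead of breaking the application. This is a hypothetical sketch; `safe_collect` and its list-of-callables interface are assumptions, not the PR's API.

```python
import logging

log = logging.getLogger(__name__)


def safe_collect(collectors):
    """Run each zero-arg collector, merging results; never propagate errors.

    A failing collector is logged at debug level and skipped, so metric
    collection can never crash the host application.
    """
    metrics = {}
    for collect in collectors:
        try:
            metrics.update(collect())
        except Exception:
            log.debug("collector %r failed", collect, exc_info=True)
    return metrics
```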

@brettlangdon (Member) commented:

Are context switch and CPU time just always going to be incrementing during runtime?

Is there an alert or check you would run on either, e.g. if context switches are above or below a certain value? (Same for CPU time.)

@brettlangdon (Member) left a comment:

Starting to dig in a little; I need to spend more time with this PR.

Resolved review threads (outdated): ddtrace/runtime_metrics.py (6 threads), tox.ini
@brettlangdon (Member) left a comment:

A bunch of minor comments.

Seems a little over-engineered, but the direction in general is good.

For logging, please use ddtrace.internal.logger.get_logger and only use log.warn, log.error, and log.debug.

Also, for the runtime metrics worker, we should get one worker per process, similar to our AgentWriter.
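The one-worker-per-process behavior requested here is commonly implemented by caching the worker alongside the pid that created it: after a fork, `os.getpid()` no longer matches the cached pid, so the child lazily builds its own worker. A minimal sketch of that pattern, assuming a hypothetical `RuntimeWorker` class (this is not the PR's or AgentWriter's actual code):

```python
import os
import threading


class RuntimeWorker:
    """One worker per process: re-created lazily after a fork."""
    _lock = threading.Lock()
    _instance = None
    _pid = None

    def __init__(self):
        self.flush_count = 0

    @classmethod
    def instance(cls):
        pid = os.getpid()
        with cls._lock:
            # A forked child inherits _instance but has a new pid,
            # so it falls into this branch and gets a fresh worker.
            if cls._instance is None or cls._pid != pid:
                cls._instance = cls()
                cls._pid = pid
            return cls._instance

    def flush(self):
        self.flush_count += 1
```

Within a single process, repeated `RuntimeWorker.instance()` calls return the same object; only a fork (pid change) produces a new one.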

Resolved review threads (outdated): ddtrace/runtime_metrics/__init__.py, tests/runtime_metrics/test_metrics.py (4 threads), tests/runtime_metrics/test_metric_collectors.py, ddtrace/utils/runtime.py, ddtrace/runtime_metrics/collector.py, ddtrace/runtime_metrics/runtime_metrics.py (2 threads)
@@ -97,6 +98,8 @@ def add_global_tags(tracer):
opts["port"] = int(port)
if priority_sampling:
opts["priority_sampling"] = asbool(priority_sampling)
if runtime_metrics_enabled:
Review comment (Member):

If we set a default, then we'll always have this, unless they do DD_RUNTIME_METRICS= and the empty string is falsey.

Also, we don't really use "enabled" as a value for other things, do we? We should just use True as the default.

We should be able to change to:

opts['collect_metrics'] = asbool(get_env('runtime_metrics', True))

Review comment (Member):

We need to verify if we want this to be True or False by default.

Follow-up:

Never mind, I completely forgot how get_env works; it is get_env(<integration>, <name>), so this is DD_RUNTIME_METRICS_ENABLED and the default is None.

So this is totally fine to keep as-is! Sorry for any confusion!
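The naming convention the reviewer describes, where get_env(<integration>, <name>) maps to a DD_<INTEGRATION>_<NAME> environment variable, can be sketched as below. The exact signature and default handling here are assumptions based on this thread, not ddtrace's actual helper:

```python
import os


def get_env(integration, name, default=None):
    """Sketch of the env helper described in the thread:
    get_env('runtime_metrics', 'enabled') reads DD_RUNTIME_METRICS_ENABLED.
    """
    key = "DD_{}_{}".format(integration, name).upper()
    return os.environ.get(key, default)


# With the variable unset, the default (None here) is returned, which is
# why the original code was fine to keep as-is.
os.environ["DD_RUNTIME_METRICS_ENABLED"] = "true"
assert get_env("runtime_metrics", "enabled") == "true"
assert get_env("runtime_metrics", "other_setting") is None
```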

Resolved review threads (outdated): ddtrace/internal/runtime/constants.py (4 threads)
@@ -239,8 +262,66 @@ def start_span(self, name, child_of=None, service=None, resource=None, span_type
# add it to the current context
context.add_span(span)

if service and service not in self._services:
Review comment (Member):

This needs to be indented under the else from L226

Review comment (Member):

self._services is a set, we can do:

if service:
    self._services.add(service)

def test_worker_flush(self):
self.worker.start()
self.worker.flush()

Review comment (Member):

Move self.worker.stop() here to keep from double flushing.

Resolved review thread (outdated): tests/internal/runtime/test_runtime_metrics.py

# expect all metrics in default set are received
# DEV: dogstatsd gauges in form "{metric_name}:{value}|g"
self.assertSetEqual(
Review comment (Member):

Add check here to confirm values match what is cached by the collections in the worker.
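The wire format quoted in the test comment, "{metric_name}:{value}|g", is the standard dogstatsd gauge packet. A small parser like the one below is a handy way to assert on what the worker actually sent in tests; this helper is an illustrative sketch, not part of the PR:

```python
def parse_gauge(packet):
    """Parse a dogstatsd gauge packet: '{metric_name}:{value}|g'.

    Tags, when present, are appended as '|#tag1,tag2'; this sketch
    only extracts the name and value and checks the type marker.
    """
    name, rest = packet.split(":", 1)
    fields = rest.split("|")
    value, metric_type = fields[0], fields[1]
    assert metric_type == "g", "not a gauge packet"
    return name, float(value)
```

For example, parse_gauge("runtime.python.thread_count:12|g") yields ("runtime.python.thread_count", 12.0), so a test can compare received values against the collector's cached ones.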

@@ -3,6 +3,6 @@
from ddtrace import tracer

if __name__ == '__main__':
assert tracer.dogstatsd.host == "172.10.0.1"
assert tracer.dogstatsd.port == 8120
assert tracer._dogstatsd_client.host == "172.10.0.1"
Review comment (Member):

Use single quotes.

brettlangdon
brettlangdon previously approved these changes Apr 11, 2019
@brettlangdon (Member) left a comment:

Looks good; shipping behind a flag too, which is nice.

I noticed some double-quote usage... I need to get a flake8 plugin set up for that.

Resolved review thread (outdated): ddtrace/bootstrap/sitecustomize.py
Co-Authored-By: majorgreys <tahir@tahirbutt.com>
@majorgreys majorgreys dismissed stale reviews from brettlangdon via 077cad9 April 11, 2019 17:59
@majorgreys majorgreys added this to the 0.24.0 milestone Apr 11, 2019
@majorgreys majorgreys merged commit 2051e04 into 0.24-dev Apr 11, 2019
@majorgreys majorgreys deleted the kyle-verhoog/metrics branch April 11, 2019 18:35
@majorgreys majorgreys mentioned this pull request Apr 12, 2019
5 participants