[core] Collect run-time metrics #819
Merged
c29f31c [metrics] initial implementation (Kyle-Verhoog)
8c2679f [metrics] add gc generation metrics (Kyle-Verhoog)
d9b61b6 [metrics] clean-up (Kyle-Verhoog)
5f14909 [metrics] add thread worker, additional metrics (Kyle-Verhoog)
f0174de [metrics] linting (Kyle-Verhoog)
4a4c9f7 [metrics] code organization (Kyle-Verhoog)
75c05c1 [metrics] add runtime_id to tracer (Kyle-Verhoog)
1c4a3b8 [metrics] resolve rebase conflicts (Kyle-Verhoog)
066df1a [metrics] linting (Kyle-Verhoog)
b96ba74 [metrics] add runtime-id tag (Kyle-Verhoog)
e84e450 [metrics] linting (Kyle-Verhoog)
1e10667 [metrics] linting (Kyle-Verhoog)
910b83b Add environment variable for enabling runtime metrics (majorgreys)
0abcafb Environment configuration for dogstatsd (majorgreys)
1612214 apply brettlinter (brettlangdon)
eed154c [metrics] remove unnecessary LazyValues (Kyle-Verhoog)
009b94f [metrics] in-line psutil method calls (Kyle-Verhoog)
8aea85a [metrics] use internal logger (Kyle-Verhoog)
02f2b00 [metrics] add reset method, gather services (Kyle-Verhoog)
5ba8dc1 [metrics] support multiple services properly (Kyle-Verhoog)
810ec4e [metrics] use base test case (Kyle-Verhoog)
7f63ec9 [metrics] handle process forking (Kyle-Verhoog)
d23995c [metrics] add runtime metrics tags to spans (Kyle-Verhoog)
d886bff Remove LazyValue (majorgreys)
fcda216 Add dependencies for runtime metrics to library (majorgreys)
a333f70 Refactor metrics collectors and add tests (majorgreys)
321474d Begin major refactoring of api (majorgreys)
1cc7895 Decouple dogstatsd from runtime metrics (majorgreys)
f851205 Fix constant (majorgreys)
c900dd2 Fix flake8 (majorgreys)
2e807ec Separate host/port for trace agent and dogstatsd (majorgreys)
a9999c8 Update ddtrace_run tests (majorgreys)
0308fd7 Fix integration test (majorgreys)
992c9ce Merge branch '0.24-dev' into kyle-verhoog/metrics (majorgreys)
c78c5a0 Merge branch '0.24-dev' into kyle-verhoog/metrics (majorgreys)
a198c5f Vendor datadogpy to fix issues with gevent+requests (majorgreys)
4e8e40e Revert change to on import (majorgreys)
868891e Add license for dogstatsd (majorgreys)
df7a07f Move runtime metrics into internal (majorgreys)
c58e796 Fixes for ddtrace.internal.runtime (majorgreys)
effd59a Wrap worker flush in try-except to log errors (majorgreys)
1ffdcb9 Flush calls gauge which is a UDP so no need to catch errors (majorgreys)
71439ac Remove unused datadog and metrics tests (majorgreys)
86f70c8 Rename class in repr (majorgreys)
15953d0 Remove collect_fn argument from ValueCollector (majorgreys)
b1ff051 Fix flake8 (majorgreys)
fbbbddf Remove tags not called for in RFC (majorgreys)
b592566 Merge branch '0.24-dev' into kyle-verhoog/metrics (majorgreys)
3940813 Better metric names for cpu (majorgreys)
50a6ecf Merge branch 'kyle-verhoog/metrics' of github.com:DataDog/dd-trace-py… (majorgreys)
641f9b6 Use 0-1-2 for gc collections (majorgreys)
da771e1 Merge branch '0.24-dev' into kyle-verhoog/metrics (majorgreys)
38e7f60 Comments (majorgreys)
9a8b6c7 Merge branch 'kyle-verhoog/metrics' of github.com:DataDog/dd-trace-py… (majorgreys)
156b6b4 Fix daemon for threading (majorgreys)
589a89b Add test on metrics received by dogstatsd (majorgreys)
48d9bf2 Remove datadog dependency since we have it vendored (majorgreys)
34d5c0c Fix cpu metrics (majorgreys)
e344085 Fix cumulative metrics (majorgreys)
a234743 Fix reset (majorgreys)
657061b Flag check unnecessary (majorgreys)
a76e1ee Fix runtime tag names (brettlangdon)
a9fb5c0 Merge branch 'kyle-verhoog/metrics' of github.com:DataDog/dd-trace-py… (majorgreys)
52acbb8 Only tag root span with runtime info (majorgreys)
610e8ce Use common namespace for gc metric names (majorgreys)
94f58ad Remove unnecessary set check (majorgreys)
5d34662 Wait for tests of metrics received (majorgreys)
af39200 Fix for constant tags and services (majorgreys)
75fb9de Fix broken config (majorgreys)
bc560ed Fix flake8 (majorgreys)
7e26b3f Merge branch '0.24-dev' into kyle-verhoog/metrics (majorgreys)
c467106 Fix ddtrace-run test for runtime metrics enabled (majorgreys)
667feea Merge branch 'kyle-verhoog/metrics' of github.com:DataDog/dd-trace-py… (majorgreys)
077cad9 Update ddtrace/bootstrap/sitecustomize.py (brettlangdon)
ab0c594 Merge branch '0.24-dev' into kyle-verhoog/metrics (majorgreys)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
from .runtime_metrics import ( | ||
RuntimeTags, | ||
RuntimeMetrics, | ||
RuntimeWorker, | ||
) | ||
|
||
|
||
__all__ = [ | ||
'RuntimeTags', | ||
'RuntimeMetrics', | ||
'RuntimeWorker', | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
import importlib | ||
|
||
from ..logger import get_logger | ||
|
||
log = get_logger(__name__) | ||
|
||
|
||
class ValueCollector(object): | ||
"""A basic state machine useful for collecting, caching and updating data | ||
obtained from different Python modules. | ||
|
||
The two primary use-cases are | ||
1) data loaded once (like tagging information) | ||
2) periodically updating data sources (like thread count) | ||
|
||
Functionality is provided for requiring and importing modules which may or | ||
may not be installed. | ||
""" | ||
enabled = True | ||
periodic = False | ||
required_modules = [] | ||
value = None | ||
value_loaded = False | ||
|
||
def __init__(self, enabled=None, periodic=None, required_modules=None): | ||
self.enabled = self.enabled if enabled is None else enabled | ||
self.periodic = self.periodic if periodic is None else periodic | ||
self.required_modules = self.required_modules if required_modules is None else required_modules | ||
|
||
self._modules_successfully_loaded = False | ||
self.modules = self._load_modules() | ||
if self._modules_successfully_loaded: | ||
self._on_modules_load() | ||
|
||
def _on_modules_load(self): | ||
"""Hook triggered after all required_modules have been successfully loaded. | ||
""" | ||
|
||
def _load_modules(self): | ||
modules = {} | ||
try: | ||
for module in self.required_modules: | ||
modules[module] = importlib.import_module(module) | ||
self._modules_successfully_loaded = True | ||
except ImportError: | ||
# DEV: disable collector if we cannot load any of the required modules | ||
self.enabled = False | ||
log.warning('Could not import module "{}" for {}. Disabling collector.'.format(module, self)) | |
return None | ||
return modules | ||
|
||
def collect(self, keys=None): | ||
"""Returns metrics as collected by `collect_fn`. | ||
|
||
:param keys: The keys of the metrics to collect. | ||
""" | ||
if not self.enabled: | ||
return self.value | ||
|
||
keys = keys or set() | ||
|
||
if not self.periodic and self.value_loaded: | ||
return self.value | ||
|
||
# call underlying collect function and filter out keys not requested | ||
self.value = self.collect_fn(keys) | ||
|
||
# filter values for keys | ||
if len(keys) > 0 and isinstance(self.value, list): | ||
self.value = [ | ||
(k, v) | ||
for (k, v) in self.value | ||
if k in keys | ||
] | ||
|
||
self.value_loaded = True | ||
return self.value | ||
|
||
def __repr__(self): | ||
return '<{}(enabled={},periodic={},required_modules={})>'.format( | ||
self.__class__.__name__, | ||
self.enabled, | ||
self.periodic, | ||
self.required_modules, | ||
) |
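The caching behaviour of the collector above can be sketched in isolation. The following is a simplified stand-in, not the class from this PR: `SketchCollector`, `PlatformSketch`, `MissingSketch` and the module name `module_that_should_not_exist_xyz` are hypothetical names chosen for illustration. It shows the two paths described in the docstring (a value loaded once and then cached, and a collector that disables itself when a required module cannot be imported).

```python
import importlib


class SketchCollector(object):
    """Simplified, hypothetical stand-in for the ValueCollector pattern."""
    periodic = False
    required_modules = []

    def __init__(self):
        self.enabled = True
        self.value = None
        self.value_loaded = False
        self.modules = {}
        try:
            for name in self.required_modules:
                self.modules[name] = importlib.import_module(name)
        except ImportError:
            # A missing optional dependency disables the collector.
            self.enabled = False

    def collect_fn(self, keys):
        raise NotImplementedError

    def collect(self, keys=None):
        if not self.enabled:
            return self.value
        # Non-periodic collectors return the cached value after the first load.
        if not self.periodic and self.value_loaded:
            return self.value
        keys = keys or set()
        value = self.collect_fn(keys)
        # Drop any (key, value) pair that was not requested.
        if keys and isinstance(value, list):
            value = [(k, v) for (k, v) in value if k in keys]
        self.value = value
        self.value_loaded = True
        return self.value


class PlatformSketch(SketchCollector):
    required_modules = ['platform']

    def collect_fn(self, keys):
        platform = self.modules['platform']
        return [('lang_version', platform.python_version())]


class MissingSketch(SketchCollector):
    # Assumed not to be installed anywhere; used to trigger the ImportError path.
    required_modules = ['module_that_should_not_exist_xyz']

    def collect_fn(self, keys):
        return [('never', 'collected')]
```

Calling `PlatformSketch().collect()` twice returns the same cached list, while `MissingSketch().collect()` returns `None` because the collector was disabled at construction time.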
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
GC_GEN0_COUNT = 'runtime.python.gc.gen0_count' | ||
GC_GEN1_COUNT = 'runtime.python.gc.gen1_count' | ||
GC_GEN2_COUNT = 'runtime.python.gc.gen2_count' | ||
|
||
THREAD_COUNT = 'runtime.python.thread_count' | ||
MEM_RSS = 'runtime.python.mem.rss' | ||
CPU_TIME_SYS = 'runtime.python.cpu.time.sys' | ||
CPU_TIME_USER = 'runtime.python.cpu.time.user' | ||
CPU_PERCENT = 'runtime.python.cpu.percent' | ||
CTX_SWITCH_VOLUNTARY = 'runtime.python.cpu.ctx_switch.voluntary' | ||
CTX_SWITCH_INVOLUNTARY = 'runtime.python.cpu.ctx_switch.involuntary' | ||
|
||
GC_RUNTIME_METRICS = set([ | ||
GC_GEN0_COUNT, | ||
GC_GEN1_COUNT, | ||
GC_GEN2_COUNT, | ||
]) | ||
|
||
PSUTIL_RUNTIME_METRICS = set([ | ||
THREAD_COUNT, | ||
MEM_RSS, | ||
CTX_SWITCH_VOLUNTARY, | ||
CTX_SWITCH_INVOLUNTARY, | ||
CPU_TIME_SYS, | ||
CPU_TIME_USER, | ||
CPU_PERCENT, | ||
]) | ||
|
||
DEFAULT_RUNTIME_METRICS = GC_RUNTIME_METRICS | PSUTIL_RUNTIME_METRICS | ||
|
||
RUNTIME_ID = 'runtime.python.runtime-id' | ||
|
||
SERVICE = 'runtime.python.service' | ||
|
||
LANG_INTERPRETER = 'runtime.python.lang_interpreter' | ||
|
||
LANG_VERSION = 'runtime.python.lang_version' | ||
|
||
|
||
TRACER_TAGS = set([ | ||
RUNTIME_ID, | ||
SERVICE, | ||
]) | ||
|
||
PLATFORM_TAGS = set([ | ||
LANG_INTERPRETER, | ||
LANG_VERSION | ||
]) | ||
|
||
DEFAULT_RUNTIME_TAGS = TRACER_TAGS |
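As a rough illustration of how these sets act as allow-lists, the sketch below filters a list of `(key, value)` pairs against an enabled set. The constant values are copied from the diff above; `filter_enabled` and the sample values in `collected` are hypothetical and exist only for this example. Note that, as defined above, `DEFAULT_RUNTIME_TAGS` contains only the tracer tags, so the platform tags would be filtered out by default.

```python
RUNTIME_ID = 'runtime.python.runtime-id'
SERVICE = 'runtime.python.service'
LANG_INTERPRETER = 'runtime.python.lang_interpreter'
LANG_VERSION = 'runtime.python.lang_version'

TRACER_TAGS = {RUNTIME_ID, SERVICE}
PLATFORM_TAGS = {LANG_INTERPRETER, LANG_VERSION}
DEFAULT_RUNTIME_TAGS = TRACER_TAGS


def filter_enabled(pairs, enabled):
    """Keep only the (key, value) pairs whose key is in the enabled set."""
    return [(k, v) for (k, v) in pairs if k in enabled]


# Hypothetical collected values, for illustration only.
collected = [
    (RUNTIME_ID, 'abc123'),
    (SERVICE, 'web'),
    (LANG_INTERPRETER, 'CPython'),
]

default_tags = filter_enabled(collected, DEFAULT_RUNTIME_TAGS)
all_tags = filter_enabled(collected, TRACER_TAGS | PLATFORM_TAGS)
```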
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
import os | ||
|
||
from .collector import ValueCollector | ||
from .constants import ( | ||
GC_GEN0_COUNT, | ||
GC_GEN1_COUNT, | ||
GC_GEN2_COUNT, | ||
THREAD_COUNT, | ||
MEM_RSS, | ||
CTX_SWITCH_VOLUNTARY, | ||
CTX_SWITCH_INVOLUNTARY, | ||
CPU_TIME_SYS, | ||
CPU_TIME_USER, | ||
CPU_PERCENT, | ||
) | ||
|
||
|
||
class RuntimeMetricCollector(ValueCollector): | ||
value = [] | ||
periodic = True | ||
|
||
|
||
class GCRuntimeMetricCollector(RuntimeMetricCollector): | ||
""" Collector for garbage collection generational counts | ||
|
||
More information at https://docs.python.org/3/library/gc.html | ||
""" | ||
required_modules = ['gc'] | ||
|
||
def collect_fn(self, keys): | ||
gc = self.modules.get('gc') | ||
|
||
counts = gc.get_count() | ||
metrics = [ | ||
(GC_GEN0_COUNT, counts[0]), | ||
(GC_GEN1_COUNT, counts[1]), | ||
(GC_GEN2_COUNT, counts[2]), | ||
] | ||
|
||
return metrics | ||
|
||
|
||
class PSUtilRuntimeMetricCollector(RuntimeMetricCollector): | ||
"""Collector for psutil metrics. | ||
|
||
Performs batched operations via proc.oneshot() to optimize the calls. | ||
See https://psutil.readthedocs.io/en/latest/#psutil.Process.oneshot | ||
for more information. | ||
""" | ||
required_modules = ['psutil'] | ||
stored_value = dict( | ||
CPU_TIME_SYS_TOTAL=0, | ||
CPU_TIME_USER_TOTAL=0, | ||
CTX_SWITCH_VOLUNTARY_TOTAL=0, | ||
CTX_SWITCH_INVOLUNTARY_TOTAL=0, | ||
) | ||
|
||
def _on_modules_load(self): | ||
self.proc = self.modules['psutil'].Process(os.getpid()) | ||
|
||
def collect_fn(self, keys): | ||
with self.proc.oneshot(): | ||
# only return time deltas | ||
# TODO[tahir]: better abstraction for metrics based on last value | ||
cpu_time_sys_total = self.proc.cpu_times().system | ||
cpu_time_user_total = self.proc.cpu_times().user | ||
cpu_time_sys = cpu_time_sys_total - self.stored_value['CPU_TIME_SYS_TOTAL'] | ||
cpu_time_user = cpu_time_user_total - self.stored_value['CPU_TIME_USER_TOTAL'] | ||
|
||
ctx_switch_voluntary_total = self.proc.num_ctx_switches().voluntary | ||
ctx_switch_involuntary_total = self.proc.num_ctx_switches().involuntary | ||
ctx_switch_voluntary = ctx_switch_voluntary_total - self.stored_value['CTX_SWITCH_VOLUNTARY_TOTAL'] | ||
ctx_switch_involuntary = ctx_switch_involuntary_total - self.stored_value['CTX_SWITCH_INVOLUNTARY_TOTAL'] | ||
|
||
self.stored_value = dict( | ||
CPU_TIME_SYS_TOTAL=cpu_time_sys_total, | ||
CPU_TIME_USER_TOTAL=cpu_time_user_total, | ||
CTX_SWITCH_VOLUNTARY_TOTAL=ctx_switch_voluntary_total, | ||
CTX_SWITCH_INVOLUNTARY_TOTAL=ctx_switch_involuntary_total, | ||
) | ||
|
||
metrics = [ | ||
(THREAD_COUNT, self.proc.num_threads()), | ||
(MEM_RSS, self.proc.memory_info().rss), | ||
(CTX_SWITCH_VOLUNTARY, ctx_switch_voluntary), | ||
(CTX_SWITCH_INVOLUNTARY, ctx_switch_involuntary), | ||
(CPU_TIME_SYS, cpu_time_sys), | ||
(CPU_TIME_USER, cpu_time_user), | ||
(CPU_PERCENT, self.proc.cpu_percent()), | ||
] | ||
|
||
return metrics |
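The TODO in the psutil collector asks for a better abstraction for metrics based on the last value. One possible shape for that, offered only as a sketch and not as what the PR implements, is a small helper that remembers the previous cumulative total per key and returns the difference. `DeltaTracker` is a hypothetical name.

```python
class DeltaTracker(object):
    """Hypothetical helper: turns cumulative totals into per-interval deltas."""

    def __init__(self):
        # Last seen cumulative total for each key; missing keys start at 0,
        # matching the zero-initialised stored_value dict in the diff.
        self._last = {}

    def update(self, key, total):
        """Record a new cumulative total and return the change since the last call."""
        delta = total - self._last.get(key, 0)
        self._last[key] = total
        return delta
```

With a helper like this, each cumulative reading (CPU time, context switches) would go through a single `update(key, total)` call instead of the paired subtract-then-store statements.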
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
import threading | ||
import time | ||
import itertools | ||
|
||
from ..logger import get_logger | ||
from .constants import ( | ||
DEFAULT_RUNTIME_METRICS, | ||
DEFAULT_RUNTIME_TAGS, | ||
) | ||
from .metric_collectors import ( | ||
GCRuntimeMetricCollector, | ||
PSUtilRuntimeMetricCollector, | ||
) | ||
from .tag_collectors import ( | ||
TracerTagCollector, | ||
) | ||
|
||
log = get_logger(__name__) | ||
|
||
|
||
class RuntimeCollectorsIterable(object): | ||
def __init__(self, enabled=None): | ||
self._enabled = enabled or self.ENABLED | ||
# Initialize the collectors. | ||
self._collectors = [c() for c in self.COLLECTORS] | ||
|
||
def __iter__(self): | ||
collected = ( | ||
collector.collect(self._enabled) | ||
for collector in self._collectors | ||
) | ||
return itertools.chain.from_iterable(collected) | ||
|
||
def __repr__(self): | ||
return '{}(enabled={})'.format( | ||
self.__class__.__name__, | ||
self._enabled, | ||
) | ||
|
||
|
||
class RuntimeTags(RuntimeCollectorsIterable): | ||
ENABLED = DEFAULT_RUNTIME_TAGS | ||
COLLECTORS = [ | ||
TracerTagCollector, | ||
] | ||
|
||
|
||
class RuntimeMetrics(RuntimeCollectorsIterable): | ||
ENABLED = DEFAULT_RUNTIME_METRICS | ||
COLLECTORS = [ | ||
GCRuntimeMetricCollector, | ||
PSUtilRuntimeMetricCollector, | ||
] | ||
|
||
|
||
class RuntimeWorker(object): | ||
""" Worker thread for collecting and writing runtime metrics to a DogStatsd | ||
client. | ||
""" | ||
|
||
FLUSH_INTERVAL = 10 | ||
|
||
def __init__(self, statsd_client, flush_interval=None): | ||
self._stay_alive = None | ||
self._thread = None | ||
self._flush_interval = flush_interval or self.FLUSH_INTERVAL | ||
self._statsd_client = statsd_client | ||
self._runtime_metrics = RuntimeMetrics() | ||
|
||
def _target(self): | ||
while self._stay_alive: | ||
self.flush() | ||
time.sleep(self._flush_interval) | ||
|
||
def start(self): | ||
if not self._thread: | ||
log.debug("Starting {}".format(self)) | ||
self._stay_alive = True | ||
self._thread = threading.Thread(target=self._target) | ||
self._thread.daemon = True | |
self._thread.start() | ||
|
||
def stop(self): | ||
if self._thread and self._stay_alive: | ||
log.debug("Stopping {}".format(self)) | ||
self._stay_alive = False | ||
|
||
def _write_metric(self, key, value): | ||
log.debug('Writing metric {}:{}'.format(key, value)) | ||
self._statsd_client.gauge(key, value) | ||
|
||
def flush(self): | ||
if not self._statsd_client: | ||
log.warning('Attempted flush with uninitialized or failed statsd client') | |
return | ||
|
||
for key, value in self._runtime_metrics: | ||
self._write_metric(key, value) | ||
|
||
def reset(self): | ||
self._runtime_metrics = RuntimeMetrics() | ||
|
||
def __repr__(self): | ||
return '{}(runtime_metrics={})'.format( | ||
self.__class__.__name__, | ||
self._runtime_metrics, | ||
) |
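The worker loop above sleeps for the full flush interval before it rechecks the stop flag. A variation that wakes up immediately on stop can be sketched with `threading.Event`; this is a swapped-in technique for illustration, not the approach taken in the PR. `RecordingClient` and `SketchWorker` are hypothetical names, and the client only mimics the `gauge(key, value)` call used in the diff.

```python
import threading


class RecordingClient(object):
    """Hypothetical stand-in for a DogStatsd client; records gauge calls."""

    def __init__(self):
        self.gauges = []

    def gauge(self, key, value):
        self.gauges.append((key, value))


class SketchWorker(object):
    def __init__(self, client, metrics, flush_interval=10):
        self._client = client
        self._metrics = metrics  # callable returning (key, value) pairs
        self._flush_interval = flush_interval
        self._stop = threading.Event()
        self._thread = None

    def flush(self):
        for key, value in self._metrics():
            self._client.gauge(key, value)

    def _target(self):
        while not self._stop.is_set():
            self.flush()
            # wait() returns early once stop() sets the event.
            self._stop.wait(self._flush_interval)

    def start(self):
        if self._thread is None:
            self._thread = threading.Thread(target=self._target)
            self._thread.daemon = True
            self._thread.start()

    def stop(self, timeout=None):
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout)
```

The trade-off is an extra `Event` object in exchange for a `stop()` that does not have to wait out the remainder of the interval.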
If we set a default, then we'll always have this, unless they do `DD_RUNTIME_METRICS=` and empty string is falsey. Also, we don't really use "enabled" as a value for other things, do we? We should just use `True` as the default. We should be able to change to:

We need to verify if we want this to be `True` or `False` by default.

nvm, I completely forgot how `get_env` works: it is `get_env(<integration>, <name>)`, so this is `DD_RUNTIME_METRICS_ENABLED` and the default is `None`. So this is totally fine to keep as-is! Sorry about any confusion!
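The naming rule described in the last comment can be sketched as follows. This is a hypothetical re-creation for illustration, not the library's `get_env`: `lookup_env` and `is_enabled` are made-up names, and the set of strings treated as true is an assumption.

```python
import os


def lookup_env(integration, name, default=None, environ=None):
    """Hypothetical sketch of a `DD_<INTEGRATION>_<NAME>` lookup."""
    environ = os.environ if environ is None else environ
    key = 'DD_{}_{}'.format(integration, name).upper()
    return environ.get(key, default)


def is_enabled(raw):
    """Interpret an environment string as a flag (assumed convention)."""
    return raw is not None and raw.strip().lower() in ('1', 'true')
```

Under these assumptions, `lookup_env('runtime_metrics', 'enabled')` reads `DD_RUNTIME_METRICS_ENABLED` and yields `None` when the variable is unset, and an empty string is treated as disabled, which is the falsey case raised in the first comment.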