Add tracer health metrics #838
Force-pushed from 0612cb6 to 937c45d.
This pull request is still a work in progress; I need to add the actual instrumentation to the tracer code, plus tests to verify its behavior. It should also be run in a sandbox application to verify it works end-to-end.
🎉 It looks very good!
looks like a good start
Force-pushed from 4c7c636 to f56a39c.
Implemented some queue metrics, too. Right now it's pretty aggressive and will send stats with each trace push. This might ultimately be too much, but I want to measure the upper bound on what we can send: if we can send on each push, we can get some nice distribution metrics about things like "what's the distribution of spans per trace?" If we can't, then we can apply sample rates to Statsd, or rework these metrics to only compute and submit metric aggregations on flush. That would significantly reduce the number of stat calls, but also the granularity at which the data can be explored; a trade-off we might have to accept for the sake of scale. Consider this a first iteration, to experiment with and explore the upper bounds of what could be done with stats, with the caveat that it might have to be reworked slightly to scale it back.
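As a rough sketch of the per-push approach described above (the `TraceQueue` class, the metric names, and the client interface here are illustrative assumptions, not the PR's actual implementation):

```ruby
# Hypothetical sketch of per-push queue metrics. `statsd` is any
# Statsd-like client responding to #increment and #gauge
# (e.g. Datadog::Statsd from dogstatsd-ruby).
class TraceQueue
  def initialize(statsd)
    @statsd = statsd
    @traces = []
  end

  # Submit stats on every push: a counter for accepted traces and a
  # per-push measurement of spans per trace. This yields distribution-style
  # data at the cost of extra stat calls on every push.
  def push(trace)
    @traces << trace
    @statsd.increment('datadog.tracer.queue.accepted')
    @statsd.gauge('datadog.tracer.queue.accepted_lengths', trace.length)
  end
end
```

The flush-time alternative would instead accumulate counts in the queue and emit a single aggregated stat when the queue is flushed, trading granularity for volume.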
Also opened this additional branch, which implements a more conservative "aggregate at flush time" approach; it's on the other end of the scale relative to this original implementation. It definitely has flaws of its own (e.g. …). Feel free to compare the differences here and open it as a PR if necessary: https://github.com/DataDog/dd-trace-rb/compare/feature/debug_metrics...refactor/aggregated_queue_metrics?expand=1
METRIC_QUEUE_ACCEPTED_LENGTHS = 'datadog.tracer.queue.accepted_lengths'.freeze
METRIC_QUEUE_ACCEPTED_SIZE = 'datadog.tracer.queue.accepted_size'.freeze
For these two sub-metrics of datadog.tracer.queue.accepted, would it make sense to use the metric name separator, like so: datadog.tracer.queue.accepted.lengths and datadog.tracer.queue.accepted.size?
Yeah I think that could make sense; something we'd want to reconcile with our standards though.
Force-pushed from f56a39c to 3d7c7ff.
Pushed a rebase with latest master.
# Rough calculation of bytesize; not very accurate.
object.instance_variables.inject(::ObjectSpace.memsize_of(object)) do |sum, var|
  sum + ::ObjectSpace.memsize_of(object.instance_variable_get(var))
end
This seems like a good compromise between a very shallow estimate (only ::ObjectSpace.memsize_of(object)) and a full recursive memory measurement. 👍
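Wrapped as a standalone helper (a sketch: the method name is mine, but the body follows the diff above), the one-level estimate looks like:

```ruby
require 'objspace'

# One-level size estimate: the object's own memsize plus the memsize of
# each object directly referenced by its instance variables. It does not
# recurse further, so deeply nested structures are still undercounted.
def estimated_bytesize(object)
  object.instance_variables.inject(ObjectSpace.memsize_of(object)) do |sum, var|
    sum + ObjectSpace.memsize_of(object.instance_variable_get(var))
  end
end
```

For an object holding a large string in an instance variable, this returns a noticeably larger (and more realistic) figure than the shallow `ObjectSpace.memsize_of` alone.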
Force-pushed from 1e5b2ce to ba75081.
Force-pushed from 95bef86 to 84bdb44.
To monitor the health and performance of the tracer, this pull request adds health metrics to the tracer and other core components, which are sent as Statsd metrics under the datadog.tracer.* prefix. These metrics can then be graphed to evaluate the health, stability, and impact of the trace library within a Ruby application.

By default, these metrics are enabled if dogstatsd-ruby >= 3.3.0 is available. They can be disabled by setting the DD_DEBUG_HEALTH_METRICS_ENABLED environment variable to 0.

List of metrics added with this pull request: