-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for annotating profiler stacks with the resource of the active web trace, if any #1623
Conversation
Some integrations only set the resource name at the end of the request. When the profiler is taking a sample of a given thread with an active span, the resource name may not yet be available, or it may be incorrect. By introducing the `ResourceContainer` object in a span, the profiler can keep a reference to the `ResourceContainer` objects of all root spans it observes during sampling, and thus delay getting the resource name to as late as possible. Hopefully by the time the profiler is preparing a submission to the backend, the final correct resource name is available, and it can be used by the profiler. Note that this still doesn't guarantee 100% correctness, since the profiler will not delay sending a profile while a span is still ongoing, but this approach should provide us with a reasonable approximation.
This commit only modifies the `Event::Stack` and its users, but does not yet add support to write this information to the pprof nor gathering it from the tracer.
This extends the previous behavior of TraceIdentifiers::Ddtrace to also return the root span resource container, which will be used to enable aggregation-by-endpoint functionality in the profiler.
…vents/max_frames inside These settings should in general not be messed with, and I doubt any customers are, so while this is a backwards-incompatible change, I think since profiler is still in beta and so few (if any) customers are impacted, I'll do the change right away without waiting for GA or 1.0.
This (advanced, not recommended option) can be used to disable the gathering of trace resources (usually the endpoint names) to be reported together with profiles.
Codecov Report
@@ Coverage Diff @@
## ivoanjo/profiler-cleanups #1623 +/- ##
===========================================================
Coverage 98.30% 98.30%
===========================================================
Files 901 901
Lines 43253 43368 +115
===========================================================
+ Hits 42520 42635 +115
Misses 733 733
Continue to review full report at Codecov.
|
I'm glad the new struct fits in a VALUE slot: ObjectSpace.memsize_of(Struct.new(:latest).new('resource'))
=> 40 |
ᵃˡˡ ʰᵃᶦˡ ᵒᵘʳ ᵗᶦⁿʸ ᵒᵛᵉʳˡᵒʳᵈˢ |
As of #1623, the profiler records the `resource` of the root span, on top of the span id and trace id during sampling. Also, as discusssed in the description of #1623 (in the "Getting the correct `resource`" section) integrations where the `resource` is only set at the end pose an extra challenge, as the request may not be finished in time for the `resource` to be included in the profiler payload. This means that the profiler may miss the `resource` for some of the requests, depending on timing of when the sampling happens and when the request finishes. (Note that in this case, the trace id and span id will still be propagated, so the samples will always be able to be tied back to the trace that originated them; it's just the `resource` that will not be included in the profiler data). To avoid this issue for many of our customers, let's move the Rails `resource` setting to the beginning of the request, thus avoiding the issue altogether for the Rails (action_pack/action_controller) integration.
As of #1623, the profiler records the `resource` of the root span, on top of the span id and trace id during sampling. Also, as discusssed in the description of #1623 (in the "Getting the correct `resource`" section) integrations where the `resource` is only set at the end pose an extra challenge, as the request may not be finished in time for the `resource` to be included in the profiler payload. This means that the profiler may miss the `resource` for some of the requests, depending on timing of when the sampling happens and when the request finishes. (Note that in this case, the trace id and span id will still be propagated, so the samples will always be able to be tied back to the trace that originated them; it's just the `resource` that will not be included in the profiler data). To avoid this issue for many of our customers, let's move the Rails `resource` setting to the beginning of the request, thus avoiding the issue altogether for the Rails (action_pack/action_controller) integration.
As of #1623, the profiler records the `resource` of the root span, on top of the span id and trace id during sampling. Also, as discusssed in the description of #1623 (in the "Getting the correct `resource`" section) integrations where the `resource` is only set at the end pose an extra challenge, as the request may not be finished in time for the `resource` to be included in the profiler payload. This means that the profiler may miss the `resource` for some of the requests, depending on timing of when the sampling happens and when the request finishes. (Note that in this case, the trace id and span id will still be propagated, so the samples will always be able to be tied back to the trace that originated them; it's just the `resource` that will not be included in the profiler data). To avoid this issue for many of our customers, let's set the Sinatra `resource` setting to the beginning of the request, thus avoiding the issue altogether for Sinatra. Note that I did not touch the existing code to set the resource at the end, which had a quite scary TODO (lines 47-64) and seemed like something I shouldn't touch. Note 2: This is similar to what we did in #1626 for the Rails integration.
As of #1623, the profiler records the `resource` of the root span, on top of the span id and trace id during sampling. Also, as discusssed in the description of #1623 (in the "Getting the correct `resource`" section) integrations where the `resource` is only set at the end pose an extra challenge, as the request may not be finished in time for the `resource` to be included in the profiler payload. This means that the profiler may miss the `resource` for some of the requests, depending on timing of when the sampling happens and when the request finishes. (Note that in this case, the trace id and span id will still be propagated, so the samples will always be able to be tied back to the trace that originated them; it's just the `resource` that will not be included in the profiler data). To avoid this issue for many of our customers, let's move the Grape `resource` setting to the beginning of the request, thus avoiding the issue altogether for this integration. (Note: This is similar to what we did for Rails in #1626 and Sinatra in #1628)
As of #1623, the profiler records the `resource` of the root span, on top of the span id and trace id during sampling. Also, as discusssed in the description of #1623 (in the "Getting the correct `resource`" section) integrations where the `resource` is only set at the end pose an extra challenge, as the request may not be finished in time for the `resource` to be included in the profiler payload. This means that the profiler may miss the `resource` for some of the requests, depending on timing of when the sampling happens and when the request finishes. (Note that in this case, the trace id and span id will still be propagated, so the samples will always be able to be tied back to the trace that originated them; it's just the `resource` that will not be included in the profiler data). To avoid this issue for many of our customers, let's move the Grape `resource` setting to the beginning of the request, thus avoiding the issue altogether for this integration. (Note: This is similar to what we did for Rails in #1626 and Sinatra in #1628)
Just as an update to my promised plan of trying to move as many integrations to set the request at the beginning if possible, I went through all integrations that generate spans of type
So I think we're in pretty good shape, better than I expected! :) |
This PR updates the Profiler supported versions page with the new Ruby profile types (allocation and heap) that we just released this week. I also tweaked two additional things: * Recommended runtime version for Ruby: 2.3 => 3.1 (other Rubies are no longer kept up-to-date by the Ruby developers, and thus may not provide the best experience, although we still support and test them) * Minimum version for endpoint profiling for Ruby: 0.48.0 => 0.52.0 (the previous value was incorrect, this feature was initially added in DataDog/dd-trace-rb#1623 )
#21345) This PR updates the Profiler supported versions page with the new Ruby profile types (allocation and heap) that we just released this week. I also tweaked two additional things: * Recommended runtime version for Ruby: 2.3 => 3.1 (other Rubies are no longer kept up-to-date by the Ruby developers, and thus may not provide the best experience, although we still support and test them) * Minimum version for endpoint profiling for Ruby: 0.48.0 => 0.52.0 (the previous value was incorrect, this feature was initially added in DataDog/dd-trace-rb#1623 )
#21345) This PR updates the Profiler supported versions page with the new Ruby profile types (allocation and heap) that we just released this week. I also tweaked two additional things: * Recommended runtime version for Ruby: 2.3 => 3.1 (other Rubies are no longer kept up-to-date by the Ruby developers, and thus may not provide the best experience, although we still support and test them) * Minimum version for endpoint profiling for Ruby: 0.48.0 => 0.52.0 (the previous value was incorrect, this feature was initially added in DataDog/dd-trace-rb#1623 )
This PR extends the existing support for linking traces to profiles with one extra feature: the
resource
from root spans representing inbound web requests (aka "the endpoint") will be included as part of the profiling data.This will enable the profiler to expose this as part of the profiling UX (which is still WIP/under discussion).
Getting the correct
resource
This feature would've been really easy to add
if it weren't for those meddling kidsif it weren't for the fact that quite a few integrations set theresource
on the span at the end of the request, not at the beginning. This creates a problem: the profiler can (and probably will) observe the span before this, and would observe a missing or incorrectresource
if it just recorded the currentresource
at the time of sampling.To tackle this issue, I introduced a new
ResourceContainer
object inSpan
that serves only to keep a mutable reference to the latestresource
. This way, and during sampling, the profiler records not theresource
it sees but theResourceContainer
(which is always the same throughout the span's lifetime). Only when the profiling data is encoded does the profiler actually get theresource
from inside theResourceContainer
. (This is the equivalent of the profiler holding on to a reference to the span, but without actually doing that)While the above doesn't guarantee that the profiler will always observe the correct
resource
-- for instance we may observe a request that started just as the profiler was about to report the data, and is still ongoing -- I think this should be a "good enough" approximation to start. To improve on this, I plan to put aside some time to go through our most common tracing integrations and see if I can change any that set theresource
at the end and move it to the beginning. Worst case, the correct data will be sent with the trace, so we can always chose to correct this information on the backend by querying the apm data.Personal Identifiable Information (PII) in the resource?
When I started this work, there was some doubts on wether the
resource
could contain some PII data. While discussing with @ericmustin and @marcotc and inspired on DataDog/dd-trace-py#2573 it became clear theresource
would not include this information, especially not in traces with typeweb
.Of course, customers with really weird custom integrations can still "break" this, but that's the same with the rest of the profiling data -- you can always create a class, file name or method name with PII that will then be gathered by the profiler.
Nevertheless, I've added an option to disable the gathering of the
resource
, if needed (see below).Configuration
I've created a new
advanced
section in theprofiling
settings, and added a newextract_trace_resource
option that can be used to control this feature (defaults to enabled).I also moved two other obscure profiler settings --
max_events
andmax_frames
inside this group. This move is technically backwards-incompatible, but these settings were never documented, and the profiler still being in beta, I think we should just rip of the band-aid now, rather than waiting for a later GA/1.0 to change them.Final note
This PR is on top #1620 to preemptively fix a merge conflict. Once that one is merged, GitHub will change the target branch of this PR to master.