Improved support for multiple tracer instances #919
Conversation
lgtm
Cool, I'm glad we got around to implementing this after talking quite a bit about it; nice work! I think this will actually complement #879 well, in the sense that if that PR is about fan-out from the tracer, this is the fan-in from the tracer to the worker. It gives us a lot of flexibility with tracer implementation, which is always a good thing.
  self.local = Datadog::Context.new
end

# Override the thread-local context with a new context.
def local=(ctx)
-  Thread.current[:datadog_context] = ctx
+  Thread.current[@key] = ctx
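The idea in the diff above — one thread-local variable per provider instance instead of a single shared `:datadog_context` key — can be sketched in isolation. This is a simplified stand-in, not the actual dd-trace-rb implementation; the class and key-naming scheme are illustrative:

```ruby
# Minimal sketch of namespacing a thread-local variable per provider
# instance, so two providers on the same thread don't clobber each
# other's context. The key scheme (object_id suffix) is hypothetical.
class LocalContextProvider
  def initialize
    @key = :"local_context_#{object_id}" # unique per provider instance
  end

  # Fetch (or lazily create) this provider's context for the current thread.
  def context
    Thread.current[@key] ||= {}
  end

  # Override the thread-local context with a new context.
  def context=(ctx)
    Thread.current[@key] = ctx
  end
end

provider1 = LocalContextProvider.new
provider2 = LocalContextProvider.new

# Each provider sees its own context, even on the same thread.
provider1.context[:trace_id] = 1
provider2.context[:trace_id] = 2
```

With a single shared key, the second assignment would have overwritten the first; with per-instance keys both values survive side by side.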
Is it possible that, on one thread with one tracer, multiple contexts might be used over time, such that multiple contexts build up in the thread-local variables?
I'm trying to see if there's some possible path for contexts to hang around in the thread local memory in a way that would cause a leak. Given we have a number of integrations that reset the context, and others that overwrite it for the purposes of distributed tracing, I'm wondering if either of those could cause a problem like this.
@delner ThreadLocalContext instances are currently strongly tied to a Tracer instance: they are created at Tracer#initialize, and all requests for the "current thread context" go through a tracer instance, either directly or indirectly.

For applications configuring ddtrace today, nothing will change: only one context will exist per thread, and only if a span was created during that thread's execution.

For applications that call Datadog.configure multiple times, nothing changes either, as Tracer#initialize is still only called once, even though Tracer#configure is called many times. Currently, Tracer#configure does not reconfigure the context_provider.

For applications that manually invoke Tracer.new, there will be multiple contexts per thread: at most one Context per thread, per tracer instance.

I considered adding a #shutdown! method to ThreadLocalContext (triggered from Tracer#shutdown!), but that would only clean up the current thread's Context. I'm not sure there's a feasible way to clean up another thread's thread-local variable.
I guess this is one of those situations where the drawbacks of thread-local variables come back to bite us 🤔.
Managing the context as an explicit resource is the "ideal" option, albeit a much larger endeavour at this point:
tracer1.call_context do |ctx1| # Initialize thread-local context for tracer 1
  tracer2.call_context do |ctx2| # Initialize thread-local context for tracer 2
    ...
  end # Remove thread-local variable for tracer 2's context
  tracer1.call_context do |ctx1| # Thread-local context for tracer 1 already exists, nothing to do
    ...
  end # I did not create the thread-local context, nothing to do
  ...
end # Remove thread-local variable for tracer 1's context
This would require wrapping all usages of the current context, so it is quite a lot of work, but it would make managing the context's lifecycle much clearer.
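The block-scoped pattern sketched above could look something like the following. This is a hypothetical helper, not the real dd-trace-rb API: the outermost block that creates the thread-local context is the one responsible for removing it on exit.

```ruby
# Hypothetical sketch of managing the context as an explicit, block-scoped
# resource: call_context creates the thread-local context on first entry
# and removes it when the block that created it exits.
class BlockScopedTracer
  attr_reader :key # exposed only so the cleanup can be inspected

  def initialize
    @key = :"ctx_#{object_id}" # per-instance thread-local key (illustrative)
  end

  def call_context
    created = Thread.current[@key].nil?  # am I the outermost creator?
    Thread.current[@key] ||= {}          # initialize thread-local context
    yield Thread.current[@key]
  ensure
    Thread.current[@key] = nil if created # only the creator cleans up
  end
end

tracer = BlockScopedTracer.new
tracer.call_context do |ctx|
  ctx[:trace_id] = 42
  tracer.call_context do |inner|
    inner[:trace_id] # nested call reuses the same context, nothing to do
  end
end
# On exit from the outermost block, the thread-local variable is removed.
```

The `ensure` clause guarantees cleanup even if the block raises, which is what makes the lifecycle explicit compared to a bare thread-local assignment.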
I would say the use case for creating multiple tracers + thread-local contexts is pretty minimal. Maybe in a testing env there might be a few hundred created, but I'd say in an application there is probably a max of 2 per thread (the default one + one manually created by the customer?).
Also, as far as shutdown goes, if the thread closes, the thread-local data should be cleaned up as well, so we might not need to try to be fancy here.
All of this hinges on the idea that a minimal number of tracers + thread local contexts exist per-thread.
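The point about thread-local data dying with its thread can be seen directly. A small sketch (plain Ruby, nothing dd-trace-rb specific): a thread-local set inside a worker thread is visible only there, and once the thread finishes, the storage is owned by the dead thread object and becomes collectible with it.

```ruby
# Sketch: thread-local storage belongs to the thread object, so a
# short-lived thread's context needs no explicit cleanup.
results = {}

worker = Thread.new do
  Thread.current[:context] = { spans: [] }     # visible only in this thread
  results[:inside] = Thread.current[:context]  # capture for inspection
end
worker.join

# The main thread never sees the worker's thread-local variable.
results[:outside] = Thread.current[:context]
```

Note that Ruby's `Thread#[]` is technically fiber-local; for code that runs one fiber per thread (the common case here) it behaves as thread-local storage.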
lgtm
this new test coverage is great
@@ -24,7 +24,7 @@
   before do
     Datadog.configure do |c|
       c.use :rails, options
-      c.use :action_cable
+      c.use :action_cable, options
This test started to fail because it was not configuring a test tracer, using the global tracer instead.
The global tracer can have traces unrelated to this test, so the total number of spans in the tracer writer might differ from expected.
In this test there was one span unrelated to the test that we were manually filtering out before, which we no longer have to do.
@@ -12,7 +12,7 @@
   # This is because Rails instrumentation normally defers patching until #after_initialize
   # when it activates and configures each of the Rails components with application details.
   # We aren't initializing a full Rails application here, so the patch doesn't auto-apply.
-  c.use :action_pack
+  c.use :action_pack, configuration_options
This test started to fail because it was not configuring a test tracer, using the global tracer instead.
@@ -20,7 +20,7 @@
   before(:each) do
     Datadog.configure do |c|
       c.use :sinatra, options
-      c.use :active_record
+      c.use :active_record, options
This test started to fail because it was not configuring a test tracer, using the global tracer instead.
The global tracer was missing a few ActiveRecord bootstrap queries (PRAGMA queries). We assumed they were never present, even though these queries do show up in a real application.
@@ -10,43 +10,41 @@ def tracer
   end

   def new_tracer(options = {})
     @tracer ||= begin
We were over-aggressively caching new_tracer in an instance variable, even though the method tracer defined just above takes care of that (def tracer; @tracer ||= new_tracer; end).

I've made this change to allow multiple calls to new_tracer to actually return different instances. This is necessary to test that multiple instances of the tracer each have their own thread-local contexts.

No test currently points directly to new_tracer, making this change easier.
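The helper pattern described here — a memoizing `tracer` on top of a non-caching `new_tracer` — can be sketched as follows. This is an illustrative stand-in, not the actual spec helper; `FakeTracer` and `TracerHelpers` are invented names:

```ruby
# Sketch of the test-helper pattern: memoization lives in `tracer`,
# while `new_tracer` always builds a fresh instance, so tests can
# create several independent tracers.
class FakeTracer; end # stand-in for a real tracer

module TracerHelpers
  def tracer
    @tracer ||= new_tracer # memoization lives here, and only here
  end

  def new_tracer(options = {})
    FakeTracer.new # no caching: each call returns a new instance
  end
end

class SpecHarness
  include TracerHelpers
end
```

With the caching removed from `new_tracer`, `tracer` still returns the same instance on every call, while repeated `new_tracer` calls yield distinct instances.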
Okay, I'm happy with the new tests; they cover some good cases. Thanks @marcotc! 👍
When instantiating multiple Datadog::Tracer instances, we currently override the thread-local Context with the one belonging to the latest instantiated Tracer. This happens because of this line: dd-trace-rb/lib/ddtrace/tracer.rb, line 76 in adc046a.

By creating a new DefaultContextProvider for each tracer, we override the previous Context that was present in Thread.current[:datadog_context].

This PR adds support for multiple concurrent thread-local contexts. This is done by uniquely "namespacing" the thread-local variable for each DefaultContextProvider instance: dd-trace-rb/lib/ddtrace/context_provider.rb, line 35 in adc046a.
This allows the following code to work as expected:
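The code sample that followed did not survive the page extraction. A hedged sketch of the behavior the PR enables, using a simplified stand-in for the real tracer classes (the class and key scheme below are illustrative, not the dd-trace-rb implementation):

```ruby
# Simplified stand-in illustrating the intended behavior: each tracer
# instance keeps its own thread-local context instead of sharing a
# single :datadog_context key.
class MiniTracer
  def initialize
    @key = :"datadog_context_#{object_id}" # namespaced per instance
  end

  def call_context
    Thread.current[@key] ||= {}
  end
end

tracer1 = MiniTracer.new
tracer2 = MiniTracer.new

tracer1.call_context[:span] = 'from tracer1'
tracer2.call_context[:span] = 'from tracer2'

# Before this PR, the second tracer would have overwritten the first
# tracer's context; with namespaced keys each tracer sees its own.
```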
In practice, this only affects us today while testing with custom tracer instances, like the ones created by get_test_tracer: these custom instances compete with the default Tracer.trace for the context thread-local spot, and only one of them can win.

A few tests came out as misconfigured after implementing these changes: they were not properly providing the { tracer: get_test_tracer } option to their integration, but were lucky that get_test_tracer hijacked the thread-local context from Tracer.trace, so the expected spans ended up in the "wrong" tracer (get_test_tracer), which incidentally is the one under test.

An alternative to implementing this is enforcing a singleton Tracer: making it impossible to have two tracers at one time, even for testing, and always retrieving the same instance from a central point.
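The singleton alternative mentioned above is straightforward in Ruby via the standard library. A minimal sketch (hypothetical class name, not a dd-trace-rb proposal in code form):

```ruby
require 'singleton'

# Sketch of the singleton alternative: one process-wide tracer retrieved
# from a single central point; constructing a second instance is impossible.
class GlobalTracer
  include Singleton # makes `new` private; use GlobalTracer.instance

  attr_reader :context

  def initialize
    @context = {} # the one shared context store
  end
end

GlobalTracer.instance.context[:service] = 'my-app'
```

This sidesteps the multiple-context problem entirely, at the cost of making isolated test tracers (like get_test_tracer) impossible, which is exactly what this PR avoids.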