Improved support for multiple tracer instances #919

Merged
merged 5 commits into master from feat/improved-multiple-instance on Jan 18, 2020

Conversation

marcotc
Member

@marcotc marcotc commented Jan 10, 2020

When instantiating multiple Datadog::Tracer instances, we currently override the thread-local Context with the one belonging to the most recently instantiated Tracer. This happens because of this line:

@provider = options.fetch(:context_provider, Datadog::DefaultContextProvider.new)

By creating a new DefaultContextProvider for each tracer, we override the previous Context that was present in Thread.current[:datadog_context].

This PR adds support for multiple concurrent thread-local contexts.
This is done by uniquely "namespacing" the thread-local variable for each DefaultContextProvider instance:

@key = "datadog_context_#{object_id}".to_sym
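
For illustration, here is a minimal, self-contained sketch of that idea (hypothetical class name, not the actual DefaultContextProvider source; assumes ddtrace is loaded):

require 'ddtrace'

# Hypothetical, simplified provider: each instance keeps its context under
# its own thread-local key, so two providers never clobber each other.
class NamespacedContextProvider
  def initialize
    @key = "datadog_context_#{object_id}".to_sym # unique per instance
  end

  # Fetch the context for the current thread, creating one lazily.
  def context
    Thread.current[@key] ||= Datadog::Context.new
  end

  # Override the thread-local context with a new context.
  def context=(ctx)
    Thread.current[@key] = ctx
  end
end

provider1 = NamespacedContextProvider.new
provider2 = NamespacedContextProvider.new
provider1.context.equal?(provider2.context) # => false: separate thread-local slots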

This allows the following code to work as expected:

tracer1 = Datadog::Tracer.new
tracer2 = Datadog::Tracer.new

tracer1.trace('1.1') do
  tracer2.trace('2') do
    tracer1.trace('1.2') {}
  end
end

tracer1.shutdown! # tracer1 flushes: [Span('1.1'), Span('1.2', parent_id: '1.1')]
tracer2.shutdown! # tracer2 flushes: [Span('2')]

In practice, this only affects us today while testing with custom tracer instances, like the ones created by get_test_tracer: these custom instances compete with the default Tracer.trace for the context thread-local spot and only one of them can win.
A few tests turned out to be misconfigured after these changes: they were not passing the {tracer: get_test_tracer} option to their integration, but got lucky because get_test_tracer hijacked the thread-local context from Tracer.trace, so the expected spans ended up in the "wrong" tracer (get_test_tracer), which incidentally is the one under test.
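
For illustration, the fix in those specs looks roughly like this (:some_integration is a placeholder; the real diffs below touch :action_cable, :action_pack and :active_record):

# Before: the integration silently fell back to the global tracer.
Datadog.configure do |c|
  c.use :some_integration
end

# After: the test tracer is passed explicitly, so spans land in the tracer
# the spec actually inspects.
Datadog.configure do |c|
  c.use :some_integration, tracer: get_test_tracer
end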

An alternative to implementing this is enforcing a singleton Tracer: making it impossible to have two tracers at a time, even for testing, and always retrieving the same instance from a central point.

@marcotc marcotc added the core (Involves Datadog core libraries) and feature (Involves a product feature) labels Jan 10, 2020
@marcotc marcotc requested a review from a team January 10, 2020 22:00
@marcotc marcotc self-assigned this Jan 10, 2020
brettlangdon
brettlangdon previously approved these changes Jan 13, 2020
Member

@brettlangdon brettlangdon left a comment

lgtm

@delner
Contributor

delner commented Jan 13, 2020

Cool, I'm glad we got around to implementing this after talking quite a bit about it; nice work!

I think this will actually complement #879 well in the sense that if that PR is about fan-out from the tracer, this is the fan-in from the tracer to the worker. Gives us a lot of flexibility with tracer implementation, which is always a good thing.

self.local = Datadog::Context.new
end

# Override the thread-local context with a new context.
def local=(ctx)
-  Thread.current[:datadog_context] = ctx
+  Thread.current[@key] = ctx
Contributor

@delner delner Jan 13, 2020

Is it possible that on the one thread with one tracer, multiple contexts might be used over time? Such that we end up with multiple contexts that build up in the thread local variables?

I'm trying to see if there's some possible path for contexts to hang around in the thread local memory in a way that would cause a leak. Given we have a number of integrations that reset the context, and others that overwrite it for the purposes of distributed tracing, I'm wondering if either of those could cause a problem like this.

Member Author

@delner ThreadLocalContext instances are currently strongly tied to a Tracer instance: they are created in Tracer#initialize, and all requests for the "current thread context" go through a tracer instance, either directly or indirectly.

For applications configuring ddtrace today nothing will change: only one context will exist per thread, and only if a span was created during that thread's execution.

For applications that call Datadog.configure multiple times, nothing changes, as Tracer#initialize is still only called once, even though Tracer#configure is called many times. Currently, Tracer#configure does not reconfigure the context_provider.

For applications that manually invoke Tracer.new, there will be multiple contexts per thread: at most one Context per thread per tracer instance.

I considered adding a #shutdown! method to ThreadLocalContext (triggered from Tracer#shutdown!), but that would only clean up the current thread's Context. I'm not sure there's a feasible way to clean up another thread's thread-local variable.

I guess this is one of those situations where the drawbacks of thread-local variables come back to bite us 🤔.
Managing the context as an explicit resource is the "ideal" option, albeit a much larger endeavour at this point:

tracer1.call_context do |ctx1| # Initialize thread-local context for tracer 1
  tracer2.call_context do |ctx2| # Initialize thread-local context for tracer 2
    ...
  end # Remove thread-local variable for tracer 2's context

  tracer1.call_context do |ctx1| # Thread-local context for tracer 1 already exists, nothing to do
    ...
  end # I did not create the thread-local context, nothing to do
  ...
end # Remove thread-local variable for tracer 1's context

This would require wrapping all usages of the current context, which is quite a lot of work, but it would make managing the context's lifecycle much clearer.
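
A rough, self-contained sketch of how such a block-scoped accessor could manage the thread-local lifetime (hypothetical API and class name, not something this PR implements):

require 'ddtrace'

# Hypothetical sketch: the outermost call_context on a thread owns the
# thread-local slot and removes it when its block exits, so contexts
# cannot outlive their use.
class ScopedContextProvider
  def initialize
    @key = "datadog_context_#{object_id}".to_sym
  end

  def call_context
    created = Thread.current[@key].nil?
    Thread.current[@key] ||= Datadog::Context.new
    yield Thread.current[@key]
  ensure
    # Only the call that created the slot cleans it up; nested calls on the
    # same provider leave the outer context untouched.
    Thread.current[@key] = nil if created
  end
end

ScopedContextProvider.new.call_context do |ctx|
  # trace spans against ctx here; the slot is removed when the block exits
end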

Member

I would say the use case for creating multiple tracers + thread-local contexts is pretty minimal. Maybe in a testing env there might be a few hundred created, but I'd say in an application there is probably a maximum of 2 per thread (the default one + one manually created by the customer?).

Also, as far as shutdown goes, if the thread closes, the thread-local data should be cleaned up as well, so we might not need to try to be fancy here.

All of this hinges on the idea that a minimal number of tracers + thread-local contexts exist per thread.

Member

@brettlangdon brettlangdon left a comment

lgtm

this new test coverage is great

@@ -24,7 +24,7 @@
before do
Datadog.configure do |c|
c.use :rails, options
-  c.use :action_cable
+  c.use :action_cable, options
Member Author

This test started to fail because it was not configuring a test tracer, using the global tracer instead.
The global tracer can have traces unrelated to this test, so the total number of spans in the tracer writer might differ from expected.

In this test there was one span unrelated to the test that we were manually filtering out before, which we no longer have to do.

@@ -12,7 +12,7 @@
# This is because Rails instrumentation normally defers patching until #after_initialize
# when it activates and configures each of the Rails components with application details.
# We aren't initializing a full Rails application here, so the patch doesn't auto-apply.
-  c.use :action_pack
+  c.use :action_pack, configuration_options
Member Author

This test started to fail because it was not configuring a test tracer, using the global tracer instead.

@@ -20,7 +20,7 @@
before(:each) do
Datadog.configure do |c|
c.use :sinatra, options
-  c.use :active_record
+  c.use :active_record, options
Member Author

This test started to fail because it was not configuring a test tracer, using the global tracer instead.
The global tracer was missing a few ActiveRecord bootstrap queries (PRAGMA queries). We assumed they were never present, even though these queries do show up in a real application.

@@ -10,43 +10,41 @@ def tracer
end

def new_tracer(options = {})
-  @tracer ||= begin
Member Author

We were over-aggressively caching new_tracer's result in an instance variable, even though the tracer method defined just above already takes care of that (def tracer; @tracer ||= new_tracer; end).

I've made this change to allow multiple calls to new_tracer to actually return different instances. This is necessary to test that multiple tracer instances each have their own thread-local context.

No test currently points directly to new_tracer, making this change easier.
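
A simplified sketch of the resulting helpers (condensed and hypothetical; the real spec helper also wires up a test writer, omitted here):

def tracer
  @tracer ||= new_tracer # memoization lives here only
end

def new_tracer(options = {})
  # Returns a brand-new instance on every call, so specs can spin up several
  # tracers and verify that each gets its own thread-local context.
  Datadog::Tracer.new(options)
end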

Contributor

@delner delner left a comment

Okay, I'm happy with the new tests, they cover some good cases. Thanks @marcotc! 👍

@brettlangdon brettlangdon merged commit bd853ed into master Jan 18, 2020
@marcotc marcotc added this to the 0.32.0 milestone Jan 22, 2020
@marcotc marcotc added this to Merged & awaiting release in Active work Jan 22, 2020
@marcotc marcotc moved this from Merged & awaiting release to Released in Active work Jan 22, 2020
@marcotc marcotc deleted the feat/improved-multiple-instance branch January 23, 2020 20:29