Improved support for multiple tracer instances #919

Merged
merged 5 commits into master from feat/improved-multiple-instance on Jan 18, 2020

Conversation

marcotc
Member

@marcotc marcotc commented Jan 10, 2020

When instantiating multiple Datadog::Tracer instances, we currently override the thread-local Context with the one belonging to the most recently instantiated Tracer. This happens because of this line:

@provider = options.fetch(:context_provider, Datadog::DefaultContextProvider.new)

By creating a new DefaultContextProvider for each tracer, we override the previous Context that was present in Thread.current[:datadog_context].

This PR adds support for multiple concurrent thread-local contexts.
This is done by uniquely "namespacing" the thread-local variable for each DefaultContextProvider instance:

@key = "datadog_context_#{object_id}".to_sym
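
For illustration, here is a minimal, self-contained sketch of that idea (hypothetical class name, not the actual DefaultContextProvider source; assumes ddtrace is loaded):

require 'ddtrace'

# Hypothetical, simplified provider: each instance keeps its context under
# its own thread-local key, so two providers never clobber each other.
class NamespacedContextProvider
  def initialize
    @key = "datadog_context_#{object_id}".to_sym # unique per instance
  end

  # Fetch the context for the current thread, creating one lazily.
  def context
    Thread.current[@key] ||= Datadog::Context.new
  end

  # Override the thread-local context with a new context.
  def context=(ctx)
    Thread.current[@key] = ctx
  end
end

provider1 = NamespacedContextProvider.new
provider2 = NamespacedContextProvider.new
provider1.context.equal?(provider2.context) # => false: separate thread-local slots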

This allows the following code to work as expected:

tracer1 = Datadog::Tracer.new
tracer2 = Datadog::Tracer.new

tracer1.trace('1.1') do
  tracer2.trace('2') do
    tracer1.trace('1.2') {}
  end
end

tracer1.shutdown! # tracer1 flushes: [Span('1.1'), Span('1.2', parent_id: '1.1')]
tracer2.shutdown! # tracer2 flushes: [Span('2')]

In practice, this only affects us today while testing with custom tracer instances, like the ones created by get_test_tracer: these custom instances compete with the default Tracer.trace for the context thread-local spot and only one of them can win.
A few tests turned out to be misconfigured after these changes: they were not passing the {tracer: get_test_tracer} option to their integration, but got lucky because get_test_tracer hijacked the thread-local context from Tracer.trace, so the expected spans ended up in the "wrong" tracer (get_test_tracer), which incidentally is the one under test.
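
For illustration, the fix in those specs looks roughly like this (:some_integration is a placeholder; the real diffs below touch :action_cable, :action_pack and :active_record):

# Before: the integration silently fell back to the global tracer.
Datadog.configure do |c|
  c.use :some_integration
end

# After: the test tracer is passed explicitly, so spans land in the tracer
# the spec actually inspects.
Datadog.configure do |c|
  c.use :some_integration, tracer: get_test_tracer
end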

An alternative to implementing this is enforcing a singleton Tracer: making it impossible to have two tracers at a time, even for testing, and always retrieving the same instance from a central point.

@marcotc marcotc added the core (Involves Datadog core libraries) and feature (Involves a product feature) labels Jan 10, 2020
@marcotc marcotc requested a review from a team January 10, 2020 22:00
@marcotc marcotc self-assigned this Jan 10, 2020
brettlangdon
brettlangdon previously approved these changes Jan 13, 2020
Member

@brettlangdon brettlangdon left a comment

lgtm

@delner
Contributor

delner commented Jan 13, 2020

Cool, I'm glad we got around to implementing this after talking quite a bit about it; nice work!

I think this will actually complement #879 well in the sense that if that PR is about fan-out from the tracer, this is the fan-in from the tracer to the worker. Gives us a lot of flexibility with tracer implementation, which is always a good thing.

self.local = Datadog::Context.new
end

# Override the thread-local context with a new context.
def local=(ctx)
-  Thread.current[:datadog_context] = ctx
+  Thread.current[@key] = ctx
Contributor

@delner delner Jan 13, 2020

Is it possible that on the one thread with one tracer, multiple contexts might be used over time? Such that we end up with multiple contexts that build up in the thread local variables?

I'm trying to see if there's some possible path for contexts to hang around in the thread local memory in a way that would cause a leak. Given we have a number of integrations that reset the context, and others that overwrite it for the purposes of distributed tracing, I'm wondering if either of those could cause a problem like this.

Member Author

@delner ThreadLocalContext instances are currently strongly tied to a Tracer instance: they are created in Tracer#initialize, and all requests for the "current thread context" go through a tracer instance, either directly or indirectly.

For applications configuring ddtrace today nothing will change: only one context will exist per thread, and only if a span was created during that thread's execution.

For applications that call Datadog.configure multiple times, nothing changes, as Tracer#initialize is still only called once, even though Tracer#configure is called many times. Currently, Tracer#configure does not reconfigure the context_provider.

For applications that manually invoke Tracer.new, there will be multiple contexts per thread: at most one Context per thread per tracer instance.

I considered adding a #shutdown! method to ThreadLocalContext (triggered from Tracer#shutdown!), but that would only clean up the current thread's Context. I'm not sure there's a feasible way to clean up another thread's thread-local variable.

I guess this is one of those situations where the drawbacks of thread-local variables come back to bite us 🤔.
Managing the context as an explicit resource is the "ideal" option, albeit a much larger endeavour at this point:

tracer1.call_context do |ctx1| # Initialize thread-local context for tracer 1
  tracer2.call_context do |ctx2| # Initialize thread-local context for tracer 2
    ...
  end # Remove thread-local variable for tracer 2's context

  tracer1.call_context do |ctx1| # Thread-local context for tracer 1 already exists, nothing to do
    ...
  end # I did not create the thread-local context, nothing to do
  ...
end # Remove thread-local variable for tracer 1's context

This would require wrapping all usages of the current context, which is quite a lot of work, but it would make managing the context's lifecycle much clearer.
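
A rough, self-contained sketch of how such a block-scoped accessor could manage the thread-local lifetime (hypothetical API and class name, not something this PR implements):

require 'ddtrace'

# Hypothetical sketch: the outermost call_context on a thread owns the
# thread-local slot and removes it when its block exits, so contexts
# cannot outlive their use.
class ScopedContextProvider
  def initialize
    @key = "datadog_context_#{object_id}".to_sym
  end

  def call_context
    created = Thread.current[@key].nil?
    Thread.current[@key] ||= Datadog::Context.new
    yield Thread.current[@key]
  ensure
    # Only the call that created the slot cleans it up; nested calls on the
    # same provider leave the outer context untouched.
    Thread.current[@key] = nil if created
  end
end

ScopedContextProvider.new.call_context do |ctx|
  # trace spans against ctx here; the slot is removed when the block exits
end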

Member

I would say the use case for creating multiple tracers + thread-local contexts is pretty minimal. Maybe in a testing env there might be a few hundred created, but I'd say in an application there is probably a maximum of 2 per thread (the default one + one manually created by the customer?).

Also, as far as shutdown goes, if the thread closes, the thread-local data should be cleaned up as well, so we might not need to try to be fancy here.

All of this hinges on the idea that a minimal number of tracers + thread-local contexts exist per thread.

Member

@brettlangdon brettlangdon left a comment

lgtm

this new test coverage is great

@@ -24,7 +24,7 @@
before do
Datadog.configure do |c|
c.use :rails, options
-  c.use :action_cable
+  c.use :action_cable, options
Member Author

This test started to fail because it was not configuring a test tracer, using the global tracer instead.
The global tracer can have traces unrelated to this test, so the total number of spans in the tracer writer might differ from expected.

In this test there was one span unrelated to the test that we were manually filtering out before, which we no longer have to do.

@@ -12,7 +12,7 @@
# This is because Rails instrumentation normally defers patching until #after_initialize
# when it activates and configures each of the Rails components with application details.
# We aren't initializing a full Rails application here, so the patch doesn't auto-apply.
-  c.use :action_pack
+  c.use :action_pack, configuration_options
Member Author

This test started to fail because it was not configuring a test tracer, using the global tracer instead.

@@ -20,7 +20,7 @@
before(:each) do
Datadog.configure do |c|
c.use :sinatra, options
-  c.use :active_record
+  c.use :active_record, options
Member Author

This test started to fail because it was not configuring a test tracer, using the global tracer instead.
The global tracer was missing a few ActiveRecord bootstrap queries (PRAGMA queries). We assumed they were never present, even though these queries do show up in a real application.

@@ -10,43 +10,41 @@ def tracer
end

def new_tracer(options = {})
-  @tracer ||= begin
Member Author

We were over-aggressively caching new_tracer's result in an instance variable, even though the tracer method defined just above already takes care of that (def tracer; @tracer ||= new_tracer; end).

I've made this change to allow multiple calls to new_tracer to actually return different instances. This is necessary to test that multiple tracer instances each have their own thread-local context.

No test currently points directly to new_tracer, making this change easier.
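
A simplified sketch of the resulting helpers (condensed and hypothetical; the real spec helper also wires up a test writer, omitted here):

def tracer
  @tracer ||= new_tracer # memoization lives here only
end

def new_tracer(options = {})
  # Returns a brand-new instance on every call, so specs can spin up several
  # tracers and verify that each gets its own thread-local context.
  Datadog::Tracer.new(options)
end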

Contributor

@delner delner left a comment

Okay, I'm happy with the new tests, they cover some good cases. Thanks @marcotc! 👍

@brettlangdon brettlangdon merged commit bd853ed into master Jan 18, 2020
@marcotc marcotc added this to the 0.32.0 milestone Jan 22, 2020
@marcotc marcotc added this to Merged & awaiting release in Active work Jan 22, 2020
@marcotc marcotc moved this from Merged & awaiting release to Released in Active work Jan 22, 2020
@marcotc marcotc deleted the feat/improved-multiple-instance branch January 23, 2020 20:29