
Sidekiq client-side tracing #602

Merged: @delner merged 12 commits into DataDog:0.18-dev from sidekiq-client-tracer on Nov 26, 2018
Conversation

@dirk (Contributor) commented on Nov 1, 2018

Sidekiq enqueues are normally fast, but it would be nice to trace those enqueue/push events so we can monitor enqueue latency in case they aren't fast (slow enqueues can often be an early warning sign of Redis issues).

Note that I made a number of guesses here, especially concerning the span name and the omission of the service (i.e. that client-side enqueues should inherit their parent's service).
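
A minimal sketch of the middleware shape this PR proposes (the span name sidekiq.push and the tag name follow the review discussion below; the tracer accessor and exact wiring are illustrative, not the PR's final code):

class ClientTracer
  # Sidekiq client middleware interface:
  # https://github.com/mperham/sidekiq/wiki/Middleware#client-middleware
  def call(worker_class, job, queue, redis_pool)
    tracer.trace('sidekiq.push') do |span|
      span.resource = job['class']             # worker class being enqueued
      span.set_tag('sidekiq.job.queue', queue) # queue the job is pushed onto
      yield                                    # the actual push to Redis
    end
  end
end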

@delner (Contributor) left a comment

Thanks for the contribution! We love to see improvements like this to our instrumentation.

Overall I think this is looking pretty good, just a few points of feedback. I think the span name seems right, but it's important that it's assigned a service name.

For that, we'll need to consider a couple of scenarios, which we'll have to factor into the behavior we want to see:

  1. This sidekiq.push span is the top-level span. This could realistically happen if, say, a Ruby script or Rake task is written which only queues jobs. It will need a default service name in such a case. Perhaps we adopt the default Sidekiq service name, or perhaps something different.
  2. The sidekiq.push span is nested under another span/service (the most common case, I think). It might make sense to make it belong to the parent span's service (we do something like this with ActiveRecord), but is that the best place to categorize it? I tend to agree that might be the best solution here, but I want to consider this a bit more carefully. If we do want to go in this direction, you'll probably need to use span.parent.service and do some kind of check if there is no parent (sketched below).
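
A rough sketch of the check scenario 2 implies (the method name and the default value here are illustrative, not settled API):

def sidekiq_service(span)
  if span.parent && span.parent.service
    # Scenario 2: nested under another span; inherit its service.
    span.parent.service
  else
    # Scenario 1: top-level span (e.g. a Rake task that only enqueues
    # jobs); fall back to some default service name.
    'sidekiq'
  end
end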

lib/ddtrace/contrib/sidekiq/client_tracer.rb (review thread; outdated, resolved)
require 'ddtrace/contrib/sidekiq/client_tracer'

::Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add(Sidekiq::ClientTracer)
  end
end
A Contributor commented: 👍

lib/ddtrace/contrib/sidekiq/client_tracer.rb (review thread; outdated, resolved)
@delner added labels on Nov 2, 2018: integrations (Involves tracing integrations), community (Was opened by a community member), feature (Involves a product feature)
@dirk (Contributor, Author) commented on Nov 6, 2018

> For that, we'll need to consider a couple of scenarios, which we'll have to factor into the behavior we want to see:

@delner: Thanks for your kind and thoughtful review! 💖 This question about the service attribute is something I was struggling to figure out. After reading your comments and reflecting further, I think it makes the most sense for the service to be the Sidekiq service, since what this measures is mostly a) time spent talking to the Redis server backing Sidekiq and sometimes b) JSON serialization, if you have a really big job payload. Having that time-on-the-wire attributed to the Sidekiq service (instead of the parent span's service) matches how this library currently attributes ActiveRecord SQL execution. How does all that sound?

Since there was now so much common functionality between the two tracers, I abstracted it out into a BaseTracer class. Please let me know if there's a different design pattern y'all would prefer!

module Datadog
  module Contrib
    module Sidekiq
      class BaseTracer
Review comment from a Contributor:
Could we turn this into a module instead of a class? (For composition vs inheritance?) We could probably name this something like module Tracing so the classes instead read:

class ClientTracer
  include Tracing
end

class ServerTracer
  include Tracing
end
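
A hedged sketch of what such a shared module might hold; the helper shown is the job_resource method referenced later in the diff, reconstructed from context (the 'wrapped' key handles ActiveJob-style wrapper payloads):

module Datadog
  module Contrib
    module Sidekiq
      # Shared helpers mixed into ClientTracer and ServerTracer.
      module Tracing
        private

        # Derive the span resource from the job payload.
        def job_resource(job)
          job['wrapped'] || job['class']
        end
      end
    end
  end
end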

@delner (Contributor) commented on Nov 6, 2018

Thanks for your hard work on this, @dirk! This is definitely shaping up. I really like that you extracted the common code for re-usability; that will help us maintain the code more consistently.

I think the implementation strategy for service as you suggested is reasonable. I want to check on one more thing on my end before we're 100% committed to that, but I think in all likelihood that's how we'll do it. And if for some reason that isn't what we want, we can always change it later.

In the meantime, if we can get the build fixed up and this rebased on 0.18-dev, then I think we can target this for an 0.18.0 release in the next few weeks.

@delner added this to the 0.18.0 milestone on Nov 6, 2018
@dirk force-pushed the sidekiq-client-tracer branch 2 times, most recently from 0deff48 to 82b75ab on November 7, 2018 01:43
@dirk changed the base branch from master to 0.18-dev on November 7, 2018 01:44
@dirk (Contributor, Author) commented on Nov 7, 2018

@delner: Comments addressed and branch rebased (and base changed on GitHub)! I also noticed that CI is failing for this PR and other recently-updated PRs because a new version of concurrent-ruby came out, so I opened PR #607 to (hopefully) fix that. 🙃

@delner (Contributor) commented on Nov 7, 2018

Thanks for updating this so quickly! That build issue should be fixed with 1.1.3 of concurrent-ruby, so if you change or add a commit SHA with a rebase/force-push, the CI build should invalidate the cache and (hopefully) pass now.

@dirk (Contributor, Author) commented on Nov 9, 2018

@delner: CI's happy! 🙃

# https://github.com/mperham/sidekiq/wiki/Middleware#client-middleware
def call(worker_class, job, queue, redis_pool)
  resource = job_resource(job)
  service = sidekiq_service(resource)
Review comment from a Contributor:

Just following up on what the service names should be: we should 1) avoid having the service match the resource, and 2) avoid the server and client service names matching each other.

For number 1, the service name should act as a group name for a logical deployment of Sidekiq (e.g. an application), as users will want to see application performance statistics aggregated by role first, then broken down by task (which is where the resource comes into play). Users should be able to configure the client and server service names separately from one another, so to support that, we might need to add another configuration option like client_service_name or something similar.

For number 2, the client shouldn't match the server, because otherwise we'd see some conflation of statistics between the two. We should probably keep the default service name for the server side as sidekiq (so as not to break users' dependency on this existing default), and give the client service a new default, perhaps something like sidekiq-client or similar.
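
In configuration terms, the suggestion amounts to something like this (client_service_name is the option floated above; treat the exact shape as a sketch, not the PR's final API):

Datadog.configure do |c|
  c.use :sidekiq,
        service_name: 'sidekiq',               # server-side (job execution) spans
        client_service_name: 'sidekiq-client'  # client-side (enqueue/push) spans
end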

@delner commented on Nov 12, 2018:

Oh, okay, I may have misread the implementation here a bit. I see you're not copying the resource in for the service name, but mapping it to some other service name that might exist on the class in question. This starts to cross into what we've referred to as "multiplexing": mapping tracer configuration based on trace content.

Is this important to implement in this PR (the resource --> service name mapping on the job)? I might suggest keeping it simple with straight configuration variables from Datadog.configuration[:sidekiq] for now, since this is a bit more complicated, and we have a new prescribed first-class means of doing such mappings as a part of the integration's configuration (such as what was done in #451).

If we want to simplify this part a bit, we can probably get this PR merged and open another one for the multiplexing feature. Might be the easiest way to get parts of this out more quickly.
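
A sketch of the simpler version suggested here, reading straight integration configuration inside the middleware (client_service_name is assumed from the discussion above):

# Take the service from static integration configuration rather than
# mapping it from trace content (no multiplexing).
service = Datadog.configuration[:sidekiq][:client_service_name]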

@dirk (Contributor, Author) replied:

> [...] This starts to cross into what we've referred to as "multiplexing": mapping tracer configuration based on trace content.

@delner: Thanks again for taking the time to share all this background context; I realize I was following existing patterns without understanding they were something y'all were refactoring.

> If we want to simplify this part a bit, we can probably get this PR merged and open another one for the multiplexing feature. Might be the easiest way to get parts of this out more quickly.

I think I achieved this in 1cdebfd. It sets up a separate configuration option for the client service name (defaulting to "sidekiq-client") and keeps the ability to define a .datadog_tracer_config worker class method for setting the service when on a Sidekiq server (so that we don't regress on that feature).
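
For reference, the preserved per-worker override looks roughly like this (the worker and service names are made up for illustration):

class HardJob
  include Sidekiq::Worker

  # Existing hook: lets a worker override the traced service name
  # when the job executes on a Sidekiq server.
  def self.datadog_tracer_config
    { service_name: 'custom-job-service' }
  end

  def perform(*args); end
end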

@dirk (Contributor, Author) commented on Nov 15, 2018

@delner: Thanks for taking the time to share so much context! I've read over your comments once, and I think I need to give them a couple more read-overs to make sure I understand it all. I plan to get back to this PR and address those comments tomorrow or this weekend. 🙃

@dirk (Contributor, Author) commented on Nov 17, 2018

@delner: Comments addressed, rebased on 0.18-dev, and CI is happy! 🙂

@delner (Contributor) left a comment

This is looking very good! I'm very happy with the changes and the new functionality we're adding here.

I left a few comments that mostly concern old implementation that was refactored here, but nothing that should be a blocker. Any comments or thoughts on those are welcome; otherwise I think this one is ready.

Thank you so much for your hard work and contribution! 🎉


private

def service_from_worker_config(resource)
Review comment from a Contributor:

Note to self (and out of scope for this PR): we should remove this. We don't want jobs defining tracer functions on themselves; this should be handled in our integration configuration (probably by leveraging the new multiplexing feature.)

As this is apparently existing behavior from the previous file (before you refactored this), this isn't a blocker. We'll do this in a future PR.

def service_from_worker_config(resource)
  # Try to get the Ruby class from the resource name.
  worker_klass = begin
    Object.const_get(resource)
Review comment from a Contributor:

This is also old code but... can you think of any scenarios where the resource (i.e. class name) would not exist in the namespace?

If failing to resolve a constant and raising an error is a very common case, then this could incur a significant performance cost. If that is true, we'd want to fix this (now or at some point in the future). Otherwise, if this is just an edge case and a safety of sorts, then I don't think we need to worry about it.

This isn't a blocker for this PR most likely, since it's existing behavior.
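
If the raise did turn out to be hot, one hypothetical way to sidestep it (not the PR's code) would be to probe for the constant before resolving it:

# Avoids paying for a raised/rescued NameError on every job whose
# resource name doesn't correspond to a defined constant.
worker_klass = Object.const_get(resource) if Object.const_defined?(resource)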

@dirk (Contributor, Author) replied:

@delner: This is adapted from the original tracer implementation, but given your comment above, this seems like as good a time as any to remove support for workers defining a .datadog_tracer_config method (and therefore drop the need for this Object.const_get). What do you think?

@delner (Contributor) replied:

Yeah, I think we'll want to remove this too, but to do so, we'll have to deprecate the feature first and remove it in a future release. I've added items to our backlog to do each of these things in future PRs.

@delner merged commit 3ce6f3a into DataDog:0.18-dev on Nov 26, 2018