Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto instrument Resque workers by default #1400

Merged
merged 7 commits into from
Apr 1, 2021
Merged

Auto instrument Resque workers by default #1400

merged 7 commits into from
Apr 1, 2021

Conversation

marcotc
Copy link
Member

@marcotc marcotc commented Mar 9, 2021

This PR adds dynamic auto instrumentation for Resque workers.

Currently users have to list all worker classes that need to be instrumented at configuration time. This is quite more work than required in most cases, given we can automatically instrument all jobs by patching Resque, and it can be challenging to have all worker classes loaded at application configuration time.

Users currently setting an explicit value to the workers: option will not be affected.
Users not setting any value to workers: (or setting it to nil) will now have auto instrumentation of all available workers enabled.

The workers: option itself does not follow our tracer philosophy: if we are able to automatically instrument, we should. With this in mind, this option has been marked for deprecation. We recommend removing it a letting ddtrace perform automatic instrumentation.

@marcotc marcotc added integrations Involves tracing integrations feature Involves a product feature labels Mar 9, 2021
@marcotc marcotc self-assigned this Mar 9, 2021
@marcotc marcotc requested a review from a team March 9, 2021 20:10
ivoanjo
ivoanjo previously approved these changes Mar 11, 2021
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few suggestions, none of which are blocking. LGTM 👍

Datadog.configure do |c|
c.use :resque, options
c.use :resque, auto_instrument: true, **options
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to **options seems like it would make sense for all other integrations described in this doc. Can I convince you to open a separate PR to convert all of them? ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Comment on lines +1581 to 1582
require 'resque'
require 'ddtrace'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed we suggest adding the require for most integrations, but I'm curious if we could do the require on our side instead -- if the customer explicitly states c.use :resque, would it make sense for us to auto-require?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upside: clients don't have to write require themselves.
Downside: it might not be transparent what's being required in their application: "Did ddtrace require 'active_support' or 'active_support/all'?".
Also, they might not find any require statements by "grepping" their code anymore for a library they use: "Where is Redis required? I can't find it anywhere...".

By the power vested in me, I invoke @delner to chime in here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this comments blocks this PR, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, not blocking at all!

Comment on lines 26 to 29

# TODO: 1.0: When moving to auto patching all workers by default
# we should remove this setting, as it will be a no-op.
option :workers, default: []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the plan is to remove this, I suggest adding a warning now -- that way users get a bit more time to move and hopefully have updated their config when in the future we make this a no-op.

Comment on lines 32 to 35
# TODO: 1.0: Automatic patching should be the default behavior.
# We should not provide this option anymore when making it the default,
# as our integrations should always instrument all possible scenarios when feasible.
option :auto_instrument, default: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mad scientist in me wonders, would it make sense for us to make these 1.0 intentions into code? E.g. something like option :auto_instrument, default: false, "default_for_v1": true and then allowing customers to do something like Datadog.configure { |c| c.use_v1_defaults = true }?

That way customers onboarding today could already use the recommended future configuration, rather than needing to start with the legacy configuration and then in future update it again for 1.0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's too smart for my brain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this, and the main issue with use_v1_defaults is that what use_v1_defaults means today will likely change tomorrow, when we decide to schedule the next breaking change for 1.0.

For example, if we had introduced use_v1_defaults in the previous release, this release would change how Redis workers are configured, effectively being a breaking change for use_v1_defaults users.
It would effectively mean that use_v1_defaults == "every release can be a breaking release".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR though, I think maybe the best thing to do is, for users that have c.use resque, but no workers setting, to effectively become auto_instrument.
Maybe change the semantics: remove auto_instrument option, and allow empty workers to be "auto-instrument".

What do you think? There's no right or wrong here: users with an explicit worker: nil will likely see instrumentation that they didn't expect, but a nil value is not even supported here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would effectively mean that use_v1_defaults == "every release can be a breaking release".

Yes, but you'd be signing up for those semantics for use_v1_defaults (e.g. that the defaults may change, as they adopt the v1 conventions), so I don't see that as a big issue. (I may be underestimating how often we run into issues, but my expectation is that if we consider something as the new default, that we're reasonably confident that it won't break)

What do you think? There's no right or wrong here: users with an explicit worker: nil will likely see instrumentation that they didn't expect, but a nil value is not even supported here.

Seems pretty reasonable, and it does the right thing for new customers, I like it!

Comment on lines 1595 to 1596
| `auto_instrument` | Instrument all Resque jobs (recommended). | `false` |
| `workers` | An array including all worker classes you want to trace (e.g. `[MyJob]`). Use `auto_instrument` instead if you'd like to instrument all jobs. | `[]` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we document here as well the 1.0 intentions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not the default with latest changes, but still maintains backwards compatibility for users setting it explicitly.

Comment on lines 10 to 21
# Automatically configures jobs with {ResqueJob} plugin.
module Job
def perform
if Datadog.configuration[:resque][:auto_instrument]
job = payload_class
job.extend(Datadog::Contrib::Resque::ResqueJob) unless job.is_a? Datadog::Contrib::Resque::ResqueJob
end
ensure
super
end
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if we could use Resque's hooks to do this instead, but indeed there seems to be no available option for "global hooks" that apply to every job.

Would it make sense to ask upstream for it? They seem to be interested in providing this kind of functionality, so they may be open to our request.

Of course we'd still need to keep this to support older Resque versions, but at least it'd be harder to break with newer Resque versions/other gems trying to instrument it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hooks, as they have today, don't exist at a global level, only on a per-worker basis, thus why we register hooks on each worker we encounter.
I think that suggesting global hooks make sense, I'll look into that.
But, like you said, we still need to support existing versions of Resque.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's definitely a case of "let's lay the groundwork for a better approach, so that we can reap those rewards in a few years" :)

super
end
end

# Uses Resque job hooks to create traces
module ResqueJob
def around_perform(*args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we perhaps rename this around_perform_datadog_tracing, to avoid collisions? The hook documentation mentions we can (and should) use prefixes for naming the hooks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines 14 to 19
let(:worker) { Resque::Worker.new(queue_name) }
let(:job_class) do
stub_const('TestJob', Class.new).tap do |mod|
mod.send(:extend, Datadog::Contrib::Resque::ResqueJob)
mod.send(:define_singleton_method, :perform) do |*args|
# Do nothing by default.
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge this file back into the instrumentation_spec? I was scratching my head for a few minutes trying to see where the job_class came from in that spec, and it comes from this separate file that's only used once -- by the other spec 😭

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@marcotc marcotc changed the title Option to auto instrument Resque Auto instrument Resque workers by default Mar 31, 2021
@marcotc marcotc requested review from ivoanjo and delner March 31, 2021 22:32
ivoanjo
ivoanjo previously approved these changes Apr 1, 2021
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Comment on lines +34 to +40
o.on_set do |value|
unless value.nil?
Datadog.logger.warn(
"DEPRECATED: Resque integration now instruments all workers. \n" \
'The `workers:` option is unnecessary and will be removed in the future.'
)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Should we also warn when value.empty? E.g. if you're enabling the integration, but then not instrumenting any workers, that seems like a configuration mistake.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the end of day, the message is the same for folks with [] or [MyWorker]: don't configure it explicitly.

I think that the specific case for users with [] doesn't warrant it's own case, given that the deprecation warning gives them the correct instructions going forward.

On a side note, validation options is something that we should address regardless in the tracer as a whole.

#
# We could also just use `around_perform` but this might override the user's
# own method.
def around_perform0ddtrace(*args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Maybe

Suggested change
def around_perform0ddtrace(*args)
def around_perform0_ddtrace(*args)

just to make it a little more readable?

Comment on lines +23 to +24
workers = Datadog.configuration[:resque][:workers] || []
workers.each { |worker| worker.extend(ResqueJob) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I was thinking we could instead tweak Datadog::Contrib::Resque::Job to instrument workers in the list "just-in-time", similar to how we do it when workers is nil. This way, we would have almost the same behavior between both options, instead of the current setup where the extend happens at different times based on the configuration. (And we could remove these two lines entirely)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to touch these lines, because this behaviour (explicit configuration) is going to be deprecate anyway. It's a low risk change, but given the low value, I thought leaving it as it, and simply removing in the future was the best thing to to day.

@marcotc marcotc merged commit 51789cb into master Apr 1, 2021
@marcotc marcotc deleted the resque-auto branch April 1, 2021 18:15
@github-actions github-actions bot added this to the 0.48.0 milestone Apr 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Involves a product feature integrations Involves tracing integrations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants