Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[contrib] Separate resolver configuration from resolution #1319

Merged
merged 3 commits into from
Mar 9, 2021

Conversation

marcotc
Copy link
Member

@marcotc marcotc commented Jan 21, 2021

Motivation

This PR separates the process of configuring an integration describes field:

c.use :active_record, describes: { adapter: 'mysql2', database:  'reports'}

From the actual matching of that describes matcher with the real production values:

Datadog.configuration[:active_record, db_hash_config]

Currently, both operations are performed by the method Datadog::Contrib::Configuration::Resolver#resolve(key).

These operations are intrinsically different: the former stores a matching object for future matching, while the later perform any necessary matching.

The result of #resolve is then used as the key to a Hash object holding all configurations for an integration.

This works well when the configuration describes object can be directly translated into a single object that is equal to the production value eventually passed to #resolve for later matching.

However, when the configuration logic has to differ from the production value matching logic, this interface does not work anymore. This happens when the describes object supports complex matching, for example, partial matching using regular expressions or procs. Also, when the production value needs to be processed into a canonical form.

Our existing matchers canonicalise all describes matchers and production values into the same format. This works fine for their current scope, where all configuration objects match 1:1 to production values, but prevents us from creating more user-friendly matcher, like partial matchers.

Implementation

This PR breaks down #resolve into two methods: #add, which configures a new matcher, and #resolve which performs the matching of previously configured matchers with a production value.

With this change, all logic around the fallback :default configuration was better suited to reside in the parent object of the integration resolvers: Datadog::Contrib::Configurable. Resolvers now return nil if they can't find a configuration: Datadog::Contrib::Configurable takes care of falling back to :default.

To support all needed matching logic, resolvers are now responsible for storing all configured describes matchers for an integration. This is necessary because some matchers might want to iterate over configured matchers to find the appropriate one, instead of the previous logic of a simple Hash key lookup matching.

With the refactor of existing resolver implementations, the ActiveRecord resolver now supports partial matching: by providing only a subset of the database configuration fields, it's possible to match a connection (e.g. describes: { database: 'reports' } will successfully match any connection with database name reports, regardless of adapter, host, port or username). When multiple matchers would match, the latest #added one will be matched. This behaviour is explained in our documentation.

Breaking change

Please use this section the release notes. Links in this snippet purposely point to a future, nonexistent release at this point.

Datadog::Contrib::Configuration::Resolver interface change

The interface of Datadog::Contrib::Configuration::Resolver has changed: custom configuration resolvers that inherit from Datadog::Contrib::Configuration::Resolver will need be changed to fulfill the new interface. See inline documentation for Datadog::Contrib::Configuration::Resolver for specific requirements: https://github.com/DataDog/dd-trace-rb/blob/v0.45.0/lib/ddtrace/contrib/configuration/resolver.rb

ActiveRecord describes supports partial matching

This PR also broadens ActiveSupport describes configuration rules: partial matching of connection fields (adapter, username, host, port, database) now allowed. Previously, only an exact match of ActiveRecord connections was allowed.

If you have any c.use active_record, describe: statements in your application that are currently not matching any connection, you might start seeing them match after this release.

c.use active_record, describe: statements that are currently matching a connection will continue to match it.

You can refer to the expanded ActiveSupport documentation for details on how to use the new partial matchers and code examples: https://github.com/DataDog/dd-trace-rb/blob/v0.45.0/docs/GettingStarted.md#active-record

@marcotc marcotc added integrations Involves tracing integrations breaking-change Involves a breaking change labels Jan 21, 2021
@marcotc marcotc requested a review from a team January 21, 2021 21:08
@marcotc marcotc self-assigned this Jan 21, 2021
Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I like what this does. When resolvers were first implemented, they were meant to be simplistic because they didn't have need for complex use cases like partial matching. This seems like a natural expansion to that.

On this:

This is necessary because some matchers might want to iterate over configured matchers to find the appropriate one, instead of the previous logic of a simple Hash key lookup matching.

Originally it was made with a simple hash key in part because it would be fast to lookup, particularly because there could be multiple accesses in time-sensitive blocks (like trace)... minimizing overhead (especially evaluation of a list of regexes/functions) is of a concern.

What's the performance impact on average cases? e.g. Rails instrumentation? This might be hard to measure directly, but I'd like to know how much faster/slower access is in a general sense with this resolution strategy. If we had some kind of modeled benchmark that would be sufficient.

@marcotc
Copy link
Member Author

marcotc commented Mar 9, 2021

@delner I've added a couple benchmarks in the latest commit. Here are the results:

ActiveRecord

Calling Gadget.count with the following configuration:

c.use :active_record, describes: :gadget do |gadget_db|
  gadget_db.service_name = 'gadget-db'
end
BEFORE
Warming up --------------------------------------
               count    86.000  i/100ms
Calculating -------------------------------------
               count    747.401  (± 7.6%) i/s -     11.180k in  15.059467s

AFTER
Warming up --------------------------------------
               count    84.000  i/100ms
Calculating -------------------------------------
               count    750.801  (± 8.1%) i/s -     11.256k in  15.092532s

My interpretation is that the results here fall within the margin of error. I wouldn't read too much into the newer version been slightly faster, given the variance.

I've ran this quite a few times and the results are consistently in the range of 750 i/s.

PatternResolver

This is the resolver used by our HTTP clients:

BEFORE
Warming up --------------------------------------
resolver.resolve(str)
                        49.060k i/100ms
Calculating -------------------------------------
resolver.resolve(str)
                        496.762k (± 3.4%) i/s -      7.457M in  15.032359s

AFTER
Warming up --------------------------------------
resolver.resolve(str)
                        67.633k i/100ms
Calculating -------------------------------------
resolver.resolve(str)
                        677.540k (± 1.7%) i/s -     10.213M in  15.077617s

This version is actually measurably faster, as the logic in https://github.com/DataDog/dd-trace-rb/pull/1319/files#diff-cabd766bde8c4f7486ea81031a768e22d808bac0d30bd58c58341fa355778313L15-L20 was simplified due to the fact that we don't have to defensively check for (pattern == name) # Only required during configuration time every time: that logic was moved to the #add method, instead of #resolve.

This is actually one of the code smells that we aimed to address with this PR.

@marcotc marcotc requested a review from delner March 9, 2021 21:59
Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome; glad to hear this improves performance! 👍

@marcotc marcotc merged commit 8e0cceb into master Mar 9, 2021
@marcotc marcotc deleted the contrib-settings-resolver branch March 9, 2021 23:09
@github-actions github-actions bot added this to the 0.47.0 milestone Mar 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Involves a breaking change integrations Involves tracing integrations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants