[contrib] Separate resolver configuration from resolution #1319

marcotc · 2021-01-21T21:08:50Z

Motivation

This PR separates the process of configuring an integration describes field:

c.use :active_record, describes: { adapter: 'mysql2', database:  'reports'}

From the actual matching of that describes matcher with the real production values:

Datadog.configuration[:active_record, db_hash_config]

Currently, both operations are performed by the method Datadog::Contrib::Configuration::Resolver#resolve(key).

These operations are intrinsically different: the former stores a matching object for future matching, while the later perform any necessary matching.

The result of #resolve is then used as the key to a Hash object holding all configurations for an integration.

This works well when the configuration describes object can be directly translated into a single object that is equal to the production value eventually passed to #resolve for later matching.

However, when the configuration logic has to differ from the production value matching logic, this interface does not work anymore. This happens when the describes object supports complex matching, for example, partial matching using regular expressions or procs. Also, when the production value needs to be processed into a canonical form.

Our existing matchers canonicalise all describes matchers and production values into the same format. This works fine for their current scope, where all configuration objects match 1:1 to production values, but prevents us from creating more user-friendly matcher, like partial matchers.

Implementation

This PR breaks down #resolve into two methods: #add, which configures a new matcher, and #resolve which performs the matching of previously configured matchers with a production value.

With this change, all logic around the fallback :default configuration was better suited to reside in the parent object of the integration resolvers: Datadog::Contrib::Configurable. Resolvers now return nil if they can't find a configuration: Datadog::Contrib::Configurable takes care of falling back to :default.

To support all needed matching logic, resolvers are now responsible for storing all configured describes matchers for an integration. This is necessary because some matchers might want to iterate over configured matchers to find the appropriate one, instead of the previous logic of a simple Hash key lookup matching.

With the refactor of existing resolver implementations, the ActiveRecord resolver now supports partial matching: by providing only a subset of the database configuration fields, it's possible to match a connection (e.g. describes: { database: 'reports' } will successfully match any connection with database name reports, regardless of adapter, host, port or username). When multiple matchers would match, the latest #added one will be matched. This behaviour is explained in our documentation.

Breaking change

Please use this section the release notes. Links in this snippet purposely point to a future, nonexistent release at this point.

`Datadog::Contrib::Configuration::Resolver` interface change

The interface of Datadog::Contrib::Configuration::Resolver has changed: custom configuration resolvers that inherit from Datadog::Contrib::Configuration::Resolver will need be changed to fulfill the new interface. See inline documentation for Datadog::Contrib::Configuration::Resolver for specific requirements: https://github.com/DataDog/dd-trace-rb/blob/v0.45.0/lib/ddtrace/contrib/configuration/resolver.rb

ActiveRecord `describes` supports partial matching

This PR also broadens ActiveSupport describes configuration rules: partial matching of connection fields (adapter, username, host, port, database) now allowed. Previously, only an exact match of ActiveRecord connections was allowed.

If you have any c.use active_record, describe: statements in your application that are currently not matching any connection, you might start seeing them match after this release.

c.use active_record, describe: statements that are currently matching a connection will continue to match it.

You can refer to the expanded ActiveSupport documentation for details on how to use the new partial matchers and code examples: https://github.com/DataDog/dd-trace-rb/blob/v0.45.0/docs/GettingStarted.md#active-record

delner

Overall, I like what this does. When resolvers were first implemented, they were meant to be simplistic because they didn't have need for complex use cases like partial matching. This seems like a natural expansion to that.

On this:

This is necessary because some matchers might want to iterate over configured matchers to find the appropriate one, instead of the previous logic of a simple Hash key lookup matching.

Originally it was made with a simple hash key in part because it would be fast to lookup, particularly because there could be multiple accesses in time-sensitive blocks (like trace)... minimizing overhead (especially evaluation of a list of regexes/functions) is of a concern.

What's the performance impact on average cases? e.g. Rails instrumentation? This might be hard to measure directly, but I'd like to know how much faster/slower access is in a general sense with this resolution strategy. If we had some kind of modeled benchmark that would be sufficient.

marcotc · 2021-03-09T21:59:52Z

@delner I've added a couple benchmarks in the latest commit. Here are the results:

ActiveRecord

Calling Gadget.count with the following configuration:

c.use :active_record, describes: :gadget do |gadget_db|
  gadget_db.service_name = 'gadget-db'
end

BEFORE
Warming up --------------------------------------
               count    86.000  i/100ms
Calculating -------------------------------------
               count    747.401  (± 7.6%) i/s -     11.180k in  15.059467s

AFTER
Warming up --------------------------------------
               count    84.000  i/100ms
Calculating -------------------------------------
               count    750.801  (± 8.1%) i/s -     11.256k in  15.092532s

My interpretation is that the results here fall within the margin of error. I wouldn't read too much into the newer version been slightly faster, given the variance.

I've ran this quite a few times and the results are consistently in the range of 750 i/s.

PatternResolver

This is the resolver used by our HTTP clients:

BEFORE
Warming up --------------------------------------
resolver.resolve(str)
                        49.060k i/100ms
Calculating -------------------------------------
resolver.resolve(str)
                        496.762k (± 3.4%) i/s -      7.457M in  15.032359s

AFTER
Warming up --------------------------------------
resolver.resolve(str)
                        67.633k i/100ms
Calculating -------------------------------------
resolver.resolve(str)
                        677.540k (± 1.7%) i/s -     10.213M in  15.077617s

This version is actually measurably faster, as the logic in https://github.com/DataDog/dd-trace-rb/pull/1319/files#diff-cabd766bde8c4f7486ea81031a768e22d808bac0d30bd58c58341fa355778313L15-L20 was simplified due to the fact that we don't have to defensively check for (pattern == name) # Only required during configuration time every time: that logic was moved to the #add method, instead of #resolve.

This is actually one of the code smells that we aimed to address with this PR.

delner

Awesome; glad to hear this improves performance! 👍

[contrib] Separate resolver configuration from resolution

0c80add

marcotc added integrations Involves tracing integrations breaking-change Involves a breaking change labels Jan 21, 2021

marcotc requested a review from a team January 21, 2021 21:08

marcotc self-assigned this Jan 21, 2021

Merge branch 'master' into contrib-settings-resolver

7e0449a

delner reviewed Mar 8, 2021

View reviewed changes

Add benchmarks

b541055

marcotc requested a review from delner March 9, 2021 21:59

delner approved these changes Mar 9, 2021

View reviewed changes

marcotc merged commit 8e0cceb into master Mar 9, 2021

marcotc deleted the contrib-settings-resolver branch March 9, 2021 23:09

github-actions bot added this to the 0.47.0 milestone Mar 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[contrib] Separate resolver configuration from resolution #1319

[contrib] Separate resolver configuration from resolution #1319

marcotc commented Jan 21, 2021 •

edited

delner left a comment

marcotc commented Mar 9, 2021 •

edited

delner left a comment

[contrib] Separate resolver configuration from resolution #1319

[contrib] Separate resolver configuration from resolution #1319

Conversation

marcotc commented Jan 21, 2021 • edited

Motivation

Implementation

Breaking change

Datadog::Contrib::Configuration::Resolver interface change

ActiveRecord describes supports partial matching

delner left a comment

Choose a reason for hiding this comment

marcotc commented Mar 9, 2021 • edited

ActiveRecord

PatternResolver

delner left a comment

Choose a reason for hiding this comment

marcotc commented Jan 21, 2021 •

edited

`Datadog::Contrib::Configuration::Resolver` interface change

ActiveRecord `describes` supports partial matching

marcotc commented Mar 9, 2021 •

edited