Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add query string automatic redaction #2283

Merged
merged 14 commits into from
Sep 23, 2022

Conversation

lloeki
Copy link
Contributor

@lloeki lloeki commented Sep 19, 2022

What does this PR do?

Redacts query string sensitive data using a regex-based heuristic.

Motivation

Smartly prevent information leakage when query quantization is set to all

Additional Notes

Needed by ASM to allow quantize.query.show to safely change its default to :all.

The proposed code attempts to implement the query string obfuscator specification as accurately as possible.

How to test the change?

Specs updated with as much corner cases as I could think of.

Manually, try something like this:

curl -vvv 'http://127.0.0.1:9292/hi?password=bar&token=42&json=%7B%22sign%22%3A%22foo%22%7D'

@lloeki lloeki requested a review from a team September 19, 2022 16:24
@lloeki lloeki force-pushed the add-query-string-automatic-redaction branch from cd4f348 to cd9711d Compare September 20, 2022 08:45
| `quantize.fragment` | Defines behavior for URL fragments. Removes fragments by default. May be `:show` to show URL fragments. Option must be nested inside the `quantize` option. | `nil` |
| `request_queuing` | Track HTTP request time spent in the queue of the frontend server. See [HTTP request queuing](#http-request-queuing) for setup details. Set to `true` to enable. | `false` |
| `web_service_name` | Service name for frontend server request queuing spans. (e.g. `'nginx'`) | `'web-server'` |

Deprecation notice:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should be notified in the changelog as well.

Should we also have them shown at install time via the gemspec post install callback?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first step is to go through the GettingStarted.md and replacing all places where we want quantize: { base: :show }, { query: { show: :all }, obfuscate: :internal }} (Is this the desired behaviour? Please let me know) to be the new default and recommend people to use that configuration.
New users installing ddtrace will see those instructions and starting using 2.0 defaults today.

Regarding how to inform users, it seems to me that we need an official way to track future breaking changes, both to inform clients in advance and also for ourselves to not forget about them in the major version migration guide.

I think the way you capture it here is a good start, under a Deprecation notice: section.
I'm not 100% on the best way to warn users, but definitely a changelog entry for the next 1.x release should be present.

Copy link
Contributor Author

@lloeki lloeki Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first step is to go through the GettingStarted.md and replacing all places where we want

Very good point!

quantize: { base: :show }, { query: { show: :all }, obfuscate: :internal }} (Is this the desired behaviour? Please let me know)

Definitely, although query: { obfuscate: :internal } defaults query: { show: to :all.

Regarding how to inform users, it seems to me that we need an official way to track future breaking changes, both to inform clients in advance and also for ourselves to not forget about them in the major version migration guide.

Yeah, I was thinking about leveraging gem post install, and having a list of warnings encoded in a (ruby? json? yaml? custom?) file that can be both programmatically referred to (thus outputting only relevant messages) as well as human readable for users to follow and go through in the repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll do that in another PR, along with this suggestion: #2265 (comment)

| `quantize.query` | Hash containing options for query portion of URL quantization. May include `:show` or `:exclude`. See options below. Option must be nested inside the `quantize` option. | `{}` |
| `quantize.query.show` | Defines which values should always be shown. Shows no values by default. May be an Array of strings, or `:all` to show all values. Option must be nested inside the `query` option. | `nil` |
| `quantize.query.exclude` | Defines which values should be removed entirely. Excludes nothing by default. May be an Array of strings, or `:all` to remove the query string entirely. Option must be nested inside the `query` option. | `nil` |
| `quantize.query.obfuscate` | Defines query string redaction behaviour. Redacts nothing by default. May be a hash of options, or `:internal` to use the default internal obfuscation settings. Note that obfuscation is a string-wise operation, not a key-value operation. When enabled, `query.show` defaults to `:all` if otherwise unset. Option must be nested inside the `query` option. | `nil` |
Copy link
Contributor Author

@lloeki lloeki Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to quantize.query.obfuscate is to have another value for quantize.query.show, e.g :obfuscate, but then it becomes trickier to control which are shown, as well as providing obfuscator options.

@lloeki lloeki changed the base branch from master to use-full-url-in-http-url-tag September 20, 2022 08:51
@@ -59,22 +59,24 @@ def query(query, options = {})

def query!(query, options = {})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method now has crazy logic to implement the expected behaviour, especially when :show (whether Array or :all), :exclude, and :obfuscate interact.

[key, value]
end
end
end unless options[:show] == :all && !(options[:obfuscate] && options[:exclude])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short circuit when :all is known not to come from :obfuscate, but it still needs to go through if options[:exclude] is set.

if options[:exclude].include?(key)
[nil, nil]
else
value = options[:show].include?(key) ? value : nil
value = (options[:show] == :all || options[:show].include?(key)) ? value : nil
Copy link
Contributor Author

@lloeki lloeki Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed because of the (options[:obfuscate] && options[:exclude]) case, which may have options[:show] == :all.

end
end unless options[:show] == :all && !(options[:obfuscate] && options[:exclude])

options[:obfuscate] ? obfuscate_query(query, options[:obfuscate]) : query
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrary to what I previously thought, obfuscation does not operate on key value pairs but on the string itself. Therefore it is a separate, second stage, taking place after collect_query.

@lloeki lloeki force-pushed the add-query-string-automatic-redaction branch from cd9711d to 9290466 Compare September 20, 2022 09:03
@lloeki lloeki force-pushed the add-query-string-automatic-redaction branch 2 times, most recently from bccc876 to e35bcbd Compare September 20, 2022 13:31
docs/GettingStarted.md Outdated Show resolved Hide resolved
docs/GettingStarted.md Outdated Show resolved Hide resolved
| `quantize.fragment` | Defines behavior for URL fragments. Removes fragments by default. May be `:show` to show URL fragments. Option must be nested inside the `quantize` option. | `nil` |
| `request_queuing` | Track HTTP request time spent in the queue of the frontend server. See [HTTP request queuing](#http-request-queuing) for setup details. Set to `true` to enable. | `false` |
| `web_service_name` | Service name for frontend server request queuing spans. (e.g. `'nginx'`) | `'web-server'` |

Deprecation notice:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first step is to go through the GettingStarted.md and replacing all places where we want quantize: { base: :show }, { query: { show: :all }, obfuscate: :internal }} (Is this the desired behaviour? Please let me know) to be the new default and recommend people to use that configuration.
New users installing ddtrace will see those instructions and starting using 2.0 defaults today.

Regarding how to inform users, it seems to me that we need an official way to track future breaking changes, both to inform clients in advance and also for ourselves to not forget about them in the major version migration guide.

I think the way you capture it here is a good start, under a Deprecation notice: section.
I'm not 100% on the best way to warn users, but definitely a changelog entry for the next 1.x release should be present.

docs/GettingStarted.md Show resolved Hide resolved
lib/datadog/tracing/contrib/utils/quantization/http.rb Outdated Show resolved Hide resolved
@lloeki lloeki force-pushed the add-query-string-automatic-redaction branch from e35bcbd to 016beca Compare September 21, 2022 12:56
Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Only a few minor comments left to address, but ✅ overall.

@lloeki lloeki force-pushed the add-query-string-automatic-redaction branch from 6c9c329 to d374ea6 Compare September 23, 2022 11:11
@lloeki
Copy link
Contributor Author

lloeki commented Sep 23, 2022

I think all items have been addressed, on to fix the linter breakage.

Base automatically changed from use-full-url-in-http-url-tag to master September 23, 2022 11:47
@lloeki lloeki force-pushed the add-query-string-automatic-redaction branch from d374ea6 to 3c7bacf Compare September 23, 2022 12:08
In practice this cop seems to be disabled as soon as we reach the limit.
@lloeki lloeki merged commit 0a9a7ae into master Sep 23, 2022
@lloeki lloeki deleted the add-query-string-automatic-redaction branch September 23, 2022 12:35
@github-actions github-actions bot added this to the 1.5.0 milestone Sep 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants