-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Normalized URL as the resource name #442
Conversation
8f3fdcf
to
4de654d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome stuff here! 😄
Just added a few comments if they make sense to you
src/DDTrace/Bootstrap.php
Outdated
@@ -112,6 +110,12 @@ private static function initRootSpan() | |||
$span->setTag(Tag::HTTP_URL, $_SERVER['REQUEST_URI']); | |||
// Status code defaults to 200, will be later on changed when http_response_code will be called | |||
$span->setTag(Tag::HTTP_STATUS_CODE, 200); | |||
// Normalized URL as the resource name | |||
$normalizer = new Urls(explode(',', getenv('DD_TRACE_RESOURCE_URI_MAPPING'))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments here, if they make sense to you:
- While I see our conversation about simplifying the way we read configurations, wouldn't you see more strategic to not use
getenv
here? What I mean is that you may just want to move thisgetenv()
call to a method inConfig
and not use all the underlying infrastructure. - I was evaluating pros and cons of doing this here or as a post processing operation. Doing it here means that devs cannot, as an example, set this env in their
index.php
or the likes. In the end we need that info only at the very end, just before we send the trace. Wouldn't you think it is a good idea to push this setting to the very end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great points!
What I mean is that you may just want to move this getenv() call to a method in Config and not use all the underlying infrastructure.
I'm certainly open to moving the getenv()
to an abstraction. :) What do you see as the main advantage for abstracting this one-time env access?
Wouldn't you think it is a good idea to push this setting to the very end?
This is a great idea! I'll move it to right before the flush... maybe we should start a discussion about possibly adding span filters in a future PR? :)
@@ -0,0 +1,48 @@ | |||
<?php | |||
|
|||
namespace DDTrace\Configuration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this class, I wonder why it is in the Configuration
namespace though? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point. Naming is hard. 😄
What do you think about adding an Obfuscation
namespace? We could also move the existing obfuscation stuff to that namespace in a separate PR. And possibly organize that code a bit better. What do you think? :)
src/DDTrace/Http/Urls.php
Outdated
// UUID's | ||
'|\b([0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89AB][0-9a-f]{3}-[0-9a-f]{12})\b|i', | ||
// 16-512 bit hex hashes | ||
'|\b([0-9a-f]{4,128})\b|i', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I would not add this. Isn't it too risky to have false positives with a 4-128 range? WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a security standpoint, I'd rather err on the side of over-obfuscation, but I totally see your point as well.
I was thinking about valid words that are 4+ characters that contain only a-f (like cafe). One way around erroneous obfuscation would be curating a whitelist of valid words that are 4+ and a-f. Perhaps we could eventually offer a feature to add words to the whitelist in the event there is a proper noun or a made-up word that's important (like affeca). There are also i18n considerations to support other languages.
In the end, a configurable whitelist that's also very fast (maybe making use of a Trie data structure or something) would be a bit of an undertaking, but it might be worth the tradeoff to ensure better security.
If you're OK with it, I'd like to ping someone from the security department to weigh in on this one. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I've narrowed the scope of this to start at 32-bit hashes since an 8-character word with only a-f is much less likely to occur. :)
src/DDTrace/Http/Urls.php
Outdated
/** | ||
* A utility class that provides methods to work on urls | ||
*/ | ||
class Urls | ||
{ | ||
private static $defaultPatterns = [ | ||
// UUID's | ||
'|\b([0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89AB][0-9a-f]{3}-[0-9a-f]{12})\b|i', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just based on my previous experience, can we make the -
optional? We slightly increase the risk to get false positives here but it is limited to only hex strings with one exact number of chars, which is perfectly acceptable in my opinion.
I really appreciate the detailed regex considering [89AB]
from the spec. I would ask you though:
- to add a comment to the section in wikipedia explaining it as it surprised me :) and I had to go through a few google links to find it 😄
- for consistency with the other parts of the expression can we make it lowercase, the pattern in
i
nsensitive so it should make no difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just based on my previous experience, can we make the - optional?
Good call! There's a test for this and it is obfuscated by the regex that catches "16-512 bit hex hashes". But since we may end up changing/removing that regex based on your other comment above, I'll change the regex to make the dashes optional. :)
to add a comment to the section in wikipedia explaining it
Good idea! I'll add that. :)
for consistency with the other parts of the expression can we make it lowercase
Another good idea. I'll change that. :)
src/DDTrace/Http/Urls.php
Outdated
*/ | ||
public function __construct(array $patternsWithWildcards = []) | ||
{ | ||
$this->replacementPatterns = array_map( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Optional]: this is a matter of styles and this comment is totally optional. I love to Ctrl+click
around to see who is using a method. While I know that UTs would detect something wrong here, I still prefer the plain old for
loop. This is personal here :) don't hate me 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol - I don't mind refactoring this to a foreach
loop since we aren't able to support ClassName::class
. :)
public function wildcardToRegexExamples() | ||
{ | ||
return [ | ||
['/foo/*/bar', ['|^/foo/.+/bar$|', '/foo/?/bar']], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens to ^/foo/.+/bar$
with the pattern /foo/123/legitimate/bar
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wildcards are greedy by default. Do you think we should add a non-greedy wildcard like *?
? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave like this, and users can add their own specific paths. Also, if we receive feedback about it then we can iterate on it.
d45db31
to
a5843a9
Compare
…se-positive matches
…etting custom URL rules at runtime
a5843a9
to
b4e06ae
Compare
@@ -34,7 +34,7 @@ public function load() | |||
|
|||
dd_trace('Laravel\Lumen\Application', 'dispatch', function () use ($span) { | |||
$response = dd_trace_forward_call(); | |||
$resourceName = 'unnamed_route'; | |||
$resourceName = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at who is sneaking in 😄 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey just a couple of things that I think we forgot about and then we are in awesome-land :)
@@ -0,0 +1,29 @@ | |||
<?php |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shuold we move this to Obfuscation namespace as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh! 🤦♂ Nice catch! 😄
public function wildcardToRegexExamples() | ||
{ | ||
return [ | ||
['/foo/*/bar', ['|^/foo/.+/bar$|', '/foo/?/bar']], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave like this, and users can add their own specific paths. Also, if we receive feedback about it then we can iterate on it.
@@ -399,6 +404,25 @@ private function addHostnameToRootSpan() | |||
} | |||
} | |||
|
|||
private function addUrlAsResourceNameToRootSpan() | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only thing we miss is to make this OFF by default. This in order to prevent unexpected huge cardinality (and so bill) issues out of the blu.
…CE_URL_AS_RESOURCE_NAMES_ENABLED=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this new feature. Excellent work!
Description
This PR adds URL normalization so that that they can be used as a low-cardinality resource name. The default substitution patterns scan for UUID's, hexadecimal hashes, and int's. We can certainly add more to that list.
The
DD_TRACE_RESOURCE_URI_MAPPING
environment variable can be set to a CSV list of URL-to-resource-name mapping rules that can contain*
and$*
wildcards.*
wildcard will match one or more characters to be replaced with?
$*
wildcard will match one or more characters without any replacementFor example, given
DD_TRACE_RESOURCE_URI_MAPPING=/user/*,/city/$*/*
:/user/sammyk
/user/?
/city/foo/123
/city/foo/?
/city/bar/123
/city/bar/?
Readiness checklist