-
Notifications
You must be signed in to change notification settings - Fork 464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keyword detector optimization #396
Keyword detector optimization #396
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR introduces a variety of different, distinct changes, that I have comments on individually. As such, I figured it might be better to write them here, rather than at specific lines.
Increasing "filetype" intelligence
All for this. Yay! I was originally worried about the compatibility with the config transformer, but it looks like the config transformer just attempts all files anyway, so we should be fine.
Keyword Detector Improvements
TBH, this file continues to mutate, and becomes increasingly hard to understand -- so it takes me a while to understand your changes. I like your FOLLOWED_BY_COMPARATION_QUOTES_REQUIRED_REGEX
and FOLLOWED_BY_REV_COMPARATION_QUOTES_REQUIRED_REGEX
, but I'm uncertain of the implications of modifying the core DENYLIST_REGEX
and SECRET
regexes.
It should be the case that secret values don't have quotes in them: but I need to think more deeply about whether we can assert this.
KeywordXML
Ok, this absolutely needs to change. Along the same lines of why the KeywordDetector is in need for some serious love (to increase code flexibility), I'm against a direct copy-paste of the KeywordDetector
to make a version for XML files.
Have you considered writing an XML transformer instead? This will allow it to take advantage of all the other features that v1 provides. For prior art, check out YAMLTransformer.
FOLLOWED_BY_QUOTES_AND_SEMICOLON_REGEX = re.compile( | ||
# e.g. private_key "something"; | ||
r'({denylist}){nonWhitespace}{whitespace}({quote})({secret})(\2);'.format( | ||
r'({denylist}){nonWhitespace}{whitespace}({quote})({secret})({quote});'.format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was written as such to make sure that we don't have a situation as such:
private_key "something'
Same thing goes for your other regex modifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex was updated to be sure that the last quote won't be included as a secret in the audit code snippet. The original SECRET
regex allows the '
and "
characters, so the last quote will be included in the secret value. I have modified the SECRET
regex to match only with empty values (""
or ''
, commented below) or whatever value without '
, "
and whitespaces. With this changes, I have included the last quote as a new item in this regex to make it more specific.
@pytest.mark.skip( | ||
reason='TODO: false positive heuristics need to be migrated over to filters/*', | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to skip this test? All filters have been migrated already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, when I have implemented this feature, the filters haven't been migrated already. I will commit it comming soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @domanchi, I have reviewed the default filters of detect-secrets:
"filters_used": [
{
"path": "detect_secrets.filters.allowlist.is_line_allowlisted"
},
{
"path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
"min_level": 2
},
{
"path": "detect_secrets.filters.heuristic.is_likely_id_string"
},
{
"path": "detect_secrets.filters.heuristic.is_potential_uuid"
},
{
"path": "detect_secrets.filters.heuristic.is_sequential_string"
}
],
In the default filters, we haven't found any adequate to our test's needs. I think that the most useful filter in this case, can be heuristic.is_templated_secret
. We haven't found any references to this filter in the documentation and we think that it should be established by default. Are this tasks in your roadmap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I must have missed this during my migration effort. I'll cut a ticket for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the modifications in this variables are based on our experience analyzing multiples repositories. With this modifications we note an improvement in the results obtained with detect-secrets. |
Okey, I will work in your proposal comming soon. If you like the other changes included in this Pull Request, we can remove the |
Hi @domanchi, I close this Pull Request. @syn-4ck and I will contribute all this features adapted to the new version of detect-secrets comming soon. Here is our first contribution #418 related to this Pull Request, I think that it will be easier to discuss every change in keyword plugin using smaller Pull Requests. Thank you very much! |
Co-authored-by: Renovate Bot <renovate@whitesourcesoftware.com> Co-authored-by: Victoria M Miltcheva <Victoria.Miltcheva@ibm.com>
This pull request includes the following improvements to the
Keyword Detector
plugin:INI
,YAML
,properties
,C
,C++
,C#
,bash
,PowerShell
.secret
will detect as secret the following linessecret = value
andmy_special_secret_for_my_db = value
. Closes Should 'secret_key' be added to the keyword plugin? #148if (password == 'value')
orif ('value' == password)
.DENYLIST
andFALSE_POSITIVES
lists.This pull request includes a new plugin called
Keyword XML Detector
to analyze all the situations where a secret could be hardcoded in a XML file. Closes #270The new plugins have the associated tests and all the implementation pass the pre-commit-hook established by the open source project.