Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keyword detector optimization #396

Closed
wants to merge 10 commits into from
Closed

Keyword detector optimization #396

wants to merge 10 commits into from

Conversation

pablosnt
Copy link
Contributor

This pull request includes the following improvements to the Keyword Detector plugin:

  • Specifically support for the secret detection in new file types: INI, YAML, properties, C, C++, C#, bash, PowerShell.
  • Secret detection by keyword combined with any other value. For example: the keyword secret will detect as secret the following lines secret = value and my_special_secret_for_my_db = value. Closes Should 'secret_key' be added to the keyword plugin? #148
  • Consideration of new situations where a secret could be hardcoded. For example: comparations between a variable and a hardcoded value if (password == 'value') or if ('value' == password).
  • Improvement of the secret selection based on the regex group.
  • New items in DENYLIST and FALSE_POSITIVES lists.

This pull request includes a new plugin called Keyword XML Detector to analyze all the situations where a secret could be hardcoded in a XML file. Closes #270

The new plugins have the associated tests and all the implementation pass the pre-commit-hook established by the open source project.

@pablosnt pablosnt mentioned this pull request Jan 29, 2021
Copy link
Contributor

@domanchi domanchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces a variety of different, distinct changes, that I have comments on individually. As such, I figured it might be better to write them here, rather than at specific lines.

Increasing "filetype" intelligence

All for this. Yay! I was originally worried about the compatibility with the config transformer, but it looks like the config transformer just attempts all files anyway, so we should be fine.

Keyword Detector Improvements

TBH, this file continues to mutate, and becomes increasingly hard to understand -- so it takes me a while to understand your changes. I like your FOLLOWED_BY_COMPARATION_QUOTES_REQUIRED_REGEX and FOLLOWED_BY_REV_COMPARATION_QUOTES_REQUIRED_REGEX, but I'm uncertain of the implications of modifying the core DENYLIST_REGEX and SECRET regexes.

It should be the case that secret values don't have quotes in them: but I need to think more deeply about whether we can assert this.

KeywordXML

Ok, this absolutely needs to change. Along the same lines of why the KeywordDetector is in need for some serious love (to increase code flexibility), I'm against a direct copy-paste of the KeywordDetector to make a version for XML files.

Have you considered writing an XML transformer instead? This will allow it to take advantage of all the other features that v1 provides. For prior art, check out YAMLTransformer.

FOLLOWED_BY_QUOTES_AND_SEMICOLON_REGEX = re.compile(
# e.g. private_key "something";
r'({denylist}){nonWhitespace}{whitespace}({quote})({secret})(\2);'.format(
r'({denylist}){nonWhitespace}{whitespace}({quote})({secret})({quote});'.format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was written as such to make sure that we don't have a situation as such:

private_key "something'

Same thing goes for your other regex modifications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex was updated to be sure that the last quote won't be included as a secret in the audit code snippet. The original SECRET regex allows the ' and " characters, so the last quote will be included in the secret value. I have modified the SECRET regex to match only with empty values ("" or '', commented below) or whatever value without ', " and whitespaces. With this changes, I have included the last quote as a new item in this regex to make it more specific.

detect_secrets/plugins/keyword.py Outdated Show resolved Hide resolved
detect_secrets/plugins/keyword.py Show resolved Hide resolved
Comment on lines +373 to +375
@pytest.mark.skip(
reason='TODO: false positive heuristics need to be migrated over to filters/*',
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to skip this test? All filters have been migrated already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, when I have implemented this feature, the filters haven't been migrated already. I will commit it comming soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @domanchi, I have reviewed the default filters of detect-secrets:

"filters_used": [
    {
      "path": "detect_secrets.filters.allowlist.is_line_allowlisted"
    },
    {
      "path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
      "min_level": 2
    },
    {
      "path": "detect_secrets.filters.heuristic.is_likely_id_string"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_potential_uuid"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_sequential_string"
    }
  ],

In the default filters, we haven't found any adequate to our test's needs. I think that the most useful filter in this case, can be heuristic.is_templated_secret. We haven't found any references to this filter in the documentation and we think that it should be established by default. Are this tasks in your roadmap?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I must have missed this during my migration effort. I'll cut a ticket for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pablosnt
Copy link
Contributor Author

pablosnt commented Feb 4, 2021

I'm uncertain of the implications of modifying the core DENYLIST_REGEX and SECRET regexes.

All the modifications in this variables are based on our experience analyzing multiples repositories. With this modifications we note an improvement in the results obtained with detect-secrets.

@pablosnt
Copy link
Contributor Author

pablosnt commented Feb 4, 2021

Ok, this absolutely needs to change

Okey, I will work in your proposal comming soon. If you like the other changes included in this Pull Request, we can remove the KeywordXMLDetector code from it and contributing this feature later.

@pablosnt
Copy link
Contributor Author

pablosnt commented Mar 4, 2021

Hi @domanchi, I close this Pull Request. @syn-4ck and I will contribute all this features adapted to the new version of detect-secrets comming soon. Here is our first contribution #418 related to this Pull Request, I think that it will be easier to discuss every change in keyword plugin using smaller Pull Requests. Thank you very much!

@pablosnt pablosnt closed this Mar 4, 2021
jimmyhlee94 pushed a commit to jimmyhlee94/detect-secrets that referenced this pull request Aug 19, 2021
Co-authored-by: Renovate Bot <renovate@whitesourcesoftware.com>
Co-authored-by: Victoria M Miltcheva <Victoria.Miltcheva@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants