Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use different regexes in KeywordDetector to improve accuracy #86

Merged
merged 19 commits into from
Jan 3, 2019

Conversation

KevinHock
Copy link
Collaborator

No description provided.

tests/plugins/keyword_test.py Outdated Show resolved Hide resolved
detect_secrets/plugins/keyword.py Outdated Show resolved Hide resolved
detect_secrets/plugins/keyword.py Outdated Show resolved Hide resolved
detect_secrets/plugins/keyword.py Outdated Show resolved Hide resolved
detect_secrets/plugins/keyword.py Outdated Show resolved Hide resolved
detect_secrets/plugins/keyword.py Outdated Show resolved Hide resolved
detect_secrets/plugins/keyword.py Outdated Show resolved Hide resolved
detect_secrets/plugins/keyword.py Outdated Show resolved Hide resolved
tests/plugins/keyword_test.py Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit

tests/plugins/keyword_test.py Outdated Show resolved Hide resolved
@KevinHock
Copy link
Collaborator Author

Just stating for memories sake, that we had a discussion that I'll paraphrase here, I'll maybe make a separate issue:

Currently, the assumption is that "no secret is repeated within a single file" however, with this plugin it is more likely that this assumption is gonna break e.g. if you have 2 different assignments of password = "ehehehwew" in the same file, the hash of ehehehwew will be in 1 place in the baseline, but 2 places in the file. This is currently true for e.g. high-entropy secrets right now, for instance.

We can either (a) incorporate line numbers in

self.fields_to_compare = ['filename', 'secret_hash', 'type']
, or (b) put a count of how many secrets are in the file, and put the count in baseline.

The downsides of doing nothing, and merging the PR as is:

  1. It looks unintelligent to users, as in "This tool doesn't even detect the same secret on the next line."
  2. When removing a secret, you suddenly get alerted to a new one that we did not complain about before.
  3. Less concerning: Adding a new secret can be missed, if it is already in that file.
  4. Less concerning: The audit command does not show the secret more than once.

We agree changing that assumption is a larger task than what's at hand.

Filter out $variables for PHP files
Filter out `(|[` followed by `)|]`
Add `not`, more empty quotes and `password` variable names to FALSE_POSITIVES
After merging in master
Trim uncovered code
Change tox to ensure tests are covered 100%
Removed `token` as a keyword
Made FOLLOWED_BY_EQUAL_SIGNS_RE require variable ends with keyword
"""Generates raw secrets by re-scanning the line, with the specified plugin"""
for raw_secret in plugin.secret_generator(secret_line):
yield raw_secret
if isinstance(plugin, KeywordDetector):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt :/ about writing it like this, but didn't see a better way.

Just wanted to put neon lights on it for the review 😅

Made quotes required in Python files/added regexes for this
Added a Filetype Enum and `determine_file_type` function

Replaced 'pass' with 'db_pass' in BLACKLIST
Added 'aws_secret_access_key' to BLACKLIST
Added some trailing char cases to FALSE_POSITIVES

:boom: Changed secret_type to 'Secret Keyword'
By adding an optional `((\'|")])?` to the regexes
This is to catch 'foo' in e.g. `some_dict["secret"] = "foo"`
@calvinli calvinli self-requested a review January 3, 2019 01:18
Copy link
Member

@calvinli calvinli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM based on internal testing. There are some remaining false positives but it should be okay.

Added Javascript specific false-positive checks
Added ${ before } heuristic for  e.g. ${link}
Added more false-positives to FALSE_POSITIVES

:zap: keyword_test.py
Make STANDARD_NEGATIVES list and STANDARD_POSITIVES set for DRYness
@KevinHock KevinHock merged commit 164b7eb into master Jan 3, 2019
@KevinHock KevinHock deleted the upgrade_keyword_detector branch March 21, 2019 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants