Use different regexes in KeywordDetector to improve accuracy #86

KevinHock · 2018-10-22T18:13:20Z

No description provided.

detect_secrets/plugins/keyword.py

tests/plugins/keyword_test.py

Turned on Keyword detector by default.
Down-graded version to '0.1.666' to test it on a few repos without causing havoc.

KevinHock

One nit

tests/plugins/keyword_test.py

KevinHock · 2018-12-11T19:51:23Z

Just stating for memory's sake that we had a discussion, which I'll paraphrase here. I may open a separate issue for it.

Currently, the assumption is that "no secret is repeated within a single file". With this plugin, that assumption is more likely to break: e.g. if you have 2 different assignments of password = "ehehehwew" in the same file, the hash of ehehehwew will be in 1 place in the baseline, but 2 places in the file. This is already true today for e.g. high-entropy secrets.

We can either (a) incorporate line numbers in the fields compared at detect-secrets/detect_secrets/core/potential_secret.py, line 53 in 5f4a055 (`self.fields_to_compare = ['filename', 'secret_hash', 'type']`), or (b) count how many secrets are in the file and store that count in the baseline.

The downsides of doing nothing, and merging the PR as is:

- It looks unintelligent to users, as in "This tool doesn't even detect the same secret on the next line."
- When removing a secret, you suddenly get alerted to a new one that we did not complain about before.
- Less concerning: Adding a new secret can be missed, if it is already in that file.
- Less concerning: The audit command does not show the secret more than once.

We agree changing that assumption is a larger task than what's at hand.
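To make the collision concrete, here is a minimal sketch. The `Finding` class, the file name, and the `lineno` field are hypothetical stand-ins rather than the real `PotentialSecret`; only the three compared fields come from the line quoted above, and the second set illustrates option (a).

```python
import hashlib

# The three fields quoted from potential_secret.py; everything else here is hypothetical.
FIELDS_TO_COMPARE = ['filename', 'secret_hash', 'type']


class Finding:
    """Hypothetical stand-in for a detected secret (not the real PotentialSecret)."""

    def __init__(self, filename, secret, type_, lineno):
        self.filename = filename
        self.secret_hash = hashlib.sha1(secret.encode('utf-8')).hexdigest()
        self.type = type_
        self.lineno = lineno

    def key(self, fields=FIELDS_TO_COMPARE):
        # Identity of a finding is the tuple of the compared fields.
        return tuple(getattr(self, field) for field in fields)


findings = [
    Finding('settings.py', 'ehehehwew', 'Secret Keyword', lineno=3),
    Finding('settings.py', 'ehehehwew', 'Secret Keyword', lineno=9),
]

# With the current fields, both assignments collapse into one baseline entry.
baseline = {finding.key() for finding in findings}

# Option (a): also comparing the line number keeps the two occurrences apart.
baseline_with_lines = {
    finding.key(FIELDS_TO_COMPARE + ['lineno']) for finding in findings
}

print(len(baseline), len(baseline_with_lines))
```

Option (b) would instead keep the single key and store an occurrence count next to it.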

Filter out $variables for PHP files
Filter out `(|[` followed by `)|]`
Add `not`, more empty quotes and `password` variable names to FALSE_POSITIVES

After merging in master

Trim uncovered code
Change tox to ensure tests are covered 100%

Removed `token` as a keyword
Made FOLLOWED_BY_EQUAL_SIGNS_RE require variable ends with keyword
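As a rough illustration of the "variable ends with keyword" requirement, the sketch below uses a hypothetical keyword list and pattern. It is not the plugin's actual FOLLOWED_BY_EQUAL_SIGNS_RE, only one way such a constraint could be expressed.

```python
import re

# Hypothetical keyword list; `token` is deliberately absent.
KEYWORDS = ('password', 'secret', 'db_pass')

# The keyword must be the last part of the variable name: only optional
# whitespace may sit between the keyword and the equal sign.
ENDS_WITH_KEYWORD_RE = re.compile(
    r'([A-Za-z0-9_]*(?:{}))\s*=\s*(\S+)'.format('|'.join(KEYWORDS)),
    re.IGNORECASE,
)


def find_secret(line):
    """Return the assigned value if the variable name ends with a keyword."""
    match = ENDS_WITH_KEYWORD_RE.search(line)
    return match.group(2) if match else None


print(find_secret('db_password = "hunter2"'))    # name ends with a keyword
print(find_secret('password_hint = "hunter2"'))  # keyword is not at the end
print(find_secret('token = "abc"'))              # `token` is not a keyword
```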

KevinHock · 2018-12-14T20:14:07Z

detect_secrets/core/audit.py

    """Generates raw secrets by re-scanning the line, with the specified plugin"""
-    for raw_secret in plugin.secret_generator(secret_line):
-        yield raw_secret
+    if isinstance(plugin, KeywordDetector):


I felt :/ about writing it like this, but didn't see a better way.

Just wanted to put neon lights on it for the review 😅

tests/plugins/keyword_test.py

In keyword_test.py

Made quotes required in Python files/added regexes for this
Added a Filetype Enum and `determine_file_type` function
Replaced 'pass' with 'db_pass' in BLACKLIST
Added 'aws_secret_access_key' to BLACKLIST
Added some trailing char cases to FALSE_POSITIVES
:boom: Changed secret_type to 'Secret Keyword'

By adding an optional `((\'|")])?` to the regexes
This is to catch 'foo' in e.g. `some_dict["secret"] = "foo"`
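A small sketch of the effect, assuming a simplified keyword pattern. The keyword list and both regexes are illustrative and are not the ones in keyword.py; only the optional quote-plus-bracket group mirrors the change described above.

```python
import re

QUOTE = r'[\'"]'
KEYWORD = r'(?:password|secret)'  # illustrative keyword list
VALUE = QUOTE + r'([^\'"\s]+)' + QUOTE

# Keyword must be followed directly by the equal sign.
WITHOUT_BRACKET_RE = re.compile(KEYWORD + r'\s*=\s*' + VALUE)

# Same pattern, but an optional closing quote and `]` may follow the keyword.
WITH_BRACKET_RE = re.compile(
    KEYWORD + r'(?:' + QUOTE + r'\])?' + r'\s*=\s*' + VALUE
)


def extract(regex, line):
    """Return the quoted value captured by `regex`, or None if no match."""
    match = regex.search(line)
    return match.group(1) if match else None


line = 'some_dict["secret"] = "foo"'
print(extract(WITHOUT_BRACKET_RE, line))  # the dict form is missed
print(extract(WITH_BRACKET_RE, line))     # the dict form is caught
```

Plain assignments such as `secret = "foo"` still match the second pattern because the bracket group is optional.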

Resolved

detect_secrets/plugins/keyword.py

calvinli

LGTM based on internal testing. There are some remaining false positives but it should be okay.

detect_secrets/plugins/keyword.py

Added Javascript specific false-positive checks
Added ${ before } heuristic for e.g. ${link}
Added more false-positives to FALSE_POSITIVES
:zap: keyword_test.py Make STANDARD_NEGATIVES list and STANDARD_POSITIVES set for DRYness
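One possible reading of the `${` before `}` heuristic is sketched below. The function name is hypothetical and the real check in keyword.py may differ; the idea is that a candidate value containing a template placeholder such as `${link}` is probably not a hard-coded secret.

```python
def looks_like_interpolation(candidate):
    """Return True if `${` appears somewhere before a `}` in the candidate."""
    start = candidate.find('${')
    return start != -1 and candidate.find('}', start + 2) != -1


print(looks_like_interpolation('${link}'))   # placeholder, likely a false positive
print(looks_like_interpolation('hunter2'))   # plain value
print(looks_like_interpolation('}${'))       # wrong order, not a placeholder
```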

Use different regexes in KeywordDetector to improve accuracy

b9c80f9

KevinHock assigned domanchi Oct 22, 2018

KevinHock commented Oct 22, 2018

detect_secrets/plugins/keyword.py (outdated, resolved)

domanchi previously requested changes Oct 22, 2018

tests/plugins/keyword_test.py (outdated, resolved)

detect_secrets/plugins/keyword.py (outdated, resolved)

detect_secrets/plugins/keyword.py (outdated, resolved)

detect_secrets/plugins/keyword.py (outdated, resolved)

KevinHock added 2 commits October 22, 2018 14:22

Trim regexes down, include secret 'foo'; as well.

c4e7676

Do not alert if whitespace is in secret

fce2837

domanchi reviewed Oct 22, 2018

KevinHock added 4 commits October 30, 2018 18:15

[Keyword Plugin] Add 3 re's and their secret groups in dict

3167cb1

Merge branch 'master' into upgrade_keyword_detector

b6dc9cc

Add a few more Keyword negative test-cases

b89209f

Added a variety of accuracy improvements to Keyword Plugin (see tests)

431ad12

KevinHock mentioned this pull request Nov 12, 2018

Adding audit --diff functionality #95

Merged

KevinHock referenced this pull request Nov 20, 2018

Fix tests w/ no KeywordDetector, add comment

6d98c81

KevinHock mentioned this pull request Nov 26, 2018

Keyword Detector is not used in 0.10.5 #97

Closed

KevinHock commented Nov 27, 2018

tests/plugins/keyword_test.py (outdated, resolved)

domanchi mentioned this pull request Dec 10, 2018

add plugin to look for AWS key IDs #100

Merged

KevinHock added 5 commits December 13, 2018 17:13

🔭[Keyword Plugin] Filter false-positives

a27659a

Merge branch 'master' into upgrade_keyword_detector

a4c0432

🐍 Make tests pass

d8f4e29

🐍 Improve test coverage

ed6a374

🔭[Keyword Plugin] Precision improvements

3fd9e87

KevinHock commented Dec 14, 2018

KevinHock mentioned this pull request Dec 18, 2018

refactor various detectors to use RegexBasedDetector #103

Merged

joshuarli mentioned this pull request Dec 19, 2018

Refactor applicable plugins to all inherit from RegexBasedDetector #102

Closed

joshuarli reviewed Dec 19, 2018

tests/plugins/keyword_test.py (outdated, resolved)

KevinHock added 2 commits December 21, 2018 12:20

⚡ Remove unnecessary wrapping parens

f15366e

KevinHock force-pushed the upgrade_keyword_detector branch from ea99830 to e01d818 Compare December 28, 2018 22:43

🎓 Eg. -> E.g.

4581aa8

KevinHock force-pushed the upgrade_keyword_detector branch from ce5862b to ec1e0cd Compare December 28, 2018 23:48

🔭[Keyword Plugin] Handle dict['keyword']

b7e48ab

KevinHock force-pushed the upgrade_keyword_detector branch from ec1e0cd to b7e48ab Compare December 28, 2018 23:49

🔄 Merge branch 'master' into upgrade_keyword_detector

a37a9c9

KevinHock force-pushed the upgrade_keyword_detector branch from 76ddcdf to a37a9c9 Compare December 29, 2018 00:05

🐍[consistency] 2 spaces before pragma comment

d314550

KevinHock commented Jan 2, 2019

detect_secrets/plugins/keyword.py (resolved)

calvinli self-requested a review January 3, 2019 01:18

calvinli approved these changes Jan 3, 2019

calvinli reviewed Jan 3, 2019

detect_secrets/plugins/keyword.py (outdated, resolved)

🔭[Keyword Plugin] Precision improvements

a29108b

KevinHock force-pushed the upgrade_keyword_detector branch from 7dd4926 to a29108b Compare January 3, 2019 23:03

KevinHock merged commit 164b7eb into master Jan 3, 2019

KevinHock mentioned this pull request Feb 21, 2019

Same secret multiple times in the same file #134

Closed

KevinHock deleted the upgrade_keyword_detector branch March 21, 2019 22:04
