Support url-safe base64 secrets #245

OiCMudkips · 2019-09-26T21:39:45Z

This updates the base64 plugin to support url-safe base64 secrets by adding `-` and `_` to the charset. However, it won't be merged until I add some additional functionality to reduce noise.

detect_secrets/plugins/high_entropy_strings.py

KevinHock · 2019-09-26T21:45:44Z

test_data/config.yaml

@@ -1,5 +1,5 @@
 credentials:
-    some_value_here: not_a_secret
+    some_value_here: not_secret


Was this necessary b/c the entropy calculation with the new chars alerted on not_a_secret?

Yeah, I don't think we need to be too concerned though because we now have the wordlist filtering.

I'm more concerned that we'll have large diffs in baselines when people update detect-secrets.

This isn't as concerning as changing a secret type like we did in #26 (where all old secrets were removed and re-added), but it is a little concerning, especially if it reduces TPs to some extent. (We'll see what the data says though; I can't really say how it'll affect signal.)

Why will we have large diffs? A lot of new secrets?

If all the potential secrets like not_a_secret disappear from existing baselines, we may see large diffs. For FPs that's great; for TPs it would be a regression visible to users. (We can't really say there are minimal regressions without data though.)
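To make the charset question in this thread concrete, here is a minimal sketch of why `not_a_secret` can start alerting once `_` joins the charset while `not_secret` does not. It is not the plugin's actual code: `is_candidate`, `shannon_entropy`, and the limit of 3.0 are assumptions chosen for illustration.

```python
import math
import string
from collections import Counter

# Standard base64 alphabet, and the url-safe extension discussed in this PR.
STANDARD_BASE64 = set(string.ascii_letters + string.digits + "+/=")
URL_SAFE_BASE64 = STANDARD_BASE64 | set("-_")

# Assumption: an illustrative entropy limit, not necessarily the plugin's default.
HYPOTHETICAL_LIMIT = 3.0


def is_candidate(token, charset):
    """Return True if every character of the token belongs to the charset."""
    return bool(token) and all(ch in charset for ch in token)


def shannon_entropy(token):
    """Shannon entropy of the token, in bits per character."""
    counts = Counter(token)
    total = len(token)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


for token in ("not_a_secret", "not_secret"):
    print(
        token,
        is_candidate(token, STANDARD_BASE64),   # False: "_" is outside the charset
        is_candidate(token, URL_SAFE_BASE64),   # True once "-" and "_" are added
        round(shannon_entropy(token), 3),       # about 3.085 and 2.922 respectively
    )
```

Under these assumptions, `not_a_secret` only becomes a whole-token candidate with the url-safe charset, and its entropy lands just above the illustrative limit, while `not_secret` stays below it.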

detect_secrets/plugins/common/filters.py

KevinHock · 2019-10-08T18:56:53Z

detect_secrets/plugins/common/filters.py

+
+# NOTE: this doesn't handle key-values on a line properly.
+# NOTE: words that end in "id" will be treated as ids
+_ID_DETECTOR_REGEX = re.compile(r'[iI][dD][^A-Za-z0-9]')


We might be able to do _id, we'll see what the data says though.

I guess it depends on whether we want to ignore keys like BusinessId. I think at Yelp this isn't likely but it's probably more likely in camelCase language repos.

That's a good point, we do have a lot of Python biases.
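The trade-off in this thread can be sketched by running the regex from the diff against a few sample lines. `_STRICT_ID_REGEX` is a hypothetical variant of the `_id` idea mentioned above, and the sample lines are made up for illustration.

```python
import re

# Regex from the diff: "id" in any case, followed by a non-alphanumeric character.
_ID_DETECTOR_REGEX = re.compile(r'[iI][dD][^A-Za-z0-9]')

# Hypothetical stricter variant that requires a leading underscore ("_id").
_STRICT_ID_REGEX = re.compile(r'_[iI][dD][^A-Za-z0-9]')


def classify(line):
    """Return (broad_match, strict_match) for a single line."""
    return (
        bool(_ID_DETECTOR_REGEX.search(line)),
        bool(_STRICT_ID_REGEX.search(line)),
    )


print(classify("user_id: 5f3a9c"))    # (True, True)
print(classify("BusinessId = 'x'"))   # (True, False): camelCase key missed by "_id"
print(classify("valid: yes"))         # (True, False): word ending in "id", per the NOTE
print(classify("identity: abc"))      # (False, False): "id" is followed by a letter
```

The broad pattern catches camelCase keys such as `BusinessId` at the cost of also matching ordinary words that end in "id"; the stricter variant reverses that trade-off.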

detect_secrets/plugins/common/filters.py

KevinHock · 2019-10-09T23:27:27Z

detect_secrets/plugins/high_entropy_strings.py

+            # py2+py3 compatible way of copying a list
+            functions = list(DEFAULT_FALSE_POSITIVE_HEURISTICS)
+            functions.append(is_potential_uuid)
+
+            if is_false_positive(result, self.automaton, functions=functions):


What are your thoughts on passing additional_heuristics instead? I'm not sure when we would want to call is_false_positive without the defaults (main motivation is prettifying though)

I was actually thinking about moving is_false_positive to be a method in BasePlugin and then having subclasses re-implement it. This would allow us to override the filters used at the plugin level (suggested in #250), while also setting some reasonable defaults. In addition we can include the heuristics used in the configs for the plugins in baselines.

i.e. in code

    class BasePlugin():
        def __init__(self, false_positive_heuristics=None):
            self.false_positive_heuristics = (
                false_positive_heuristics if false_positive_heuristics else []
            )

        def is_false_positive(self, potential_secret):
            return any(
                func(potential_secret)
                for func in self.false_positive_heuristics
            )

        def get_config(self):
            # include the fp heuristics used if applicable
            pass


    class Plugin(BasePlugin):
        # I remember the default list in Python function constructor,
        # I'll fix it in real code :)
        def __init__(self, false_positive_heuristics=DEFAULT_HEURISTICS_FOR_PLUGIN):
            super(Plugin, self).__init__(false_positive_heuristics)

        def analyze_string_content(self, string):
            for potential_secret in self.secret_generator(string):
                if self.is_false_positive(potential_secret):
                    continue

That sounds great to me 🎈

I'm only unsure of the "In addition we can include the heuristics used in the configs for the plugins in baselines" part, as I'm kind of okay with leaving that blind to the user. (There are also lesser objections someone could raise: that diffs in baselines should be minimal, and I'm not sure how we would say which heuristics each plugin used in a DRY way.)
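For comparison, the `additional_heuristics` alternative raised at the start of this thread might look roughly like the sketch below. The heuristic bodies are simplified stand-ins, and the `(secret, automaton)` call signature is an assumption inferred from the call site in the diff, not the library's confirmed API.

```python
import re

# Hypothetical stand-in for a default heuristic; the real ones live in filters.py.
def is_sequential_string(secret, automaton=None):
    """Treat substrings of an obvious sequence as false positives."""
    return secret.lower() in "abcdefghijklmnopqrstuvwxyz0123456789"


# Simplified stand-in for the UUID check added in this PR.
_UUID_REGEX = re.compile(
    r'[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}',
    re.IGNORECASE,
)


def is_potential_uuid(secret, automaton=None):
    return bool(_UUID_REGEX.search(secret))


DEFAULT_FALSE_POSITIVE_HEURISTICS = (is_sequential_string,)


def is_false_positive(secret, automaton, additional_heuristics=()):
    """Always run the defaults, plus any plugin-specific extras."""
    heuristics = tuple(DEFAULT_FALSE_POSITIVE_HEURISTICS) + tuple(additional_heuristics)
    return any(func(secret, automaton) for func in heuristics)


uuid = "123e4567-e89b-12d3-a456-426614174000"
print(is_false_positive("abcdef", None))                                    # True
print(is_false_positive(uuid, None))                                        # False
print(is_false_positive(uuid, None, additional_heuristics=[is_potential_uuid]))  # True
```

With this shape the caller no longer copies the default list before appending, which is the prettifying motivation mentioned above; the instance-method design trades that simplicity for per-plugin overrides.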

detect_secrets/plugins/base.py

KevinHock reviewed Sep 26, 2019

Victor Zhou added 7 commits October 7, 2019 14:01

Support url-safe base64 secrets

e10b9a3

This commit updates the base64 plugin to support url-safe base64 secrets by adding - and _ to the charset.

Scan string directly in YAML parser

de2cbd8

We already check for whitelists in `ignored_lines = parser.get_ignored_lines()` call above, so calling `analyze_string` wastes time with the duplicated check.

Scan string directly in INI parser

d61baab

We already check for whitelists in the IniFileParser, so doing another whitelist check here is redundant.

Ignore id's in the high-entropy plugin

e1fa566

Rename analyze_string to analyze_line

0c9e97e

This renaming more accurately reflects what the function does in all the plugins (at the moment) and more clearly distinguishes it from `analyze_string_content`

Ignore yaml high-entropy secrets whose keys are ids

ab78151

Ignore id values in ini files

3d0dc36

OiCMudkips force-pushed the url_safe_base64 branch from 0539c43 to 3d0dc36 Compare October 7, 2019 21:01

Capitalize comments by PR request

2cfea37

OiCMudkips marked this pull request as ready for review October 8, 2019 16:50

KevinHock mentioned this pull request Oct 8, 2019

High Entropy string matchers have false positives on uuids. #250

Closed

KevinHock reviewed Oct 8, 2019

detect_secrets/plugins/common/filters.py

KevinHock reviewed Oct 8, 2019

Ignore UUID values in high-entropy plugin

0115efd

KevinHock reviewed Oct 9, 2019

detect_secrets/plugins/common/filters.py (outdated)

Make some filter regexes case-insensitive

b402f51

KevinHock reviewed Oct 9, 2019

Adjust filter regex usage per Yelp#245 comments

ece342b

KevinHock reviewed Oct 11, 2019

detect_secrets/plugins/base.py (outdated)

Refactor secret filtering to be a instance method

488334f

OiCMudkips force-pushed the url_safe_base64 branch from 0a34639 to 488334f Compare October 11, 2019 22:08

KevinHock approved these changes Oct 24, 2019

OiCMudkips merged commit ab56422 into Yelp:master Oct 24, 2019

killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request May 28, 2020

create image for dss (Yelp#245)

39b40d4

killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Jul 9, 2020

create image for dss (Yelp#245)

6d8a307
