Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support url-safe base64 secrets #245

Merged
merged 12 commits into from
Oct 24, 2019
Merged

Conversation

OiCMudkips
Copy link
Contributor

This updates the base64 plugin to support url-safe plugins by just adding - and _ to the charset. However, this won't be merged until I add some additional functionality to reduce noise.

detect_secrets/plugins/high_entropy_strings.py Outdated Show resolved Hide resolved
@@ -1,5 +1,5 @@
credentials:
some_value_here: not_a_secret
some_value_here: not_secret
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this necessary b/c the entropy calculation with the new chars alerted on not_a_secret?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think we need to be too concerned though because we now have the wordlist filtering.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more concerned that, we'll have large diffs in baseline's when people update detect-secrets.

This isn't as concerning as changing a secret type like we did in #26, (where all old secrets were removed and re-added), but it is a little, especially if it reduces TP's to some extent. (We'll see what the data says though, I can't really say how it'll effect signal.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why will we have large diffs? A lot of new secrets?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all the e.g. not_a_secret potential secrets disappear from existing baselines, then there is a possibility we will have large diffs, in the case of FP's that's great, in the case of TP's that would be a regression visible to users. (We can't really say there are minimal regressions without data though.)

Victor Zhou added 7 commits October 7, 2019 14:01
This commit updates the base64 plugin to support url-safe
plugins by just adding - and _ to the charset.
We already check for whitelists in `ignored_lines =
parser.get_ignored_lines()` call above, so calling
`analyze_string` wastes time with the duplicated check.
We already check for whitelists in the IniFileParser,
so doing another whitelist check here is redundant.
This renaming more accurately reflects what the function
does in all the plugins (at the moment) and more clearly
distinguishes it from `analyze_string_content`

# NOTE: this doesn't handle key-values on a line properly.
# NOTE: words that end in "id" will be treated as ids
_ID_DETECTOR_REGEX = re.compile(r'[iI][dD][^A-Za-z0-9]')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to do _id, we'll see what the data says though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on whether we want to ignore keys like BusinessId. I think at Yelp this isn't likely but it's probably more likely in camelCase language repos.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, we do have a lot of python biases.

Comment on lines 118 to 122
# py2+py3 compatible way of copying a list
functions = list(DEFAULT_FALSE_POSITIVE_HEURISTICS)
functions.append(is_potential_uuid)

if is_false_positive(result, self.automaton, functions=functions):
Copy link
Collaborator

@KevinHock KevinHock Oct 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your thoughts on passing additional_heuristics instead? I'm not sure when we would want to call is_false_positive without the defaults (main motivation is prettifying though)

Copy link
Contributor Author

@OiCMudkips OiCMudkips Oct 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking about moving is_false_positive to be a method in BasePlugin and then make subclass re-implement it. This would allow us to override the filters used on a plugin-level (suggested in #250), but also set some reasonable defaults. In addition we can include the heuristics used in the configs for the plugins in baselines.

i.e. in code

class BasePlugin():
    def __init__(self, false_positive_heuristics=None):
        self.false_positive_heuristics = false_positive_heuristics if false_positive_heuristics else []

    def is_false_positive(self, potential_secret):
         return any(func(potential_secret) for func in self.false_positive_heuristics)

    def get_config(self):
         # include the fp heuristics used if applicable


class Plugin(BasePlugin):
    def __init__(self, false_positive_heuristics=DEFAULT_HEURSTICS_FOR_PLUGIN):  # I remember the default list in Python function constructor, I'll fix it in real code :)
        super(Plugin, self).__init__(false_positive_heuristics)

    def analyze_string_content(self, string):
        for potential_secret in self.secret_generator(string):
            if self.is_false_positive(string):
                continue

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great to me 🎈

I'm only unsure of the In addition we can include the heuristics used in the configs for the plugins in baselines. part, as I'm kind of okay with leaving that part blind to the user. (There are also the lesser possible objections someone could say that diffs in baselines should be minimal, and I'm not sure how we would say which heuristics each plugin used in a DRY way.)

@OiCMudkips OiCMudkips merged commit ab56422 into Yelp:master Oct 24, 2019
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request May 28, 2020
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Jul 9, 2020
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Sep 17, 2020
use correct docker setting (Yelp#246)

Use escape sequence to replace clear (Yelp#247)

Build docker images for DSS client (Yelp#248)

Build on tag push (Yelp#249)

Publish to Artifactory (Yelp#250)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants