False positive for short regular expressions #178

Closed
meanrin opened this issue Aug 8, 2022 · 5 comments

meanrin commented Aug 8, 2022

Many short regular expressions can match by chance inside hashes or base64-encoded lines.

Example:

https://github.com/tosca/web/blob/master/packages/EntityFramework.5.0.0/tools/EntityFramework.PS3.psd1#L172

$ python -m credsweeper --path EntityFramework.PS3.psd1 
rule: JSON Web Token / severity: medium / line_data_list: [line: '# UK3O3RhOJA/u0afRTK10MCAR6wfVVJUVSZQbQpKumFwwJtoAa+h7veyJBw/3DgSY' / line_num: 172 / path: EntityFramework.PS3.psd1 / value: 'eyJBw' / entropy_validation: False] / api_validation: NOT_AVAILABLE / ml_validation: VALIDATED_KEY

I'm 99.9% sure it's not really a JWT.

While this example is for JWT, the same can probably happen with other short regular expressions (e.g. Google API, which is based on the ya29. prefix).
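
To illustrate how such a match can come about, here is a minimal sketch. The pattern below is an assumed, simplified JWT-like regex (the "eyJ" prefix followed by base64url characters), not the actual CredSweeper rule. Run against the reported line, it picks up the same short value because the "/" character ends the base64url run.

```python
import re

# Assumed, simplified JWT-like pattern for illustration only;
# this is NOT the actual CredSweeper rule.
JWT_LIKE = re.compile(r"eyJ[A-Za-z0-9_-]+")

# Line 172 of the reported file, as shown in the scanner output above.
line = "# UK3O3RhOJA/u0afRTK10MCAR6wfVVJUVSZQbQpKumFwwJtoAa+h7veyJBw/3DgSY"

match = JWT_LIKE.search(line)
found = match.group(0) if match else None
print(found)  # eyJBw
```

The three-character prefix "eyJ" is short enough to show up by chance in long base64 data, and nothing in a pattern like this requires the match to start at a token boundary.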

For now I see a few options to solve it:

  1. Add a prefix check and reject a match when it has a lot of alphanumeric characters directly to its left. Example: =eyJBw... is ok, dasjkfseyJBw is bad
  2. Add a minimum value length for some regexes. For example, require a minimum JWT length

There may be other options. Please propose any you have.

I haven't tested this against the CredData metrics yet; these are just ideas.


csh519 commented Aug 8, 2022

Both options are good.
Regarding option 1, maybe a prefix filter can be added like below.

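filtering_prefix = [" ", "=", ":"]

# is_filtered is a hypothetical wrapper name so the snippet can run as-is.
# candidate_prefix is the character directly to the left of the matched value.
def is_filtered(candidate_prefix):
    if candidate_prefix in filtering_prefix:
        return True  # Filtered

    return False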
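
For reference, a minimal sketch of the left-boundary idea from option 1, following its example (=eyJBw... is ok, dasjkfseyJBw is bad). The pattern and the names `has_alnum_left` and `find_candidates` are hypothetical and are not CredSweeper's API. The sketch only checks the single character to the left of the match.

```python
import re

# Assumed, simplified JWT-like pattern; not the actual CredSweeper rule.
JWT_LIKE = re.compile(r"eyJ[A-Za-z0-9_-]+")


def has_alnum_left(line: str, start: int) -> bool:
    """Return True if the character directly left of `start` is alphanumeric."""
    return start > 0 and line[start - 1].isalnum()


def find_candidates(line: str) -> list:
    """Keep only matches that do not continue an alphanumeric run on the left."""
    return [
        m.group(0)
        for m in JWT_LIKE.finditer(line)
        if not has_alnum_left(line, m.start())
    ]


print(find_candidates("token=eyJBwabc"))  # kept: preceded by "="
print(find_candidates("dasjkfseyJBw"))    # rejected: preceded by "s"
```

With this check, the match in the psd1 line from the report would be rejected, since it is preceded by the letter "v".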

babenek mentioned this issue Aug 8, 2022

babenek commented Aug 8, 2022

ML decision = 0.546 for the example from http://calebb.net/
So a false negative is possible.


meanrin commented Aug 9, 2022

2. Add minimal value length for some regexes. For example require minimal JWT length

Adding a minimum length of 12 fixes the issue for the mentioned examples without reducing the metrics on CredData.
It can be implemented by adding ValueLengthCheck to get_pattern_base_filters.
I will open a PR today.
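
As a rough illustration of the minimum-length idea, here is a standalone sketch. The helper `is_too_short` and the constant name are hypothetical; this is not the ValueLengthCheck implementation or its signature. Only the threshold of 12 comes from the comment above.

```python
# Threshold taken from the comment above; the names here are hypothetical.
MIN_VALUE_LENGTH = 12


def is_too_short(value: str, min_len: int = MIN_VALUE_LENGTH) -> bool:
    """Return True if the matched value should be dropped for being too short."""
    return len(value) < min_len


print(is_too_short("eyJBw"))                 # True: 5 characters
print(is_too_short("eyJhbGciOiJIUzI1NiJ9"))  # False: 20 characters
```

The value 'eyJBw' from the report is dropped, while a typical encoded JWT header is well above the threshold.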


meanrin commented Aug 11, 2022

Other related PR: #184


meanrin commented Aug 11, 2022

I'll close this issue, as both ideas are already implemented (left alphanumeric character check and minimum length).

meanrin closed this as completed Aug 11, 2022