Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix of Yaml exception with simple quotes #414

Merged
merged 2 commits into from
Mar 2, 2021
Merged

Bugfix of Yaml exception with simple quotes #414

merged 2 commits into from
Mar 2, 2021

Conversation

syn-4ck
Copy link
Contributor

@syn-4ck syn-4ck commented Mar 1, 2021

The following exception has been raised during a Yaml file scan:

evidence

The YAML file has the following content:

example:
  test:
    username: user
    password: 'my password'

To fix the bug, I just remove the ? in the following keyword plugin line:

OPTIONAL_WHITESPACE = r'\s*?'

I think that the ? in this regex is not necessary because the * matches from 0 to n times.

Also, in our experience, it is very common for developers to use passwords with whitespaces. Currently, detect-secrets doesn't report the secrets if the value has been defined between quotes and report the first string in other cases. I suggest the change of SECRET = r'[^\s]+' regex to SECRET = r'[^\r\n]+'.

fix

In closing, I remove the FALSE_POSITIVES list to clear the code, I think it is unused and it has been replaced by the filters.

@domanchi
Copy link
Contributor

domanchi commented Mar 1, 2021

@syn-4ck , what version of detect-secrets are you running? I would think that YAMLTransformer would handle this better.

@syn-4ck
Copy link
Contributor Author

syn-4ck commented Mar 1, 2021

Hi @domanchi, I detect the bug in the v1.0.3.

@domanchi
Copy link
Contributor

domanchi commented Mar 1, 2021

Ok, I had a chance to look this over. Couple of comments:

  1. You're absolutely right about the FALSE_POSITIVES list. I had missed migrating https://github.com/Yelp/detect-secrets/blob/v0.14.3/detect_secrets/plugins/keyword.py#L352 in the new version (not by design), so it's technically not currently used.

    Do you have thoughts on whether detect-secrets should continue to ship with a hardcoded list of false positives? On one hand, this list was generated through hours of sifting through raw data; but on the other hand, I hypothesize that it's pretty environment specific.

  2. The modification from SECRET = r'[^\s]+' regex to SECRET = r'[^\r\n]+' makes sense to me. Nice catch.

  3. I was successfully able to reproduce the bug you demonstrated (thank you for the minimal proof of concept), but I was unable to reproduce the fix on my end. I'll need to explore this in more depth.

@syn-4ck
Copy link
Contributor Author

syn-4ck commented Mar 2, 2021

Thanks for checking the changes. I would like to answer your comments below:

  1. I think the best option in this case is use the filters for a global context. In my opinion, if a finding is considered false positive by detect-secrets in the keyword plugin, for consistency it not be reported by any plugin. For this specific use case I think that the filters detect_secrets.filters.regex.should_exclude_secret and detect_secrets.filters.wordlist.should_exclude_secret can be a good approach. However, if you prefer save this denylist in the plugin until migrate it, I can change the PR and not delete it.
  2. Nice, thank you!
  3. I have tested again on the branch to merge with Debian and Windows and it works for me. Below I attach a new screenshot in the Linux machine.:

evidence2

I hope this comment helps you @domanchi.

@pablosnt
Copy link
Contributor

pablosnt commented Mar 2, 2021

Hi @domanchi and @syn-4ck, I was successfully able to reproduce the bug and the fix in Microsoft Windows and MacOS, next you can see the evidence in the second one:

yaml_fix_macos

I hope this to be useful in your discussion and thank you @syn-4ck for the fix!

@syn-4ck
Copy link
Contributor Author

syn-4ck commented Mar 2, 2021

Thanks for your feedback @pablosantiagolopez!

@domanchi
Copy link
Contributor

domanchi commented Mar 2, 2021

Ah, I see it now. You need both the SECRET change, and the OPTIONAL_WHITESPACE change. I also realize that the ? is not necessary for the whitespace, since we want it to be greedy (and consume any and all whitespace it finds).

Yes, let's not remove the FALSE_POSITIVES set yet (even better, just add a comment on it saying that it's scheduled for deprecation), and we can merge this. I have a lightweight ML model in the pipeline that replaces it quite nicely.

@syn-4ck
Copy link
Contributor Author

syn-4ck commented Mar 2, 2021

Hi @domanchi! I have just uploaded the changes, now the FALSE_POSITiVES list is commented for a future deprecation.

In effect, you need both changes to correct it. I look forward to more comments or to the merge.

@domanchi
Copy link
Contributor

domanchi commented Mar 2, 2021

There is a test failing, but is not related to this branch. Will fix on master.

@domanchi domanchi merged commit 37e9f30 into Yelp:master Mar 2, 2021
@domanchi
Copy link
Contributor

domanchi commented Mar 2, 2021

In a quick test of this new SECRET regex, it looks like it's too permissive.

Secret:      1 of 24
Filename:    README.md
Secret Type: Secret Keyword
----------
478:```
479:
480:Or you can specify multiple regex rules as such:
481:
482:```bash
483:$ detect-secrets scan --exclude-lines 'password = >> blah' --exclude-lines 'password = fake' <<
484:```
485:
486:#### --exclude-files
487:
488:Sometimes, you want to be able to ignore certain files in your scan. You can specify a regex

where >> and << indicate the highlighted portion.

We will need to change this to exclude quote marks.

@domanchi
Copy link
Contributor

domanchi commented Mar 2, 2021

In further comparisons (within this repo alone), it looks like it does have better signal, but with some trade-offs. For example:

Secret:      15 of 24
Filename:    docs/audit.md
Secret Type: Secret Keyword
----------
11:## Manually Labelling Secrets
12:
13:```bash
14:$ detect-secrets scan test_data > .secrets.baseline
15:$ detect-secrets audit .secrets.baseline
16:Secret:      >> 1 of 80 <<
17:Filename:    test_data/baseline.file
18:Secret Type: Secret Keyword
19:----------
20:59:    }
21:60:  ],
----------
Is this a secret that should be committed to this repository? (y)es, (n)o, (s)kip, (b)ack, (q)uit: 
  • Catches random things (bad)
Secret:      24 of 24
Filename:    docs/filters.md
Secret Type: Secret Keyword
----------
126:`/path/to/file::function_name`. For example:
127:
128:```bash
129:$ cat custom_filter.py
130:def is_invalid_secret(secret: str) -> bool:
131:    return secret =>>= <<'invalid'
132:
133:$ detect-secrets scan --filter custom_filter.py::is_invalid_secret
134:```
135:
136:You can also provide an import path to the filter function you desire:
----------
Is this a secret that should be committed to this repository? (y)es, (n)o, (s)kip, (b)ack, (q)uit: 

I think our previous assumption is a whitespace delineated secret, but it's acting funky now that we're changing that assumption. 🤔

@syn-4ck
Copy link
Contributor Author

syn-4ck commented Mar 2, 2021

We are currently checking this.

  • In the first one, in the line Secret: >> 1 of 80 << I think the detect-secrets behaviour is correct, it's only a false positive.

  • In the second one, the line return secret == 'invalid' should be detected by detect-secrets, because a developer can compare a variable with a hardcoded secret. Obviously, mark = in the audit it isn't correct, but maybe it's a good idea to develop a regex for this behaviour. This regex has been implemented in Keyword detector optimization #396, but we still working in it.

Do you detect more "random things"? :(

@domanchi
Copy link
Contributor

domanchi commented Mar 2, 2021

compare variable with hardcoded secret

That's a good point. I guess the only bad thing here is that there's no ascii characters in this (if we assume that secrets should have some ascii letters in it). Another way we might be able to address this is through filters, and write a filter that ensures the secret meets this bar.

However, I remember that y'all were saying how you found that there were several real cases you found where secrets were just blank strings. How does this work in this case?

@syn-4ck
Copy link
Contributor Author

syn-4ck commented Mar 2, 2021

That's a good point. I guess the only bad thing here is that there's no ascii characters in this (if we assume that secrets should have some ascii letters in it). Another way we might be able to address this is through filters, and write a filter that ensures the secret meets this bar.

We are totally agree with you! We consider do a filter or change the SECRET regex to check that the finding has at least 1 character a-z, A-Z or 0-9, for example. This can improve this situations and report less false positives.

However, I remember that y'all were saying how you found that there were several real cases you found where secrets were just blank strings. How does this work in this case?

In this version with the fix, detect-secrets doesn't report empty passwords (like String password = "" or password= ). In future developments, we will include this in the tool. In our experience, although such secrets may sometimes appear, they are usually false positives, so we do not give them much priority.

killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Mar 3, 2021
* Catch more cases for IAM

* Fix build
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants