Bugfix of Yaml exception with simple quotes #414

syn-4ck · 2021-03-01T16:32:50Z

The following exception has been raised during a Yaml file scan:

The YAML file has the following content:

example:
  test:
    username: user
    password: 'my password'

To fix the bug, I just remove the ? in the following keyword plugin line:

OPTIONAL_WHITESPACE = r'\s*?'

I think that the ? in this regex is not necessary because the * matches from 0 to n times.

Also, in our experience, it is very common for developers to use passwords with whitespaces. Currently, detect-secrets doesn't report the secrets if the value has been defined between quotes and report the first string in other cases. I suggest the change of SECRET = r'[^\s]+' regex to SECRET = r'[^\r\n]+'.

In closing, I remove the FALSE_POSITIVES list to clear the code, I think it is unused and it has been replaced by the filters.

domanchi · 2021-03-01T16:43:16Z

@syn-4ck , what version of detect-secrets are you running? I would think that YAMLTransformer would handle this better.

syn-4ck · 2021-03-01T16:45:18Z

Hi @domanchi, I detect the bug in the v1.0.3.

domanchi · 2021-03-01T23:46:38Z

Ok, I had a chance to look this over. Couple of comments:

You're absolutely right about the FALSE_POSITIVES list. I had missed migrating https://github.com/Yelp/detect-secrets/blob/v0.14.3/detect_secrets/plugins/keyword.py#L352 in the new version (not by design), so it's technically not currently used.

Do you have thoughts on whether detect-secrets should continue to ship with a hardcoded list of false positives? On one hand, this list was generated through hours of sifting through raw data; but on the other hand, I hypothesize that it's pretty environment specific.
The modification from SECRET = r'[^\s]+' regex to SECRET = r'[^\r\n]+' makes sense to me. Nice catch.
I was successfully able to reproduce the bug you demonstrated (thank you for the minimal proof of concept), but I was unable to reproduce the fix on my end. I'll need to explore this in more depth.

syn-4ck · 2021-03-02T08:31:27Z

Thanks for checking the changes. I would like to answer your comments below:

I think the best option in this case is use the filters for a global context. In my opinion, if a finding is considered false positive by detect-secrets in the keyword plugin, for consistency it not be reported by any plugin. For this specific use case I think that the filters detect_secrets.filters.regex.should_exclude_secret and detect_secrets.filters.wordlist.should_exclude_secret can be a good approach. However, if you prefer save this denylist in the plugin until migrate it, I can change the PR and not delete it.
Nice, thank you!
I have tested again on the branch to merge with Debian and Windows and it works for me. Below I attach a new screenshot in the Linux machine.:

I hope this comment helps you @domanchi.

pablosnt · 2021-03-02T08:56:14Z

Hi @domanchi and @syn-4ck, I was successfully able to reproduce the bug and the fix in Microsoft Windows and MacOS, next you can see the evidence in the second one:

I hope this to be useful in your discussion and thank you @syn-4ck for the fix!

syn-4ck · 2021-03-02T09:41:41Z

Thanks for your feedback @pablosantiagolopez!

domanchi · 2021-03-02T14:36:00Z

Ah, I see it now. You need both the SECRET change, and the OPTIONAL_WHITESPACE change. I also realize that the ? is not necessary for the whitespace, since we want it to be greedy (and consume any and all whitespace it finds).

Yes, let's not remove the FALSE_POSITIVES set yet (even better, just add a comment on it saying that it's scheduled for deprecation), and we can merge this. I have a lightweight ML model in the pipeline that replaces it quite nicely.

syn-4ck · 2021-03-02T15:22:33Z

Hi @domanchi! I have just uploaded the changes, now the FALSE_POSITiVES list is commented for a future deprecation.

In effect, you need both changes to correct it. I look forward to more comments or to the merge.

domanchi · 2021-03-02T15:29:54Z

There is a test failing, but is not related to this branch. Will fix on master.

domanchi · 2021-03-02T15:33:59Z

In a quick test of this new SECRET regex, it looks like it's too permissive.

Secret:      1 of 24
Filename:    README.md
Secret Type: Secret Keyword
----------
478:```
479:
480:Or you can specify multiple regex rules as such:
481:
482:```bash
483:$ detect-secrets scan --exclude-lines 'password = >> blah' --exclude-lines 'password = fake' <<
484:```
485:
486:#### --exclude-files
487:
488:Sometimes, you want to be able to ignore certain files in your scan. You can specify a regex

where >> and << indicate the highlighted portion.

We will need to change this to exclude quote marks.

domanchi · 2021-03-02T15:49:10Z

In further comparisons (within this repo alone), it looks like it does have better signal, but with some trade-offs. For example:

Catches secrets with spaces (good)
Catches too many secrets? (maybe good)

Secret:      15 of 24
Filename:    docs/audit.md
Secret Type: Secret Keyword
----------
11:## Manually Labelling Secrets
12:
13:```bash
14:$ detect-secrets scan test_data > .secrets.baseline
15:$ detect-secrets audit .secrets.baseline
16:Secret:      >> 1 of 80 <<
17:Filename:    test_data/baseline.file
18:Secret Type: Secret Keyword
19:----------
20:59:    }
21:60:  ],
----------
Is this a secret that should be committed to this repository? (y)es, (n)o, (s)kip, (b)ack, (q)uit:

Catches random things (bad)

Secret:      24 of 24
Filename:    docs/filters.md
Secret Type: Secret Keyword
----------
126:`/path/to/file::function_name`. For example:
127:
128:```bash
129:$ cat custom_filter.py
130:def is_invalid_secret(secret: str) -> bool:
131:    return secret =>>= <<'invalid'
132:
133:$ detect-secrets scan --filter custom_filter.py::is_invalid_secret
134:```
135:
136:You can also provide an import path to the filter function you desire:
----------
Is this a secret that should be committed to this repository? (y)es, (n)o, (s)kip, (b)ack, (q)uit:

I think our previous assumption is a whitespace delineated secret, but it's acting funky now that we're changing that assumption. 🤔

syn-4ck · 2021-03-02T16:27:05Z

We are currently checking this.

In the first one, in the line Secret: >> 1 of 80 << I think the detect-secrets behaviour is correct, it's only a false positive.
In the second one, the line return secret == 'invalid' should be detected by detect-secrets, because a developer can compare a variable with a hardcoded secret. Obviously, mark = in the audit it isn't correct, but maybe it's a good idea to develop a regex for this behaviour. This regex has been implemented in Keyword detector optimization #396, but we still working in it.

Do you detect more "random things"? :(

domanchi · 2021-03-02T16:41:31Z

compare variable with hardcoded secret

That's a good point. I guess the only bad thing here is that there's no ascii characters in this (if we assume that secrets should have some ascii letters in it). Another way we might be able to address this is through filters, and write a filter that ensures the secret meets this bar.

However, I remember that y'all were saying how you found that there were several real cases you found where secrets were just blank strings. How does this work in this case?

syn-4ck · 2021-03-02T17:22:12Z

That's a good point. I guess the only bad thing here is that there's no ascii characters in this (if we assume that secrets should have some ascii letters in it). Another way we might be able to address this is through filters, and write a filter that ensures the secret meets this bar.

We are totally agree with you! We consider do a filter or change the SECRET regex to check that the finding has at least 1 character a-z, A-Z or 0-9, for example. This can improve this situations and report less false positives.

However, I remember that y'all were saying how you found that there were several real cases you found where secrets were just blank strings. How does this work in this case?

In this version with the fix, detect-secrets doesn't report empty passwords (like String password = "" or password= ). In future developments, we will include this in the tool. In our experience, although such secrets may sometimes appear, they are usually false positives, so we do not give them much priority.

* Catch more cases for IAM * Fix build

Bugfix of Yaml exception with simple quotes

35b829d

Add commented FALSE_POSITIVE list until migrate it

10e730a

domanchi approved these changes Mar 2, 2021

View reviewed changes

domanchi merged commit 37e9f30 into Yelp:master Mar 2, 2021

killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Mar 3, 2021

Catch more cases for IAM (Yelp#414)

467f9f1

* Catch more cases for IAM * Fix build

This was referenced Mar 4, 2021

Keyword regexes improvement #418

Merged

Bug auditing test_data/config.yaml with binary values #421

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bugfix of Yaml exception with simple quotes #414

Bugfix of Yaml exception with simple quotes #414

syn-4ck commented Mar 1, 2021

domanchi commented Mar 1, 2021

syn-4ck commented Mar 1, 2021

domanchi commented Mar 1, 2021

syn-4ck commented Mar 2, 2021

pablosnt commented Mar 2, 2021

syn-4ck commented Mar 2, 2021

domanchi commented Mar 2, 2021

syn-4ck commented Mar 2, 2021

domanchi commented Mar 2, 2021

domanchi commented Mar 2, 2021

domanchi commented Mar 2, 2021

syn-4ck commented Mar 2, 2021

domanchi commented Mar 2, 2021

syn-4ck commented Mar 2, 2021 •

edited

Loading

Bugfix of Yaml exception with simple quotes #414

Bugfix of Yaml exception with simple quotes #414

Conversation

syn-4ck commented Mar 1, 2021

domanchi commented Mar 1, 2021

syn-4ck commented Mar 1, 2021

domanchi commented Mar 1, 2021

syn-4ck commented Mar 2, 2021

pablosnt commented Mar 2, 2021

syn-4ck commented Mar 2, 2021

domanchi commented Mar 2, 2021

syn-4ck commented Mar 2, 2021

domanchi commented Mar 2, 2021

domanchi commented Mar 2, 2021

domanchi commented Mar 2, 2021

syn-4ck commented Mar 2, 2021

domanchi commented Mar 2, 2021

syn-4ck commented Mar 2, 2021 • edited Loading

syn-4ck commented Mar 2, 2021 •

edited

Loading