New keywords in the denylist #430

pablosnt · 2021-03-15T12:15:25Z

This Pull Request includes new items in the DENYLIST variable of the KeywordDetector plugin. Moreover, It also removes the following words:

aws_secret_access_key: it will be detected due to the keyword secret. The aws_ section will be captured as a prefix and _access_key as a suffix.
secrete: it will be detected due to the keyword secret and the character e will be captured as a suffix.

All the tests has been passed and I think the performance is not affected by this changes.

pablosnt · 2021-03-24T16:34:30Z

Hi @domanchi, what do you think about that? Thank you!

domanchi · 2021-04-12T22:05:38Z

From my new testing system, I found a good chunk more results with this change (2 unique results -> 11 unique results). From these additional results, I got only two false positives:

line: `data = JSON.stringify {service_key: pagerDutyServiceApiKey, event_type: "trigger", description: description}`
flag: `pagerDutyServiceApiKey, event_type: `

line: `$this->privateKey = new PrivateKey($keys['privatekey']);`
flag: `new PrivateKey($keys[`

Strangely, it was able to find auth_key but not auth_secret in this following example:

auth_key=836e3fdd362bdc7d
auth_secret=3ec93ccfbe14d941

Pretty good so far -- I wonder whether we should/can do anything about these FPs? They seem like regex issues, rather than candidates for additional filtering.

pablosnt · 2021-04-13T13:09:40Z

Hi @domanchi, I tested your lines and I get only the real secrets:

I also include the test file that I used to check your comment: test.txt.

Can you check if this test file is correct and can you test again if you get the same results? Thank you very much in advance.

I can understand the detection of these false positives because the presence of keywords like privateKey, however as can you see, I think that detect-secrets doesn't detect it. In the case of auth_secret we don't modify the secret keyword in the denylist, so I think that this line should be detected in the same way than in the master branch.

Thank you, we are waiting your feedback about this tests!

domanchi · 2021-04-13T14:47:57Z

Try without the line: prefix. That was just me trying to format the payloads to demonstrate what was the line, and what was found.

$ cat false-positives
data = JSON.stringify {service_key: pagerDutyServiceApiKey, event_type: "trigger", description: description}
$this->privateKey = new PrivateKey($keys['privatekey']);

pablosnt · 2021-04-13T15:35:44Z

Try without the line: prefix. That was just me trying to format the payloads to demonstrate what was the line, and what was found.
$ cat false-positives
data = JSON.stringify {service_key: pagerDutyServiceApiKey, event_type: "trigger", description: description}
$this->privateKey = new PrivateKey($keys['privatekey']);

Okey I will test it, thank you!

pablosnt · 2021-04-13T15:56:53Z

Okey, I tested it again and I got the same results:

In this case the test.txt content are:

data = JSON.stringify {service_key: pagerDutyServiceApiKey, event_type: "trigger", description: description}
$this->privateKey = new PrivateKey($keys['privatekey']);

auth_key=836e3fdd362bdc7d
auth_secret=3ec93ccfbe14d941

Are you sure that you are testing this version? Am I doing something wrong?

domanchi · 2021-04-13T16:13:57Z

Ohh! I think I know what's up. You might have the gibberish-detector installed, so it's filtering things out?

$ git status
On branch feature/keyword-denylist
Your branch is up to date with 'pablo/feature/keyword-denylist'.
$ cat blah.txt
data = JSON.stringify {service_key: pagerDutyServiceApiKey, event_type: "trigger", description: description}
$this->privateKey = new PrivateKey($keys['privatekey']);

$ python -m detect_secrets scan blah.txt
{
  "version": "1.0.3",
  "plugins_used": [
    {
      "name": "ArtifactoryDetector"
    },
    {
      "name": "AWSKeyDetector"
    },
    {
      "name": "AzureStorageKeyDetector"
    },
    {
      "name": "Base64HighEntropyString",
      "limit": 4.5
    },
    {
      "name": "BasicAuthDetector"
    },
    {
      "name": "CloudantDetector"
    },
    {
      "name": "HexHighEntropyString",
      "limit": 3.0
    },
    {
      "name": "IbmCloudIamDetector"
    },
    {
      "name": "IbmCosHmacDetector"
    },
    {
      "name": "JwtTokenDetector"
    },
    {
      "name": "KeywordDetector",
      "keyword_exclude": ""
    },
    {
      "name": "MailchimpDetector"
    },
    {
      "name": "NpmDetector"
    },
    {
      "name": "PrivateKeyDetector"
    },
    {
      "name": "SlackDetector"
    },
    {
      "name": "SoftlayerDetector"
    },
    {
      "name": "SquareOAuthDetector"
    },
    {
      "name": "StripeDetector"
    },
    {
      "name": "TwilioKeyDetector"
    }
  ],
  "filters_used": [
    {
      "path": "detect_secrets.filters.allowlist.is_line_allowlisted"
    },
    {
      "path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
      "min_level": 2
    },
    {
      "path": "detect_secrets.filters.heuristic.is_indirect_reference"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_likely_id_string"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_lock_file"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_potential_uuid"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_sequential_string"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_swagger_file"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_templated_secret"
    }
  ],
  "results": {
    "blah.txt": [
      {
        "type": "Secret Keyword",
        "filename": "blah.txt",
        "hashed_secret": "50a9328d2ab042de99987695dc49b43458228bf9",
        "is_verified": false,
        "line_number": 1
      },
      {
        "type": "Secret Keyword",
        "filename": "blah.txt",
        "hashed_secret": "ed2fc88365acf74b3229f52bb9d6dae24c3cc86d",
        "is_verified": false,
        "line_number": 2
      }
    ]
  },
  "generated_at": "2021-04-13T16:13:37Z"
}

pablosnt · 2021-04-13T17:00:14Z

Ohh you are right, the gibberish-detector was filtering my results. Now, I get the false positives. Do you have any approach to filter it?

I'm not sure about how to apply a filter to exclude this results because I think that this cases are very specific of some languages, but in the other cases won't make sense. The first idea that i had was filter the lines with JSON.stringify { (something more complex than this, but it's only a first idea) but I think that a real secret could be hardcoded in lines like these, so filter it can be dangerous.

In the PrivateKey case, I think that all the lines with a pattern like new <some letters>(<something>) could be filtered, but it maybe could be dangerous because there are cases like new String() or new StringBuilder() that could include real secrets.

If you have any better idea, please tell us. If not, can you give us some feedback about this?

Thank you very much!

domanchi · 2021-04-13T17:15:47Z

I think this is partially due to the introduction of spaces in a previous PR. After all, if we ignore the spaces, then a case could be made that this is behaving "as expected", since:

service_key: pagerDutyServiceApiKey

and

$this->privateKey = new

would have triggered the KeywordDetector (as I understand it). In this perspective, this issue lay dormant, mainly because our keyword list was short.

Perhaps we should invert our "default" logic: given that we want to support spaces by default in our "secret values", we should require quotes by default (and have other filetypes map to no quotes required, like .ini). I'll run the experiment with this to test the hypothesis, but in the meantime, thoughts?

pablosnt · 2021-04-13T17:50:27Z

I was thinking in the spaces problem and in these false positives, and I think that there is not easy solution for the PrivateKey case. The solution here, probably will require a big change like your proposal.

But in the case of the service_key, the problem is that we are scanning JSON content in one line into a file that it's not a JSON file. So I think that here the solution could be the exclusion of the , from the SECRET regex (currently is not allowed as the last secret character) since it is a common programming character. What do you think about this? I don't know how disruptive can it be or how often secrets include commas, but I think that it should work to avoid false positives like that.

If you think that it is not a valid solution, we can implement a filter to detect the secrets that includes a comma following by a white space (, ). I think that it will filter the most false positives like that, but it will require to check all the findings after the scan.

Please, share your opinion about this proposal @domanchi @syn-4ck .

domanchi · 2021-04-13T21:33:57Z

After looking at the experiment results, I think that this inversion makes a lot of sense. It reduced the total number of KeywordDetector unique results by 147 (out of 2631, ~5.5%), and almost all of the results that were eliminated were false positives. Some examples include:

scss:

.#{$fa-css-prefix}-user-secret:before { content: $fa-var-user-secret; }

coffee:

API_KEY = process.env.API_KEY

python:

def get_or_create_user(self, username, request=None, password=None):

rst:

.. _Bitbucket app passwords: https://bitbucket.org/account/admin/app-passwords

ipynb:

"os.environ['AWS_SECRET_ACCESS_KEY'] = creds.secret_key"

php:

final SecretKeySpec secret_key = new SecretKeySpec(asciiCs.encode(secret).array(), "HmacSHA256");

jsx:

const { ResetPasswordPage: componentState } = state.components;

Situations where it missed includes:

Script text: java --classpath=tools/mysql-connector-java.jar --driver=com.mysql.jdbc.Driver --username=$USERNAME --password=PROMPT updateSQL

This is what I propose:

Add a Filetype to determine configuration files. Seems like .cnf, .conf, .cfg, .cf or .ini are popular contenders.
For YAML and CONFIG files, require no quotes. Change the default to requiring quotes.

If you think about it, I think it makes sense. Many programming languages make a distinction between strings and variables, and we only want to flag the former.

pablosnt · 2021-04-14T13:14:20Z

Hi @domanchi, we discussed your proposal and it makes sense for us. We can implement it today, do you prefer an independent Pull Request for that? In this way, this Pull Request can be merged and the change in the default regex of the KeywordDetector can be tested and modified in the new one. Please, tell me your preference to contributing with this change. Thank you for your great idea!

domanchi · 2021-04-14T13:33:24Z

Sure, that sounds good.

Co-authored-by: detect-secrets-updater <detect-secrets-updater@ibm.com>

pablosnt added 2 commits March 15, 2021 13:07

New keywords in the denylist

0c79989

Pwd keyword

8a509d6

syn-4ck mentioned this pull request Apr 14, 2021

Keyword plugin: default regex modification #448

Merged

domanchi merged commit 16f6625 into Yelp:master Apr 14, 2021

killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request May 18, 2021

Bump to version 0.13.1+ibm.35.dss to fix vulnerabilities (Yelp#430)

804455a

Co-authored-by: detect-secrets-updater <detect-secrets-updater@ibm.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New keywords in the denylist #430

New keywords in the denylist #430

pablosnt commented Mar 15, 2021

pablosnt commented Mar 24, 2021

domanchi commented Apr 12, 2021 •

edited

Loading

pablosnt commented Apr 13, 2021

domanchi commented Apr 13, 2021

pablosnt commented Apr 13, 2021

pablosnt commented Apr 13, 2021

domanchi commented Apr 13, 2021

pablosnt commented Apr 13, 2021

domanchi commented Apr 13, 2021

pablosnt commented Apr 13, 2021

domanchi commented Apr 13, 2021

pablosnt commented Apr 14, 2021

domanchi commented Apr 14, 2021

New keywords in the denylist #430

New keywords in the denylist #430

Conversation

pablosnt commented Mar 15, 2021

pablosnt commented Mar 24, 2021

domanchi commented Apr 12, 2021 • edited Loading

pablosnt commented Apr 13, 2021

domanchi commented Apr 13, 2021

pablosnt commented Apr 13, 2021

pablosnt commented Apr 13, 2021

domanchi commented Apr 13, 2021

pablosnt commented Apr 13, 2021

domanchi commented Apr 13, 2021

pablosnt commented Apr 13, 2021

domanchi commented Apr 13, 2021

pablosnt commented Apr 14, 2021

domanchi commented Apr 14, 2021

domanchi commented Apr 12, 2021 •

edited

Loading