Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New keywords in the denylist #430

Merged
merged 2 commits into from
Apr 14, 2021
Merged

New keywords in the denylist #430

merged 2 commits into from
Apr 14, 2021

Conversation

pablosnt
Copy link
Contributor

This Pull Request includes new items in the DENYLIST variable of the KeywordDetector plugin. Moreover, It also removes the following words:

  • aws_secret_access_key: it will be detected due to the keyword secret. The aws_ section will be captured as a prefix and _access_key as a suffix.
  • secrete: it will be detected due to the keyword secret and the character e will be captured as a suffix.

All the tests has been passed and I think the performance is not affected by this changes.

@pablosnt
Copy link
Contributor Author

Hi @domanchi, what do you think about that? Thank you!

@domanchi
Copy link
Contributor

domanchi commented Apr 12, 2021

From my new testing system, I found a good chunk more results with this change (2 unique results -> 11 unique results). From these additional results, I got only two false positives:

line: `data = JSON.stringify {service_key: pagerDutyServiceApiKey, event_type: "trigger", description: description}`
flag: `pagerDutyServiceApiKey, event_type: `
line: `$this->privateKey = new PrivateKey($keys['privatekey']);`
flag: `new PrivateKey($keys[`

Strangely, it was able to find auth_key but not auth_secret in this following example:

auth_key=836e3fdd362bdc7d
auth_secret=3ec93ccfbe14d941

Pretty good so far -- I wonder whether we should/can do anything about these FPs? They seem like regex issues, rather than candidates for additional filtering.

@pablosnt
Copy link
Contributor Author

Hi @domanchi, I tested your lines and I get only the real secrets:

denylist_test

I also include the test file that I used to check your comment: test.txt.

Can you check if this test file is correct and can you test again if you get the same results? Thank you very much in advance.

I can understand the detection of these false positives because the presence of keywords like privateKey, however as can you see, I think that detect-secrets doesn't detect it. In the case of auth_secret we don't modify the secret keyword in the denylist, so I think that this line should be detected in the same way than in the master branch.

Thank you, we are waiting your feedback about this tests!

@domanchi
Copy link
Contributor

Try without the line: prefix. That was just me trying to format the payloads to demonstrate what was the line, and what was found.

$ cat false-positives
data = JSON.stringify {service_key: pagerDutyServiceApiKey, event_type: "trigger", description: description}
$this->privateKey = new PrivateKey($keys['privatekey']);

@pablosnt
Copy link
Contributor Author

Try without the line: prefix. That was just me trying to format the payloads to demonstrate what was the line, and what was found.

$ cat false-positives
data = JSON.stringify {service_key: pagerDutyServiceApiKey, event_type: "trigger", description: description}
$this->privateKey = new PrivateKey($keys['privatekey']);

Okey I will test it, thank you!

@pablosnt
Copy link
Contributor Author

Okey, I tested it again and I got the same results:
test

In this case the test.txt content are:

data = JSON.stringify {service_key: pagerDutyServiceApiKey, event_type: "trigger", description: description}
$this->privateKey = new PrivateKey($keys['privatekey']);

auth_key=836e3fdd362bdc7d
auth_secret=3ec93ccfbe14d941

Are you sure that you are testing this version? Am I doing something wrong?

@domanchi
Copy link
Contributor

Ohh! I think I know what's up. You might have the gibberish-detector installed, so it's filtering things out?

$ git status
On branch feature/keyword-denylist
Your branch is up to date with 'pablo/feature/keyword-denylist'.
$ cat blah.txt
data = JSON.stringify {service_key: pagerDutyServiceApiKey, event_type: "trigger", description: description}
$this->privateKey = new PrivateKey($keys['privatekey']);

$ python -m detect_secrets scan blah.txt
{
  "version": "1.0.3",
  "plugins_used": [
    {
      "name": "ArtifactoryDetector"
    },
    {
      "name": "AWSKeyDetector"
    },
    {
      "name": "AzureStorageKeyDetector"
    },
    {
      "name": "Base64HighEntropyString",
      "limit": 4.5
    },
    {
      "name": "BasicAuthDetector"
    },
    {
      "name": "CloudantDetector"
    },
    {
      "name": "HexHighEntropyString",
      "limit": 3.0
    },
    {
      "name": "IbmCloudIamDetector"
    },
    {
      "name": "IbmCosHmacDetector"
    },
    {
      "name": "JwtTokenDetector"
    },
    {
      "name": "KeywordDetector",
      "keyword_exclude": ""
    },
    {
      "name": "MailchimpDetector"
    },
    {
      "name": "NpmDetector"
    },
    {
      "name": "PrivateKeyDetector"
    },
    {
      "name": "SlackDetector"
    },
    {
      "name": "SoftlayerDetector"
    },
    {
      "name": "SquareOAuthDetector"
    },
    {
      "name": "StripeDetector"
    },
    {
      "name": "TwilioKeyDetector"
    }
  ],
  "filters_used": [
    {
      "path": "detect_secrets.filters.allowlist.is_line_allowlisted"
    },
    {
      "path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
      "min_level": 2
    },
    {
      "path": "detect_secrets.filters.heuristic.is_indirect_reference"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_likely_id_string"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_lock_file"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_potential_uuid"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_sequential_string"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_swagger_file"
    },
    {
      "path": "detect_secrets.filters.heuristic.is_templated_secret"
    }
  ],
  "results": {
    "blah.txt": [
      {
        "type": "Secret Keyword",
        "filename": "blah.txt",
        "hashed_secret": "50a9328d2ab042de99987695dc49b43458228bf9",
        "is_verified": false,
        "line_number": 1
      },
      {
        "type": "Secret Keyword",
        "filename": "blah.txt",
        "hashed_secret": "ed2fc88365acf74b3229f52bb9d6dae24c3cc86d",
        "is_verified": false,
        "line_number": 2
      }
    ]
  },
  "generated_at": "2021-04-13T16:13:37Z"
}

@pablosnt
Copy link
Contributor Author

Ohh you are right, the gibberish-detector was filtering my results. Now, I get the false positives. Do you have any approach to filter it?

I'm not sure about how to apply a filter to exclude this results because I think that this cases are very specific of some languages, but in the other cases won't make sense. The first idea that i had was filter the lines with JSON.stringify { (something more complex than this, but it's only a first idea) but I think that a real secret could be hardcoded in lines like these, so filter it can be dangerous.

In the PrivateKey case, I think that all the lines with a pattern like new <some letters>(<something>) could be filtered, but it maybe could be dangerous because there are cases like new String() or new StringBuilder() that could include real secrets.

If you have any better idea, please tell us. If not, can you give us some feedback about this?

Thank you very much!

@domanchi
Copy link
Contributor

I think this is partially due to the introduction of spaces in a previous PR. After all, if we ignore the spaces, then a case could be made that this is behaving "as expected", since:

service_key: pagerDutyServiceApiKey

and

$this->privateKey = new

would have triggered the KeywordDetector (as I understand it). In this perspective, this issue lay dormant, mainly because our keyword list was short.

Perhaps we should invert our "default" logic: given that we want to support spaces by default in our "secret values", we should require quotes by default (and have other filetypes map to no quotes required, like .ini). I'll run the experiment with this to test the hypothesis, but in the meantime, thoughts?

@pablosnt
Copy link
Contributor Author

I was thinking in the spaces problem and in these false positives, and I think that there is not easy solution for the PrivateKey case. The solution here, probably will require a big change like your proposal.

But in the case of the service_key, the problem is that we are scanning JSON content in one line into a file that it's not a JSON file. So I think that here the solution could be the exclusion of the , from the SECRET regex (currently is not allowed as the last secret character) since it is a common programming character. What do you think about this? I don't know how disruptive can it be or how often secrets include commas, but I think that it should work to avoid false positives like that.

If you think that it is not a valid solution, we can implement a filter to detect the secrets that includes a comma following by a white space (, ). I think that it will filter the most false positives like that, but it will require to check all the findings after the scan.

Please, share your opinion about this proposal @domanchi @syn-4ck .

@domanchi
Copy link
Contributor

After looking at the experiment results, I think that this inversion makes a lot of sense. It reduced the total number of KeywordDetector unique results by 147 (out of 2631, ~5.5%), and almost all of the results that were eliminated were false positives. Some examples include:

scss:

.#{$fa-css-prefix}-user-secret:before { content: $fa-var-user-secret; }

coffee:

API_KEY = process.env.API_KEY

python:

def get_or_create_user(self, username, request=None, password=None):

rst:

.. _Bitbucket app passwords: https://bitbucket.org/account/admin/app-passwords

ipynb:

"os.environ['AWS_SECRET_ACCESS_KEY'] = creds.secret_key"

php:

final SecretKeySpec secret_key = new SecretKeySpec(asciiCs.encode(secret).array(), "HmacSHA256");

jsx:

const { ResetPasswordPage: componentState } = state.components;

Situations where it missed includes:

Script text: java --classpath=tools/mysql-connector-java.jar --driver=com.mysql.jdbc.Driver --username=$USERNAME --password=PROMPT updateSQL

This is what I propose:

  1. Add a Filetype to determine configuration files. Seems like .cnf, .conf, .cfg, .cf or .ini are popular contenders.
  2. For YAML and CONFIG files, require no quotes. Change the default to requiring quotes.

If you think about it, I think it makes sense. Many programming languages make a distinction between strings and variables, and we only want to flag the former.

@pablosnt
Copy link
Contributor Author

Hi @domanchi, we discussed your proposal and it makes sense for us. We can implement it today, do you prefer an independent Pull Request for that? In this way, this Pull Request can be merged and the change in the default regex of the KeywordDetector can be tested and modified in the new one. Please, tell me your preference to contributing with this change. Thank you for your great idea!

@domanchi
Copy link
Contributor

Sure, that sounds good.

@domanchi domanchi merged commit 16f6625 into Yelp:master Apr 14, 2021
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request May 18, 2021
Co-authored-by: detect-secrets-updater <detect-secrets-updater@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants