Classify base64 encoded tokens as belonging to particular tools/services #158

justineyster · 2019-04-09T17:20:32Z

Raising this as a parallel issue to one I opened today in the IBM fork.

Context

@jribm raised a point while I was working on #156: under our current approach, base64-encoded strings won't be classified as belonging to a particular tool. They may be caught by the base64 entropy scanner, but without an association to a particular tool they cannot be verified.

Examples:
X-JFrog-Art-Api: <some-base64-encoded-string>
artifactory:_password: <some-base64-encoded-string>

In both of these examples, the surrounding structure (the header or key name) indicates that the key belongs to a particular service. However, the string itself won't match as an Artifactory key, because the encoded string doesn't follow the expected token format.

Given this issue, we could design a two-step approach where we base64 decode suspicious strings and check whether the decoded value matches a particular tool's token format. I can imagine at least two approaches for doing this:

1. Search for indicators of a particular tool's token (like the authentication header X-JFrog-Art-Api in the examples above), decode the suspicious string near that indicator, and test it against the regex for that service.
2. base64 decode strings that are caught by the base64 entropy scanner and test the decoded string against all of the other secret detectors.
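
To make the two options concrete, here is a minimal sketch of each in Python. Everything named here is an assumption for illustration: `SERVICE_PATTERNS`, `INDICATORS`, and the function names are hypothetical, the Artifactory regex is a placeholder rather than the project's actual pattern, and none of this reflects the detect-secrets plugin API.

```python
import base64
import binascii
import re

# Hypothetical token pattern, for illustration only; the real detector's
# regex for Artifactory may differ.
SERVICE_PATTERNS = {
    "artifactory": re.compile(r"^AKC[A-Za-z0-9]{10,}$"),
}

# Hypothetical indicators, built from the two examples in this issue.
INDICATORS = {
    "artifactory": re.compile(
        r"(?:X-JFrog-Art-Api|artifactory:_password):\s*(\S+)"
    ),
}


def try_b64decode(candidate):
    """Return the decoded text, or None if candidate is not base64 text."""
    try:
        return base64.b64decode(candidate, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None


def classify_near_indicator(line):
    """Approach 1: decode the value next to a known indicator and test it
    against that service's pattern only."""
    for service, indicator in INDICATORS.items():
        match = indicator.search(line)
        if not match:
            continue
        decoded = try_b64decode(match.group(1))
        if decoded and SERVICE_PATTERNS[service].match(decoded):
            return service
    return None


def classify_high_entropy_string(candidate):
    """Approach 2: decode a string already flagged by the entropy scanner
    and test the result against every known service pattern."""
    decoded = try_b64decode(candidate)
    if decoded is None:
        return None
    for service, pattern in SERVICE_PATTERNS.items():
        if pattern.match(decoded):
            return service
    return None
```

The first function only decodes values that sit next to a known indicator, so it does less work but depends on the indicator list being complete. The second decodes every flagged string, which covers more cases at the cost of running every pattern on each candidate.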

Subtasks & step(s)

Raise a parallel issue in https://github.com/Yelp/detect-secrets to gather feedback from upstream community.
Decide on general approach for decoding and testing suspicious strings.
Implement solution and merge in our codebase and upstream.

Success criteria

base64 encoded tokens will be classified as belonging to a particular service.

KevinHock · 2019-05-15T21:55:54Z

I'm kind of ambivalent. Which approach do you prefer?

I think 1. can be accomplished with something similar to the keyword detector.

For 2. what would your example base64 strings decode to?

KevinHock · 2019-06-24T21:51:48Z

Noting for posterity, and because we have verifiability now, GitHub API tokens are 40 chars, and can easily be verified via the oauth/scopes endpoint, though I am having a hard time finding the exact link to that API. I can say I hit it yesterday.

lorenzodb1 · 2024-05-09T17:17:03Z

We're going to close this issue as it hasn't received any update in a very long time. Feel free to re-open it if you think it's still relevant.

KevinHock added the enhancement label May 15, 2019

lorenzodb1 added the pending label ("The issue still needs to be reviewed by one of the maintainers.") and removed the enhancement label Jun 13, 2022

lorenzodb1 closed this as completed May 9, 2024
