Detect and ignore hashes #270

Closed
epage opened this issue Jun 1, 2021 · 2 comments
Labels
enhancement (Improve the expected), question (Uncertainty is involved)

Comments

@epage
Collaborator

epage commented Jun 1, 2021

For example, #269 has

"hash": "kqul2pqqzcpobrd6fh6ect7d3sxlvvks",

Though I wonder what other formats exist to detect
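To make the question of "other formats" concrete, here is a minimal Python sketch that checks a token against a few well-known shapes: hex digests of common lengths (32, 40, and 64 characters, as produced by MD5, SHA-1, and SHA-256), canonical UUIDs, and a 32-character base32-style string like the one quoted above. The names `CANDIDATE_FORMATS` and `classify` are hypothetical, and none of this is meant to describe how typos actually handles it.

```python
import re

# Hypothetical patterns for illustration only; not the rules typos uses.
CANDIDATE_FORMATS = {
    # Hex digests at the lengths produced by MD5 (32), SHA-1 (40), SHA-256 (64).
    "hex digest": re.compile(
        r"(?:[0-9a-f]{32}|[0-9a-f]{40}|[0-9a-f]{64})", re.IGNORECASE
    ),
    # Canonical 8-4-4-4-12 UUID text form.
    "uuid": re.compile(
        r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
        re.IGNORECASE,
    ),
    # Lowercase base32 alphabet (a-z, 2-7), fixed at 32 characters here.
    "base32-like": re.compile(r"[a-z2-7]{32}"),
}


def classify(token):
    """Return the name of the first format the whole token matches, else None."""
    for name, pattern in CANDIDATE_FORMATS.items():
        if pattern.fullmatch(token):
            return name
    return None


print(classify("kqul2pqqzcpobrd6fh6ect7d3sxlvvks"))
print(classify("d41d8cd98f00b204e9800998ecf8427e"))
print(classify("spelling"))
```

Note that the base32-style pattern would also match any 32-letter lowercase word, so a real rule would likely need an extra signal, such as requiring at least one digit.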

@vergoh

vergoh commented Jun 10, 2021

Another example is pretty much any base64-encoded content stored in files as variable values. Such strings cause false positives rather easily. A simple way to test (in an empty directory):

# Write a base64-encoded copy of each file in /bin into the current directory
for f in /bin/*; do base64 "$f" > "${f##*/}"; done
typos .

I suspect it could help to have some maximum length for strings that are interpreted as words that get checked.
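A rough Python sketch of that idea, assuming words are runs of letters, digits, and underscores and that anything longer than a cutoff is simply skipped. The `MAX_WORD_LEN` value of 100 and the function name `words_to_check` are made up for illustration; they are not settings or functions from typos.

```python
import re

# Hypothetical cutoff; the right value is an open question.
MAX_WORD_LEN = 100

# Assumption: a "word" is a run of ASCII letters, digits, and underscores.
WORD_RE = re.compile(r"[A-Za-z0-9_]+")


def words_to_check(text, max_len=MAX_WORD_LEN):
    """Return the words in `text` that are short enough to spell-check."""
    return [word for word in WORD_RE.findall(text) if len(word) <= max_len]


blob = "A" * 120  # stands in for a long encoded string
print(words_to_check("short " + blob + " name"))
```

A cutoff like this only helps when the encoded data forms one long run. Base64 containing `+` or `/` is split into shorter pieces by this tokenizer, and those pieces can slip under the limit.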

@epage
Collaborator Author

epage commented Jun 16, 2021

Challenges

  • UUIDs could look like var1 - var2 - var3 - var4
  • hashes could look like a variable name
  • base64 could look like var1 + var2 / var3 =
    • This is a similar challenge to detecting emails since @ can be an operator in some languages
  • If we ignore the separators, the pieces are indistinguishable from multiple variables, and detection can subtly break depending on the data (+ and / can easily change position, changing the length of the "words")
  • If we use length as a determining factor, we have to consider what the longest Java class name might be...
    • Most ridiculous class name lengths are in the mid 90s (InternalFrameInternalFrameTitlePaneInternalFrameTitlePaneMaximizeButtonPainter from the JDK and HasThisTypePatternTriedToSneakInSomeGenericOrParameterizedTypePatternMatchingStuffAnywhereVisitor from AspectJ)
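The separator problem in the list above can be seen with a small Python sketch. It assumes a simple tokenizer that treats runs of letters, digits, and underscores as identifiers; the function name `split_identifiers` and the sample strings are made up for illustration and do not reflect how typos tokenizes.

```python
import re

# Assumption: identifiers are runs of ASCII letters, digits, and underscores.
IDENT_RE = re.compile(r"[A-Za-z0-9_]+")


def split_identifiers(text):
    """Split text into identifier-like pieces, dropping everything else."""
    return IDENT_RE.findall(text)


uuid_like = "123e4567-e89b-12d3-a456-426614174000"
base64_like = "QmFzZTY0+ZW5jb2Rl/ZGF0YQ=="

print(split_identifiers(uuid_like))
# ['123e4567', 'e89b', '12d3', 'a456', '426614174000']
print(split_identifiers(base64_like))
# ['QmFzZTY0', 'ZW5jb2Rl', 'ZGF0YQ']
```

After splitting, nothing distinguishes these pieces from the operands of `a - b - c` or `a + b / c`, and the piece lengths depend entirely on where `+` and `/` happen to fall in the data.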

Hex values are the least likely to contain words and base64 is the most likely, so I assume that should be the priority.
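One way to see why the alphabets differ so much: hex only offers the letters a to f, while base64 includes every ASCII letter. The Python sketch below checks which words from a tiny, hand-picked sample can be spelled in each alphabet. The sample list and the helper name `spellable` are illustrative assumptions, not data from the issue.

```python
import string

HEX_ALPHABET = set(string.hexdigits)  # 0-9, a-f, A-F
BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/")


def spellable(word, alphabet):
    """True if every character of `word` is in `alphabet`."""
    return all(ch in alphabet for ch in word)


# Hand-picked sample, purely for illustration.
sample_words = ["decade", "facade", "typo", "spelling", "hash", "variable"]

hex_words = [w for w in sample_words if spellable(w, HEX_ALPHABET)]
base64_words = [w for w in sample_words if spellable(w, BASE64_ALPHABET)]

print(hex_words)     # ['decade', 'facade']
print(base64_words)  # all six words
```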

@epage epage closed this as completed in 23b6ad5 Jun 29, 2021