Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of !!binary in yaml files #202

Closed
KevinHock opened this issue Jun 25, 2019 · 6 comments · Fixed by #223
Closed

Improve handling of !!binary in yaml files #202

KevinHock opened this issue Jun 25, 2019 · 6 comments · Fixed by #223
Assignees

Comments

@KevinHock
Copy link
Collaborator

KevinHock commented Jun 25, 2019

There are various explicit tags in yaml https://yaml.org/spec/1.2/spec.html, after some testing I don't think we handle !!binary that well, for instance. Resulting in false-negatives. I will investigate more thoroughly and update this issue later.

@OiCMudkips
Copy link
Contributor

OiCMudkips commented Aug 2, 2019

It seems like there are two issues here. Consider this file:

some_binary_key: !!binary |
 R0lGODlhDAAMAIQAAP//9/X
 17unp5WZmZgAAAOfn515eXv
 Pz7Y6OjuDg4J+fn5OTk6enp
 56enmleECcgggoBADs=
some_oneline_binary_key: !!binary R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5OTk6enp56enmleECcgggoBADs=
some_str_key: !!str hahahaha
some_untagged_str_key: hahahaha
        
nested: 
    some_binary_key: !!binary R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5OTk6enp56enmleECcgggoBADs=

As of 220b37d our YamlFileParser gives this result to the high-entropy plugin:

{'nested': {'some_binary_key': "GIF89a\x0c\x00\x0c\x00\x84\x00\x00\xff\xff\xf7\xf5\xf5\xee\xe9\xe9\xe5fff\x00\x00\x00\xe7\xe7\xe7^^^\xf3\xf3\xed\x8e\x8e\x8e\xe0\xe0\xe0\x9f\x9f\x9f\x93\x93\x93\xa7\xa7\xa7\x9e\x9e\x9ei^\x10' \x82\n\x01\x00;"},
 'some_binary_key': "GIF89a\x0c\x00\x0c\x00\x84\x00\x00\xff\xff\xf7\xf5\xf5\xee\xe9\xe9\xe5fff\x00\x00\x00\xe7\xe7\xe7^^^\xf3\xf3\xed\x8e\x8e\x8e\xe0\xe0\xe0\x9f\x9f\x9f\x93\x93\x93\xa7\xa7\xa7\x9e\x9e\x9ei^\x10' \x82\n\x01\x00;",
 'some_oneline_binary_key': "GIF89a\x0c\x00\x0c\x00\x84\x00\x00\xff\xff\xf7\xf5\xf5\xee\xe9\xe9\xe5fff\x00\x00\x00\xe7\xe7\xe7^^^\xf3\xf3\xed\x8e\x8e\x8e\xe0\xe0\xe0\x9f\x9f\x9f\x93\x93\x93\xa7\xa7\xa7\x9e\x9e\x9ei^\x10' \x82\n\x01\x00;",
 'some_str_key': {'__line__': 7, '__value__': 'hahahaha'},
 'some_untagged_str_key': {'__line__': 8, '__value__': 'hahahaha'}}

The fist problem is that our parser doesn't output it in the __line__, __value__ dict format properly, causing detect_secrets to ignore it.

The second problem is that GIF89a\x0c\x00\x0c\x00\x84\x00\x00\xff\xff\xf7\xf5\xf5\xee\xe9\xe9\xe5fff\x00\x00\x00\xe7\xe7\xe7^^^\xf3\xf3\xed\x8e\x8e\x8e\xe0\xe0\xe0\x9f\x9f\x9f\x93\x93\x93\xa7\xa7\xa7\x9e\x9e\x9ei^\x10' \x82\n\x01\x00; probably isn't the form in which we want to scan it.

Side note: !!str doesn't seem to affect anything.

TODO: How does the Yaml parser handle other tags

@OiCMudkips
Copy link
Contributor

!!set also doesn't get parsed into the __line__, __value__ format:

my_set: !!set
    ? abc
    ? def
    ? ghi

is parsed as {'my_set': set(['abc', 'def', 'ghi'])}.

However, I'm inclined not to care, since it seems really unlikely that anyone would put secrets into a set. On the other hand, it might be easy to support this, would need to investigate further

@OiCMudkips
Copy link
Contributor

We seem to handle !!omap perfectly find without modification, so that's cool.

@OiCMudkips
Copy link
Contributor

Having application-specific tags just break YAML parsing since we haven't registered the tag in the detect-secrets. We would need to allow endusers to register their own tags to support this.

@OiCMudkips
Copy link
Contributor

OiCMudkips commented Aug 2, 2019

Our parser also doesn't parse multi-doc YAML files properly:

a: secret
---
b: secret
---     
c: secret

results in an error.

@OiCMudkips
Copy link
Contributor

@KevinHock and talked about this offline. We discovered the following.

Binary secrets, once we remove the \x from the string, have high hex entropy. For example, this is a common secret format in Python files: \xab\x1e\x77\xbb (not a real secret). Without the \x this would be ab1e77bb which is a high-entropy string (we made the high-entropy determination just by looking at the string, not with any math).

This is a opposed to the above GIF data in my first comment, which with its \xs removed looks like this: GIF89a0c000c00840000fffff7f5f5eee9e9e5fff000000e7e7e7^^^f3f3ed8e8e8ee0e0e09f9f9f939393a7a7a79e9e9ei^10' 82\n0100;. This has a lot more repeated characters (0, f, 8, 9) and is lower-entropy. Of course, this isn't a pure hex string, but we can fix this by converting the characters not in \x?? format to \x?? format before removing the \x.

In addition, there's some denylisting opportunities. For example, clearly the above is a negative because it starts with GIF (or \x47\x73\x70) and ends with an ;. JPG files start with \xff\xd8 and end with \xff\xd9, etc.

killuazhu pushed a commit to IBM/detect-secrets that referenced this issue May 28, 2020
Supports git-defenders/detect-secrets-discuss#203
killuazhu pushed a commit to IBM/detect-secrets that referenced this issue Jul 9, 2020
Supports git-defenders/detect-secrets-discuss#203
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants