Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle binary values in YAML files #223

Merged
merged 4 commits into from
Sep 4, 2019
Merged

Conversation

OiCMudkips
Copy link
Contributor

@OiCMudkips OiCMudkips commented Aug 14, 2019

(WIP as of writing)

This is step (1) in supporting binary in both YAML and non-YAML files.

This makes it so that instead of immediately converting the base64-encoded binary into a binary value in python, we just interpret the binary as a normal string, but annotate it as such with the is_binary flag.

This is needed so that plugins can scan a different value from the value hashed into baselines.

Alternatives:

  • Convert the binary to the desired form discussed here in YamlFileParser and put the desired form into __value__
    • Why not?: We need the original base64-format in the plugin to hash into baselines. It's possible to retrieve it, but why do extra work?
  • Instead of __value__ and __is_binary__, have fields __scan_value__ and __plaintext_value__
    • Why not?: I think we're going to end up with a new method in the high-entropy plugin analyze_binary_content (in addition to analyze_string_content) which knows how to do the conversion to the "desired format", but that depends on us knowing that the string is a binary as opposed to some other special sauce, so I think the is_binary information is important.
    • Also my gut feeling says what I've written will be fine; I don't think there's any other data types we'll need to consider.

Closes #202


The next steps would be to make the high-entropy plugin actually use the is_binary information to do the conversion and output results correctly.

@OiCMudkips OiCMudkips self-assigned this Aug 14, 2019
Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good so far.

I like your plan, maybe it would be easier to do regular binary strings first, and then do yaml binary strings, but it's up to you.

@@ -75,7 +75,7 @@ def _tag_dict_values(self, map_node):
"""
new_values = []
for key, value in map_node.value:
if not value.tag.endswith(':str'):
if not value.tag.endswith(':str') and not value.tag.endswith(':binary'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: feel free to ignore
i feel like

if not (
    value.tag.endswith(':str')
    or
    value.tag.endswith(':binary')
):

is a little prettier ⚡️ 💟

@@ -92,6 +92,11 @@ def _tag_dict_values(self, map_node):
str(value.__line__),
'tag:yaml.org,2002:int',
),
self._create_key_value_pair_for_mapping_node_value(
'__is_binary__',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize the other calls of this don't do this, but maybe we can use keyword args to make things clearer, it helps with the other function calls in this file.

Victor Zhou added 3 commits August 21, 2019 14:24
This is step (1) in supporting binary in both YAML and non-
YAML files.

This makes it so that instead of immediately converting the
base64-encoded binary into a binary value in python, we just
interpret the binary as a normal string, but annotate it as
such with the `is_binary` flag.

This is needed so that plugins can scan a different value
from the value hashed into baselines.
This implements support for high-entropy secrets in binary
values in yaml files.

We encode the binary value into a hex- or base64-encoded
string (based on the plugin), and run the normal entropy
check. If the string is deemed to be high-entropy, we re
encode the string into a yaml binary (using `yaml.dump`)
and strip the `!!binary`. This yaml binary is considered
the secret, and is put into the baseline as normal.

I had to update a test function so that it uses a custom
hex high-entropy detector, since `HighEntropyStringsPlugin`
is now an abstract class.
To test this you would need to import an unused class
into the module so that it's in `globals()`, and have
the test know what class that is. Seems messy to me,
and not worth what it would be testing.
@OiCMudkips OiCMudkips changed the title WIP: Handle binary values in YAML files Handle binary values in YAML files Aug 29, 2019
@OiCMudkips
Copy link
Contributor Author

This is done. The general strategy is:

  1. Parse YAML binaries, so that they eventually become a bytes object b'something'. In Python 2, this will be the same as 'something' though, so we have an __is_binary__ field to tell us whether the value is actually a binary in the plugin.
  2. For any value with the is_binary field set to True, decode_binary it to turn it into a str and run it through the plugin's analyze_string function.
  3. If the value is a secret, create a PotentialSecret with the str. This str isn't what's actually in the file, so apply the reverse of decode_binary (which is encode_to_binary), yaml.dump it, then strip the !!binary. This will leave us with the secret encoded in the binary yaml format. Set this as the secret in the PotentialSecret object.

The complicated-ness of (3) is needed so that we can actually find the secret in the file afterwards using the existing audit functionality.

@KevinHock
Copy link
Collaborator

re: ‘HighEntropyStringsPlugin is now an abstract class’ in commit message, for the explanation of the new class in the test, I’m on my phone but I don’t see that it’s abstract.

@KevinHock
Copy link
Collaborator

KevinHock commented Aug 30, 2019

Never mind me lol, I see the ABCMeta metaclass.

I guess it was an abstract class before this PR and I read the message as this making it abstract.

Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

Note to self to check when I’m on desktop: I’m not sure how we were instantiating an abstract class in a test before, I should check I understand that.

@OiCMudkips
Copy link
Contributor Author

OiCMudkips commented Sep 4, 2019

KevinHock++ I think the test was able to instantiate an abstract class because we didn’t actually mark any methods as abstract, but once I added some abstract methods, the test started failing so I had to fix it.

@OiCMudkips OiCMudkips merged commit f996791 into Yelp:master Sep 4, 2019
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request May 28, 2020
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Jul 9, 2020
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Sep 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve handling of !!binary in yaml files
2 participants