Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bring-your own-plugins (BYOP), via --custom-plugins option #255

Closed
wants to merge 5 commits into from

Conversation

KevinHock
Copy link
Collaborator

@KevinHock KevinHock commented Oct 18, 2019

In case you wanted to make e.g. a new entropy plugin with different metacharacters, or a new keyword type detector, or a company-specific plugin, or use it/tune it privately before merging upstream.

To be used like so: detect-secrets scan --custom-plugins custom_plugins/ test_data/each_secret.py

We had a "truffleHog vs. metasploit" discussion internally. With the former, you can e.g. edit a config file with regexes in it like shhgit, with the latter you write your own modules.

Main reasons why this seems like a better fit for detect-secrets

  • More general purpose
  • We would still need verification functions written in Python if we put all our regexes in a config
  • This encourages writing tests

if parser.prog == 'detect-secrets-hook':
return parser

for action in parser._actions: # pragma: no cover (Always returns)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -20,6 +20,7 @@ class SecretsCollection(object):
def __init__(
self,
plugins=(),
custom_plugin_paths=[],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we switch this to None and then fill in the correct default value below? Re: the default-parameter-gets-re-used-in-Python feature.

I think the () above is fine because () are immutable.

# The difference will show whenever the word list changes
automaton, result.word_list_hash = build_automaton(result.word_list_file)

# In v0.12.8 the `--custom-plugins` option got added
result.custom_plugin_paths = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do result.custom_plugin_paths = data.get('custom_plugin_paths', [])?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, I caught this too after reviewing it again, reminded me of bloodhound code.

plugin_paths = ['detect_secrets/plugins'] + list(custom_plugin_paths)
plugin_modules = []
for plugin_path in plugin_paths:
for root, _, files in os.walk( # pragma: no cover (Always breaks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the rationale for os.walk vs os.listdir? Seems like with listdir we could reduce the code complexity here.

Nevermind, I see that it was already like that. You can fix it if you want though :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most definitely :)

action='append',
default=[],
dest='custom_plugin_paths',
help='Paths containing custom plugin Python files, not searched recursively.',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Directories containing ... clearer? Reading the code, it seems like things will break if we specify a file here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that makes sense, ++

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that it now works for directories and files.

@@ -243,6 +243,7 @@ def mock_env(self, user_inputs=None, baseline=None):
@property
def baseline(self):
return {
'custom_plugin_paths': (),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be lists so that it matches the actual interface?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! It was originally a tuple, so I must have missed a spot 👍

detect_secrets/plugins/common/util.py Outdated Show resolved Hide resolved


def _change_custom_plugin_paths_to_tuple(custom_plugin_paths_function):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we renamed this decorator to def tuplify (or whatever the canonical verb is) it would be a lot easier to understand. Alternatively, I'm not opposed to just putting it in the function body.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally prefer genericized naming too, but as written the argument has to be named custom_plugin_paths_function, otherwise it won't work, so I'm ambivalent. Maybe I can try to make it so the decorator tuplifies all arguments 🤔 , ++.

If we put it in the function body, I don't think we could lru_cache it, right?

Copy link
Contributor

@OiCMudkips OiCMudkips Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand why the parameter has to be named custom_plugin_paths_function. Can't we s/custom_plugin_paths_function/fn/g?

I don't know where the lru_cache is applied right now, so we can leave it as a decorator if it works as intended already.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*Ouch my bad @OiCMudkips, I mis-typed last Friday, the argument to the function being decorated has to be named custom_plugin_paths, as it's currently written. So it's not applicable to any function or any function with 1 arg, only 1 arg func's with custom_plugin_paths as an arg.

I will try to make it so that it can accept any 1 arg func, so we can name it a more genericized name though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: where the lru_cache is applied
I wrap functions that accept a list, to turn them into functions that accept a tuple, before the lru_cache decorator gets to them, so that they can be cached. Lists are unacceptable for immutability reasons IIRC

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up with this: the reason the arg must be named custom_plugin_paths, is because any function decorated by this decorator may be called with custom_plugin_paths as a keyword argument.

i.e. If I changed this to

    def wrapper_of_custom_plugin_paths_function(one_arg):
        return custom_plugin_paths_function(tuple(one_arg))

it would result in

============================================== ERRORS ===============================================
________________________________ ERROR collecting tests/main_test.py ________________________________
tests/main_test.py:76: in <module>
    class TestMain:
tests/main_test.py:294: in TestMain
    'name': 'Base64HighEntropyString',
tests/main_test.py:33: in get_list_of_plugins
    for name, plugin in import_plugins(custom_plugin_paths=[]).items():
E   TypeError: wrapper_of_custom_plugin_paths_function() got an unexpected keyword argument 'custom_plugin_paths'

tests/core/baseline_test.py Show resolved Hide resolved
@@ -78,10 +127,36 @@ def add_console_use_arguments(self):
return self

def parse_args(self, argv):
output = self.parser.parse_args(argv)
PluginOptions.consolidate_args(output)
# We temporarily remove '--help' so that we can give the full
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand this, can you explain?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do --help with --custom-plugins, you want to see the options for e.g. --no-custom-detector, if we just parse --help before --custom-plugins, it will SystemExit and give the help message without any options from the custom plugins.

detect_secrets/core/usage.py Outdated Show resolved Hide resolved
Comment on lines 79 to 89
# Dynamically imported classes have different
# addresses for the same functions as statically
# imported classes do, so issubclass does not work.
# We use __mro__ to loop through all parent classes.
# issubclass does work in Python 3, since parent classes
# do have the same addresses.
if not any(
'plugins.base.BasePlugin' in str(klass)
for klass in attr.__mro__
):
return False
Copy link
Collaborator Author

@KevinHock KevinHock Oct 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the issue explained in this comment, super calls in Python 2 for plugins no longer work!

e.g. the super call in

    @property
    def __dict__(self):
        output = super(HighEntropyStringsPlugin, self).__dict__
        output.update({
            'hex_limit': self.entropy_limit,
        })

        return output

will throw a super(type, obj): obj must be an instance or subtype of type error because it thinks self is not a subtype, when really it just has different addresses for its functions. I patched around the same issue with issubclass in this highlighted code, but with super there is not a desirable path forward that I can see, I shall ask on StackOverflow for advice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For posterity, I did look into this, although I time-boxed myself.

You can change the metaclass of BasePlugin to be e.g.

+class AlwaysTrue(ABCMeta):
+    def __instancecheck__(self, instance):
+    def __subclasscheck__(self, instance):

however, super still throws the above error, only isinstance and issubclass work.

In other words, I believe this is definitely blocked on dropping Python 2 support #260. Either that or have the user supply the import path as well as the file path, but I don't want to do that.

@killuazhu
Copy link
Contributor

To be used like so: detect-secrets scan --custom-plugins custom_plugins/ test_data/each_secret.py

In general it sounds like a good idea. Have we thought about how to deal with things like additional pip packages required by new plugins?

Another thing is it would be nice to include a helloworld or sample plugin in the sample_custom_plugins folder, so others can easily follow and start developing new plugins.

@KevinHock
Copy link
Collaborator Author

KevinHock commented Oct 31, 2019

Have we thought about how to deal with things like additional pip packages required by new plugins?

For the server-side case it's easy to include any packages you need next to e.g. detect-secrets( or -server). But for the pre-commit hook case, I don't think it's desirable to try to pip install packages dynamically when someone runs detect-secrets, since it's so unexpected.

Another thing is it would be nice to include a helloworld or sample plugin in the sample_custom_plugins folder, so others can easily follow and start developing new plugins.

I did include a couple custom plugins in the tests, but those are kind of too basic. Since we link to the plugins/ dir in the README right before we mention --custom-plugins I think most users will be able to look at that directory for inspiration.

We should also add a link to the right contributing.md section though](https://github.com/Yelp/detect-secrets/blob/master/CONTRIBUTING.md#writing-a-plugin), I can add that.

@killuazhu
Copy link
Contributor

I did include a couple custom plugins in the tests, but those are kind of too basic. Since we link to the plugins/ dir in the README right before we mention --custom-plugins I think most users will be able to look at that directory for inspiration.

I think this should work. Given some of the relative imports might be broken (e.g. example) depends on who user structure their custom import folders.

We should also add a link to the right contributing.md section though](https://github.com/Yelp/detect-secrets/blob/master/CONTRIBUTING.md#writing-a-plugin), I can add that.

👍

@KevinHock KevinHock force-pushed the byop_feature branch 2 times, most recently from 8498770 to 484970e Compare April 19, 2020 00:26
@KevinHock KevinHock removed the blocked label Apr 19, 2020
Comment on lines +21 to +22
assert str(plugin.__class__) == str(HexHighEntropyString)
assert dir(plugin.__class__) == dir(HexHighEntropyString)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to do

-        assert all(
-            func(plugin.__class__) == func(Base64HighEntropyString)
-            for func in (str, dir)
-        )
+        assert str(plugin.__class__) == str(Base64HighEntropyString)
+        assert dir(plugin.__class__) == dir(Base64HighEntropyString)

because otherwise it was 94% 21->exit, 71->exit

@KevinHock
Copy link
Collaborator Author

I rebased with master so this should be ready to review now @OiCMudkips / @killuazhu 😁

- 🐍 Add add_shared_arguments() function in usage.py for DRYness
- 🐛 Fix issue #242 via passing `should_verify_secrets=not args.no_verify` to `from_parser_builder` call
- 🐛 Fix sorting issue in format_baseline_for_output() where --update and regular scan had different secret order
- 💯 All non-separated out files again :D
- 🎓 Mention `--custom-plugins` in README
- 🎓 Standardize NOTE -> Note
- 🐛 Fix test pollution due to `all_plugins` cls attribute
- 🐍 Change all relative imports to absolute, to avoid broken imports if someone copies an existing plugin to make a custom plugin

🙈 Hacks located in `def parse_args` of usage.py
@KevinHock
Copy link
Collaborator Author

Polite re-ping on this one 💟 I have some fun funemployed time at the moment.

Copy link
Contributor

@OiCMudkips OiCMudkips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, hopefully you're holding up OK :)

It's been a while so I'll probably do one more pass tomorrow to make sure this is sane but here's some more for you to chew on for now.

@@ -283,13 +290,13 @@ def determine_audit_results(baseline, baseline_path):
audit_results['stats'][audit_result]['count'] += 1
audit_results['stats'][audit_result]['files'][filename] += 1
total += 1
audit_results['stats']['signal'] = str(
audit_results['stats']['signal'] = '{:.2f}'.format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The % go into this format string

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!! I no longer need the [:4] so that makes a lot of sense 🎊

detect_secrets/pre_commit_hook.py Show resolved Hide resolved
detect_secrets/plugins/aws.py Show resolved Hide resolved

from detect_secrets.plugins.base import BasePlugin
from detect_secrets.util import get_root_directory


def change_custom_plugin_paths_to_tuple(custom_plugin_paths_function):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems kinda excessive, can we just call tuple(argument) wherever we want to try to be safer?

One alternative is to make it so that the ArgumentParser returns a tuple by implementing a custom action but honestly I'd be OK with us just not worry about the accidentally mutating the list of paths.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for tuplifying is so that we can lru_cache the functions this decorator is used on, not b/c I'm afraid the list will get mutated.

I'll re-read the code to see if tuplifying as soon as we parse args (/the baseline) makes things easier. 👍 Convert to X as early as possible seems like always good advice, I'll have to see if there was a reason I wrote it this way. 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not appear there was a good reason 😁 I'll go ahead and do the custom action, something similar to https://stackoverflow.com/questions/50365835/how-to-make-argparse-argumentparser-return-a-tuple-or-an-np-array-rather-than-a

killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request May 28, 2020
* Fail build

* Support python 2

* One more 🐛

* Add no cov to avoid py2 code in py3 env
@KevinHock
Copy link
Collaborator Author

Made new PR due to change in GitHub permissions. So closing this one.

@KevinHock KevinHock closed this May 30, 2020
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Jul 9, 2020
* Fail build

* Support python 2

* One more 🐛

* Add no cov to avoid py2 code in py3 env
@KevinHock KevinHock deleted the byop_feature branch July 23, 2020 04:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants