Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle un-scannable files more gracefully #141

Closed
KevinHock opened this issue Mar 19, 2019 · 5 comments
Closed

Handle un-scannable files more gracefully #141

KevinHock opened this issue Mar 19, 2019 · 5 comments

Comments

@KevinHock
Copy link
Collaborator

KevinHock commented Mar 19, 2019

Having a bunch of
INFO: Checking file: some_image.png
WARNING: some_image.png failed to load.
from

log.warning("%s failed to load.", filename)

is not ideal. We know we cannot scan certain files e.g. images, so we should behave more gracefully.

In detect-secrets-server we already have the IGNORED_FILE_EXTENSIONS tuple we made to skip files like this

IGNORED_FILE_EXTENSIONS = (
    '7z',
    'bmp',
    'bz2',
    'dmg',
    'exe',
    'gif',
    'gz',
    'ico',
    'jar',
    'jpg',
    'jpeg',
    'png',
    'rar',
    'realm',
    's7z',
    'tar',
    'tif',
    'tiff',
    'webp',
    'zip',
)

maybe we should move it to detect-secrets, change it to a dict, and use it.

@dgzlopes
Copy link
Contributor

Hello! I don't understand why the IGNORED_FILE_EXTENSIONS structure should change from a tuple to a dict. Could you offer some more insight on this?

On the other side, I made this quick fix dgzlopes@3828be9 on my fork (handling the check inside scan_file() instead of _extract_secrets_from_file()) and seems to work ok. Any idea on where to handle it more gently?

@KevinHock
Copy link
Collaborator Author

The reason why I would like to change it to a dict is because the only use is an in check, which is faster in a hash table (dictionary). It's less for performance reasons and more for knowing how it is used, from seeing the type.

@dgzlopes
Copy link
Contributor

Oh! I see. On the other hand (and sorry if this question is stupid) why we can't use a Set()?

They have both the same time complexity (O(1) for membership checking on the average case [0]) and both are implemented with a Hashtable. Also our file extensions values fit in one dimension [1], are unique and can be unordered. Also the Python docs [3] states that they are a good option for membership testing.

[0] https://wiki.python.org/moin/TimeComplexity#set
[1] https://stackoverflow.com/questions/19454970/is-there-a-python-dict-without-values
[2] https://docs.python.org/3/library/stdtypes.html#set

@KevinHock
Copy link
Collaborator Author

Oh, I wasn't clear, I did mean set.
e.g. replacing the ( with { and ) with }

A dictionary without keys is a set, so I kind of used them interchangeably.

@KevinHock
Copy link
Collaborator Author

Thanks @dgzlopes! 👍 🎈 🎉 🎂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants