Add filtering of english words from entropy (and keyword) plugins #241

Merged
merged 26 commits into from
Sep 24, 2019
d93e9bd
:mortar_board: Standardize `do not` -> `don't`
KevinHock Sep 19, 2019
5b66a26
:snake: Refactor awkward [0] to a short-circuit
KevinHock Sep 19, 2019
bb543c5
:bug: We were doing e.g. '.svg' == 'svg'
KevinHock Sep 19, 2019
f10cce8
:snake: Refactor `for...if...return` with `any`
KevinHock Sep 19, 2019
bc4f922
:snake: Remove is_false_positive from RegexBasedDetector
KevinHock Sep 19, 2019
0b2c0e1
:mortar_board: Standardize comments
KevinHock Sep 19, 2019
f8cb31f
:tada: Add --word-list option
KevinHock Sep 19, 2019
7bdf06f
:100: Coverage for jwt.py
KevinHock Sep 19, 2019
e4494ea
:snake: Separate not :100: covered files
KevinHock Sep 19, 2019
d3c9583
:snake: Make automaton case-insensitive
KevinHock Sep 19, 2019
9e3669d
:snake: Slightly better condition for is_verified = True
KevinHock Sep 19, 2019
7ce4b85
:telescope: Don't put words less than 4 chars in automaton
KevinHock Sep 20, 2019
de7fbd2
:100: Coverage for high_entropy_strings.py
KevinHock Sep 20, 2019
139c64a
:snake: Refactor `audit --display-results` code
KevinHock Sep 21, 2019
6c6e028
:mortar_board: Use plural variable name for list
KevinHock Sep 21, 2019
977c4fb
:tada: Add verification for Mailchimp API keys
KevinHock Sep 21, 2019
9cabbe0
:tada: Add verification for Stripe secret API keys
KevinHock Sep 21, 2019
20d1921
:100: Coverage for baseline.py initialize.py usage.py
KevinHock Sep 21, 2019
433b75e
:snake: Add stats to `audit --display-results` code
KevinHock Sep 23, 2019
a096600
:performing_arts: Add more to `IGNORED_FILE_EXTENSIONS`
KevinHock Sep 23, 2019
ef5d000
:bug: Fix auditing a baseline with MailChimp in it
KevinHock Sep 23, 2019
a493356
:bug: Fix TypeError thrown when Yaml was all comments
KevinHock Sep 23, 2019
171eeca
:bug: Fix scanning files that don't exist
KevinHock Sep 23, 2019
69a3aac
:snake: Use hash.update to be more succinct
KevinHock Sep 24, 2019
19d22da
:snake: Inline leading . in constants.py
KevinHock Sep 24, 2019
b440623
:snake: Remove useless WordListSupportedDetector class
KevinHock Sep 24, 2019
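
Two of the commits above describe how the word filter treats its input: the automaton is case-insensitive (d3c9583) and words shorter than 4 characters are left out (7ce4b85). The automaton code itself is not part of this section, so the following is only a rough sketch of those two rules, with a plain substring scan swapped in for the automaton. `MIN_WORD_LENGTH`, `build_word_set` and `contains_listed_word` are hypothetical names, not identifiers from the PR.

```python
MIN_WORD_LENGTH = 4  # assumption, read off the commit message "less than 4 chars"


def build_word_set(words):
    """Lowercase the words and drop those shorter than MIN_WORD_LENGTH."""
    return {
        word.strip().lower()
        for word in words
        if len(word.strip()) >= MIN_WORD_LENGTH
    }


def contains_listed_word(candidate, word_set):
    """Return True if any listed word occurs inside the candidate string.

    A naive substring scan stands in for the automaton used by the PR.
    """
    lowered = candidate.lower()
    return any(word in lowered for word in word_set)
```

With a word list of `['the', 'Example']`, only `example` would be kept, so `contains_listed_word('MyEXAMPLEkey', ...)` matches regardless of case while a short word such as `the` never filters anything.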
4 changes: 2 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -275,7 +275,7 @@ This includes using `# pragma: allowlist secret` now for inline allowlisting.

#### :telescope: Accuracy

- [Added `null` to the `FALSE_POSITIVES` tuple for the `KeywordDetector` plugin, so we do not alert off of it](https://github.com/Yelp/detect-secrets/commit/58df82ce37d64f22cb885960c2031b5f8ebe4b75)
- [Added `null` to the `FALSE_POSITIVES` tuple for the `KeywordDetector` plugin, so we don't alert off of it](https://github.com/Yelp/detect-secrets/commit/58df82ce37d64f22cb885960c2031b5f8ebe4b75)



@@ -286,7 +286,7 @@ This includes using `# pragma: allowlist secret` now for inline allowlisting.

- Turned the `KeywordDetector` plugin back on, with new regexes and accuracy improvements ([#86])
- Added an `AWSAccessKeyDetector` plugin ([#100])
- Added the ability to scan `.ini` types files that do not have a header ([#106])
- Added the ability to scan `.ini` types files that don't have a header ([#106])

[#86]: https://github.com/Yelp/detect-secrets/pull/86
[#100]: https://github.com/Yelp/detect-secrets/pull/100
6 changes: 3 additions & 3 deletions LICENSE
@@ -100,21 +100,21 @@ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
excluding those notices that don't pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
within such NOTICE file, excluding those notices that don't
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
don't modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
2 changes: 1 addition & 1 deletion README.md
@@ -178,7 +178,7 @@ committing secrets.
### Things that won't be prevented

* Multi-line secrets
* Default passwords that do not trigger the `KeywordDetector` (e.g. `login = "hunter2"`)
* Default passwords that don't trigger the `KeywordDetector` (e.g. `login = "hunter2"`)

### Plugin Configuration

94 changes: 69 additions & 25 deletions detect_secrets/core/audit.py
@@ -41,19 +41,34 @@ class RedundantComparisonError(Exception):


AUDIT_RESULT_TO_STRING = {
True: 'positive',
False: 'negative',
None: 'unknown',
True: 'true-positives',
False: 'false-positives',
None: 'unknowns',
}

EMPTY_PLUGIN_AUDIT_RESULT = {
'results': {
'positive': [],
'negative': [],
'unknown': [],
'true-positives': defaultdict(list),
'false-positives': defaultdict(list),
'unknowns': defaultdict(list),
},
'config': {},
}
EMPTY_STATS_RESULT = {
'signal': 0,
'true-positives': {
'count': 0,
'files': defaultdict(int),
},
'false-positives': {
'count': 0,
'files': defaultdict(int),
},
'unknowns': {
'count': 0,
'files': defaultdict(int),
},
}


def audit_baseline(baseline_filename):
@@ -208,12 +223,30 @@ def determine_audit_results(baseline, baseline_path):
Given a baseline which has been audited, returns
a dictionary describing the results of each plugin in the following form:
{
"results": {
"plugins": {
"plugin_name1": {
"results": {
"positive": [list of secrets with is_secret: true caught by this plugin],
"negative": [list of secrets with is_secret: false caught by this plugin],
"unknown": [list of secrets with no is_secret entry caught by this plugin]
"true-positives": [
list of {
filename: {
'line': '...',
'plaintext':'...',
}
} for secrets with `is_secret: true` caught by this plugin],
"false-positives": [
list of {
filename: {
'line': '...',
'plaintext':'...',
}
} for secrets with `is_secret: false` caught by this plugin],
"unknowns": [
list of {
filename: {
'line': '...',
'plaintext':'...',
}
} for secrets with no `is_secret` entry caught by this plugin]
},
"config": {configuration used for the plugin}
},
@@ -228,34 +261,49 @@ def determine_audit_results(baseline, baseline_path):
all_secrets = _secret_generator(baseline)

audit_results = {
'results': defaultdict(lambda: deepcopy(EMPTY_PLUGIN_AUDIT_RESULT)),
'plugins': defaultdict(lambda: deepcopy(EMPTY_PLUGIN_AUDIT_RESULT)),
'stats': deepcopy(EMPTY_STATS_RESULT),
}

secret_type_to_plugin_name = get_mapping_from_secret_type_to_class_name()

total = 0
for filename, secret in all_secrets:
file_contents = _open_file_with_cache(filename)

secret_info = {}
secret_info['line'] = _get_file_line(filename, secret['line_number'])
try:
secret_plaintext = get_raw_secret_value(
secret_info['plaintext'] = get_raw_secret_value(
secret=secret,
plugin_settings=baseline['plugins_used'],
file_handle=io.StringIO(file_contents),
filename=filename,
)
except SecretNotFoundOnSpecifiedLineError:
secret_plaintext = _get_file_line(filename, secret['line_number'])
secret_info['plaintext'] = None

plugin_name = secret_type_to_plugin_name[secret['type']]
audit_result = AUDIT_RESULT_TO_STRING[secret.get('is_secret')]
audit_results['results'][plugin_name]['results'][audit_result].append(secret_plaintext)
audit_results['plugins'][plugin_name]['results'][audit_result][filename].append(secret_info)

audit_results['stats'][audit_result]['count'] += 1
audit_results['stats'][audit_result]['files'][filename] += 1
total += 1
audit_results['stats']['signal'] = str(
(
audit_results['stats']['true-positives']['count']
/
total
) * 100,
)[:4] + '%'

for plugin_config in baseline['plugins_used']:
plugin_name = plugin_config['name']
if plugin_name not in audit_results['results']:
if plugin_name not in audit_results['plugins']:
continue

audit_results['results'][plugin_name]['config'].update(plugin_config)
audit_results['plugins'][plugin_name]['config'].update(plugin_config)

git_repo_path = os.path.dirname(os.path.abspath(baseline_path))
git_sha = get_git_sha(git_repo_path)
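
The hunk above adds a `stats` section that counts results per category and per file, then derives a `signal` percentage from the true-positive count. A minimal, standalone sketch of that bookkeeping follows. It is not the PR's code: `tally`, `EMPTY_STATS` and `LABELS` are hypothetical names, the input is simplified to `(filename, is_secret)` pairs, and the zero-total guard and one-decimal formatting are choices of this sketch (the diff divides by `total` directly and truncates the string form to four characters).

```python
from collections import defaultdict
from copy import deepcopy

# Template mirroring the shape of EMPTY_STATS_RESULT in the diff (minus 'signal').
EMPTY_STATS = {
    'true-positives': {'count': 0, 'files': defaultdict(int)},
    'false-positives': {'count': 0, 'files': defaultdict(int)},
    'unknowns': {'count': 0, 'files': defaultdict(int)},
}
LABELS = {True: 'true-positives', False: 'false-positives', None: 'unknowns'}


def tally(secrets):
    """Count audited secrets.

    secrets: iterable of (filename, is_secret) pairs; is_secret may be None.
    """
    stats = deepcopy(EMPTY_STATS)  # copy so the module-level template is never mutated
    total = 0
    for filename, is_secret in secrets:
        bucket = stats[LABELS[is_secret]]
        bucket['count'] += 1
        bucket['files'][filename] += 1
        total += 1

    # Assumption of this sketch: report 'n/a' instead of dividing by zero
    # when the baseline holds no secrets.
    if total:
        ratio = stats['true-positives']['count'] / total
        stats['signal'] = '{:.1f}%'.format(ratio * 100)
    else:
        stats['signal'] = 'n/a'
    return stats
```

For example, two true positives out of four audited secrets would give a `signal` of `50.0%`. The `deepcopy` matters because the template holds mutable `defaultdict` objects that would otherwise be shared between calls.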
@@ -657,13 +705,9 @@ def get_raw_secret_value(

plugin_secrets = plugin.analyze(file_handle, filename)

matching_secret = [
plugin_secret.secret_value
for plugin_secret in plugin_secrets
if plugin_secret.secret_hash == secret['hashed_secret']
]

if not matching_secret:
raise SecretNotFoundOnSpecifiedLineError(secret['line_number'])
# Return value of matching secret
for plugin_secret in plugin_secrets:
if plugin_secret.secret_hash == secret['hashed_secret']:
return plugin_secret.secret_value

return matching_secret[0]
raise SecretNotFoundOnSpecifiedLineError(secret['line_number'])
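
The refactor above (commit 5b66a26) replaces "build a list, then take `[0]`" with a loop that returns on the first match and raises if nothing matched. A small sketch of the same pattern in isolation, using hypothetical stand-ins (`SecretNotFoundError`, `first_matching_value`, and plain tuples instead of plugin secret objects):

```python
class SecretNotFoundError(Exception):
    """Hypothetical stand-in for SecretNotFoundOnSpecifiedLineError."""


def first_matching_value(candidates, wanted_hash):
    """Return the value of the first candidate whose hash matches.

    candidates: iterable of (secret_hash, secret_value) pairs.
    """
    for secret_hash, secret_value in candidates:
        if secret_hash == wanted_hash:
            return secret_value  # short-circuit: later candidates are never inspected
    raise SecretNotFoundError(wanted_hash)
```

Compared with the list comprehension, this version stops iterating at the first hit and avoids building an intermediate list, while keeping the same "raise when nothing matches" behavior.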
44 changes: 30 additions & 14 deletions detect_secrets/core/baseline.py
@@ -18,17 +18,27 @@ def initialize(
plugins,
exclude_files_regex=None,
exclude_lines_regex=None,
word_list_file=None,
word_list_hash=None,
should_scan_all_files=False,
):
"""Scans the entire codebase for secrets, and returns a
SecretsCollection object.

:type path: list

:type plugins: tuple of detect_secrets.plugins.base.BasePlugin
:param plugins: rules to initialize the SecretsCollection with.

:type exclude_files_regex: str|None
:type exclude_lines_regex: str|None
:type path: list

:type word_list_file: str|None
:param word_list_file: optional word list file for ignoring certain words.

:type word_list_hash: str|None
:param word_list_hash: optional iterated sha1 hash of the words in the word list.

:type should_scan_all_files: bool

:rtype: SecretsCollection
@@ -37,17 +47,21 @@ def initialize(
plugins,
exclude_files=exclude_files_regex,
exclude_lines=exclude_lines_regex,
word_list_file=word_list_file,
word_list_hash=word_list_hash,
)

files_to_scan = []
for element in path:
if os.path.isdir(element):
if should_scan_all_files:
files_to_scan.extend(_get_files_recursively(element))
files_to_scan.extend(
_get_files_recursively(element),
)
else:
files = _get_git_tracked_files(element)
if files:
files_to_scan.extend(files)
files_to_scan.extend(
_get_git_tracked_files(element),
)
elif os.path.isfile(element):
files_to_scan.append(element)
else:
@@ -65,7 +79,7 @@ def initialize(
files_to_scan,
)

for file in files_to_scan:
for file in sorted(files_to_scan):
Contributor: Why?

Collaborator (Author): No real reason, just looked nicer when I was running it on things.

output.scan_file(file)

return output
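
The `initialize` docstring in this file describes `word_list_hash` as an "iterated sha1 hash of the words in the word list", and commit 69a3aac mentions using `hash.update`. The hashing code is not shown in this section, so the following is only a guess at the general shape: one `hashlib.sha1` object fed word by word. `hash_word_list` is a hypothetical name, and whether the project strips, lowercases or filters words before hashing is an assumption.

```python
import hashlib


def hash_word_list(words):
    """Feed each non-empty word into a single sha1 object and return the hex digest."""
    sha1 = hashlib.sha1()
    for word in words:
        word = word.strip()  # assumption: surrounding whitespace is not significant
        if word:
            sha1.update(word.encode('utf-8'))
    return sha1.hexdigest()
```

Because `update` calls are cumulative, hashing `['foo', 'bar']` this way gives the same digest as hashing the concatenated bytes `b'foobar'` once, so the digest changes whenever the word list's contents or order change.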
@@ -267,6 +281,7 @@ def _get_git_tracked_files(rootdir='.'):
:rtype: set|None
:returns: filepaths to files which git currently tracks (locally)
"""
output = []
try:
with open(os.devnull, 'w') as fnull:
git_files = subprocess.check_output(
@@ -277,13 +292,13 @@ def _get_git_tracked_files(rootdir='.'):
],
stderr=fnull,
)

return set([
util.get_relative_path(rootdir, filename)
for filename in git_files.decode('utf-8').split()
])
for filename in git_files.decode('utf-8').split():
relative_path = util.get_relative_path_if_in_cwd(rootdir, filename)
if relative_path:
output.append(relative_path)
except subprocess.CalledProcessError:
return None
pass
return output


def _get_files_recursively(rootdir):
@@ -293,6 +308,7 @@ def _get_files_recursively(rootdir):
output = []
for root, _, files in os.walk(rootdir):
for filename in files:
output.append(util.get_relative_path(root, filename))

relative_path = util.get_relative_path_if_in_cwd(root, filename)
if relative_path:
output.append(relative_path)
return output
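
In the diff above, `_get_git_tracked_files` changes from returning `None` on a failed `git` call to returning an empty list, which is what lets `initialize` call `files_to_scan.extend(...)` without the earlier `if files:` check. A generic sketch of that pattern, assuming Python 3 and using a hypothetical helper name (`list_command_output`) rather than the project's function:

```python
import subprocess
import sys


def list_command_output(command):
    """Run command and return its whitespace-separated output, or [] on failure."""
    output = []
    try:
        raw = subprocess.check_output(command, stderr=subprocess.DEVNULL)
        output.extend(raw.decode('utf-8').split())
    except subprocess.CalledProcessError:
        pass  # non-zero exit status: fall through and return the empty list
    return output


# The caller can extend unconditionally, because the result is always a list.
files_to_scan = []
files_to_scan.extend(
    list_command_output([sys.executable, '-c', 'import sys; sys.exit(1)']),
)
```

Like the diff, this sketch splits on whitespace, so filenames containing spaces would be broken apart.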
2 changes: 1 addition & 1 deletion detect_secrets/core/bidirectional_iterator.py
@@ -1,7 +1,7 @@
class BidirectionalIterator(object):
def __init__(self, collection):
self.collection = collection
self.index = -1 # starts on -1, as index is increased _before_ getting result
self.index = -1 # Starts on -1, as index is increased _before_ getting result
self.step_back_once = False

def __next__(self):
54 changes: 31 additions & 23 deletions detect_secrets/core/constants.py
@@ -7,29 +7,37 @@
# and look for "ASCII text", but that might be more expensive.
#
# Definitely something to look into, if this list gets unruly long.
IGNORED_FILE_EXTENSIONS = {
'7z',
'bmp',
'bz2',
'dmg',
'exe',
'gif',
'gz',
'ico',
'jar',
'jpg',
'jpeg',
'png',
'rar',
'realm',
's7z',
'svg',
'tar',
'tif',
'tiff',
'webp',
'zip',
}
IGNORED_FILE_EXTENSIONS = set(
(
'.7z',
'.bmp',
'.bz2',
'.dmg',
'.eot',
'.exe',
'.gif',
'.gz',
'.ico',
'.jar',
'.jpg',
'.jpeg',
'.mo',
'.png',
'.rar',
'.realm',
'.s7z',
'.svg',
'.tar',
'.tif',
'.tiff',
'.ttf',
'.webp',
'.woff',
'.xls',
'.xlsx',
'.zip',
),
)


class VerifiedResult(Enum):
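
The leading dots added to `IGNORED_FILE_EXTENSIONS` go with commit bb543c5 (":bug: We were doing e.g. '.svg' == 'svg'"). The call site is not shown in this section; assuming the extension is obtained with something like `os.path.splitext`, which returns the extension including its leading dot, the mismatch and the fix can be sketched as follows. `IGNORED` and `is_ignored` are hypothetical names, and the lowercasing is an addition of this sketch.

```python
import os

IGNORED = {'.png', '.svg', '.zip'}  # small illustrative subset, with leading dots


def is_ignored(filename):
    """Return True if the file's extension (leading dot included) is in IGNORED."""
    _, extension = os.path.splitext(filename)
    # Assumption of this sketch: compare case-insensitively.
    return extension.lower() in IGNORED
```

With dotless entries such as `'svg'`, the extension `'.svg'` would never match; storing the dot in the set makes the membership test line up with what `os.path.splitext` returns.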