Version 2.0.0 #84

adeptex · 2021-09-20T15:25:13Z

Release notes

❌ Breaking changes ❌

❌ File exclusion globs are now regex ❌

In version 1, the configuration file expected file exclusions to be specified as a list of globs. Whispers would resolve the included globs, resolve the excluded globs, and finally subtract one list from the other to get the applicable scope. The entire target directory tree was therefore traversed twice to compute the applicable files, which is a highly resource-intensive operation.

In version 2, file exclusions are specified as regex. Instead of resolving globs into lists up front, Whispers now consumes the glob generator directly: every file path it yields is checked on the fly against the file exclusion regex to decide whether the file should be excluded.

This significantly improves performance when the target directory contains a large number of files. In version 2 the tree is traversed file by file, and each file path is checked individually against a pre-compiled exclusion regex. This reduces the CPU, RAM, and time needed to scan directory trees of arbitrary size and depth.
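
To illustrate the idea, here is a minimal sketch of on-the-fly exclusion. It is not the actual Whispers implementation: the function names compile_exclusions and iter_scope are hypothetical, and searching the regex against the POSIX-style path string is an assumption.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Optional, Pattern


def compile_exclusions(patterns: Iterable[str]) -> Optional[Pattern[str]]:
    """Join exclusion patterns into one pre-compiled regex (None if empty)."""
    patterns = list(patterns)
    return re.compile("|".join(patterns)) if patterns else None


def iter_scope(root: Path, include_glob: str, exclude: Optional[Pattern[str]]) -> Iterator[Path]:
    """Lazily yield files under root, skipping excluded paths as they arrive."""
    for path in root.glob(include_glob):  # generator: the tree is walked once
        if not path.is_file():
            continue
        # Assumption: the regex is searched against the POSIX-style path string.
        if exclude and exclude.search(path.as_posix()):
            continue
        yield path
```

Because iter_scope is a generator, no intermediate list of included or excluded files is ever built; each path is tested once and then either yielded or dropped.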

❌ Rule specification format changes ❌

In version 1 the rules were defined as a dictionary with rule ID as the key and rule config as the value. This created awkward parsing practices and unintuitive code. For example:

npmrc: 
  description: Hardcoded .npmrc authToken
  message: .npmrc authToken
  severity: CRITICAL
  key:
    regex: ^npm authToken$
    ignorecase: False

In version 2 the rules are defined as a list of dictionaries. The rule ID now has its own id key inside the rule config definition. For example:

- id: npmrc
  description: Hardcoded .npmrc authToken
  message: .npmrc authToken
  severity: CRITICAL
  key:
    regex: ^npm authToken$
    ignorecase: False

If you have any custom rule definitions, you will have to adjust them when migrating to version 2.
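
As a rough sketch of that adjustment, the conversion from the version 1 shape to the version 2 shape can be expressed on already-loaded rule data. The helper name migrate_rules is hypothetical and is not part of Whispers; the rule content is the npmrc example from above.

```python
from typing import Any, Dict, List


def migrate_rules(v1_rules: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert {rule_id: config} (v1) into [{"id": rule_id, **config}] (v2)."""
    return [{"id": rule_id, **config} for rule_id, config in v1_rules.items()]


# The version 1 example from above, as it would look after YAML loading
v1 = {
    "npmrc": {
        "description": "Hardcoded .npmrc authToken",
        "message": ".npmrc authToken",
        "severity": "CRITICAL",
        "key": {"regex": "^npm authToken$", "ignorecase": False},
    }
}

v2 = migrate_rules(v1)
```

Dumping v2 back to YAML would give the list-of-dictionaries form shown in the version 2 example.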

❌ Removed support for dynamic languages ❌

In version 1 the following language files were parsed as text and checked for common variable declaration and assignment patterns:

JavaScript
Java
Go
PHP

It is not possible to parse these languages as Abstract Syntax Trees (ASTs) in Python. The initial attempt was to detect "low hanging fruit" by parsing the files as text instead. This led to poor functional coverage, as well as a potentially false sense of security.

In version 2, support for these dynamic languages is dropped. This made it possible to bring unit test coverage up to 100%, which helps ensure reliable results and real security coverage. For dynamic languages, it is recommended to rely on AST-based parsing to get reliable results. Check out Semgrep!

Python3 remains fully supported in Whispers 2.

🛠️ Non-breaking changes 🛠️

🛠️ Rule specification in config.yml 🛠️

You can now specify rules that you want to use directly in config.yml as a list. In addition, custom rules can be added directly to that list using the following format:

exclude:
  files:
    - \.npmrc
    - .*coded.*
    - \.git/.*
  keys:
    - SECRET_VALUE_KEY
  values:
    - SECRET_VALUE_PLACEHOLDER

rules:
  - password
  - privatekey
  - id: starks
    message: Whispers from the North
    severity: CRITICAL
    value:
      regex: (Aria|Ned) Stark
      ignorecase: True

If you don't specify any rules, all built-in rules will be used by default. If you do, only the ones you specify will apply.
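
The selection behaviour described above could be sketched as follows. This is an assumption-laden illustration rather than the actual Whispers code: resolve_rules and the shape of the builtin mapping are hypothetical.

```python
from typing import Any, Dict, List, Union

RuleSpec = Union[str, Dict[str, Any]]


def resolve_rules(configured: List[RuleSpec], builtin: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Expand rule IDs to built-in rules; pass inline dicts through as custom rules."""
    if not configured:
        # Nothing specified: fall back to every built-in rule
        return list(builtin.values())

    resolved = []
    for spec in configured:
        if isinstance(spec, str):
            if spec not in builtin:
                raise ValueError(f"Unknown rule ID: {spec}")
            resolved.append(builtin[spec])
        else:
            resolved.append(spec)  # inline custom rule definition
    return resolved
```

With the config above, the result would contain the built-in password and privatekey rules plus the custom starks rule, and nothing else.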

🛠️ Severity specification in config.yml 🛠️

You can now specify severity levels that you want to use directly in config.yml as a list:

exclude:
  files:
    - \.npmrc
    - .*coded.*
    - \.git/.*
  keys:
    - SECRET_VALUE_KEY
  values:
    - SECRET_VALUE_PLACEHOLDER

severity:
  - BLOCKER
  - CRITICAL

If you don't specify any severity levels, all built-in levels will be used by default: BLOCKER, CRITICAL, MAJOR, MINOR, INFO.
If you do, only the ones you specify will apply.

✅ New features ✅

No new features were introduced in this release. The primary objective was to optimize the existing logic in order to make it easier to read, understand, and work with in general. This refactoring, together with the breaking changes described above, has been shown to increase scanning speed by up to 7-10 times (depending on conditions) compared with version 1. In addition, it made it possible to achieve 100% unit test coverage.

The complete list of arguments can be found in whispers -h, along with documentation in README.md.

ocrawford555

@adeptex awesome changes. A lot to process in here so I've left an initial review and may give it another pass later on. Are there any particular areas you would like reviewing in more detail?

I will also try running locally and seeing if there are any bugs I can find and then link back to the code. Let me know if you need any clarifications on my comments! 💯

ocrawford555 · 2021-09-21T08:29:32Z

whispers/core/args.py

+    args_parser.add_argument("-i", "--info", action="store_true", help="show extended help and exit")
+    args_parser.add_argument("-c", "--config", help="config file")
+    args_parser.add_argument("-o", "--output", help="output file")
+    args_parser.add_argument("-e", "--exitcode", default=0, type=int, help="exit code on success")


Does the exitcode need to be configurable? Should it not always be 0 on success?

This feature was requested in #61

It appears that when chaining/piping scripts in Unix-type shells, it is possible that a wrong exit code gets propagated down the pipeline if one of the scripts fails. So having this configurable helps ensure that the step completed successfully with the expected exit code. For instance: https://unix.stackexchange.com/questions/584990/why-is-exit-code-0-even-though-the-command-is-wrong
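
A minimal sketch of how such an option could be wired up with argparse is shown below. The -e/--exitcode argument mirrors the diff above; the program name scan, the main function, and the placement of the scanning step are hypothetical.

```python
import argparse
import sys
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="scan")  # hypothetical program name
    parser.add_argument("-e", "--exitcode", default=0, type=int, help="exit code on success")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    # ... scanning would happen here; an unhandled error exits non-zero ...
    return args.exitcode  # only reached when the run completed


if __name__ == "__main__":
    sys.exit(main())
```

A pipeline step can then check for one specific, agreed-upon code instead of relying on 0.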

Huh, interesting.

whispers/core/args.py

whispers/core/config.py

ocrawford555 · 2021-09-21T10:37:09Z

whispers/core/pairs.py

+
+    logging.debug(f"Loaded plugin '{plugin}' for file '{file}'")
+
+    pairs = plugin().pairs(file)


All the plugins could inherit from a common base class, which might help with some of the Python typing and make extensibility even easier in the future.

For example, creating a BasePlugin class which has an abstract method pairs, alongside other common properties/functions. 😄
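
A sketch of what this suggestion might look like is given below. It is only an illustration: KeyValuePair here is a simplified stand-in for the project's class, and BasePlugin and NpmrcPlugin are hypothetical names.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator, List


@dataclass
class KeyValuePair:
    """Simplified stand-in for the project's KeyValuePair."""
    key: str
    value: str
    keypath: List[str] = field(default_factory=list)
    line: int = 0


class BasePlugin(ABC):
    """Common interface: every plugin turns a file into key-value pairs."""

    @abstractmethod
    def pairs(self, filepath: Path) -> Iterator[KeyValuePair]:
        """Yield key-value pairs found in the given file."""


class NpmrcPlugin(BasePlugin):
    def pairs(self, filepath: Path) -> Iterator[KeyValuePair]:
        with filepath.open() as handle:
            for lineno, line in enumerate(handle, 1):
                if ":_authToken=" not in line:
                    continue
                value = line.split(":_authToken=")[-1].strip()
                if value:
                    key = "npm authToken"
                    yield KeyValuePair(key, value, keypath=[key], line=lineno)
```

The abstract method gives type checkers a single signature to verify, while each plugin stays free to parse its file format however it needs to.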

Initially that was the idea. However, in practice not all plugins have a generic way of parsing files. The pairs method is the most generic common thing that unites all plugins - calling that method gives you a list of KeyValuePair objects.

Ok, maybe one for a future PR / contribution 😉

To summarize current cases:

Plugin parses file line by line with enumerate, so pair.line can be directly set by tracking lineno. For example - Dockerfile, .ini, .npmrc

whispers/whispers/plugins/npmrc.py

Lines 9 to 16 in 9f5f6da

for lineno, line in enumerate(filepath.open(), 1):
    if ":_authToken=" not in line:
        continue

    value = line.split(":_authToken=")[-1].strip()
    if value:
        key = "npm authToken"
        yield KeyValuePair(key, value, keypath=[key], line=lineno)

Plugin parses file as a dictionary, so pair.line has to be found later by reading the file as plaintext, if the pair is identified as a secret. For example - JSON, XML, YML

whispers/whispers/core/utils.py

Lines 192 to 216 in 9f5f6da

def find_line_number(pair: KeyValuePair) -> int:
    """Finds line number using pair keypath and value"""
    if pair.line:
        return pair.line  # Already set

    valuepath = pair.value.split("\n")[0][:16]
    findpath = [*pair.keypath, valuepath]
    foundline = 0

    for lineno, line in enumerate(Path(pair.file).open(), 1):
        founditems = 0

        for item in findpath:
            if item not in line:
                break

            founditems += 1
            foundline = lineno

        findpath = findpath[founditems:]

        if not findpath:
            return foundline

    return 0

Plugin parses file as AST, so pair.line can be directly set from available AST node properties. For example - Python

whispers/whispers/plugins/python.py

Lines 122 to 126 in 9f5f6da

elif isinstance(node, astroid.nodes.Keyword):
    key = self.node_to_str(node)
    value = self.node_to_str(node.value)
    if key and value and isinstance(value, (str, int)):
        yield KeyValuePair(key, value, keypath=[key], line=node.lineno)

whispers/core/scope.py

whispers/plugins/python.py

whispers/plugins/shell.py

tests/unit/test_pairs.py

whispers/core/rules.py

ocrawford555 · 2021-09-21T11:07:18Z

whispers/core/utils.py

+    if pair.line:
+        return pair.line  # Already set
+
+    valuepath = pair.value.split("\n")[0][:16]


Why 16 here? 😄 I'm sure there's good reason, but maybe an inline comment will help

This is for specific cases like the following

whispers/tests/fixtures/privatekeys.yml

Lines 6 to 10 in 9f5f6da

key: "\
-----BEGIN RSA KEY----- \
QyNTUxOQAAACCtrF27B/zd9DEpd38IbVBy93wSeYXKU0AGXMyO8ePu2QAAAKBSzpYEUs6W \
-----END RSA KEY-----\
"

When this is loaded by whispers, the key named key has the value:

-----BEGIN RSA KEY----- \nQyNTUxOQAAACCtrF27B/zd9DEpd38IbVBy93wSeYXKU0AGXMyO8ePu2QAAAKBSzpYEUs6W \n-----END RSA KEY-----

However, if you were to look for that string in the file, you would not find it, because in plaintext the multi-line value is formatted differently.

To correctly find the line number where this value begins, the value is first split on newlines to get -----BEGIN RSA KEY----- , and then only the first 16 characters of that are used when searching for the string in a line of plaintext.
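
The effect can be reproduced in isolation with the short sketch below. The helper name search_fragment and the in-memory list of lines are hypothetical and only stand in for reading the fixture file.

```python
def search_fragment(value: str, limit: int = 16) -> str:
    """Return the first line of a value, truncated to `limit` characters."""
    return value.split("\n")[0][:limit]


# Value as loaded from YAML: escaped line breaks are already resolved
loaded = (
    "-----BEGIN RSA KEY----- \n"
    "QyNTUxOQAAACCtrF27B/zd9DEpd38IbVBy93wSeYXKU0AGXMyO8ePu2QAAAKBSzpYEUs6W \n"
    "-----END RSA KEY-----"
)

# The same value as it appears in the plaintext file, one entry per line
plaintext_lines = [
    'key: "\\',
    "-----BEGIN RSA KEY----- \\",
    "QyNTUxOQAAACCtrF27B/zd9DEpd38IbVBy93wSeYXKU0AGXMyO8ePu2QAAAKBSzpYEUs6W \\",
    "-----END RSA KEY-----\\",
    '"',
]

fragment = search_fragment(loaded)
line_number = next(n for n, line in enumerate(plaintext_lines, 1) if fragment in line)
```

The full loaded value does not occur in any single plaintext line, but the short fragment does, which is what makes the lookup work.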

Co-authored-by: Oliver Crawford <16978487+ocrawford555@users.noreply.github.com>

Makefile

oscarbc96 · 2021-09-21T07:36:56Z

setup.py

 from setuptools import find_packages, setup


 def get_version():
    return import_module("whispers.__version__").__version__


-install_requires = ["luhn>=0.2.0", "lxml>=4.6.2", "pyyaml>=5.3.1", "astroid>=2.4.2", "jproperties>=2.1.0", "python-levenshtein>=0.12.0", "beautifulsoup4>=4.9.3"]
+install_requires = ["dataclasses", "luhn", "lxml", "pyyaml", "astroid", "jproperties", "python-levenshtein", "beautifulsoup4"]


Why are versions not restricted like the dev dependencies? 😅

This decision was based on the following: https://caremad.io/posts/2013/07/setup-vs-requirement/

In addition, dependabot only bumps requirements.txt, while setup.py remains outdated. Updating versions in setup.py requires manual review. Unless there is an alternative way of keeping dependencies updated that I am missing, this seems to be the most reasonable solution.

The issue is the way we use setup.py: dependabot/dependabot-core#1475
So either we stop using dependabot, or we change the way we manage dependencies.

However, I would push for having pinned dependencies in setup.py.

Makefile

oscarbc96 · 2021-10-19T10:08:44Z

whispers/core/config.py

+    """Ensure minimal expected config structure"""
+    try:
+        config["include"] = config.get("include", {"files": ["**/*"]})
+        config["include"]["files"] = config["include"].get("files", ["**/*"])
+
+        config["exclude"] = config.get("exclude", {"files": None, "keys": None, "values": None})
+        config["exclude"]["files"] = config["exclude"].get("files", None)
+        config["exclude"]["keys"] = config["exclude"].get("keys", None)
+        config["exclude"]["values"] = config["exclude"].get("values", None)
+
+        for idx in ["files", "keys", "values"]:
+            if not config["exclude"][idx]:
+                continue
+
+            # Create a single regex statement and compile it for efficient matching
+            unified = "|".join(config["exclude"][idx])
+            config["exclude"][idx] = re.compile(unified)
+
+        config["severity"] = config.get("severity", DEFAULT_SEVERITY)
+        config["rules"] = config.get("rules", list_rule_ids(default_rules()))


Probably for a future release: this could be changed to a dataclass or pydantic model, where the schema and defaults are defined in the model.
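
As a rough idea of what the dataclass variant might look like, here is a sketch. The class names Config and Exclude, the from_dict constructor, and the choice of an empty rules list as a placeholder for "use all built-in rules" are assumptions; DEFAULT_SEVERITY uses the levels listed in the release notes.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Pattern

DEFAULT_SEVERITY = ["BLOCKER", "CRITICAL", "MAJOR", "MINOR", "INFO"]


def _compile(patterns: Optional[List[str]]) -> Optional[Pattern[str]]:
    """Create a single compiled regex from a list of patterns (None if empty)."""
    return re.compile("|".join(patterns)) if patterns else None


@dataclass
class Exclude:
    files: Optional[Pattern[str]] = None
    keys: Optional[Pattern[str]] = None
    values: Optional[Pattern[str]] = None


@dataclass
class Config:
    include_files: List[str] = field(default_factory=lambda: ["**/*"])
    exclude: Exclude = field(default_factory=Exclude)
    severity: List[str] = field(default_factory=lambda: list(DEFAULT_SEVERITY))
    rules: List[Any] = field(default_factory=list)  # assumption: empty means all built-ins

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "Config":
        include = raw.get("include") or {}
        exclude = raw.get("exclude") or {}
        return cls(
            include_files=include.get("files") or ["**/*"],
            exclude=Exclude(
                files=_compile(exclude.get("files")),
                keys=_compile(exclude.get("keys")),
                values=_compile(exclude.get("values")),
            ),
            severity=raw.get("severity") or list(DEFAULT_SEVERITY),
            rules=raw.get("rules") or [],
        )
```

Defaults then live next to the field definitions, and the rest of the code can use attribute access instead of nested dictionary lookups.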

That's a good one :)

whispers/core/pairs.py

whispers/core/rules.py

oscarbc96 · 2021-10-19T14:10:07Z

whispers/core/utils.py

+    """Checks if given data is printable text"""
+    if isinstance(data, bytes):
+        try:
+            data = data.decode("utf-8")
+        except Exception:
+            return False
+
+    if not isinstance(data, (str, int)):
+        return False
+
+    for ch in str(data):
+        if ch not in string.printable:
+            return False
+
+    return True
+
+
+def is_base64(data: str) -> bool:
+    """Checks if given data is base64-decodable to text"""
+    if not isinstance(data, str):
+        return False
+
+    try:
+        b64decode(data).decode("utf-8")
+        return True
+
+    except Exception:
+        return False
+
+
+def is_base64_bytes(data: str) -> bool:
+    """Checks if given data is base64-decodable to bytes"""
+    try:
+        return b64decode(data) != b""
+
+    except Exception:
+        return False


Catching broad exceptions can be dangerous, since it may hide errors. Can we catch only the ones we expect in these cases, e.g. binascii.Error?

There are too many exceptions that could happen here. It could be wrongly guessing something as base64 and trying to decode it, or decode something that is usually base64 but not that time.

All errors are automatically logged into whispers.log for later analysis.
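
For reference, a narrower variant along the lines the reviewer suggested might look like the sketch below. It is an alternative illustration, not the code in this PR, which deliberately catches Exception.

```python
import binascii
from base64 import b64decode


def is_base64(data: object) -> bool:
    """Checks if given data is base64-decodable to UTF-8 text."""
    if not isinstance(data, str):
        return False

    try:
        b64decode(data).decode("utf-8")
        return True

    except (binascii.Error, ValueError):
        # binascii.Error: malformed base64, such as incorrect padding
        # ValueError: input string contains non-ASCII characters
        # UnicodeDecodeError (a ValueError subclass): decoded bytes are not UTF-8
        return False
```

Since binascii.Error is itself a subclass of ValueError, catching ValueError alone would cover the same cases; listing both only makes the intent explicit.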

whispers/core/utils.py

adeptex added 2 commits September 20, 2021 17:24

Draft

60f9c6b

Python 3.6 support

6f327a5

ocrawford555 reviewed Sep 21, 2021


adeptex and others added 9 commits September 21, 2021 20:43

Add common config 'cnf' extension

0cf27e2

Remove print

737f39a

Standardize line enumeration, typing, and logging

f4a08bb

Improve style

3c16f41

Improve documentation

b1c56d1

Update whispers/core/args.py

674d595

Co-authored-by: Oliver Crawford <16978487+ocrawford555@users.noreply.github.com>

Remove YAML resolvers for on/off/yes/no

e2d3b1d

Optimize default exclusions config

415f9c9

Update README

9f5f6da

adeptex marked this pull request as draft September 22, 2021 15:10

oscarbc96 reviewed Sep 22, 2021


adeptex added 2 commits September 22, 2021 20:20

Add return type

665c2ca

Configure pip-compile freeze and upgrade

3911d9b

adeptex closed this Sep 23, 2021

adeptex deleted the version-2 branch September 23, 2021 15:31

adeptex restored the version-2 branch October 14, 2021 16:42

adeptex reopened this Oct 14, 2021

adeptex and others added 9 commits October 14, 2021 18:45

Bump requirements

3f4dd56

Fix lint step

ea3aafa

Update release workflow

fa01c57

Update description

d0cf00e

Update workflows

8a21905

Update

ea1821f

Update workflows

1ba618a

Bump requirements

2bacd60

Merge branch 'master' into version-2

549977a

adeptex marked this pull request as ready for review October 18, 2021 10:42

adeptex added 2 commits October 18, 2021 16:50

Python 3.10 coverage

e119198

No Python 3.10 coverage yet

752487d

oscarbc96 reviewed Oct 19, 2021


adeptex added 15 commits October 19, 2021 22:06

Remove explicit PyPI URL

3f52afc

Explicit Optional return type

275db5b

Add comment

dc87ec9

Compile utils regex

ae72f50

Explicit PyPI URL

47d9c70

Remove explicit PyPI URL

8200158

Fixed Python versions

4a1c61e

Use pip as module

c62ff25

Try PyPy-3.x

699db2d

Try PyPy-3.x

edd5976

Ensure pip

5fb9b8a

Ensure pip

8127904

Revert pip

087af90

Trigger build

5922607

Freeze upgrade

bb669aa

adeptex closed this Oct 20, 2021

adeptex reopened this Oct 20, 2021

adeptex added 9 commits October 20, 2021 21:42

Install pip explicitly

e49bbb1

Install pip explicitly

aa35b79

Install pip explicitly

2574476

Install pip explicitly

9ced98b

Try with sudo

c4069a8

Add sudo

60e7e1b

Add py39 extension

76d0dbf

Add sensitive files

3e7d8c3

Fix file detection bug

8c6b3f5

adeptex closed this Apr 20, 2022
