Commit

Merge fd7fdc3 into 6891909
cobaltine committed May 13, 2023
2 parents 6891909 + fd7fdc3 commit ee9f7a4
Showing 12 changed files with 113 additions and 85 deletions.
58 changes: 30 additions & 28 deletions README.md
@@ -1,9 +1,9 @@
# json-fingerprint

![](https://img.shields.io/github/license/cobaltine/json-fingerprint)
[![](https://img.shields.io/pypi/v/json-fingerprint)](https://pypi.org/project/json-fingerprint/)
![](https://img.shields.io/pypi/pyversions/json-fingerprint)
![](https://img.shields.io/github/actions/workflow/status/cobaltine/json-fingerprint/ci.yml?branch=main&label=build)
[![](https://img.shields.io/pypi/v/json-fingerprint)](https://pypi.org/project/json-fingerprint/)
![Code Climate maintainability](https://img.shields.io/codeclimate/maintainability/cobaltine/json-fingerprint)
[![Coverage Status](https://coveralls.io/repos/github/cobaltine/json-fingerprint/badge.svg?branch=main)](https://coveralls.io/github/cobaltine/json-fingerprint?branch=main)

@@ -36,9 +36,11 @@ A JSON fingerprint consists of three parts: the version of the underlying canoni

## v1 release checklist (jfpv1)

This is a list of high-level development and documentation tasks, which need to be completed prior to freezing the API for v1. Before v1, backwards-incompatible changes to the API are possible, although not likely from v0.10.0 onwards. Since the jfpv1 spec is work in progress, the fingerprints may not be fully comparable between different _0.y.z_ versions.
This is a list of high-level development and documentation tasks, which need to be completed prior to freezing the API for v1. Before v1, backward-incompatible changes to the API are possible. Since the jfpv1 spec is work in progress, the fingerprints may not be fully comparable between different _0.y.z_ versions.

**NB:** JSON fingerprints up until `v0.12.2` ignored empty objects and arrays as values. This behavior was changed in `v0.13.0` which means that JSON fingerprints created with earlier versions may produce different and incomparable hashes depending on the presence of empty objects or arrays.

- [ ] Formalized the jfpv1 specification
- [ ] Formalized and complete jfpv1 specification
- [x] JSON type support
- [x] Primitives and literals
- [x] Arrays
@@ -72,12 +74,12 @@ JSON fingerprints can be created with the `create()` function, which requires th
import json
import json_fingerprint

input_1 = json.dumps([3, 2, 1, [True, False], {'foo': 'bar'}])
input_2 = json.dumps([2, {'foo': 'bar'}, 1, [False, True], 3]) # Different order
fp_1 = json_fingerprint.create(input=input_1, hash_function='sha256', version=1)
fp_2 = json_fingerprint.create(input=input_2, hash_function='sha256', version=1)
print(f'Fingerpr. 1: {fp_1}')
print(f'Fingerpr. 2: {fp_2}')
input_1 = json.dumps([3, 2, 1, [True, False], {"foo": "bar"}])
input_2 = json.dumps([2, {"foo": "bar"}, 1, [False, True], 3]) # Different order
fp_1 = json_fingerprint.create(input=input_1, hash_function="sha256", version=1)
fp_2 = json_fingerprint.create(input=input_2, hash_function="sha256", version=1)
print(f"Fingerpr. 1: {fp_1}")
print(f"Fingerpr. 2: {fp_2}")
```
This will output two identical fingerprints regardless of the different order of the json elements:

@@ -96,11 +98,11 @@ JSON fingerprints can be decoded with the `decode()` convenience function. It re
```python
import json_fingerprint

fp = 'jfpv1$sha256$2ecb0c919fcb06024f55380134da3bbaac3879f98adce89a8871706fe50dda03'
fp = "jfpv1$sha256$2ecb0c919fcb06024f55380134da3bbaac3879f98adce89a8871706fe50dda03"
version, hash_function, hash = json_fingerprint.decode(fingerprint=fp)
print(f'Version (integer): {version}')
print(f'Hash function: {hash_function}')
print(f'Secure hash: {hash}')
print(f"Version (integer): {version}")
print(f"Hash function: {hash_function}")
print(f"Secure hash: {hash}")
```
This will output the individual elements that make up a fingerprint as follows:

@@ -119,13 +121,13 @@ The `match()` is another convenience function that matches JSON data against a f
import json
import json_fingerprint

input_1 = json.dumps([3, 2, 1, [True, False], {'foo': 'bar'}])
input_1 = json.dumps([3, 2, 1, [True, False], {"foo": "bar"}])
input_2 = json.dumps([3, 2, 1])
target_fp = 'jfpv1$sha256$2ecb0c919fcb06024f55380134da3bbaac3879f98adce89a8871706fe50dda03'
target_fp = "jfpv1$sha256$2ecb0c919fcb06024f55380134da3bbaac3879f98adce89a8871706fe50dda03"
match_1 = json_fingerprint.match(input=input_1, target_fingerprint=target_fp)
match_2 = json_fingerprint.match(input=input_2, target_fingerprint=target_fp)
print(f'Fingerprint matches with input_1: {match_1}')
print(f'Fingerprint matches with input_2: {match_2}')
print(f"Fingerprint matches with input_1: {match_1}")
print(f"Fingerprint matches with input_2: {match_2}")
```
This will output the following:
```
@@ -144,26 +146,26 @@ import json_fingerprint

# Produces SHA256: jfpv1$sha256$d119f4d8...b1710d9f
# Produces SHA384: jfpv1$sha384$9bca46fd...fd0e2e9c
input = json.dumps({'foo': 'bar'})
input = json.dumps({"foo": "bar"})
fingerprints = [
# SHA256 match
'jfpv1$sha256$d119f4d8b802091520162b78f57a995a9ecbc88b20573b0c7e474072b1710d9f',
"jfpv1$sha256$d119f4d8b802091520162b78f57a995a9ecbc88b20573b0c7e474072b1710d9f",
# SHA256 match (duplicate)
'jfpv1$sha256$d119f4d8b802091520162b78f57a995a9ecbc88b20573b0c7e474072b1710d9f',
"jfpv1$sha256$d119f4d8b802091520162b78f57a995a9ecbc88b20573b0c7e474072b1710d9f",
# SHA384 match
('jfpv1$sha384$9bca46fd7ef7aa2e16e68978b5eb5c294bd5b380780e81bcb1af97d4b339bca'
'f7f6a622b2f1a955eea2fadb8fd0e2e9c'),
("jfpv1$sha384$9bca46fd7ef7aa2e16e68978b5eb5c294bd5b380780e81bcb1af97d4b339bca"
"f7f6a622b2f1a955eea2fadb8fd0e2e9c"),
# SHA256, not a match
'jfpv1$sha256$73f7bb145f268c033ec22a0b74296cdbab1405415a3d64a1c79223aa9a9f7643',
"jfpv1$sha256$73f7bb145f268c033ec22a0b74296cdbab1405415a3d64a1c79223aa9a9f7643",
]
matches = json_fingerprint.find_matches(input=input, fingerprints=fingerprints)
# Print raw matches, which include 2 same SHA256 fingerprints
print(*(f'\nMatch: {match[0:30]}...' for match in matches))
print(*(f"\nMatch: {match[0:30]}..." for match in matches))
deduplicated_matches = json_fingerprint.find_matches(input=input,
fingerprints=fingerprints,
deduplicate=True)
# Print deduplicated matches
print(*(f'\nDeduplicated match: {match[0:30]}...' for match in deduplicated_matches))
print(*(f"\nDeduplicated match: {match[0:30]}..." for match in deduplicated_matches))
```
This will output the following results, first the list with a duplicate and the latter with deduplicated results:
```
@@ -194,7 +196,7 @@ In practice, the jfpv1 specification purposefully ignores the original order of
* All values in the compared datasets are identical
* The values exist in identical paths (arrays, object key-value pairs)

In the case of arrays, each array gets a unique hash identifier based on the data elements it holds. This way, each flattened value "knows" to which array it belongs to. This identifier is called a _sibling hash_ because its derived from each value and its neighboring values.
In the case of arrays, each array gets a unique hash identifier based on the data elements it holds. This way, each flattened value "knows" to which array it belongs to. This identifier is called a _sibling hash_ because it is derived from each array element's value as well as its neighboring values.

## Running tests

@@ -205,9 +207,9 @@ The entire internal test suite of json-fingerprint is included in its distributi
If all tests ran successfully, this will produce an output similar to the following:

```
..........................
..............................
----------------------------------------------------------------------
Ran 26 tests in 0.009s
Ran 30 tests in 0.007s
OK
```
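The README hunks above describe jfpv1 as ignoring the original order of elements while still telling structures apart through a _sibling hash_ per array. The following is a toy sketch of that idea, written under our own assumptions: the path syntax, the `toy$sha256$` prefix and the helper names (`_flatten`, `_digest`, `toy_fingerprint`) are hypothetical, and the output is not compatible with real jfpv1 fingerprints. It flattens the input into `(path, sibling_hash, value)` triples, derives each array's sibling hash from the sorted digests of its elements, and hashes the sorted triples.

```python
import hashlib
import json
from typing import Any, List, Tuple

Triple = Tuple[str, str, str]  # (path, sibling_hash, serialized leaf)


def _h(text: str) -> str:
    """SHA-256 hex digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _digest(triples: List[Triple]) -> str:
    """Order-insensitive digest of triples (they are sorted before hashing)."""
    return _h(json.dumps(sorted(triples)))


def _flatten(value: Any, path: str, sibling: str) -> List[Triple]:
    """Flatten a parsed JSON value into (path, sibling_hash, leaf) triples."""
    if isinstance(value, dict):
        if not value:
            return [(path, sibling, "{}")]  # keep empty objects as values
        triples: List[Triple] = []
        for key, item in value.items():
            triples += _flatten(item, f"{path}.{json.dumps(key)}", sibling)
        return triples
    if isinstance(value, list):
        if not value:
            return [(path, sibling, "[]")]  # keep empty arrays as values
        child_path = path + "[]"
        # Sibling hash: digest of the array's elements, sorted so that element
        # order does not matter but element content does.
        element_digests = sorted(_digest(_flatten(item, child_path, "")) for item in value)
        sibling_hash = _h("".join(element_digests))
        triples = []
        for item in value:
            triples += _flatten(item, child_path, sibling_hash)
        return triples
    # Primitive or literal, serialized with json.dumps ("true", "3", '"bar"', ...)
    return [(path, sibling, json.dumps(value))]


def toy_fingerprint(raw_json: str) -> str:
    """Order-insensitive toy fingerprint; NOT compatible with jfpv1."""
    return "toy$sha256$" + _digest(_flatten(json.loads(raw_json), "$", ""))
```

With the README's inputs, `toy_fingerprint(json.dumps([3, 2, 1, [True, False], {"foo": "bar"}]))` equals the fingerprint of the reordered list, while `[3, 2, 1]` gives a different one. Sorting the element digests is what makes the sibling hash independent of array order, yet it still changes whenever any element's content changes, which keeps `[[1, ["x", "x"]], [2, ["y", "y"]]]` distinct from the same structure with the inner arrays swapped.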
2 changes: 1 addition & 1 deletion json_fingerprint/_create.py
@@ -1,5 +1,5 @@
from ._jfpv1 import _create_jfpv1_fingerprint
from ._utils import _load_json
from ._load_json import _load_json
from ._validators import (
_validate_hash_function,
_validate_input_type,
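This hunk only moves the import of `_load_json` from `._utils` to `._load_json`; the helper's body is not part of the diff. For orientation, here is a minimal hypothetical sketch of such a loader, assuming it wraps `json.loads` and turns decoding failures into `FingerprintJSONLoadError`, which is the behavior `test_jfpv1_json_load_error` expects for the malformed input `'{"foo": bar}'`. The function name `load_json_sketch` and the local exception class are stand-ins, not the library's code.

```python
import json
from typing import Any


class FingerprintJSONLoadError(Exception):
    """Local stand-in for the library exception of the same name."""


def load_json_sketch(raw: str) -> Any:
    """Parse a JSON string, converting decode errors into FingerprintJSONLoadError."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Chain the original error so its position information is not lost
        raise FingerprintJSONLoadError(f"Unable to load JSON: {exc}") from exc
```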
5 changes: 3 additions & 2 deletions json_fingerprint/_find_matches.py
@@ -24,10 +24,11 @@ def _create_input_fingerprints(input: str, target_hashes: List[Dict]) -> List[st


def find_matches(input: str, fingerprints: List[str], deduplicate: bool = False) -> List[str]:
"""Match raw json str input to a list of fingerprints.
"""Match raw json string input to a list of fingerprints
Decodes the target fingerprints and creates a fingerprint from the input with identical parameters.
Creates a fingerprint from the input of each different JSON fingerprint type present in the fingerprint list."""
Creates a fingerprint from the input of each different JSON fingerprint type present in the fingerprint list.
"""
if deduplicate:
fingerprints = list(set(fingerprints))
target_hashes = _get_target_hashes(fingerprints=fingerprints)
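The `find_matches()` docstring describes the flow: optionally deduplicate, decode the target fingerprints, create one fingerprint of the input for each distinct fingerprint type present, and keep the targets that match. A rough sketch of that flow follows, assuming hypothetical stand-ins `toy_create()` and `toy_decode()` that use a `toyv{version}$...` prefix and a plain `sort_keys` canonicalization instead of jfpv1, so their output is not comparable to real fingerprints.

```python
import hashlib
import json
from typing import List, Set, Tuple


def toy_create(raw_json: str, hash_function: str, version: int) -> str:
    """Hypothetical stand-in for create(): sort_keys canonicalization, not jfpv1."""
    canonical = json.dumps(json.loads(raw_json), sort_keys=True)
    digest = hashlib.new(hash_function, canonical.encode("utf-8")).hexdigest()
    return f"toyv{version}${hash_function}${digest}"


def toy_decode(fingerprint: str) -> Tuple[int, str, str]:
    """Split 'toyv{version}${hash_function}${digest}' into its three parts."""
    prefix, hash_function, digest = fingerprint.split("$")
    return int(prefix[len("toyv"):]), hash_function, digest


def toy_find_matches(raw_json: str, fingerprints: List[str], deduplicate: bool = False) -> List[str]:
    if deduplicate:
        # Order-preserving deduplication (the diff uses list(set(...)) instead)
        fingerprints = list(dict.fromkeys(fingerprints))
    # Distinct (version, hash_function) pairs present among the targets
    params: Set[Tuple[int, str]] = set()
    for fp in fingerprints:
        version, hash_function, _ = toy_decode(fp)
        params.add((version, hash_function))
    # Fingerprint the input once per pair, then keep the targets that match
    input_fps = {toy_create(raw_json, hash_function=h, version=v) for v, h in params}
    return [fp for fp in fingerprints if fp in input_fps]
```

One design point: the diff deduplicates with `list(set(fingerprints))`, which drops duplicates but does not preserve the input order; `dict.fromkeys()` keeps first-seen order. `match()` in `_match.py` follows the same pattern for a single target: decode it, recreate the input's fingerprint with identical parameters, and compare.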
File renamed without changes.
2 changes: 1 addition & 1 deletion json_fingerprint/_match.py
@@ -3,7 +3,7 @@


def match(input: str, target_fingerprint: str) -> bool:
"""Match raw json str input to target fingerprint.
"""Match raw json string input to target fingerprint
Decodes the target fingerprint and creates a fingerprint from the input with identical parameters."""
version, hash_function, _ = decode(fingerprint=target_fingerprint)
6 changes: 3 additions & 3 deletions json_fingerprint/_validators.py
@@ -22,7 +22,7 @@

def _validate_hash_function(hash_function: str, version: int):
if version == 1 and hash_function not in JFPV1_HASH_FUNCTIONS:
err = f"Expected one of supported hash functions '{JFPV1_HASH_FUNCTIONS}', " f"instead got '{hash_function}'"
err = f"Expected one of supported hash functions '{JFPV1_HASH_FUNCTIONS}', instead got '{hash_function}'"
raise FingerprintHashFunctionError(err)


@@ -34,7 +34,7 @@ def _validate_input_type(input: str):

def _validate_version(version: int):
if version not in JSON_FINGERPRINT_VERSIONS:
err = f"Expected one of supported JSON fingerprint versions '{JSON_FINGERPRINT_VERSIONS}', " f"instead got '{version}'"
err = f"Expected one of supported JSON fingerprint versions '{JSON_FINGERPRINT_VERSIONS}', instead got '{version}'"
raise FingerprintVersionError(err)


@@ -45,5 +45,5 @@ def _validate_fingerprint_format(fingerprint: str):
is_valid = True

if not is_valid:
err = "Expected JSON fingerprint in format '{fingerprint_version}${hash_function}${hex_digest}', instead got: " f"{fingerprint}"
err = "Expected JSON fingerprint in format '{fingerprint_version}${hash_function}${hex_digest}', " f"instead got: {fingerprint}"
raise FingerprintStringFormatError(err)
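The `_validators.py` hunks merge adjacent string literals into single f-strings, except in `_validate_fingerprint_format`, where two adjacent literals remain and only the split point moves. The first literal has to stay a plain string so that `{fingerprint_version}` and the other placeholders appear verbatim in the message. A small sketch of a format check in the same spirit; the regex, the `_HEX_LENGTHS` table (taken from the digest lengths asserted in `test_create.py`) and the local exception class are assumptions, not the library's actual implementation.

```python
import re

# Expected hex digest lengths, as asserted by the regexes in test_create.py
_HEX_LENGTHS = {"sha256": 64, "sha384": 96, "sha512": 128}
_FINGERPRINT = re.compile(r"jfpv(\d+)\$(sha256|sha384|sha512)\$([0-9a-f]+)")


class FingerprintStringFormatError(Exception):
    """Local stand-in for the library exception of the same name."""


def validate_fingerprint_format(fingerprint: str) -> None:
    """Raise unless fingerprint looks like '<version>$<hash function>$<hex digest>'."""
    m = _FINGERPRINT.fullmatch(fingerprint)
    if m is None or len(m.group(3)) != _HEX_LENGTHS[m.group(2)]:
        # Adjacent literals are joined at compile time. Only the second one is an
        # f-string, so the braces in the first are printed verbatim.
        err = (
            "Expected JSON fingerprint in format '{fingerprint_version}${hash_function}${hex_digest}', "
            f"instead got: {fingerprint}"
        )
        raise FingerprintStringFormatError(err)
```

For example, the README's `jfpv1$sha256$2ecb0c91...` fingerprint passes, while `jfpv1$sha256$abc` raises with a message ending in `instead got: jfpv1$sha256$abc`.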
43 changes: 25 additions & 18 deletions json_fingerprint/tests/test_create.py
@@ -11,43 +11,48 @@

class TestCreate(unittest.TestCase):
def test_jfpv1_json_load_error(self):
"""Test json fingerprint raw json string load error.
"""Test json fingerprint raw json string load error
Verify that:
- FingerprintJSONLoadError is properly raised with malformed json input string"""
- FingerprintJSONLoadError is properly raised with malformed json input string
"""
with self.assertRaises(FingerprintJSONLoadError):
create('{"foo": bar}', hash_function="sha256", version=1)

def test_jfpv1_sha256_output_format(self):
"""Test jfpv1 output format.
"""Test jfpv1 output format
Verify that:
- Complete jfpv1-sha256 output fingerprint is properly formatted"""
- Complete jfpv1-sha256 output fingerprint is properly formatted
"""
fp = create(input='{"foo": "bar"}', hash_function="sha256", version=1)
self.assertRegex(fp, "^jfpv1\\$sha256\\$[0-9a-f]{64}$")

def test_jfpv1_sha384_output_format(self):
"""Test jfpv1 output format.
"""Test jfpv1 output format
Verify that:
- Complete jfpv1-sha256 output fingerprint is properly formatted"""
- Complete jfpv1-sha256 output fingerprint is properly formatted
"""
fp = create(input='{"foo": "bar"}', hash_function="sha384", version=1)
self.assertRegex(fp, "^jfpv1\\$sha384\\$[0-9a-f]{96}$")

def test_jfpv1_sha512_output_format(self):
"""Test jfpv1 output format.
"""Test jfpv1 output format
Verify that:
- Complete jfpv1-sha256 output fingerprint is properly formatted"""
- Complete jfpv1-sha256 output fingerprint is properly formatted
"""
fp = create(input='{"foo": "bar"}', hash_function="sha512", version=1)
self.assertRegex(fp, "^jfpv1\\$sha512\\$[0-9a-f]{128}$")

def test_jfpv1_sha256_mixed_order(self):
"""Test jfpv1 sha256 mixed order fingerprint match.
"""Test jfpv1 sha256 mixed order fingerprint match
Verify that:
- The fingerprints of test objects 1 and 2 match despite same data being ordered differently
- The fingerprints also match against a known valid fingerprint"""
- The fingerprints also match against a known valid fingerprint
"""
with open(os.path.join(TESTDATA_DIR, "jfpv1_test_obj_1.json"), "r") as file:
self.test_obj_1 = file.read()
file.close()
@@ -61,11 +66,12 @@ def test_jfpv1_sha256_mixed_order(self):
self.assertEqual(fp_1, "jfpv1$sha256$b182c755347a6884fd11f1194cbe0961f548e5ac62be78a56c48c3c05eb56650")

def test_jfpv1_sha256_structural_distinction_1(self):
"""Test jfpv1 json flattener's structural value distinction.
"""Test jfpv1 json flattener's structural value distinction
Verify that:
- Identical values at identical depths, but held in different data structures,
don't produce identical outputs"""
don't produce identical outputs
"""
obj_in_1 = [
1,
[1, [2, 2]],
@@ -82,10 +88,11 @@ def test_jfpv1_sha256_structural_distinction_1(self):
self.assertNotEqual(fp_1, fp_2)

def test_jfpv1_sha256_structural_distinction_2(self):
"""Test jfpv1 json flattener's structural value distinction.
"""Test jfpv1 json flattener's structural value distinction
Verify that:
- Values in identical data structure paths, but different sibling values, don't get matched"""
- Values in identical data structure paths, but different sibling values, don't get matched
"""
obj_in_1 = [
[1, ["x", "x"]],
[2, ["y", "y"]],
@@ -101,14 +108,14 @@ def test_jfpv1_sha256_structural_distinction_2(self):
self.assertNotEqual(fp_1, fp_2)

def test_jfpv1_empty_list_as_value(self):
"""Test jfpv1 json flattener's ability to handle empty lists as values.
"""Test jfpv1 json flattener's ability to handle empty lists as values
Versions up to 0.12.2 did not acknowledge empty lists as values.
Related issue: https://github.com/cobaltine/json-fingerprint/issues/33
Verify that:
- Empty lists (and, as such, underlying data structure paths) are not ignored by the json flattener"""

- Empty lists (and, as such, underlying data structure paths) are not ignored by the json flattener
"""
obj_in_1 = {"field1": "yes"}
fp_1 = create(input=json.dumps(obj_in_1), hash_function="sha256", version=1)

@@ -118,7 +125,7 @@ def test_jfpv1_empty_list_as_value(self):
self.assertNotEqual(fp_1, fp_2)

def test_jfpv1_empty_dict_as_value(self):
"""Test jfpv1 json flattener's ability to handle empty dicts as values.
"""Test jfpv1 json flattener's ability to handle empty dicts as values
Versions up to 0.12.2 did not acknowledge empty dicts as values.
Related issue: https://github.com/cobaltine/json-fingerprint/issues/33
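The docstrings of `test_jfpv1_empty_list_as_value` and `test_jfpv1_empty_dict_as_value` note that versions up to 0.12.2 ignored empty lists and dicts as values, so a document with an empty-container field could flatten to the same data as one without that field (issue #33). A toy flattener illustrates the difference; `flatten_leaves` and its `keep_empty` switch are hypothetical and only emit `(path, value)` pairs, with no hashing or sibling tracking.

```python
import json
from typing import Any, List, Tuple


def flatten_leaves(value: Any, path: str = "$", keep_empty: bool = True) -> List[Tuple[str, str]]:
    """Toy flattener returning (path, serialized leaf) pairs.

    keep_empty=False mimics the behavior of versions up to 0.12.2, where
    empty objects and arrays produced no output at all.
    """
    pairs: List[Tuple[str, str]] = []
    if isinstance(value, dict):
        for key, item in value.items():
            pairs += flatten_leaves(item, f"{path}.{json.dumps(key)}", keep_empty)
        if not value and keep_empty:
            pairs.append((path, "{}"))  # record the empty object itself
    elif isinstance(value, list):
        for item in value:
            pairs += flatten_leaves(item, path + "[]", keep_empty)
        if not value and keep_empty:
            pairs.append((path, "[]"))  # record the empty array itself
    else:
        pairs.append((path, json.dumps(value)))
    return pairs
```

With `keep_empty=False`, `{"field1": "yes"}` and `{"field1": "yes", "field2": []}` flatten identically, so any hash computed from the flattened output collides; with `keep_empty=True` the empty array leaves a `('$."field2"', "[]")` entry and the two documents differ.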
3 changes: 2 additions & 1 deletion json_fingerprint/tests/test_decode.py
@@ -10,7 +10,8 @@ def test_jfpv1_decode(self):
Verify that:
- Fingerprints of all jfpv1 SHA-2 variants are properly decoded
- Exception is properly raised with invalid fingerprint input"""
- Exception is properly raised with invalid fingerprint input
"""
input = json.dumps({"foo": "bar"})
jfpv1_sha256 = create(input=input, hash_function="sha256", version=1)
jfpv1_sha384 = create(input=input, hash_function="sha384", version=1)
9 changes: 6 additions & 3 deletions json_fingerprint/tests/test_find_matches.py
@@ -15,7 +15,8 @@ def test_get_target_hashes(self):
"""Test target hash list creation.
Verify that:
- The returned element list contains only unique entries"""
- The returned element list contains only unique entries
"""
# Fingerprint list with duplicate entries
fingerprints = [
self.jfpv1_sha256,
@@ -39,7 +40,8 @@ def test_create_input_fingerprints(self):
"""Test list matching's fingerprint creation function.
Verify that:
- Fingerprints are correctly parsed from the given target hash elements"""
- Fingerprints are correctly parsed from the given target hash elements
"""
target_hashes = [
{"version": 1, "hash_function": "sha256"},
{"version": 1, "hash_function": "sha384"},
@@ -58,7 +60,8 @@ def test_jfpv1_find_matches(self):
Verify that:
- Fingerprints of all jfpv1 SHA-2 variants are properly matched in a fingerprint list
- Deduplication works for duplicate entries in fingerprint list
- Exceptions are properly raised with invalid fingerprints and input types"""
- Exceptions are properly raised with invalid fingerprints and input types
"""
input = json.dumps({"bar": "foo"})
chaff_jfpv1_sha256 = create(input=input, hash_function="sha256", version=1)
