Add decode_fingerprint() and fingerprint_match() functions #minor
Extended the functionality with two new functions and related tests. The
`decode_fingerprint()` function takes a JSON fingerprint (jfp) as input,
validates it and returns a 3-value tuple containing the version (int),
hash function (str) and hash (str).

The `fingerprint_match()` function uses `decode_fingerprint()`
internally to decode a `target_fingerprint`, then creates a fingerprint
from the given JSON input data with the decoded version and hash
function. If that fingerprint matches the given `target_fingerprint`,
the function returns `True`; otherwise it returns `False`.

Refactoring:
 - Split all exceptions into a dedicated `_exceptions` module
 - Moved JSON loading into a new utility module `_utils`

Other changes:
 - Updated README.md to reflect new functionality
cobaltine committed Jan 4, 2021
1 parent d633c81 commit e05405c
Showing 14 changed files with 277 additions and 45 deletions.
67 changes: 58 additions & 9 deletions README.md
@@ -25,7 +25,7 @@ This is a list of high-level development and documentation tasks, which need to
 - [x] SHA256
 - [x] SHA384
 - [x] SHA512
-- [ ] Dynamic jfpv1 fingerprint comparison function (JSON string against a fingerprint)
+- [x] Dynamic jfpv1 fingerprint comparison function (JSON string against a fingerprint)
 - [x] Performance characteristics that scale sufficiently
 - [ ] Extensive verification against potential fingerprint (hash) collisions

@@ -37,24 +37,73 @@ To install the json-fingerprint package, run `pip install json-fingerprint`.

 ## Examples

-The example below shows how to create and compare json fingerprints.
+The complete working examples below show how to create and compare JSON fingerprints.
+
+### Creating fingerprints from JSON data
+
+Fingerprints can be created with the `json_fingerprint()` function, which requires three arguments: input (valid JSON string), hash function (`sha256`, `sha384` and `sha512` are supported) and JSON fingerprint version (`1`).

 ```python
 import json
-import json_fingerprint as jfp

-obj_1_str = json.dumps([3, 2, 1, {'foo': 'bar'}])
-obj_2_str = json.dumps([2, {'foo': 'bar'}, 1, 3]) # Same data in different order
-fp_1 = jfp.json_fingerprint(input=obj_1_str, hash_function='sha256', version=1)
-fp_2 = jfp.json_fingerprint(input=obj_2_str, hash_function='sha256', version=1)
+from json_fingerprint import json_fingerprint
+
+obj_1_str = json.dumps([3, 2, 1, [True, False], {'foo': 'bar'}])
+obj_2_str = json.dumps([2, {'foo': 'bar'}, 1, [False, True], 3]) # Same data in different order
+fp_1 = json_fingerprint(input=obj_1_str, hash_function='sha256', version=1)
+fp_2 = json_fingerprint(input=obj_2_str, hash_function='sha256', version=1)
 print(f'Fingerprint 1: {fp_1}')
 print(f'Fingerprint 2: {fp_2}')
 ```
 This will output two identical fingerprints regardless of the different order of the json elements:

 ```
-Fingerprint 1: jfpv1$sha256$f4a2c8bfb5a03da86bbb4e1639ca6b56f9fac6b04c5c7d9e3470afef46cefb4f
-Fingerprint 2: jfpv1$sha256$f4a2c8bfb5a03da86bbb4e1639ca6b56f9fac6b04c5c7d9e3470afef46cefb4f
+Fingerprint 1: jfpv1$sha256$164e2e93056b7a0e4ace25b3c9aed9cf061f9a23c48c3d88a655819ac452b83a
+Fingerprint 2: jfpv1$sha256$164e2e93056b7a0e4ace25b3c9aed9cf061f9a23c48c3d88a655819ac452b83a
 ```

 Since json objects with identical data content and structure will always produce identical fingerprints, the fingerprints can be used effectively for various purposes. These include finding duplicate json data from a larger dataset, json data cache validation/invalidation and data integrity checking.
+
+### Decoding JSON fingerprints
+
+JSON fingerprints can be decoded with the `decode_fingerprint()` convenience function, which returns the version, hash function and hash in a tuple.
+
+```python
+from json_fingerprint import decode_fingerprint
+
+fingerprint = 'jfpv1$sha256$164e2e93056b7a0e4ace25b3c9aed9cf061f9a23c48c3d88a655819ac452b83a'
+version, hash_function, hash = decode_fingerprint(fingerprint=fingerprint)
+print(f'Version (integer): {version}')
+print(f'Hash function: {hash_function}')
+print(f'Hash: {hash}')
+```
+This will output the individual elements that make up a fingerprint as follows:
+
+```
+Version (integer): 1
+Hash function: sha256
+Hash: 164e2e93056b7a0e4ace25b3c9aed9cf061f9a23c48c3d88a655819ac452b83a
+```
+
+### Fingerprint matching
+
+The `fingerprint_match()` is another convenience function that matches JSON data against a fingerprint, and returns either `True` or `False` depending on whether the data matches the fingerprint or not. Internally, it will automatically choose the correct version and hash function based on the `target_fingerprint` argument.
+
+```python
+import json
+
+from json_fingerprint import fingerprint_match
+
+input_1 = json.dumps([3, 2, 1, [True, False], {'foo': 'bar'}])
+input_2 = json.dumps([3, 2, 1])
+target_fingerprint = 'jfpv1$sha256$164e2e93056b7a0e4ace25b3c9aed9cf061f9a23c48c3d88a655819ac452b83a'
+match_1 = fingerprint_match(input=input_1, target_fingerprint=target_fingerprint)
+match_2 = fingerprint_match(input=input_2, target_fingerprint=target_fingerprint)
+print(f'Fingerprint matches with input_1: {match_1}')
+print(f'Fingerprint matches with input_2: {match_2}')
+```
+This will output the following:
+```
+Fingerprint matches with input_1: True
+Fingerprint matches with input_2: False
+```
2 changes: 2 additions & 0 deletions json_fingerprint/__init__.py
@@ -1 +1,3 @@
+from .decode_fingerprint import decode_fingerprint
+from .fingerprint_match import fingerprint_match
 from .json_fingerprint import json_fingerprint
18 changes: 18 additions & 0 deletions json_fingerprint/_exceptions.py
@@ -0,0 +1,18 @@
+class FingerprintHashFunctionError(Exception):
+    pass
+
+
+class FingerprintInputDataTypeError(Exception):
+    pass
+
+
+class FingerprintVersionError(Exception):
+    pass
+
+
+class FingerprintStringFormatError(Exception):
+    pass
+
+
+class FingerprintJSONLoadError(Exception):
+    pass
11 changes: 11 additions & 0 deletions json_fingerprint/_utils.py
@@ -0,0 +1,11 @@
+import json
+
+from ._exceptions import FingerprintJSONLoadError
+
+
+def _load_json(data: str):
+    try:
+        return json.loads(data)
+    except Exception:
+        err = 'Unable to load JSON'
+        raise FingerprintJSONLoadError(err) from None
39 changes: 27 additions & 12 deletions json_fingerprint/_validators.py
@@ -1,3 +1,16 @@
+import re
+
+from ._exceptions import (
+    FingerprintHashFunctionError,
+    FingerprintInputDataTypeError,
+    FingerprintStringFormatError,
+    FingerprintVersionError,
+)
+
+SHA256_JFP_REGEX_PATTERN = re.compile('^jfpv1\\$sha256\\$[0-9a-f]{64}$')
+SHA384_JFP_REGEX_PATTERN = re.compile('^jfpv1\\$sha384\\$[0-9a-f]{96}$')
+SHA512_JFP_REGEX_PATTERN = re.compile('^jfpv1\\$sha512\\$[0-9a-f]{128}$')
+
 JFPV1_HASH_FUNCTIONS = (
     'sha256',
     'sha384',
@@ -9,18 +22,6 @@
 )


-class FingerprintHashFunctionError(Exception):
-    pass
-
-
-class FingerprintInputDataTypeError(Exception):
-    pass
-
-
-class FingerprintVersionError(Exception):
-    pass
-
-
 def _validate_hash_function(hash_function: str, version: int):
     if hash_function not in JFPV1_HASH_FUNCTIONS:
         err = (f'Expected one of supported hash functions \'{JFPV1_HASH_FUNCTIONS}\', '
@@ -39,3 +40,17 @@ def _validate_version(version: int):
         err = (f'Expected one of supported JSON fingerprint versions \'{JSON_FINGERPRINT_VERSIONS}\', '
                f'instead got \'{version}\'')
         raise FingerprintVersionError(err)
+
+
+def _validate_fingerprint_format(fingerprint: str):
+    is_valid = False
+
+    if SHA256_JFP_REGEX_PATTERN.match(fingerprint) or \
+       SHA384_JFP_REGEX_PATTERN.match(fingerprint) or \
+       SHA512_JFP_REGEX_PATTERN.match(fingerprint):
+        is_valid = True
+
+    if not is_valid:
+        err = ('Expected JSON fingerprint in format \'{fingerprint_version}${hash_function}${hex_digest}\', instead got: '
+               f'{fingerprint}')
+        raise FingerprintStringFormatError(err)
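
To make the format check in `_validate_fingerprint_format()` easier to try out in isolation, here is a small standalone sketch. It is not the commit's code: `DIGEST_LENGTHS`, `PATTERNS` and `is_valid_fingerprint()` are hypothetical names, and the three patterns are generated from a table instead of being written out, but the resulting regular expressions are the same as the three constants in the diff.

```python
import hashlib
import re

# Hex digest length per supported hash function (64, 96 and 128 as in the diff).
DIGEST_LENGTHS = {'sha256': 64, 'sha384': 96, 'sha512': 128}

# Hypothetical table: one compiled pattern per hash function,
# e.g. ^jfpv1\$sha256\$[0-9a-f]{64}$
PATTERNS = {
    name: re.compile(rf'^jfpv1\${name}\$[0-9a-f]{{{length}}}$')
    for name, length in DIGEST_LENGTHS.items()
}


def is_valid_fingerprint(fingerprint: str) -> bool:
    """Return True if the string matches any of the jfpv1 patterns."""
    return any(pattern.match(fingerprint) for pattern in PATTERNS.values())


digest = hashlib.sha256(b'example').hexdigest()
print(is_valid_fingerprint(f'jfpv1$sha256${digest}'))  # sha256 with 64 hex chars
print(is_valid_fingerprint(f'jfpv1$sha384${digest}'))  # wrong length for sha384
print(is_valid_fingerprint('invalid fingerprint'))
```

One detail worth knowing when reusing patterns of this shape: in Python, `$` also matches just before a trailing newline, so a fingerprint followed by `'\n'` passes the check. Anchoring with `\Z` or using `fullmatch()` would be stricter, if that matters for the caller.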
13 changes: 13 additions & 0 deletions json_fingerprint/decode_fingerprint.py
@@ -0,0 +1,13 @@
+from typing import Tuple
+from ._validators import _validate_fingerprint_format
+
+
+def decode_fingerprint(fingerprint: str) -> Tuple[int, str, str]:
+    """Decode json fingerprints into version, hash function and hash values."""
+    _validate_fingerprint_format(fingerprint)
+    elements = fingerprint.split('$')
+    version = int(elements[0][4:])
+    hash_function = elements[1]
+    hash = elements[2]
+
+    return version, hash_function, hash
13 changes: 13 additions & 0 deletions json_fingerprint/fingerprint_match.py
@@ -0,0 +1,13 @@
+from .decode_fingerprint import decode_fingerprint
+from .json_fingerprint import json_fingerprint
+
+
+def fingerprint_match(input: str, target_fingerprint: str) -> bool:
+    """Match raw json str input to target fingerprint.
+    Decodes the target fingerprint and creates a fingerprint from the input with identical parameters."""
+    version, hash_function, _ = decode_fingerprint(fingerprint=target_fingerprint)
+    input_fingerprint = json_fingerprint(input=input, hash_function=hash_function, version=version)
+    if input_fingerprint == target_fingerprint:
+        return True
+    return False
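
The decode, recompute, compare flow of `fingerprint_match()` can be tried without installing the package using the sketch below. Everything in it is a stand-in: `decode()`, `toy_fingerprint()` and `toy_match()` are hypothetical names, and `toy_fingerprint()` hashes a key-sorted `json.dumps()` output, which is NOT the jfpv1 algorithm (for example, it does not ignore list order). Only the control flow mirrors the function in the diff.

```python
import hashlib
import json
from typing import Tuple


def decode(fingerprint: str) -> Tuple[int, str, str]:
    """Split 'jfpv1$sha256$<hex>' into (1, 'sha256', '<hex>'); no validation here."""
    prefix, hash_function, digest = fingerprint.split('$')
    return int(prefix[len('jfpv'):]), hash_function, digest


def toy_fingerprint(data: str, hash_function: str, version: int) -> str:
    """Stand-in for json_fingerprint(); NOT the real jfpv1 hashing scheme."""
    canonical = json.dumps(json.loads(data), sort_keys=True, separators=(',', ':'))
    digest = hashlib.new(hash_function, canonical.encode('utf-8')).hexdigest()
    return f'jfpv{version}${hash_function}${digest}'


def toy_match(data: str, target_fingerprint: str) -> bool:
    # Reuse the target's own version and hash function, then compare whole strings.
    version, hash_function, _ = decode(target_fingerprint)
    return toy_fingerprint(data, hash_function, version) == target_fingerprint


target = toy_fingerprint('{"a": 1, "b": 2}', 'sha384', 1)
print(toy_match('{"b": 2, "a": 1}', target))  # same object, different key order
print(toy_match('[1]', target))               # different data
```

Because the version and hash function are read from the target, the caller never has to pass them separately. Returning the comparison result directly, as `toy_match()` does, is equivalent to the `if`/`return True`/`return False` form in the diff.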
15 changes: 2 additions & 13 deletions json_fingerprint/json_fingerprint.py
@@ -1,27 +1,16 @@
-import json
-
 from ._jfpv1 import _create_jfpv1_fingerprint
+from ._utils import _load_json
 from ._validators import (
     _validate_hash_function,
     _validate_input_type,
     _validate_version,
 )


-class FingerprintJSONLoadError(Exception):
-    pass
-
-
 def json_fingerprint(input: str, hash_function: str, version: int) -> str:
     """Create json fingerprints with the selected hash function and jfp version."""
     _validate_version(version=version)
     _validate_input_type(input=input)
     _validate_hash_function(hash_function=hash_function, version=version)
-
-    try:
-        loaded = json.loads(input)
-    except Exception:
-        err = 'Unable to load JSON'
-        raise FingerprintJSONLoadError(err) from None
-
+    loaded = _load_json(data=input)
     return _create_jfpv1_fingerprint(data=loaded, hash_function=hash_function, version=version)
4 changes: 3 additions & 1 deletion json_fingerprint/tests/__init__.py
@@ -1,3 +1,5 @@
-from .test_json_fingerprint import TestJsonFingerprint
+from .test_decode_fingerprint import TestDecodeFingerprint
+from .test_fingerprint_match import TestFingerprintMatch
 from .test_jfpv1 import TestJfpv1
+from .test_json_fingerprint import TestJsonFingerprint
 from .test_validators import TestValidators
43 changes: 43 additions & 0 deletions json_fingerprint/tests/test_decode_fingerprint.py
@@ -0,0 +1,43 @@
+import json
+import unittest
+
+from json_fingerprint import (
+    _exceptions,
+    decode_fingerprint,
+    json_fingerprint,
+)
+
+
+class TestDecodeFingerprint(unittest.TestCase):
+    def test_decode_fingerprint(self):
+        """Test json fingerprint decoder.
+        Verify that:
+        - Fingerprints of all jfpv1 SHA-2 variants are properly decoded
+        - Exception is properly raised with invalid fingerprint input"""
+        input = json.dumps({'foo': 'bar'})
+        jfpv1_sha256 = json_fingerprint(input=input, hash_function='sha256', version=1)
+        jfpv1_sha384 = json_fingerprint(input=input, hash_function='sha384', version=1)
+        jfpv1_sha512 = json_fingerprint(input=input, hash_function='sha512', version=1)
+
+        version, hash_function, hash = decode_fingerprint(fingerprint=jfpv1_sha256)
+        self.assertEqual(version, 1)
+        self.assertEqual(hash_function, 'sha256')
+        self.assertEqual(hash, jfpv1_sha256.split('$')[-1])
+
+        version, hash_function, hash = decode_fingerprint(fingerprint=jfpv1_sha384)
+        self.assertEqual(version, 1)
+        self.assertEqual(hash_function, 'sha384')
+        self.assertEqual(hash, jfpv1_sha384.split('$')[-1])
+
+        version, hash_function, hash = decode_fingerprint(fingerprint=jfpv1_sha512)
+        self.assertEqual(version, 1)
+        self.assertEqual(hash_function, 'sha512')
+        self.assertEqual(hash, jfpv1_sha512.split('$')[-1])
+
+        with self.assertRaises(_exceptions.FingerprintStringFormatError):
+            decode_fingerprint(fingerprint='invalid fingerprint')
+
+
+if __name__ == '__main__':
+    unittest.main()
43 changes: 43 additions & 0 deletions json_fingerprint/tests/test_fingerprint_match.py
@@ -0,0 +1,43 @@
+import json
+import unittest
+
+from json_fingerprint import (
+    _exceptions,
+    decode_fingerprint,
+    json_fingerprint,
+)
+
+
+class TestFingerprintMatch(unittest.TestCase):
+    def test_jfpv1_fingerprint_match(self):
+        """Test json fingerprint matcher.
+        Verify that:
+        - Fingerprints of all jfpv1 SHA-2 variants are properly matched
+        - Exception is properly raised with invalid fingerprints and input types"""
+        input = json.dumps({'foo': 'bar'})
+        jfpv1_sha256 = json_fingerprint(input=input, hash_function='sha256', version=1)
+        jfpv1_sha384 = json_fingerprint(input=input, hash_function='sha384', version=1)
+        jfpv1_sha512 = json_fingerprint(input=input, hash_function='sha512', version=1)
+
+        version, hash_function, hash = decode_fingerprint(fingerprint=jfpv1_sha256)
+        self.assertEqual(version, 1)
+        self.assertEqual(hash_function, 'sha256')
+        self.assertEqual(hash, jfpv1_sha256.split('$')[-1])
+
+        version, hash_function, hash = decode_fingerprint(fingerprint=jfpv1_sha384)
+        self.assertEqual(version, 1)
+        self.assertEqual(hash_function, 'sha384')
+        self.assertEqual(hash, jfpv1_sha384.split('$')[-1])
+
+        version, hash_function, hash = decode_fingerprint(fingerprint=jfpv1_sha512)
+        self.assertEqual(version, 1)
+        self.assertEqual(hash_function, 'sha512')
+        self.assertEqual(hash, jfpv1_sha512.split('$')[-1])
+
+        with self.assertRaises(_exceptions.FingerprintStringFormatError):
+            decode_fingerprint(fingerprint='invalid fingerprint')
+
+
+if __name__ == '__main__':
+    unittest.main()
3 changes: 2 additions & 1 deletion json_fingerprint/tests/test_jfpv1.py
@@ -1,8 +1,9 @@
 import hashlib
 import json
-from json_fingerprint import _jfpv1
 import unittest
+
+from json_fingerprint import _jfpv1


 class TestJfpv1(unittest.TestCase):
     def test_create_json_hash(self):
2 changes: 1 addition & 1 deletion json_fingerprint/tests/test_json_fingerprint.py
@@ -3,7 +3,7 @@
 import unittest

 from json_fingerprint import json_fingerprint
-from json_fingerprint.json_fingerprint import FingerprintJSONLoadError
+from json_fingerprint._exceptions import FingerprintJSONLoadError

 TESTS_DIR = os.path.dirname(__file__)
 TESTDATA_DIR = os.path.join(TESTS_DIR, 'testdata')