Cleanup some leftovers (#115)
Use JSONPath from the RFC 9535, improve test coverage for prefix / negative matches
Replace gradle.properties with conditional version
Make sure shuffle always uses the same seed to select random JSONPath
Use RFC <number> without dashes
gavlyukovskiy committed Apr 2, 2024
1 parent 97cae59 commit 03d5744
Showing 15 changed files with 137 additions and 59 deletions.
24 changes: 12 additions & 12 deletions README.md
@@ -39,11 +39,11 @@ Finally, no additional third-party runtime dependencies are required to use this
Java object it was serialized from
* Target key **case sensitivity configuration** (default: `false`)
* Use **block-list** (`maskKeys`) or **allow-list** (`allowKeys`) for masking
-* Limited support for JsonPATH masking in both **block-list** (`maskJsonPaths`) and **allow-list** (`allowJsonPaths`)
+* Limited support for JSONPath masking in both **block-list** (`maskJsonPaths`) and **allow-list** (`allowJsonPaths`)
modes
* Masking a valid JSON will always return a valid JSON

-Note: Since [RFC-8259](https://datatracker.ietf.org/doc/html/rfc8259) dictates that JSON exchanges between systems that
+Note: Since [RFC 8259](https://datatracker.ietf.org/doc/html/rfc8259) dictates that JSON exchanges between systems that
are not part of an enclosed system MUST be encoded using UTF-8, the `json-masker` only supports UTF-8 encoding.

## JDK Compatibility
@@ -66,7 +66,7 @@ var jsonMasker = JsonMasker.getMasker(
.build()
);

-// block-mode, JsonPATH
+// block-mode, JSONPath
var jsonMasker = JsonMasker.getMasker(
JsonMaskingConfig.builder()
.maskJsonPaths(Set.of("$.email", "$.nested.iban", "$.organization.*.name"))
@@ -80,20 +80,20 @@ var jsonMasker = JsonMasker.getMasker(
.build()
);

-// allow-mode, JsonPATH
+// allow-mode, JSONPath
var jsonMasker = JsonMasker.getMasker(
JsonMaskingConfig.builder()
.allowJsonPaths(Set.of("$.id", "$.clients.*.phone", "$.nested.name"))
.build()
);
```

-Using `JsonMaskingConfig` allows customizing the masking behaviour of types, keys or JsonPATH or mix keys and JSON
+Using `JsonMaskingConfig` allows customizing the masking behaviour of types, keys or JSONPath or mix keys and JSON
paths.

> [!NOTE]
> Whenever a simple key (`maskKeys(Set.of("email", "iban"))`) is specified, it is going to be masked recursively
-> regardless of the nesting, whereas using a JsonPATH (`maskJsonPaths(Set.of("$.email", "$.iban"))`) would only
+> regardless of the nesting, whereas using a JSONPath (`maskJsonPaths(Set.of("$.email", "$.iban"))`) would only
> mask those keys on the top level JSON
After creating the `JsonMasker` instance, it can be used to mask a JSON as following:
@@ -308,11 +308,11 @@ String maskedJson = jsonMasker.mask(json);
}
```

-### Masking with JsonPATH
+### Masking with JSONPath

-To have more control over the nesting, JsonPATH can be used to specify the keys that needs to be masked (allowed).
+To have more control over the nesting, JSONPath can be used to specify the keys that needs to be masked (allowed).

-The following JsonPATH features are not supported:
+The following JSONPath features are not supported:

* Descendant segments.
* Child segments.
@@ -326,8 +326,8 @@ The following JsonPATH features are not supported:
The library also imposes a number of additional restrictions:

* Numbers as key names are disallowed.
-* JsonPATH keys must not be ambiguous. For example, `$.a.b` and `$.*.b` combination is disallowed.
-* JsonPATH must not end with a single leading wildcard. Use `$.a` instead of `$.a.*`.
+* JSONPath keys must not be ambiguous. For example, `$.a.b` and `$.*.b` combination is disallowed.
+* JSONPath must not end with a single leading wildcard. Use `$.a` instead of `$.a.*`.
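
The ambiguity restriction can be illustrated with a small standalone check. This is a sketch under stated assumptions, not the library's validation code: it handles dot notation only, and it assumes two paths are ambiguous when they have the same number of segments and every pair of segments is either equal or contains a wildcard. The class and method names are hypothetical.

```java
import java.util.List;

final class JsonPathAmbiguitySketch {

    // Hypothetical helper: splits "$.a.*.b" into ["a", "*", "b"]. Dot notation only.
    static List<String> segments(String jsonPath) {
        if (!jsonPath.startsWith("$")) {
            throw new IllegalArgumentException("JSONPath must start with '$': " + jsonPath);
        }
        String rest = jsonPath.substring(1);
        if (rest.isEmpty()) {
            return List.of(); // the root path "$" has no segments
        }
        if (!rest.startsWith(".")) {
            throw new IllegalArgumentException("Only dot notation is handled in this sketch: " + jsonPath);
        }
        return List.of(rest.substring(1).split("\\."));
    }

    // Assumed definition: same length, and each segment pair is equal or has a wildcard on one side.
    // Under this definition identical paths also count as ambiguous.
    static boolean ambiguous(String first, String second) {
        List<String> a = segments(first);
        List<String> b = segments(second);
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            String x = a.get(i);
            String y = b.get(i);
            if (!x.equals(y) && !x.equals("*") && !y.equals("*")) {
                return false; // two different concrete names can never match the same key
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(ambiguous("$.a.b", "$.*.b")); // true
        System.out.println(ambiguous("$.a.b", "$.a.c")); // false
    }
}
```

With this definition, `$.a.b` and `$.*.b` are flagged because both can select the same value, while `$.a.b` and `$.a.c` are not.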

#### Usage

@@ -590,7 +590,7 @@ String maskedJson = jsonMasker.mask(json);
> When defining a config for the specific key and value of that key is an `object` or an `array`, the config will apply
> recursively to all nested keys and values, unless the nested key(s) defines its own masking configuration.
>
-> If config is attached to a JsonPATH it has a precedence over a regular key.
+> If config is attached to a JSONPath it has a precedence over a regular key.
#### Input

28 changes: 14 additions & 14 deletions adr/0003-jsonpath-support.md
@@ -33,39 +33,39 @@ possible with regular key masking.

## Decisions

-### JsonPATH
+### JSONPath

The solution to the described problem is to provide a way to disambiguate key/value pairs by selecting the target pair
using its location in JSON.
-The industry standard for selecting values in JSON is JsonPATH. Most developers are expected to know how to create basic
-JsonPATH queries using either bracket or dot notation.
-Therefore, we decided to solve the key ambiguity problem using JsonPATH.
+The industry standard for selecting values in JSON is JSONPath. Most developers are expected to know how to create basic
+JSONPath queries using either bracket or dot notation.
+Therefore, we decided to solve the key ambiguity problem using JSONPath.

### Supported features

-The [JsonPATH RFC 9535](https://www.rfc-editor.org/rfc/rfc9535.html) specifies a wide variety of features. Not all of them are required to solve the described problem.
+The [JSONPath RFC 9535](https://www.rfc-editor.org/rfc/rfc9535.html) specifies a wide variety of features. Not all of them are required to solve the described problem.
Therefore, we have to decide which features are necessary and which are not.

The decision is premised on the following expectations:

-1. Most of the clients will not face the key ambiguity problem, therefore they will not use JsonPATH.
+1. Most of the clients will not face the key ambiguity problem, therefore they will not use JSONPath.
2. Correctness, performance and maintainability of the library takes precedent over the number of features.
-3. JsonPATH support is required only for solving the key ambiguity problem.
+3. JSONPath support is required only for solving the key ambiguity problem.

-Any implemented JsonPATH feature adds some performance overhead both on configuration and on masking time, even if it is
+Any implemented JSONPath feature adds some performance overhead both on configuration and on masking time, even if it is
not used.
In general, it also increases the likelihood of incorrect behaviour as the amount of code increases. So having
unnecessary features violates the premises.
-Therefore, we decided to keep support for JsonPATH as limited as possible and extend it only when someone specifically
+Therefore, we decided to keep support for JSONPath as limited as possible and extend it only when someone specifically
requests it.

The following features are necessary to support:

1. **Support for bracket, dot and mixed notations**.
-We expect developers to know either of these notations, but we cannot know which one exactly. Also, the JsonPATH
+We expect developers to know either of these notations, but we cannot know which one exactly. Also, the JSONPath
could originate from code which uses either notation.
2. **Support for name selectors**.
-This is the main building block of JsonPATH queries.
+This is the main building block of JSONPath queries.
3. **Support for wildcard segments and selectors**.
Wildcard selectors are necessary for traversing array values.

@@ -82,11 +82,11 @@ The following features are necessary to support:

### Performance considerations.

-Supporting JsonPATH makes the masking 25% slower according to the latest benchmarks.
-We decided to disable JsonPATH in case no JsonPATH keys are supplied.
+Supporting JSONPath makes the masking 25% slower according to the latest benchmarks.
+We decided to disable JSONPath in case no JSONPath keys are supplied.
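
One way to read this decision is that path tracking is switched off entirely when no JSONPath keys are configured, so the common case pays only for a null check. The sketch below illustrates that idea under assumptions; the class and its methods are hypothetical and do not reproduce the library's `MaskingState`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class PathTrackingSketch {

    // null means "JSONPath tracking disabled": nothing is allocated and updates are no-ops
    private final Deque<String> currentPath;

    PathTrackingSketch(boolean jsonPathKeysSupplied) {
        this.currentPath = jsonPathKeysSupplied ? new ArrayDeque<>() : null;
    }

    boolean enabled() {
        return currentPath != null;
    }

    // Called when the traversal descends into a key or array
    void enter(String segment) {
        if (currentPath != null) {
            currentPath.addLast(segment);
        }
    }

    // Called when the traversal leaves the current key or array
    void leave() {
        if (currentPath != null) {
            currentPath.removeLast();
        }
    }

    String describe() {
        if (currentPath == null) {
            return "<disabled>";
        }
        return currentPath.isEmpty() ? "$" : "$." + String.join(".", currentPath);
    }

    public static void main(String[] args) {
        PathTrackingSketch tracking = new PathTrackingSketch(true);
        tracking.enter("nested");
        tracking.enter("iban");
        System.out.println(tracking.describe());
    }
}
```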

### Potential issues

-Mixing keys and JsonPATH keys in the same trie opens up some (highly unlikely) issues:
+Mixing keys and JSONPath keys in the same trie opens up some (highly unlikely) issues:

1. https://github.com/Breus/json-masker/issues/94
3 changes: 3 additions & 0 deletions build.gradle.kts
@@ -12,6 +12,9 @@ plugins {
}

description = "High-performance JSON masker in Java with no runtime dependencies"
+if (version == "unspecified") {
+    version = "0.1.0-SNAPSHOT"
+}

group = "dev.blaauwendraad"

1 change: 0 additions & 1 deletion gradle.properties

This file was deleted.

@@ -64,7 +64,7 @@ public static String randomJson(Set<String> targetKeys, String jsonSize, String
.setTargetKeys(targetKeys)
.setTargetKeyPercentage(targetKeyPercentage)
.setTargetJsonSizeBytes(BenchmarkUtils.parseSize(jsonSize))
-.setRandomSeed(1285756302517652226L)
+.setRandomSeed(RandomJsonGenerator.STATIC_RANDOM_SEED)
.createConfig();

return new RandomJsonGenerator(config).createRandomJsonNode().toString();
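
For context on why a shared seed matters: `java.util.Random` is deterministic for a given seed, so shuffling with a generator built from a constant yields the same selection on every run. The sketch below is illustrative only. The class name, the `SEED` constant and the sample paths are hypothetical, and the seed value is simply the literal that the hunk above removes, reused as a sample.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SeededShuffleSketch {

    // Sample value only; any constant gives repeatable results
    static final long SEED = 1285756302517652226L;

    // Picks up to `count` entries in an order that is identical on every call
    static List<String> pickRandom(List<String> candidates, int count) {
        List<String> copy = new ArrayList<>(candidates); // leave the caller's list untouched
        Collections.shuffle(copy, new Random(SEED));     // fresh generator, same seed, same order
        return new ArrayList<>(copy.subList(0, Math.min(count, copy.size())));
    }

    public static void main(String[] args) {
        List<String> paths = List.of("$.id", "$.email", "$.nested.iban", "$.clients.*.phone");
        System.out.println(pickRandom(paths, 2).equals(pickRandom(paths, 2))); // true
    }
}
```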
8 changes: 4 additions & 4 deletions src/main/java/dev/blaauwendraad/masker/json/JsonPathNode.java
@@ -1,14 +1,14 @@
package dev.blaauwendraad.masker.json;

/**
-* A mutable reference to a sequence of bytes in <code>dev.blaauwendraad.masker.json.MaskingState#message</code>.
-* It is used to represent json path nodes.
+* A mutable reference to a sequence of bytes in {@link MaskingState#getMessage()}.
+* It is used to represent JSONPath nodes.
* <p>
* There are two types of nodes:
* <ul>
-* <li>{@link Node} - a reference to a node in a json path, where <code>offset</code> denotes the start index of a
+* <li>{@link Node} - a reference to a node in a JSONPath, where <code>offset</code> denotes the start index of a
* segment in the message and <code>length</code> denotes the length of a segment in the message</li>
-* <li>{@link Array} - a reference to an array in a json path. Only wildcard indexes are supported.</li>
+* <li>{@link Array} - a reference to an array in a JSONPath. Only wildcard indexes are supported.</li>
* </ul>
*/
sealed interface JsonPathNode permits JsonPathNode.Array, JsonPathNode.Node {
@@ -33,7 +33,7 @@ final class KeyContainsMasker implements JsonMasker {

/**
* Masks the values in the given input for all values having keys corresponding to any of the provided target keys.
-* This implementation is optimized for multiple target keys. Since RFC-8259 dictates that JSON exchanges between
+* This implementation is optimized for multiple target keys. Since RFC 8259 dictates that JSON exchanges between
* systems that are not part of an enclosed system MUST be encoded using UTF-8, this method assumes UTF-8 encoding.
*
* @param input the input message for which values might be masked
@@ -46,7 +46,7 @@ public byte[] mask(byte[] input) {

KeyMaskingConfig keyMaskingConfig = maskingConfig.isInAllowMode() ? maskingConfig.getDefaultConfig() : null;
if (maskingState.jsonPathEnabled()) {
-// Check for "$" json path key.
+// Check for "$" JSONPath key.
keyMaskingConfig = keyMatcher.getMaskConfigIfMatched(
maskingState.getMessage(),
-1,
6 changes: 3 additions & 3 deletions src/main/java/dev/blaauwendraad/masker/json/KeyMatcher.java
@@ -136,7 +136,7 @@ private void insert(String word, boolean negativeMatch) {
public KeyMaskingConfig getMaskConfigIfMatched(byte[] bytes, int keyOffset, int keyLength, Iterator<? extends JsonPathNode> jsonPath) {
// first search by key
if (maskingConfig.isInMaskMode()) {
-// check json path first, as it's more specific
+// check JSONPath first, as it's more specific
TrieNode node = searchForJsonPathKeyNode(bytes, jsonPath);
// if found - mask with this config
// if not found - do not mask
@@ -151,7 +151,7 @@ public KeyMaskingConfig getMaskConfigIfMatched(byte[] bytes, int keyOffset, int
}
return null;
} else {
-// check json path first, as it's more specific
+// check JSONPath first, as it's more specific
TrieNode node = searchForJsonPathKeyNode(bytes, jsonPath);
// if found and is not negativeMatch - do not mask
// if found and is negative match - mask, but with a specific config
@@ -236,7 +236,7 @@ private TrieNode searchForJsonPathKeyNode(byte[] bytes, Iterator<? extends JsonP
// only wildcard indexes are supported
return null;
} else {
-throw new IllegalStateException("Unknown json path segment reference type " + jsonPathSegmentReference.getClass());
+throw new IllegalStateException("Unknown JSONPath segment reference type " + jsonPathSegmentReference.getClass());
}
}

4 changes: 2 additions & 2 deletions src/main/java/dev/blaauwendraad/masker/json/MaskingState.java
@@ -21,7 +21,7 @@ final class MaskingState implements ValueMaskerContext {
private int replacementOperationsTotalDifference = 0;

/**
-* Current json path is represented by a dequeue of segment references.
+* Current JSONPath is represented by a dequeue of segment references.
*/
private final Deque<JsonPathNode> currentJsonPath;

@@ -177,7 +177,7 @@ void backtrackCurrentJsonPath() {
}

/**
-* Returns the iterator over the json path component references from head to tail
+* Returns the iterator over the JSONPath component references from head to tail
*/
Iterator<JsonPathNode> getCurrentJsonPath() {
if (currentJsonPath != null) {
@@ -26,7 +26,7 @@ public final class JsonMaskingConfig {
*/
private final Set<String> targetKeys;
/**
-* Specifies the set of JSON paths for which the string/number values should be masked.
+* Specifies the set of JSONPaths for which the string/number values should be masked.
*/
private final Set<JsonPath> targetJsonPaths;
/**
@@ -183,8 +183,8 @@ private Builder maskKeys0(Set<String> keys, @CheckForNull KeyMaskingConfig confi

public Builder maskJsonPaths(Set<String> jsonPaths) {
if (targetKeyMode == TargetKeyMode.ALLOW) {
-throw new IllegalArgumentException("Cannot mask json paths when in ALLOW mode, if you want to customize" +
-" masking for specific json paths in ALLOW mode, use " +
+throw new IllegalArgumentException("Cannot mask JSONPaths when in ALLOW mode, if you want to customize " +
+"masking for specific JSONPaths in ALLOW mode, use " +
"maskJsonPaths(String jsonPath, KeyMaskingConfig config)");
}
return maskJsonPaths0(jsonPaths, null);
@@ -196,14 +196,14 @@ public Builder maskJsonPaths(Set<String> jsonPaths, KeyMaskingConfig config) {

private Builder maskJsonPaths0(Set<String> jsonPaths, @CheckForNull KeyMaskingConfig config) {
if (jsonPaths.isEmpty()) {
-throw new IllegalArgumentException("At least one json path must be provided");
+throw new IllegalArgumentException("At least one JSONPath must be provided");
}
for (String jsonPath : jsonPaths) {
JsonPath parsed = JSON_PATH_PARSER.parse(jsonPath);
if (targetJsonPaths.contains(parsed) || targetKeyConfigs.containsKey(parsed.toString())) {
-throw new IllegalArgumentException("Duplicate json path '%s'".formatted(jsonPath));
+throw new IllegalArgumentException("Duplicate JSONPath '%s'".formatted(jsonPath));
}
-// in ALLOW mode this method can be used to set a specific masking config for a json path
+// in ALLOW mode this method can be used to set a specific masking config for a JSONPath
if (targetKeyMode != TargetKeyMode.ALLOW) {
targetKeyMode = TargetKeyMode.MASK;
targetJsonPaths.add(parsed);
@@ -237,7 +237,7 @@ public Builder allowJsonPaths(Set<String> jsonPaths) {
for (String jsonPath : jsonPaths) {
JsonPath parsed = JSON_PATH_PARSER.parse(jsonPath);
if (targetJsonPaths.contains(parsed)) {
-throw new IllegalArgumentException("Duplicate json path '%s'".formatted(jsonPath));
+throw new IllegalArgumentException("Duplicate JSONPath '%s'".formatted(jsonPath));
}
targetJsonPaths.add(parsed);
}
@@ -127,7 +127,7 @@ private boolean isNumber(String segment) {
}

/**
-* Validates if the input set of json path queries is ambiguous. Throws {@code java.lang.IllegalArgumentException#IllegalArgumentException} if it is.
+* Validates if the input set of JSONPath queries is ambiguous. Throws {@code java.lang.IllegalArgumentException#IllegalArgumentException} if it is.
* <p>
* The method does a lexical sort of input jsonpath queries, iterates over sorted values and checks if any local pair is ambiguous.
*
