Update complexity controls. #79

Merged
merged 15 commits on Sep 19, 2023
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,13 @@
- Add `signal` option to allow use of an `AbortSignal` for complexity control.
Enables the algorithm to abort after a timeout, manual abort, or other
condition.
- Add `maxWorkFactor` to calculate a deep iteration limit based on the number
of non-unique blank nodes. This defaults to `1` for roughly O(n) behavior and
will handle common graphs. It must be adjusted to higher values if there is a
need to process graphs with complex blank nodes or other "poison" graphs. It
is recommended to use this parameter instead of `maxDeepIterations` directly.
- **BREAKING**: Check output `format` parameter. Must be omitted, falsey, or
"application/n-quads".

### Changed
- **BREAKING**: Change algorithm name from "URDNA2015" to "RDFC-1.0" to match
@@ -63,6 +70,7 @@
- Use a pre-computed map of replacement values.
- Performance difference depends on the number of replacements. The
[rdf-canon][] escaping test showed up to 15% improvement.
- Support generalized RDF `BlankNode` predicate during N-Quads serialization.

### Fixed
- Disable native lib tests in a browser.
71 changes: 49 additions & 22 deletions README.md
@@ -57,18 +57,19 @@ Examples
--------

```js
// canonize a dataset with the default algorithm

const dataset = [
  // ...
];
const canonical = await canonize.canonize(dataset, {algorithm: 'RDFC-1.0'});

// parse and canonize N-Quads with the default algorithm

const nquads = "...";
const canonical = await canonize.canonize(nquads, {
  algorithm: 'RDFC-1.0',
  inputFormat: 'application/n-quads'
});
```
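
For reference, and as a sketch only, a `dataset` entry can be an RDF/JS-style
quad whose terms carry `termType` and `value` properties, matching the term
shapes used by this library's N-Quads serializer; see the library documentation
for the full set of accepted input formats:

```js
const dataset = [{
  subject: {termType: 'NamedNode', value: 'http://example.com/s'},
  predicate: {termType: 'NamedNode', value: 'http://example.com/p'},
  object: {
    termType: 'Literal',
    value: 'o',
    datatype: {
      termType: 'NamedNode',
      value: 'http://www.w3.org/2001/XMLSchema#string'
    }
  },
  graph: {termType: 'DefaultGraph', value: ''}
}];
```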

@@ -95,7 +96,7 @@ URDNA2015 Migration

* The deprecated "URDNA2015" algorithm name is currently supported as an alias
for "RDFC-1.0".
* There is a minor difference that could cause compatibilty issues. It is
* There is a minor difference that could cause compatibility issues. It is
considered an edge case that will not be an issue in practice. See above for
details.
* Two tools are currently provided to help transition to "RDFC-1.0":
@@ -111,13 +112,17 @@ Complexity Control

Inputs may vary in complexity and some inputs may use more computational
resources than desired. There also exists a class of inputs that are sometimes
referred to as "poison" graphs. These are structured or designed specifically
to be difficult to process but often do not provide any useful purpose.

### Signals

The `canonize` API accepts an
[`AbortSignal`](https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal)
as the `signal` parameter that can be used to control processing of
computationally difficult inputs. `signal` is not set by default. It can be
used in a number of ways:

- Abort processing manually with
[`AbortController.abort()`](https://developer.mozilla.org/en-US/docs/Web/API/AbortController/abort)
- Abort processing after a timeout with
@@ -131,15 +136,37 @@ can be used in a number of ways:
For performance reasons this signal is only checked periodically during
processing and is not immediate.
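
For example, a caller could bound processing time by passing a timeout signal,
or wire up manual cancellation. This is a minimal sketch using the standard
`AbortSignal`/`AbortController` APIs with the `signal` parameter described
above:

```js
// abort canonization if it takes longer than one second
const canonical = await canonize.canonize(dataset, {
  algorithm: 'RDFC-1.0',
  signal: AbortSignal.timeout(1000)
});

// or abort manually, e.g. in response to application shutdown
const controller = new AbortController();
const pending = canonize.canonize(dataset, {
  algorithm: 'RDFC-1.0',
  signal: controller.signal
});
// later: controller.abort();
```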

### Limits

The `canonize` API has parameters that limit how many times the blank node deep
comparison algorithm may run to assign blank node labels before an error is
thrown. These limits are designed to control exponential growth related to the
number of blank nodes. Graphs without blank nodes, and those with only simple
blank nodes, never run the algorithm that uses these parameters. Graphs with
more complex, deeply connected blank nodes can require significant processing
time, which these parameters can bound; an example sketch follows the parameter
list below.

The `canonize` API has the following parameters to control limits:

- `maxWorkFactor`: Used to calculate a maximum number of deep iterations based
on the number of non-unique blank nodes.
- `0`: Deep inspection disallowed.
- `1`: Limit deep iterations to O(n). (default)
- `2`: Limit deep iterations to O(n^2).
- `3`: Limit deep iterations to O(n^3). Values at this level or higher will
allow processing of complex "poison" graphs but may take significant
amounts of computational resources.
- `Infinity`: No limitation.
- `maxDeepIterations`: An exact limit on the number of deep iterations. This
parameter is for specialized use cases; using `maxWorkFactor` is recommended
instead. It is unset by default, and setting any explicit value overrides
`maxWorkFactor`.
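
As an illustrative sketch using the parameters documented above: raising
`maxWorkFactor` trades computation for the ability to handle more complex
inputs, while `0` rejects any input that needs deep inspection at all. For
instance, with 10 non-unique blank nodes, a `maxWorkFactor` of `2` permits up
to 10^2 = 100 deep iterations.

```js
// allow up to O(n^2) deep iterations for moderately complex inputs
const canonical = await canonize.canonize(dataset, {
  algorithm: 'RDFC-1.0',
  maxWorkFactor: 2
});

// disallow deep inspection entirely; complex inputs will throw
const strict = await canonize.canonize(dataset, {
  algorithm: 'RDFC-1.0',
  maxWorkFactor: 0
});
```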

### Usage

In practice, callers must balance system load, concurrent processing, expected
input size and complexity, and other factors to determine which complexity
controls to use. This library defaults to a `maxWorkFactor` of `1` and no
timeout signal. These should be adjusted as needed.
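
A rough sketch of how a caller might combine these controls and handle the
resulting failure (the exact error messages are implementation details, so the
handler below is intentionally generic):

```js
try {
  const canonical = await canonize.canonize(dataset, {
    algorithm: 'RDFC-1.0',
    maxWorkFactor: 1,
    signal: AbortSignal.timeout(1000)
  });
  // use the canonical N-Quads ...
} catch(e) {
  // the deep iteration limit was exceeded or the signal aborted processing;
  // the input may be too complex for the configured limits
  console.error('Canonization failed:', e.message);
}
```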

Related Modules
---------------
@@ -156,9 +183,9 @@ The test suite is included in an external repository:

https://github.com/w3c/rdf-canon

This should be a sibling directory of the `rdf-canonize` directory or in a
`test-suites` directory. To clone shallow copies into the `test-suites`
directory you can use the following:

npm run fetch-test-suite

8 changes: 6 additions & 2 deletions lib/NQuads.js
@@ -240,8 +240,12 @@ module.exports = class NQuads {
nquad += `_:${s.value}`;
}

// predicate normally a NamedNode, can be a BlankNode in generalized RDF
if(p.termType === TYPE_NAMED_NODE) {
  nquad += ` <${_iriEscape(p.value)}> `;
} else {
  nquad += ` _:${p.value} `;
}

// object is NamedNode, BlankNode, or Literal
if(o.termType === TYPE_NAMED_NODE) {
36 changes: 27 additions & 9 deletions lib/URDNA2015.js
@@ -14,23 +14,24 @@ module.exports = class URDNA2015 {
createMessageDigest = null,
messageDigestAlgorithm = 'sha256',
canonicalIdMap = new Map(),
maxWorkFactor = 1,
maxDeepIterations = -1,
signal = null
} = {}) {
this.name = 'RDFC-1.0';
this.blankNodeInfo = new Map();
this.canonicalIssuer = new IdentifierIssuer('c14n', canonicalIdMap);
this.createMessageDigest = createMessageDigest ||
(() => new MessageDigest(messageDigestAlgorithm));
this.maxWorkFactor = maxWorkFactor;
this.maxDeepIterations = maxDeepIterations;
this.remainingDeepIterations = 0;
this.signal = signal;
this.quads = null;
}

// 4.4) Normalization Algorithm
async main(dataset) {
this.quads = dataset;

// 1) Create the normalization state.
@@ -99,6 +100,24 @@
// 5.4.5) Set simple to true.
}

if(this.maxDeepIterations < 0) {
// calculate maxDeepIterations if not explicit
if(this.maxWorkFactor === 0) {
// deep inspection disallowed; use a limit of 0
this.maxDeepIterations = 0;
} else if(this.maxWorkFactor === Infinity) {
this.maxDeepIterations = Infinity;
} else {
const nonUniqueCount =
nonUnique.reduce((count, v) => count + v.length, 0);
this.maxDeepIterations = nonUniqueCount ** this.maxWorkFactor;
}
}
// handle any large inputs as Infinity
if(this.maxDeepIterations > Number.MAX_SAFE_INTEGER) {
this.maxDeepIterations = Infinity;
}
this.remainingDeepIterations = this.maxDeepIterations;
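// Illustrative arithmetic (not a code change): with 10 non-unique blank
// nodes, the default maxWorkFactor of 1 yields maxDeepIterations of
// 10 ** 1 = 10, while a maxWorkFactor of 2 would yield 10 ** 2 = 100.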

// 6) For each hash to identifier list mapping in hash to blank nodes map,
// lexicographically-sorted by hash:
// Note: sort optimized away, use `nonUnique`.
@@ -253,12 +272,11 @@

// 4.8) Hash N-Degree Quads
async hashNDegreeQuads(id, issuer) {
if(this.remainingDeepIterations === 0) {
throw new Error(
`Maximum deep iterations (${this.maxDeepIterations}) exceeded.`);
}
this.remainingDeepIterations--;

// 1) Create a hash to related blank nodes map for storing hashes that
// identify related blank nodes.
@@ -290,7 +308,7 @@
// Note: batch permutations 3 at a time
if(++i % 3 === 0) {
if(this.signal && this.signal.aborted) {
throw new Error(`Abort signal received: "${this.signal.reason}".`);
}
await this._yield();
}
36 changes: 28 additions & 8 deletions lib/URDNA2015Sync.js
@@ -15,24 +15,27 @@ module.exports = class URDNA2015Sync {
createMessageDigest = null,
messageDigestAlgorithm = 'sha256',
canonicalIdMap = new Map(),
maxWorkFactor = 1,
maxDeepIterations = -1,
timeout = 0
} = {}) {
this.name = 'RDFC-1.0';
this.blankNodeInfo = new Map();
this.canonicalIssuer = new IdentifierIssuer('c14n', canonicalIdMap);
this.createMessageDigest = createMessageDigest ||
(() => new MessageDigest(messageDigestAlgorithm));
this.maxWorkFactor = maxWorkFactor;
this.maxDeepIterations = maxDeepIterations;
this.remainingDeepIterations = 0;
this.timeout = timeout;
if(timeout > 0) {
  this.startTime = Date.now();
}
this.quads = null;
}

// 4.4) Normalization Algorithm
main(dataset) {
this.quads = dataset;

// 1) Create the normalization state.
@@ -96,6 +99,24 @@
// 5.4.5) Set simple to true.
}

if(this.maxDeepIterations < 0) {
// calculate maxDeepIterations if not explicit
if(this.maxWorkFactor === 0) {
// deep inspection disallowed; use a limit of 0
this.maxDeepIterations = 0;
} else if(this.maxWorkFactor === Infinity) {
this.maxDeepIterations = Infinity;
} else {
const nonUniqueCount =
nonUnique.reduce((count, v) => count + v.length, 0);
this.maxDeepIterations = nonUniqueCount ** this.maxWorkFactor;
}
}
// handle any large inputs as Infinity
if(this.maxDeepIterations > Number.MAX_SAFE_INTEGER) {
this.maxDeepIterations = Infinity;
}
this.remainingDeepIterations = this.maxDeepIterations;

// 6) For each hash to identifier list mapping in hash to blank nodes map,
// lexicographically-sorted by hash:
// Note: sort optimized away, use `nonUnique`.
@@ -250,12 +271,11 @@

// 4.8) Hash N-Degree Quads
hashNDegreeQuads(id, issuer) {
if(this.remainingDeepIterations === 0) {
throw new Error(
`Maximum deep iterations (${this.maxDeepIterations}) exceeded.`);
}
this.remainingDeepIterations--;

// 1) Create a hash to related blank nodes map for storing hashes that
// identify related blank nodes.