Update complexity controls.
- Add `maxWorkFactor` to calculate `maxDeepIterations` based on
  non-unique blank nodes. Default to `1`.
- If `maxDeepIterations` is >= 0 then use it explicitly.
- Remove default timeout signal.
- Add docs.
- Add tests.
- Update async and sync versions.
- Update tests to handle `computationalComplexity` test parameter and map
  it to `maxWorkFactor` adjustments.
- Improve negative test handling.
davidlehn committed Sep 15, 2023
1 parent 73f48e2 commit 69e2df7
Showing 7 changed files with 181 additions and 60 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
Expand Up @@ -13,6 +13,11 @@
- Add `signal` option to allow use of an `AbortSignal` for complexity control.
Enables the algorithm to abort after a timeout, manual abort, or other
condition.
- Add `maxWorkFactor` to calculate a deep iteration limit based on the number
of non-unique blank nodes. This defaults to `1` for roughly O(n) behavior and
will handle common graphs. It must be adjusted to higher values if there is a
need to process graphs with complex blank nodes or other "poison" graphs. It
is recommended to use this parameter instead of `maxDeepIterations` directly.
- **BREAKING**: Check output `format` parameter. Must be omitted, falsey, or
"application/n-quads".

Expand Down
46 changes: 36 additions & 10 deletions README.md
Expand Up @@ -112,13 +112,17 @@ Complexity Control

Inputs may vary in complexity and some inputs may use more computational
resources than desired. There also exists a class of inputs that are sometimes
referred to as "poison" graphs. These are designed specifically to be difficult
to process but often do not provide any useful purpose.
referred to as "poison" graphs. These are structured or designed specifically
to be difficult to process but often do not provide any useful purpose.

### Signals

The `canonize` API accepts an
[`AbortSignal`](https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal)
that can be used to control processing of computationally difficult inputs. It
can be used in a number of ways:
as the `signal` parameter that can be used to control processing of
computationally difficult inputs. `signal` is not set by default. It can be
used in a number of ways:

- Abort processing manually with
[`AbortController.abort()`](https://developer.mozilla.org/en-US/docs/Web/API/AbortController/abort)
- Abort processing after a timeout with
Expand All @@ -132,15 +136,37 @@ can be used in a number of ways:
For performance reasons this signal is only checked periodically during
processing and is not immediate.
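
The periodic check described above can be sketched in isolation. This is a minimal illustration, not the library's code: `runBatched`, its `work` callback, and the `batchSize` parameter are hypothetical names, and the batch size of 3 only mirrors the "3 at a time" batching mentioned in the implementation.

```js
// Hypothetical helper: run `steps` units of work, but only look at the
// signal once per batch, so an abort is noticed with some delay.
function runBatched(steps, work, signal = null, batchSize = 3) {
  for(let i = 1; i <= steps; ++i) {
    work(i);
    // periodic check: only every `batchSize` steps
    if(i % batchSize === 0 && signal && signal.aborted) {
      throw new Error(`Abort signal received: "${signal.reason}".`);
    }
  }
}

// A signal that is already aborted still lets the first batch run.
const controller = new AbortController();
controller.abort('manual abort');
let done = 0;
try {
  runBatched(10, () => done++, controller.signal);
} catch(e) {
  console.log(`${done} steps ran before: ${e.message}`);
}
```

Checking once per batch keeps the overhead low, at the cost of doing up to one extra batch of work after the abort is requested.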

The `canonize` API also has a `maxDeepIterations` option to control how many
times deep comparison algorithms run before throwing an error. This provides
additional control over input complexity as this limit is generally very low
(no more than 1 or 2) for common use case graphs.
### Limits

The `canonize` API has parameters that limit how many times the blank node deep
comparison algorithm can run while assigning blank node labels before an error
is thrown. These limits are designed to control the exponential growth related
to the number of blank nodes. Graphs without blank nodes, and graphs with only
simple blank nodes, do not run the algorithms that these parameters limit.
Graphs with more complex, deeply connected blank nodes can have significant
time complexity, which these parameters can control.
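
As a rough sketch of the idea, the limit can be treated as a single budget shared by every deep comparison run, with an error thrown once it is used up. The class name `DeepIterationBudget` and its `consume` method are hypothetical and only for illustration; the `<= 0` comparison is a defensive choice made in this sketch.

```js
// Hypothetical illustration of a shared deep iteration budget.
class DeepIterationBudget {
  constructor(maxDeepIterations) {
    this.maxDeepIterations = maxDeepIterations;
    this.remaining = maxDeepIterations;
  }

  // Call once at the start of each deep comparison run.
  consume() {
    if(this.remaining <= 0) {
      throw new Error(
        `Maximum deep iterations (${this.maxDeepIterations}) exceeded.`);
    }
    // `Infinity - 1` is still `Infinity`, so an unlimited budget never runs out
    this.remaining--;
  }
}

const budget = new DeepIterationBudget(2);
budget.consume();
budget.consume();
try {
  budget.consume();
} catch(e) {
  console.log(e.message);
}
```

Because the budget is shared across the whole run rather than tracked per blank node, the total amount of deep work is bounded regardless of how it is distributed.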

The `canonize` API has the following parameters to control limits:

- `maxWorkFactor`: Used to calculate a maximum number of deep iterations based
  on the number of non-unique blank nodes.
  - `0`: Deep inspection disallowed.
  - `1`: Limit deep iterations to O(n). (default)
  - `2`: Limit deep iterations to O(n^2).
  - `3`: Limit deep iterations to O(n^3). Values at this level or higher will
    allow processing of complex "poison" graphs but may take significant
    amounts of computational resources.
  - `Infinity`: No limitation.
- `maxDeepIterations`: The exact number of deep iterations. This parameter is
  for specialized use cases and use of `maxWorkFactor` is recommended. Defaults
  to `-1` (not set); any value of `0` or greater overrides `maxWorkFactor`.
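
The mapping from these parameters to an iteration limit can be sketched as a standalone function. This is a simplified illustration based on the logic in this commit, assuming the constructor default of `-1` for an unset `maxDeepIterations`; the function name `computeMaxDeepIterations` and the `nonUniqueCount` argument are hypothetical.

```js
// Hypothetical sketch: derive the deep iteration limit from the options and
// the number of non-unique blank nodes found in the dataset.
function computeMaxDeepIterations({
  nonUniqueCount, maxWorkFactor = 1, maxDeepIterations = -1
}) {
  let limit = maxDeepIterations;
  if(limit < 0) {
    // no explicit limit: calculate one from the work factor
    if(maxWorkFactor === 0) {
      limit = 0;
    } else if(maxWorkFactor === Infinity) {
      limit = Infinity;
    } else {
      limit = nonUniqueCount ** maxWorkFactor;
    }
  }
  // treat any very large limit as unlimited
  if(limit > Number.MAX_SAFE_INTEGER) {
    limit = Infinity;
  }
  return limit;
}

// 12 non-unique blank nodes at different work factors
console.log(computeMaxDeepIterations({nonUniqueCount: 12})); // 12
console.log(computeMaxDeepIterations({nonUniqueCount: 12, maxWorkFactor: 2})); // 144
console.log(computeMaxDeepIterations({nonUniqueCount: 12, maxWorkFactor: 3})); // 1728
```

The limit grows polynomially with the work factor, which is why values of `3` or more should be reserved for inputs that are known to need them.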

### Usage

In practice, callers must balance system load, concurrent processing, expected
input size and complexity, and other factors to determine which complexity
controls to use. This library defaults to infinite `maxDeepIterations` and a 1
second timeout, and these should be adjusted as needed.
controls to use. This library defaults to a `maxWorkFactor` of `1` and no
timeout signal. These should be adjusted as needed.
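
One way to combine both controls is a small wrapper that always applies a work factor and, optionally, a timeout signal. This is only a sketch: `canonizeWithControls` and `fakeCanonize` are hypothetical names, and the `canonize` function is passed in as an argument so the example runs without the library. The option names follow the tests in this commit.

```js
// Hypothetical wrapper: apply a work factor and an optional timeout, and
// report failures as a value instead of throwing.
async function canonizeWithControls(canonize, data, {
  timeoutMs = 1000, maxWorkFactor = 1
} = {}) {
  const options = {
    algorithm: 'RDFC-1.0',
    inputFormat: 'application/n-quads',
    format: 'application/n-quads',
    maxWorkFactor
  };
  if(timeoutMs > 0) {
    // no signal is set by default, so a timeout must be requested explicitly
    options.signal = AbortSignal.timeout(timeoutMs);
  }
  try {
    return {output: await canonize(data, options), error: null};
  } catch(error) {
    return {output: null, error};
  }
}

// stand-in for the real API so this sketch runs on its own
const fakeCanonize = async (data, options) =>
  `canonized with work factor ${options.maxWorkFactor}`;
canonizeWithControls(fakeCanonize, '', {timeoutMs: 0})
  .then(result => console.log(result.output));
```

With the real library, the first argument would presumably be something like `(data, options) => rdfCanonize.canonize(data, options)`. A caller that sees an error could then decide whether to reject the input or retry with a higher `maxWorkFactor`.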

Related Modules
---------------
Expand Down
36 changes: 27 additions & 9 deletions lib/URDNA2015.js
Expand Up @@ -14,23 +14,24 @@ module.exports = class URDNA2015 {
createMessageDigest = null,
messageDigestAlgorithm = 'sha256',
canonicalIdMap = new Map(),
maxDeepIterations = Infinity,
signal = AbortSignal.timeout(1000)
maxWorkFactor = 1,
maxDeepIterations = -1,
signal = null
} = {}) {
this.name = 'RDFC-1.0';
this.blankNodeInfo = new Map();
this.canonicalIssuer = new IdentifierIssuer('c14n', canonicalIdMap);
this.createMessageDigest = createMessageDigest ||
(() => new MessageDigest(messageDigestAlgorithm));
this.maxWorkFactor = maxWorkFactor;
this.maxDeepIterations = maxDeepIterations;
this.quads = null;
this.deepIterations = null;
this.remainingDeepIterations = 0;
this.signal = signal;
this.quads = null;
}

// 4.4) Normalization Algorithm
async main(dataset) {
this.deepIterations = new Map();
this.quads = dataset;

// 1) Create the normalization state.
Expand Down Expand Up @@ -99,6 +100,24 @@ module.exports = class URDNA2015 {
// 5.4.5) Set simple to true.
}

if(this.maxDeepIterations < 0) {
// calculate maxDeepIterations if not explicit
if(this.maxWorkFactor === 0) {
// use 0 default; set explicitly so the limit is not left at -1
this.maxDeepIterations = 0;
} else if(this.maxWorkFactor === Infinity) {
this.maxDeepIterations = Infinity;
} else {
const nonUniqueCount =
nonUnique.reduce((count, v) => count + v.length, 0);
this.maxDeepIterations = nonUniqueCount ** this.maxWorkFactor;
}
}
// handle any large inputs as Infinity
if(this.maxDeepIterations > Number.MAX_SAFE_INTEGER) {
this.maxDeepIterations = Infinity;
}
this.remainingDeepIterations = this.maxDeepIterations;

// 6) For each hash to identifier list mapping in hash to blank nodes map,
// lexicographically-sorted by hash:
// Note: sort optimized away, use `nonUnique`.
Expand Down Expand Up @@ -253,12 +272,11 @@ module.exports = class URDNA2015 {

// 4.8) Hash N-Degree Quads
async hashNDegreeQuads(id, issuer) {
const deepIterations = this.deepIterations.get(id) || 0;
if(deepIterations > this.maxDeepIterations) {
if(this.remainingDeepIterations === 0) {
throw new Error(
`Maximum deep iterations (${this.maxDeepIterations}) exceeded.`);
}
this.deepIterations.set(id, deepIterations + 1);
this.remainingDeepIterations--;

// 1) Create a hash to related blank nodes map for storing hashes that
// identify related blank nodes.
Expand Down Expand Up @@ -290,7 +308,7 @@ module.exports = class URDNA2015 {
// Note: batch permutations 3 at a time
if(++i % 3 === 0) {
if(this.signal && this.signal.aborted) {
throw new Error('Abort signal received.');
throw new Error(`Abort signal received: "${this.signal.reason}".`);
}
await this._yield();
}
Expand Down
36 changes: 28 additions & 8 deletions lib/URDNA2015Sync.js
Expand Up @@ -15,24 +15,27 @@ module.exports = class URDNA2015Sync {
createMessageDigest = null,
messageDigestAlgorithm = 'sha256',
canonicalIdMap = new Map(),
maxDeepIterations = Infinity,
timeout = 1000
maxWorkFactor = 1,
maxDeepIterations = -1,
timeout = 0
} = {}) {
this.name = 'RDFC-1.0';
this.blankNodeInfo = new Map();
this.canonicalIssuer = new IdentifierIssuer('c14n', canonicalIdMap);
this.createMessageDigest = createMessageDigest ||
(() => new MessageDigest(messageDigestAlgorithm));
this.maxWorkFactor = maxWorkFactor;
this.maxDeepIterations = maxDeepIterations;
this.remainingDeepIterations = 0;
this.timeout = timeout;
this.startTime = Date.now();
if(timeout > 0) {
this.startTime = Date.now();
}
this.quads = null;
this.deepIterations = null;
}

// 4.4) Normalization Algorithm
main(dataset) {
this.deepIterations = new Map();
this.quads = dataset;

// 1) Create the normalization state.
Expand Down Expand Up @@ -96,6 +99,24 @@ module.exports = class URDNA2015Sync {
// 5.4.5) Set simple to true.
}

if(this.maxDeepIterations < 0) {
// calculate maxDeepIterations if not explicit
if(this.maxWorkFactor === 0) {
// use 0 default; set explicitly so the limit is not left at -1
this.maxDeepIterations = 0;
} else if(this.maxWorkFactor === Infinity) {
this.maxDeepIterations = Infinity;
} else {
const nonUniqueCount =
nonUnique.reduce((count, v) => count + v.length, 0);
this.maxDeepIterations = nonUniqueCount ** this.maxWorkFactor;
}
}
// handle any large inputs as Infinity
if(this.maxDeepIterations > Number.MAX_SAFE_INTEGER) {
this.maxDeepIterations = Infinity;
}
this.remainingDeepIterations = this.maxDeepIterations;

// 6) For each hash to identifier list mapping in hash to blank nodes map,
// lexicographically-sorted by hash:
// Note: sort optimized away, use `nonUnique`.
Expand Down Expand Up @@ -250,12 +271,11 @@ module.exports = class URDNA2015Sync {

// 4.8) Hash N-Degree Quads
hashNDegreeQuads(id, issuer) {
const deepIterations = this.deepIterations.get(id) || 0;
if(deepIterations > this.maxDeepIterations) {
if(this.remainingDeepIterations === 0) {
throw new Error(
`Maximum deep iterations (${this.maxDeepIterations}) exceeded.`);
}
this.deepIterations.set(id, deepIterations + 1);
this.remainingDeepIterations--;

// 1) Create a hash to related blank nodes map for storing hashes that
// identify related blank nodes.
Expand Down
35 changes: 28 additions & 7 deletions lib/index.js
Expand Up @@ -116,15 +116,25 @@ exports._rdfCanonizeNative = function(api) {
* {string} [format] - The format of the output. Omit or use
* 'application/n-quads' for a N-Quads string.
* {boolean} [useNative=false] - Use native implementation.
* {number} [maxDeepIterations=Infinity] - The maximum number of times to run
* {number} [maxWorkFactor=1] - Control of the maximum number of times to run
* deep comparison algorithms (such as the N-Degree Hash Quads algorithm
* used in RDFC-1.0) before bailing out and throwing an error; this is a
* useful setting for preventing wasted CPU cycles or DoS when canonizing
* meaningless or potentially malicious datasets, a recommended value is
* `1`.
* meaningless or potentially malicious datasets. This parameter sets the
* maximum number of iterations based on the number of non-unique blank
* nodes. `0` to disable iterations, `1` for an O(n) limit, `2` for an O(n^2)
* limit, `3` and higher may handle "poison" graphs but may take
* significant computational resources, `Infinity` for no limitation.
* Defaults to `1` which can handle many common inputs.
* {number} [maxDeepIterations=-1] - The maximum number of times to run
* deep comparison algorithms (such as the N-Degree Hash Quads algorithm
* used in RDFC-1.0) before bailing out and throwing an error; this is a
* useful setting for preventing wasted CPU cycles or DoS when canonizing
* meaningless or potentially malicious datasets. If set to a value other
* than `-1` it will explicitly set the number of iterations and override
* `maxWorkFactor`. It is recommended to use `maxWorkFactor`.
* {AbortSignal} [signal] - An AbortSignal used to abort the operation. The
* aborted status is only periodically checked for performance reasons.
* The default is to timeout after 1s. Use null to disable.
* {boolean} [rejectURDNA2015=false] - Reject the "URDNA2015" algorithm name
* instead of treating it as an alias for "RDFC-1.0".
*
Expand Down Expand Up @@ -190,12 +200,23 @@ exports.canonize = async function(input, options) {
* {string} [format] - The format of the output. Omit or use
* 'application/n-quads' for a N-Quads string.
* {boolean} [useNative=false] - Use native implementation.
* {number} [maxDeepIterations=Infinity] - The maximum number of times to run
* {number} [maxWorkFactor=1] - Control of the maximum number of times to run
* deep comparison algorithms (such as the N-Degree Hash Quads algorithm
* used in RDFC-1.0) before bailing out and throwing an error; this is a
* useful setting for preventing wasted CPU cycles or DoS when canonizing
* meaningless or potentially malicious datasets. This parameter sets the
* maximum number of iterations based on the number of non-unique blank
* nodes. `0` to disable iterations, `1` for an O(n) limit, `2` for an O(n^2)
* limit, `3` and higher may handle "poison" graphs but may take
* significant computational resources, `Infinity` for no limitation.
* Defaults to `1` which can handle many common inputs.
* {number} [maxDeepIterations=-1] - The maximum number of times to run
* deep comparison algorithms (such as the N-Degree Hash Quads algorithm
* used in RDFC-1.0) before bailing out and throwing an error; this is a
* useful setting for preventing wasted CPU cycles or DoS when canonizing
* meaningless or potentially malicious datasets, a recommended value is
* `1`.
* meaningless or potentially malicious datasets. If set to a value other
* than `-1` it will explicitly set the number of iterations and override
* `maxWorkFactor`. It is recommended to use `maxWorkFactor`.
* {number} [timeout=1000] - The maximum number of milliseconds before the
* operation will timeout. This is only periodically checked for
* performance reasons. Use 0 to disable. Note: This is a replacement for
Expand Down
27 changes: 25 additions & 2 deletions test/misc.js
Expand Up @@ -174,6 +174,28 @@ _:c14n1 <urn:p1> "v1" .
assert(!output, 'abort should have no output');
});

it('should abort (work factor)', async () => {
const {data} = graphs.makeDataA({
subjects: 6,
objects: 6
});
let error;
let output;
try {
output = await rdfCanonize.canonize(data, {
algorithm: 'RDFC-1.0',
inputFormat: 'application/n-quads',
format: 'application/n-quads',
signal: null,
maxWorkFactor: 1
});
} catch(e) {
error = e;
}
assert(error, 'no abort error');
assert(!output, 'abort should have no output');
});

it('should abort (iterations)', async () => {
const {data} = graphs.makeDataA({
subjects: 6,
Expand Down Expand Up @@ -219,8 +241,9 @@ _:c14n1 <urn:p1> "v1" .
algorithm: 'RDFC-1.0',
inputFormat: 'application/n-quads',
format: 'application/n-quads',
signal: AbortSignal.timeout(100),
maxDeepIterations: 1000
//signal: AbortSignal.timeout(1000),
//maxWorkFactor: 3
//maxDeepIterations: 9
});
output = await p;
//console.log('OUTPUT', output);
Expand Down