This repository has been archived by the owner on Sep 19, 2023. It is now read-only.

Add hook to provide superType relationships
Also adds a cache to improve performance of superType lookups.
jitsedesmet committed Aug 25, 2021
1 parent 869ddea commit b527ac2
Showing 56 changed files with 2,033 additions and 1,522 deletions.
70 changes: 60 additions & 10 deletions README.md
@@ -48,25 +48,33 @@ Note: If you want to use *aggregates*, or *exists* you should check out the [str

### Config

Sparqlee accepts an optional config argument, that is not required for simple use cases, but for feature completeness and spec compliance it should be populated fully.
Sparqlee accepts an optional config argument. It is not required for simple use cases,
but for feature completeness and spec compliance it should provide `now`, `baseIRI`, `exists`, `aggregate` and `bnode`.

```ts
interface AsyncEvaluatorConfig {
interface AsyncEvaluatorContext {
now?: Date;
baseIRI?: string;

exists?: (expression: Alg.ExistenceExpression, mapping: Bindings) => Promise<boolean>;
aggregate?: (expression: Alg.AggregateExpression) => Promise<RDF.Term>;
bnode?: (input?: string) => Promise<RDF.BlankNode>;
extensionFunctionCreator?: (functionNamedNode: RDF.NamedNode) => (args: RDF.Term[]) => Promise<RDF.Term> | undefined;
overloadCache?: LRUCache<string, SomeInternalType>;
typeCache?: LRUCache<string, SomeInternalType>;
getSuperType?: (unknownType: string) => string;
}
```

See the [stream](#streams) and [context dependant function](#context_dependant_functions) sections for more info.

### Errors

Sparqlee exports an Error class called `ExpressionError` from which all SPARQL related errors inherit. These might include unbound variables, wrong types, invalid lexical forms, and much more. More info on errors [here](lib/util/Errors.ts). These errors can be caught, and may impact program execution in an expected way. All other errors are unexpected, and are thus programmer mistakes or mistakes in this library.
Sparqlee exports an Error class called `ExpressionError` from which all SPARQL related errors inherit.
These might include unbound variables, wrong types, invalid lexical forms, and much more.
More info on errors [here](lib/util/Errors.ts).
These errors can be caught, and may impact program execution in an expected way.
All other errors are unexpected, and are thus programmer mistakes or mistakes in this library.

There is also the utility function `isExpressionError` for detecting these cases.

@@ -87,15 +95,19 @@ try {

### Exists

'Exists' operations are an annoying problem to tackle in the context of an expression evaluator, since they make the operation stateful and context-dependent. They might span entire streams and, depending on the use case, have very different requirements for speed and memory consumption. Sparqlee has therefore decided to delegate this responsibility back to you.
'Exists' operations are an annoying problem to tackle in the context of an expression evaluator,
since they make the operation stateful and context-dependent.
They might span entire streams and, depending on the use case, have very different requirements for speed and memory consumption.
Sparqlee has therefore decided to delegate this responsibility back to you.

You can, if you want, pass hooks to the evaluators of the shape:

```ts
exists?: (expression: Alg.ExistenceExpression, mapping: Bindings) => Promise<boolean>;
```

If Sparqlee encounters an existence expression, it will call this hook with the relevant information so you can resolve it yourself. If this hook is not present, but an existence expression is encountered, then an error is thrown.
If Sparqlee encounters an existence expression, it will call this hook with the relevant information so you can resolve it yourself.
If this hook is not present, but an existence expression is encountered, then an error is thrown.

An example consumer/hook can be found in [Comunica](https://github.com/comunica/comunica/blob/master/packages/actor-query-operation-filter-sparqlee/lib/ActorQueryOperationFilterSparqlee.ts).

@@ -142,6 +154,35 @@ config.extensionFunctionCreator = (functionName: RDF.NamedNode) => {
}
```

### Overload function caching

An `overloadCache` allows Sparqlee to cache the implementation of a function for a given combination of argument types.
This cache is only used when it is provided in the context.
It can speed up execution significantly, especially when evaluating a lot of bindings that mostly have the same types.
This statement is backed up by the [integer addition benchmark](/benchmarks/integerAddition.ts).

This cache can be reused across multiple evaluators. We don't recommend manual modification.
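
The idea behind such a cache can be sketched as follows. This is a minimal, self-contained illustration and not Sparqlee's actual implementation: the names `overloads`, `resolveOverload` and `slowLookups` are hypothetical, and a plain `Map` stands in for the `LRUCache`. It assumes the cache key is built from the operator name and the argument types.

```ts
// Hypothetical sketch: none of these names are Sparqlee APIs.
type Impl = (args: number[]) => number;

// A toy overload table: operator -> list of (argument types, implementation).
const overloads: Record<string, { types: string[]; impl: Impl }[]> = {
  '+': [
    { types: [ 'integer', 'integer' ], impl: args => args.reduce((sum, n) => sum + n, 0) },
    { types: [ 'decimal', 'decimal' ], impl: args => args.reduce((sum, n) => sum + n, 0) },
  ],
};

// Counts how often the (slow) search through the overload table runs.
let slowLookups = 0;

function resolveOverload(op: string, argTypes: string[], cache: Map<string, Impl>): Impl | undefined {
  // Assumption: the key combines the operator and the argument types.
  const key = `${op}:${argTypes.join(',')}`;
  const cached = cache.get(key);
  if (cached !== undefined) {
    return cached;
  }
  slowLookups++;
  const match = (overloads[op] ?? []).find(candidate =>
    candidate.types.length === argTypes.length &&
    candidate.types.every((type, index) => type === argTypes[index]));
  if (match !== undefined) {
    cache.set(key, match.impl);
  }
  return match?.impl;
}

// The same cache object can be shared by several callers.
const sharedCache = new Map<string, Impl>();
const first = resolveOverload('+', [ 'integer', 'integer' ], sharedCache);
const second = resolveOverload('+', [ 'integer', 'integer' ], sharedCache);
```

Here the second call is answered from the cache, so the overload table is searched only once. This is why the benefit is largest when most bindings carry the same types.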

### Super type discovery

The `getSuperType` hook allows a user to use custom types and define their super type relationship to other types.
Example:
```ts
const superTypeDiscoverCallback = (unknownType: string) => {
  if (unknownType === 'http://example.org/label') {
    return 'http://www.w3.org/2001/XMLSchema#string';
  }
  return 'term';
};
```
This is helpful when performing queries over data that uses datatypes that are a restriction on the known XSD datatypes.
For example, a data source could define `ex:label = "good" | "bad"`. These are both strings,
and we could, for example, call the `substr` function on these values.
To allow this in a type-safe way, we need to check whether `ex:label` is a restriction on string.

The `typeCache` allows us to cache these super type relationships.
This cache can be reused across multiple evaluators. We don't recommend manual modification.
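
The check described above can be sketched as a walk up the super type chain. This is a minimal sketch under stated assumptions, not Sparqlee's internal logic: `isSubTypeOf` is a hypothetical helper, a plain `Map` stands in for the `typeCache`, and `'term'` is assumed to be the root that ends the walk.

```ts
// Hypothetical sketch: only the shape of `getSuperType` comes from the config above.
const XSD_STRING = 'http://www.w3.org/2001/XMLSchema#string';

// Maps a custom type to its direct super type, or to 'term' when nothing is known.
const getSuperType = (unknownType: string): string =>
  unknownType === 'http://example.org/label' ? XSD_STRING : 'term';

// Stand-in for `typeCache`: remembers the answer per (type, target) pair.
const typeCache = new Map<string, boolean>();

function isSubTypeOf(type: string, target: string): boolean {
  const key = `${type} ${target}`;
  const known = typeCache.get(key);
  if (known !== undefined) {
    return known;
  }
  let current = type;
  // Walk up until we reach the target or the assumed root 'term';
  // the step limit guards against cyclic super type definitions.
  for (let steps = 0; current !== target && current !== 'term' && steps < 32; steps++) {
    current = getSuperType(current);
  }
  const result = current === target;
  typeCache.set(key, result);
  return result;
}

const labelIsString = isSubTypeOf('http://example.org/label', XSD_STRING);
const otherIsString = isSubTypeOf('http://example.org/other', XSD_STRING);
```

With this callback, `ex:label` resolves to a string restriction, so string functions such as `substr` could be applied to it, while an unrelated custom type falls through to `'term'`.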

### Binary

Sparqlee also provides a binary for evaluating simple expressions from the command line. Example
@@ -164,27 +205,35 @@ Literal {
### Context dependant functions
Some functions (BNODE, NOW, IRI) need a (stateful) context from the caller to function correctly according to the spec. This context can be passed as an argument to Sparqlee (see the [config section](#config) for exact types). If it is not passed, Sparqlee will use a naive implementation that might do the trick for simple use cases.
Some functions (BNODE, NOW, IRI) need a (stateful) context from the caller to function correctly according to the spec.
This context can be passed as an argument to Sparqlee (see the [config section](#config) for exact types).
If it is not passed, Sparqlee will use a naive implementation that might do the trick for simple use cases.
#### BNODE
[spec](https://www.w3.org/TR/sparql11-query/#func-bnode)
Blank nodes depend heavily on the rest of the SPARQL query. Therefore, we provide the option of delegating the entire responsibility back to you by accepting a blank node constructor callback. If this is not found, we create a blank node with the given label, or we use uuid (v4) for argument-less calls to generate definitely unique blank nodes of the shape `blank_uuid`.
Blank nodes depend heavily on the rest of the SPARQL query. Therefore,
we provide the option of delegating the entire responsibility back to you by accepting a blank node constructor callback.
If this is not found, we create a blank node with the given label,
or we use uuid (v4) for argument-less calls to generate definitely unique blank nodes of the shape `blank_uuid`.
`bnode(input?: string) => RDF.BlankNode`
#### Now
[spec](https://www.w3.org/TR/sparql11-query/#func-now)
All calls to now in a query must return the same value. Since we aren't aware of the rest of the query, you can provide a timestamp (`now: Date`). If it's not present, Sparqlee will use the timestamp of evaluator creation; this at least allows evaluation with multiple bindings to have the same `now` value.
All calls to now in a query must return the same value. Since we aren't aware of the rest of the query,
you can provide a timestamp (`now: Date`). If it's not present, Sparqlee will use the timestamp of evaluator creation;
this at least allows evaluation with multiple bindings to have the same `now` value.
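
The fallback behaviour described for `now` can be sketched as follows. This is an illustrative sketch only: `NowSource` is a hypothetical class, not part of Sparqlee, and it assumes the timestamp is fixed once at construction.

```ts
// Hypothetical sketch: `NowSource` is not a Sparqlee class.
class NowSource {
  private readonly timestamp: Date;

  public constructor(config: { now?: Date } = {}) {
    // Fall back to the creation time when the caller supplies no timestamp.
    this.timestamp = config.now ?? new Date();
  }

  // Every call returns the same instant, so all bindings see one `now` value.
  public now(): Date {
    return this.timestamp;
  }
}

const fixed = new NowSource({ now: new Date('2021-08-25T00:00:00Z') });
const fallback = new NowSource();
```

Passing `now` explicitly lets several evaluators that belong to the same query share one timestamp, which the creation-time fallback cannot guarantee.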
#### IRI
[spec](https://www.w3.org/TR/sparql11-query/#func-iri)
To be fully spec compliant, the IRI/URI functions should take into account the base IRI of the query, which you can provide as `baseIRI: string` to the config.
To be fully spec compliant, the IRI/URI functions should take into account the base IRI of the query,
which you can provide as `baseIRI: string` to the config.
## Spec compliance
@@ -314,7 +363,8 @@ All definitions are defined using a builder model defined in [Helpers.ts](lib/fu
Three kinds exist:
- Regular functions: Functions with a uniform interface, that only need their arguments to calculate their result.
- Special functions: whose behaviour deviates enough from the norm to warrant the implementations taking full control over type checking and evaluation (these are mostly the functional forms).
- Special functions: whose behaviour deviates enough from the norm to warrant the implementations taking full control
over type checking and evaluation (these are mostly the functional forms).
- Named functions: which correspond to the SPARQLAlgebra Named Expressions.
**TODO**: Explain this hot mess some more.
50 changes: 42 additions & 8 deletions benchmarks/integerAddition.ts
@@ -1,9 +1,14 @@
// eslint-disable-next-line eslint-comments/disable-enable-pair
/* eslint-disable no-console */

import type * as RDF from '@rdfjs/types';
import type { Event } from 'benchmark';
import { Suite } from 'benchmark';
import * as Benchmark from 'benchmark';
import * as LRUCache from 'lru-cache';
import { DataFactory } from 'rdf-data-factory';
import { translate } from 'sparqlalgebrajs';
import { AsyncEvaluator } from '../lib/evaluators/AsyncEvaluator';
import { SyncEvaluator } from '../lib/evaluators/SyncEvaluator';
import { Bindings } from '../lib/Types';
import { TypeURL } from '../lib/util/Consts';
import { template } from '../test/util/Aliases';
@@ -15,21 +20,50 @@ function integerTerm(int: number): RDF.Term {
return DF.literal(int.toString(), DF.namedNode(TypeURL.XSD_INTEGER));
}

benchSuite.add('bench addition', async() => {
const noCache = new Benchmark('bench addition no overloadCache', () => {
const query = translate(template('?a + ?b = ?c'));
const evaluator = new AsyncEvaluator(query.input.expression);
const evaluator = new SyncEvaluator(query.input.expression, {
// Provide a cache that cannot store anything: every entry has length 5, which exceeds max 1
overloadCache: new LRUCache({
max: 1,
length: () => 5,
}),
});
const max = 100;
for (let fst = 0; fst < max; fst++) {
for (let snd = 0; snd < max; snd++) {
await evaluator.evaluate(Bindings({
evaluator.evaluate(Bindings({
'?a': integerTerm(fst),
'?b': integerTerm(snd),
'?c': integerTerm(fst + snd),
}));
}
}
}).on('cycle', (event: Event) => {
// eslint-disable-next-line no-console
console.log(String(event.target));
}).run();
});

const cache = new Benchmark('bench addition with overloadCache', () => {
const query = translate(template('?a + ?b = ?c'));
const evaluator = new SyncEvaluator(query.input.expression, {
overloadCache: new LRUCache(),
});
const max = 100;
for (let fst = 0; fst < max; fst++) {
for (let snd = 0; snd < max; snd++) {
evaluator.evaluate(Bindings({
'?a': integerTerm(fst),
'?b': integerTerm(snd),
'?c': integerTerm(fst + snd),
}));
}
}
});

benchSuite.push(noCache);
benchSuite.push(cache);
benchSuite.on('cycle', (event: Event) => {
console.log(String(event.target));
}).on('complete', () => {
console.log(`Mean execution time without cache ${noCache.stats.mean}`);
console.log(`Mean execution time with cache ${cache.stats.mean}`);
console.log(`Fastest is ${benchSuite.filter('fastest').map('name')}`);
}).run({ async: true });
6 changes: 4 additions & 2 deletions index.ts
@@ -1,6 +1,8 @@
// TODO: this form is deprecated, we should not rename these types. Should change in next major update.
export { AsyncEvaluator, IAsyncEvaluatorConfig as AsyncEvaluatorConfig } from './lib/evaluators/AsyncEvaluator';
export { SyncEvaluator, ISyncEvaluatorConfig as SyncEvaluatorConfig } from './lib/evaluators/SyncEvaluator';
export { AsyncEvaluator, IAsyncEvaluatorContext as AsyncEvaluatorConfig,
IAsyncEvaluatorContext as IAsyncEvaluatorConfig } from './lib/evaluators/AsyncEvaluator';
export { SyncEvaluator, ISyncEvaluatorContext as SyncEvaluatorConfig,
ISyncEvaluatorContext as ISyncEvaluatorConfig } from './lib/evaluators/SyncEvaluator';
export { AggregateEvaluator } from './lib/evaluators/AggregateEvaluator';

export { ExpressionError, isExpressionError } from './lib/util/Errors';
