Feature/buffer & open world typing #112
Conversation
Pull Request Test Coverage Report for Build 1166602057
💛 - Coveralls
```diff
       string([ ...source ].slice(startingLoc - 1, length).join('')))
     .onTernary([ TypeURL.RDF_LANG_STRING, TypeURL.XSD_INTEGER, TypeURL.XSD_INTEGER ],
-      (source: E.LangStringLiteral, startingLoc: E.NumericLiteral, length: E.NumericLiteral) => {
+      () => (source: E.LangStringLiteral, startingLoc: E.NumericLiteral, length: E.NumericLiteral) => {
         const sub = [ ...source.typedValue ].slice(startingLoc.typedValue - 1, length.typedValue).join('');
         return langString(sub, source.language);
       })
```
Question here before I make an issue about it. I think this implementation is wrong. The second argument of `slice` is an end index; however, the third argument of SUBSTR (https://www.w3.org/TR/sparql11-query/#func-substr) is the length of the substring.
This function is also based on https://www.w3.org/TR/xpath-functions/#func-substring. That spec talks about the arguments being doubles, while SPARQL talks about integers. Maybe we should provide both implementations? It doesn't hurt anyone?
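For reference, a standalone sketch of the mismatch (the `substr` helper below is hypothetical, made up to illustrate the point; it is not Sparqlee's actual code):

```ts
// SPARQL SUBSTR(source, startingLoc, length) is 1-indexed and takes a *length*,
// while Array.prototype.slice(start, end) takes an *end index*.
function substr(source: string, startingLoc: number, length: number): string {
  // Buggy version, mirroring the code above: treats `length` as an end index.
  // return [ ...source ].slice(startingLoc - 1, length).join('');

  // Fixed version: convert the length into an end index first.
  return [ ...source ].slice(startingLoc - 1, startingLoc - 1 + length).join('');
}

// SUBSTR("motor", 2, 2) should be "ot"; the buggy version returns "o".
console.log(substr('motor', 2, 2)); // -> 'ot'
```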
> I think this implementation is wrong.
That may be wrong indeed. Aren't there tests for this yet?
> That spec talks about the arguments being doubles, while SPARQL talks about integers. Maybe we should provide both implementations? It doesn't hurt anyone?
I think there is no need to do that. If I understand the XPath spec correctly, doubles are only allowed there for backwards-compatibility, which is a problem we don't really have here.
There are tests, but they test with the arguments `someStr 1 1`, which gives the same result either way.
Alright. Should I fix this in this PR or create an issue for this small change?
I would do it in a separate PR right after this one. (or at least create an issue now)
Looks good in general!
Just wondering if we shouldn't use the general config/context in places where the openworld config thing is passed.
We could do this for sure. We can then also make …
Good stuff, just some minor comments.
### Overload function caching

An `overloadCache` allows Sparqlee to cache the implementation of a function given its argument types.
This cache is only used when it is provided through the context.
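To illustrate the idea, here is a standalone sketch of the caching strategy (all names below are made up for illustration; this is not Sparqlee's actual internals):

```ts
// Sketch: cache the resolved implementation per operator + argument-type key,
// so overload resolution happens once per type combination instead of per call.
type Impl = (args: unknown[]) => unknown;

const overloadCache = new Map<string, Impl>();

function resolveOverload(op: string, argTypes: string[], slowLookup: () => Impl): Impl {
  const key = `${op}(${argTypes.join(',')})`; // e.g. '+(xsd:integer,xsd:integer)'
  let impl = overloadCache.get(key);
  if (!impl) {
    impl = slowLookup();          // walk the overload tree once
    overloadCache.set(key, impl); // reuse the result on every later call
  }
  return impl;
}
```

Since the key depends only on the operator and the argument types, a cache like this can safely be shared across evaluators.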
Also say that this is safe to reuse across different evaluators, and that manual modification is not recommended.
Also, do the benchmarks show that this actually helps performance?
If so, make sure to explain that this improves performance.
> Also, do the benchmarks show that this actually helps performance?
Don't know yet because I don't have the benchmark in this PR.
Maybe we should merge the benchmark PR first?
Sure, once you give the go-ahead, we can do that.
I added a benchmark test and cleaned up the benchmarks a bit. It seems like it's 2x as fast (in this benchmark).
> It seems like it's 2x as fast (in this benchmark).
Oh wow, that's quite significant! I didn't expect it to be that much faster. But that's a good thing :-)
Make sure to mention this in the docs!
We'll probably want to exploit this as much as possible from within Comunica then, so that we reuse this cache whenever possible. Perhaps you could look into this next?
Alright. I did some digging and it turns out my initial benchmark introduced in #106 was quite wrong. I updated the benchmark to return a mean execution time and this gave some more insights. I also added some benchmarks on past commits to find out where the big difference in execution time came from. It seems like the new type system is to blame.
I will provide a list of branches with added benchmarks and their reports.
- Benchmark on the commit before using the overload tree:
  `bench addition no overloadCache x 18.58 ops/sec ±1.96% (49 runs sampled)`
  `Mean execution time without cache 0.05381279502040815`
- Benchmark on the commit after using the overload tree, but with the old type system:
  `bench addition no overloadCache x 34.60 ops/sec ±1.78% (60 runs sampled)`
  `Mean execution time without cache 0.028901073775000004`
- Benchmark on the commit using the overload tree and the new type system (with type substitution and type promotion):
  `bench addition no overloadCache x 7.45 ops/sec ±1.15% (23 runs sampled)`
  `Mean execution time without cache 0.134191621826087`
- Benchmark on the current PR:
  `bench addition no overloadCache x 8.82 ops/sec ±1.46% (26 runs sampled)`
  `bench addition with overloadCache x 22.57 ops/sec ±0.76% (41 runs sampled)`
  `Mean execution time without cache 0.11341942315384619`
  `Mean execution time with cache 0.044309969926829264`
  `Fastest is bench addition with overloadCache`
From this we can conclude that the new type system is a lot slower than the old one. I already said that integer addition is one of the hardest operations to perform on this system. Overall we can conclude that, with a cache, the new system is still faster than what we had before. I am personally really happy about this, considering we have a much stronger type system that is fully spec compliant (ignoring the type promotion detail here).
A small thing to note is that the overload tree is quite a bit faster than the Immutable list solution we had at first.
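As a side note, the `ops/sec` lines above are in Benchmark.js's reporting format; a minimal sketch of a mean-execution-time benchmark in that style could look like this (the benchmarked body is a placeholder, not the actual benchmark file from this PR):

```ts
import Benchmark from 'benchmark';

const suite = new Benchmark.Suite();
suite
  .add('bench addition no overloadCache', () => {
    // Placeholder workload standing in for evaluating an addition expression.
    let sum = 0;
    for (let i = 0; i < 1_000; i++) {
      sum += i;
    }
    return sum;
  })
  .on('cycle', (event: Benchmark.Event) => {
    const bench = <Benchmark> event.target;
    console.log(String(bench)); // e.g. 'bench addition ... x 18.58 ops/sec ±1.96% (49 runs sampled)'
    console.log(`Mean execution time ${bench.stats.mean}`); // mean seconds per run
  })
  .run();
```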
This pull request aims to close #109.
What this PR does:
- Splits `lib/transformation.ts` into 2 classes that are easier to read and maintain.
- Splits `lib/Aggragators` into multiple files and creates a base class for the evaluators to use instead of calling global functions.
- Removes `TypeCheckedLiteral`. Reason: a literal almost always has a known dataType. Only when transforming an RDF.Term to its internal representation does it have another type. Every operation on a custom type returns a known type. For example, providing '-5' with dataType myNegatives (which extends Integer) to UnaryMinus should return the known type integer, and not something with dataType myNegatives. Typechecking every internal literal would thus be an unnecessary overhead.
- In `tests/util/generalEvaluation` I added a `generalErrorEvaluation` function (see the sketch at the end of this description). Using `generalEvaluate` in combination with an ErrorTable left the `SyncEvaluator` not fully tested: when the `AsyncEvaluator` throws an error in an ErrorTable, we still want to test the `SyncEvaluator`. This function makes sure this happens by catching all thrown Errors.
- The `OverLoadTree` now allows functions to access the `IOpenWorldEnabler`. It also allows us to make IRI and NOW regular functions. I couldn't move `BNODE`, because the sync and async implementations are different. It might be interesting to move this in the future, since we probably want the `AsyncEvaluator` to be able to get an async `SuperTypeDiscoverCallback`. This should be possible, but I would leave it for another PR; this one is big enough as is.

It should be noted that the values of the caches are not to be used by the user.
Not providing a `typeCache` results in us making one ourselves. Not providing an `overloadCache` will not create one.
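As referenced above, here is a standalone sketch of the idea behind a `generalErrorEvaluation`-style helper (the signature and names are assumed for illustration; this is not the actual test util from this PR): evaluate with both evaluators and capture each one's error, so a throw from one does not prevent the other from being exercised.

```ts
// Sketch: run both evaluators and collect their errors instead of letting the
// first throw short-circuit the test, so the SyncEvaluator is always exercised.
interface ErrorResults {
  syncError?: unknown;
  asyncError?: unknown;
}

async function generalErrorEvaluation(
  evaluateSync: () => unknown,
  evaluateAsync: () => Promise<unknown>,
): Promise<ErrorResults> {
  const result: ErrorResults = {};
  try {
    evaluateSync();
  } catch (error: unknown) {
    result.syncError = error;
  }
  try {
    await evaluateAsync();
  } catch (error: unknown) {
    result.asyncError = error;
  }
  return result;
}
```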