Feature/buffer & open world typing #112
Conversation
Pull Request Test Coverage Report for Build 1166602057
💛 - Coveralls
```diff
       string([ ...source ].slice(startingLoc - 1, length).join('')))
     .onTernary([ TypeURL.RDF_LANG_STRING, TypeURL.XSD_INTEGER, TypeURL.XSD_INTEGER ],
-      (source: E.LangStringLiteral, startingLoc: E.NumericLiteral, length: E.NumericLiteral) => {
+      () => (source: E.LangStringLiteral, startingLoc: E.NumericLiteral, length: E.NumericLiteral) => {
         const sub = [ ...source.typedValue ].slice(startingLoc.typedValue - 1, length.typedValue).join('');
         return langString(sub, source.language);
       })
```
Question here before I make an issue about it. I think this implementation is wrong. The second argument of `slice` is an end index; however, the third argument of SUBSTR (https://www.w3.org/TR/sparql11-query/#func-substr) is the length of the substring.
This function is also based on https://www.w3.org/TR/xpath-functions/#func-substring. That spec talks about the arguments being doubles, while SPARQL talks about integers. Maybe we should provide both implementations? It doesn't hurt anyone?
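For reference, a standalone sketch of the mismatch (the `substr` helper below is hypothetical, made up to illustrate the point; it is not Sparqlee's actual code):

```ts
// SPARQL SUBSTR(source, startingLoc, length) is 1-indexed and takes a *length*,
// while Array.prototype.slice(start, end) takes an *end index*.
function substr(source: string, startingLoc: number, length: number): string {
  // Buggy version, mirroring the code above: treats `length` as an end index.
  // return [ ...source ].slice(startingLoc - 1, length).join('');

  // Fixed version: convert the length into an end index first.
  return [ ...source ].slice(startingLoc - 1, startingLoc - 1 + length).join('');
}

// SUBSTR("motor", 2, 2) should be "ot"; the buggy version returns "o".
console.log(substr('motor', 2, 2)); // -> 'ot'
```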
> I think this implementation is wrong.
That may be wrong indeed. Aren't there tests for this yet?
> That spec talks about the arguments being doubles, while SPARQL talks about integers. Maybe we should provide both implementations? It doesn't hurt anyone?
I think there is no need to do that. If I understand the XPath spec correctly, doubles are only allowed there for backwards-compatibility, which is a problem we don't really have here.
There are tests, but they test with the arguments `someStr 1 1`, which gives the same result either way.
Alright. Should I fix this in this PR or create an issue for this small change?
I would do it in a separate PR right after this one. (or at least create an issue now)
Looks good in general!
Just wondering if we shouldn't use the general config/context in places where the openworld config thing is passed.
We could do this for sure. We can then also make …
Good stuff, just some minor comments.
### Overload function caching

An `overloadCache` allows Sparqlee to cache the implementation of a function given its argument types.
This cache is only used when it is provided through the context.
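To illustrate the idea, here is a standalone sketch of the caching strategy (all names below are made up for illustration; this is not Sparqlee's actual internals):

```ts
// Sketch: cache the resolved implementation per operator + argument-type key,
// so overload resolution happens once per type combination instead of per call.
type Impl = (args: unknown[]) => unknown;

const overloadCache = new Map<string, Impl>();

function resolveOverload(op: string, argTypes: string[], slowLookup: () => Impl): Impl {
  const key = `${op}(${argTypes.join(',')})`; // e.g. '+(xsd:integer,xsd:integer)'
  let impl = overloadCache.get(key);
  if (!impl) {
    impl = slowLookup();          // walk the overload tree once
    overloadCache.set(key, impl); // reuse the result on every later call
  }
  return impl;
}
```

Since the key depends only on the operator and the argument types, a cache like this can safely be shared across evaluators.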
Also say that this is safe to reuse across different evaluators, and that manual modification is not recommended.
Also, do the benchmarks show that this actually helps performance?
If so, make sure to explain that this improves performance.
> Also, do the benchmarks show that this actually helps performance?
Don't know yet because I don't have the benchmark in this PR.
Maybe we should merge the benchmark PR first?
Sure, once you give the go-ahead, we can do that.
I added a benchmark test and cleaned up the benchmarks a bit. It seems like it's 2x as fast (in this benchmark).
> It seems like it's 2x as fast (in this benchmark).
Oh wow, that's quite significant! I didn't expect it to be that much faster. But that's a good thing :-)
Make sure to mention this in the docs!
We'll probably want to exploit this as much as possible from within Comunica then, so that we reuse this cache whenever possible. Perhaps you could look into this next?
Alright. I did some digging and it turns out my initial benchmark introduced in #106 was quite wrong. I updated the benchmark to return a mean execution time and this gave some more insights. I also added some benchmarks on past commits to find out where the big difference in execution time came from. It seems like the new type system is to blame.
I will provide a list of branches with added benchmarks and their reports.
- Benchmark on the commit before using the overload tree:
  `bench addition no overloadCache x 18.58 ops/sec ±1.96% (49 runs sampled)`
  `Mean execution time without cache 0.05381279502040815`
- Benchmark on the commit after using the overload tree, but with the old type system:
  `bench addition no overloadCache x 34.60 ops/sec ±1.78% (60 runs sampled)`
  `Mean execution time without cache 0.028901073775000004`
- Benchmark on the commit using the overload tree and the new type system (with type substitution and type promotion):
  `bench addition no overloadCache x 7.45 ops/sec ±1.15% (23 runs sampled)`
  `Mean execution time without cache 0.134191621826087`
- Benchmark on the current PR:
  `bench addition no overloadCache x 8.82 ops/sec ±1.46% (26 runs sampled)`
  `bench addition with overloadCache x 22.57 ops/sec ±0.76% (41 runs sampled)`
  `Mean execution time without cache 0.11341942315384619`
  `Mean execution time with cache 0.044309969926829264`
  `Fastest is bench addition with overloadCache`
From this we can conclude that the new type system is a lot slower than the old one. I already said that integer addition is one of the hardest operations to perform on this system. Overall we can conclude that, with a cache, the new system is still faster than what we had before. I am personally really happy about this, considering we have a much stronger type system that is fully spec compliant (ignoring the type promotion detail here).
A small thing to note is that the overload tree is quite a bit faster than the Immutable list solution we had at first.
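As a side note, the `ops/sec` lines above are in Benchmark.js's reporting format; a minimal sketch of a mean-execution-time benchmark in that style could look like this (the benchmarked body is a placeholder, not the actual benchmark file from this PR):

```ts
import Benchmark from 'benchmark';

const suite = new Benchmark.Suite();
suite
  .add('bench addition no overloadCache', () => {
    // Placeholder workload standing in for evaluating an addition expression.
    let sum = 0;
    for (let i = 0; i < 1_000; i++) {
      sum += i;
    }
    return sum;
  })
  .on('cycle', (event: Benchmark.Event) => {
    const bench = <Benchmark> event.target;
    console.log(String(bench)); // e.g. 'bench addition ... x 18.58 ops/sec ±1.96% (49 runs sampled)'
    console.log(`Mean execution time ${bench.stats.mean}`); // mean seconds per run
  })
  .run();
```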
This pull request aims to close #109.
What this PR does:
- Splits `lib/transformation.ts` into 2 classes that are easier to read and maintain.
- Splits `lib/Aggragators` into multiple files and creates a base class for the evaluators to use instead of calling global functions.
- Removes `TypeCheckedLiteral`. Reason: a literal almost always has a known dataType. Only when transforming an RDF.Term to its internal representation does it have another type. Every operation on a custom type returns a known type. For example, providing '-5' with dataType myNegatives (which extends Integer) to UnaryMinus should return the known type integer, and not something with dataType myNegatives. Typechecking every internal literal would thus be an unnecessary overhead.
- In `tests/util/generalEvaluation` I added a `generalErrorEvaluation` function (see the sketch at the end of this description). Using `generalEvaluate` in combination with an ErrorTable left the `SyncEvaluator` not fully tested: when the `AsyncEvaluator` throws an error in an ErrorTable, we still want to test the `SyncEvaluator`. This function makes sure this happens by catching all thrown Errors.
- The `OverLoadTree` now allows functions to access the `IOpenWorldEnabler`. It also allows us to make IRI and NOW regular functions. I couldn't move `BNODE`, because the sync and async implementations are different. It might be interesting to move this in the future, since we probably want the `AsyncEvaluator` to be able to get an async `SuperTypeDiscoverCallback`. This should be possible, but I would leave it for another PR; this one is big enough as is.

It should be noted that the values of the caches are not to be used by the user.
Not providing a `typeCache` results in us making one ourselves. Not providing an `overloadCache` will not create one.
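As referenced above, here is a standalone sketch of the idea behind a `generalErrorEvaluation`-style helper (the signature and names are assumed for illustration; this is not the actual test util from this PR): evaluate with both evaluators and capture each one's error, so a throw from one does not prevent the other from being exercised.

```ts
// Sketch: run both evaluators and collect their errors instead of letting the
// first throw short-circuit the test, so the SyncEvaluator is always exercised.
interface ErrorResults {
  syncError?: unknown;
  asyncError?: unknown;
}

async function generalErrorEvaluation(
  evaluateSync: () => unknown,
  evaluateAsync: () => Promise<unknown>,
): Promise<ErrorResults> {
  const result: ErrorResults = {};
  try {
    evaluateSync();
  } catch (error: unknown) {
    result.syncError = error;
  }
  try {
    await evaluateAsync();
  } catch (error: unknown) {
    result.asyncError = error;
  }
  return result;
}
```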