This repository has been archived by the owner on Sep 19, 2023. It is now read-only.

Feature/buffer & open world typing #112

Merged


jitsedesmet
Member

This pull request aims to close #109.

What this PR does:

  • Split lib/transformation.ts into two classes that are easier to read and maintain.
  • Split lib/Aggragators into multiple files and create a base class for the evaluators to use instead of calling global functions.
  • Create more superclasses for the evaluator files so they can be handled uniformly. (This reduces the need to copy everything from sync to async.)
  • Remove TypeCheckedLiteral
    Reason: A literal almost always has a known datatype. Only when an RDF.Term is transformed to its internal representation can it have another type. Every function applied to a literal of a custom type returns a known type. For example, providing '-5' with datatype myNegatives (which extends integer) to UnaryMinus should return a literal of the known type integer, not something with datatype myNegatives. Type checking every internal literal would thus be unnecessary overhead.
  • Update tests/util/generalEvaluation: I added a generalErrorEvaluation function. Using generalEvaluate in combination with an ErrorTable left the SyncEvaluator not fully tested: when the AsyncEvaluator throws an error in an ErrorTable, we still want to test the SyncEvaluator. This function makes sure that happens by catching all thrown errors.
  • Provide access to a functionConfig, shared between sync and async, from within functions that use the OverloadTree. This allows functions to access the IOpenWorldEnabler. It also allows us to make IRI and NOW regular functions.
    I couldn't move BNODE because its sync and async implementations differ. It might be interesting to move it in the future, since we probably want the AsyncEvaluator to be able to use an async SuperTypeDiscoverCallback. This should be possible, but I would leave it for another PR. This one is big enough as is.
  • This PR will also close OverloadTree caching and open world types #109. It does this by accepting three extra optional arguments from the user:
overloadCache?: LRUCache<string, ImplementationFunction | undefined>;
typeCache?: LRUCache<string, GeneralSubExtensionTable>;
superTypeDiscoverCallback?: (unknownType: string) => string;

The values stored in the caches are internal and should not be used by the user.
If no typeCache is provided, we create one ourselves. If no overloadCache is provided, none is created.
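To illustrate how a superTypeDiscoverCallback and a type cache could work together, here is a minimal sketch. It assumes a hypothetical child-to-parent table covering only a small slice of the XSD hierarchy, and uses a plain Map of resolved type chains in place of the PR's typeCache (which stores GeneralSubExtensionTable values internally). The names typeChain, isSubTypeOf and builtInParents are illustrative, not Sparqlee's API. The point is that the user callback is only consulted for unknown types, and only once per type thanks to the cache.

```typescript
// Hypothetical sketch of open-world type resolution; not Sparqlee's TypeHandling code.
type SuperTypeDiscoverCallback = (unknownType: string) => string;

const XSD = 'http://www.w3.org/2001/XMLSchema#';
const ROOT = `${XSD}anyAtomicType`;

// A small, partial slice of the XSD hierarchy (child -> parent), for illustration only.
const builtInParents = new Map<string, string>([
  [ `${XSD}integer`, `${XSD}decimal` ],
  [ `${XSD}decimal`, ROOT ],
  [ `${XSD}string`, ROOT ],
]);

// Returns [type, parent, grandparent, ..., ROOT]. Results are memoised in `cache`
// (standing in for the typeCache), so the callback runs at most once per unknown type.
function typeChain(
  type: string,
  discover: SuperTypeDiscoverCallback,
  cache: Map<string, string[]>,
  visiting: Set<string> = new Set(),
): string[] {
  if (type === ROOT) {
    return [ ROOT ];
  }
  const cached = cache.get(type);
  if (cached) {
    return cached;
  }
  if (visiting.has(type)) {
    throw new Error(`Cyclic type hierarchy at ${type}`);
  }
  visiting.add(type);
  // Known types use the built-in table; only unknown types reach the user callback.
  const parent = builtInParents.get(type) ?? discover(type);
  const chain = [ type, ...typeChain(parent, discover, cache, visiting) ];
  cache.set(type, chain);
  return chain;
}

function isSubTypeOf(
  type: string,
  superType: string,
  discover: SuperTypeDiscoverCallback,
  cache: Map<string, string[]>,
): boolean {
  return typeChain(type, discover, cache).includes(superType);
}
```

With a callback that maps a custom myNegatives datatype to xsd:integer, the custom type is recognised as a subtype of both xsd:integer and xsd:decimal, and repeated checks are answered from the cache without calling the callback again.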

@coveralls

coveralls commented Aug 20, 2021

Pull Request Test Coverage Report for Build 1166602057

  • 627 of 677 (92.61%) changed or added relevant lines in 29 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 90.622%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| lib/functions/NamedFunctions.ts | 27 | 28 | 96.43% |
| lib/transformers/TermTransformer.ts | 47 | 50 | 94.0% |
| lib/evaluators/evaluatorHelpers/AsyncRecursiveEvaluator.ts | 23 | 28 | 82.14% |
| lib/evaluators/evaluatorHelpers/SyncRecursiveEvaluator.ts | 22 | 27 | 81.48% |
| lib/functions/Helpers.ts | 64 | 72 | 88.89% |
| lib/functions/RegularFunctions.ts | 99 | 112 | 88.39% |
| lib/transformers/AlgebraTransformer.ts | 59 | 74 | 79.73% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 1162151356 | +0.3% |
| Covered Lines | 1503 |
| Relevant Lines | 1625 |

💛 - Coveralls

  string([ ...source ].slice(startingLoc - 1, length).join('')))
.onTernary([ TypeURL.RDF_LANG_STRING, TypeURL.XSD_INTEGER, TypeURL.XSD_INTEGER ],
-  (source: E.LangStringLiteral, startingLoc: E.NumericLiteral, length: E.NumericLiteral) => {
+  () => (source: E.LangStringLiteral, startingLoc: E.NumericLiteral, length: E.NumericLiteral) => {
    const sub = [ ...source.typedValue ].slice(startingLoc.typedValue - 1, length.typedValue).join('');
    return langString(sub, source.language);
  })
Member Author

Question here before I make an issue about it. I think this implementation is wrong. The length argument is passed as the second argument of slice, which is the end index. However, the third argument of https://www.w3.org/TR/sparql11-query/#func-substr is the length of the substring.
This function is also based on https://www.w3.org/TR/xpath-functions/#func-substring, which talks about the arguments being doubles, while SPARQL talks about integers. Maybe we should provide both implementations? That wouldn't hurt anyone, right?
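A minimal sketch of a length-based slice, assuming integer arguments as in SPARQL. substr is a hypothetical standalone helper, not the Sparqlee function; it does not implement XPath's rounding of double arguments. Like the [ ...source ] spread in the quoted snippet, it works on Unicode code points rather than UTF-16 code units.

```typescript
// Hypothetical helper sketching SPARQL SUBSTR semantics:
// startingLoc is 1-based and the third argument is a length, not an end index.
function substr(source: string, startingLoc: number, length?: number): string {
  const codePoints = [ ...source ];
  // Convert to a 0-based index; clamp at 0 so zero or negative starts don't wrap around.
  const start = Math.max(0, startingLoc - 1);
  // Exclusive end index, computed from the unclamped start and clamped so it never precedes start.
  const end = length === undefined ?
    codePoints.length :
    Math.max(start, startingLoc - 1 + length);
  return codePoints.slice(start, end).join('');
}
```

For example, substr('someStr', 2, 3) yields 'ome', whereas slice(startingLoc - 1, length) yields 'om'. With the arguments 1, 1 both give 's', so a test using only those arguments cannot tell them apart. The clamping reproduces the spec examples substr("foobar", 4) = "bar" and fn:substring("12345", 0, 3) = "12".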

Member

> I think this implementation is wrong.

That may be wrong indeed. Aren't there tests for this yet?

> It talks about the arguments being doubles, while SPARQL talks about integers. Maybe we should provide both implementations? That wouldn't hurt anyone, right?

I think there is no need to do that. If I understand the XPath spec correctly, doubles are only allowed there for backwards-compatibility, which is a problem we don't really have here.

Member Author

There are tests, but they only use the arguments someStr, 1, 1, which give the same result either way.
Alright. Should I fix this in this PR or create an issue for this small change?

Member

I would do it in a separate PR right after this one (or at least create an issue now).

Member

@rubensworks rubensworks left a comment

Looks good in general!

Just wondering if we shouldn't use the general config/context in places where the openworld config thing is passed.

lib/evaluators/evaluatorHelpers/BaseExpressionEvaluator.ts (outdated, resolved)
lib/functions/Core.ts (outdated, resolved)
lib/functions/RegularFunctions.ts (outdated, resolved)
lib/transformers/AlgebraTransformer.ts (outdated, resolved)
lib/util/TypeHandling.ts (outdated, resolved)
lib/util/TypeHandling.ts (outdated, resolved)
lib/util/TypeHandling.ts (outdated, resolved)
@jitsedesmet
Member Author

> Just wondering if we shouldn't use the general config/context in places where the openworld config thing is passed.

We could certainly do this. We could then also make SuperTypeCallback async. To handle that, we would also need to provide a static create method so that construction can be async. Some problems will arise with normal construction, but we'll see about that.
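The static create approach mentioned here can be sketched as follows. ExampleEvaluator and AsyncSuperTypeCallback are hypothetical names, and the assumption that the async work at construction time is resolving the datatypes an expression uses is illustrative only. The pattern: a constructor cannot await, so it is made private and a static async factory does the async work first.

```typescript
// Hypothetical sketch of the "static create" pattern for async construction.
type AsyncSuperTypeCallback = (unknownType: string) => Promise<string>;

class ExampleEvaluator {
  private readonly superTypes: Map<string, string>;

  // Private, because a constructor cannot await; callers must go through create().
  private constructor(superTypes: Map<string, string>) {
    this.superTypes = superTypes;
  }

  // Performs the async work (here: resolving each datatype's supertype)
  // before handing fully initialised state to the synchronous constructor.
  public static async create(
    datatypes: string[],
    discover: AsyncSuperTypeCallback,
  ): Promise<ExampleEvaluator> {
    const superTypes = new Map<string, string>();
    for (const datatype of datatypes) {
      superTypes.set(datatype, await discover(datatype));
    }
    return new ExampleEvaluator(superTypes);
  }

  public superTypeOf(datatype: string): string | undefined {
    return this.superTypes.get(datatype);
  }
}
```

The trade-off is the one hinted at above: callers can no longer write new ExampleEvaluator(...) and must await ExampleEvaluator.create(...) instead, which affects every place that currently constructs an evaluator synchronously.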

Member

@rubensworks rubensworks left a comment

Good stuff, just some minor comments.

### Overload function caching

An overload cache allows Sparqlee to cache the implementation of a function given its argument types.
This cache is only used when it is provided in the context.
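A minimal sketch of the overload caching idea, under stated assumptions: TinyLru is a hypothetical stand-in for the LRUCache from the lru-cache package that the PR accepts, resolveOverload is an illustrative name, and the key format is invented for the example. As with the PR's `ImplementationFunction | undefined` value type, a failed lookup (no matching overload) is cached too.

```typescript
// Hypothetical sketch of overload caching; not Sparqlee's OverloadTree code.
type ImplementationFunction = (args: unknown[]) => unknown;

// Minimal LRU: a Map keeps insertion order, so the first key is the least recently used.
class TinyLru<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly max: number;

  public constructor(max: number) {
    this.max = max;
  }

  public has(key: K): boolean {
    return this.entries.has(key);
  }

  public get(key: K): V | undefined {
    if (!this.entries.has(key)) {
      return undefined;
    }
    const value = this.entries.get(key) as V;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  public set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.max) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }
}

// Looks up the implementation for `functionName` applied to `argumentTypes`.
// Without a cache, the (expensive) overload search runs every time.
function resolveOverload(
  functionName: string,
  argumentTypes: string[],
  search: (argumentTypes: string[]) => ImplementationFunction | undefined,
  cache?: TinyLru<string, ImplementationFunction | undefined>,
): ImplementationFunction | undefined {
  if (!cache) {
    return search(argumentTypes);
  }
  // The function name is part of the key, so a single cache can hold entries for many functions.
  const key = `${functionName}(${argumentTypes.join(',')})`;
  if (cache.has(key)) {
    return cache.get(key);
  }
  const implementation = search(argumentTypes);
  cache.set(key, implementation);
  return implementation;
}
```

Because entries depend only on the function name and argument types, the same cache instance could in principle be passed to several evaluators; its contents are an implementation detail and are not meant to be modified by hand.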
Member

Also say that this is safe to reuse across different evaluators, and that manual modification is not recommended.

Member

Also, do the benchmarks show that this actually helps performance?

Member

If so, make sure to explain that this improves performance.

Member Author

> Also, do the benchmarks show that this actually helps performance?

I don't know yet, because the benchmark isn't part of this PR.
Maybe we should merge the benchmark PR first?

Member

Sure, once you give the go-ahead, we can do that.

Member Author

I added a benchmark test and cleaned the benchmarks a bit. It seems like it's 2x as fast. (in this benchmark)

Member

> It seems like it's 2x as fast. (in this benchmark)

Oh wow, that's quite significant! I didn't expect it to be that much faster. But that's a good thing :-)

Make sure to mention this in the docs!

We'll probably want to exploit this as much as possible from within Comunica then, so that we reuse this cache whenever possible. Perhaps you could look into this next?

Member Author

Alright. I did some digging and it turns out my initial benchmark introduced in #106 was quite wrong. I updated the benchmark to return a mean execution time and this gave some more insights. I also added some benchmarks on past commits to find out where the big difference in execution time came from. It seems like the new type system is to blame.
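Reporting a mean execution time, as described here, can be sketched with a small helper. meanExecutionTimeMs is a hypothetical name, not the benchmark code from #106 or this PR. It assumes a synchronous workload, runs a warm-up phase first, and times the whole loop before dividing, since a single call may be shorter than the 1 ms resolution of Date.now().

```typescript
// Hypothetical helper: mean wall-clock time per call, measured after a warm-up phase.
function meanExecutionTimeMs(run: () => void, iterations: number, warmup = 100): number {
  if (!Number.isInteger(iterations) || iterations <= 0) {
    throw new RangeError('iterations must be a positive integer');
  }
  // Warm-up runs give the JIT a chance to optimise `run` before timing starts.
  for (let i = 0; i < warmup; i++) {
    run();
  }
  // Time the whole loop and divide, rather than timing each (possibly sub-millisecond) call.
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    run();
  }
  return (Date.now() - start) / iterations;
}
```

Comparing two configurations, for instance evaluation with and without an overload cache, then amounts to calling this helper once per closure with the same iteration count and comparing the two means.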

I will provide a list of branches with added benchmarks and their reports.

From this we can conclude that the new type system is a lot slower than the old one. I already said that integer addition is one of the hardest operations to perform on this system. Overall, we can conclude that with a cache, the new system is still faster than what we had before. I am personally really happy about this, considering we now have a much stronger type system that is fully spec compliant (ignoring the type promotion detail here).

A small thing to notice is that the overload tree is quite a bit faster than the Immutable list solution we had at first.

README.md (outdated, resolved)
index.ts (outdated, resolved)
test/integration/functions/op.addition.test.ts (outdated, resolved)
@rubensworks rubensworks merged commit b527ac2 into comunica:master Aug 25, 2021
This was referenced Aug 27, 2021
Development

Successfully merging this pull request may close these issues.

OverloadTree caching and open world types
3 participants