This repository has been archived by the owner on Sep 19, 2023. It is now read-only.

Enable no-redeclare rule #94

Closed
jitsedesmet opened this issue Aug 4, 2021 · 52 comments

Comments

@jitsedesmet
Member

jitsedesmet commented Aug 4, 2021

While trying to fix #93, it became clear I needed to disable the no-redeclare rule, because a custom implementation of Set is used in lib/util/Consts.ts. I couldn't just use the default implementation of Set because a union function is needed. This issue, though probably small, aims to enable the no-redeclare rule.

@jitsedesmet
Member Author

Since it's also time to move away from Immutable, I will do that in this issue as well. This issue will thus also remove all references to the Immutable package.

@jitsedesmet
Member Author

I stumbled into a problem moving away from Immutable. In lib/functions/Core.ts we have export type OverloadMap = Map<List<ArgumentType>, E.SimpleApplication>;. The basic transition would be to write export type OverloadMap = Map<ArgumentType[], E.SimpleApplication>; instead. The problem with this is that a plain array doesn't define the kind of equality we want here. I don't really understand this code very well yet, so I'm asking up front: do you (in particular @wschella) know any good solutions for this?

@jitsedesmet
Member Author

I also don't think @rubensworks will let me write my own equals on the Array prototype :p

@wschella
Member

wschella commented Aug 6, 2021

This is indeed some difficult code, as we basically define a DSL for function overloading in TypeScript in a type-safe way. Lemme check! In the meantime, what's the reason for moving away from Immutable? (I'm not up to date with the ecosystem.)

Context: a couple of years ago, I think, Immutable was heavily used in comunica, and @rubensworks was definitely a fan of using it in Sparqlee as well. I needed a List object that was a valid entry for a HashMap key (which a standard JS array is not). I'll have a look at the code; I have some ideas.

@jitsedesmet
Member Author

I don't really know the reason. Maybe @rubensworks isn't that good of a friend with it anymore. I love that you have some ideas, because I am fresh out of them. Btw, what does DSL stand for? (sorry)

@wschella
Member

wschella commented Aug 6, 2021

So the general gist behind OverloadMap is that SPARQL functions are overloaded (the same function can take a different number of arguments, or arguments of different types). OverloadMap keeps an implementation of the function for each overload. Overloads are uniquely defined by the list of arguments to the function and their types (which is runtime information). So when we know at runtime which concrete types are passed to our function, we can look up the corresponding definition in the OverloadMap.

It becomes more difficult due to subclassing/subtypes. When a function takes a generic RDF Term, it can be a Literal, a blank node, or an IRI. So we can't just look up the concrete types, as those might not be in the overload map (only a definition for Term would be there). This is why we have the monomorphization function: it looks implementations up in the overload map in order of progressing generality. It first looks up the most concrete types (specific literal types, such as int and string), then the more general term kinds (literal, blank, IRI), and only then does it consider the arguments as general terms. THIS IS VERY LIMITED; it's basically sufficient only because there are no overloads that mix levels of granularity. E.g. if a function existed that takes 2 arguments, with 3 overloads (int, term), (str, term), and a fallback (term, term), we would always use the fallback (as the second argument wouldn't match in the overload map until we come to the last monomorphization step).
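
For illustration, a minimal TypeScript sketch of that progressive lookup, assuming string-joined keys and a simplified categoryOf helper; the names are illustrative, not the actual Sparqlee internals:

type ArgumentType = string; // e.g. 'xsd:integer', 'literal', 'term'
type Implementation = (args: unknown[]) => unknown;

// Simplified stand-in for mapping a concrete datatype to its term kind.
function categoryOf(type: ArgumentType): ArgumentType {
  return type.startsWith('xsd:') ? 'literal' : type;
}

function monomorphize(
  map: Map<string, Implementation>, // keys: argument types joined into one string
  concreteTypes: ArgumentType[],
): Implementation | undefined {
  const levels = [
    concreteTypes,                   // 1. most concrete, e.g. ['xsd:integer', 'xsd:string']
    concreteTypes.map(categoryOf),   // 2. term kinds, e.g. ['literal', 'literal']
    concreteTypes.map(() => 'term'), // 3. most general, e.g. ['term', 'term']
  ];
  // All arguments are generalized together, which is exactly the limitation
  // described above: a mixed overload like (int, term) is never found.
  for (const types of levels) {
    const impl = map.get(types.join(','));
    if (impl) {
      return impl;
    }
  }
  return undefined;
}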

More soon.

@wschella
Member

wschella commented Aug 6, 2021

DSL is Domain Specific Language.

@wschella
Member

wschella commented Aug 6, 2021

Note: when I say types in the above, these are types of the RDF terms expressed in strings, not TS types!

@wschella
Member

wschella commented Aug 6, 2021

Option 1 (the straightforward approach): you create a class that wraps plain JS arrays but defines a sound equality operator, e.g. HashableList. This can be a drop-in replacement for the Immutable List, and then you don't need the Immutable Map anymore either. This is actually not that difficult, because for our use the only entries of HashableList will be plain strings. So if you have a deterministic order (which you have in plain JS arrays), you can basically concat these strings together and use that as a key. At least this way equality is actually well defined for our use case.
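
A minimal sketch of Option 1, assuming the entries are always plain strings (the class name and hash scheme here are illustrative):

class HashableList {
  public constructor(private readonly entries: string[]) {}

  // Deterministic order plus string entries means joining them yields a
  // well-defined key; the separator just has to be absent from type IRIs.
  public hash(): string {
    return this.entries.join('\u0000');
  }
}

// A plain Map keyed on the hash then replaces the Immutable Map:
const overloads = new Map<string, string>();
overloads.set(new HashableList(['xsd:integer', 'xsd:integer']).hash(), 'impl for (int, int)');
overloads.get(new HashableList(['xsd:integer', 'xsd:integer']).hash()); // -> 'impl for (int, int)'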

Option 2 (fix the problem in the meantime): now we have a single Map per function, but what we actually want is a tree (of maps). Bear with me.

The root-level map has as keys the allowed types of the first argument of the function, and as values it has maps. These maps have as keys the allowed types of the second argument, and as values they have maps.
Those maps have as keys the allowed types of the third argument, and as values... you get where I'm going.

When I say "the values are maps", those are the intermediate nodes of the tree; but the values can also be concrete implementations, which are the leaves of the tree, reached when the function can take no more arguments.

The advantage is that this does implementation resolving correctly (and more flexibly), monomorphization is still constant time, and the keys (the type of a single argument) are just single strings, so we don't even need an ES6 Map; a plain JavaScript object will do.

Type signature: Overloads = { [key: string]: E.SimpleApplication | Overloads }, or something like that.
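
For illustration, a naive walk over such a tree (a sketch only; the subclassing fallback discussed below is deliberately left out, and the implementation type is simplified to a plain function):

type Implementation = (args: unknown[]) => unknown;
type Overloads = { [key: string]: Implementation | Overloads };

function getImplementation(tree: Overloads, argTypes: string[]): Implementation | undefined {
  let node: Implementation | Overloads | undefined = tree;
  for (const type of argTypes) {
    if (node === undefined || typeof node === 'function') {
      return undefined; // too many arguments, or no overload for this type
    }
    node = node[type]; // descend one level per argument
  }
  return typeof node === 'function' ? node : undefined;
}

// Example: a single overload for (xsd:integer, term).
const tree: Overloads = { 'xsd:integer': { term: args => args } };
getImplementation(tree, ['xsd:integer', 'term']); // -> the implementation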

@jitsedesmet
Member Author

I do like your second approach; I think I'll try that. I have one question though (I might be able to just look it up, but you know): is there a function that tests whether a certain type is inherited? I know it's not clear what I mean; an example should do it. You said:

When a function takes a generic RDF Term, it can be a Literal, a blank node, or an IRI.

I can imagine this also goes for types like double and int? Do we have a function that would test that, provided an int, we know it's also a double? Or is this not necessary?

My question just boils down to the fact that I don't understand how the sub-classing problem would be solved using the type signature you mentioned:

Type signature: Overloads = { [key: string]: E.SimpleApplication | Overloads }, or something like that.

@rubensworks
Member

+1 on the raw hash-based approach. Constant-time lookups are definitely something we want here, as performance is key.

@jitsedesmet
Member Author

So you like the first one better? Alright, I guess I'll be checking that one out then :D

@wschella
Member

wschella commented Aug 7, 2021

Both are constant time, since the number of arguments is constant (with the exception of varargs, but there you can't escape it).

I think the performance overhead will be negligible, or potentially even better in the tree-based approach, as no string manipulation needs to be done. It's just (1/2/3 key lookups) times (superclassing checks) vs (string concat) times (superclassing checks).

The more structure is encoded in the data, the less needs to be done at runtime.

@wschella
Member

wschella commented Aug 7, 2021

is there a function that tests whether a certain type is inherited?

There is not, because all types are 'inherited'. You have 3 options: (abstract) Literals, Blank Nodes, and IRIs, all 3 of which are a Term. The Literals are divided again into concrete types: int, float, string, ...

There are also some automatic casting rules IIRC (currently not implemented), which might be solvable with the tree approach.

Or is this not necessary?

So I don't think this is necessary. We know the exact inheritance tree. It's static. So just "hardcode" it.

My question just boils down to the fact that I don't understand how the sub-classing problem would be solved using the type signature you mentioned

The subclassing isn't really solved by the type signature, but by the way its structure allows us to select concrete types. Now we can incrementally interpret each argument more generally, on a per-argument basis. This is just a way to check all possible options/combinations. You could also do this with the plain map, but then you would have A LOT of options, increasing combinatorially with each extra argument and allowed type.

I'd be happy to have a call about this.

@wschella
Member

wschella commented Aug 7, 2021

+1 on the raw hash-based approach. Constant-time lookups are definitely something we want here, as performance is key.

It might not be a bad idea to start with the hash-based approach in any case, just to get a feel for the code. It's not gonna be an easy ride. If there's time left, the tree approach can still be done (which would be very cool!).

@rubensworks
Member

Ah no, I was actually referring to the second approach (specifically to "so we don't even need an ES6 Map; a plain JavaScript object will do").

For reference, this issue sounds a bit similar to an implementation of the MINUS operator I recently created, where I also ended up with a tree-based approach using hashes: https://github.com/comunica/comunica/blob/master/packages/actor-query-operation-minus/lib/BindingsIndex.ts

@jitsedesmet
Member Author

I think I understand what you mean, Wout. Having a call to make sure we're on the same page would not be a bad idea, since this is pretty complicated, and a vital piece of code considering how many tests broke :p

@jitsedesmet
Member Author

After calling, we decided to also fix the auto-cast issue mentioned earlier (hard-coded, and starting with xsd types only). I will briefly describe what we'll try.
We'll map each xsd type to a list of types it extends. In an initialization phase we'll transform this so that every type maps to an object whose keys are all its subtypes. Note that every one of these objects will contain term among its keys! We will call this object the SmartCastingLookupTable.

Now, the overload map itself will have a type like OverloadNode | OverloadLeaf. A leaf should extend a node so we don't have any weird type checking. On each node we can call getSubTreeWithArg, providing the argument type; the node class will then return the relevant subtree, consulting the SmartCastingLookupTable for this choice. When we need the function implementation, we can call getImplementation on a node; this way a node could return a function that throws an error like "Function overload didn't match".
Note that we then actually don't need a specific OverloadLeaf type, since it wouldn't be much different from OverloadNode (they have the same functions).

Please know this is a first draft of what we'll try to implement, and it can still change significantly.

@rubensworks
Member

rubensworks commented Aug 9, 2021

Or in pseudo-code:

// During init phase
const extensionTableInput = {
  xsd:int: [ xsd:signedInt, xsd:unsignedInt ],
  xsd:signedInt: [ xsd:someOtherSubTypeOfSignedInt ],
  ...
}

// During init phase
const extensionTable: Record<string, Record<string, boolean>> = deriveExtensions(extensionTableInput); // same as SmartCastingLookupTable?

// To check if a type is valid (sub)-type at runtime
const valid = xsd:signedInt in extensionTable[xsd:int]
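
A possible deriveExtensions along these lines, precomputing the transitive closure so the runtime check stays a single constant-time lookup (a sketch; the function body and the example subtypes mirror the pseudo-code above, they are not the real table):

type ExtensionTableInput = Record<string, string[]>;
type ExtensionTable = Record<string, Record<string, boolean>>;

function deriveExtensions(input: ExtensionTableInput): ExtensionTable {
  const table: ExtensionTable = {};
  for (const type of Object.keys(input)) {
    table[type] = {};
    // Depth-first walk over the direct subtypes, collecting all descendants.
    const stack = [ ...input[type] ];
    while (stack.length > 0) {
      const sub = stack.pop()!;
      if (!table[type][sub]) {
        table[type][sub] = true;
        stack.push(...(input[sub] ?? []));
      }
    }
  }
  return table;
}

const extensionTable = deriveExtensions({
  'xsd:int': [ 'xsd:signedInt', 'xsd:unsignedInt' ],
  'xsd:signedInt': [ 'xsd:someOtherSubTypeOfSignedInt' ],
  'xsd:unsignedInt': [],
  'xsd:someOtherSubTypeOfSignedInt': [],
});
const valid = 'xsd:someOtherSubTypeOfSignedInt' in extensionTable['xsd:int']; // true, transitively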

@jitsedesmet
Member Author

Just to be absolutely clear, because I've made this mistake: the extensionTableInput answers the question "can a function expecting a receive b?", or concretely: "can a function expecting xsd:int receive xsd:unsignedInt?".

@wschella
Member

wschella commented Aug 9, 2021

Will you handle being a subtype of Literal and Term with this table as well? (I'm guessing not, since this approach only makes sense for concrete types.)

Anyway this is very much the right direction!

The key part to focus on is rewriting the monomorph function, which finds the implementation with the highest "concreteness" matching the argument types. I suggest trying to write this down first, in pseudo-code if need be. One of the annoying things about the tree, I just realized, is that you might find a mismatch (a type error) at any level of the tree, so to find the correct implementation you need to do a depth-first search. The natural way to implement this is recursion (which is also how .getImplementation must work, the way you describe it), but I guess a loop would be faster in the JS runtime.

Anyway, you will hit another difficulty with the extensionTable. Specifically: how do you find the xsd:int in const valid = xsd:signedInt in extensionTable[xsd:int] from the Overloads? (I assume xsd:signedInt is the argument type here.) A possible Overloads object might look like this:

{
   'xsd:int': ... implementation ...
   'xsd:string': ...implementation ...
   'literal': ...implementation...
   'term': ...implementation...
}

So from a map like that you need to extract xsd:int and xsd:string (and filter out literal and term), and check in the extensionTable for which of them the argument is valid.

I think I should also warn against moving to a more object-oriented approach (OverloadNode & Leaf) at this stage (vs the functional style it is now), although I give this warning with low confidence. I don't think the surrounding code is going to work well with it (especially the function definitions DSL), and rewriting everything at once might be a mess.

@jitsedesmet
Member Author

I just noticed that our tree structure will need to be searched depth first. Example:
We have a function f with implementations f(a: term, b: term) and f(a: xsd_int, b: xsd_int).
We call it with a: xsd_int and b: xsd_double. The first search will reach a node that only yields a function when given an xsd_int, so our second argument doesn't match. However, our first argument matched term as well; we should also check that case. I think some kind of depth-first search algorithm will be needed here.
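
A sketch of that depth-first search over the tree shape from earlier: try the most concrete key first, fall back to more general keys, and backtrack when a deeper argument fails to match (the candidates helper is a simplification, not the real type hierarchy):

type Implementation = (args: unknown[]) => unknown;
type Overloads = { [key: string]: Implementation | Overloads };

// Candidate keys for one argument, most specific first (simplified).
function candidates(argType: string): string[] {
  return argType.startsWith('xsd:') ? [ argType, 'literal', 'term' ] : [ argType, 'term' ];
}

function search(node: Implementation | Overloads, argTypes: string[]): Implementation | undefined {
  if (argTypes.length === 0) {
    return typeof node === 'function' ? node : undefined;
  }
  if (typeof node === 'function') {
    return undefined; // too many arguments for this overload
  }
  const [ head, ...tail ] = argTypes;
  for (const key of candidates(head)) {
    const child = node[key];
    if (child !== undefined) {
      const impl = search(child, tail); // recurse; backtrack on failure
      if (impl) {
        return impl;
      }
    }
  }
  return undefined;
}

With the example above ((term, term) and (xsd_int, xsd_int), called with (xsd_int, xsd_double)), the search first descends into the xsd_int branch, fails on the second argument, backtracks, and lands on the (term, term) fallback.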

@wschella
Member

wschella commented Aug 9, 2021

Sniped!

@jitsedesmet
Member Author

jitsedesmet commented Aug 9, 2021

To find the most correct type, I think we might be able to count how many iterations we need (at maximum) before reaching a type. For example, with the chains
int -> unsigned int -> term
int -> signed int -> unsigned int -> term
we would have
int: 0; unsigned int: 2 (max(1, 2)); signed int: 1; term: 3 (max(2, 3))

This way we can make some kind of stack and have the order of the stack well defined (first the first argument by 'correctness', then the second by correctness).
Hope this makes sense?

@rubensworks
Member

To find the most correct type, I think we might be able to count how many iterations we need (at maximum) before reaching a type.

I probably wasn't clear enough in my pseudo-code, but I would suggest pre-calculating all possible extension types of each given type during the init phase (deriveExtensions). (I don't expect there to be so many that they cause memory issues.)

So extensionTable[xsd:int] contains xsd:signedInt and xsd:unsignedInt, but also their sub-types (recursively).

In that case, lookups can be done in constant time instead of linear time.

@jitsedesmet
Member Author

I know; I would also just calculate these max distances during the deriveExtensions phase.

@rubensworks
Member

Not sure what the value of knowing these max distances is then.
You will know in constant time if a type is a sub-type of another type, which is sufficient I guess?

@rubensworks
Member

The question for me is: will you do this also for Literal (which will contain all literal subtypes), and Term (which will contain all subtypes)?

I would only use this system for literals with non-default datatype.
RDF.Literal and RDF.Term are probably better to handle separately.

So the two-tier system you talk about makes sense to keep.

@jitsedesmet
Member Author

You will know in constant time if a type is a sub-type of another type, which is sufficient I guess?

Yes, in a way. But take f(a: xsd_int) and an overload f(a: xsd_double), and say we call f with a: xsd_unsigned_int. Both overloads match. But looking at this, we would say xsd_int is closest to xsd_unsigned_int, so we should call f(a: xsd_int)?

@wschella
Member

wschella commented Aug 9, 2021

Suggestion: what if we turn the extension map on its head?

For every type, we have an (ordered) list of supertypes?

Then what you do is just check, in order, whether each supertype is present as a key in the overloads. I think this also matches more closely how sub/super method resolution works in OO programming.

This also extends nicely to Literal and Term, which should just always be at the end of the list (and can of course be handled specially), and is just a generalization of the way it currently works.

E.g. { 'float': ['float', 'double', 'numeric', 'literal', 'term'] } (numeric is a thing in the spec)

@jitsedesmet
Member Author

That could work, but would require linear time in the number of supertypes (which could be pretty big?). I think linear time in the number of overloads would be better? So for a certain node we would check all the types it can accept (that's max like 3 or something?) and then take the one with the lowest distance I already spoke of. This would be n log(n) (we need to sort the distances so we can add them to the stack in the right order), with n = overloaded arguments.

@jitsedesmet
Member Author

jitsedesmet commented Aug 9, 2021

If I'm not mistaken, your approach would be n*m (n = overloaded arguments, m = number of supertypes)? For every overloaded type you would need to check whether it is contained within the list of supertypes. Although, writing this, I realize you could also make that the same n log(n) I talked about in my approach.

@wschella
Member

wschella commented Aug 9, 2021

I think you can ignore the overloads of types that are adjacent (not in a sub/super relation) with the approach I suggested.
E.g. with the following overloads:

{
   'xsd:float': ... implementation ...
   'xsd:string': ...implementation ...
   'literal': ...implementation...
   'term': ...implementation...
}

and an argument of type xsd:float, you can ignore xsd:string. You only check 'float', 'literal', 'term' (in that order); anything that is not in the supertypes map is not relevant.
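
A sketch of that resolution, assuming a precomputed supertypes table (the table contents here are illustrative):

const superTypes: Record<string, string[]> = {
  'xsd:float': [ 'xsd:float', 'xsd:double', 'numeric', 'literal', 'term' ],
};

function resolve<T>(argType: string, overloads: Record<string, T>): T | undefined {
  // Walk the ordered supertype list; the first entry present wins.
  for (const key of superTypes[argType] ?? [ argType, 'term' ]) {
    if (key in overloads) {
      return overloads[key];
    }
  }
  return undefined;
}

With the overloads above and an argument of type xsd:float, this checks 'xsd:float', 'xsd:double', 'numeric', 'literal', 'term' in order and never considers 'xsd:string'.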

@wschella
Member

wschella commented Aug 9, 2021

But my approach is not really that relevant, since I do like the idea of the original extension types: it's less "clean" (not the usual way to do it), but it will likely be faster, because there will generally be fewer overloads than supertypes.

But what you need is not a distance, it's an ordering.

If you have f(a: xsd_int) and an overload f(a: xsd_double), xsd_int always takes priority, because it is more specific. There is no distance.

@wschella
Member

wschella commented Aug 9, 2021

Your types and their sub/super relations form a tree. Each of the types is thus at a specific level (or depth).

What you can do is walk over the overloads from deepest to highest. If the argument's type is deeper (or at the same level) and part of that subtree (per the extension map), it fits.

If the extension map is a set, this is (constant time) * n_overloads.
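
A sketch of that deepest-first walk, assuming the depths and the per-type extension sets were both precomputed during the init phase (the names here are illustrative):

function pickOverload<T>(
  argType: string,
  overloads: Record<string, T>,
  extensionSets: Record<string, Set<string>>, // type -> all (transitive) subtypes
  depthOf: Record<string, number>,            // type -> its level in the type tree
): T | undefined {
  const keys = Object.keys(overloads)
    .sort((a, b) => (depthOf[b] ?? 0) - (depthOf[a] ?? 0)); // deepest first
  for (const key of keys) {
    // Constant-time set membership check per overload.
    if (key === argType || extensionSets[key]?.has(argType)) {
      return overloads[key];
    }
  }
  return undefined;
}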

rubensworks pushed a commit that referenced this issue Aug 9, 2021
* Fix non 'Immutable' no-redeclare rule

* Quicksave before bos battle (Converting Helpers)

* Why do we not have unit tests? Debugging this is gonna hurt

* Comiling & tested version after enabeling no-redeclare rule #94

* Fix backwards compatibility issue

* Resolve changes requested by @rubenworks

* Resolve changes requested by @wschella
@jitsedesmet
Member Author

jitsedesmet commented Aug 9, 2021

Question for @wschella regarding the implementation. Online I found an overview of all xsd types and their relations. Within Sparqlee I found an object with type URLs. I was wondering whether there's a reason not all xsd types are in that object?

@wschella
Member

wschella commented Aug 9, 2021

Because I think these are the only ones the SPARQL spec talks about (in all function specifications).

I also see no mention of http://www.w3.org/2001/XMLSchema#dayTimeDuration there, which is definitely required for SPARQL.

@wschella wschella closed this as completed Aug 9, 2021
@wschella wschella reopened this Aug 9, 2021
@wschella
Member

wschella commented Aug 9, 2021

It would be very nice to create such a graph ourselves and add it to the repo. E.g. SPARQL also uses the terminology "numeric", and I don't see any reference to langString in this schema either.

@rubensworks
Member

It would be very nice to create such a graph ourselves and add it to the repo.

+1, this would be highly valuable! (might even want to add it to the comunica website)
But I guess we can start with just adding it to the code, and create such a graph once everything is finalized and this issue is resolved.

@jitsedesmet
Member Author

So you're saying this is still incomplete?

@wschella
Member

wschella commented Aug 9, 2021

@jitsedesmet Yes and no. How this organization maps to the types mentioned in the SPARQL spec is not clear to me, so for our purpose it's incomplete.

There needs to be explicit mention of at least 2 extra concrete types: rdf:langString and xsd:dayTimeDuration.

Also, many other types are largely irrelevant, but could be included for completeness I guess.

@jitsedesmet
Member Author

So what are you suggesting? Support all types (the ones I found yesterday + the object with type URLs)? Or support only the object? I don't see a reason to support something we don't need. It also wouldn't be consistent; we need to draw a clear line between what we support and what we don't.

@wschella
Member

I suggest you do what you think is best! And I agree there's no need to support things we don't need; a clear line is better.

@jitsedesmet
Member Author

I'll try with just the types already within sparqlee.

@jitsedesmet
Member Author

For now, I'll be working with the types from https://www.w3.org/TR/xmlschema/#built-in-datatypes. There seems to be no reason to take http://www.w3.org/1999/02/22-rdf-syntax-ns#langString into account, as its definition is:

rdf:langString a rdfs:Datatype ;
	rdfs:subClassOf rdfs:Literal ;
	rdfs:isDefinedBy <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ;
	rdfs:seeAlso <http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal> ;
	rdfs:label "langString" ;
	rdfs:comment "The datatype of language-tagged string values" .

In other words, it seems to be just a literal? It doesn't say it's an xsd string?
As for http://www.w3.org/2001/XMLSchema#dayTimeDuration, it is mentioned in that document under Other Built-in Datatypes. Our line can thus just be this document? (This note is still a work in progress.)

@jitsedesmet
Member Author

@wschella, since I'm editing the type links, I wonder: can I make the links used in these objects just point to TypeUrl's version of those links? They should be the same anyway?

@wschella
Member

@wschella, since I'm editing the type links, I wonder: can I make the links used in these objects just point to TypeUrl's version of those links? They should be the same anyway?

If you can, that would be nice! When I wrote this, TS couldn't handle that cleanly yet.

@wschella
Member

For now, I'll be working with the types from https://www.w3.org/TR/xmlschema/#built-in-datatypes. There seems to be no reason to take http://www.w3.org/1999/02/22-rdf-syntax-ns#langString into account, as its definition is:

rdf:langString a rdfs:Datatype ;
	rdfs:subClassOf rdfs:Literal ;
	rdfs:isDefinedBy <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ;
	rdfs:seeAlso <http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal> ;
	rdfs:label "langString" ;
	rdfs:comment "The datatype of language-tagged string values" .

In other words, it seems to be just a literal? It doesn't say it's an xsd string?
As for http://www.w3.org/2001/XMLSchema#dayTimeDuration, it is mentioned in that document under Other Built-in Datatypes. Our line can thus just be this document? (This note is still a work in progress.)

There definitely is reason to take rdf:langString into account, as many SPARQL functions work specifically on that datatype! For these things, I always worked from the types as mentioned here https://www.w3.org/TR/sparql11-query/#OperatorMapping and here https://www.w3.org/TR/sparql11-query/#SparqlOps

@wschella
Member

You'll for example also see mention of "numeric" there. We want to represent this in the graph as well, I think, as the graph should map cleanly onto the SPARQL spec.

@jitsedesmet
Member Author

@wschella, since I'm editing the type links, I wonder: can I make the links used in these objects just point to TypeUrl's version of those links? They should be the same anyway?

If you can, that would be nice! When I wrote this, TS couldn't handle that cleanly yet.

Apparently it still can't. I should have tested this before asking the question. Sorry!

@jitsedesmet
Member Author

After having a call with @wschella and considering his comment #101 (comment), we probably want to rewrite lib/functions/Helpers.ts as well, to use the OverloadTree created in that PR more closely. In combination with the auto-cast feature, we can clean up that file quite a bit.
