Auto casting arguments using OverloadTree
#102
For this, I would recommend allowing this cache to be retrieved from the evaluator, so it can be reused in other evaluators by passing it into |
If we can use a cache, why don't we just precompute? If we do the cache with

So the question/tradeoff for me is: you can generate this 'cache' in advance for all types, but it's a combinatorial explosion of entries in the map (i.e. use the tree only as an intermediate representation, and flatten it). Might that be worth it (a bit more memory, but O(1) lookups)? I don't really know. The cache might hit a good sweet spot. I also think the lookup in the tree is quite cheap, and it might be improved a bit still. So implementing a cache might be overkill to start with, especially if it's configurable, as that would require exposing our internal implementations for all functions!

Also, let's get terminology straight first; there are at least 2 distinct cases (that I know of):
We can refer to all of these generally as auto-casting, but they're distinct cases w.r.t. their type relations! I think this is the moment to fix all these things completely.
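As an aside, to help picture the "flatten it" option mentioned above: below is a minimal sketch of expanding abstract argument types into every concrete combination, which is where the combinatorial explosion comes from. None of this is sparqlee code; subtypesOf, Implementation and the type labels are made up for illustration.

```ts
// Hypothetical subtype table: which concrete types may substitute for a wider one.
const subtypesOf: Record<string, string[]> = {
  term: ['namedNode', 'blankNode', 'literal'],
  'xsd:decimal': ['xsd:decimal', 'xsd:integer', 'xsd:byte'],
};

type Implementation = (args: unknown[]) => unknown;

function expand(type: string): string[] {
  return subtypesOf[type] ?? [type];
}

// Precompute every concrete argument-type combination for one overload and
// store it in a flat map, trading memory for O(1) lookups.
function flattenOverload(argTypes: string[], impl: Implementation, out: Map<string, Implementation>): void {
  const combos = argTypes
    .map(expand)
    .reduce<string[][]>((acc, options) => acc.flatMap(prefix => options.map(o => [...prefix, o])), [[]]);
  for (const combo of combos) {
    out.set(combo.join(','), impl); // one entry per combination: k^n entries for arity n
  }
}
```

With two arguments that each expand to three concrete types, this already yields nine entries for a single overload, which is the memory-versus-O(1)-lookup tradeoff weighed above.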
Yeah, I also don't know. It may be worth it to precompute.
Thinking a bit more on it, another argument against precompute is that it might add a non-negligible overhead to the initialization phase that will not matter for most expressions. You save no time by doing precomputation, you just move it from the moment of first lookup to all computations in advance, which is possibly a big waste, and really only advantageous if you can do it when you're waiting for e.g. IO.

Thus: we can consider the cache more like a lazy computation approach. You can argue that when you have a cache with unlimited size, you do no extra computations, and you only do them when they're needed. (I think there's little reason to limit the cache size.)

Conclusion: I'm a fan of the cache! (Especially as the different Bindings in a stream will likely have very similar types.)
I'm glad you like the cache idea. I have a few questions about what you said.
I also want to clarify that a cache item would (partly) be identified by the node. There would not be any reason to keep a list in the cache like we did with the

You also said:
Is it okay if I look at these issues after the first PR, handling what I meant by auto-cast, is done? (That PR will be big enough on its own :) )
EDIT: This comment contains false information! @wschella pointed out
It should be noted that we can easily implement this behavior in #103; we would just need to edit the |
How do you let someone add cache entries without giving access to the possible values (i.e. our function implementations)? But I think I misunderstood: I believe the idea is actually just to copy the cache from one evaluator to the next?
Okay, but a cache is a (key, value) store. What are the keys? I believe ArgumentType[], since that's what's coming in at runtime? The whole point of the OverloadTree is to associate ArgumentType[] with one of its nodes (and mostly a specific implementation), so how could you use the Node as a key if you only have access to ArgumentType[]?
The problem is I'm not sure what you mean by auto-casting right now! I think it's actually #13? So it might be worth checking these out (the first 2), just to have a more formal framework to reason in.
Indeed. No need for external modification of the cache, I think.
Yes
I'm thinking about making the Tree know what function (as a string name) it represents, calling this functionName.

```ts
args.map(arg => arg.termType === 'literal' ? (<E.Literal<any>> arg).type : arg.termType).join('');
```

This would make one long string we can use as a key. This string would then be:

```ts
const key = this.functionName + args.map(arg => arg.termType === 'literal' ? (<E.Literal<any>> arg).type : arg.termType).join('');
```

It is entirely possible that there is another solution, but this already gives some idea as to where I would go with this. Some food for thought. 😄
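To make the key idea concrete, here is a rough sketch of how such a string key could back a lazily filled cache in front of the tree search. OverloadEntry, searchTree and cachedSearch are hypothetical stand-ins, not the actual sparqlee API.

```ts
type OverloadEntry = (args: unknown[]) => unknown;

const overloadCache = new Map<string, OverloadEntry>();

function cachedSearch(
  functionName: string,
  argTypes: string[], // the termType / literal-datatype strings from the snippet above
  searchTree: (argTypes: string[]) => OverloadEntry | undefined,
): OverloadEntry | undefined {
  // A separator between the parts would avoid ambiguous concatenations.
  const key = functionName + argTypes.join('');
  const hit = overloadCache.get(key);
  if (hit) {
    return hit; // lazy: only combinations that were actually looked up are cached
  }
  const found = searchTree(argTypes);
  if (found) {
    overloadCache.set(key, found);
  }
  return found;
}
```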
I think so too.
Reading https://www.w3.org/TR/xpath-31/#promotion it says:
So what we did with being able to give

I have but 1 idea on how to fix this problem properly (if it's even possible to make these casts).
Not sure I understand the problem. I see this being mentioned in the spec (https://www.w3.org/TR/xpath-31/#promotion):
So that's exactly what we're trying to achieve, right?
Yes, but it's not what #103 does right now. Right now it will keep the type |
The type change you're talking about, isn't this something we need to do in all cases?
That's not how I understand https://www.w3.org/TR/xpath-31/#dt-subtype-substitution . It says:
|
Then I don't know what is supposed to happen 🤷
Responding on #102 (comment), regarding caching architecture.
Now, I have another idea entirely for the cache (i.e. the lazy computation): we also make it a tree. So imagine you have a certain call

A cache to copy from another evaluator is then just a tree to be merged. This embodies the idea that we just lazily compute the OverloadTree. The tree that we have now is just the minimal representation, and each concrete type lookup adds a fast path. This also makes "precomputation" a spectrum: we could choose to add these fast paths for certain type & function combinations we know to occur a lot, but are not in the minimal tree. Opinions? If this is what you meant in point 1, please forgive me ^^

[1] For search, I believe this is the order to be respected at each node:
Do you think this is correct?
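For illustration, a rough sketch of what "the cache is also a tree" could look like: fast-path branches for concrete types are added to the nodes of the minimal tree, and copying a cache between evaluators becomes a tree merge. The node shape and method names below are assumptions, not the real OverloadTree class.

```ts
class OverloadTreeNode {
  implementation?: (args: unknown[]) => unknown;
  private readonly children = new Map<string, OverloadTreeNode>();

  getOrAddChild(type: string): OverloadTreeNode {
    let child = this.children.get(type);
    if (!child) {
      child = new OverloadTreeNode();
      this.children.set(type, child);
    }
    return child;
  }

  // After a successful lookup with concrete types, record a direct branch so the
  // next lookup with the same concrete types skips promotion/subtype handling.
  addFastPath(concreteTypes: string[], impl: (args: unknown[]) => unknown): void {
    let node: OverloadTreeNode = this;
    for (const type of concreteTypes) {
      node = node.getOrAddChild(type);
    }
    node.implementation = impl;
  }

  // "Copying" a cache from another evaluator: merge its fast-path branches into this tree.
  mergeFrom(other: OverloadTreeNode): void {
    if (!this.implementation) {
      this.implementation = other.implementation;
    }
    for (const [type, child] of other.children) {
      this.getOrAddChild(type).mergeFrom(child);
    }
  }
}
```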
My opinions on this:
I think an array we edit ourselves will be much faster (so what we have now). We never copy the array? This is an ideal search strategy?
Yes exactly! That's what the idea for the caches is. I think we mean the same things but just give them different names. I would keep using the cache system just because it enables some control over the resources. This can be a very good thing to have? I like the name

I don't really understand the last order you mention, but I think that doesn't matter as much?
True, I was mistaken in thinking that
The order I mention there defines which implementation we should prefer if multiple match: i.e. a definition for f(byte) is preferred over f(term) if the arg has the type

And just to be sure that we are on the same page, in my head:
|
Absolutely and also avoiding the depth first search of course.
I see where you're coming from. I first thought the same thing. However, @rubensworks pointed out that it is important to be able to reset these fast paths. Implementing fast paths would create a global state. We need to have absolute control over this state. I would give this control by making the cache a separate data structure that could be provided to the search function (or some other access point in that vicinity).
Why do you want to reset the fast paths?
Mainly for test purposes and control over the state: reproducibility.
If your cache is a separate data structure, you need depth-first search again[1], and when you find no match you must start again from the beginning in the OverloadTree. What I do understand is a need to have a separate tree instance for each evaluator, but this can be the same data structure, with the same logic. But this can be solved by just copying the tree[2], which is then done for each evaluator. All state can be reset by making a new evaluator. I don't see the need for more fine-grained control?

[1] Because you can have partial matches (i.e. a cache hit for the first 2 arguments, but not the third). Admittedly it's a lot faster, because there's no need to handle subtype substitution and promotion and such, but still.
If a full match is not made, we need to do the complete search! We cannot cut off the first 2. That's the same reason we need DFS in the first place.
We could do that, but I thought you weren't a fan of making the complete path from evaluator to overloadTree class-based? Anyway, I think your approach is a little complicated? Might be because I don't understand it?
With a separate data structure, you need 2 complete searches! While in the overload tree, the fast path is just the first option considered; if it fails, search resumes as it normally would. E.g. if you have 2 matches with concrete types, but the last argument fails, you can first check whether e.g. the last argument also accepts a term.
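A small sketch of that per-node order, under the assumption of a simple map-of-children node: the concrete (fast-path) branch is tried first, and the search only widens towards 'term' when it fails, so no second full search is ever needed. The widenings helper and Node shape are illustrative only.

```ts
type Impl = (args: unknown[]) => unknown;

interface Node {
  children: Map<string, Node>;
  implementation?: Impl;
}

// Which type labels could still match this argument, ordered from most specific
// (the concrete type itself) to most general ('term'). Promotion and base types
// would slot in between in a real implementation.
function widenings(concreteType: string): string[] {
  return concreteType === 'term' ? ['term'] : [concreteType, 'term'];
}

function search(node: Node, argTypes: string[], index = 0): Impl | undefined {
  if (index === argTypes.length) {
    return node.implementation;
  }
  for (const candidate of widenings(argTypes[index])) {
    const child = node.children.get(candidate);
    if (child) {
      // Depth-first: if a later argument fails on the concrete branch, we simply
      // backtrack here and try the wider candidate instead of restarting from scratch.
      const found = search(child, argTypes, index + 1);
      if (found) {
        return found;
      }
    }
  }
  return undefined;
}
```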
I don't really understand what this means. Caching is indeed something to be left for another PR. Maybe we should make a new issue for it? I'd be happy to have a call discussing this further.
We could make a new issue. That issue should also take the additionalExtensionTypes into account, as this will change the way we cache in some ways.
This issue can be closed. I wanted to type 102 instead of 103 in #103 (comment).
With the newly added (#101) OverloadTree we can enable auto casting like talked about in #94. I created this new issue because #94 is getting a little big and would get much bigger. I'll link some of the, in my opinion, most relevant comments for this issue from #94 here: Enable no-redeclare rule #94 (comment), Enable no-redeclare rule #94 (comment) and Enable no-redeclare rule #94 (comment).

When using auto cast, it could happen that searching for an overload operation is pretty expensive. The chosen overload for a given array of arguments will never change, however. Each node could have some kind of cache to keep the already calculated searchStack in. This cache would make sparqlee have a global state. We want to take control over that state, so when implementing the cache, we should provide functions to reset and maybe set the cache status. A package that could be used for this is: https://www.npmjs.com/package/lru-cache

If I forgot something, please add it in the comments. The issue became kinda big to keep track of 😅
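As a sketch of the control surface described above (reset, and maybe set), assuming a plain Map as the store; the linked lru-cache package could replace the Map where a bounded size is wanted. SearchStack and OverloadCache are placeholder names, not the real internals.

```ts
type SearchStack = unknown[];

export class OverloadCache {
  private entries = new Map<string, SearchStack>();

  get(key: string): SearchStack | undefined {
    return this.entries.get(key);
  }

  set(key: string, stack: SearchStack): void {
    this.entries.set(key, stack);
  }

  // Explicit reset so tests (and users) keep control over the otherwise global state.
  reset(): void {
    this.entries = new Map();
  }
}
```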