Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question on querying IntervalDict #45

Closed
fabiomargarido opened this issue Nov 11, 2020 · 9 comments
Closed

Question on querying IntervalDict #45

fabiomargarido opened this issue Nov 11, 2020 · 9 comments
Labels
question Issue is a question

Comments

@fabiomargarido
Copy link

First of all, thank you so much for the incredibly useful library!

I'd like to know if it is possible to obtain the interval which was used to store a value in an IntervalDict when querying.

For example, if I store something with d[P.closed(0, 10)] = 'whatever' and then query the dict for d.get(P.singleton(5)), is it possible for me to know that the interval in the dict that contains the interval I used for querying is P.closed(0, 10)?

Thanks again

@AlexandreDecan AlexandreDecan added the question Issue is a question label Nov 12, 2020
@AlexandreDecan
Copy link
Owner

AlexandreDecan commented Nov 12, 2020

Hello,

I'm afraid the answer is no, there is no direct way to access the exact "keys" that were used to store a value.
However, you can rely on d.find(x) to retrieve all keys whose value is x. For example:

>>> d = P.IntervalDict()
>>> d[P.closed(0, 10)] = 'whatever'
>>> value = d.get(P.singleton(5))
>>> d.find(value)
[0,10]

But it does not preserve the exact keys that were used to add a value into the dict, as illustrated next:

>>> d = P.IntervalDict()
>>> d[P.closed(0, 10)] = 'whatever'
>>> d[P.closed(10, 20)] = 'whatever'
>>> value = d.get(P.singleton(5))
>>> d.query(value)
[0,20]

... and not [0,10] (the interval corresponding to match the one defined for P.singleton(5)) since the internal storage has been changed from [0,10] -> 'whatever' to [0,10] | [10,20] -> 'whatever', hence [0,20] -> 'whatever'.

I don't know if this "solution" based on find suits your needs. I hope it does, because I don't think there's any solution based on IntervalDict to exactly retrieve the keys used during creation of the dict.

Notice that the above solution is not very efficient, since it requires (1) to find intersections with P.singleton(5), and (2) to iterate on all stored values to find 'whatever'. If performances are an issue, I suggest to "manually" iterate on the IntervalDict.items() (or on d.keys() if you don't need the value), this will be slightly more efficient:

>>> for key, value in d.items():
>>>   if P.singleton(5) in key: 
>>>    break
>>> key
[0,20]

@fabiomargarido
Copy link
Author

I was actually not clear enough/oversimplifying when I said I wanted the interval used to store the value... 😄 What you describe is exactly what I would want, the interval that currently points to a value, even after new insertions caused it to join/split.

In my use case dicts should be small enough that the performance penalty wouldn't be a problem, so the query solution is probably enough.

Thank you, I'll give it a go!

@AlexandreDecan
Copy link
Owner

My bad, it's not .query (that does not even exist) but .find :-)
I'll edit my answer :)

@fabiomargarido
Copy link
Author

Ah, thanks.

Just to add to the discussion, I'm not sure how frequent this use case is for users of the lib in general, but would it be possible in theory to add a new convenience method get_with_interval or whatever that would return a tuple with both the key and the value?

@AlexandreDecan
Copy link
Owner

Thanks for the suggestion. However, it would be a simple wrapper around find or even around items (as illustrated in my first answer). I will consider providing such convenience method if I can find a way to implement it more efficiently than simply looping over items. That's definitely something I'll keep in mind ;-)

@fabiomargarido
Copy link
Author

Sorry to come back to this after so long, but I've gone to implement your suggestion today and one drawback it has is the possibility that the same value is associated to multiple intervals in the dict. So when I call find, it would have to take the extra step of iterating over possible returned intervals to check which of them contains the value I used to call get.

One (admittedly not too huge) more reason in favour of the convenience method... 😄

@AlexandreDecan
Copy link
Owner

I'm really sorry but I did not understand your question ;-) could you clarify what you do, what's you got and what you expected? I'm willing to help you ;-)

@fabiomargarido
Copy link
Author

Sorry for not being clear enough... :)

What I meant is, if there are 2 disjoint intervals, say [0,10] and [20, 30] that are associated to the value 'whatever', when I do find I'm going to get both, so in order to know which atomic interval contained the value I supplied to get I'll have to break the returned interval into its atomic parts and check each one.

Not sure that helped or added to the confusion... 😄 Basically the idea is that when I do d.get(P.singleton(5)) I want to retrieve [0, 10] and if I do d.get(P.singleton(25)) I want to retrieve [20, 30].

@AlexandreDecan
Copy link
Owner

AlexandreDecan commented Nov 30, 2020

Ok, I got it ;-) Thank you ;)

Indeed, the IntervalDict container is not very helpful for this use case, even if I'm convinced it has the appropriate behaviour (hence, as you suggested, the idea of adding a new method). To be honest, currently, I don't see any better workaround than the one you suggest (i.e. iterating on the returned intervals). Actually, I don't even see any other way to implement this efficiently directly in the IntervalDict class, but I guess that the new IntervalTree I'm currently working on would address this issue (because in such new container, atomic intervals are considered as a first-class citizen). I've no ETA on this, unfortunately :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Issue is a question
Projects
None yet
Development

No branches or pull requests

2 participants