Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntervalDict iteration over sorted atomic intervals #44

Closed
msarrel opened this issue Oct 15, 2020 · 8 comments
Closed

IntervalDict iteration over sorted atomic intervals #44

msarrel opened this issue Oct 15, 2020 · 8 comments
Labels
enhancement New feature or request question Issue is a question

Comments

@msarrel
Copy link

msarrel commented Oct 15, 2020

Hi,

Thank you for making all the improvements in your library in portion 2. However, one of your changes from python-intervals to portion broke my code.

In python-intervals, if I passed an Interval to the IntervalDict.get() method, that method would return the value if the Interval corresponded to a single value in the IntervalDict. In portion, now the get() method always returns a new IntervalDict, even if there is only one entry. I would like a way to have the old functionality as an option, while also keeping the new functionality.

The Python dict.get() method always returns the value, so I would maybe suggest that IntervalDict.get() follow that precedent, and throw an exception if there is more than value for the submitted Interval. Then, the new functionality could be provided by a new method, or by setting a flag to True in the existing IntervalDict.get() method.

Here is an example of my use case. My goal is to iterate over the values in the IntervalDict, one for single-valued atomic interval, in sorted order. I have a cheap work-around, but it fails for infinite intervals.

import portion

d = portion.IntervalDict()
d[portion.closed(0, 3)] = 'banana'
d[4] = 'apple'
d[portion.closed(5, 8)] = 'orange'
d[portion.closed(9, 12)] = 'banana'
d[portion.closed(13, portion.inf)] = 'pomegranate'

print()
print('# The interval dictionary')
print(d)

print()
print('# Sorted list of atomic intervals')
print(list(d.domain()))

print()
print('# I want the value, not another IntervalDict')
print('# This worked in python-intervals, but fails in portion')
for i in list(d.domain()):
    print(d.get(i))

print()
print('# This work-around produces the desired result')
print('# But it fails for infinite intervals')
for i in list(d.domain()):
    print(d.get((i.lower + i.upper)/2))

Here is the output I get when I run the above example.

# The interval dictionary
{[0,3] | [9,12]: 'banana', [4]: 'apple', [5,8]: 'orange', [13,+inf): 'pomegranate'}

# Sorted list of atomic intervals
[[0,3], [4], [5,8], [9,12], [13,+inf)]

# I want the value, not another IntervalDict
# This worked in python-intervals, but fails in portion
{[0,3]: 'banana'}
{[4]: 'apple'}
{[5,8]: 'orange'}
{[9,12]: 'banana'}
{[13,+inf): 'pomegranate'}

# This work-around produces the desired result
# But it fails for infinite intervals
banana
apple
orange
banana
Traceback (most recent call last):
  File "<input>", line 28, in <module>
TypeError: unsupported operand type(s) for +: 'int' and '_PInf'
@AlexandreDecan
Copy link
Owner

AlexandreDecan commented Oct 16, 2020

Hello,

Actually, you made use of an unsupported feature from python-intervals (I've just discovered how and why this works :-D). It's really an edge case that calling .get with an AtomicInterval behaves that way ;-) When I developed this method, I only thought about either passing Interval instances, or "single values", I've never thought about AtomicInterval. Since an AtomicInterval instance falls in the "single value" part of the if-else, it leads to if key in i: which turns out to be true because python-intervals defined containment for AtomicInterval as well.

In short, all this to say that I am surprised (positively) that it is possible to use AtomicIntervals as a key in python-intervals :-)
Obviously, as the AtomicInterval type no longer exists in portion this makes your situation more complicated :-)

I will look to see if there is a functional generic workaround, or if it is possible/useful to implement something that could solve your problem.

@AlexandreDecan AlexandreDecan added enhancement New feature or request question Issue is a question labels Oct 16, 2020
@AlexandreDecan
Copy link
Owner

I've a question about the expected output of your snippet when two keys are adjacent.

Consider for example the following interval dict:

import intervals as portion

d = portion.IntervalDict()
d[portion.closed(0, 2)] = 'a'
d[portion.openclosed(2, 4)] = 'b'

When I run your snippet, I got None.
The reason why I got None is because d.domain() returns [0,4], and this interval is never contained in any key of the interval dict.

I believe you would like to have [[0,2], (2, 4]] so that you get ['a', 'b'] as a result, right?

@msarrel
Copy link
Author

msarrel commented Oct 16, 2020

Yes, that is correct. I had planned to open a separate issue on that problem. Would you still like me to do that? I'd be happy to do so if it will help you.

BTW, here is a description of my full use case. The keys in my interval dictionary are made of Arrow datetimes, and the values are floats. My goal is to produce a graph of the values over time using plotly. I figured it would be more efficient to draw one horizontal line segment per atomic interval, rather than sample the interval dictionary at regular time steps. I also have code to draw the vertical segments to join the horizontal segments. That is why I need to process the atomic intervals in sorted order.

@AlexandreDecan
Copy link
Owner

Actually, I don't think it's an issue per se, it provides the expected output since IntervalDict were not expected to work with AtomicInterval. Anyway, that "potential issue" is no longer valid in portion given there is no longer an AtomicInterval type :-)

Considering your use case, I think the easiest way to achieve it is to (1) retrieve all intervals from the dict, (2) retrieve all atomic intervals from these intervals, (3) and to sort them.

To sort intervals, it's easier to define a custom sorting function based on the lower bound (its value and whether it is open or closed) as we do internally in portion. By default, sorted relies on < to sort elements of a list, but intervals are not comparable the usual way, as explained in the documentation. However, since the intervals stored in an IntervalDict are non-overlapping by definition, and since we will handle atomic intervals from these intervals, there is no issue to use sorted and <. This is not the case for overlapping intervals (e.g., [0,3] is neither < nor > than [2,4]), and not the case for non-atomic ones (e.g., [0,1] | [4,5] is neither < nor > than [2,3]). As a result, we can safely use sorted in our case.

In code:

items = []  # List of (key, value) where key is an atomic interval
for interval, value in d.items():
  for atomic in interval: 
    items.append((atomic, value))
items.sort()

Or more concisely, using a list comprehension:

items = sorted([(atomic, value) for key, value in d.items() for atomic in key])

The explicit sorting of intervals (in O(n.logn)) does not make this less efficient than the "old" solution based on d.domain(), since d.domain() creates an Interval instance that internally sorts intervals as well. Moreover, with the "new" solution, there is no more a need to lookup values from d (i.e. no more d.get(i), an operation that is not the cheapest one in an IntervalDict).

@msarrel
Copy link
Author

msarrel commented Oct 16, 2020

Yes, I already have code that does what you describe. I just thought it might be convenient to add a method to your library to perform that function. Or, perhaps an iterator to iterate over these intervals.

@AlexandreDecan
Copy link
Owner

IntervalDict.items() (and other view-based methods) already returns a sorted list of intervals, but these are not necessarily atomic. I already thought about adding a Boolean atomic parameter to the as_dict method of an IntervalDict, but I found the use case a bit too specific (and the "workaround" so simple) to take the plunge.

Moreover, now, I can even refer to this issue in the documentation :-D

@msarrel
Copy link
Author

msarrel commented Oct 17, 2020

I will look to see if there is a functional generic workaround, or if it is possible/useful to implement something that could solve your problem.

Do you still plan to fix the original question, giving an option to return a value when the user submitted an range that has only one value?

@AlexandreDecan
Copy link
Owner

Not for the moment, I'm not 100% convinced to be honest :-)
Maybe you can open another issue specifically for this, so we can discuss it?

sietse added a commit to sietse/portion that referenced this issue Apr 8, 2022
sietse added a commit to sietse/portion that referenced this issue Apr 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Issue is a question
Projects
None yet
Development

No branches or pull requests

2 participants