Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HeapPriorityQueue doesn't let one iterate in unsorted order #140

Closed
sortie opened this issue Jun 18, 2020 · 8 comments
Closed

HeapPriorityQueue doesn't let one iterate in unsorted order #140

sortie opened this issue Jun 18, 2020 · 8 comments
Labels
type-enhancement A request for a change that isn't a bug

Comments

@sortie
Copy link

sortie commented Jun 18, 2020

My code has several huge priority queues using HeapPriorityQueue, where it occasionally needs to consider all the elements in the queue where the order doesn't matter (e.g. to tally up how long it will take to process the queue, or to consider all the items and move a few to another queue if circumstances changes).

However, HeapPriorityQueue only offers toList() and toSet() for this purpose. However, they spend O(nlogn) to sort the queue (and I need a linear time algorithm) and duplicates the data structure (I just need constant memory). The only way to iterate the queue in linear time in unsorted order is to use removeAll(), however it is destructive.

The HeapPriorityQueue internally just uses a list to manage the data structure, which is what is returned by removeAll(). It could easily gain a Iterable<E> get unsorted getter or something that returns the backing data structure with the appropriate concurrent modification protections.

This is probably a breaking change and needs a major version bump in case anyone implemented HeapPriorityQueue.

I'm currently working around the issue associating each priority queue with a Set in my mod, however it duplicates the amount of memory spent on the queues.

@lrhn
Copy link
Member

lrhn commented Sep 9, 2020

Adding a member to an implementation class is not a breaking change, so we can add it to HeapPriorityQueue. It's potentially breaking to add it to the interface PriorityQueue, since that is intended for general use, and someone might be implementing it.

It's somewhat annoying to add an operation which is not general to all PriorityQueues, but since some priority queues could easily keep objects in priority order all the time, an unorderedElements iterable getter wouldn't make sense for them.
The class interface itself does not implement Iterable, so we haven't promised any way to access elements other than the "highest priority" one, and some priority queues might not even be able to do so (unlikely, but possible).
So, adding just to HeapPriorityQueue can be reasonable.
The HeapPriorityQueue is currently designed so that you cannot iterate it. That means that it doesn't need to check for concurrent modification at any point. If we add a way to iterate elements in place, then we also need to start recording modifcations so that the iteration can catch those and throw a ConcurrentModificationError.
Alternatively, we can go for a "best effort" check that the length hasn't changed, which will catch some modifications, but not all, and will not promise that iteration won't accidentally visit elements more or less than once.
It's safer to just not go there. Iterating a modifiable collection is really an anti-pattern, and it was deliberate to not support it here.

If we do support it anyway, I'd suggest List<E> toUnorderedList(). That would still copy the elements, which means that you can safely iterate the list and modify the queue.

@lrhn lrhn added the type-enhancement A request for a change that isn't a bug label Sep 9, 2020
@lrhn
Copy link
Member

lrhn commented Sep 18, 2020

I've added toUnorderedList to PriorityQueue. Let's see if anyone implements it.

@lrhn lrhn closed this as completed Sep 18, 2020
@sortie
Copy link
Author

sortie commented Sep 22, 2020

The toUnorderedList implementation solves the runtime complexity issue by reducing it from O(nlogn) to O(n). However, it still uses O(n) time and memory to duplicate the queue. It looks like this feature got bundled with null safety so my service can't use it until null safety is released.

The toUnorderedList design seems a bit suboptimal for performance. My ideal interface would be something like Iterable<E> get unordered (or maybe an Iterator<E>):

  • It allows returning the raw backing data structure with the appropriate concurrent modification protections, which uses constant memory, unlike toUnorderedList that spends linear memory.
  • The runtime to iterate is split uniformly between entries, unlike toUnorderedList that has a big up front cost for large data structures.
  • Different implementations of PriorityQueue might use a different backing data structure which might not be a list.

In my case, I have very big priority lists with many millions of entries, and doing operations to consider all the elements (in no particular order) is very expensive (and I try to do it as little as possible). The big up front allocation can create a big delay where I can't use async boundaries to let other code paths remain responsive. Being able to iterate the backing data structure with no duplication makes this operation cheaper.

Make no mistake, toUnorderedList is an improvement, but isn't really more efficient than my code that maintains a set with all the entries in the queue, so the memory is already duplicated, and that set doesn't need to be rebuilt each time the queue is iterated.

@lrhn
Copy link
Member

lrhn commented Sep 22, 2020

So a better API would be Iterable<T> removeAll() which clears the queue and provides an iterable of the elements.
It's not a list, but that also means it doesn't have to be copied. It would probably be possible to use CastIterable as wrapper, since it needs a wrapper.

I'll think about this again, we haven't released it yet.

@lrhn lrhn reopened this Sep 22, 2020
@natebosch
Copy link
Member

natebosch commented Sep 23, 2020

I'll think about this again, we haven't released it yet.

I missed this comment before I published. It's released, but with a prerelease -nullsafety version.

Do we need to consider quickly reverting that to avoid it getting picked up by flutter?

cc @jakemac53

@lrhn
Copy link
Member

lrhn commented Sep 23, 2020

Won't be needed. If we need something different, we can add it that and keep toUnorderedList.

I am not particularly happy about adding a way to iterate elements in-place.
The problem is that iteration can be interleaved with modifications, and a properly designed iteration should detect modifications and bail out with a ConcurrentModificationError.
That necessarily costs. Not a lot, but still a cost which isn't necessary if you don't actually iterate.

The removeAll actually works better than you'd expect. If you do addAll with the result, the elements are in heap order, so the addition will be linear (every added element will check that it's already greater than its parent, and not need to move anywhere).
It still needs twice the memory (2.5 times actually, since the buffer will grow).

But OK, let's try and see how big an impact an unorderedElements iterable will have.

@sortie
Copy link
Author

sortie commented Sep 23, 2020

No, I do not need a destructive iterator. That means I need to pay O(nlogn) to rebuild the queue again. Besides removeAll() is already efficient and returns the elements in no particular order.

I just want to non-destructively iterate the elements in the queue in no particular order in linear time with constant memory use.

@lrhn
Copy link
Member

lrhn commented Oct 13, 2020

Have now landed unorderedElements in b836c70.

@lrhn lrhn closed this as completed Oct 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-enhancement A request for a change that isn't a bug
Projects
None yet
Development

No branches or pull requests

3 participants