Better Sequence support for Heap #114

Closed
CTMacUser wants to merge 4 commits

Conversation

CTMacUser
Contributor

This request expands on Sequence support for Heaps.

  • To match the ordered views, the property for the unordered view has a type-alias.
  • The ordered views fully implement underestimatedCount.
  • The ordered views fully implement element search. This is due from...
  • Heap now has an element-search method.

That last method contains two implementations: breadth-first and depth-first search. They need to be benchmarked against each other, and against copying an unordered view and running a (linear) search on that copy. The benchmarks should cover various sizes and search-target locations (before all actual values, after them, between them, and an actual match). The stated complexity is a guesstimate; it could be worst-case linear, at least for breadth-first. Those are only the time complexities; the scratch space may also be significant.

Checklist

  • I've read the Contribution Guidelines
  • My contributions are licensed under the Swift license.
  • I've followed the coding style of the rest of the project.
  • I've added tests covering all new code paths my change adds to the project (if appropriate).
  • I've added benchmarks covering new functionality (if appropriate).
  • I've verified that my change does not break any existing tests or introduce unexplained benchmark regressions.
  • I've updated the documentation if necessary.

@CTMacUser changed the title from "Heap sequence" to "Better Sequence support for Heap" on Oct 3, 2021
Member

@lorentey lorentey left a comment

Thumbs up for the additions for underestimatedCount and _customContainsEquatableElement!

The custom implementation of contains needs fixing, though.

Comment on lines +60 to +61
/// - Warning: If `Element` is a reference type, do not mutate elements such
/// that their relative order rankings change.
Member

This warning seems out of place here. If it makes sense to add it, it needs to go on the top level Heap type.

///
/// - Warning: If `Element` is a reference type, do not mutate elements such
/// that their relative order rankings change.
public typealias UnorderedView = [Element]
Member

What's the purpose of this new typealias?

///
/// - Complexity: O(log `count`).
@inlinable
public func contains(_ element: Element) -> Bool {
Member

Neither of these approaches seems good to me. A contains that requires an allocation doesn't seem like a good idea, especially because it is so easy to navigate the heap -- we can get to parent / left-child / right-child by doing trivial binary arithmetic on the indices.
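A minimal sketch of that index arithmetic, assuming the usual level-order array layout with the root at index 0. The free functions and their names are hypothetical illustrations, not the package's API.

```swift
// Illustrative helpers for an array-backed binary tree stored in level order.
// These free functions are hypothetical; they are not the package's API.

/// Index of the parent node, or `nil` for the root.
func parentIndex(of index: Int) -> Int? {
    index > 0 ? (index - 1) / 2 : nil
}

/// Index where the left child would live (it may be past the end of storage).
func leftChildIndex(of index: Int) -> Int { 2 * index + 1 }

/// Index where the right child would live (it may be past the end of storage).
func rightChildIndex(of index: Int) -> Int { 2 * index + 2 }

/// Depth of the node at `index`; the root is at level 0.
func level(of index: Int) -> Int {
    let n = index + 1
    return Int.bitWidth - 1 - n.leadingZeroBitCount
}

/// In a min-max heap, even levels are min levels and odd levels are max levels.
func isMinLevel(_ index: Int) -> Bool {
    level(of: index) % 2 == 0
}
```

Because every relation is plain integer arithmetic, a traversal can move up, down, or sideways from any index without storing a path.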

Let's just implement a depth-first search but limited to strictly O(1) space. (I.e., no recursion.) This can be done because the integer index into the storage array represents a full path in the tree -- we don't need to express it on the call stack (as with recursion), or in an explicitly allocated stack.

The default implementation of contains simply performs a linear search in the storage array; this is effectively a breadth-first search with O(1) space, although it doesn't (can't) implement trimming. That's the algorithm to beat -- it isn't immediately obvious to me that a trimming depth-first search will perform better in practice, at least for element types with a trivial </== implementation.
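One way such a search could look is sketched below. It assumes the storage array is a valid min-max heap in level order (even levels are min levels) and works on a plain array rather than the real Heap type. The function name `containsDepthFirst` is hypothetical, and this is an illustration of the idea, not the implementation proposed in the PR or adopted by the package.

```swift
/// Pre-order depth-first search over a min-max heap stored in level order.
/// Uses O(1) extra space: the current index alone encodes the path.
/// Worst-case time is O(count); trimming only helps when bounds rule out subtrees.
func containsDepthFirst<T: Comparable>(_ storage: [T], _ target: T) -> Bool {
    let count = storage.count
    guard count > 0 else { return false }
    var i = 0
    while true {
        let element = storage[i]
        if element == target { return true }

        // Level of node i is floor(log2(i + 1)); even levels are min levels.
        let level = Int.bitWidth - 1 - (i + 1).leadingZeroBitCount
        let isMinLevel = level % 2 == 0

        // A min-level node bounds its subtree from below, a max-level node
        // from above, so a target outside that bound cannot be underneath.
        let trimmed = isMinLevel ? target < element : target > element

        let left = 2 * i + 1
        if !trimmed && left < count {
            i = left            // descend
            continue
        }

        // Backtrack: climb until we stand on a left child whose right
        // sibling exists, then step over to that sibling.
        while true {
            if i == 0 { return false }           // whole tree exhausted
            if i % 2 == 1 && i + 1 < count {     // odd index == left child
                i += 1
                break
            }
            i = (i - 1) / 2                      // climb to parent
        }
    }
}

// A small valid min-max heap: level 0 (min) = 1, level 1 (max) = 9, 8,
// level 2 (min) = 2, 3, 4, 5.
let sampleHeap = [1, 9, 8, 2, 3, 4, 5]
let found = containsDepthFirst(sampleHeap, 4)
let missing = containsDepthFirst(sampleHeap, 6)
```

Whether this beats the plain linear scan in practice would still have to be measured, as the comment above notes.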

/// - Returns: `true` if the element was found in the heap; otherwise,
/// `false`.
///
/// - Complexity: O(log `count`).
Member

Suggested change
/// - Complexity: O(log `count`).
/// - Complexity: O(`count`).

}

// Out-of-bounds check
if (isMinLevel ? (<) : (>))(target, nodeElement) {
Member

This is clever, but I prefer boring! 😉

Suggested change
if (isMinLevel ? (<) : (>))(target, nodeElement) {
if isMinLevel ? target < nodeElement : target > nodeElement {

Better yet, the algorithm could perhaps be "unrolled" to get rid of the isMinLevel branch. (However, it isn't obvious that will actually perform better than this -- this needs to be benchmarked.)

Comment on lines +248 to +250
// Pre-allocate an array for all potential searched-for nodes. (Only
// appending will happen, no removals nor random inserts, to minimize cache
// invalidation.)
Member

Copying each visited element into a throwaway array is not a good idea.

@lorentey
Member

I'm closing this without merging. We decided to get rid of the ordered views -- I was and remain unconvinced that they serve any useful purpose in practice that cannot be better achieved with other means. (They were over-elaborate equivalents of sequence(state: heap, next: { $0.popMin() }); but perhaps more importantly, they did not behave like views do in the stdlib. The hidden cost of having to copy and mutate a hidden copy of the original heap made them a non-starter.) If I'm proved wrong, we will have plenty of chances to re-add them in a future release.
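The one-liner above can be spelled with the standard library's `sequence(state:next:)`. The sketch below uses a hypothetical stand-in type, `ToyHeap`, backed by a sorted array so that the example is self-contained; it is not the package's Heap, and only its `popMin()` method matters here.

```swift
/// Hypothetical stand-in for a heap: a sorted array that exposes only `popMin()`.
struct ToyHeap<Element: Comparable> {
    // Kept in descending order so the minimum sits at the end.
    private var storage: [Element]

    init(_ elements: [Element]) {
        storage = elements.sorted(by: >)
    }

    /// Removes and returns the smallest element, or `nil` if empty.
    mutating func popMin() -> Element? {
        storage.popLast()
    }
}

let heap = ToyHeap([5, 1, 4, 2, 3])

// `sequence(state:next:)` copies `heap` into its own state and drains that
// copy; the original `heap` value is left untouched.
let ascending = Array(sequence(state: heap, next: { $0.popMin() }))
print(ascending)
```

The copy into the sequence's state is explicit at the call site here, which is the cost the ordered views kept hidden.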

Adding a contains(_:) method is not necessarily a bad idea, and we may revisit it in the future. But it alone is probably not that useful -- it should also come with some sort of equivalent to firstIndex(of:)/lastIndex(of:), and that opens up a can of worms I don't think we're ready to tackle yet. The question is mostly about whether concrete use cases want to be able to find and remove previously added items, or update their value without having to suffer an O(n) search. Determining whether or not a particular item is already in the heap is probably not that interesting on its own, and if it were, then it likely would be better achieved by maintaining a Set of existing items on the side. The proposed contains implementation is not immediately viable, so there is little reason to keep this PR open just in case we'll need a contains later.
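A rough sketch of the "Set on the side" idea follows. The wrapper type `UniquePriorityQueue` is hypothetical, it assumes elements are Hashable and that duplicates are rejected, and it uses a sorted array as a stand-in for the real heap so that the example is self-contained.

```swift
/// Hypothetical wrapper that pairs priority-ordered storage with a Set,
/// so membership checks cost O(1) on average instead of an O(n) search.
/// Assumption: duplicate elements are rejected on insertion.
struct UniquePriorityQueue<Element: Comparable & Hashable> {
    // Stand-in for a real heap: kept in descending order, minimum last.
    private var storage: [Element] = []
    private var members: Set<Element> = []

    init() {}

    /// Inserts `element`; returns `false` if it was already present.
    @discardableResult
    mutating func insert(_ element: Element) -> Bool {
        guard members.insert(element).inserted else { return false }
        // Linear insertion keeps the stand-in simple; a heap would do this in O(log n).
        let position = storage.firstIndex(where: { $0 < element }) ?? storage.endIndex
        storage.insert(element, at: position)
        return true
    }

    /// Removes and returns the smallest element, keeping the Set in sync.
    mutating func popMin() -> Element? {
        guard let minimum = storage.popLast() else { return nil }
        members.remove(minimum)
        return minimum
    }

    func contains(_ element: Element) -> Bool {
        members.contains(element)
    }
}

var queue = UniquePriorityQueue<Int>()
queue.insert(3)
queue.insert(1)
let alreadyQueued = queue.contains(3)
```

The trade-off is extra memory for the Set and the need to update it on every insertion and removal.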

Let's keep an eye on user feedback once 1.1.0 is out, and evolve Heap's APIs based on that. If feedback indicates that e.g. potential adopters are interested in a priority queue implementation that supports out-of-order retrieval, then we should design an update that directly covers their use case, rather than trying to imagine what it might be.

@lorentey closed this on Oct 14, 2022
@lorentey added the "Heap" (Min-max heap module) label on Nov 16, 2022