Simultaneous Minimum and Maximum Evaluation #90

CTMacUser · 2021-03-07T04:10:11Z

Description

Adds a method to Sequence to evaluate min(by:) and max(by:) in a single pass. Adds an overload when the Element type is Comparable to default the predicate to <.

Should fix Swift bug SR-890.

Detailed Design

Include any additional information about the design here. At minimum, show any new API:

extension Sequence {
    /// Returns both the minimum and maximum elements in the sequence, using
    /// the given predicate as the comparison between elements.
    public func extrema(by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> (min: Element, max: Element)?
}

extension Sequence where Element: Comparable {
    /// Returns both the minimum and maximum elements in the sequence.
    @inlinable public func extrema() -> (min: Element, max: Element)?
}

Documentation Plan

There is documentation on the new methods. Plus the "Minmax" documentation file has been extended with descriptions of these methods.

Test Plan

The "Minmax" test file has new tests for the new methods.

Source Impact

The change should be additive.

Checklist

I've added at least one test that validates that my change is working, if appropriate
I've followed the code style of the rest of the project
I've read the Contribution Guidelines
I've updated the documentation if necessary

Add a method extending sequences that finds both the minimal and maximal elements of the receiver in a single pass, based off a given ordering predicate. Add an overload that defaults the comparison predicate to the standard less-than operator.

CTMacUser · 2021-03-07T04:12:42Z

I just splatted my documentation next to the existing docs in that file. There should be more work to merge them together.

I'm not familiar with Python. But if their containers have an equivalent method for single-pass min and max, please suggest something to add to the end of the documentation.

natecook1000

Thanks for this addition, @CTMacUser! 📉📈

I’m still mulling over the method name. It looks like there’s some good input on the forums — thanks for starting that thread! I have a couple more notes below.

natecook1000 · 2021-03-08T22:56:10Z

Sources/Algorithms/MinMax.swift

+      if try areInIncreasingOrder(element, lowest) {
+        lowest = element
+      } else if try areInIncreasingOrder(highest, element) {
+        highest = element
+      }


There's a neat optimization here where you can take two elements from the sequence instead of just one, and then compare them against each other. The higher one can only ever be a candidate for highest, while the lower can only be a candidate for lowest. You end up saving a quarter of the comparisons — want to give that a go?

I'm not familiar with this optimization. Is it documented anywhere?

It's described in more detail in the original bug — please let me know if you need more info!

natecook1000 · 2021-03-09T17:10:15Z

Sources/Algorithms/MinMax.swift

+  ///   argument; otherwise, `false`.
+  /// - Returns: A tuple with the sequence's minimum element, followed by its
+  ///   maximum element. For either member, if the sequence provides multiple
+  ///   qualifying elements, the one chosen is unspecified. The same element may


I’d like to take a more principled stand than this. Right now we have two different behaviors for which maximum element is chosen. First, sort(), max(count:), and Swift.max (not the sequence version) all either return the last equal value or, in the sort/min(count:) case, put it at the end.

/// All instances of `Foo` are equal to each other. class Foo: Comparable { static func ==(lhs: Foo, rhs: Foo) -> Bool { true } static func < (lhs: Foo, rhs: Foo) -> Bool { false } } let a = Foo() let b = Foo() max(a, b) === b // true [a, b].sorted().last === b // true [a, b].max(count: 1).first === b // true

Second, the sequence version of max() returns the first instance of the maximum value.

[a, b].max() === b // false

On the min side, there’s only one behavior — all the equivalent methods return the first minimum element.

Even though we’ll end up where this method and the stdlib’s Sequence.max() return different instances, let’s align this new method with the majoritarian position, and I’ll keep looking at how we can switch Sequence.max() to come into alignment.

Enshrining which element is chosen to break ties seems like an over-specification; someone's code may be buggy if they're depending on a specific equivalent element.

We already specify this in e.g. Swift.max, and I'd like to make the sort() stability a guarantee, which is similar. The dependence is exactly why I'd like to specify the behavior — rather than someone depending on the result of an implementation quirk, they can depend on documented behavior, and we can test to make sure we meet the expectation.

Change the name of the functions finding the minimum and maximum values simultaneously to something more colloquial.

Document the fact that the functions finding the minimum and maximum values simultaneously assume the source sequence has a finite number of elements.

Change the core loop of the function that finds a sequence's minimum and maximum values simultaneously such that it scans two elements at a time. Adjust the corresponding tests to cover all the new decision branches.

CTMacUser · 2021-03-30T04:02:58Z

I conditionally added @natecook1000 's suggestion to test two elements at a time within the loop. Can someone actually measure this with both short and (very?) long sequences? It looks like the sort of slick programming you'd think improves runtimes, but actually doesn't (or does so rarely it's not worth it).

natecook1000 · 2021-03-31T15:49:37Z

@CTMacUser I did some benchmarking and got I think what I'd expect — when the comparison is fast, there isn't a big difference between the two implementations, but when the comparison is non-trivial, the optimized version wins, as it's only performing 3/4 the number of comparisons.

Here's a comparison of running min() and max() separately, along with the two minAndMax() implementations, on an array of Doubles. Comparing doubles is fast, so there's not much difference between the two single-pass versions. The primary savings here is in skipping the second pass:

When we compare strings with matching prefixes, on the other hand, the optimized version starts to win, since it's performing fewer expensive comparisons:

(edited to show a more realistic Array<String> example)

natecook1000 · 2021-04-06T19:01:34Z

@CTMacUser This looks ready to go! Want to remove the conditional bits of the implementation and we can merge?

xwu · 2021-04-07T21:49:17Z

Guides/MinMax.md

 ```

 ### Complexity

-The algorithm used is based on [Soroush Khanlou's research on this matter](https://khanlou.com/2018/12/analyzing-complexity/). The total complexity is `O(k log k + nk)`, which will result in a runtime close to `O(n)` if *k* is a small amount. If *k* is a large amount (more than 10% of the collection), we fall back to sorting the entire array. Realistically, this means the worst case is actually `O(n log n)`.
+The algorithm used for an extrema block is based on [Soroush Khanlou's research on this matter](https://khanlou.com/2018/12/analyzing-complexity/). The total complexity is `O(k log k + nk)`, which will result in a runtime close to `O(n)` if *k* is a small amount. If *k* is a large amount (more than 10% of the collection), we fall back to sorting the entire array. Realistically, this means the worst case is actually `O(n log n)`.


What is an "extrema block"? Can you use other terminology that would be more generally familiar to users?

xwu · 2021-04-07T21:54:19Z

Sources/Algorithms/MinMax.swift

+  ///     let hues = ["Heliotrope": 296, "Coral": 16, "Aquamarine": 156]
+  ///     let minmax = hues.minAndMax { a, b in a.value < b.value }
+  ///     let leastHue = minmax?.min, greatestHue = minmax?.max
+  ///     print(leastHue, greatestHue)


Nit: can we simplify this code example by force unwrapping? The user doesn't need to read Optional(...) printed twice to understand the concept here, so best not to clutter. Something like (typed freehand--please check it for accuracy):

print(minmax!) // Prints '(min: (key: "Coral", value: 16), max: (key: "Heliotrope", value: 296))'

xwu · 2021-04-07T21:55:10Z

Sources/Algorithms/MinMax.swift

+        }
+      }
+    }
+    #else


Remove the conditional code choosing between the original and optimized core loop, retaining the optimized version. Update some documentation.

In the documentation blocks for the functions that find their receiving sequence's MIN and MAX in one pass, make the examples more readable.

NevinBR · 2021-04-10T15:04:51Z

Sources/Algorithms/MinMax.swift

+      }
+    }
+    return (lowest, highest)
+  }


The while loop can be substantially simplified, since iterators are guaranteed to keep returning nil after they’ve done so once:

while var low = iterator.next() { var high = iterator.next() ?? low if try areInIncreasingOrder(high, low) { swap(&low, &high) } if try areInIncreasingOrder(low, lowest) { lowest = low } if try !areInIncreasingOrder(high, highest) { highest = high } }

This does technically perform one extraneous comparison on odd-length sequences (high vs low after the ?? was triggered), but even that can be rectified by restoring the guard if benchmarks show a measurable difference:

while var low = iterator.next() { guard var high = iterator.next() else { if try areInIncreasingOrder(low, lowest) { lowest = low } if try !areInIncreasingOrder(low, highest) { highest = low } break } if try areInIncreasingOrder(high, low) { swap(&low, &high) } if try areInIncreasingOrder(low, lowest) { lowest = low } if try !areInIncreasingOrder(high, highest) { highest = high } }

NevinBR · 2021-04-10T15:31:07Z

Has a 4-at-a-time version of minAndMax been benchmarked?

It doesn’t change the total number of comparisons versus the pairwise version, but per Steve Canon’s remarks on the forums it may still run faster by halving the loop-carried dependencies.

And it’s pretty easy to write:

while let a = iterator.next() {
  let b = iterator.next() ?? a
  let c = iterator.next() ?? b
  let d = iterator.next() ?? c
  let (low, high) = minAndMaxOf4(a, b, c, d)
  if try areInIncreasingOrder(low, lowest) { lowest = low }
  if try !areInIncreasingOrder(high, highest) { highest = high }
}

Using the helper function:

func minAndMaxOf4<T>(
  _ a: T, _ b: T, _ c: T, _ d: T,
  by areInIncreasingOrder: (T, T) throws -> Bool
) rethrows -> (min: T, max: T) {
  var (a, b, c, d) = (a, b, c, d)
  try sort(&a, &b, by: areInIncreasingOrder)
  try sort(&c, &d, by: areInIncreasingOrder)
  try sort(&a, &c, by: areInIncreasingOrder)
  try sort(&b, &d, by: areInIncreasingOrder)
  return (a, d)
}

func sort<T>(
  _ a: inout T, _ b: inout T,
  by areInIncreasingOrder: (T, T) throws -> Bool
) rethrows {
  if try areInIncreasingOrder(b, a) { swap(&a, &b) }
}

For the function that finds a sequence's minimum and maximum values simultaneously, tighten the code actually used for the loop. Improve the comments.

Change the core loop of the function that finds a sequence's minimum and maximum values simultaneously such that it scans four elements at a time.

CTMacUser · 2021-04-15T02:04:47Z

In my quick runs of the already-provided tests, using the four-at-a-time loop results in more calls to the comparison predicate than the improved two-at-a-time loop did (78 vs. 68).

NevinBR · 2021-04-15T04:09:17Z

In my quick runs of the already-provided tests, using the four-at-a-time loop results in more calls to the comparison predicate than the improved two-at-a-time loop did (78 vs. 68).

(a) There should be at most 4 extra comparisons the way I wrote it, and they can be avoided by replacing the ??s with if lets.

(b) The important thing to compare is the actual time the algorithm takes to run.

(c) The potential time-save comes from reducing the loop-carried dependencies, not the number of comparisons.

LemonSpike · 2021-04-16T19:24:33Z

Sources/Algorithms/MinMax.swift

+  /// with the highest value.
+  ///
+  ///     let hues = ["Heliotrope": 296, "Coral": 16, "Aquamarine": 156]
+  ///     if let extremeHues = hues.minAndMax(by: {$0.value < $1.value}) {


Nit: Spaces between the curly braces.

LemonSpike · 2021-04-16T19:30:08Z

Sources/Algorithms/MinMax.swift

+  /// - Complexity: O(*n*), where *n* is the length of the sequence.
+  @inlinable
+  public func minAndMax() -> (min: Element, max: Element)? {
+    return minAndMax(by: <)


Nit: We don't need a return here.

I like the return.

As a matter of style, I think return should only be omitted when the entire{...} block is on a single line.

I wish we had a style guide for these things. But you're right to be consistent with the whole codebase 😊 (I think). Looks like SE-0250 is not coming back soon, but I could raise it if it does.

natecook1000 · 2021-05-04T18:01:43Z

@CTMacUser My benchmarking shows very small differences between find the min and max two-at-a-time vs four-at-a-time; which one wins depends on the cost of the comparison. In any case, we don't need to let that decision block merging this addition — could you choose one implementation and we'll get this merged? Thanks!

NevinBR · 2021-05-04T21:59:05Z

@CTMacUser My benchmarking shows very small differences between find the min and max two-at-a-time vs four-at-a-time; which one wins depends on the cost of the comparison. In any case, we don't need to let that decision block merging this addition — could you choose one implementation and we'll get this merged? Thanks!

Thanks for testing, Nate. If the performance is similar, then pairwise is clearly preferable for implementation simplicity.

natecook1000 · 2021-05-12T20:48:12Z

@swift-ci Please test

natecook1000 · 2021-05-12T21:11:10Z

@CTMacUser I can single out the implementation post-merge — thanks again for your contribution! 👏🏻👏🏻

CTMacUser · 2021-07-04T05:44:28Z

Sorry for flaking out, but it seems everything got handled anyway. I've spent the time procuring an ARM-based Mac and transitioning my programming to it.

natecook1000 · 2021-07-06T15:13:31Z

No worries, and congrats! 🎉

natecook1000 reviewed Mar 9, 2021

View reviewed changes

CTMacUser added 3 commits March 12, 2021 22:55

Rename function to find MIN and MAX in one pass

3fd03ea

Change the name of the functions finding the minimum and maximum values simultaneously to something more colloquial.

Make an implicit precondition explicit

b1453e4

Document the fact that the functions finding the minimum and maximum values simultaneously assume the source sequence has a finite number of elements.

Optionally add a possible optimization

6fd8cc7

Change the core loop of the function that finds a sequence's minimum and maximum values simultaneously such that it scans two elements at a time. Adjust the corresponding tests to cover all the new decision branches.

xwu reviewed Apr 7, 2021

View reviewed changes

Sources/Algorithms/MinMax.swift

}

}

}

#else

Copy link

Contributor

xwu Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

LemonSpike reacted with laugh emoji

CTMacUser added 3 commits April 8, 2021 19:13

Approve an optimization

49ccd48

Remove the conditional code choosing between the original and optimized core loop, retaining the optimized version. Update some documentation.

Improve terminology in documentation

cbf35df

Improve function example code

0c30dcf

In the documentation blocks for the functions that find their receiving sequence's MIN and MAX in one pass, make the examples more readable.

NevinBR reviewed Apr 10, 2021

View reviewed changes

CTMacUser added 2 commits April 13, 2021 03:56

Simplify a core loop

4ee09be

For the function that finds a sequence's minimum and maximum values simultaneously, tighten the code actually used for the loop. Improve the comments.

Optionally add a second optimization

66fd25e

Change the core loop of the function that finds a sequence's minimum and maximum values simultaneously such that it scans four elements at a time.

LemonSpike reviewed Apr 16, 2021

View reviewed changes

Merge branch 'main' into minmax

57cbb28

natecook1000 merged commit 4038b5d into apple:main May 12, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simultaneous Minimum and Maximum Evaluation #90

Simultaneous Minimum and Maximum Evaluation #90

CTMacUser commented Mar 7, 2021

CTMacUser commented Mar 7, 2021

natecook1000 left a comment

natecook1000 Mar 8, 2021

CTMacUser Mar 17, 2021

natecook1000 Mar 17, 2021

natecook1000 Mar 9, 2021 •

edited

Loading

CTMacUser Mar 17, 2021

natecook1000 Mar 17, 2021

CTMacUser commented Mar 30, 2021

natecook1000 commented Mar 31, 2021 •

edited

Loading

natecook1000 commented Apr 6, 2021

xwu Apr 7, 2021

xwu Apr 7, 2021 •

edited

Loading

xwu Apr 7, 2021

NevinBR Apr 10, 2021

NevinBR commented Apr 10, 2021

CTMacUser commented Apr 15, 2021

NevinBR commented Apr 15, 2021

LemonSpike Apr 16, 2021

LemonSpike Apr 16, 2021

NevinBR Apr 16, 2021

LemonSpike Apr 16, 2021 •

edited

Loading

natecook1000 commented May 4, 2021

NevinBR commented May 4, 2021

natecook1000 commented May 12, 2021

natecook1000 commented May 12, 2021

CTMacUser commented Jul 4, 2021

natecook1000 commented Jul 6, 2021

Simultaneous Minimum and Maximum Evaluation #90

Simultaneous Minimum and Maximum Evaluation #90

Conversation

CTMacUser commented Mar 7, 2021

Description

Detailed Design

Documentation Plan

Test Plan

Source Impact

Checklist

CTMacUser commented Mar 7, 2021

natecook1000 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

natecook1000 Mar 9, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CTMacUser commented Mar 30, 2021

natecook1000 commented Mar 31, 2021 • edited Loading

natecook1000 commented Apr 6, 2021

Choose a reason for hiding this comment

xwu Apr 7, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NevinBR commented Apr 10, 2021

CTMacUser commented Apr 15, 2021

NevinBR commented Apr 15, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LemonSpike Apr 16, 2021 • edited Loading

Choose a reason for hiding this comment

natecook1000 commented May 4, 2021

NevinBR commented May 4, 2021

natecook1000 commented May 12, 2021

natecook1000 commented May 12, 2021

CTMacUser commented Jul 4, 2021

natecook1000 commented Jul 6, 2021

natecook1000 Mar 9, 2021 •

edited

Loading

natecook1000 commented Mar 31, 2021 •

edited

Loading

xwu Apr 7, 2021 •

edited

Loading

LemonSpike Apr 16, 2021 •

edited

Loading