Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Issue 10706 setops functions need to require SortedRanges #6795

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

burner
Copy link
Member

@burner burner commented Dec 6, 2018

https://issues.dlang.org/show_bug.cgi?id=10706

  • largestPartialIntersection
  • largestPartialIntersectionWeighted
  • multiwayMerge
  • multiwayUnion
  • setDifference
  • setIntersection
  • setSymmetricDifference

Now all require SortedRanges to be passed, as that is what they expect
implicitly anyway.

A new public symbol isSortedRange was added to std.range.

@dlang-bot
Copy link
Contributor

dlang-bot commented Dec 6, 2018

Thanks for your pull request, @burner!

Bugzilla references

Auto-close Bugzilla Severity Description
10706 enhancement Functions that require a sorted range to take a SortedRange?

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#6795"

@thewilsonator
Copy link
Contributor

LGTM, given the possible fixing breakage that this will incur, this needs a changelog.

@thewilsonator thewilsonator added the Needs Changelog A changelog entry needs to be added to /changelog label Dec 6, 2018
@@ -10923,6 +10923,21 @@ that break its sorted-ness, `SortedRange` will work erratically.
a.sort.sort!"a > b";
}

/**
Determines whether `T` is a $(D SortedRange).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use LREF here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@burner burner force-pushed the issue10706 branch 4 times, most recently from 486d30c to 583c6a3 Compare December 6, 2018 12:50
burner pushed a commit to burner/dub that referenced this pull request Dec 6, 2018
in preperation for

https://issues.dlang.org/show_bug.cgi?id=10706
dlang/phobos#6795

the passed string[] are sorted so assumeSorted is ok.
burner pushed a commit to burner/dub that referenced this pull request Dec 6, 2018
in preperation for

https://issues.dlang.org/show_bug.cgi?id=10706
dlang/phobos#6795

the passed string[] are sorted so assumeSorted is ok.
@burner
Copy link
Member Author

burner commented Dec 6, 2018

changelog added

@thewilsonator thewilsonator removed the Needs Changelog A changelog entry needs to be added to /changelog label Dec 6, 2018
n8sh
n8sh previously requested changes Dec 7, 2018
Copy link
Member

@n8sh n8sh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be broken in one release. Assuming for the sake of argument that this should go ahead, the right way to do it would be to create deprecated wrapper functions that accept non-SortedRange arguments and call assumeSorted on them with a warning that in the future only SortedRanges will be accepted (leaving the wrappers in place permanently would defeat the purpose of this change), and those wrappers should be left in place for at least one version. Stuff shouldn't break without any warning.

@thewilsonator
Copy link
Contributor

Nice idea!

@burner
Copy link
Member Author

burner commented Dec 9, 2018

I will do the suggested change, but SortedRange will check for sortedness in debug. So this is still a breaking change.

But from

Assuming for the sake of argument that this should go ahead

it sounds like an opinion has already been formed regarding this PR. If this would be my first PR it would be my last. It's the tone that makes the music.

@burner burner force-pushed the issue10706 branch 2 times, most recently from cd1fe3a to 0810567 Compare December 10, 2018 13:31
@n8sh
Copy link
Member

n8sh commented Dec 11, 2018

I'm sorry it came across that way. I think the red X makes the tone seem more negative than intended.

@burner
Copy link
Member Author

burner commented Dec 11, 2018

@n8sh I didn't notice the red X until you just mentioned it.

Anyway, what is the way forward with this.

@n8sh
Copy link
Member

n8sh commented Dec 11, 2018

Anyway, what is the way forward with this.

You said you intend to do the suggested change, so when that is done this should be ready to go ahead.

@burner
Copy link
Member Author

burner commented Dec 11, 2018

I already added the overloads.

But I can't mark them as deprecated as phobos will fail to build.

@dlang-bot dlang-bot removed the stalled label Jun 2, 2022
@burner
Copy link
Member Author

burner commented Jun 2, 2022

@RazvanN7 I actually got it done I think.

changelog/pr6795.dd Outdated Show resolved Hide resolved
@burner
Copy link
Member Author

burner commented Feb 17, 2023

@RazvanN7 should we take another look at this?

Copy link
Collaborator

@RazvanN7 RazvanN7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the comments, a few general observations:

  • why don't we just deprecate string predicates? They are ugly, provide no benefit and just look at how many hoops we have to jump to support them. If comparing predicates for equality is the answer, we can compare simple lambda functions with __traits(isSame)
  • I see that you are using both sort and assumeSorted to obtain sorted ranges from already sorted arrays. Why not use only `assumeSorted? It should be more efficient that way. Now that being said:
  • I possible counter argument for this addition is the fact that there might be situations where you know the range is correctly sorted (even though it is not a SortedRange), but you don't want to pay the penalty of creating a SortedRange. I admit that this should be a very rare case, but with this addition we are closing the door to such scenarios, whereas, with the status quo it's on the user to make sure about the correctness of the provided data. Also, consider, that users might want to use their own version of a sorted range - this becomes impossible without wrapping their type in a SortedRange.

// sets
auto a = sort([ 1, 2, 4, 5, 7, 9 ]);
auto b = assumeUnique([ 0, 1, 2, 4, 7, 8 ]);
auto intersection = setIntersection(assumeSorted(a), assumeSorted(b));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add a commented example that issues an error.

sort([ 1.0, 7.0 ]),
sort([ 1.0, 7.0, 8.0]),
sort([ 4.0 ]),
sort([ 7.0 ]),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not assumeSorted ?

@@ -10785,7 +10785,7 @@ enum SortedRangeOptions
*/
struct SortedRange(Range, alias pred = "a < b",
SortedRangeOptions opt = SortedRangeOptions.assumeSorted)
if (isInputRange!Range && !isInstanceOf!(SortedRange, Range))
if (isInputRange!Range && !isInstanceOf!(isInputRange, Range))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a contradiction? How can Range be both an input range but not be an instance of isInputRange?.

Also, what is it you are trying to achieve? The former condition did not accept SortedRanges as the basis for constructing other SortedRanges (indeed, an arbitrary limitation, but on the other hand, when would that be useful?), but I am having a hard time trying to understand what the new condition's purpose is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I don't know how that happend. Reverted now

`true` if `T` is a $(LREF SortedRange);
`false` otherwise
*/
template isSortedRange(T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@burner : @andralex 's review remains unaddressed.

std/algorithm/setops.d Outdated Show resolved Hide resolved
std/algorithm/setops.d Outdated Show resolved Hide resolved
std/algorithm/setops.d Outdated Show resolved Hide resolved
@burner burner force-pushed the issue10706 branch 6 times, most recently from 25e323a to fc93a95 Compare March 11, 2023 21:58
@tgehr
Copy link
Contributor

tgehr commented Mar 11, 2023

FWIW: I would prefer if this did not go ahead. We are not really doing this kind of thing with any other invariants and SortedRange is not even able to properly enforce its own invariant due to mutable aliasing. In fact, I would also prefer to have binary search functions in std.algorithm that do not require a sorted range.

@burner
Copy link
Member Author

burner commented Mar 12, 2023

FWIW: I would prefer if this did not go ahead. We are not really doing this kind of thing with any other invariants and SortedRange is not even able to properly enforce its own invariant due to mutable aliasing. In fact, I would also prefer to have binary search functions in std.algorithm that do not require a sorted range.

I think that is a terrible argument. This PR makes the usage of the public facing api of phobos less error prone. To not do this because other parts of phobos not do this, yet, or because the already existing and used abstraction SortedRange has some problem, that hardly any user or dev cares about is the wrong approach in my opinion.

@andralex
Copy link
Member

IIRC all a user needs to do to transform a range known to be sorted into a SortedRange is insert an assumeSorted, which performs a cheap (but surprisingly valuable statistically) test of the range's sortedness.

As long as does not force the library into a memory safety issue, being mindful of breaking sortedness by means of mutable aliasing is user's responsibility. Sortedness of a SortedRange is in many ways opt-in, not a guarantee.

@andralex
Copy link
Member

Looking through the existing code, I found an issue with SortedRange. The entire type is parameterized by SortedRangeOptions, but that parameter is only used by the constructor.

  1. Parameterizing the entire type by a constructor parameter is wasteful
  2. Even in the constructor, that parameter can be a simple run-time parameter that costs virtually nothing to test.
  3. In my opinion the log(n) checking (checkRoughly) should be the default or the only option. Doing no checking at all really legitimizes a cast and we already have strict checking with isSorted.

@tgehr
Copy link
Contributor

tgehr commented Mar 12, 2023

IIRC all a user needs to do to transform a range known to be sorted into a SortedRange is insert an assumeSorted, which performs a cheap (but surprisingly valuable statistically) test of the range's sortedness.

iota(n).assumeSorted. 🤡
Also, by default this actually does not check anything.

@tgehr
Copy link
Contributor

tgehr commented Mar 12, 2023

FWIW: I would prefer if this did not go ahead. We are not really doing this kind of thing with any other invariants and SortedRange is not even able to properly enforce its own invariant due to mutable aliasing. In fact, I would also prefer to have binary search functions in std.algorithm that do not require a sorted range.

I think that is a terrible argument. This PR makes the usage of the public facing api of phobos less error prone. To not do this because other parts of phobos not do this, yet,

The issue is that this way to enforce invariants results in less than great UX.

or because the already existing and used abstraction SortedRange has some problem,

It provides a false sense of security. (Or some amount of cognitive dissonance, like in my case.)

that hardly any user or dev cares about

Hardly any user or dev cares for the SortedRange design.

is the wrong approach in my opinion.

What this change does is cause people to roll their own versions of those functions. It's already the case with binary search.

@pbackus
Copy link
Contributor

pbackus commented Mar 12, 2023

In my opinion the log(n) checking (checkRoughly) should be the default or the only option. Doing no checking at all really legitimizes a cast and we already have strict checking with isSorted.

I would be very surprised to learn that a function with "assume" in its name does any checking at all by default. We don't do any checking for assumeUnique or assumeSafeAppend, for example, and even assumeUTF only enables its checks in debug mode.

@jamesragray
Copy link

For what it is worth. I have not found the assume sorted mechanism ergonomic or useful, ever. Hence since (a) finding out that using it does some calculations wasting time; (b) being forced to use it even more, our team will no longer use the Phobos versions of these functions if this goes through. Perhaps one can add a way of opting out? e.g. add an overload or a flag that doesn't require this.

@andralex
Copy link
Member

In my opinion the log(n) checking (checkRoughly) should be the default or the only option. Doing no checking at all really legitimizes a cast and we already have strict checking with isSorted.

I would be very surprised to learn that a function with "assume" in its name does any checking at all by default. We don't do any checking for assumeUnique or assumeSafeAppend, for example, and even assumeUTF only enables its checks in debug mode.

A fair point. There's no meaningful checking we can do in the first two anyway - if we could, probably the story would be different. assumeUTF is a more relevant precedent, so perhaps assumeSorted could limit its checking to debug mode.

@andralex
Copy link
Member

For what it is worth. I have not found the assume sorted mechanism ergonomic or useful, ever. Hence since (a) finding out that using it does some calculations wasting time; (b) being forced to use it even more, our team will no longer use the Phobos versions of these functions if this goes through. Perhaps one can add a way of opting out? e.g. add an overload or a flag that doesn't require this.

First it needs to be understood we're looking at like 20 comparisons for one million elements. The utility of those is high. I can't find the paper I looked at back in the day, but perhaps this (and/or its references) may be a good starting point: https://cs-people.bu.edu/sofya/pubs/TestingSortednessEncyclopedia.pdf

I don't see the utility of a rewrite. Would it consist of one-liners that append .assumeSorted at the end of the ranges involved?

In your application, would it be feasible and useful to keep ranges that are known to be sorted as SortedRanges? That would have the same properties as the original range plus the information that it is sorted.

@burner
Copy link
Member Author

burner commented Mar 12, 2023

I think I'm missing something but assumeSorted doesn't check as far as I can see. The SortedRange ctor only checks on checkRoughly and checkStrictly which is not the case. The default is "assumeSorted" !?

@andralex
Copy link
Member

I think I'm missing something but assumeSorted doesn't check as far as I see. The SortedRange ctor only checks on checkRoughly and checkStrictly which is not the case. The default is "assumeSorted" !?

Indeed, and per my previous answer that seems to be a problem (one of several of SortedRange), checkStrictly is useless because it overlaps isSorted; I consider no checking specious, and checkRoughly is the sweet spot.

@jamesragray
Copy link

jamesragray commented Mar 13, 2023

For what it is worth. I have not found the assume sorted mechanism ergonomic or useful, ever. Hence since (a) finding out that using it does some calculations wasting time; (b) being forced to use it even more, our team will no longer use the Phobos versions of these functions if this goes through. Perhaps one can add a way of opting out? e.g. add an overload or a flag that doesn't require this.

First it needs to be understood we're looking at like 20 comparisons for one million elements. The utility of those is high. I can't find the paper I looked at back in the day, but perhaps this (and/or its references) may be a good starting point: https://cs-people.bu.edu/sofya/pubs/TestingSortednessEncyclopedia.pdf

Fair point, but still not needed (in my opinion) in most cases. I think that it is probably true that most common case to want to do say a binary search is on an array. You are forcing the user to wrap their type for a tiny bit of safety.

I don't see the utility of a rewrite. Would it consist of one-liners that append .assumeSorted at the end of the ranges involved?

I haven't studied the code carefully but I would expect one could simply copy the phobos code removing the template constraints, so that SortedRange is not needed.

In your application, would it be feasible and useful to keep ranges that are known to be sorted as SortedRanges?

I guess this might be reasonable in some cases, but I do not think Phobos should be forcing users do this.

https://issues.dlang.org/show_bug.cgi?id=10706

* largestPartialIntersection
* largestPartialIntersectionWeighted
* multiwayMerge
* multiwayUnion
* setDifference
* setIntersection
* setSymmetricDifference

Now all require SortedRanges to be passed, as that is what they expect
implicitly anyway.

A new public symbol `isSortedRange` was added to std.range.
Comment about usage with non SortedRange ranges
Add `deprecated` for PR 6795 except on multiwayMerge & largestPartialIntersectionWeighted
pragma msg because deprecated will not compile

forgot my testing thing

deprecate non sorted multiwayMerge

deprecate multiwayUnion use with non sorted ranges

deprecate usage of setDifference without sorted range

transition the last things to use SortedRanges

fix changelog

another fixup

redo a thing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet