Add Unary Overload of startsWith() and endsWith() #3752

nordlow · 2015-10-19T19:18:24Z

Adds overload that enables calling startsWith and endsWith with only a haystack and predicate pred.

See discussion at: http://forum.dlang.org/post/zobvqdmxnjnilqibtcph@forum.dlang.org

Should RandomAccess have special treatment via .length and opIndex or should we just use .empty and .front for every case?

If not I guess we could just do

{
    return !haystack.empty && haystack.front.unaryFun!pred;
}

right?

MetaLang · 2015-10-19T20:14:01Z

Is there any difference between this and range.countUntil!pred == 0? If not, is it worth adding this overload of startsWith? If we do decide that it is worth it, there should be a corresponding overload added for endsWith.
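To make the comparison concrete, here is a small sketch (the predicate isEven and the test data are made up for illustration). Both expressions give the same answer; the practical difference is that countUntil keeps walking the range when the first element does not match, while the front check stops immediately.

```d
import std.algorithm.searching : countUntil;
import std.range.primitives : empty, front;

// Illustrative predicate, not part of the PR.
bool isEven(int a) { return a % 2 == 0; }

void main()
{
    int[][] cases = [[2, 3, 4], [1, 2, 3], [1, 3, 5], []];
    foreach (r; cases)
    {
        // countUntil returns the index of the first match, or -1 if there is none.
        const viaCount = r.countUntil!isEven == 0;
        // The check proposed in this PR only ever looks at the first element.
        const viaFront = !r.empty && isEven(r.front);
        assert(viaCount == viaFront);
    }
}
```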

MetaLang · 2015-10-19T20:23:18Z

Not sure about front vs. opIndex, but it may matter for length vs. empty. I forget whether there's ever a case where empty does non-trivial work, but if such a case exists, then maybe we should use length if possible.

nordlow · 2015-10-19T21:09:57Z

The motivation for this is purely usability or perhaps discoverability for new users and ease of remembering a solution when the other overloads have been learnt.

jmdavis · 2015-10-20T06:08:12Z

Honestly, I seriously question that this is worth adding. All it's doing is creating a function that does

auto result = !haystack.empty && pred(haystack.front);

That's trivial to write yourself, and this overload goes against the idea that startsWith tests whether a range starts with something. It's just testing whether the first element matches a predicate, not whether it starts with something. It's similar, but it's different enough that I think that having startsWith do this is questionable, and this is so trivial to write on your own in one line that it makes this whole thing seem like an insane amount of boilerplate - though part of that is the fact that it includes separate stuff for random access ranges for some reason (which makes no sense at all - I don't understand the logic behind it; a random access range is an input range and supports all of the input range primitives). So, if this is merged in, its implementation should be changed to

return !haystack.empty && unaryFun!pred(haystack.front);

but if it's that simple, I think that it's pretty pointless to add it to Phobos.
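For readers following along, the one-liner under discussion can be sketched as a standalone pair of templates. The names frontSatisfies and backSatisfies are hypothetical; this is not the code proposed in the PR, just the same expression wrapped so it can be compiled on its own.

```d
import std.functional : unaryFun;
import std.range.primitives : isInputRange, isBidirectionalRange,
    empty, front, back;

// Hypothetical helper: true if the range is non-empty and its first
// element satisfies pred.
bool frontSatisfies(alias pred, R)(R haystack)
if (isInputRange!R)
{
    return !haystack.empty && unaryFun!pred(haystack.front);
}

// Hypothetical helper: the same check against the last element.
bool backSatisfies(alias pred, R)(R haystack)
if (isBidirectionalRange!R)
{
    return !haystack.empty && unaryFun!pred(haystack.back);
}

void main()
{
    int[] none;
    assert([1, 2, 3].frontSatisfies!(a => a == 1));
    assert(!none.frontSatisfies!(a => a == 1)); // empty range: always false
    assert([1, 2, 3].backSatisfies!"a > 2");    // string predicates work via unaryFun
    assert("abc".frontSatisfies!(c => c == 'a'));
}
```

No separate path for random-access ranges is needed here, since empty, front, and back are available on them anyway.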

nordlow · 2015-10-20T08:40:47Z

Simplified implementations.

quickfur · 2015-10-21T17:25:26Z

Implementation seems OK. Not sure if something this simple is worth adding to Phobos, though. I'll let the other Phobos devs decide.

MetaLang · 2015-10-21T17:27:36Z

I don't think it's worth it, but I'm alright if we decide to add it for completeness' sake.

andralex · 2015-10-21T20:31:16Z

These are worth adding; the counterargument doesn't quite stand - by it, r.startsWith(x) ain't worth adding because it's simply !r.empty && r.front == x. Yet startsWith has been very useful. I think the predicate-only version is useful as well albeit more niche.

However, please consolidate the docs with /// ditto such that there's only one documentation entry for startsWith that comprehends all overloads.

I preapprove and archive this. If you need further resolution please email me. Thanks!

nordlow · 2015-10-21T20:51:38Z

@andralex do you mean that I should extend the main documentation for startsWith and endsWith to include explaining the case where no needles but only one predicate is present?

quickfur · 2015-10-21T20:54:20Z

@nordlow That would make the most sense, I think. Add /// ditto then update the current startsWith docs to explain what happens when no needles are passed.

MetaLang · 2015-10-21T20:55:42Z

std/algorithm/searching.d

+*/
+bool endsWith(alias pred = "a", R)(R doesThisEnd)
+    if (isInputRange!R &&
+        is(typeof(unaryFun!pred(doesThisEnd.front)) : bool))


This constraint is not enough, as I found out the hard way recently. This code won't compile although it should, because among returns a uint which for some reason does not implicitly cast to bool.

//Error: cannot deduce type...
"asdf".startsWith!(c => c.among!('a', 'b', 'c'));

It'd probably be better to change it to:
if (isInputRange!R && __traits(compiles, cast(bool)unaryFun!pred(doesThisStart.front)))

Or something similar.

Is this problem different from the binary case? If so, how? If not, isn't it better to tackle this problem in another PR addressing all the overloads of startsWith and endsWith?

What is the "binary case"?

Sorry, I meant the case when pred is a binaryFun. If we should fix the constraints for these two new overloads, we might as well fix the constraints for all the other overloads too, right? Your proposed new constraint works for me, anyhow.

It's probably a good idea, though I don't know if changing the constraints for the other overloads would break any code. Regardless, I think your implementation of isIfTestable from the other PR would fit here.

Ok, I'll look into it.

I don't think this is the right PR for that. It's a wider Phobos issue that I think should be tackled separately.

Don't overcomplicate things. Here you don't need bool, just ifTestable. I recall we have it somewhere from a recent pull request.
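The constraint problem in this thread can be reproduced in isolation. The sketch below only checks the type-level facts involved; it assumes std.traits.ifTestable is the trait being referred to, and it does not touch the actual startsWith constraints.

```d
import std.algorithm.comparison : among;
import std.traits : ifTestable;

void main()
{
    // among returns uint: 0 for no match, otherwise the 1-based index
    // of the matching value.
    auto hit = 'b'.among!('a', 'b', 'c');
    static assert(is(typeof(hit) == uint));

    // uint does not implicitly convert to bool, so a constraint written as
    // is(typeof(pred(x)) : bool) rejects a predicate that returns it.
    static assert(!is(uint : bool));

    // An explicit cast compiles, and so does use as an if condition.
    static assert(__traits(compiles, cast(bool) hit));
    static assert(ifTestable!uint);

    assert(hit == 2);
}
```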

jmdavis · 2015-10-22T00:01:17Z

Well, where startsWith really shines is when it takes multiple arguments, but that is a good point.

Actually, with regards to the multiple arguments, I think that startsWith is some of the coolest metaprogramming in Phobos. It just about blew my mind when I first wrapped my head around how it works.

nordlow · 2015-10-22T06:40:17Z

Should the return type of the unary overload be bool or uint? The other overloads return uint.

If bool it feels a bit awkward to document the return value as either uint or bool, since you want me to have a single documentation for all overloads.

jmdavis · 2015-10-23T07:23:07Z

Should the return type of the unary overload be bool or uint? The other overloads return uint.

It should return bool. The overload that takes one argument already returns bool. It's just the one that takes multiple which returns uint. And they already share their documentation.

JackStouffer · 2015-10-30T18:21:56Z

I think the decision block tag can be removed; people may not review this because of that tag.

nordlow · 2015-11-03T10:41:25Z

I guess if this comes in we should add a corresponding overload for skipOver, right? Is it ok if I add that change to this PR as well?

JakobOvrum · 2015-11-03T11:49:00Z

@jmdavis, it wouldn't solve the problem in general. The issue is that many of these algorithms check if the return type of the predicate function is implicitly convertible to bool, when they should test if the return type is testable in an if-statement.

nordlow · 2015-11-07T09:38:16Z

FYI: ifTestable was just added to std.traits. I guess we could reuse this here as well.

jmdavis · 2015-11-08T07:55:14Z

FYI: ifTestable was just added to std.traits. I guess we could reuse this here as well.

Except that the results aren't always used in if statements. It might make more sense to not require that the result be implicitly convertible to bool, but I think that we should be very sure that we understand the effects of that decision before we go down that route, and normally a predicate should result in bool.

jmdavis · 2015-11-08T07:58:59Z

The documentation for these new overloads still needs to be merged with the existing overloads as Andrei requested. There should be one non-ditto ddoc comment for startsWith and one for endsWith. That's what we've had until now, and it should be quite possible to update that documentation with regards to the new overloads rather than documenting them separately.

jmdavis · 2015-11-08T07:59:47Z

I guess if this comes in we should add a corresponding overload for skipOver, right? Is it ok if I add that change to this PR as well?

It's an unrelated function. Please create a separate PR for it.

nordlow · 2016-01-06T13:49:34Z

These overloads should be nothrow.

How do we make this happen considering that .front() throws?

See also: http://forum.dlang.org/post/gincfsrbdypejnrdjjlx@forum.dlang.org

dnadlinger · 2016-01-06T14:05:53Z

@nordlow: There is no way to express "this function does not throw if empty() is false" in the type system, so this can't be done in general.

nordlow · 2016-01-06T18:37:27Z

What about adding a wrapper function to Phobos, say maybeFront(), and using it here? Don't you agree that it would be great to prevent these overloads from always potentially throwing?

It seems http://dlang.org/phobos/std_exception.html#.assumeWontThrow could be used, right?

jmdavis · 2016-01-10T01:56:52Z

As long as startsWith/endsWith doesn't do anything on its own that can result in throwing, whether they can be nothrow or not depends entirely on the ranges that are being iterated over. If their front, empty, etc. are nothrow, then startsWith/endsWith should be inferred to be nothrow by the compiler. If they're not, then they won't be. There should be no need to mark any templated function as nothrow, and it's usually a bad idea, because it unnecessarily restricts what can be used with them. Attributes rarely belong on templated functions.

And as for ranges of characters, they can pretty much never involve nothrow functions at the moment, because decoding can throw a UTFException. There has been talk of making that an Error (e.g. UTFError) instead, but that hasn't been done thus far, and until it is, almost nothing involving strings can be nothrow.

andralex · 2016-01-10T23:32:52Z

@nordlow asked me by email. There are any number of conditions under which we can easily prove that code won't throw, but it's outside the type system's capability to carry the proof. I'd say just use assumeWontThrow here.

jmdavis · 2016-01-11T00:01:19Z

@andralex Then I must be missing something here, because the only way that these functions should be nothrow is if the range's empty and front are nothrow, and the given predicate is nothrow, in which case the compiler should infer startsWith/endsWith to be nothrow. And if any of those are not nothrow, then startsWith/endsWith can't be nothrow.

And in all of the examples in the unit tests, they can't be nothrow, because they're using string, and front throws for arrays of char or wchar, because decode throws a UTFException for them rather than using the replacement character.
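The inference being described can be checked with a small sketch. frontSatisfies is a hypothetical stand-in for the unary overload, and the string case assumes Phobos auto-decoding, where front on a string may throw UTFException.

```d
import std.range.primitives : empty, front;

// Hypothetical stand-in for the unary overload. No attributes are written,
// so the compiler infers them separately for each instantiation.
bool frontSatisfies(alias pred, R)(R r)
{
    return !r.empty && pred(r.front);
}

// Compiles: empty and front for int[] and the lambda are all nothrow,
// so this instantiation is inferred nothrow.
bool startsPositive(int[] xs) nothrow
{
    return frontSatisfies!(a => a > 0)(xs);
}

void main()
{
    // The same template is not nothrow for string, because front decodes
    // UTF-8 and may throw.
    static assert(!__traits(compiles, () nothrow {
        return frontSatisfies!(c => c == 'a')("abc");
    }));

    assert(startsPositive([1, -2]));
    assert(!startsPositive(null));
}
```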

andralex · 2016-01-12T06:02:38Z

@jmdavis oh I see what you're saying. Various components of the expression may throw for other reasons than using .front or .back against an empty range.

D's type system cannot detect that kind of stuff. So just give up on trying to make the function nothrow. At some point we got to stop refining, lest we go crazy.

JakobOvrum · 2016-01-12T06:17:42Z

We often use assert/preconditions for range primitives instead of enforce. Functions that throw Errors are still nothrow. To eliminate UTFException, use std.utf.byXchar, which replaces invalid sequences with std.utf.replacementDchar instead of throwing.

jmdavis · 2016-01-12T13:58:51Z

D's type system cannot detect that kind of stuff. So just give up on trying to make the function nothrow. At some point we got to stop refining, lest we go crazy.

As long as front/back, empty, and the predicate are nothrow, then IMHO it's the compiler's job to determine that the rest is nothrow, and if it can't, that's an enhancement request that can be dealt with in whatever manner is appropriate. These functions are exactly the sort of functions that we can't force to be nothrow without risking problems due to ranges that don't behave as expected (just like putting @trusted on them would be wrong), and we really will go crazy if we have to add a bunch of extra machinery to something this simple to make it nothrow.

We often use assert/preconditions for range primitives instead of enforce. Functions that throw Errors are still nothrow.

In general, empty should be nothrow, and front/back should never throw anything but an Error of some kind if the range is empty (and thus can and should be nothrow if the rest of what they're doing allows it, but that's up to the range implementor to get right). IIRC though, at one point, Walter was insisting that we use assertions to check for empty rather than doing something like throw RangeErrors. Regardless, it's up to whoever wrote the ranges to get it right, and a function like this shouldn't need to worry about it one way or the other beyond ensuring that it doesn't call front or back on an empty range.

To eliminate UTFException, use std.utf.byXchar, which replaces invalid sequences with std.utf.replacementDchar instead of throwing.

IMHO, we should just make it so that front and back just use the replacement character. decode already supports it. It's just that whoever did that was chicken and decided that the risk of breaking code was too high and made it so that whether you got a UTFException or the replacement character was based on an optional template argument that defaults to UTFException. The issue of front and back throwing for strings is such that we pretty much need to either make it so that they don't, or we need to not actually use strings with range-based functions and use wrappers like by*char instead. Regardless, it's not something that this function needs to worry about. Either we fix it so that front/back don't throw, or the caller needs to pass the string in via a wrapper.
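As a sketch of the wrapper route, assuming std.utf.byDchar with its default replacement-character behavior: the caller wraps the string, and the same unattributed template is then inferred nothrow. frontSatisfies is again a hypothetical stand-in, not the PR's code.

```d
import std.utf : byDchar;

// Hypothetical stand-in for the unary overload; attributes are inferred.
bool frontSatisfies(alias pred, R)(R r)
{
    return !r.empty && pred(r.front);
}

// Compiles as nothrow because the byDchar range substitutes the replacement
// character for invalid sequences instead of throwing UTFException.
bool startsWithA(string s) nothrow
{
    return frontSatisfies!(c => c == 'a')(s.byDchar);
}

void main()
{
    assert(startsWithA("abc"));
    assert(!startsWithA("xyz"));
    assert(!startsWithA(""));
}
```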

andralex · 2016-01-13T17:34:51Z

As long as front/back, empty, and the predicate are nothrow, then IMHO it's the compiler's job to determine that the rest is nothrow, and if it can't, that's an enhancement request that can be dealt with in whatever manner is appropriate.

It's not that simple.

jmdavis · 2016-01-13T18:22:37Z

It's not that simple.

Well, then I'm missing something here. Certainly, if

return !doesThisEnd.empty && unaryFun!pred(doesThisEnd.back);

can't be inferred as nothrow based on whether empty, back, and unaryFun!pred are nothrow, then I don't see how attribute inference is ever going to work. It doesn't get much simpler than that. The main place that I've seen where attribute inference doesn't infer nothrow and can't infer nothrow is code like

auto s = format("a message: %s", value);

because while a programmer can see whether it can throw or not by verifying that the format specifiers work with the given arguments, that's something that can only be programmatically determined at runtime. In contrast, if empty, back, etc. can be known to be nothrow at compile time if they're marked that way or inferred to be that way, then startsWith and endsWith should be able to be inferred as nothrow. Sure, they could contain stuff like format that precludes them from being inferred as nothrow, but that's something that the implementor of those functions needs to deal with if they want them to be nothrow and not a concern of startsWith or endsWith. The compiler should be able to infer startsWith/endsWith to be nothrow as long as the functions that they're calling have the nothrow attribute.

Regardless, if the compiler fails to infer nothrow properly here, then doing anything to force it (e.g. using assumeWontThrow and marking the function as nothrow) would be a very bad idea, because some ranges will legitimately throw from empty, front, etc - as will some predicates. So, it would be a lot like marking the function as @trusted when some range functions or predicates may not be @safe at all.

andralex · 2016-01-13T19:36:24Z

Sorry, I misunderstood - thought you want the compiler to figure the conditional, i.e. for many ranges if they're not empty the front won't throw.

andralex · 2016-01-13T19:37:56Z

std/algorithm/searching.d

+   $(D bool) if the first element in range $(D doesThisEnd) fulfils predicate
+   $(D pred), $(D false) otherwise.
+*/
+bool endsWith(alias pred = "a", R)(R doesThisEnd)


There's no need for a default predicate here. r.endsWith is weird at best.

andralex · 2016-01-13T19:44:54Z

OK @nordlow so it seems this is ready to go pending a few trivial work items. Could you please finish? Thx!

jmdavis · 2016-01-13T21:15:19Z

Sorry, I misunderstood - thought you want the compiler to figure the conditional, i.e. for many ranges if they're not empty the front won't throw.

Yeah. That couldn't be nothrow, because it depends on runtime information (like the call to format) - though since we've pretty much decided that it's undefined behavior to call front on an empty range, that should probably be an assertion, which wouldn't prevent front from being nothrow, but that's an implementation issue of the range and outside of startsWith's purview.

OK @nordlow so it seems this is ready to go pending a few trivial work items. Could you please finish? Thx!

Yeah. The main thing that still needs to be fixed here based on the previous comments is that the ddoc comments for the existing overloads should be updated to encompass these new overloads rather than having separate documentation for them. If that had been done after the last update back in November, then I probably would have merged it then.

nordlow · 2016-01-18T08:49:42Z

Ok, @andralex. I'm right on it.

What happens with

assumeWontThrow(!doesThisStart.empty && unaryFun!pred(doesThisStart.front))

when pred may throw?

jmdavis · 2016-01-18T11:42:27Z

What happens with

assumeWontThrow(!doesThisStart.empty && unaryFun!pred(doesThisStart.front))

when pred may throw?

It'll throw an AssertError and kill your program. nothrow is specifically for functions that cannot throw, not for functions which shouldn't normally throw. assumeWontThrow exists for the rare cases where the programmer knows that it's impossible for the function to throw, but the compiler can't determine that (which almost certainly means that it's because whether it will throw or not depends on runtime arguments that the programmer knows enough about to guarantee won't throw, whereas the compiler doesn't care about the runtime arguments - just that the function can throw under some set of circumstances). I think that the only cases that I've seen where assumeWontThrow would be appropriate would be with format where you know that the format specifiers and the arguments match and will never cause format to throw or when constructing an object with data that you know is valid when the constructor throws on invalid arguments.

assumeWontThrow is not appropriate here. It is perfectly acceptable for the predicate, empty, front, or back to throw if they're not marked nothrow and thus unacceptable for startsWith or endsWith to be marked with nothrow.

And since per the range API, it's effectively undefined behavior for a range to have front, popFront, etc. called when the range is empty, ranges should not be throwing Exceptions based on the state of empty. Calling front, popFront, etc. on an empty range is a logic error, and ranges should be asserting in that case, not throwing Exceptions. So, if your concern is that a range which throws an Exception when its front is called when it's not empty is passed to startsWith or endsWith, and you want that to be nothrow, the problem is with that range, not with startsWith or endsWith. Using assumeWontThrow on templated code is a lot like using @trusted on templated code (though at least when assumeWontThrow is used incorrectly, it'll kill your program rather than just let it corrupt memory, like when @trusted is used incorrectly).

Please remove the assumeWontThrow from startsWith and endsWith. Having them there is just going to make code that is nothrow less efficient and make it so that code that is legitimately not nothrow because it throws Exceptions for other valid reasons will end up having an AssertError thrown instead and killing whatever program it's in.
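For contrast, here is a sketch of the kind of use described above as appropriate: a format call whose specifier is known to match its argument. The function name label is made up for illustration.

```d
import std.exception : assumeWontThrow;
import std.format : format;

// format is not nothrow, since a mismatched format string is only detected
// at run time. Here the specifier and the argument are known to match, so
// the programmer can vouch for the call.
string label(int value) nothrow
{
    return assumeWontThrow(format("a message: %s", value));
}

void main()
{
    assert(label(42) == "a message: 42");
}
```

The difference from startsWith/endsWith is that here the author controls every input to the wrapped expression, whereas a template cannot vouch for arbitrary ranges and predicates.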

nordlow · 2016-01-18T11:46:30Z

@jmdavis ok.

Removed calls to assumeWontThrow.

I guess such optimizations, if any, should be part of the compiler.

Should be ready now.

jmdavis · 2016-01-18T12:03:36Z

I guess such optimizations, if any, should be part of the compiler.

The compiler should infer nothrow based on whether the function cannot throw - which does not take runtime arguments into account; the function must never be able to throw an Exception. So, something like

if(obj.foo()) // foo is nothrow
    obj.bar();  // bar is _not_ nothrow but won't actually throw if foo returns true

will never be inferred to be nothrow. That sort of thing is outside of the purview of nothrow and requires the programmer see it, and in that case, assumeWontThrow may be appropriate. But the programmer must know that it's okay, just like they need to know that a function using @system stuff is actually @safe in order to mark it as @trusted. And in templated functions, the programmer usually does not know that the code can be @trusted or that it really won't throw, because that usually depends on the template arguments, which can generally be pretty much anything. If the programmer does actually know enough about what's going to be passed to the template to guarantee that it's @safe in all cases in spite of @system code (e.g. because of template constraints), then they can mark it @trusted, and if they know that it can never throw for any template argument, then they can use assumeWontThrow - but that's rarely the case, and it definitely isn't the case here.

AFAIK, the compiler already does all of the attribute inference for this code that it's supposed to be able to do, though it's certainly possible that there are bugs related to attribute inference that I'm not aware of (particularly if it related to lambdas or some other language construct that might require attribute inference on the predicate).

jmdavis · 2016-01-20T06:27:57Z

Auto-merge toggled on

Add Unary Overload of startsWith() and endsWith()

nordlow force-pushed the unary-startsWith branch 3 times, most recently from 1931832 to 8b3f9bb Compare October 19, 2015 19:44

nordlow force-pushed the unary-startsWith branch 2 times, most recently from f7c6099 to c1a0363 Compare October 20, 2015 08:40

quickfur added the Review:Needs Decision label Oct 21, 2015

MetaLang reviewed Oct 21, 2015

jmdavis removed the Review:Needs Decision label Oct 30, 2015

JakobOvrum added the Review:Needs Review label Nov 3, 2015

nordlow force-pushed the unary-startsWith branch from c1a0363 to 0e91a82 Compare November 3, 2015 12:31

nordlow changed the title ~~Add unary overload of algorithm startsWith~~ Add Unary Overload of startsWith() and endsWith() Nov 11, 2015

andralex reviewed Jan 13, 2016

nordlow added 2 commits January 17, 2016 21:21

Add unary overloads for startsWith and endsWith

0b93485

Add alternative commenting

bfebdc7

nordlow force-pushed the unary-startsWith branch from 0e91a82 to 0a5cc5f Compare January 18, 2016 09:19

Review updates

f7f4847

nordlow force-pushed the unary-startsWith branch from 0a5cc5f to f7f4847 Compare January 18, 2016 11:45

jmdavis added a commit that referenced this pull request Jan 20, 2016

Merge pull request #3752 from nordlow/unary-startsWith

8046230

Add Unary Overload of startsWith() and endsWith()

jmdavis merged commit 8046230 into dlang:master Jan 20, 2016

nordlow deleted the unary-startsWith branch October 19, 2021 19:45
