Contributor

@nordlow nordlow commented Oct 19, 2015

Adds overloads that enable calling startsWith and endsWith with only a haystack and a predicate pred.

See discussion at: http://forum.dlang.org/post/zobvqdmxnjnilqibtcph@forum.dlang.org

Should random-access ranges get special treatment via .length and opIndex, or should we just use .empty and .front in every case?

If not, I guess we could just do

{
    return !haystack.empty && haystack.front.unaryFun!pred;
}

right?

@nordlow nordlow force-pushed the unary-startsWith branch 3 times, most recently from 1931832 to 8b3f9bb Compare October 19, 2015 19:44
@MetaLang
Member

Is there any difference between this and range.countUntil!pred == 0? If not, is it worth adding this overload of startsWith? If we do decide that it is worth it, there should be a corresponding overload added for endsWith.
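
As a rough sketch of the comparison raised here, assuming a Phobos release that already contains the predicate-only overload from this PR: both expressions give the same answer, but countUntil keeps scanning when the first element does not match, while startsWith only looks at the front.

```d
import std.algorithm.searching : countUntil, startsWith;

void main()
{
    auto r = [1, 2, 3];

    // Both forms agree when the first element satisfies the predicate.
    assert(r.countUntil!(a => a == 1) == 0);
    assert(r.startsWith!(a => a == 1));

    // When the first element does not match, countUntil walks further
    // (here up to index 2), whereas startsWith stops after the front.
    assert(r.countUntil!(a => a == 3) == 2);
    assert(!r.startsWith!(a => a == 3));

    // On an empty range countUntil yields -1, so comparing with 0 is false,
    // which matches startsWith.
    int[] none;
    assert(none.countUntil!(a => a == 1) == -1);
    assert(!none.startsWith!(a => a == 1));
}
```

So the two are equivalent in result, though not in the amount of work done on a non-matching range.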

@MetaLang
Member

Not sure about front vs. opIndex, but it may matter for length vs. empty. I forget whether there's ever a case where empty does non-trivial work, but if such a case exists, then maybe we should use length if possible.

@nordlow
Contributor Author

nordlow commented Oct 19, 2015

The motivation is purely usability: discoverability for new users, and a solution that is easy to remember once the other overloads have been learnt.

@jmdavis
Member

jmdavis commented Oct 20, 2015

Honestly, I seriously question that this is worth adding. All it's doing is creating a function that does

auto result = !haystack.empty && pred(haystack.front);

That's trivial to write yourself, and this overload goes against the idea that startsWith tests whether a range starts with something. It's just testing whether the first element matches a predicate, not whether it starts with something. It's similar, but it's different enough that I think that having startsWith do this is questionable, and this is so trivial to write on your own in one line that it makes this whole thing seem like an insane amount of boilerplate - though part of that is the fact that it includes separate stuff for random access ranges for some reason (which makes no sense at all - I don't understand the logic behind it; a random access range is an input range and supports all of the input range primitives). So, if this is merged in, its implementation should be changed to

return !haystack.empty && unaryFun!pred(haystack.front);

but if it's that simple, I think that it's pretty pointless to add it to Phobos.
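
For reference, the one-liner under discussion can be sketched as a standalone template. The name frontSatisfies is hypothetical and chosen only for this illustration; it is not the signature that was merged.

```d
import std.functional : unaryFun;
import std.range.primitives : empty, front, isInputRange;

// Hypothetical helper name, used only for this illustration.
bool frontSatisfies(alias pred, R)(R haystack)
if (isInputRange!R)
{
    // Short-circuit on empty so that front is never read from an empty range.
    return !haystack.empty && unaryFun!pred(haystack.front);
}

void main()
{
    import std.ascii : isUpper;

    assert("Hello".frontSatisfies!isUpper);
    assert(!"hello".frontSatisfies!isUpper);
    assert(!"".frontSatisfies!isUpper);

    // String predicates are accepted as well, via unaryFun.
    assert([4, 5].frontSatisfies!"a % 2 == 0");
}
```

No random-access branch is needed, since every random-access range also provides empty and front.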

@nordlow nordlow force-pushed the unary-startsWith branch 2 times, most recently from f7c6099 to c1a0363 Compare October 20, 2015 08:40
@nordlow
Contributor Author

nordlow commented Oct 20, 2015

Simplified implementations.

@quickfur
Member

Implementation seems OK. Not sure if something this simple is worth adding to Phobos, though. I'll let the other Phobos devs decide.

@MetaLang
Member

I don't think it's worth it, but I'm alright if we decide to add it for completeness' sake.

@andralex
Member

These are worth adding; the counterargument doesn't quite stand - by it, r.startsWith(x) ain't worth adding because it's simply !r.empty && r.front == x. Yet startsWith has been very useful. I think the predicate-only version is useful as well albeit more niche.

However, please consolidate the docs with /// ditto such that there's only one documentation entry for startsWith that comprehends all overloads.

I preapprove and archive this. If you need further resolution please email me. Thanks!

@nordlow
Contributor Author

nordlow commented Oct 21, 2015

@andralex do you mean that I should extend the main documentation for startsWith and endsWith so that it also explains the case where no needles are given, only a predicate?

@quickfur
Member

@nordlow That would make the most sense, I think. Add /// ditto then update the current startsWith docs to explain what happens when no needles are passed.

*/
bool endsWith(alias pred = "a", R)(R doesThisEnd)
if (isInputRange!R &&
is(typeof(unaryFun!pred(doesThisEnd.front)) : bool))
Member

This constraint is not enough, as I found out the hard way recently. This code won't compile although it should, because among returns a uint, which does not implicitly convert to bool.

//Error: cannot deduce type...
"asdf".startsWith!(c => c.among!('a', 'b', 'c'));

It'd probably be better to change it to:
if (isInputRange!R && __traits(compiles, cast(bool)unaryFun!pred(doesThisStart.front)))

Or something similar.
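
A small sketch of why the `: bool` constraint rejects such a predicate, using only std.algorithm.comparison.among and compile-time checks: the uint result is usable as a condition and can be cast explicitly, but it is not implicitly convertible.

```d
import std.algorithm.comparison : among;

void main()
{
    // among reports the 1-based index of the match as a uint (0 when nothing matches).
    auto hit = 'b'.among!('a', 'b', 'c');
    static assert(is(typeof(hit) == uint));
    assert(hit == 2);

    // A uint is not implicitly convertible to bool, so a constraint written as
    // is(typeof(...) : bool) turns the predicate away...
    static assert(!is(uint : bool));

    // ...even though an explicit cast compiles and the value works in an if statement.
    static assert(__traits(compiles, cast(bool) hit));
    if (hit) {}
    assert(cast(bool) hit);
}
```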

Contributor Author

Is this problem different from the binary case? If so, how? If not, isn't it better to tackle this problem in another PR addressing all the overloads of startsWith and endsWith?

Member

What is the "binary case"?

Contributor Author

Sorry, I meant the case when pred is a binaryFun. If we fix the constraints for these two new overloads, we might as well fix the constraints for all the other overloads too, right? Your proposed new constraint works for me, anyhow.

Member

It's probably a good idea, though I don't know if changing the constraints for the other overloads would break any code. Regardless, I think your implementation of isIfTestable from the other PR would fit here.

Contributor Author

Ok, I'll look into it.

Contributor

I don't think this is the right PR for that. It's a wider Phobos issue that I think should be tackled separately.

Member

Don't overcomplicate things. Here you don't need bool, just ifTestable. I recall we have it somewhere from a recent pull request.

@jmdavis
Member

jmdavis commented Oct 22, 2015

Well, where startsWith really shines is when it takes multiple arguments, but that is a good point.

Actually, with regards to the multiple arguments, I think that startsWith is some of the coolest metaprogramming in Phobos. It just about blew my mind when I first wrapped my head around how it works.

@nordlow
Contributor Author

nordlow commented Oct 22, 2015

Should the return type of the unary overload be bool or uint? The other overloads return uint.

If bool, it feels a bit awkward to document the return value as either uint or bool, since you want me to have a single documentation entry for all overloads.

@jmdavis
Member

jmdavis commented Oct 23, 2015

> Should the return type of the unary overload be bool or uint? The other overloads return uint.

It should return bool. The overload that takes one argument already returns bool. It's just the one that takes multiple which returns uint. And they already share their documentation.
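
The existing return types can be checked directly. A brief sketch using the needle-taking overloads of std.algorithm.searching.startsWith:

```d
import std.algorithm.searching : startsWith;

void main()
{
    // One needle: the result is a plain bool.
    auto one = "hello".startsWith("he");
    static assert(is(typeof(one) == bool));
    assert(one);

    // Several needles: the result is a uint holding the 1-based position of
    // the needle that matched, or 0 when none did.
    auto many = "hello".startsWith("x", "he");
    static assert(is(typeof(many) == uint));
    assert(many == 2);
    assert("hello".startsWith("x", "y") == 0);
}
```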

@JackStouffer
Contributor

I think the decision block tag can be removed; people may not review this because of that tag.

@nordlow
Contributor Author

nordlow commented Nov 3, 2015

I guess if this comes in we should add a corresponding overload for skipOver, right? Is it ok if I add that change to this PR as well?

@JakobOvrum
Contributor

@jmdavis, it wouldn't solve the problem in general. The issue is that many of these algorithms check if the return type of the predicate function is implicitly convertible to bool, when they should test if the return type is testable in an if-statement.

@nordlow
Contributor Author

nordlow commented Nov 7, 2015

FYI: ifTestable was just added to std.traits. I guess we could reuse this here as well.
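
A short sketch of what std.traits.ifTestable accepts, written as compile-time checks. The struct S is a hypothetical type introduced only for the negative case.

```d
import std.traits : ifTestable;

void main()
{
    // Types that work as an if condition pass, even when they do not
    // convert implicitly to bool.
    static assert(ifTestable!bool);
    static assert(ifTestable!uint);
    static assert(ifTestable!(int*));
    static assert(!is(uint : bool));

    // A plain struct without opCast cannot be used as a condition.
    static struct S {}
    static assert(!ifTestable!S);

    // The optional second argument applies a predicate to the value first.
    static assert(ifTestable!(char, c => c == 'a'));
}
```

This is the distinction the review comments are after: "testable in an if statement" is a wider set than "implicitly convertible to bool".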

@jmdavis
Member

jmdavis commented Nov 8, 2015

> FYI: ifTestable was just added to std.traits. I guess we could reuse this here as well.

Except that the results aren't always used in if statements. It might make more sense to not require that the result be implicitly convertible to bool, but I think that we should be very sure that we understand the effects of that decision before we go down that route, and normally a predicate should result in bool.

@jmdavis
Member

jmdavis commented Nov 8, 2015

The documentation for these new overloads still needs to be merged with the existing overloads as Andrei requested. There should be one non-ditto ddoc comment for startsWith and one for endsWith. That's what we've had until now, and it should be quite possible to update that documentation with regards to the new overloads rather than documenting them separately.

@jmdavis
Member

jmdavis commented Nov 8, 2015

> I guess if this comes in we should add a corresponding overload for skipOver, right? Is it ok if I add that change to this PR as well?

It's an unrelated function. Please create a separate PR for it.

@nordlow nordlow changed the title Add unary overload of algorithm startsWith Add Unary Overload of startsWith() and endsWith() Nov 11, 2015
@nordlow
Contributor Author

nordlow commented Jan 6, 2016

These overloads should be nothrow.

How do we make this happen, considering that .front() can throw?

See also: http://forum.dlang.org/post/gincfsrbdypejnrdjjlx@forum.dlang.org

@dnadlinger
Contributor

@nordlow: There is no way to express "this function does not throw if empty() is false" in the type system, so this can't be done in general.

@nordlow
Contributor Author

nordlow commented Jan 6, 2016

What about adding a wrapper function to Phobos, say maybeFront(), and using it here? Don't you agree that it would be great to keep these overloads from always being potentially throwing?

It seems http://dlang.org/phobos/std_exception.html#.assumeWontThrow could be used, right?

@jmdavis
Member

jmdavis commented Jan 10, 2016

As long as startsWith/endsWith doesn't do anything on its own that can result in throwing, whether they can be nothrow or not depends entirely on the ranges that are being iterated over. If their front, empty, etc. are nothrow, then startsWith/endsWith should be inferred to be nothrow by the compiler. If they're not, then they won't be. There should be no need to mark any templated function as nothrow, and it's usually a bad idea, because it unnecessarily restricts what can be used with them. Attributes rarely belong on templated functions.
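
The inference described here can be sketched with two hypothetical ranges, Quiet and Noisy, and a hypothetical stand-in frontSatisfies for the overload under review. The template carries no attributes of its own; whether an instantiation is nothrow follows from the range it is given.

```d
import std.functional : unaryFun;
import std.range.primitives : isInputRange;

// Hypothetical stand-in for the overload under review; no attributes are written on it.
bool frontSatisfies(alias pred, R)(R haystack)
if (isInputRange!R)
{
    return !haystack.empty && unaryFun!pred(haystack.front);
}

// Hypothetical range whose primitives are all nothrow.
struct Quiet
{
    int[] data;
    @property bool empty() const nothrow { return data.length == 0; }
    @property int front() const nothrow { return data[0]; }
    void popFront() nothrow { data = data[1 .. $]; }
}

// Hypothetical range whose front may throw an Exception.
struct Noisy
{
    int[] data;
    @property bool empty() const nothrow { return data.length == 0; }
    @property int front() const
    {
        if (data[0] < 0) throw new Exception("negative element");
        return data[0];
    }
    void popFront() nothrow { data = data[1 .. $]; }
}

bool isPositive(int a) nothrow { return a > 0; }

// Compiles only because the instantiation for Quiet is inferred nothrow.
bool quietCaller() nothrow
{
    return frontSatisfies!isPositive(Quiet([1, 2]));
}

// The instantiation for Noisy is not nothrow, so a nothrow caller is rejected.
static assert(!__traits(compiles,
    () nothrow { auto r = frontSatisfies!isPositive(Noisy([1])); }));

void main()
{
    assert(quietCaller());
    assert(frontSatisfies!isPositive(Noisy([1, 2])));
}
```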

And as for ranges of characters, they can pretty much never involve nothrow functions at the moment, because decoding can throw a UTFException. There has been talk of making that an Error (e.g. UTFError) instead, but that hasn't been done thus far, and until it is, almost nothing involving strings can be nothrow.

@andralex
Member

@nordlow asked me by email. There are any number of conditions under which we can easily prove at compile time that code won't throw, but carrying that proof is outside the type system's capability. I'd say just use assumeWontThrow here.

@jmdavis
Member

jmdavis commented Jan 11, 2016

@andralex Then I must be missing something here, because the only way that these functions should be nothrow is if the range's empty and front are nothrow, and the given predicate is nothrow, in which case the compiler should infer startsWith/endsWith to be nothrow. And if any of those are not nothrow, then startsWith/endsWith can't be nothrow.

And in all of the examples in the unit tests, they can't be nothrow, because they're using string, and front throws for arrays of char or wchar, because decode throws a UTFException for them rather than using the replacement character.

@andralex
Member

@jmdavis oh I see what you're saying. Various components of the expression may throw for other reasons than using .front or .back against an empty range.

D's type system cannot detect that kind of stuff. So just give up on trying to make the function nothrow. At some point we got to stop refining, lest we go crazy.

@JakobOvrum
Contributor

We often use assert/preconditions for range primitives instead of enforce. Functions that throw Errors are still nothrow. To eliminate UTFException, use std.utf.byXchar, which substitutes std.utf.replacementDchar instead of throwing.
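
A minimal sketch of the wrapper approach, assuming std.utf.byDchar with its default replacement behaviour. The helper name startsWithA is hypothetical; the point is only that the function compiles with nothrow because no UTFException can escape.

```d
import std.utf : byDchar, replacementDchar;

// Hypothetical helper: reports whether the first code point is 'a'.
// nothrow is accepted because byDchar substitutes replacementDchar for
// invalid sequences rather than throwing UTFException.
bool startsWithA(const(char)[] s) nothrow
{
    auto r = s.byDchar;
    return !r.empty && r.front == 'a';
}

void main()
{
    assert(startsWithA("abc"));
    assert(!startsWithA("xyz"));
    assert(!startsWithA(""));

    // An invalid lead byte decodes to the replacement character.
    const(char)[] bad = [cast(char) 0xFF];
    assert(bad.byDchar.front == replacementDchar);
}
```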

@jmdavis
Member

jmdavis commented Jan 12, 2016

> D's type system cannot detect that kind of stuff. So just give up on trying to make the function nothrow. At some point we got to stop refining, lest we go crazy.

As long as front/back, empty, and the predicate are nothrow, then IMHO it's the compiler's job to determine that the rest is nothrow, and if it can't, that's an enhancement request that can be dealt with in whatever manner is appropriate. These functions are exactly the sort of functions that we can't force to be nothrow without risking problems due to ranges that don't behave as expected (just like putting @trusted on them would be wrong), and we really will go crazy if we have to add a bunch of extra machinery to something this simple to make it nothrow.

> We often use assert/preconditions for range primitives instead of enforce. Functions that throw Errors are still nothrow.

In general, empty should be nothrow, and front/back should never throw anything but an Error of some kind if the range is empty (and thus can and should be nothrow if the rest of what they're doing allows it, but that's up to the range implementor to get right). IIRC though, at one point, Walter was insisting that we use assertions to check for empty rather than doing something like throw RangeErrors. Regardless, it's up to whoever wrote the ranges to get it right, and a function like this shouldn't need to worry about it one way or the other beyond ensuring that it doesn't call front or back on an empty range.

> To eliminate UTFException, use std.utf.byXchar, which substitutes std.utf.replacementDchar instead of throwing.

IMHO, we should just make it so that front and back just use the replacement character. decode already supports it. It's just that whoever did that was chicken and decided that the risk of breaking code was too high and made it so that whether you got a UTFException or the replacement character was based on an optional template argument that defaults to UTFException. The issue of front and back throwing for strings is such that we pretty much need to either make it so that they don't, or we need to not actually use strings with range-based functions and use wrappers like by*char instead. Regardless, it's not something that this function needs to worry about. Either we fix it so that front/back don't throw, or the caller needs to pass the string in via a wrapper.

@andralex
Member

> As long as front/back, empty, and the predicate are nothrow, then IMHO it's the compiler's job to determine that the rest is nothrow, and if it can't, that's an enhancement request that can be dealt with in whatever manner is appropriate.

It's not that simple.

@jmdavis
Member

jmdavis commented Jan 13, 2016

> It's not that simple.

Well, then I'm missing something here. Certainly, if

return !doesThisEnd.empty && unaryFun!pred(doesThisEnd.back);

can't be inferred as nothrow based on whether empty, back, and unaryFun!pred are nothrow, then I don't see how attribute inference is ever going to work. It doesn't get much simpler than that. The main place that I've seen where attribute inference doesn't infer nothrow and can't infer nothrow is code like

auto s = format("a message: %s", value);

because while a programmer can see whether it can throw or not by verifying that the format specifiers work with the given arguments, that's something that can only be programmatically determined at runtime. In contrast, if empty, back, etc. can be known to be nothrow at compile time if they're marked that way or inferred to be that way, then startsWith and endsWith should be able to be inferred as nothrow. Sure, they could contain stuff like format that precludes them from being inferred as nothrow, but that's something that the implementor of those functions needs to deal with if they want them to be nothrow and not a concern of startsWith or endsWith. The compiler should be able to infer startsWith/endsWith to be nothrow as long as the functions that they're calling have the nothrow attribute.
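
The format case can be sketched as follows. The function name describe is hypothetical; std.format.format and std.exception.assumeWontThrow are the real Phobos functions mentioned in the thread.

```d
import std.exception : assumeWontThrow;
import std.format : format;

// format is not nothrow: a specifier that does not fit its argument is only
// detected at run time. Calling it directly from a nothrow context is rejected.
static assert(!__traits(compiles,
    () nothrow { auto s = format("a message: %s", 42); }));

// Here the programmer can see that "%s" fits any argument, so the call is
// wrapped to vouch for it.
string describe(int value) nothrow
{
    return assumeWontThrow(format("a message: %s", value));
}

void main()
{
    assert(describe(42) == "a message: 42");
}
```

The wrapper is justified only because the format string is fixed and known to be valid, which is exactly the kind of knowledge a generic template lacks about its range and predicate.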

Regardless, if the compiler fails to infer nothrow properly here, then doing anything to force it (e.g. using assumeWontThrow and marking the function as nothrow) would be a very bad idea, because some ranges will legitimately throw from empty, front, etc - as will some predicates. So, it would be a lot like marking the function as @trusted when some range functions or predicates may not be @safe at all.

@andralex
Member

Sorry, I misunderstood - I thought you wanted the compiler to figure out the conditional, i.e. that for many ranges front won't throw if they're not empty.

$(D true) if the last element in range $(D doesThisEnd) fulfils predicate
$(D pred), $(D false) otherwise.
*/
bool endsWith(alias pred = "a", R)(R doesThisEnd)
Member

There's no need for a default predicate here. r.endsWith is weird at best.

@andralex
Member

OK @nordlow so it seems this is ready to go pending a few trivial work items. Could you please finish? Thx!

@jmdavis
Member

jmdavis commented Jan 13, 2016

> Sorry, I misunderstood - I thought you wanted the compiler to figure out the conditional, i.e. that for many ranges front won't throw if they're not empty.

Yeah. That couldn't be nothrow, because it depends on runtime information (like the call to format) - though since we've pretty much decided that it's undefined behavior to call front on an empty range, that should probably be an assertion, which wouldn't prevent front from being nothrow, but that's an implementation issue of the range and outside of startsWith's purview.

> OK @nordlow so it seems this is ready to go pending a few trivial work items. Could you please finish? Thx!

Yeah. The main thing that still needs to be fixed here based on the previous comments is that the ddoc comments for the existing overloads should be updated to encompass these new overloads rather than having separate documentation for them. If that had been done after the last update back in November, then I probably would have merged it then.

@nordlow
Contributor Author

nordlow commented Jan 18, 2016

Ok, @andralex. I'm right on it.

What happens with

assumeWontThrow(!doesThisStart.empty && unaryFun!pred(doesThisStart.front))

when pred may throw?

@jmdavis
Member

jmdavis commented Jan 18, 2016

> What happens with
>
> assumeWontThrow(!doesThisStart.empty && unaryFun!pred(doesThisStart.front))
>
> when pred may throw?

It'll throw an AssertError and kill your program. nothrow is specifically for functions that cannot throw, not for functions which shouldn't normally throw. assumeWontThrow exists for the rare cases where the programmer knows that it's impossible for the function to throw, but the compiler can't determine that (which almost certainly means that it's because whether it will throw or not depends on runtime arguments that the programmer knows enough about to guarantee won't throw, whereas the compiler doesn't care about the runtime arguments - just that the function can throw under some set of circumstances). I think that the only cases that I've seen where assumeWontThrow would be appropriate would be with format where you know that the format specifiers and the arguments match and will never cause format to throw or when constructing an object with data that you know is valid when the constructor throws on invalid arguments.

assumeWontThrow is not appropriate here. It is perfectly acceptable for the predicate, empty, front, or back to throw if they're not marked nothrow and thus unacceptable for startsWith or endsWith to be marked with nothrow.

And since per the range API, it's effectively undefined behavior for a range to have front, popFront, etc. called when the range is empty, ranges should not be throwing Exceptions based on the state of empty. Calling front, popFront, etc. on an empty range is a logic error, and ranges should be asserting in that case, not throwing Exceptions. So, if your concern is that a range which throws an Exception when its front is called when it's not empty is passed to startsWith or endsWith, and you want that to be nothrow, the problem is with that range, not with startsWith or endsWith. Using assumeWontThrow on templated code is a lot like using @trusted on templated code (though at least when assumeWontThrow is used incorrectly, it'll kill your program rather than just let it corrupt memory, like when @trusted is used incorrectly).

Please remove the assumeWontThrow from startsWith and endsWith. Having them there is just going to make code that is nothrow less efficient and make it so that code that is legitimately not nothrow because it throws Exceptions for other valid reasons will end up having an AssertError thrown instead and killing whatever program it's in.
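
The failure mode described above can be sketched with a hypothetical throwing predicate, picky, and a hypothetical non-templated function, forcedNothrow, that wraps the expression the way the rejected revision did.

```d
import core.exception : AssertError;
import std.exception : assertThrown, assumeWontThrow;

// Hypothetical predicate that legitimately throws for some inputs.
bool picky(int a)
{
    if (a < 0) throw new Exception("negative input");
    return a > 10;
}

// Sketch of the rejected approach: nothrow is forced by wrapping the expression.
bool forcedNothrow(int[] haystack) nothrow
{
    return assumeWontThrow(haystack.length != 0 && picky(haystack[0]));
}

void main()
{
    assert(forcedNothrow([42]));
    assert(!forcedNothrow([]));

    // The Exception from picky no longer reaches the caller as an Exception;
    // assumeWontThrow turns it into an AssertError.
    assertThrown!AssertError(forcedNothrow([-1]));
}
```

A caller that would have handled the original Exception gets an Error instead, which is why the wrapper does not belong in generic code.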

@nordlow
Contributor Author

nordlow commented Jan 18, 2016

@jmdavis ok.

Removed calls to assumeWontThrow.

I guess such optimizations, if any, should be part of the compiler.

Should be ready now.

@jmdavis
Member

jmdavis commented Jan 18, 2016

> I guess such optimizations, if any, should be part of the compiler.

The compiler should infer nothrow based on whether the function cannot throw - which does not take runtime arguments into account; the function must never be able to throw an Exception. So, something like

if(obj.foo()) // foo is nothrow
    obj.bar();  // bar is _not_ nothrow but won't actually throw if foo returns true

will never be inferred to be nothrow. That sort of thing is outside of the purview of nothrow and requires the programmer see it, and in that case, assumeWontThrow may be appropriate. But the programmer must know that it's okay, just like they need to know that a function using @system stuff is actually @safe in order to mark it as @trusted. And in templated functions, the programmer usually does not know that the code can be @trusted or that it really won't throw, because that usually depends on the template arguments, which can generally be pretty much anything. If the programmer does actually know enough about what's going to be passed to the template to guarantee that it's @safe in all cases in spite of @system code (e.g. because of template constraints), then they can mark it @trusted, and if they know that it can never throw for any template argument, then they can use assumeWontThrow - but that's rarely the case, and it definitely isn't the case here.
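
The foo/bar situation can be sketched with a hypothetical type Obj. The templated version is not inferred nothrow despite the guard; with a concrete type, the programmer can vouch for the invariant explicitly.

```d
import std.exception : assumeWontThrow;

// Hypothetical type mirroring the obj.foo() / obj.bar() example.
struct Obj
{
    bool ready;

    bool foo() const nothrow { return ready; }

    // Not nothrow, although it only throws when foo() would return false.
    int bar() const
    {
        if (!ready) throw new Exception("not ready");
        return 1;
    }
}

// Templated, so attributes are inferred; the guard does not make it nothrow.
int guarded(T)(T obj)
{
    if (obj.foo())
        return obj.bar();
    return 0;
}

static assert(!__traits(compiles, () nothrow { auto r = guarded(Obj(true)); }));

// With a known type, the programmer takes responsibility for the invariant.
int guardedNothrow(Obj obj) nothrow
{
    if (obj.foo())
        return assumeWontThrow(obj.bar());
    return 0;
}

void main()
{
    assert(guarded(Obj(true)) == 1);
    assert(guardedNothrow(Obj(true)) == 1);
    assert(guardedNothrow(Obj(false)) == 0);
}
```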

AFAIK, the compiler already does all of the attribute inference for this code that it's supposed to be able to do, though it's certainly possible that there are bugs related to attribute inference that I'm not aware of (particularly if they relate to lambdas or some other language construct that might require attribute inference on the predicate).

@jmdavis
Member

jmdavis commented Jan 20, 2016

Auto-merge toggled on

jmdavis added a commit that referenced this pull request Jan 20, 2016
Add Unary Overload of startsWith() and endsWith()
@jmdavis jmdavis merged commit 8046230 into dlang:master Jan 20, 2016
@nordlow nordlow deleted the unary-startsWith branch October 19, 2021 19:45