#8729 fixes #828

monarchdodra · 2012-10-02T06:22:49Z

This PR contains:

[REMOVED]8729: [parse|to]!double should NOT accept " 123.5"[REMOVED]
Typo: "case-insensive" => "case-insensitive"
Proper checking in parse array: These boldly accessed the front of the range, without checking for elements: This created asserts, when parse promises to throw a ConvException.

Anything regarding skip white has now been removed from this pull (which is now just a bug fix).

skipWhite is now a standalone ER @ 827 #827

[EDIT]: Previous branch accidentally deleted with gihub for windows, then botched at reviving. sorry :(
#817

9rnsr · 2012-10-02T11:26:17Z

LGTM. @andralex , Is it OK to merge this?

andralex · 2012-10-02T14:27:12Z

@9rnsr, @monarchdodra: I've been thinking more about this all and unfortunately my conclusion went the other way. Allow me to explain.

As I mentioned, my initial intent was "make parsing strict by default, let the user add permissiveness as needed". I think this goal is good and technically sound. However, there was this slippage in implementation that allowed whitespace preceding floating point values to be ignored.

Now, this diff fixes that bug and makes everything consistent. However, it will break the code of everybody who parses floats and ignores whitespace. This is undesirable.

A less disruptive change would be to increase permissiveness for everyone. Increasing permissiveness does bereak code. However, little, if any, code is likely to rely on the strict behavior as a feature. (Most likely, code is carefully skipping whitespace explicitly.) The underlying school of thought here would be "be liberal in what you accept and conservative in what you send". Also, there is no net loss of expressiveness because user code can always enforce that there's no preceding whitespace by looking at .front.

What do you think? @monarchdodra, sorry for asking you to throw work away.

9rnsr · 2012-10-02T15:06:21Z

"be liberal in what you accept and conservative in what you send"

I agree with the school basically, however we should not apply it in this case, because it would work just only for the limited underlying text protocol (e.g. http protocol header, it is ASCII-based).
But, parse would be used for general text parsing - used for the unlimited underlying text protocol. Various spaces are defined in Unicode, and often, skipping all of them would be unexpected.

Unicode might be the most used text code set, but it does not means that it is a silver bullet for the text processing. I think, unfortunately, if we want to make parse function convenient, we should stop adding permissive behaviors.

andralex · 2012-10-02T15:19:52Z

From a technical perspective, I also agree going more strict is better. But from a sociological perspective, I think we should avoid breakage if we can. My liberal vs. conservative quote was an attempt to better motivate the choice technically.

I think what I'm saying here is that I think we should go with the technically inferior solution because it is sociologically superior.

ghost · 2012-10-02T15:36:41Z

I think parse should be more strict but to should be more permissive. I have a gut feeling that people who don't normalize input before passing it to a conversion function usually use to. I have no data to back my claims though. :)

Also, people often request how to read ints/floats/strings from stdin, and a much better alternative to to is the cmdln.interact library from @JesseKPhillips which is a small but versatile set of 3-4 functions (~200 lines with comments). It seems to have disappeared from github though (although I have a copy and it's boost-licensed). I mean check this out:

auto name = userInput("What is your name");
if (userInput!bool("Do you want to continue?")) { }
auto num = require!(int, "a > 0 && a <= 10")("Enter a number from 1 to 10");

That stuff is probably what most people need rather than to or parse, at least when dealing with input strings.

9rnsr · 2012-10-02T15:38:16Z

Frankly speaking, I don't think that current Phobos library is less worth as the standard library of D. Because it seems to me that is a just collection of social decision, except some modules (e.g. std.algorithm, std.range).
If std.conv module is the third party library in the D package system, like CPAN, the behavior you talk is acceptable. But, it is in the standard library. I think we must select the best design for the modules under std package.
Furthermore, D is still not used widely, and avoiding breaking existing codes is too, too early IMO.

monarchdodra · 2012-10-02T15:39:12Z

Tell you what, for now, I've extract the 8729 "fix" from this PR, because it contains another fix: Proper handling of premature end of stream. Also, a more efficient internal skipWS.

Could we get this pull through alone (once I'm squashed)? This will give me more leeway to make another individual development (especially if we don't know what we want yet).

I had actually prepared a fix which I think was pretty good, and should make everyone happy. I'll push it in my (now closed) pull 827 #827

andralex · 2012-10-02T15:40:51Z

I think parse should be more strict but to should be more permissive. I have a gut feeling that people who don't normalize input before passing it to a conversion function usually use to. I have no data to back my claims though. :)

There should be no distinction in permissiveness of parse and to. The only reason for the existence of the two is that parse records progress in input and as such is composable.

If std.conv module is the third party library in the D package system, like CPAN, the behavior you talk is acceptable. But, it is in the standard library. I think we must select the best design for the modules under std package.

I agree for larger issues. But this is a small detail, and again, we're not losing functionality here. Reasonable people may argue that the library should make the more frequent behavior the default (and this is indeed how most other standard parsing conversions work in other languages).

andralex · 2012-10-02T15:42:06Z

std/conv.d

@@ -68,6 +68,7 @@ private void parseError(lazy string msg, string fn = __FILE__, size_t ln = __LIN

 private void parseCheck(alias source)(dchar c, string fn = __FILE__, size_t ln = __LINE__)
 {
+    if(source.empty) parseError(text("unexpected end of input when expecting", "\"", c, "\""));


since if ( is everywhere else in this module, please preserve it

andralex · 2012-10-02T15:47:07Z

std/conv.d

+    }
+    else
+    {
+        for ( ; !r.empty && std.ascii.isWhite(r.front) ; r.popFront())


hrm, I'd have hoped this default loop works as well as the specialized implementation

AFAIK (but @jmdavis probably know better than I do), this has something to do with iterating with unicode that makes foreach cheaper.

I could be wrong though... ? :(

#8729 fixes

andralex · 2012-10-02T15:47:29Z

merged

monarchdodra · 2012-10-02T15:48:00Z

Well, too late to squash I guess :/

monarchdodra · 2012-10-02T15:52:34Z

Just FYI, the behavior I had in mind was something akin to how "to" works:

Target parse(Target, Source)(ref Source s)
{
    skipWhite(s);
    return parseImpl(s);
}

This way, the "layman" user will just call parse, and it will work "nicely". But the one that wants more control can call "parseImpl" directly.

Calling "parseImpl" directly would imply that the input had better damn well be properly prepared. The user can the freelly call "skipWhite" manually if, when and how he feels it is necessary (or other functions). The change also avoids impacting the code too much.

Permissive and safe, but customizable. Thoughts?

9rnsr · 2012-10-02T16:01:14Z

I can not agree to enforce calling parseImpl by users.
Instead, adding require function (@AndrejMitrovic says in here) == skipWhite + strict parse is still better.

9rnsr · 2012-10-02T16:07:56Z

@andralex: Don't forget that your permissive change will break existing codes relies on the current parse functions strict behavior except parse!double. About that point, mine and your argument are same. Then, the issue is "which is breaking less code than another, and more consistent?".

ghost · 2012-10-02T16:15:08Z

How could you possibly determine which decision will end up breaking less code though? It's a guessing game. Maybe it's time to move the convo to NG.

9rnsr · 2012-10-02T16:18:51Z

There are 11 overloads of parse in std.conv, and just one overload skips leading white spaces.
Based on @andralex argument, 10/11 functions behavior will be changed. Based on mine, 1/11 will be changed.

braddr · 2012-10-02T16:24:34Z

The other general rule of thumb that I pull out in situations like this: it's possible to build loose on top of strict, but not the other way around (unless the loose version somehow gives you access to exactly how it was loose).

9rnsr · 2012-10-02T16:33:35Z

@braddr: I completely agree with you. It is a part of my opinion.

jmdavis · 2012-10-02T16:41:31Z

You're going to break code no matter what you do, but strict is what the documentation says even if it's only clear about that in the examples. Anyone who's been following what the docs say has likely not been bitten by this bug, because they were skipping whitespace when the implementation had already skipped it, but if they care about whitespace, then they will be bitten by making parse lax.

Anyone relying on parse's laxness is obviously only using a very specific subset of its functionality, because otherwise, they'd have hit bug# 8729 as reported (which was based on the incorrect understanding that parse always skipped whitespace). So, making it strict will break those people's code but not the code of anyone using many of the overloads of parse (since most of them are properly strict) and not the code of anyone who was already specifically skipping whitespace per the docs. Making it lax will avoid breaking their code but again will break the folks actually following the docs.

As you're going to break code regardless, strict is arguably the best implementation given that you can build lax on top of strict but not the other way around and that it's what the docs actually say the function does (not to mention that "be strict in what you emit but lax in what you accept" is one of the worst ideas ever to catch on in programming and has been devastating to the compatibility and standards compliance of the internet). So, while I think that having a parsing function which skips whitespace is useful, I think that it makes the most sense to make parse fully strict as originally intended and documented.

As for to, I've long argued that it should be strict with regards to whitespace whenever it's come up, and I'd hate to see it become lax just because we wanted it to be consistent with parse and we decided to make parse lax due to this bug (not to mention, I think that it's safe to say that changing to would break way more code than changing parse would).

JesseKPhillips · 2012-10-02T17:15:47Z

@AndrejMitrovic, it isn't gone, I chose a bad organization style when I started using git. I intend to break it apart, but currently it is hidden under a branch https://github.com/JesseKPhillips/JPDLibs/tree/cmdln

andralex · 2012-10-02T17:20:47Z

In this case you can build either on top of either.

string s = "  123";
// strict -> lax
auto x = s.stripLeft.to!int();
// lax -> strict
enforce(!s.empty && !isSpace(s.front));
auto x = s.to!int();

The "less breakage" argument is that increasing acceptance will cause more strings to be parsable, not fewer. Clearly there's the converse risk that more silent errors will happen.

Anyway, I don't feel all too strongly about this, and if you all do, no problem. Let's make everything more strict. If we hear user complaints later we will all have learned a good lesson.

9rnsr · 2012-10-02T17:43:32Z

Anyway, I don't feel all too strongly about this, and if you all do, no problem. Let's make everything more strict. If we hear user complaints later we will all have learned a good lesson.

OK. We might receive voices like "parse is inconvenient", but it would be better than "parse is broken" .

andralex · 2012-10-02T17:52:33Z

@monarchdodra thanks for being so understanding :o)

repeatedly · 2012-10-02T18:09:30Z

This conclusion is good to me :)

IMHO, base library should provide simple feature.
An application has own requirement, so the sanitization like ws skipping should be applied in its application layer or by specified function.

#8729 fixes

52303d1

monarchdodra added 2 commits October 2, 2012 18:17

Update std/conv.d

52fe9fd

Update std/conv.d

9207654

Update std/conv.d

35a41f1

andralex reviewed Oct 2, 2012
View reviewed changes

space after if

848d47c

andralex reviewed Oct 2, 2012
View reviewed changes

andralex added a commit that referenced this pull request Oct 2, 2012

Merge pull request #828 from monarchdodra/tobool

196107e

#8729 fixes

andralex merged commit 196107e into dlang:master Oct 2, 2012

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#8729 fixes #828

#8729 fixes #828

monarchdodra commented Oct 2, 2012

9rnsr commented Oct 2, 2012

andralex commented Oct 2, 2012

9rnsr commented Oct 2, 2012

andralex commented Oct 2, 2012

ghost commented Oct 2, 2012

9rnsr commented Oct 2, 2012

monarchdodra commented Oct 2, 2012

andralex commented Oct 2, 2012

andralex Oct 2, 2012

andralex Oct 2, 2012

monarchdodra Oct 2, 2012

andralex commented Oct 2, 2012

monarchdodra commented Oct 2, 2012

monarchdodra commented Oct 2, 2012

9rnsr commented Oct 2, 2012

9rnsr commented Oct 2, 2012

ghost commented Oct 2, 2012

9rnsr commented Oct 2, 2012

braddr commented Oct 2, 2012

9rnsr commented Oct 2, 2012

jmdavis commented Oct 2, 2012

JesseKPhillips commented Oct 2, 2012

andralex commented Oct 2, 2012

9rnsr commented Oct 2, 2012

andralex commented Oct 2, 2012

repeatedly commented Oct 2, 2012

#8729 fixes #828

#8729 fixes #828

Conversation

monarchdodra commented Oct 2, 2012

9rnsr commented Oct 2, 2012

andralex commented Oct 2, 2012

9rnsr commented Oct 2, 2012

andralex commented Oct 2, 2012

ghost commented Oct 2, 2012

9rnsr commented Oct 2, 2012

monarchdodra commented Oct 2, 2012

andralex commented Oct 2, 2012

andralex Oct 2, 2012

Choose a reason for hiding this comment

andralex Oct 2, 2012

Choose a reason for hiding this comment

monarchdodra Oct 2, 2012

Choose a reason for hiding this comment

andralex commented Oct 2, 2012

monarchdodra commented Oct 2, 2012

monarchdodra commented Oct 2, 2012

9rnsr commented Oct 2, 2012

9rnsr commented Oct 2, 2012

ghost commented Oct 2, 2012

9rnsr commented Oct 2, 2012

braddr commented Oct 2, 2012

9rnsr commented Oct 2, 2012

jmdavis commented Oct 2, 2012

JesseKPhillips commented Oct 2, 2012

andralex commented Oct 2, 2012

9rnsr commented Oct 2, 2012

andralex commented Oct 2, 2012

repeatedly commented Oct 2, 2012