-
-
Notifications
You must be signed in to change notification settings - Fork 701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#8729 fixes #828
#8729 fixes #828
Conversation
LGTM. @andralex , Is it OK to merge this? |
@9rnsr, @monarchdodra: I've been thinking more about this all and unfortunately my conclusion went the other way. Allow me to explain. As I mentioned, my initial intent was "make parsing strict by default, let the user add permissiveness as needed". I think this goal is good and technically sound. However, there was this slippage in implementation that allowed whitespace preceding floating point values to be ignored. Now, this diff fixes that bug and makes everything consistent. However, it will break the code of everybody who parses floats and ignores whitespace. This is undesirable. A less disruptive change would be to increase permissiveness for everyone. Increasing permissiveness does bereak code. However, little, if any, code is likely to rely on the strict behavior as a feature. (Most likely, code is carefully skipping whitespace explicitly.) The underlying school of thought here would be "be liberal in what you accept and conservative in what you send". Also, there is no net loss of expressiveness because user code can always enforce that there's no preceding whitespace by looking at What do you think? @monarchdodra, sorry for asking you to throw work away. |
I agree with the school basically, however we should not apply it in this case, because it would work just only for the limited underlying text protocol (e.g. http protocol header, it is ASCII-based). Unicode might be the most used text code set, but it does not means that it is a silver bullet for the text processing. I think, unfortunately, if we want to make |
From a technical perspective, I also agree going more strict is better. But from a sociological perspective, I think we should avoid breakage if we can. My liberal vs. conservative quote was an attempt to better motivate the choice technically. I think what I'm saying here is that I think we should go with the technically inferior solution because it is sociologically superior. |
I think Also, people often request how to read ints/floats/strings from stdin, and a much better alternative to auto name = userInput("What is your name");
if (userInput!bool("Do you want to continue?")) { }
auto num = require!(int, "a > 0 && a <= 10")("Enter a number from 1 to 10"); That stuff is probably what most people need rather than |
Frankly speaking, I don't think that current Phobos library is less worth as the standard library of D. Because it seems to me that is a just collection of social decision, except some modules (e.g. std.algorithm, std.range). |
Tell you what, for now, I've extract the 8729 "fix" from this PR, because it contains another fix: Proper handling of premature end of stream. Also, a more efficient internal skipWS. Could we get this pull through alone (once I'm squashed)? This will give me more leeway to make another individual development (especially if we don't know what we want yet). I had actually prepared a fix which I think was pretty good, and should make everyone happy. I'll push it in my (now closed) pull 827 #827 |
There should be no distinction in permissiveness of
I agree for larger issues. But this is a small detail, and again, we're not losing functionality here. Reasonable people may argue that the library should make the more frequent behavior the default (and this is indeed how most other standard parsing conversions work in other languages). |
@@ -68,6 +68,7 @@ private void parseError(lazy string msg, string fn = __FILE__, size_t ln = __LIN | |||
|
|||
private void parseCheck(alias source)(dchar c, string fn = __FILE__, size_t ln = __LINE__) | |||
{ | |||
if(source.empty) parseError(text("unexpected end of input when expecting", "\"", c, "\"")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since if (
is everywhere else in this module, please preserve it
} | ||
else | ||
{ | ||
for ( ; !r.empty && std.ascii.isWhite(r.front) ; r.popFront()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrm, I'd have hoped this default loop works as well as the specialized implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK (but @jmdavis probably know better than I do), this has something to do with iterating with unicode that makes foreach cheaper.
I could be wrong though... ? :(
merged |
Well, too late to squash I guess :/ |
Just FYI, the behavior I had in mind was something akin to how "to" works: Target parse(Target, Source)(ref Source s) { skipWhite(s); return parseImpl(s); } This way, the "layman" user will just call parse, and it will work "nicely". But the one that wants more control can call "parseImpl" directly. Calling "parseImpl" directly would imply that the input had better damn well be properly prepared. The user can the freelly call "skipWhite" manually if, when and how he feels it is necessary (or other functions). The change also avoids impacting the code too much. Permissive and safe, but customizable. Thoughts? |
I can not agree to enforce calling |
@andralex: Don't forget that your permissive change will break existing codes relies on the current |
How could you possibly determine which decision will end up breaking less code though? It's a guessing game. Maybe it's time to move the convo to NG. |
There are 11 overloads of |
The other general rule of thumb that I pull out in situations like this: it's possible to build loose on top of strict, but not the other way around (unless the loose version somehow gives you access to exactly how it was loose). |
@braddr: I completely agree with you. It is a part of my opinion. |
You're going to break code no matter what you do, but strict is what the documentation says even if it's only clear about that in the examples. Anyone who's been following what the docs say has likely not been bitten by this bug, because they were skipping whitespace when the implementation had already skipped it, but if they care about whitespace, then they will be bitten by making Anyone relying on As you're going to break code regardless, strict is arguably the best implementation given that you can build lax on top of strict but not the other way around and that it's what the docs actually say the function does (not to mention that "be strict in what you emit but lax in what you accept" is one of the worst ideas ever to catch on in programming and has been devastating to the compatibility and standards compliance of the internet). So, while I think that having a parsing function which skips whitespace is useful, I think that it makes the most sense to make As for |
@AndrejMitrovic, it isn't gone, I chose a bad organization style when I started using git. I intend to break it apart, but currently it is hidden under a branch https://github.com/JesseKPhillips/JPDLibs/tree/cmdln |
string s = " 123";
// strict -> lax
auto x = s.stripLeft.to!int();
// lax -> strict
enforce(!s.empty && !isSpace(s.front));
auto x = s.to!int();
Anyway, I don't feel all too strongly about this, and if you all do, no problem. Let's make everything more strict. If we hear user complaints later we will all have learned a good lesson. |
OK. We might receive voices like "parse is inconvenient", but it would be better than "parse is broken" . |
@monarchdodra thanks for being so understanding :o) |
This conclusion is good to me :) IMHO, base library should provide simple feature. |
This PR contains:
[REMOVED]8729: [parse|to]!double should NOT accept " 123.5"[REMOVED]
Typo: "case-insensive" => "case-insensitive"
Proper checking in parse array: These boldly accessed the front of the range, without checking for elements: This created asserts, when parse promises to throw a ConvException.
Anything regarding skip white has now been removed from this pull (which is now just a bug fix).
skipWhite is now a standalone ER @ 827 #827
[EDIT]: Previous branch accidentally deleted with gihub for windows, then botched at reviving. sorry :(
#817