Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#8729 fixes #828

Merged
merged 5 commits into from
Oct 2, 2012
Merged

#8729 fixes #828

merged 5 commits into from
Oct 2, 2012

Conversation

monarchdodra
Copy link
Collaborator

This PR contains:

[REMOVED]8729: [parse|to]!double should NOT accept " 123.5"[REMOVED]
Typo: "case-insensive" => "case-insensitive"
Proper checking in parse array: These boldly accessed the front of the range, without checking for elements: This created asserts, when parse promises to throw a ConvException.

Anything regarding skip white has now been removed from this pull (which is now just a bug fix).

skipWhite is now a standalone ER @ 827 #827

[EDIT]: Previous branch accidentally deleted with gihub for windows, then botched at reviving. sorry :(
#817

@9rnsr
Copy link
Contributor

9rnsr commented Oct 2, 2012

LGTM. @andralex , Is it OK to merge this?

@andralex
Copy link
Member

andralex commented Oct 2, 2012

@9rnsr, @monarchdodra: I've been thinking more about this all and unfortunately my conclusion went the other way. Allow me to explain.

As I mentioned, my initial intent was "make parsing strict by default, let the user add permissiveness as needed". I think this goal is good and technically sound. However, there was this slippage in implementation that allowed whitespace preceding floating point values to be ignored.

Now, this diff fixes that bug and makes everything consistent. However, it will break the code of everybody who parses floats and ignores whitespace. This is undesirable.

A less disruptive change would be to increase permissiveness for everyone. Increasing permissiveness does bereak code. However, little, if any, code is likely to rely on the strict behavior as a feature. (Most likely, code is carefully skipping whitespace explicitly.) The underlying school of thought here would be "be liberal in what you accept and conservative in what you send". Also, there is no net loss of expressiveness because user code can always enforce that there's no preceding whitespace by looking at .front.

What do you think? @monarchdodra, sorry for asking you to throw work away.

@9rnsr
Copy link
Contributor

9rnsr commented Oct 2, 2012

"be liberal in what you accept and conservative in what you send"

I agree with the school basically, however we should not apply it in this case, because it would work just only for the limited underlying text protocol (e.g. http protocol header, it is ASCII-based).
But, parse would be used for general text parsing - used for the unlimited underlying text protocol. Various spaces are defined in Unicode, and often, skipping all of them would be unexpected.

Unicode might be the most used text code set, but it does not means that it is a silver bullet for the text processing. I think, unfortunately, if we want to make parse function convenient, we should stop adding permissive behaviors.

@andralex
Copy link
Member

andralex commented Oct 2, 2012

From a technical perspective, I also agree going more strict is better. But from a sociological perspective, I think we should avoid breakage if we can. My liberal vs. conservative quote was an attempt to better motivate the choice technically.

I think what I'm saying here is that I think we should go with the technically inferior solution because it is sociologically superior.

@ghost
Copy link

ghost commented Oct 2, 2012

I think parse should be more strict but to should be more permissive. I have a gut feeling that people who don't normalize input before passing it to a conversion function usually use to. I have no data to back my claims though. :)

Also, people often request how to read ints/floats/strings from stdin, and a much better alternative to to is the cmdln.interact library from @JesseKPhillips which is a small but versatile set of 3-4 functions (~200 lines with comments). It seems to have disappeared from github though (although I have a copy and it's boost-licensed). I mean check this out:

auto name = userInput("What is your name");
if (userInput!bool("Do you want to continue?")) { }
auto num = require!(int, "a > 0 && a <= 10")("Enter a number from 1 to 10");

That stuff is probably what most people need rather than to or parse, at least when dealing with input strings.

@9rnsr
Copy link
Contributor

9rnsr commented Oct 2, 2012

Frankly speaking, I don't think that current Phobos library is less worth as the standard library of D. Because it seems to me that is a just collection of social decision, except some modules (e.g. std.algorithm, std.range).
If std.conv module is the third party library in the D package system, like CPAN, the behavior you talk is acceptable. But, it is in the standard library. I think we must select the best design for the modules under std package.
Furthermore, D is still not used widely, and avoiding breaking existing codes is too, too early IMO.

@monarchdodra
Copy link
Collaborator Author

Tell you what, for now, I've extract the 8729 "fix" from this PR, because it contains another fix: Proper handling of premature end of stream. Also, a more efficient internal skipWS.

Could we get this pull through alone (once I'm squashed)? This will give me more leeway to make another individual development (especially if we don't know what we want yet).

I had actually prepared a fix which I think was pretty good, and should make everyone happy. I'll push it in my (now closed) pull 827 #827

@andralex
Copy link
Member

andralex commented Oct 2, 2012

I think parse should be more strict but to should be more permissive. I have a gut feeling that people who don't normalize input before passing it to a conversion function usually use to. I have no data to back my claims though. :)

There should be no distinction in permissiveness of parse and to. The only reason for the existence of the two is that parse records progress in input and as such is composable.

If std.conv module is the third party library in the D package system, like CPAN, the behavior you talk is acceptable. But, it is in the standard library. I think we must select the best design for the modules under std package.

I agree for larger issues. But this is a small detail, and again, we're not losing functionality here. Reasonable people may argue that the library should make the more frequent behavior the default (and this is indeed how most other standard parsing conversions work in other languages).

@@ -68,6 +68,7 @@ private void parseError(lazy string msg, string fn = __FILE__, size_t ln = __LIN

private void parseCheck(alias source)(dchar c, string fn = __FILE__, size_t ln = __LINE__)
{
if(source.empty) parseError(text("unexpected end of input when expecting", "\"", c, "\""));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since if ( is everywhere else in this module, please preserve it

}
else
{
for ( ; !r.empty && std.ascii.isWhite(r.front) ; r.popFront())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrm, I'd have hoped this default loop works as well as the specialized implementation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK (but @jmdavis probably know better than I do), this has something to do with iterating with unicode that makes foreach cheaper.

I could be wrong though... ? :(

andralex added a commit that referenced this pull request Oct 2, 2012
@andralex andralex merged commit 196107e into dlang:master Oct 2, 2012
@andralex
Copy link
Member

andralex commented Oct 2, 2012

merged

@monarchdodra
Copy link
Collaborator Author

Well, too late to squash I guess :/

@monarchdodra
Copy link
Collaborator Author

Just FYI, the behavior I had in mind was something akin to how "to" works:

Target parse(Target, Source)(ref Source s)
{
    skipWhite(s);
    return parseImpl(s);
}

This way, the "layman" user will just call parse, and it will work "nicely". But the one that wants more control can call "parseImpl" directly.

Calling "parseImpl" directly would imply that the input had better damn well be properly prepared. The user can the freelly call "skipWhite" manually if, when and how he feels it is necessary (or other functions). The change also avoids impacting the code too much.

Permissive and safe, but customizable. Thoughts?

@9rnsr
Copy link
Contributor

9rnsr commented Oct 2, 2012

I can not agree to enforce calling parseImpl by users.
Instead, adding require function (@AndrejMitrovic says in here) == skipWhite + strict parse is still better.

@9rnsr
Copy link
Contributor

9rnsr commented Oct 2, 2012

@andralex: Don't forget that your permissive change will break existing codes relies on the current parse functions strict behavior except parse!double. About that point, mine and your argument are same. Then, the issue is "which is breaking less code than another, and more consistent?".

@ghost
Copy link

ghost commented Oct 2, 2012

How could you possibly determine which decision will end up breaking less code though? It's a guessing game. Maybe it's time to move the convo to NG.

@9rnsr
Copy link
Contributor

9rnsr commented Oct 2, 2012

There are 11 overloads of parse in std.conv, and just one overload skips leading white spaces.
Based on @andralex argument, 10/11 functions behavior will be changed. Based on mine, 1/11 will be changed.

@braddr
Copy link
Member

braddr commented Oct 2, 2012

The other general rule of thumb that I pull out in situations like this: it's possible to build loose on top of strict, but not the other way around (unless the loose version somehow gives you access to exactly how it was loose).

@9rnsr
Copy link
Contributor

9rnsr commented Oct 2, 2012

@braddr: I completely agree with you. It is a part of my opinion.

@jmdavis
Copy link
Member

jmdavis commented Oct 2, 2012

You're going to break code no matter what you do, but strict is what the documentation says even if it's only clear about that in the examples. Anyone who's been following what the docs say has likely not been bitten by this bug, because they were skipping whitespace when the implementation had already skipped it, but if they care about whitespace, then they will be bitten by making parse lax.

Anyone relying on parse's laxness is obviously only using a very specific subset of its functionality, because otherwise, they'd have hit bug# 8729 as reported (which was based on the incorrect understanding that parse always skipped whitespace). So, making it strict will break those people's code but not the code of anyone using many of the overloads of parse (since most of them are properly strict) and not the code of anyone who was already specifically skipping whitespace per the docs. Making it lax will avoid breaking their code but again will break the folks actually following the docs.

As you're going to break code regardless, strict is arguably the best implementation given that you can build lax on top of strict but not the other way around and that it's what the docs actually say the function does (not to mention that "be strict in what you emit but lax in what you accept" is one of the worst ideas ever to catch on in programming and has been devastating to the compatibility and standards compliance of the internet). So, while I think that having a parsing function which skips whitespace is useful, I think that it makes the most sense to make parse fully strict as originally intended and documented.

As for to, I've long argued that it should be strict with regards to whitespace whenever it's come up, and I'd hate to see it become lax just because we wanted it to be consistent with parse and we decided to make parse lax due to this bug (not to mention, I think that it's safe to say that changing to would break way more code than changing parse would).

@JesseKPhillips
Copy link
Contributor

@AndrejMitrovic, it isn't gone, I chose a bad organization style when I started using git. I intend to break it apart, but currently it is hidden under a branch https://github.com/JesseKPhillips/JPDLibs/tree/cmdln

@andralex
Copy link
Member

andralex commented Oct 2, 2012

  1. In this case you can build either on top of either.
string s = "  123";
// strict -> lax
auto x = s.stripLeft.to!int();
// lax -> strict
enforce(!s.empty && !isSpace(s.front));
auto x = s.to!int();
  1. The "less breakage" argument is that increasing acceptance will cause more strings to be parsable, not fewer. Clearly there's the converse risk that more silent errors will happen.

Anyway, I don't feel all too strongly about this, and if you all do, no problem. Let's make everything more strict. If we hear user complaints later we will all have learned a good lesson.

@9rnsr
Copy link
Contributor

9rnsr commented Oct 2, 2012

Anyway, I don't feel all too strongly about this, and if you all do, no problem. Let's make everything more strict. If we hear user complaints later we will all have learned a good lesson.

OK. We might receive voices like "parse is inconvenient", but it would be better than "parse is broken" .

@andralex
Copy link
Member

andralex commented Oct 2, 2012

@monarchdodra thanks for being so understanding :o)

@repeatedly
Copy link
Member

This conclusion is good to me :)

IMHO, base library should provide simple feature.
An application has own requirement, so the sanitization like ws skipping should be applied in its application layer or by specified function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants