#8729 fixes #817

Closed
wants to merge 2 commits into
from

4 participants
Collaborator

monarchdodra commented Sep 27, 2012

This PR contains:

  1. 8729: [parse|to]!double should NOT accept " 123.5"
  2. Typo: "case-insensive" => "case-insensitive"
  3. Proper checking in the array parser: it accessed the front of the range without first checking that any elements were left. This triggered assertion failures in cases where parse promises to throw a ConvException.

Anything regarding skip white has now been removed from this pull (which is now just a bug fix).
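
As an illustration of item 1, the behavior the fix aims for can be sketched as a small check. This assumes a Phobos build that includes the fix; the variable names are only illustrative.

```d
import std.conv : ConvException, parse;
import std.exception : assertThrown;

void main()
{
    // Leading whitespace is not part of a floating-point literal,
    // so parse is expected to reject it (assumes the fix is present).
    string padded = " 123.5";
    assertThrown!ConvException(parse!double(padded));

    // Without the leading space, the value parses and the input is consumed.
    string plain = "123.5";
    assert(parse!double(plain) == 123.5);
    assert(plain.length == 0);
}
```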

skipWhite is now a standalone enhancement request: D-Programming-Language#827

@monarchdodra monarchdodra commented on the diff Sep 27, 2012

std/conv.d
@@ -1711,8 +1727,8 @@ Target parse(Target, Source)(ref Source s)
else
{
// Larger than int types
- if (s.empty)
- goto Lerr;
@monarchdodra

monarchdodra Sep 27, 2012

Collaborator

This test was not needed, as the code naturally handles the empty case. No point in paying every time for a case which should rarely happen.

@monarchdodra monarchdodra commented on an outdated diff Sep 27, 2012

std/conv.d
@@ -2719,7 +2766,8 @@ unittest
private void skipWS(R)(ref R r)
{
- skipAll(r, ' ', '\n', '\t', '\r');
+ for ( ; !r.empty && std.uni.isWhite(r.front) ; r.popFront())
+ { }
@monarchdodra

monarchdodra Sep 27, 2012

Collaborator

This code was changed because it did not properly support Unicode whitespace (such as the Japanese full-width space and whatnot).

It may be (arguably) more efficient to boot.

Member

9rnsr commented Sep 28, 2012

This is an incorrect change. The parse family should not skip leading whitespace implicitly.
So only parse!double should be fixed.

Member

9rnsr commented Sep 28, 2012

There are some reasons:

  • It's based on the Unix philosophy: "Write programs that do one thing and do it well". It's really bad that some parse functions remove the leading WS while others don't.
  • The definition of whitespace depends on the application.
    If you write a text processor, skipping "fullwidth whitespace (U+3000)" would be expected, but if you write a D source code parser, it's not expected (See here).
  • With UFCS, you can express the code behavior as follows:
string input = "...";
int n = input.parse!int();
double f = input.skipWhiteSpace.parse!double();
// parse double after removing leading whitespace
@ghost

ghost commented Sep 28, 2012

That's input.stripLeft.parse.. btw.

Member

9rnsr commented Sep 28, 2012

That's input.stripLeft.parse.. btw.

It's not so bad. But, unfortunately, std.string.stripLeft doesn't work for character ranges, just for [wd]?string.

Collaborator

monarchdodra commented Sep 28, 2012

OK, 3 things:

  1. I'll roll back the change and fix the floating point case. I'll also investigate the array parser, which I think may also be skipping leading ws.
  2. I'll add a documentation note, as there is apparently some confusion regarding proper behavior of parse. (along with recommended ways of skipping ws)
  3. Regarding the definition of "ws": I thought parse was meant as an end user function, not an internal parser...? Was my change in "skipWS" wrong, or just arguable?
Collaborator

monarchdodra commented Sep 28, 2012

Technically:
double f = input.stripLeft.parse!double();
will not work, because stripLeft takes and returns by value, so:

  1. parse will refuse to bind a ref to a temporary
  2. even if it did, the original input would not be modified.

Nothing a 2 liner can't fix of course.
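
One possible reading of that two-liner, as a minimal sketch: assign the stripped string back to the variable first, then parse from the variable. The input value here is made up for illustration.

```d
import std.conv : parse;
import std.string : stripLeft;

void main()
{
    string input = "  12.5 rest";

    // Line 1: strip into the named variable, so there is an lvalue to bind to.
    input = input.stripLeft;
    // Line 2: parse consumes the number from that same variable.
    double f = input.parse!double();

    assert(f == 12.5);
    assert(input == " rest"); // the original variable was advanced
}
```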

Member

9rnsr commented Sep 28, 2012

Regarding the definition of "ws": I thought parse was meant as an end user function, not an internal parser...? Was my change in "skipWS" wrong, or just arguable?

skipWS is only used for compound literal parsing: array literals and associative array literals.
So it seems wrong to me for parse!(int[]) to parse the string "[1, 2,\u30003]" as [1,2,3].

Owner

andralex commented Sep 28, 2012

The reason for the rigid design of parse is exactly as @9rnsr mentioned. My intent was to have 100% control over the parsed format if there's a need for it. However, it's also true that most often you do want to skip whitespace. So maybe my design was wrong because I made the less common case the default. I'm thinking it would be okay to skip whitespace by default. People who do NOT want to skip whitespace can look at r.front and see if it's a whitespace. That's more work for them, but probably it's a rare case. Thoughts?

Member

9rnsr commented Sep 28, 2012

I'm thinking it would be okay to skip whitespace by default.

Maybe it is true, and I can agree with it. But, I'm worried about parse!SomeChar (code).

It strips just one leading character from the given input. If we introduce the skip-leading-WS behavior, will it become the one overload with special behavior? Or will it strip one or more characters?
In either case, inconsistent behavior among the parse overloads will be introduced.
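
To make the concern concrete, here is a small sketch of what "strips just one leading character" means for a string input. It assumes the character overload behaves as described in this comment; the input is made up.

```d
import std.conv : parse;

void main()
{
    string s = " ab";

    // The character overload takes exactly one code unit, even a space.
    char c = parse!char(s);

    assert(c == ' ');
    assert(s == "ab"); // only one character was consumed
}
```

If leading whitespace were skipped here as well, the same call would presumably return 'a' and consume two characters, which is the inconsistency being debated.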

Collaborator

monarchdodra commented Sep 28, 2012

skipWS is just used for compound literal parsing: array literal and associative array literal.

True, I thought about that during the night. Good point.

Maybe it is true, and I can agree with it. But, I'm worried about parse!SomeChar.

It strips just one leading character from the given input. If we introduce the skip-leading-WS behavior, will it become the one overload with special behavior? Or will it strip one or more characters?

Arguably, I'd expect parse!SomeChar to extract the first non-ws, and strip one or more characters. If "parse" is designed to work anything like C++'s stream parsing (as I thought it did, and apparently, Jonathan M Davis too), then ws is stripped away, including for chars:

std::stringstream ss("123    a");
int i;
char a;
ss >> i >> a;
assert(i == 123);
assert(a == 'a');

In either case, the inconsistent behavior among parse overloads will be introduced.

Actually, that would be consistent behavior, no?

On a side note, your point highlights that my fix was not "complete" because I did not do anything for parse!SomeChar. Gonna fix that now.

My proposal: I'm going to finish this PR, for full support of skip ws parsing, because I already started it. We can then discuss the change in the forums?

In the meantime, I'll also do a clean simple fix for no-strip with doubles (which also has the 1-2 issues I found).

Member

jmdavis commented Sep 29, 2012

I thought that parse did skip whitespace, and it does appear to when parsing integers and floats. So it looks like its current behavior is inconsistent regardless of what it should be doing. Personally, I think that it would be most useful if it stripped leading whitespace for you, so that you can parse without worrying about whitespace but can also parse while caring about it, because you can explicitly handle the whitespace after parsing a value (since it would only be stripped from in front of a value when parsing it). The one downside to that approach that I can think of is that if you were ignoring whitespace and the string ended with whitespace and you were parsing as long as the string wasn't empty, you'd end up trying to parse whitespace and presumably end up with an exception. It could be solved by making parse configurable with regards to whether it strips whitespace or not, but I don't know if that's complicating things too much or not.

Collaborator

monarchdodra commented Sep 29, 2012

The one downside to that approach that I can think of is that if you were ignoring whitespace and the string ended with whitespace and you were parsing as long as the string wasn't empty, you'd end up trying to parse whitespace and presumably end up with an exception.

You know, that is a very good point actually. I don't think there is a good solution either.

It could be solved by making parse configurable with regards to whether it strips whitespace or not, but I don't know if that's complicating things too much or not.

I think that would be very complicated.


I had thought of a solution I hadn't mentioned yet, but I'm starting to think that maybe just adding a "parseWhite" function which would take and return by ref would be a simple, convenient, low impact and configurable fix, all in one!

string ss = "123 12.5";
int i = ss.parseWhite().parse!int();
double d = ss.parseWhite().parse!double();

or

string ss = "...";
while(!ss.parseWhite().empty())
    writeln(ss.parse!int());

I think it would be a really good solution. Thoughts?

PS: This would NOT be duplication with stripLeft because...

  1. stripLeft only operates on natural strings, not on generic ranges of characters, such as cycle!(char[])
  2. stripLeft can't be UFCS chained into parse and does not modify by ref, which makes it verbose and inconvenient.
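
A minimal sketch of the helper proposed above, assuming ASCII whitespace only and using the name skipWhite that the thread later settles on. The signature and constraint are assumptions made for illustration, not the code from this pull request.

```d
import std.ascii : isWhite;
import std.conv : parse;
import std.range.primitives : empty, front, popFront, isInputRange;

/// Hypothetical helper: drops leading ASCII whitespace in place and
/// returns the same range by ref, so it can be chained into parse.
ref R skipWhite(R)(return ref R r)
if (isInputRange!R)
{
    while (!r.empty && isWhite(r.front))
        r.popFront();
    return r;
}

void main()
{
    string ss = "123 12.5";
    int i = ss.skipWhite().parse!int();
    double d = ss.skipWhite().parse!double();
    assert(i == 123 && d == 12.5);

    // Trailing whitespace is consumed by the loop condition itself,
    // so parse is never called on whitespace-only input.
    string list = "1 2 3  ";
    int sum = 0;
    while (!list.skipWhite().empty)
        sum += list.parse!int();
    assert(sum == 6);
}
```

Because the helper returns its argument by ref, parse binds to the caller's variable rather than to a temporary, which is the property stripLeft lacks.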

@monarchdodra monarchdodra commented on the diff Sep 29, 2012

std/conv.d
+ foreach(i, dchar c; r)
+ {
+ if(!std.ascii.isWhite(c))
+ {
+ r = r[i .. $];
+ return;
+ }
+ }
+ r = r[0 .. 0]; //Empty string with correct type.
+ return;
+ }
+ else
+ {
+ for ( ; !r.empty && std.ascii.isWhite(r.front) ; r.popFront())
+ { }
+ }
}
@monarchdodra

monarchdodra Sep 29, 2012

Collaborator

Arguably more efficient in all cases.

@monarchdodra monarchdodra commented on the diff Sep 29, 2012

std/conv.d
@@ -2788,6 +2838,14 @@ unittest
assert(a2 == ["aaa", "bbb", "ccc"]);
}
+unittest
+{
+ //Check proper failure
+ auto ss = "[ 1 , 2 , 3 ]";
+ foreach(i ; 0..ss.length-1)
+ assertThrown!ConvException(parse!(int[])(ss[0 .. i]));
+}
@monarchdodra

monarchdodra Sep 29, 2012

Collaborator

!Important test!

Collaborator

monarchdodra commented Sep 29, 2012

Done! Thoughts?

Member

9rnsr commented Sep 29, 2012

I like this direction, except for one point: the name parseWhite.
It just skips leading whitespace rather than parsing it, because parseWhite does not return what it parsed.
So I think skipWhite is a better name.

Collaborator

monarchdodra commented Sep 29, 2012

I like this direction, except for one point: the name parseWhite.
It just skips leading whitespace rather than parsing it, because parseWhite does not return what it parsed.
So I think skipWhite is a better name.

I had the exact same thought, but at the same time, I would have liked the word "parse" to appear in the function name... :/

skipWhite is probably better though.

Collaborator

monarchdodra commented Oct 1, 2012

Per the "1 change per pull" rule, I removed the skipWhite development from this PR. Please see D-Programming-Language#827 instead.

@monarchdodra monarchdodra reopened this Oct 2, 2012

@monarchdodra monarchdodra referenced this pull request Oct 2, 2012

Merged

#8729 fixes #828
