Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#8729 fixes #817

Closed
wants to merge 2 commits into from
Closed

#8729 fixes #817

wants to merge 2 commits into from

Conversation

monarchdodra
Copy link
Collaborator

This PR contains:

  1. 8729: [parse|to]!double should NOT accept " 123.5"
  2. Typo: "case-insensive" => "case-insensitive"
  3. Proper checking in parse array: These boldly accessed the front of the range, without checking for elements: This created asserts, when parse promises to throw a ConvException.

Anything regarding skip white has now been removed from this pull (which is now just a bug fix).

skipWhite is now a standalone ER @ 827 #827

@@ -1711,8 +1727,8 @@ Target parse(Target, Source)(ref Source s)
else
{
// Larger than int types
if (s.empty)
goto Lerr;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was not needed, as the code naturally handles empty. No point in paying every time for a case which should rarilly happen.

@9rnsr
Copy link
Contributor

9rnsr commented Sep 28, 2012

This is incorrect change. The parse family should not skip leading white spaces implicitly.
So, just only parse!double should be fixed.

@9rnsr
Copy link
Contributor

9rnsr commented Sep 28, 2012

There are some reasons:

  • It's based on unix philosophy: "Write programs that do one thing and do it well". It's really bad that "some parse functions may remove the leading WS, others doesn't".
  • The definition of whitespace depends on the application work.
    If you write a text processor, skipping "fullwidth whitespace(U+3000)" would be expected, but if you write a D source code parser, it's not expected (See here).
  • With UFCS, you can express the code behavior like follows:
string input = "..."
int n = input.parse!int();
double f = input.skipWhiteSpace.parse!double();
// parse double after removing leading whitespaces

@ghost
Copy link

ghost commented Sep 28, 2012

That's input.stripLeft.parse.. btw.

@9rnsr
Copy link
Contributor

9rnsr commented Sep 28, 2012

That's input.stripLeft.parse.. btw.

It's not so bad. But, unfortunately, std.string.stripLeft doesn't work for character ranges, just for [wd]?string.

@monarchdodra
Copy link
Collaborator Author

OK, 3 things:

  1. I'll rollback the change and fix floating point type case. I'll also investigate the array parser, which I think may also be skiping leading ws.
  2. I'll add a documentation note, as there is apparently some confusion regarding proper behavior of parse. (along with recommended ways of skipping ws)
  3. Regarding the definition of "ws": I thought parse was meant as an end user function, not an internal parser...? Was my change in "skipWS" wrong, or just arguable?

@monarchdodra
Copy link
Collaborator Author

Technically:
double f = input.stripLeft.parse!double();
Will not work, because stripLeft will take and return by value, so:

  1. parse will reffuse to bind a ref to a temporary
  2. even if it did, the original input would not be modified.

Nothing a 2 liner can't fix of course.

@9rnsr
Copy link
Contributor

9rnsr commented Sep 28, 2012

Regarding the definition of "ws": I thought parse was meant as an end user function, not an internal parser...? Was my change in "skipWS" wrong, or just arguable?

skipWS is just used for compound literal parsing: array literal and associative array literal.
Then, it is wrong to me that parsing the string "[1, 2,\u30003]" to [1,2,3] by parse!(int[]).

@andralex
Copy link
Member

The reason for the rigid design or parse is exactly as @9rnsr mentioned. My intent was to have 100% control over the parsed format if there's a need for it. However, it's also true that most often you do want to skip whitespace. So maybe my design was wrong because I made the less common case the default. I'm thinking it would be okay to skip whitespace by default. People who do NOT want to skip whitespace can look at r.front and see if it's a whitespace. That's more work for them, but probably it's a rare case. Thoughts?

@9rnsr
Copy link
Contributor

9rnsr commented Sep 28, 2012

I'm thinking it would be okay to skip whitespace by default.

Maybe it is true, and I can agree with it. But, I'm worried about parse!SomeChar (code).

It strips just one leading character from the given input. If we introduce the skip leading WS behavior, it will become just one overload which has a special behavior? Or will become to strip one or more characters?
In either case, the inconsistent behavior among parse overloads will be introduced.

@monarchdodra
Copy link
Collaborator Author

skipWS is just used for compound literal parsing: array literal and associative array literal.

True, I though about that during the night. Good point.

Maybe it is true, and I can agree with it. But, I'm worried about parse!SomeChar.

It strips just one leading character from the given input. If we introduce the skip leading WS behavior, it will become just one overload which has a special behavior? Or will become to strip one or more characters?

Arguably, I'd expect parse!SomeChar to extract the first non-ws, and strip one or more characters. If "parse" is designed to work anything like C++'s stream parsing (as I thought it did, and apparently, Jonathan M Davis too), then ws is stripped away, including for chars:

std::stringstream ss("123    a");
int i;
char a;
ss >> i >> a;
assert(i == 123);
assert(a == 'a');

In either case, the inconsistent behavior among parse overloads will be introduced.

Actually, that would be consistent behavior, no?

On a side note, your point highlights that my fix was not "complete" because I did not do anything for parse!SomeChar. Gonna fix that now.

My proposal: I'm going to finish this PR, for full support of skip ws parsing, because I already started it. We can then discuss the change in the forums?

In the meantime, I'll also do a clean simple fix for no-strip with doubles (which also has the 1-2 issues I found).

@jmdavis
Copy link
Member

jmdavis commented Sep 29, 2012

I thought that parse did skip whitespace, and it does appear to when parsing integers and floats. So, it looks like it's current behavior is inconsistent regardless of what it should be doing. Personally, I think that it would be most useful if it strip leading whitespace for you so that you can parse without worrying about whitespace but also parse with caring about it, because you can explicitly handle the whitespace after parsing a value (since it would be stripped from in front of a value when parsing it). The one downside to that approach that I can think of is that if you were ignoring whitespace and the string ended with whitespace and you were parsing as long as the string wasn't empty, you'd end up trying to parse whitespace and presumably end up with an exception. It could be solved by making parse configurable with regards to whether it strips whitespace or not, but I don't know if that's complicating things too much or not.

@monarchdodra
Copy link
Collaborator Author

The one downside to that approach that I can think of is that if you were ignoring whitespace and the string ended with whitespace and you were parsing as long as the string wasn't empty, you'd end up trying to parse whitespace and presumably end up with an exception.

You know, that is a very good point actually. I don't think there is a good solution either.

It could be solved by making parse configurable with regards to whether it strips whitespace or not, but I don't know if that's complicating things too much or not.

I think that would be very complicated.


I had thought of a solution I hadn't mentioned yet, but I'm starting to think that maybe just adding a "parseWhite" function which would take and return by ref would be a simple, convenient, low impact and configurable fix, all in one!

string ss = "123 12.5"
int i = ss.parseWhite().parse!double();
double d = ss.parseWhite().parse!double();

or

string ss = "..."
while(!ss.parseWhite().empty())
    writeln(ss.parse!int());

I think it would be a really good solution. Thoughts?

PS: This would NOT be duplication with stripLeft because...

  1. Strip left only operates on natural strings, but not generic range of characters, such as cycle!(char[])
  2. Strip left can't be ufcs chained, and does not modify by ref, which would make it verboselly inconvenient.

{
for ( ; !r.empty && std.ascii.isWhite(r.front) ; r.popFront())
{ }
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably more efficient in all cases.

@monarchdodra
Copy link
Collaborator Author

Done! Thoughts?

@9rnsr
Copy link
Contributor

9rnsr commented Sep 29, 2012

I like this direction, except one point: it is the name of parseWhite.
It just skip leading white spaces instead of parse them, because parseWhite does not return parsed them.
So, I think that skipWhite is better name than it..

@monarchdodra
Copy link
Collaborator Author

I like this direction, except one point: it is the name of parseWhite.
It just skip leading white spaces instead of parse them, because parseWhite does not return parsed them.
So, I think that skipWhite is better name than it..

I had the exact same thought, but at the same time, I would have liked the word "parse" to appear in the function name... :/

skipWhite is probably better though.

@monarchdodra
Copy link
Collaborator Author

Per the "1 change per pull", I removed the skipWhite dev from this PR. Please see #827 instead

@monarchdodra monarchdodra reopened this Oct 2, 2012
@monarchdodra monarchdodra mentioned this pull request Oct 2, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants