support html named entities in std.conv.parseEscape #9600
dmitry.olsh (@DmitryOlshansky) commented on 2013-03-01T02:59:43Z:
Is it documented anywhere that std.conv.parse should follow D lexer conventions on parsing??
If not I guess we shouldn't pretend it does and pull the whole freaking table of HTML4/5 entities in *every* program that uses parse to read a couple of ints.
monarchdodra commented on 2013-03-01T03:27:10Z:
(In reply to comment #1)
> Is it documented anywhere that std.conv.parse should follow D lexer conventions
> on parsing??
Well, it's kind of implied, isn't it? Why would parse follow a convention other than D's? No, it's not documented, but I do remember someone in the threads (Jonathan, I think) specifically saying that the idea is that it allows parsing pretty much anything that's valid D.
> If not I guess we shouldn't pretend it does and pull the whole freaking table
> of HTML4/5 entities in *every* program that uses parse to read a couple of
> ints.
I disagree, because the function *is* named parse, and it is capable of parsing a string and returning the object parsed (in this case a string). If "\"" is a valid D string, then I'd expect parse not to choke on it.
As long as the user is parsing a string to an int, then no, he shouldn't need the table, but if the parse outcome is a string, there is no excuse not to do it right.
Shouldn't the fact that the table would only ever be used in a template function (parse) mean the compiler can tell whether or not to link in the table? Or would importing std.conv immediately link the table into the final executable?
monarchdodra commented on 2013-03-01T03:30:39Z:
(In reply to comment #1)
> If not I guess we shouldn't pretend it does and pull the whole freaking table
> of HTML4/5 entities in *every* program that uses parse to read a couple of
> ints.
How does std.uni do it?
I mean, if I want to know whether a Unicode character is whitespace, does that mean I'll have to pull in the entire Unicode tables for isUpper etc. etc. etc.?
I'm not trying to justify by comparison, but trying to see how other modules work with this "problem".
dmitry.olsh (@DmitryOlshansky) commented on 2013-03-01T04:12:34Z:
(In reply to comment #3)
> (In reply to comment #1)
> > If not I guess we shouldn't pretend it does and pull the whole freaking table
> > of HTML4/5 entities in *every* program that uses parse to read a couple of
> > ints.
>
> How does std.uni do it?
>
That's why I'm increasingly against adding tables that are hidden behind an opaque interface. I feel uneasy about it.
That's why I exposed all I could about tables and predefined sets in std.uni. For instance, any set is usable not only for std.uni purposes. I also went to tremendous effort not to include tables unless user code needs them, and I will keep looking for new ways to avoid it.
Having a dead HTML5 entity table buried beneath an innocent-looking function is NOT good enough. If we do it, there HAS to be a way to tap into the HTML entities so that people don't have to include the VERY SAME table twice should they need full access to HTML5 entities.
> I mean, if I want to know whether a Unicode character is whitespace, does that mean
> I'll have to pull in the entire Unicode tables for isUpper etc. etc. etc.?
Something I'm going to change. Technically there is no reason to pull in these tables. Also, in the case of parse the cost to benefit is far greater, since if you use isXXX you surely need the table, period. In the case of parse you may easily never hit an escape sequence, or not even mean to unescape it in your data, but you'd pay all the same.
> I'm not trying to justify by comparison, but trying to see how other modules
> work with this "problem".
I thought std.conv.parse's goal was closer to C's sscanf. In other words, that it's the backbone behind formattedRead, readf, etc.
If the goal is to parse whatever D strings are, I fail to see the use case, since e.g. std.d.lexer would almost certainly use its own tricks to process escapes etc. more efficiently.
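One way to read the suggestion above (a public table rather than one buried inside parseEscape) is sketched below. The idea is that parse and user code resolve entities through the same exported data, so a program that needs full HTML entity access carries only one copy. This is only a sketch under stated assumptions. `Entity`, `htmlEntities` and `lookupEntity` are hypothetical names that do not exist in Phobos. The four-entry array stands in for the full HTML5 list. The lookup is a linear scan, chosen for brevity.

```d
import std.algorithm.searching : find;
import std.exception : enforce;

// Hypothetical public entity record; not a Phobos type.
struct Entity
{
    string name;  // entity name without '&' and ';', e.g. "amp"
    dchar value;  // the decoded character
}

// Hypothetical public table: a tiny stand-in for the full HTML5 list.
// Because it is exported, user code can reuse the data parse would use.
immutable Entity[] htmlEntities = [
    Entity("amp", '&'),
    Entity("gt", '>'),
    Entity("lt", '<'),
    Entity("quot", '"'),
];

// Resolves a named entity; a parseEscape-like function would call this
// for `\&name;` escapes instead of keeping a private copy of the table.
dchar lookupEntity(string name)
{
    auto hit = htmlEntities.find!(e => e.name == name);
    enforce(hit.length > 0, "Unknown named entity: " ~ name);
    return hit[0].value;
}

void main()
{
    import std.stdio : writeln;

    writeln(lookupEntity("amp"));
    // User code can enumerate the very same table directly.
    foreach (e; htmlEntities)
        writeln(e.name, " -> ", e.value);
}
```

This design does not settle the binary-size question by itself. It only ensures that, if the table is linked in, there is exactly one copy and it is usable outside parse.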
dmitry.olsh (@DmitryOlshansky) commented on 2013-03-01T04:13:40Z:
> Something I'm going to change. Technically there is no reason to pull in these
> tables. Also, in the case of parse the cost to benefit is far
I meant lower, obviously.
> since if you
> use isXXX you surely need the table, period. In the case of parse you may easily
> never hit an escape sequence, or not even mean to unescape it in your data, but you'd
> pay all the same.
dmitry.olsh (@DmitryOlshansky) commented on 2013-03-01T04:33:15Z:
(In reply to comment #5)
> > Something I'm going to change. Technically there is no reason to pull in these
> > tables. Also, in the case of parse the cost to benefit is far
>
> I meant lower, obviously.
Looks like I'm on a streak... for std.conv.parse it's a *higher* cost to benefit ratio after all. Sorry for the confusion.
monarchdodra commented on 2013-03-01T04:50:56Z:
(In reply to comment #4)
> I thought std.conv.parse's goal was closer to C's sscanf. In other words, that
> it's the backbone behind formattedRead, readf, etc.
I guess the whole discussion rather boils down to "what should/does formattedRead accept?" then. Given that it is "higher order" and capable of parsing arrays of stuff, what happens when it parses a string that represents an array of strings?
I mean, imagine this program:
string s1 = ...
string[] s2;
formattedRead(s1, "%s", &s2);
The question is: What are legal s1 values?
s1 = `["a", "b"]`; => ["a", "b"]
s1 = `["a", "b", ]`; => ["a", "b"] (1)
s1 = `["ab", ['a', 'b']]` => ["ab", "ab"]
s1 = `["\t", "\n"]`; => ["\t", "\n"]
s1 = `["\0"]`; => ["\0"] (2)
s1 = `["\141"]`; => ["a"]
s1 = `["\x61"]`; => ["a"]
s1 = `["\u0061"]`; => ["a"]
s1 = `["\U00000061"]`; => ["a"]
s1 = `["\&amp;"]`; => ["&"] (3)
(1) //Not currently supported
(2) //Not currently supported
(3) //Not currently supported
Unless formattedRead documents what it can (should) and doesn't support, we'll just run around in circles...
dlang-bot commented on 2019-11-13T12:08:47Z:
@berni44 created dlang/phobos pull request #7273 "Fix partially Issue 9621 - std.conv.parseEscape fails on octals and named" mentioning this issue:
- Fix partially Issue 9621 - std.conv.parseEscape fails on octals and named
https://github.com/dlang/phobos/pull/7273
dlang-bot commented on 2019-11-13T19:20:20Z:
@berni44 created dlang/phobos pull request #7274 "Fix Issue 9621 - std.conv.parseEscape fails on octals and named" fixing this issue:
- Fix Issue 9621 - std.conv.parseEscape fails on octals and named
https://github.com/dlang/phobos/pull/7274
dlang-bot commented on 2019-11-14T04:41:31Z:
dlang/phobos pull request #7273 "Fix partially Issue 9621 - std.conv.parseEscape fails on octals and named" was merged into master:
- 932d49b2178c52ebc6c74f11f9797d6ff85c0ab0 by Bernhard Seckinger:
Fix partially Issue 9621 - std.conv.parseEscape fails on octals and named
https://github.com/dlang/phobos/pull/7273
dkorpel commented on 2022-11-06T16:37:36Z:
The octal part has been fixed, so I changed the title accordingly.
monarchdodra reported this on 2013-03-01T02:27:46Z
Transferred from https://issues.dlang.org/show_bug.cgi?id=9621
Description
D allows this:
void main()
{
    string s1 = "\&amp;";
    string s2 = "\141";
    assert(s1 == "&");
    assert(s2 == "a");
}
But parse doesn't allow it (these escapes are not supported in parseEscape):
//----
import std.conv, std.stdio;

void main()
{
    string s1 = `[ "\&amp;", "\141" ]`;
    writeln(parse!(string[])(s1));
}
//----
Can't parse string: Unknown escape character &
Can't parse string: Unknown escape character 1
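The two failing escapes can be decoded with fairly little logic. The sketch below shows the mechanics. It is not Phobos's parseEscape. `unescape`, `decodeEscape` and `lookupEntity` are hypothetical helpers. The entity switch holds only four entries, whereas the full HTML5 list is far larger. Only a handful of single-character escapes are handled (no `\x`, `\u` or `\U`). The sketch covers two cases:

- octal escapes of one to three digits, as in `\141`;
- named entities of the form `\&name;`, as in `\&amp;`.

```d
import std.array : appender;
import std.ascii : isOctalDigit;
import std.exception : enforce;
import std.string : indexOf;

// Hypothetical, tiny stand-in for D's named character entity table.
dchar lookupEntity(string name)
{
    switch (name)
    {
        case "amp":  return '&';
        case "lt":   return '<';
        case "gt":   return '>';
        case "quot": return '"';
        default:     break;
    }
    throw new Exception("Unknown named entity: " ~ name);
}

// Decodes one escape. `s` starts just after the backslash and is advanced
// past the escape. Hypothetical helper, not std.conv's parseEscape.
dchar decodeEscape(ref string s)
{
    enforce(s.length > 0, "Unterminated escape sequence");
    immutable c = s[0];

    if (isOctalDigit(c))
    {
        // \N, \NN or \NNN: up to three octal digits, e.g. \141 == 'a'.
        size_t n = 1;
        while (n < 3 && n < s.length && isOctalDigit(s[n]))
            ++n;
        uint value = 0;
        foreach (digit; s[0 .. n])
            value = value * 8 + (digit - '0');
        s = s[n .. $];
        return cast(dchar) value;
    }

    if (c == '&')
    {
        // \&name; : named character entity, e.g. \&amp; == '&'.
        immutable end = s.indexOf(';');
        enforce(end > 1, "Malformed named character entity");
        immutable name = s[1 .. end];
        s = s[end + 1 .. $];
        return lookupEntity(name);
    }

    // A few single-character escapes; \x, \u, \U etc. are omitted here.
    s = s[1 .. $];
    switch (c)
    {
        case 'n':  return '\n';
        case 't':  return '\t';
        case '\\': return '\\';
        case '"':  return '"';
        case '\'': return '\'';
        default:   break;
    }
    throw new Exception("Unknown escape character " ~ c);
}

// Unescapes the body of a string literal (without the surrounding quotes).
string unescape(string s)
{
    auto result = appender!string();
    while (s.length > 0)
    {
        if (s[0] == '\\')
        {
            s = s[1 .. $];
            result.put(decodeEscape(s)); // dchar is encoded as UTF-8
        }
        else
        {
            result.put(s[0]);
            s = s[1 .. $];
        }
    }
    return result.data;
}

void main()
{
    import std.stdio : writeln;

    writeln(unescape(`\&amp;`));
    writeln(unescape(`\141`));
}
```

Octal 141 is 97, i.e. `'a'`. The `lookupEntity` switch is the part that would have to grow into the full entity table, which is where the binary-size cost discussed in the comments comes from.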