-
-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Numeric value for unicode #1052
Conversation
| /++ | ||
| Returns whether $(D c) is an ASCII numerical character. | ||
| +/ | ||
| bool isNumber(dchar c) @safe pure nothrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have std.ascii.isDigit which does the same thing as isNumber and does it more efficiently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops!
Well, technically, unicode defines both isNumber and isDigit. In the long run, we should end up with both std.uni.isNumber and std.uni.isDigit. For ascii, I'd say we should also have a symbol called isNumber, even if it is just an alias of isDigit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need to alias isDigit to isNumber in std.ascii for completeness and compatibility with std.uni, that's fine, but it should be an alias, not a new function, and I see no reason to create the alias until std.uni.isNumber is added, which this pull doesn't appear to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% agree about alias over implementation of course.
That said, we already have std.uni.isNumber, it's std.uni.isDigit that is missing.
I can implement that in this pull.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Please do. And we should identify any other functions which are in std.ascii but not in std.uni (or vice versa) but will need to be in the other in the future and add stub implementations for them (which probably assert(0)) or something so that we can break the code that imports them earlier rather than later. Otherwise, we'll reach the point where we won't be able to add them because of the code breakage that it would cause. Though that's the sort of thing that we risk always breaking with Phobos simply due to how D's module system works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off the top of my head, there are quite a few a functions that only appear in one or the other, but wouldn't make sense to be both. EG:
isPrivateUseorisSurrogatewould make no sense instd.asciiisOctaland friends wouldn't make much sense instd.unieither.
It would appear the only functions std.uni has that could be in std.ascii are isMark and isNonCharacter.
There would be no actual characters in ascii that would fit those descriptions, so I'm not sure about this one. Adding them would probably be useless, and break code now, rather than prevent breakage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only useful to declare them if we expect to need them later. If a function makes sense in one but not the other, then there's no point in declaring it in the other. If std.ascii.isNumber is the only one that we're missing, then all the better.
|
I was able to get all the feedback and information I needed for moving forward. I'll close this for now. |
Note that this development is still under discussion at:
http://forum.dlang.org/thread/nnaebebtfwojtglhjjyk@forum.dlang.org
Pretty straight forward, and as discussed in the message boards.
"isNumber" was also added to std.ascii.
I did run into a couple of issues, namelly that I'm not getting 100% equivalence between chars that are numeric, and chars with numeric value... Is this normal...?
Maybe we should just return -1 on invalid unicode? Or maybe it's just my input file:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
It doesn't have a separate field for isNumber/numericValue, so it is forced to write a wild number. Maybe these four chars should return nan?
Anyways, please feel free to build on this, or destroy it.