Numeric value for unicode #1052

Closed


@monarchdodra
Collaborator

Note that this development is still under discussion at:
http://forum.dlang.org/thread/nnaebebtfwojtglhjjyk@forum.dlang.org

Pretty straightforward, and as discussed on the message boards.

"isNumber" was also added to std.ascii.

I did run into a couple of issues, namely that I'm not getting 100% equivalence between chars that are numeric and chars that have a numeric value... Is this normal...?

  • There's a fair number of chars that have a numeric value but aren't isNumber. I think they might be new in 6.1.0, but I'm not sure. I decided it was best to have them return nan instead of having inconsistent behavior.
  • There are a couple of characters in tableLo that have numeric values. These aren't considered in isNumber either. I think this might be a bug though.
  • There are 4 "non-number numeric" characters in "CUNEIFORM NUMERIC SIGN". These return wild values, and in particular two of them return -1. I think these should actually return nan for us, because (AFAIK) -1 is just a wildcard for invalid :/

Maybe we should just return -1 on invalid unicode? Or maybe it's just my input file:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
It doesn't have a separate field for isNumber/numericValue, so it is forced to write a wild number. Maybe these four chars should return nan?
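
For reference, here is a minimal sketch of how the numeric value could be read from one line of UnicodeData.txt. It assumes the usual semicolon-separated layout, where the field at index 8 holds the numeric value (an integer or a fraction such as 1/2) and is empty when the character has none. The helper name numericField is hypothetical and not part of this pull; mapping an empty field to nan simply follows the convention proposed above.

```d
import std.algorithm : findSplit;
import std.array : split;
import std.conv : to;
import std.math : isNaN;

/// Hypothetical helper: numeric value stored in one UnicodeData.txt line,
/// or nan when the numeric field (index 8) is empty or missing.
double numericField(string line)
{
    auto fields = line.split(";");
    if (fields.length < 9 || fields[8].length == 0)
        return double.nan;

    // The field is either an integer ("7", "-1") or a fraction ("1/2").
    auto parts = fields[8].findSplit("/");
    immutable numerator = parts[0].to!double;
    return parts[1].length ? numerator / parts[2].to!double : numerator;
}

unittest
{
    assert(numericField("0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;") == 1.0);
    assert(numericField("00BD;VULGAR FRACTION ONE HALF;No;0;ON;"
        ~ "<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;") == 0.5);
    assert(numericField("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;").isNaN);
}
```

A sketch like this would pass a literal -1 in the file straight through, so whether to turn that into nan remains a separate decision.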

Anyways, please feel free to build on this, or destroy it.

@jmdavis jmdavis commented on the diff Jan 4, 2013
std/ascii.d
@@ -277,6 +277,34 @@ unittest
}
}
+/++
+ Returns whether $(D c) is an ASCII numerical character.
+ +/
+bool isNumber(dchar c) @safe pure nothrow
jmdavis Jan 4, 2013 Member

We already have std.ascii.isDigit which does the same thing as isNumber and does it more efficiently.

monarchdodra Jan 4, 2013 Collaborator

Oops!

Well, technically, Unicode defines both isNumber and isDigit. In the long run, we should end up with both std.uni.isNumber and std.uni.isDigit. For ASCII, I'd say we should also have a symbol called isNumber, even if it is just an alias of isDigit.

jmdavis Jan 4, 2013 Member

If we need to alias isDigit to isNumber in std.ascii for completeness and compatibility with std.uni, that's fine, but it should be an alias, not a new function, and I see no reason to create the alias until std.uni.isNumber is added, which this pull doesn't appear to do.
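
A minimal sketch of the alias approach, assuming it is declared in user code rather than in std.ascii itself (the name isNumber here is the proposed one and is not something this pull adds to Phobos as an alias):

```d
import std.ascii : isDigit;

// Hypothetical: expose isDigit under a second name without a new function body.
alias isNumber = isDigit;

unittest
{
    assert(isNumber('7'));
    assert(!isNumber('x'));
    // Non-ASCII digits are rejected, exactly as with isDigit.
    assert(!isNumber('\u0663'));
}
```

Because the alias refers to the same symbol, there is no second implementation to keep in sync.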

monarchdodra Jan 4, 2013 Collaborator

100% agree about alias over implementation of course.

That said, we already have std.uni.isNumber, it's std.uni.isDigit that is missing.

I can implement that in this pull.

jmdavis Jan 4, 2013 Member

Okay. Please do. And we should identify any other functions which are in std.ascii but not in std.uni (or vice versa) but will need to be in the other in the future, and add stub implementations for them (which probably just assert(0) or something) so that we break the code that imports them earlier rather than later. Otherwise, we'll reach the point where we won't be able to add them because of the code breakage that it would cause. Though that's the sort of breakage that we always risk with Phobos, simply due to how D's module system works.

monarchdodra Jan 4, 2013 Collaborator

Off the top of my head, there are quite a few functions that only appear in one or the other, but that wouldn't make sense in both. E.g.:

  1. isPrivateUse or isSurrogate would make no sense in std.ascii
  2. isOctal and friends wouldn't make much sense in std.uni either.

It would appear the only functions std.uni has that could be in std.ascii are isMark and isNonCharacter.

There would be no actual characters in ascii that would fit those descriptions, so I'm not sure about this one. Adding them would probably be useless, and break code now, rather than prevent breakage.

jmdavis Jan 4, 2013 Member

It's only useful to declare them if we expect to need them later. If a function makes sense in one but not the other, then there's no point in declaring it in the other. If std.ascii.isNumber is the only one that we're missing, then all the better.

Collaborator

I was able to get all the feedback and information I needed for moving forward. I'll close this for now.
