Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numeric value for unicode #1052

Closed
wants to merge 2 commits into from
Closed

Numeric value for unicode #1052

wants to merge 2 commits into from

Conversation

monarchdodra
Copy link
Collaborator

Note that this development is still under discussion at:
http://forum.dlang.org/thread/nnaebebtfwojtglhjjyk@forum.dlang.org

Pretty straight forward, and as discussed in the message boards.

"isNumber" was also added to std.ascii.

I did run into a couple of issues, namelly that I'm not getting 100% equivalence between chars that are numeric, and chars with numeric value... Is this normal...?

  • There's a fair bit of chars that have numeric value, but aren't isNumber. I think they might be new in 6.1.0. But I'm not sure. I decided it was best to have them return nan, instead of having inconsistent behavior.
  • There's a couple characters in tableLo that have numeric values. These aren't considered in isNumber either. I think this might be a bug though.
  • There are 4 "non-number numeric" characters in "CUNEIFORM NUMERIC SIGN". These return wild values, and in particular two of them return -1. I think this should actually return nan for us, because (AFAIK), -1 is just wild for invalid :/

Maybe we should just return -1 on invalid unicode? Or maybe it's just my input file:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
It doesn't have a separate field for isNumber/numericValue, so it is forced to write a wild number. Maybe these four chars should return nan?

Anyways, please feel free to build on this, or destroy it.

/++
Returns whether $(D c) is an ASCII numerical character.
+/
bool isNumber(dchar c) @safe pure nothrow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have std.ascii.isDigit which does the same thing as isNumber and does it more efficiently.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops!

Well, technically, unicode defines both isNumber and isDigit. In the long run, we should end up with both std.uni.isNumber and std.uni.isDigit. For ascii, I'd say we should also have a symbol called isNumber, even if it is just an alias of isDigit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to alias isDigit to isNumber in std.ascii for completeness and compatibility with std.uni, that's fine, but it should be an alias, not a new function, and I see no reason to create the alias until std.uni.isNumber is added, which this pull doesn't appear to do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agree about alias over implementation of course.

That said, we already have std.uni.isNumber, it's std.uni.isDigit that is missing.

I can implement that in this pull.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Please do. And we should identify any other functions which are in std.ascii but not in std.uni (or vice versa) but will need to be in the other in the future and add stub implementations for them (which probably assert(0)) or something so that we can break the code that imports them earlier rather than later. Otherwise, we'll reach the point where we won't be able to add them because of the code breakage that it would cause. Though that's the sort of thing that we risk always breaking with Phobos simply due to how D's module system works.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off the top of my head, there are quite a few a functions that only appear in one or the other, but wouldn't make sense to be both. EG:

  1. isPrivateUse or isSurrogate would make no sense in std.ascii
  2. isOctal and friends wouldn't make much sense in std.uni either.

It would appear the only functions std.uni has that could be in std.ascii are isMark and isNonCharacter.

There would be no actual characters in ascii that would fit those descriptions, so I'm not sure about this one. Adding them would probably be useless, and break code now, rather than prevent breakage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only useful to declare them if we expect to need them later. If a function makes sense in one but not the other, then there's no point in declaring it in the other. If std.ascii.isNumber is the only one that we're missing, then all the better.

@monarchdodra
Copy link
Collaborator Author

I was able to get all the feedback and information I needed for moving forward. I'll close this for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants