Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add support for NumericValue #71

Merged
merged 9 commits into from
May 31, 2022
Merged

WIP: Add support for NumericValue #71

merged 9 commits into from
May 31, 2022

Conversation

wismill
Copy link
Collaborator

@wismill wismill commented May 13, 2022

Fixes #68

Copy link
Member

@harendra-kumar harendra-kumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just some nitpicks.

@@ -41,3 +45,15 @@ isNumber c = case generalCategory c of
LetterNumber -> True
OtherNumber -> True
_ -> False


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove additional blank line.

lib/Unicode/Char/Numeric.hs Outdated Show resolved Hide resolved
lib/Unicode/Char/Numeric.hs Outdated Show resolved Hide resolved
it "isNumber implies a numeric value" do
-- [NOTE] the following does not hold with the current predicate `isNumber`.
-- 'let check c = (UNumeric.isNumber c `xor` isNothing (UNumeric.numericValue c))
let check c = not (UNumeric.isNumber c) || isJust (UNumeric.numericValue c)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how many such cases are there where isNumber returns False but the value is numeric. Is it a lot of cases or just a few?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

81 CJK characters:

ns = filter (\c -> not (isNumber c `xor`  isNothing (numericValue c))) [minBound .. maxBound]
-- 81
(\c -> (showHex (ord c) "", generalCategory c)) <$> ns
-- [("3405",OtherLetter),("3483",OtherLetter),("382a",OtherLetter),("3b4d",OtherLetter),("4e00",OtherLetter),("4e03",OtherLetter),("4e07",OtherLetter),("4e09",OtherLetter),("4e5d",OtherLetter),("4e8c",OtherLetter),("4e94",OtherLetter),("4e96",OtherLetter),("4ebf",OtherLetter),("4ec0",OtherLetter),("4edf",OtherLetter),("4ee8",OtherLetter),("4f0d",OtherLetter),("4f70",OtherLetter),("5104",OtherLetter),("5146",OtherLetter),("5169",OtherLetter),("516b",OtherLetter),("516d",OtherLetter),("5341",OtherLetter),("5343",OtherLetter),("5344",OtherLetter),("5345",OtherLetter),("534c",OtherLetter),("53c1",OtherLetter),("53c2",OtherLetter),("53c3",OtherLetter),("53c4",OtherLetter),("56db",OtherLetter),("58f1",OtherLetter),("58f9",OtherLetter),("5e7a",OtherLetter),("5efe",OtherLetter),("5eff",OtherLetter),("5f0c",OtherLetter),("5f0d",OtherLetter),("5f0e",OtherLetter),("5f10",OtherLetter),("62fe",OtherLetter),("634c",OtherLetter),("67d2",OtherLetter),("6f06",OtherLetter),("7396",OtherLetter),("767e",OtherLetter),("8086",OtherLetter),("842c",OtherLetter),("8cae",OtherLetter),("8cb3",OtherLetter),("8d30",OtherLetter),("9621",OtherLetter),("9646",OtherLetter),("964c",OtherLetter),("9678",OtherLetter),("96f6",OtherLetter),("f96b",OtherLetter),("f973",OtherLetter),("f978",OtherLetter),("f9b2",OtherLetter),("f9d1",OtherLetter),("f9d3",OtherLetter),("f9fd",OtherLetter),("20001",OtherLetter),("20064",OtherLetter),("200e2",OtherLetter),("20121",OtherLetter),("2092a",OtherLetter),("20983",OtherLetter),("2098c",OtherLetter),("2099c",OtherLetter),("20aea",OtherLetter),("20afd",OtherLetter),("20b19",OtherLetter),("22390",OtherLetter),("22998",OtherLetter),("23b1b",OtherLetter),("2626d",OtherLetter),("2f890",OtherLetter)]d","4f70","5104","5146","5169","516b","516d","5341","5343","5344","5345","534c","53c1","53c2","53c3","53c4","56db","58f1","58f9","5e7a","5efe","5eff","5f0c","5f0d","5f0e","5f10","62fe","634c","67d2","6f06","7396","767e","8086","842c","8cae","8cb3","8d30","9621","9646","964c","9678","96f6","f96b","f973","f978","f9b2","f9d1","f9d3","f9fd","20001","20064","200e2","20121","2092a","20983","2098c","2099c","20aea","20afd","20b19","22390","22998","23b1b","2626d","2f890"]

--
-- __Note:__ a character may have a numeric value but return 'False' with
-- the predicate 'isNumber', because 'isNumber' only tests 'GeneralCategory':
-- some CJK characters are 'OtherLetter' and do have a numeric value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can write a note that integerValue is a special case (subset) of numericValue where the value is integral.

--
-- __Note:__ a character may have a numeric value (see 'numericValue') but return
-- 'False', because 'isNumber' only tests 'GeneralCategory':
-- some CJK characters are 'OtherLetter' and do have a numeric value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can suggest something like - use isJust . numericValue if you wish to cover those cases as well.

@harendra-kumar
Copy link
Member

Did you measure the perf of isNumber vs hasNumericValue? Maybe we can use a benchmark for that. The difference is looking up generalCategory vs looking up numeric value, I am not sure which one is faster and if there is a significant difference.

@wismill
Copy link
Collaborator Author

wismill commented May 17, 2022

Did you measure the perf of isNumber vs hasNumericValue? Maybe we can use a benchmark for that. The difference is looking up generalCategory vs looking up numeric value, I am not sure which one is faster and if there is a significant difference.

Yes: hasNumericValue is little faster. See the benchmark.

@wismill
Copy link
Collaborator Author

wismill commented May 28, 2022

@harendra-kumar kind reminder for a review

lib/Unicode/Char/Numeric.hs Outdated Show resolved Hide resolved
@wismill wismill merged commit 543f629 into master May 31, 2022
@wismill wismill deleted the wip/numeric branch September 26, 2022 05:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support NumericValue property
2 participants