bug in converting some hanzi to numbers #114
Also, there are problems in converting numbers to hanzi.
I believe some of the oddities are due to floating-point precision problems.
It does not seem consistent:
@pinxue Thanks for spotting the bug. I'll fix it. Meanwhile, if someone good with numbers can take a look at
I once came across an accurate Chinese-to-number converter (I came up with many test cases and it handled them well). It handles 零 very well, as well as an omitted last unit and mixed Arabic numerals, and it supports a decimal point. It works up to nearly 1e16 because it uses double-precision numbers, failing on 九千零七兆一千九百九十二亿五千四百七十四万零九百九十三 (9007199254740993 = 2^53 + 1, the smallest positive integer a double cannot represent exactly). Most other converters only handle the 1e12 range and 0.01 precision. I wrote an arbitrary-precision, configurable (but integer-only for now) number-to-Chinese converter; it may serve as a reference and a test-case generator. I think it's fair to advertise it here. The current number-to-hanzi converter seems to be missing 零 in some cases: omitting the last unit is incompatible with omitting all 零s. 一百一 is the minimal example (it could mean 101 or 110). Since wenyan is written Chinese, omitting the last unit is colloquial, and omitting 零 is frequent in wenyan, I suggest that we disallow omitting the last unit. Allowing 一萬萬, 一萬億, 一億億, etc. while 兆, 京, etc. are also enabled may be troublesome. Consider 一萬億, 一萬零一億 (the toshuo one seems to simply replace "萬億" with "兆", so it fails on this), 一萬零一萬, 一億零一萬億, etc. Refer to "最高用万" (use 万 as the highest unit) and "最高用亿" (use 亿 as the highest unit) in my converter. I still don't have a clear idea about allowing 十百千萬 for small-number units, but I think that may be troublesome too. By the way, according to Wikipedia and many other sources, 分厘毫絲忽微纖沙塵埃渺漠 each differ by a factor of 10. Why do you say …?
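A minimal sketch of the arbitrary-precision idea: parsing into a BigInt means the 九千零七兆… example that defeats a double-based converter parses exactly. The function `hanziToBigInt` and its `omitLastUnit` option are hypothetical, not the wenyan compiler's code. It assumes the myriad scheme (萬 = 10^4, 億 = 10^8, 兆 = 10^12) and uses a common accumulate-and-rescale approach: a big unit at least as large as any seen so far multiplies everything read so far, so 一萬億 and 一萬零一億 come out as 10^12 and 10001 × 10^8 without any text substitution. It also rejects a 零 that does not stand for a skipped place, such as 一千零一百.

```javascript
// Sketch only: exact parsing of Chinese integers into BigInt (hypothetical helper,
// not the wenyan compiler's code). Assumes the myriad scheme: 萬 = 10^4, 億 = 10^8, 兆 = 10^12.
const DIGITS = { "零": 0, "〇": 0, "一": 1, "二": 2, "兩": 2, "两": 2, "三": 3, "四": 4, "五": 5, "六": 6, "七": 7, "八": 8, "九": 9 };
const SMALL_UNITS = { "十": 1, "百": 2, "千": 3 };
const BIG_UNITS = { "萬": 4, "万": 4, "億": 8, "亿": 8, "兆": 12 };

function hanziToBigInt(str, { omitLastUnit = false } = {}) {
  let total = 0n;      // value of the groups already closed by a big unit
  let section = 0n;    // value of the current group (below 10^4)
  let digit = null;    // digit still waiting for its unit
  let maxBig = 0;      // largest big-unit exponent seen so far
  let lastExp = null;  // exponent of the most recent unit
  let sawZero = false; // whether 零 appeared since the most recent unit

  for (const ch of str) {
    if (ch in DIGITS) {
      if (DIGITS[ch] === 0) { sawZero = true; continue; }
      if (digit !== null) throw new Error(`two digits in a row in ${str}`);
      digit = BigInt(DIGITS[ch]);
    } else if (ch in SMALL_UNITS) {
      const e = SMALL_UNITS[ch];
      // 零 must stand for at least one skipped place (rejects 一千零一百, 一萬零一千)
      if (sawZero && lastExp !== null && Math.min(lastExp, 4) - e < 2) {
        throw new Error(`misplaced 零 in ${str}`);
      }
      section += (digit ?? 1n) * 10n ** BigInt(e); // a bare 十 means 一十
      digit = null;
      lastExp = e;
      sawZero = false;
    } else if (ch in BIG_UNITS) {
      const e = BIG_UNITS[ch];
      section += digit ?? 0n;
      if (e >= maxBig) {
        // e.g. 一萬億: this unit rescales everything read so far
        total = (total + section) * 10n ** BigInt(e);
        maxBig = e;
      } else {
        total += section * 10n ** BigInt(e);
      }
      section = 0n;
      digit = null;
      lastExp = e;
      sawZero = false;
    } else {
      throw new Error(`unexpected character ${ch} in ${str}`);
    }
  }
  if (digit !== null) {
    // Trailing digit: a plain ones digit by default; with omitLastUnit it takes
    // the unit just below the previous one (一百一 -> 110, 一億一 -> 110000000)
    const e = omitLastUnit && !sawZero && lastExp !== null ? lastExp - 1 : 0;
    section += digit * 10n ** BigInt(e);
  }
  return total + section;
}
```

With `omitLastUnit: false` (the default here, matching the suggestion to disallow omitting the last unit), 一百一 reads as 101; with `omitLastUnit: true` it reads as 110, which is exactly the ambiguity described above.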
@farteryhr Wow! Thanks for all the resources. I'll peek into them and learn from those implementations.
In fact, I wrote the … Thanks again.
For the fractional part, 納 (10^-9), 皮 (10^-12), and 飛 (10^-15) are more common than the other names. Here is a full list: https://baike.baidu.com/item/数字/6204#8
For the floating-point accuracy problems, I suggest that we only convert to and from a decimal string representation (in fixed or scientific notation) and let JavaScript (or the target language) do the string-to-number and number-to-string conversion. Conversion between binary floating point and decimal representation is extremely tricky: it took several decades of research to arrive at a fully correct algorithm that is not too slow, and implementing one from scratch would take tens of developer-months. (Thus I also suggest not implementing things like …) For Chinese-to-number conversion, there are lots of edge cases: 一千零一百 = Error
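The decimal-string approach could look roughly like this sketch. It keeps each value as an exact decimal (a BigInt coefficient times a power of ten), so scaling by a unit only adds to the exponent, and the single `Number()` call at the end performs the one binary rounding, done correctly by the engine. For the other direction, `Number.prototype.toExponential()` with no argument yields the shortest digit string that round-trips. The helper names (`decimal`, `scaleByUnit`, `toNumber`, `fromNumber`) are hypothetical, and the unit table assumes 分 = 10^-1 through 漠 = 10^-12, each step a factor of 10 as stated earlier in the thread.

```javascript
// Sketch: keep values as exact decimals and let the engine do the one
// decimal <-> binary conversion. Helper names are hypothetical.
// Assumed unit exponents: 分 = 10^-1, 厘 = 10^-2, ..., 漠 = 10^-12 (each step is 10).
const SMALL_UNIT_EXP = {
  "分": -1, "厘": -2, "毫": -3, "絲": -4, "忽": -5, "微": -6,
  "纖": -7, "沙": -8, "塵": -9, "埃": -10, "渺": -11, "漠": -12,
};

// Exact value: (negative ? -1 : 1) * coef * 10^exp
function decimal(coef, exp = 0, negative = false) {
  return { coef: BigInt(coef), exp, negative };
}

// Multiplying by a unit only shifts the exponent, so no rounding happens here.
function scaleByUnit(d, unit) {
  const e = SMALL_UNIT_EXP[unit];
  if (e === undefined) throw new Error(`unknown unit ${unit}`);
  return { ...d, exp: d.exp + e };
}

// Build a string such as "1010e-6" and let Number() round it to a double once.
function toNumber(d) {
  return Number(`${d.negative ? "-" : ""}${d.coef}e${d.exp}`);
}

// Number -> exact decimal via the shortest round-tripping digits from toExponential().
function fromNumber(x) {
  if (!Number.isFinite(x)) throw new RangeError(`not a finite number: ${x}`);
  const [mantissa, expPart] = x.toExponential().split("e"); // e.g. "1.01", "+3"
  const negative = mantissa.startsWith("-");
  const [intPart, fracPart = ""] = mantissa.replace("-", "").split(".");
  return { coef: BigInt(intPart + fracPart), exp: Number(expPart) - fracPart.length, negative };
}
```

Under that unit assumption, 三千萬埃 is 3 × 10^7 × 10^-10, so `toNumber(scaleByUnit(decimal(3, 7), "埃"))` builds the string "3e-3" and returns 0.003 with no accumulated error.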
It's the SI scheme, where all levels differ by 10^3; the corresponding big units are 千, 兆, 吉, 太, 拍, etc. My converter also supports reading big integers in the SI scheme in Chinese (which sounds weird).
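To illustrate the SI scheme, here is a small sketch that splits an integer into base-1000 groups labeled 千, 兆, 吉, 太, 拍, plus a helper for the fractional prefixes 毫, 微, 納, 皮, 飛 mentioned earlier. The function names `toSIGroups` and `siSmall` are hypothetical, and reading the digits inside each group (with 百 and 十) is left out.

```javascript
// Sketch of the SI scheme, where every level differs by 10^3 (hypothetical helpers).
const SI_BIG = ["", "千", "兆", "吉", "太", "拍"]; // 10^0, 10^3, 10^6, 10^9, 10^12, 10^15
const SI_SMALL = { "毫": -3, "微": -6, "納": -9, "皮": -12, "飛": -15 };

// Split a non-negative BigInt into base-1000 groups, most significant first,
// each paired with its SI prefix; all-zero groups are skipped.
function toSIGroups(n) {
  if (n < 0n) throw new RangeError("expected a non-negative integer");
  const groups = [];
  for (let level = 0; n > 0n; level++) {
    if (level >= SI_BIG.length) throw new RangeError("too large for this prefix table");
    const group = n % 1000n;
    if (group !== 0n) groups.push([Number(group), SI_BIG[level]]);
    n /= 1000n; // BigInt division truncates
  }
  return groups.reverse();
}

// Value of e.g. 五納 (5 × 10^-9), built as a decimal string so Number() rounds once.
function siSmall(count, prefix) {
  const e = SI_SMALL[prefix];
  if (e === undefined) throw new Error(`unknown SI prefix ${prefix}`);
  return Number(`${count}e${e}`);
}
```

Note that 1000000 comes out as a single 兆 group here, whereas in the myriad scheme of the 九千零七兆… example earlier in the thread, 兆 means 10^12, so a parser has to commit to one scheme.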
Added test cases for 渺, 埃, 尘, 沙, 纤, and 微; they failed for 纤 and 尘, which needs a code change in the compiler.
Just wanted to add: only 纤 and 尘 failed. If we support simplified Chinese, we may need to change the compiler code (their traditional Chinese versions, 纖 and 塵, work as expected).
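If simplified input is supported, one low-risk option is a normalization pass that maps simplified unit characters to their traditional forms before the unit lookup. This sketch assumes the compiler's unit table is keyed by traditional characters, as the passing traditional versions suggest; `normalizeUnits` is a hypothetical name, and the mapping covers 纤 and 尘 plus 丝, 万, and 亿 as an illustrative assumption, not a complete list.

```javascript
// Sketch: map simplified unit characters to traditional forms before unit lookup.
// Assumes the unit table is keyed by traditional characters (hypothetical helper).
const SIMPLIFIED_TO_TRADITIONAL = { "纤": "纖", "尘": "塵", "丝": "絲", "万": "萬", "亿": "億" };

function normalizeUnits(str) {
  // Array.from iterates by code point, so each CJK character is mapped on its own;
  // characters without an entry (e.g. 埃, 沙, 微) pass through unchanged.
  return Array.from(str, (ch) => SIMPLIFIED_TO_TRADITIONAL[ch] ?? ch).join("");
}
```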
Wow, showing off English in a wenyan (Classical Chinese) project.
一千一十萬埃 is converted to 1.0090000000000005e-13.
一千一萬埃 is converted to 9.990000000000006e-14.
負一萬埃 is converted to 1.0000000000000006e-16.
一萬埃 is converted to -1.0000000000000006e-16.
And currently there is even
assert.equal(hanzi2num("三千萬埃"), 2.9990000000000027e-13);
in src/test/test.js. None of them seem to be correct.