On Chinese numerals #338

Open
brynne8 opened this issue Dec 22, 2019 · 5 comments
Labels
numeral Issues related to converting Chinese numerals

Comments

@brynne8

brynne8 commented Dec 22, 2019

  • For single numbers, wenyan-lang outputs Chinese numerals, but when we output an array of numbers, they are printed as Arabic numerals. This is a bit inconsistent.
  • The myriad scale 10^28 (穰) is strangely written in the Japanese Shinjitai form (穣).
  • 十二, which means 12, is output as 一十二, which is not the common form.
  • I have looked at some ancient Chinese books, for example 《全晉文》 and 《水滸傳》. It seems that 一百一 should be parsed as 101 instead of 110, but wenyan-lang does the latter.
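To make the 十二 versus 一十二 point concrete, here is a minimal Python sketch of a number formatter that drops the leading 一 before 十. The function name `to_hanzi` and the restriction to 0 through 9999 are assumptions made for illustration; this is not wenyan-lang's actual implementation.

```python
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]


def to_hanzi(n: int) -> str:
    """Render 0 <= n < 10000 as Chinese numerals (hypothetical helper)."""
    if not 0 <= n < 10000:
        raise ValueError("this sketch only handles 0..9999")
    if n == 0:
        return DIGITS[0]
    parts = []
    pending_zero = False  # a zero digit was skipped after a non-zero digit
    for power in range(3, -1, -1):
        digit = (n // 10 ** power) % 10
        if digit == 0:
            pending_zero = bool(parts)  # leading zeros are never written
            continue
        if pending_zero:
            parts.append(DIGITS[0])  # a run of inner zeros becomes a single 零
            pending_zero = False
        parts.append(DIGITS[digit] + UNITS[power])
    text = "".join(parts)
    # Common form: write 十二 rather than 一十二 when the numeral starts with 一十.
    if text.startswith("一十"):
        text = text[1:]
    return text
```

With this sketch, `to_hanzi(12)` gives 十二, while 110 and 101 are written out unambiguously as 一百一十 and 一百零一.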
@brynne8
Author

brynne8 commented Dec 22, 2019

Since parsing Chinese numerals is an interesting task, I wrote a simple parser as a PEG using LPeg.re.

Link: chinese_number.lua

@LingDong-
Member

Thanks for pointing out the issues! The Chinese numerals have always been the hard part.

  • 穣 is a variant character (異體字) of 穰, but I agree it should be changed to the more common form 穰.
  • 一十二: Should be easy to fix.
  • 一百一: 101 was the original behavior, but it was changed as requested by this issue: An Error In wenyan-lang #24. 二百五=250 sounds more common though.
  • Number rendering: in fact, all the print statements translate to console.log, but on the online IDE, I hijacked/monkey-patched console.log to print to a <div>, in which I added the feature of rendering numbers as hanzi. For arrays, technically I can traverse the whole data structure and recursively change everything to hanzi, but it creates some display issues when the output Array is very long - I'll correct for that in the next online IDE update.

Thank you!
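The recursive traversal described above for arrays could look roughly like the following Python sketch. The names `render` and `digits_only` and the `max_items` cut-off are assumptions made for illustration; the online IDE itself is JavaScript, and this is not its code. `digits_only` is only a stand-in converter for non-negative integers.

```python
from typing import Any, Callable


def render(value: Any, number_to_hanzi: Callable[[int], str], max_items: int = 10) -> str:
    """Recursively render numbers inside nested lists as hanzi (hypothetical helper)."""
    if isinstance(value, bool):
        return str(value)  # bool is a subclass of int, so handle it first
    if isinstance(value, int):
        return number_to_hanzi(value)
    if isinstance(value, (list, tuple)):
        shown = [render(v, number_to_hanzi, max_items) for v in value[:max_items]]
        if len(value) > max_items:
            shown.append("…")  # truncate very long arrays instead of printing them all
        return "[" + ", ".join(shown) + "]"
    return str(value)


def digits_only(n: int) -> str:
    """Stand-in converter: maps each decimal digit of a non-negative int to hanzi."""
    return "".join("零一二三四五六七八九"[int(c)] for c in str(n))
```

Truncating at `max_items` is one possible way to sidestep the display problem with very long arrays; any proper number-to-hanzi converter can be passed in place of `digits_only`.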

@antfu
Member

antfu commented Dec 22, 2019

I would propose a new approach.

How about we implement a print function in the standard library that prints numbers and other values as hanzi? By default, 書之 would call that function. This would output numbers as hanzi without hijacking console.log in the IDE, and it would work everywhere. Besides, another syntax, such as 記之 or something similar, may need to be introduced for the raw output of the target language (working as the current 書之 does).

I am not very good at wenyan, so please feel free to suggest better wording.

@SaltfishAmi

* 一百一: 101 was the original behavior, but changed as requested by this issue: #24 . 二百五=250 sounds more common though.

Surely it sounds more common, but only in spoken language; in fact, it is very colloquial.
Formally, 二百五 should be 205.

@oovm
Contributor

oovm commented Dec 22, 2019

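I have an algorithm here; I am not sure whether it has any holes:

Read from left to right. For each digit read, multiply the running value by ten and add the next digit; but if the character is a multiplier word, multiply by the corresponding multiplier instead.

Then, on reaching <EOS>, do an extra check: if the last unit is not 十, multiply by ten.

This is because only 二百五 exists; there is no 二百五万, which can only be read as 二百五十万.

The advantage of this algorithm is that it supports both readings, 一零九九 and 一千零九十九.

A sample implementation in Python is as follows: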

https://github.com/GalAster/WenyanLanguage/blob/master/packages/wenyan-parser-py/source/hanzi2num.py
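As a rough illustration of the idea described in this comment, here is a minimal Python sketch. It is one possible reading of the description, not the code in the linked file. The name `parse_hanzi`, the `trailing_shorthand` flag, and the restriction to the units 十, 百, 千 and 萬/万 are assumptions. The flag switches between the colloquial reading (二百五 = 250) and the formal one (二百五 = 205) debated earlier in the thread.

```python
DIGITS = {c: i for i, c in enumerate("零一二三四五六七八九")}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"萬": 10_000, "万": 10_000}


def parse_hanzi(text: str, trailing_shorthand: bool = True) -> int:
    """Parse a Chinese numeral below 10^8 (hypothetical helper)."""
    if not text:
        raise ValueError("empty numeral")
    # Digit-by-digit reading such as 一零九九: multiply by ten, add the next digit.
    if all(c in DIGITS for c in text):
        value = 0
        for c in text:
            value = value * 10 + DIGITS[c]
        return value

    total = 0       # sections already closed by 萬
    section = 0     # value accumulated below 10^4
    pending = None  # digit read but not yet attached to a unit word
    last_unit = 1   # value of the most recent unit word
    for c in text:
        if c in DIGITS:
            pending = DIGITS[c]
        elif c in SMALL_UNITS:
            last_unit = SMALL_UNITS[c]
            # A bare unit such as the 十 in 十二 counts as one of that unit.
            section += (1 if pending is None else pending) * last_unit
            pending = None
        elif c in BIG_UNITS:
            last_unit = BIG_UNITS[c]
            total += (section + (pending or 0)) * last_unit
            section, pending = 0, None
        else:
            raise ValueError(f"unexpected character: {c!r}")

    if pending is not None:
        # End-of-input check: a digit directly after a unit word (二百五) may be
        # shorthand for the next lower unit (二百五十).
        after_unit = len(text) >= 2 and text[-2] not in DIGITS
        if trailing_shorthand and after_unit:
            section += pending * (last_unit // 10)
        else:
            section += pending
    return total + section
```

In this sketch the shorthand is applied only at the end of the input, so 二百五十万 parses as 2,500,000, and both 一零九九 and 一千零九十九 parse as 1099. Passing `trailing_shorthand=False` gives the formal readings 205 and 101 for 二百五 and 一百一.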

@LingDong- LingDong- added the numeral Issues related to converting Chinese numerals label Dec 24, 2019

5 participants