On Chinese numerals #338

brynne8 · 2019-12-22T10:35:45Z

wenyan-lang prints a single number as Chinese numerals, but an array of numbers is printed in Arabic numerals. This is inconsistent.
The myriad scale 10²⁸ (穰) is, strangely, written in its Japanese Shinjitai form (穣).
十二, which means 12, is output as 一十二, which is not the common form.
In the ancient Chinese books I have looked at, for example 《全晉文》 and 《水滸傳》, 一百一 seems to be read as 101 rather than 110, but wenyan-lang appears to do the latter.
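
To make the 十二 versus 一十二 point concrete, here is a minimal Python sketch of a formatter that produces the common form. It is not wenyan-lang's code: the name `to_hanzi` and the 0 to 9999 range are assumptions chosen for illustration.

```python
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # indexed by power of ten


def to_hanzi(n: int) -> str:
    """Render 0 <= n <= 9999 as Chinese numerals (hypothetical helper)."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only covers 0..9999")
    if n == 0:
        return DIGITS[0]
    parts = []
    pending_zero = False
    for power in range(3, -1, -1):
        digit = n // 10**power % 10
        if digit == 0:
            # Remember an interior zero; it is emitted only if a
            # non-zero digit follows (101 -> 一百零一, 110 -> 一百一十).
            pending_zero = bool(parts)
            continue
        if pending_zero:
            parts.append("零")
            pending_zero = False
        parts.append(DIGITS[digit] + UNITS[power])
    text = "".join(parts)
    # The common form drops the leading 一 before 十: 一十二 -> 十二.
    if text.startswith("一十"):
        text = text[1:]
    return text


print(to_hanzi(12))   # 十二
print(to_hanzi(101))  # 一百零一
```

Only a leading 一十 is shortened, so 110 still renders as 一百一十.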

brynne8 · 2019-12-22T12:34:44Z

Since parsing Chinese numerals is an interesting task, I wrote a simple parser as a PEG using LPeg.re.

Link: chinese_number.lua

LingDong- · 2019-12-22T15:30:53Z

Thanks for pointing out the issues! The Chinese numerals have always been the hard part.

穣 is a variant character (異體字) of 穰, but I agree it should be changed to the more common form 穰.
一十二: Should be easy to fix.
一百一: 101 was the original behavior, but it was changed as requested in this issue: An Error In wenyan-lang #24. 二百五=250 sounds more common, though.
Number rendering: in fact, all print statements translate to console.log, but in the online IDE I hijacked (monkey-patched) console.log to print to a <div>, and that is where I added the feature of rendering numbers as hanzi. For arrays, I could technically traverse the whole data structure and recursively convert everything to hanzi, but that creates display issues when the output array is very long. I'll correct that in the next online IDE update.
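
The recursive traversal mentioned for arrays could look roughly like the Python sketch below. It is an illustration only, not the online IDE's code: `render`, `digits_to_hanzi`, and the `max_items` cut-off for long arrays are all hypothetical, and the stand-in converter simply maps digit by digit.

```python
DIGITS = "零一二三四五六七八九"


def digits_to_hanzi(n: int) -> str:
    """Stand-in converter: digit by digit, e.g. 23 -> 二三."""
    return "".join(DIGITS[int(ch)] for ch in str(n))


def render(value, to_hanzi=digits_to_hanzi, max_items=10):
    """Render a value for display, converting nested integers to hanzi."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; leave it untouched.
        return str(value)
    if isinstance(value, int) and value >= 0:
        return to_hanzi(value)
    if isinstance(value, (list, tuple)):
        shown = [render(v, to_hanzi, max_items) for v in value[:max_items]]
        if len(value) > max_items:
            # Assumed way to keep very long arrays readable: truncate.
            shown.append("…")
        return "[" + ", ".join(shown) + "]"
    # Anything else (strings, floats, negative numbers) is left as-is.
    return str(value)


print(render([1, [23, 4]]))                 # [一, [二三, 四]]
print(render(list(range(12)), max_items=3)) # [零, 一, 二, …]
```

Passing the converter as a parameter keeps the traversal independent of whichever numeral style is finally chosen.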

Thank you!

antfu · 2019-12-22T16:37:47Z

I would propose a new approach.

How about implementing a print function in the standard library that prints numbers and other values as hanzi? By default, 書之 would call that function. This would output numbers as hanzi without hijacking anything in the IDE, and it would work everywhere. In addition, another syntax, such as 記之, may need to be introduced for raw output in the target language (which is how 書之 works currently).

I am not very good at wenyan, so please feel free to suggest better wording.

SaltfishAmi · 2019-12-22T16:51:24Z

* 一百一: 101 was the original behavior, but changed as requested by this issue: #24 . 二百五=250 sounds more common though.

Sure, it sounds more common, but only in spoken language, and it is very colloquial.
Formally, 二百五 should be 205.

oovm · 2019-12-22T17:10:23Z

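I have an algorithm here; I'm not sure whether it has any holes:

Read from left to right. Each time a digit is read, multiply the running value by ten and add the next digit; but if the character is a multiplier word, multiply by the corresponding multiplier instead.

Then, on reaching <EOS>, do an extra check: if it is not 十, multiply by ten.

This is because only 二百五 exists; there is no 二百五万, which can only be read as 二百五十万.

The advantage of this algorithm is that it supports both readings, 一零九九 and 一千零九十九.

A sample implementation in Python is here: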

https://github.com/GalAster/WenyanLanguage/blob/master/packages/wenyan-parser-py/source/hanzi2num.py
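
As a rough illustration of the left-to-right idea, here is a self-contained Python sketch. It is one possible reading of the description above, not the linked implementation: `parse_hanzi` and its `colloquial` flag are hypothetical, only 十, 百, 千 and 万/萬 are handled, and the end-of-input shorthand is applied only to a single trailing digit.

```python
DIGITS = {ch: i for i, ch in enumerate("零一二三四五六七八九")}
SMALL = {"十": 10, "百": 100, "千": 1000}
LARGE = {"万": 10**4, "萬": 10**4}


def parse_hanzi(text: str, colloquial: bool = True) -> int:
    """Parse a Chinese numeral; colloquial=True reads 二百五 as 250."""
    if not text:
        raise ValueError("empty numeral")
    total = 0         # completed 万 sections
    section = 0       # value inside the current section
    run = 0           # positional run of bare digits (一零九九 -> 1099)
    run_len = 0       # how many digits the run contains
    last_unit = None  # most recent multiplier word, if any
    for ch in text:
        if ch in DIGITS:
            run = run * 10 + DIGITS[ch]
            run_len += 1
        elif ch in SMALL:
            section += (run or 1) * SMALL[ch]  # bare 十 counts as 一十
            run, run_len, last_unit = 0, 0, SMALL[ch]
        elif ch in LARGE:
            total += ((section + run) or 1) * LARGE[ch]
            section, run, run_len, last_unit = 0, 0, 0, LARGE[ch]
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    # End-of-input check: a single trailing digit right after a unit
    # takes the next lower place in the colloquial reading.
    if colloquial and last_unit and run_len == 1 and run:
        section += run * (last_unit // 10)
    else:
        section += run
    return total + section


print(parse_hanzi("二百五"))                    # 250
print(parse_hanzi("二百五", colloquial=False))  # 205
print(parse_hanzi("一零九九"), parse_hanzi("一千零九十九"))  # 1099 1099
```

An explicit 零 makes the trailing run two digits long, so 一百零一 stays 101 under either setting. Because the shorthand is checked only at the end of input, 二百五十万 parses as 2500000.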

LingDong- added the numeral label (Issues related to converting Chinese numerals) on Dec 24, 2019

LingDong- mentioned this issue Dec 24, 2019

含 0 数字转换出错 (Numbers containing 0 are converted incorrectly) #369

Closed
