Problem with Japanese abbreviations. #2

Open
uhho opened this issue Apr 7, 2015 · 18 comments
@uhho

uhho commented Apr 7, 2015

(moved from foretagsplatsen/Numeral-js/issues/9)

I've already reported this issue here adamwdraper/Numeral-js/issues/248, but as the original branch is not maintained anymore, I think it would be better to focus on your branch.

Currently, Numeral.js groups numbers into four groups: thousands, millions, billions and trillions.

But in Japanese (and I suppose this also applies to Chinese), numbers are grouped slightly differently: by hundred, thousand and ten thousand.

100 -> 1百
1,000 -> 1千
10,000 -> 1万
100,000 -> 10万
1,000,000 -> 1百万
10,000,000 -> 1千万
100,000,000 -> 1億
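
The mapping above can be sketched as a lookup over a unit table. This is only an illustration: `JA_UNITS` and `abbreviateJa` are hypothetical names, not part of Numeral.js, and no rounding is applied.

```javascript
// Hypothetical unit table that mirrors the examples above, largest unit first.
const JA_UNITS = [
  { value: 1e8, symbol: '億' },
  { value: 1e7, symbol: '千万' },
  { value: 1e6, symbol: '百万' },
  { value: 1e4, symbol: '万' },
  { value: 1e3, symbol: '千' },
  { value: 1e2, symbol: '百' },
];

// Abbreviate with the largest unit that does not exceed the value.
function abbreviateJa(value) {
  const unit = JA_UNITS.find((u) => Math.abs(value) >= u.value);
  if (!unit) return String(value); // below 100: nothing to abbreviate
  return String(value / unit.value) + unit.symbol;
}

console.log(abbreviateJa(100000));   // 10万
console.log(abbreviateJa(10000000)); // 1千万
```

Because 十万 is not in the table, 100,000 falls through to 万 and becomes 10万, which matches the list above.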

Before I start fixing this issue, I'd like to discuss how to approach that problem.
Ideas:

  • add more abbreviations in library core
  • let languages define their own formatting/unformatting functions

Sample formatting function here: http://jsfiddle.net/tuknLbz8/1/

Anyway, I think we need a more flexible architecture if we want to support even more complicated numeral systems, like the Indian numbering system.
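
To picture the second idea (languages defining their own formatting/unformatting functions), here is a minimal sketch of an unformatting hook. The `jaLanguage` object and its `unformat` method are assumptions made for illustration; neither Numeral.js nor numbro is known to expose this shape.

```javascript
// Hypothetical language definition that carries its own unformat hook.
const jaLanguage = {
  units: { '百': 1e2, '千': 1e3, '万': 1e4, '億': 1e8 },

  // Parse strings such as "10万" or "1千万" back into a number.
  // Each trailing unit character multiplies the leading digits.
  unformat(text) {
    const match = /^(-?\d+(?:\.\d+)?)(\D*)$/.exec(text.trim());
    if (!match) return NaN;
    let result = Number(match[1]);
    for (const ch of match[2]) {
      if (!(ch in this.units)) return NaN; // unknown suffix character
      result *= this.units[ch];
    }
    return result;
  },
};

console.log(jaLanguage.unformat('10万'));  // 100000
console.log(jaLanguage.unformat('1千万')); // 10000000
```

Keeping the unit table inside the language definition means another language (for example one using lakh and crore) could supply a different table without touching the core.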

@NadyaNayme

Please correct me if I'm wrong, seeing as you're the one living in Japan and my Japanese is rusty from lack-of-use.

Shouldn't "1百" simply be "百"? Similarly, wouldn't 10万 be 十万? While I'm sure both are understandable, I'm thinking of "正しい文法" (correct grammar).

Per the "languages" section of the website I think letting languages define their own formatting is the better idea.

@uhho
Author

uhho commented Apr 14, 2015

@kyokou Thank you very much for the comment!
Yes - both are correct and commonly used here, depending on the context.

To sum up, there are three systems:

  • Arabic numerals => 10,000 100,000
  • Arabic numerals + kanji (ideographs) => 1万 10万
  • Kanji only => 一万 十万

As you mentioned, one thousand is an exception: you can omit the number in front of the kanji (千 = 1千). It's similar to "one hundred" and "a hundred" in English.

In particular, the second system is commonly used for counting large amounts of money.
So "two hundred twenty million yen" becomes "2億2千万円".
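
For the third, kanji-only system, a conversion could look roughly like the sketch below. The function names are hypothetical, and the rule of dropping 一 before 十, 百 and 千 is a simplifying assumption (real usage varies, for example 一千万 also occurs).

```javascript
const DIGITS = ['', '一', '二', '三', '四', '五', '六', '七', '八', '九'];
const SMALL_UNITS = ['', '十', '百', '千']; // positions inside a 4-digit group
const LARGE_UNITS = ['', '万', '億', '兆']; // one per group of 10^4

// Convert a 4-digit group (1..9999) to kanji, e.g. 2300 -> 二千三百.
function groupToKanji(n) {
  let out = '';
  for (let pos = 3; pos >= 0; pos--) {
    const digit = Math.floor(n / 10 ** pos) % 10;
    if (digit === 0) continue;
    // Assumption: write 十 rather than 一十, 百 rather than 一百, and so on.
    out += (digit === 1 && pos > 0 ? '' : DIGITS[digit]) + SMALL_UNITS[pos];
  }
  return out;
}

// Convert a non-negative safe integer to kanji-only notation.
function toKanji(value) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError('toKanji expects a non-negative safe integer');
  }
  if (value === 0) return '〇';
  let out = '';
  let rest = value;
  for (let i = 0; rest > 0; i++) {
    const group = rest % 10000;
    if (group > 0) out = groupToKanji(group) + LARGE_UNITS[i] + out;
    rest = Math.floor(rest / 10000);
  }
  return out;
}

console.log(toKanji(100000));    // 十万
console.log(toKanji(220000000)); // 二億二千万
```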

But reading your comment and thinking about it more deeply made me think that if a language has more than one numeral system, the user should be able to switch between them.

For example:

// Standard usage
numbro(100000).format('$0,0'); // ¥100,000
numbro(100000).format('0a$'); // 10万円 <- currency symbol has to be different!

// Selecting numeral system
numbro(100000).format('0a$', 1); // 十万円

In the case above, the second argument is the index of the selected numeral system (defined in the language configuration file), but I'm not sure it's a good idea.

Any ideas?

@NadyaNayme

Thanks for the info! I've never actually seen the second system used. It's still logical and easy to follow though, so I'm not too surprised it exists.

As for the different formatting for individual languages - I'm not sure the best way to go about that. I think it is relatively uncommon for languages to have multiple ways of writing. For completeness it would be best to add them, but it should be planned and thought out in a maintainable fashion.

Your solution seems like it would work fine - so long as it is standardized.

For example:

[0] - Native (一、二、三、四)
[1] - Arabic Numerals (1, 2, 3, 4) 
[2] - Hybrid/Other, if applicable (1百、1千) 

@BenjaminVanRyseghem
Owner

I am not a big fan of a magic index referring to hidden things 😸
So if we go in this direction (I am still not convinced, as I am still not sure I understand everything), maybe introducing a special function per language would be better


I do not know Japanese at all, so I will try to summarize what I understand, please correct me if I am wrong.

  1. there is a native way to write currency in Japanese, where everything is in kanji.
    To me, it sounds like the English equivalent would be something like two k€ (I use the euro as it's postfixed)

  2. there is a half-Arabic/half-Japanese way, whose English equivalent would be: 2 k€

Am I right? 😄

@NadyaNayme

Your example is correct. I'm assuming the "k" is shorthand for "thousand"?

Native: two thousand  // equal to 二千円
Arabic: 2,000  // equal to ¥2,000
Hybrid: 2 thousand  // equal to 2千円

"Native" probably isn't the best way to describe it, since all of the above are natively used... was just the most accurate word I could think of.

Magic index referring to hidden things could be standardized and documented to not make it so hidden or magical. My only issue with magic index referring to hidden things is when it isn't documented; but I can see why you would be against it.

We could make the index less magical by passing a string as the second param instead of an index value.

numeral(100000).format('$0,0', "arabic"); // ¥100,000
numeral(100000).format('0a$', "hybrid"); // 10万円 <- currency symbol has to be different!
numeral(100000).format('0a$', "native"); // 十万円

@BenjaminVanRyseghem
Owner

Thanks for the clarifications 😄 (indeed, the k stood for kilo)

Could you please explain why the currency symbol has to be different?

If for all other languages we use something like 2 k€, why not use a similar approach here and use 2万円?

Is ¥2k completely weird?
(sorry for all the questions, but I try to have a clear understanding of the situation 😸 )

@NadyaNayme

The rules are a little complex. Without trying to get too specific into nuance and semantics, this is the easiest way I can sum it up:

円 can be used anywhere, including price tags
¥ can only be used for prices/value
If the number uses kanji, 円 must be used over ¥.

¥100 // OK
100円 // Also OK
百円 // OK
¥百 // NOT OK (or at least, I've never ever ever seen this) 
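
As a rough sketch of the rule above, a helper could pick the symbol by checking whether the already formatted number contains kanji. `withYen` is a hypothetical helper, and the prefix/suffix placement is an assumption taken from the four examples.

```javascript
// Matches any character in the basic CJK ideograph block,
// which covers 百, 千, 万 and 億.
const KANJI = /[\u4e00-\u9fff]/;

// Assumption: ¥ is a prefix for all-digit amounts, 円 is always a suffix.
// Pass preferKanjiSymbol = true to get 100円 instead of ¥100.
function withYen(formattedNumber, preferKanjiSymbol = false) {
  if (preferKanjiSymbol || KANJI.test(formattedNumber)) {
    return `${formattedNumber}円`;
  }
  return `¥${formattedNumber}`;
}

console.log(withYen('100'));       // ¥100
console.log(withYen('100', true)); // 100円
console.log(withYen('百'));        // 百円, never ¥百
```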

When to use 円 depends on context. When writing vertically, kanji + 円 is preferred over the other forms. Same with formal/old documents/literature. There are more scenarios as well and not all of them are "hard rules, must do this or it's wrong".

It's kind of hard to explain since I'm not 100% familiar with all scenarios/contexts or without a mini-Japanese lesson in formality and counters.

@lukaszkrawczyk
If I'm wrong about ¥百, let me know. I'd love to learn more! I've not once seen this (although after you mentioned it, I have seen the "hybrid" style for populations of cities and a few other large numbers)

@BenjaminVanRyseghem
Owner

Thanks for the explanations once again 😄

my proposal for currencies (which is in fact not the point of this issue 😸 maybe it should be moved in a separate issue):

  • we keep the currency symbol: $, €, or ¥. We use this symbol when there is no average, so the abbreviations are not used (and we are sure not to mix the symbol with kanji)
  • we introduce a translatedSymbol (or we can find a better name): 円. We use this symbol when average is used, so we are sure to have a consistent translated text. Of course, if this symbol is not defined, we fall back to currencySymbol

What do you guys think about that?
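
A minimal sketch of that fallback, assuming a language definition object with the two fields named above. The object shapes and `pickCurrencySymbol` are made up for illustration and are not numbro's actual language format.

```javascript
// Hypothetical currency definitions; only the field names come from the proposal.
const jaCurrency = { currencySymbol: '¥', translatedSymbol: '円' };
const frCurrency = { currencySymbol: '€' }; // no translated symbol defined

// Use the translated symbol only when averaging (abbreviation) is active
// and the language defines one; otherwise fall back to currencySymbol.
function pickCurrencySymbol(currency, isAveraged) {
  if (isAveraged && currency.translatedSymbol) {
    return currency.translatedSymbol;
  }
  return currency.currencySymbol;
}

console.log(pickCurrencySymbol(jaCurrency, true));  // 円
console.log(pickCurrencySymbol(jaCurrency, false)); // ¥
console.log(pickCurrencySymbol(frCurrency, true));  // €
```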

@uhho
Author

uhho commented Apr 16, 2015

@kyokou Yes, everything is correct. I couldn't explain it better ;)

@BenjaminVanRyseghem , are you OK with @kyokou 's solution?
If yes, I will modify the code, write some tests, add examples and send a PR during next week.

On the other hand, I've been thinking about publishing a separate library to deal with this problem.
A kind of plug-in for the numbro library, where the user could define a custom formatting function and so on.
I'll think about that a little more.

Regarding the issues with currency, let's move your proposal to a different issue.
Because I've got several ideas how we could improve currency formatting as well. 😈

@BenjaminVanRyseghem
Owner

@lukaszkrawczyk I am not sure we agreed on a solution yet 😄 and to be honest, I am still not super convinced about the introduction of a new argument.

But we can continue to discuss it 😸

edit: after reading the thread again, I am not sure we are talking about the same things 😉

@BenjaminVanRyseghem
Owner

@lukaszkrawczyk I would rather push things directly into numbro instead of having another layer of external dependencies, don't you think?

@NadyaNayme

@BenjaminVanRyseghem
I think he's talking about my "pass a string as second argument" instead of "have a magical index as second argument" solution.

EG:

numeral(100000).format('$0,0', "arabic"); // ¥100,000
numeral(100000).format('$0,0');           // also ¥100,000
numeral(100000).format('0a$', "hybrid");  // 10万円 <- currency symbol has to be different!
numeral(100000).format('0a$', "native");  // 十万円

Without solving the currency issue and leaving just the number:

numeral(100000).format('0,0', "arabic"); // 100,000
numeral(100000).format('0,0');           // also 100,000
numeral(100000).format('0a', "hybrid");  // 10万
numeral(100000).format('0a', "native");  // 十万

Because this may have to change on-the-fly and possibly per-call, I would prefer a second argument for Japanese over a setting in the configuration file.
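
One way the named second argument could be wired up is a per-language registry of formatting functions, looked up by name on each call. Everything here (`numeralSystems`, `formatWithSystem`) is a hypothetical sketch rather than numbro's API, and only two systems are filled in; a kanji-only converter could be registered under `native` in the same way.

```javascript
// Hypothetical per-language registry of numeral systems.
const numeralSystems = {
  ja: {
    // Plain digits with a comma every three places (integers only).
    arabic: (n) => String(n).replace(/\B(?=(\d{3})+(?!\d))/g, ','),
    // Digits plus 万 once the value reaches 10,000.
    hybrid: (n) => (Math.abs(n) >= 1e4 ? `${n / 1e4}万` : String(n)),
  },
};

// Look the system up by name instead of by a positional index,
// and fail loudly on names the language does not define.
function formatWithSystem(value, language, system = 'arabic') {
  const systems = numeralSystems[language];
  if (!systems || typeof systems[system] !== 'function') {
    throw new Error(`Unknown numeral system "${system}" for "${language}"`);
  }
  return systems[system](value);
}

console.log(formatWithSystem(100000, 'ja'));           // 100,000
console.log(formatWithSystem(100000, 'ja', 'hybrid')); // 10万
```

Throwing on an unknown name keeps the argument from becoming another "magic" value that silently falls back to a default.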

@BenjaminVanRyseghem
Owner

sounds like a good idea for all the languages with a different alphabet 😄

@lukaszkrawczyk if you want to give this a try, I will be very pleased to read your code 😉

@uhho
Author

uhho commented Apr 17, 2015

@kyokou Correct. Sorry for being unclear.
@BenjaminVanRyseghem OK, let's do it!

@BenjaminVanRyseghem
Owner

@lukaszkrawczyk any progress here? 😄

@ArmorDarks

As another solution, GNU uses @ to denote variations in locales, like en-US@euro to display English (USA) with the euro as currency, so maybe we can use the same principle in formatCurrency?

numbro(100).formatCurrency('0,0.00 $')
numbro(100).formatCurrency('0,0.00 $@arabic')
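
For illustration only, splitting such a string into its base part and its modifier could look like this. `splitModifier` is a hypothetical helper; numbro's `formatCurrency` is not known to accept an `@` suffix.

```javascript
// Split "base@modifier" at the last "@"; the modifier is null when absent.
function splitModifier(spec) {
  const at = spec.lastIndexOf('@');
  if (at === -1) return { base: spec, modifier: null };
  return { base: spec.slice(0, at), modifier: spec.slice(at + 1) };
}

console.log(splitModifier('0,0.00 $@arabic')); // { base: '0,0.00 $', modifier: 'arabic' }
console.log(splitModifier('0,0.00 $'));        // { base: '0,0.00 $', modifier: null }
```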

@BenjaminVanRyseghem
Owner

@ArmorDarks I would like to keep locales out of the format string

@ghost

ghost commented Mar 30, 2019

@BenjaminVanRyseghem I would like to help with this issue, but I think the OP's examples are not how large numbers should be displayed.

Let's see what it should look like:

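| Power of 10 | Number | English | Japanese |
| --- | --- | --- | --- |
| 1 | 10 | 10 | 10 |
| 2 | 100 | 100 | 100 |
| 3 | 1,000 | 1k | 1,000 |
| 4 | 10,000 | 10k | 1万 |
| 5 | 100,000 | 100k | 10万 |
| 6 | 1,000,000 | 1m | 100万 |
| 7 | 10,000,000 | 10m | 1,000万 |
| 8 | 100,000,000 | 100m | 1億 |
| 9 | 1,000,000,000 | 1b | 10億 |
| 10 | 10,000,000,000 | 10b | 100億 |
| 12 | 1,000,000,000,000 | 1tr | 1兆 |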

This is more reasonable and I think it is easier to implement. Since the only difference between the two systems is how we separate large numbers (that is, >= 1,000), the problem of "百" versus "1百" doesn't exist. The Japanese numeral system (which is the same as the Chinese one) is based on 10^4, while Western systems are based on 10^3.
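
A minimal sketch of this 10^4-based abbreviation, keeping the Western comma grouping for the digits in front of the unit. `MYRIAD_UNITS`, `groupThousands` and `abbreviateMyriad` are hypothetical names, and no rounding is applied.

```javascript
// Units step by 10^4 instead of 10^3, largest first.
const MYRIAD_UNITS = [
  { value: 1e12, symbol: '兆' },
  { value: 1e8, symbol: '億' },
  { value: 1e4, symbol: '万' },
];

// Insert commas into the integer part only, leaving any fraction untouched.
function groupThousands(n) {
  const [intPart, fracPart] = String(n).split('.');
  const grouped = intPart.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
  return fracPart === undefined ? grouped : `${grouped}.${fracPart}`;
}

// Scale by the largest unit that fits, then append its symbol.
function abbreviateMyriad(value) {
  const unit = MYRIAD_UNITS.find((u) => Math.abs(value) >= u.value);
  if (!unit) return groupThousands(value); // below 10,000: plain digits
  return groupThousands(value / unit.value) + unit.symbol;
}

console.log(abbreviateMyriad(1000));     // 1,000
console.log(abbreviateMyriad(10000000)); // 1,000万
console.log(abbreviateMyriad(1e10));     // 100億
```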

I found this web page, which might help: https://www.trussel.com/jnumbers.htm

When the OP says we should write "100" as "百", it is like saying we should write "1 hundred" in English, which doesn't make sense: that is how we read the number, not how we write it.
