Add European Number Styling Theme #557

Closed
ddsjoberg opened this issue Jun 27, 2020 · 13 comments

Comments

@ddsjoberg
Owner

  • We can use the theme element tbl_summary-fn:N_fun to style the numbers in gtsummary to European style, e.g. 1 000,84.
  • Need to apply the theme element to the integers in the table header of tbl_summary()
  • Should this update be incorporated into the language theme? For example, the French use this number styling, but do French Canadians (or the many other countries that speak French)?
  • Perhaps, we could add an argument to the language themes that can optionally turn on the Euro formatting?
@ddsjoberg
Owner Author

@larmarange when would you like a space vs. a period to separate/style large numbers?

vv <- c(0.01, 1e5, 107453.354)

# using period to separate big numbers
format(vv, big.mark = ".", decimal.mark = ",", scientific = FALSE)
#> [1] "      0,01" "100.000,00" "107.453,35"

# using space to separate big numbers
format(vv, big.mark = " ", decimal.mark = ",", scientific = FALSE)
#> [1] "      0,01" "100 000,00" "107 453,35"

Created on 2020-06-29 by the reprex package (v0.3.0)

@larmarange
Collaborator

larmarange commented Jun 29, 2020

Just a quick link about a similar discussion for scales::number() r-lib/scales#142 (comment)

I'm personally not in favour of using a comma as the default thousands separator, for the following reasons:

  • using a comma as the thousands separator is the purpose of the dedicated comma_format formatter;
  • from an international perspective, it would be more relevant to use a space as the thousands separator, as used for example by The Lancet (a space as the thousands separator and a dot as the decimal separator being a good compromise that an international audience can read easily);
  • since 2003, the use of spaces as separators (for example: 20 000 and 1 000 000 for "twenty thousand" and "one million") has been officially endorsed by SI/ISO 31-0 standard, as well as by the International Bureau of Weights and Measures and the International Union of Pure and Applied Chemistry (IUPAC), the American Medical Association's widely followed AMA Manual of Style, and the Metrication Board, among others. (cf. https://www.wikiwand.com/en/Decimal_separator#/Digit_grouping)

Therefore, I would say that any default behaviour should be a space for separating thousands and a dot for decimals (i.e. default corresponds to international standard).
The use of a comma as a thousands separator should be reserved for when a specific English theme is applied.

@larmarange
Collaborator

larmarange commented Jun 29, 2020

  • for SI: space for thousands, dot for decimals
  • for English: comma for thousands and dot for decimals
  • for French: space for thousands and comma for decimals
  • for Spanish: dot for thousands and comma for decimals
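The four conventions in the list above can be sketched with base R's format(). This is only an illustration: the `conventions` lookup table and the sample value are hypothetical names introduced here, not part of gtsummary.

```r
# Hypothetical lookup table of the four conventions listed above
# (illustrative only, not gtsummary code).
conventions <- list(
  SI      = list(big.mark = " ", decimal.mark = "."),
  English = list(big.mark = ",", decimal.mark = "."),
  French  = list(big.mark = " ", decimal.mark = ","),
  Spanish = list(big.mark = ".", decimal.mark = ",")
)

x <- 1234567.89

# Format the same value once per convention, always with two decimals.
styled <- vapply(
  conventions,
  function(cv) {
    format(x, big.mark = cv$big.mark, decimal.mark = cv$decimal.mark,
           nsmall = 2, scientific = FALSE)
  },
  character(1)
)

print(styled)
```

Keeping the two marks together in one named entry makes it harder to end up with an inconsistent pair, such as a comma used for both roles.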

According to Wikipedia: https://www.wikiwand.com/en/Decimal_separator#Examples_of_use

| Style | Countries and regions |
| --- | --- |
| 1,234,567.89 | Canada (English-speaking; unofficial), China, Hong Kong, India, Iran, Ireland, Israel, Japan, Korea, Malaysia, Malta, Mexico, New Zealand, Pakistan, Philippines, Singapore, South Africa, Taiwan, Thailand, United Kingdom and other Commonwealth states, United States. |
| 1 234 567.89 | SI style (English version), Australia, Canada (English-speaking), China, South Africa, Sri Lanka, Switzerland (officially encouraged for currency numbers only), United Kingdom. |
| 1 234 567,89 | SI style (French version), Albania, Belgium (French), Bulgaria, Canada (French-speaking), Croatia, the Czech Republic, Estonia, Finland, France, Hungary, Kosovo, Latin Europe, Latvia, Lithuania, Norway, Peru, Poland, Portugal, Russia, Slovakia, South Africa, Sweden, Switzerland (officially encouraged, except currency numbers), Ukraine, Vietnam (in education). |
| 1,234,567·89 | Ireland, Malaysia, Philippines, Singapore, Taiwan, United Kingdom (older, typically handwritten; in education) |
| 1,234.567,89 | Croatia (alternative to spaces; commas and periods alternate with powers of 1,000) |
| 1.234.567,89 | Argentina, Austria, Belgium (Dutch), Bosnia and Herzegovina, Brazil, Chile, Colombia, Costa Rica, Croatia (informal), Denmark, Germany, Greece, Indonesia, Italy, Netherlands, Romania, Slovenia, Serbia, Spain, Turkey, Vietnam. |
| 12,34,567.89 | Bangladesh, India, Nepal, Pakistan (see Indian Numbering System). |
| 1'234'567.89 | Switzerland (computing), Liechtenstein. |
| 1'234'567,89 | Switzerland (handwriting). |
| 1.234.567'89 | Spain (handwriting, used until the 1980s, inadvisable use according to the RAE). |
| 123,4567.89 | China (based on powers of 10 000; see Chinese numerals, also 1234567.89). |

@larmarange
Collaborator

As you can see, there is no "European style", as it is language-dependent.

This is why I would say that the default is the SI system, and that it will be changed by language theme.

It seems that all French-speaking countries use the same "1 234 567,89" style, but some English-speaking countries do not use the US system (e.g. the official Canadian approach is the SI one, and Ireland uses a middle dot rather than a dot, like The Lancet). In that case, if requested by users, you could consider the possibility of theme_gtsummary_language("en-ie").

@larmarange
Collaborator

If you are using format(), note that you need to round the results and to specify nsmall:

myformat <- function(x, digits) {
  format(round(x, digits), big.mark = " ", decimal.mark = ",", scientific = FALSE, nsmall = digits)
}

An alternative is to use scales::number(), which was designed to handle most cases, but it would require an additional import.
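To see why the format() helper above needs both round() and nsmall, here is a small base R sketch. The helper is restated so the block runs on its own, and the sample values are arbitrary.

```r
# Same helper as in the comment above, restated so the block is self-contained.
myformat <- function(x, digits) {
  format(round(x, digits), big.mark = " ", decimal.mark = ",",
         scientific = FALSE, nsmall = digits)
}

# Without nsmall, a whole number is printed with no decimals at all.
no_nsmall <- format(round(1e5, 2), big.mark = " ", decimal.mark = ",",
                    scientific = FALSE)

# Without round(), nsmall is only a minimum, so extra decimals survive.
no_round <- format(0.123456, big.mark = " ", decimal.mark = ",",
                   scientific = FALSE, nsmall = 2)

# With both, every value gets exactly `digits` decimals.
both <- myformat(c(1e5, 0.123456), digits = 2)

print(no_nsmall)
print(no_round)
print(both)
```

Note that format() pads the elements of a vector to a common width, so trimws() may be needed before pasting the results into other strings.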

@oranwutang
Contributor

Hi!

I agree with @larmarange. In the specific case of the Spanish language, the correct format is 1.234.567,89 (not 1.234.567'89, as is also stated in the previous table).

@ddsjoberg
Owner Author

I'll see if I can make some progress/reading for this update this week. It seems there are many official styles and unofficial styles, and they can be different for countries speaking the same language. Perhaps as a starting point, I can create a theme where users can select the number formatting throughout the package. The language theme and the decimal theme can be set separately.

@larmarange
Collaborator

larmarange commented Jul 5, 2020 via email

@ddsjoberg ddsjoberg mentioned this issue Jul 8, 2020
9 tasks
@ddsjoberg
Owner Author

ddsjoberg commented Jul 8, 2020

FYI @larmarange @oranwutang

I added control across the entire package for the formatting of printed numbers, e.g. 1,022.234 vs 1 022,234 vs 1.022,234. I also added arguments to theme_gtsummary_language() to control formatting when the language is set, e.g. theme_gtsummary_language("fr", big.mark = " ", decimal.mark = ","). I haven't made any default behavior based on the language selected yet. I want to let the ideas percolate a bit longer before implementing something.

Pull Request: #566
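As a rough usage sketch of the call quoted above: it assumes an installed gtsummary version that includes this change, and it uses the `trial` example data set shipped with the package. The choice of columns is arbitrary.

```r
# Assumes gtsummary (with the big.mark/decimal.mark arguments) is installed.
library(gtsummary)

# French labels, space as thousands separator, comma as decimal mark.
theme_gtsummary_language("fr", big.mark = " ", decimal.mark = ",")

# Summarise two numeric columns of the packaged example data.
tbl <- tbl_summary(trial[c("age", "marker")])
tbl

# Restore the package defaults so later tables are unaffected.
reset_gtsummary_theme()
```

The values in `trial` are small, so the decimal mark is the visible change here; the thousands separator only appears with counts or values of 1 000 or more.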

@oranwutang
Contributor

I think that the best default option is the "English format" and, as you stated, to offer users the option to change the formatting as desired.

@larmarange
Collaborator

I would say that the best default format is the "international" or SI format, i.e. decimal.mark = ".", big.mark = " ".

@larmarange
Collaborator

Thanks a lot for adding the option to customize this directly in theme_gtsummary_language(). I think it is OK for users to define explicitly what they want.

Regards

@ddsjoberg
Owner Author

Hey @larmarange and @oranwutang ! Thank you for your valuable input here. I am going to go ahead and close this issue.

  1. The default formatting for numbers in the package is the US convention, e.g. 1,234.45

  2. big.mark= and decimal.mark= are now settable in the language theme function. Even if you want to produce English tables with different styling, you can use theme_gtsummary_language("en", big.mark= " ", decimal.mark= ",")

  3. Apart from language styling, there is a base R default that is now honored. Base R has an option called "OutDec" that is used in the format() and as.character() functions (and others, I believe). It is used as the default for the decimal mark. If a user has set the decimal mark to a comma using this option, gtsummary will honor this setting AND set big.mark to a space within gtsummary (example below).

    image
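The OutDec rule in point 3 can be sketched in base R as follows. `default_marks()` is a hypothetical helper written for this illustration; it is not gtsummary's actual implementation.

```r
# Hypothetical helper (not gtsummary's code): derive both marks from OutDec.
default_marks <- function() {
  decimal_mark <- getOption("OutDec", default = ".")
  # A comma decimal mark is paired with a space; otherwise use the US comma.
  big_mark <- if (identical(decimal_mark, ",")) " " else ","
  list(decimal.mark = decimal_mark, big.mark = big_mark)
}

format_with_defaults <- function(x, digits = 2) {
  marks <- default_marks()
  format(round(x, digits), big.mark = marks$big.mark,
         decimal.mark = marks$decimal.mark, nsmall = digits,
         scientific = FALSE)
}

# US convention when OutDec is left at ".".
old <- options(OutDec = ".")
us_style <- format_with_defaults(1234.45)

# Comma decimal mark and space separator once OutDec is set to ",".
options(OutDec = ",")
comma_style <- format_with_defaults(1234.45)

# Restore the caller's original setting.
options(old)

print(us_style)
print(comma_style)
```

Reading the option at call time, rather than caching it, means a user can change OutDec mid-session and the next formatted value follows it.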

Again many thanks for all your thoughtful comments.
