[R] String to_title, to_lower, to_upper kernels #29474

Closed
asfimport opened this issue Sep 1, 2021 · 4 comments

asfimport commented Sep 1, 2021

ARROW-12714 added the str_to_title kernel and a basic mapping, but we should add a test. Also the stringr function takes a "locale" argument which is not handled here; we should either pass it to Arrow C++ if it supports it (which I doubt) or error if a value is provided in R.

This also applies to str_to_lower and str_to_upper kernels.

Reporter: Neal Richardson / @nealrichardson
Assignee: Eduardo Ponce / @edponce

Note: This issue was originally created as ARROW-13853. Please see the migration documentation for further details.


Eduardo Ponce / @edponce:
Arrow string compute functions do not support a locale setting; they use the default POSIX locale, which is "C".
Arrow provides UTF-8 variants of the string functions to standardize behavior across locales.
Currently, only strftime accepts a locale option, which it uses to format the resulting string.

Localization would also need to be considered for kernels that use/change text casing, compare strings, and format numbers.

Created ARROW-14126 to further investigate localization support for string functions.


Eduardo Ponce / @edponce:
If, for now, we are not going to accept a locale option but still want to match the stringr API, I can think of these approaches:

  • str_to_lower(x, locale = "en") -> only accept the "en" locale and error on any other value

  • str_to_lower(x, locale = NULL) -> only accept a NULL locale (i.e., not supported) and error if another value is given

  • str_to_lower(x, locale) -> use R's missing() function to detect whether the locale arg was supplied; if it was, error out

  • str_to_lower(x) -> non-matching API

cc @nealrichardson Any suggestions?
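
To make the first and third options concrete, here is a minimal sketch in base R. The function names `str_to_lower_en_only` and `str_to_lower_missing` are hypothetical, the error messages are made up for illustration, and `tolower()` stands in for the Arrow kernel call so the snippet runs without the arrow package. It is not the code from the PR, only an illustration of how the two checks would behave.

```r
# Hypothetical stand-ins for the Arrow-backed mappings. Base R tolower() is
# used in place of the Arrow kernel so the sketch runs on its own.

# Option 1: keep stringr's default of "en" and reject any other value.
str_to_lower_en_only <- function(x, locale = "en") {
  if (!identical(locale, "en")) {
    stop("A 'locale' other than \"en\" is not supported", call. = FALSE)
  }
  tolower(x)
}

# Option 3: no default; missing() reports whether the caller supplied
# 'locale' at all, so any explicit value (even "en") triggers the error.
str_to_lower_missing <- function(x, locale) {
  if (!missing(locale)) {
    stop("The 'locale' argument is not supported", call. = FALSE)
  }
  tolower(x)
}

str_to_lower_en_only("ABC")     # "abc"
str_to_lower_missing("ABC")     # "abc"

# Both variants raise an error when an unsupported locale is requested.
res <- tryCatch(
  str_to_lower_missing("ABC", locale = "tr"),
  error = function(e) conditionMessage(e)
)
print(res)
```

The practical difference is that option 1 silently accepts an explicit `locale = "en"`, while the `missing()` variant rejects every explicit value, which makes it clearer to the user that the argument has no effect.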


Neal Richardson / @nealrichardson:
I made a suggestion on the PR


Neal Richardson / @nealrichardson:
Issue resolved by pull request 11232
#11232
