[R] Bindings for sub/gsub #27390

Closed
asfimport opened this issue Feb 5, 2021 · 3 comments

asfimport commented Feb 5, 2021

Reporter: Neal Richardson / @nealrichardson
Assignee: Ian Cook / @ianmcook

Note: This issue was originally created as ARROW-11513. Please see the migration documentation for further details.

Neal Richardson / @nealrichardson:
Looking at the options struct and the re2 syntax, here are some notes for how to map to R concepts:

  • gsub/str_replace_all corresponds to max_replacements = -1 (the default); sub/str_replace corresponds to max_replacements = 1
  • fixed = FALSE (default) means to use the "replace_substring_regex" function; fixed = TRUE means to use "replace_substring"
  • if ignore.case = TRUE and fixed = FALSE, we can wrap the pattern with a flag like paste0("(?i", pattern, ")") (or maybe it is actually paste0("(?i)", pattern); see stringi docs). It is unclear whether we have a case-insensitive, non-regex option
  • useBytes: unclear that this is an option, or if it is relevant (per the docs for sub, "The main effect of ‘useBytes = TRUE’ is to avoid errors/warnings about invalid inputs and spurious matches in multibyte locales")
  • perl: unclear that this is an option, or if it is relevant
  • stringr handles options including case insensitivity differently, using a stringi options list, and we won't be able to support all of them. See stringr vignette
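
The first two notes above can be sketched as a small lookup in plain R. This is only an illustration: `replace_call_spec` and its `global` argument are hypothetical names invented here, and the option names (`pattern`, `replacement`, `max_replacements`) are assumed to mirror the options struct mentioned above. Nothing in the block calls Arrow itself.

```r
# Hypothetical helper (not part of the arrow package): picks the compute
# function name and options for a sub()/gsub()-style call.
replace_call_spec <- function(pattern, replacement, fixed = FALSE, global = TRUE) {
  list(
    # fixed = TRUE is a literal substring match; otherwise the regex variant
    fun = if (fixed) "replace_substring" else "replace_substring_regex",
    options = list(
      pattern = pattern,
      replacement = replacement,
      # -1 replaces every match (gsub); 1 replaces only the first match (sub)
      max_replacements = if (global) -1L else 1L
    )
  )
}

# sub("a+", "b", x) would map to the regex function with one replacement
spec <- replace_call_spec("a+", "b", fixed = FALSE, global = FALSE)
```

Returning a plain list keeps the mapping easy to inspect before it is handed to whatever actually invokes the compute function.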

Ian Cook / @ianmcook:

  • For ignore.case = TRUE and fixed = FALSE, the re2 syntax is paste0("(?i)", pattern)
  • We can support ignore.case = TRUE and fixed = TRUE by using re2 with paste0("(?i)\\Q", pattern, "\\E") (the backslashes are doubled because this is an R string literal)
  • I don't think it's worth handling useBytes or perl. I believe the only practical result of handling those arguments would be to detect the conditions that R doesn't support when they are FALSE and throw the corresponding errors. Since they both default to FALSE, I suspect this would be more annoying than valuable.
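
As a rough sketch of the pattern handling described above, the helper below builds the string that would be passed to an RE2-based function. `build_re2_pattern` is a hypothetical name, not the function from the eventual pull request, and the sketch assumes the pattern itself does not contain a literal `\E`, which would end the quoted section early.

```r
# Hypothetical helper: builds the pattern string for an RE2-based kernel.
build_re2_pattern <- function(pattern, ignore.case = FALSE, fixed = FALSE) {
  if (fixed && !ignore.case) {
    # Plain literal match: no regex is needed, so pass the pattern through
    return(pattern)
  }
  if (fixed) {
    # \Q...\E quotes the text literally; backslashes are doubled in R source
    pattern <- paste0("\\Q", pattern, "\\E")
  }
  # (?i) at the start makes the rest of the pattern case-insensitive
  if (ignore.case) paste0("(?i)", pattern) else pattern
}

# Case-insensitive literal match on "a.b"
p <- build_re2_pattern("a.b", ignore.case = TRUE, fixed = TRUE)
```

Base R's PCRE engine (`perl = TRUE`) also understands `(?i)` and `\Q...\E`, so `gsub(p, "X", x, perl = TRUE)` can serve as a quick local check of the generated pattern, with the caveat that PCRE and RE2 are different engines.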

Neal Richardson / @nealrichardson:
Issue resolved by pull request #9878
