Replace non-ASCII characters with unicode escapes in clessnverse::translate_text() #102

Closed
Tracked by #66
judith-bourque opened this issue Apr 7, 2023 · 2 comments · Fixed by #110
Labels
warning Warning message

judith-bourque (Member) commented Apr 7, 2023

Issue

```
> tools::showNonASCIIfile("R/dev_t.R")
988:     text <- stringr::str_replace_all(text, "\\<c2><ab>", "")
989:     text <- stringr::str_replace_all(text, "\\<c2><bb>", "")
990:     text <- stringr::str_replace_all(text, "\\<c2><ab>", "")
991:     text <- stringr::str_replace_all(text, "\\<c2><bb>", "")
992:     text <- stringr::str_replace_all(text, "\\<e2><80><99>", "\\\\'")
```
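For reference, a base-R sketch of the same check on a character vector of lines. `find_non_ascii_lines()` is a hypothetical helper, not part of clessnverse or tools; it relies on `iconv()` returning `NA` for any string that cannot be converted to ASCII.

```r
# Hypothetical helper (not part of clessnverse or tools): returns the indices
# of lines containing characters that cannot be represented in ASCII.
find_non_ascii_lines <- function(lines) {
  # iconv() yields NA for any element that fails to convert to ASCII
  which(is.na(iconv(lines, from = "UTF-8", to = "ASCII")))
}

src <- c(
  "plain ASCII line",
  "literal \u00ab guillemet",   # the parser turns \u00ab into a real character
  "escaped \\u00ab guillemet"   # backslash-u written as text stays ASCII
)
find_non_ascii_lines(src)
#> [1] 2
```

On the package file, `find_non_ascii_lines(readLines("R/dev_t.R", encoding = "UTF-8"))` should report the same line numbers as `showNonASCIIfile()`, assuming the file is saved as UTF-8.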

Source code

```r
# There are characters that need to be escaped (or even removed) for the
# translator to accept them
text <- stringr::str_replace_all(text, "\\'", "\\\\'")
text <- stringr::str_replace_all(text, "\\«", "")
text <- stringr::str_replace_all(text, "\\»", "")
text <- stringr::str_replace_all(text, "\\«", "")
text <- stringr::str_replace_all(text, "\\»", "")
text <- stringr::str_replace_all(text, "\\’", "\\\\'")
```
judith-bourque added the warning (Warning message) label Apr 7, 2023
judith-bourque changed the title from "Replace non-ASCII characters with ASCII in clessnverse::translate_text()" to "Replace non-ASCII characters with unicode escapes in clessnverse::translate_text()" Apr 8, 2023
judith-bourque (Member, Author) commented:

Lines 990 and 991 seem to duplicate lines 988 and 989. Remove them?
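If it helps the decision: `stringr::str_replace_all()` already replaces every match, so repeating the same pair of calls on its own output should be a no-op. A minimal check, where `drop_guillemets()` is a hypothetical helper holding only the two calls in question:

```r
# Hypothetical helper for illustration: the two replacements on lines 988-989
drop_guillemets <- function(text) {
  text <- stringr::str_replace_all(text, "\u00ab", "")  # remove every U+00AB
  stringr::str_replace_all(text, "\u00bb", "")          # remove every U+00BB
}

x <- "\u00ab Bonjour \u00bb \u00abencore\u00bb"
once  <- drop_guillemets(x)     # what lines 988-989 already produce
twice <- drop_guillemets(once)  # what the extra lines 990-991 add on top
identical(once, twice)
#> [1] TRUE
```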

judith-bourque (Member, Author) commented Apr 8, 2023

To get the Unicode escape equivalent of a character: `stringi::stri_escape_unicode("«")`
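If stringi is not at hand, a base-R sketch gives the same kind of escape for a single character. `to_unicode_escape()` is a hypothetical helper: it emits `\uXXXX` for code points up to U+FFFF and `\UXXXXXXXX` above that, both of which R accepts in string literals.

```r
# Hypothetical helper: returns the escape sequence to paste into R source
# for each single character in `chars`.
to_unicode_escape <- function(chars) {
  vapply(chars, function(ch) {
    cp <- utf8ToInt(enc2utf8(ch))   # Unicode code point(s) of ch
    stopifnot(length(cp) == 1L)     # expect exactly one character
    if (cp > 0xFFFF) sprintf("\\U%08x", cp) else sprintf("\\u%04x", cp)
  }, character(1), USE.NAMES = FALSE)
}

to_unicode_escape(c("\u00ab", "\u00bb", "\u2019"))
#> [1] "\\u00ab" "\\u00bb" "\\u2019"
```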

Reprex to test the code:

```r
library(testthat)

# Initial code (literal non-ASCII characters in the source)
remove_char_for_translator <- function(text) {
  text <- stringr::str_replace_all(text, "\\'", "\\\\'")
  text <- stringr::str_replace_all(text, "\\«", "")
  text <- stringr::str_replace_all(text, "\\»", "")
  text <- stringr::str_replace_all(text, "\\«", "")
  text <- stringr::str_replace_all(text, "\\»", "")
  text <- stringr::str_replace_all(text, "\\’", "\\\\'")

  return(text)
}

# Unicode escapes (ASCII-only source). This definition overwrites the one
# above, so the tests below run against the escaped version.
remove_char_for_translator <- function(text) {
  text <- stringr::str_replace_all(text, "\\'", "\\\\'")
  text <- stringr::str_replace_all(text, "\\\u00ab", "")
  text <- stringr::str_replace_all(text, "\\\u00bb", "")
  text <- stringr::str_replace_all(text, "\\\u00ab", "")
  text <- stringr::str_replace_all(text, "\\\u00bb", "")
  text <- stringr::str_replace_all(text, "\\\u2019", "\\\\'")

  return(text)
}

test_that("removing characters for translator works", {
  expect_equal(remove_char_for_translator("\\' Hello World.\\'"), "\\\\' Hello World.\\\\'")
  expect_equal(remove_char_for_translator("\\« Hello World. \\»"), "\\ Hello World. \\")
  expect_equal(remove_char_for_translator("\\’ Hello World.\\’"), "\\\\' Hello World.\\\\'")
})
```

The replacement logic could be extracted from `translate_text()` into a standalone function.

The tests pass with the Unicode-escape version.
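Following up on the function idea and the duplicate question above, a possible consolidated version. This is only a sketch, assuming lines 990 and 991 are true duplicates: it keeps the reprex's name `remove_char_for_translator()`, drops the repeated calls, removes both guillemets with one character class, and writes every non-ASCII character as a `\u` escape so the source stays ASCII.

```r
# Sketch of a consolidated helper (name from the reprex; not merged code).
# All non-ASCII characters are written as \u escapes, so the file stays ASCII.
remove_char_for_translator <- function(text) {
  # Escape straight apostrophes: ' becomes \'
  text <- stringr::str_replace_all(text, "'", "\\\\'")
  # Remove both guillemets, U+00AB and U+00BB, in a single pass
  text <- stringr::str_replace_all(text, "[\u00ab\u00bb]", "")
  # Curly apostrophe U+2019 becomes \' (done last so it is not escaped twice)
  text <- stringr::str_replace_all(text, "\u2019", "\\\\'")
  text
}

remove_char_for_translator("\u00ab C\u2019est l'heure \u00bb")
#> [1] " C\\'est l\\'heure "
```

The order matters: if the curly apostrophe were converted first, the straight-apostrophe step would escape the resulting `\'` a second time.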
