Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RCPP_USING_UTF8_ERROR_STRING macro to use UTF-8 encoding exception string in R. #493

Merged
merged 2 commits into from
Jun 21, 2016

Conversation

qinwf
Copy link
Contributor

@qinwf qinwf commented Jun 17, 2016

Hi, I posted on Rcpp mailing list last week about Rcpp exception with UTF-8 strings on Windows.

This PR adds RCPP_USING_UTF8_ERROR_STRING to use UTF-8 encoding exception string in R.

Here is an example for this on Windows.

By default, not using RCPP_USING_UTF8_ERROR_STRING,

sourceCpp(code = "
#include <Rcpp.h>

// [[Rcpp::export]]
void error_with_unicode(std::string input){
    Rcpp::stop(input);
}
")
res = try(error_with_unicode(enc2utf8("测试")))
#> Error : 娴嬭瘯
Encoding(res)
#> [1] "unknown"

Using RCPP_USING_UTF8_ERROR_STRING,

sourceCpp(code = "
#define RCPP_USING_UTF8_ERROR_STRING
#include <Rcpp.h>

// [[Rcpp::export]]
void error_with_unicode(std::string input){
    Rcpp::stop(input);
}
")
res = try(error_with_unicode(enc2utf8("测试")))
#> Error : 测试
Encoding(res)
#> [1] "UTF-8"

@eddelbuettel
Copy link
Member

Thanks. The PR is nicely minimal, and even contains ChangeLog and NEWS. Tests would be nice too.

As the Contributing.md file suggests, we generally prefer discussion prior to submission but we'll try to review in due course.

@kevinushey
Copy link
Contributor

I feel like we shouldn't even use this macro define and always create our strings with UTF-8 encoding. (Ie, we shouldn't use mkChar(); rather, always mkCharLenCE() with UTF-8 encoding as in this PR.

@eddelbuettel
Copy link
Member

Not a bad idea. Any possible side effects? Sounds like I may need a rev.deps run for this.

@thirdwing
Copy link
Member

I agree with Kevin. I will try to replace mkChar this weekend.

BTW, it seems the error fixed by this PR can only be reproduced on Windows.

@eddelbuettel
Copy link
Member

Right on.

But ... didn't we once try this and failed with a different / older PR?

[ Goes off looking ... ]

What about #280 by @thirdwing ?

@thirdwing
Copy link
Member

In the old PR, I try to convert strings into UTF-8 when passing strings from R to C++.

Now, this error happens when we throw strings from C++ to R...

A good solution is needed for encoding problems...

@kevinushey
Copy link
Contributor

kevinushey commented Jun 17, 2016

FWIW, this PR is a bit more targeted in that we only change the representation of error strings returned to R, not all Rcpp String objects (which may or may not be UTF-8 encoded). Because of that, I think this is safer than #280.

(Ie, this PR doesn't change any of the Rcpp internals; just how R error objects are constructed / thrown)

@eddelbuettel
Copy link
Member

Yep, yep. And that we could indeed make unconditionally UTF-8.

@qinwf
Copy link
Contributor Author

qinwf commented Jun 21, 2016

We can remove the macro and make it unconditionally UTF-8, and this may change error messages of some existing packages.

Or leave the macro as now, and developers will only be able to change the result with the macro in the new code.

@eddelbuettel
Copy link
Member

That is a good point.

@eddelbuettel
Copy link
Member

I think I'll merge as-is. We can still discuss

  • actually setting the #define ourselves (either now or later) to make it a default
  • revert the logic, ie have UTF-8 the default with an opt-out for packages that cannot/will not change

but the current suggestion is good as it is pretty minimal.

@kevinushey
Copy link
Contributor

Sounds good to me.

@eddelbuettel eddelbuettel merged commit c6a600a into RcppCore:master Jun 21, 2016
eddelbuettel added a commit that referenced this pull request Jun 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants