Rcpp::String ctor update(close #263) #529

Merged
merged 1 commit into from Aug 3, 2016

Projects

None yet

3 participants

@thirdwing
Member

As discussed in #263

(1) remove the string representation of encoding (I am not sure if any pkgs really use this, we might need a rev.dep);

(2) use CE_UTF8 as default encoding (not sure if this will cause problems with no-English string).

@kevinushey @hadley @jjallaire

@kevinushey kevinushey and 1 other commented on an outdated diff Aug 2, 2016
inst/include/Rcpp/String.h
@@ -53,7 +52,7 @@ namespace Rcpp {
typedef internal::const_string_proxy<STRSXP> const_StringProxy;
/** default constructor */
- String(): data(Rf_mkChar("")), buffer(), valid(true), buffer_ready(true), enc(CE_NATIVE) {
+ String(): data(Rf_mkChar("")), buffer(), valid(true), buffer_ready(true), enc(CE_UTF8) {
@kevinushey
kevinushey Aug 2, 2016 edited Contributor

Should the data element be initialized with Rf_mkCharCE()? Do we need to explicitly set the encoding? (probably doesn't matter for the empty string but we should be consistent)

@kevinushey kevinushey and 1 other commented on an outdated diff Aug 2, 2016
inst/include/Rcpp/String.h
@@ -64,10 +63,10 @@ namespace Rcpp {
RCPP_STRING_DEBUG("String(const String&)");
}
- String(const String& other, const std::string& enc) : data(other.get_sexp()), valid(true), buffer_ready(false) {
+ String(const String& other, cetype_t enc) : data(other.get_sexp()), valid(true), buffer_ready(false) {
@kevinushey
kevinushey Aug 2, 2016 edited Contributor

This method feels a little weird:

  1. It accepts a String object, which should have its own encoding; but
  2. It overrides that encoding with the encoding passed in using set_encoding.

I'm curious where / how this is used; it feels a little strange. (doesn't necessarily need to be addressed in this PR).

Or perhaps the intention is to use this with Strings that have been created with unknown encoding, but we know what the correct encoding is supposed to be?

@thirdwing
thirdwing Aug 2, 2016 Member

Actually this is mostly to make our ctor matrix complete. These two ctors can be removed. If we want to change the encoding of String s, we can just s.set_encoding(). set_encoding is public.

@thirdwing
thirdwing Aug 2, 2016 Member

So remove or keep them? @kevinushey

@kevinushey
kevinushey Aug 2, 2016 Contributor

I would be in favor of removing them -- I think it's better for users to be explicit with set_encoding if they wanted to force a different encoding.

@thirdwing
thirdwing Aug 2, 2016 Member

Removed.

@kevinushey kevinushey commented on an outdated diff Aug 2, 2016
inst/include/Rcpp/String.h
@@ -88,7 +87,7 @@ namespace Rcpp {
RCPP_STRING_DEBUG("String(SEXP)");
}
- String(SEXP charsxp, const std::string& enc) : data(R_NilValue) {
+ String(SEXP charsxp, cetype_t enc) : data(R_NilValue) {
@kevinushey
kevinushey Aug 2, 2016 Contributor

Same as above -- overriding encoding on a passed-in charsxp.

@thirdwing thirdwing changed the title from Rcpp::String ctor update to Rcpp::String ctor update(close #263) Aug 2, 2016
@kevinushey
Contributor

PR looks good for me; let's see if anyone else has thoughts.

@eddelbuettel
Member

Same here. Lemme sleep it over and give a glance over morning coffee in 6 to 7 hours and it'll go in.

@eddelbuettel eddelbuettel merged commit 0bf8e97 into RcppCore:master Aug 3, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@eddelbuettel eddelbuettel added a commit that referenced this pull request Aug 3, 2016
@eddelbuettel eddelbuettel Merging #529 (Close #263)
From branch 'string_ctor2' of https://github.com/thirdwing/Rcpp into feature/pr529
0460f08
@eddelbuettel eddelbuettel added a commit that referenced this pull request Aug 3, 2016
@eddelbuettel eddelbuettel Second step of merging #529 (close #263) c0ad307
@eddelbuettel
Member

Merged by hand

@thirdwing thirdwing deleted the thirdwing:string_ctor2 branch Aug 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment