Skip to content

Conversation

@nealrichardson
Copy link
Member

Sprinkles Rf_translateCharUTF8 a few places. I tried to add tests for all of the different scenarios I could think of where we could have non-UTF strings.

Also includes $ and [[ methods for Schema objects.

Copy link
Member Author

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions of my own @romainfrancois

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Rf_mkCharCE(Rf_translateCharUTF8(s), CE_UTF8) is dropped in several places; should we factor this out to a macro or something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah maybe some sort of Rf_mkCharUtf8() or Rf_mkUtf8()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few places where I Rf_mkCharCE() and then immediately call CHAR(), which IIUC is boxing in a SEXP and then immediately unboxing it. Maybe that can be eliminated some places or done better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think R offers api for this

@github-actions
Copy link

names = function() Schema__field_names(self),
names = function() {
out <- Schema__field_names(self)
# Hack: Rcpp should set the encoding
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a more general problem, would affect the names() method of any objects where they return a std::vector<std::string>. Those must (as I understand it) always be UTF-8 in Arrow, but if you don't declare them as UTF-8 in R, then they get displayed all mangled on Windows (default/unknown encoding treated as latin1).

Rather than relying on the default Rcpp::wrap method for this, we should probably wrap ourselves. I could naively write this (create CharacterVector, iterate over the std::vector<std::string> and insert Rcpp::String with CE_UTF8) but maybe that's not great?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe cpp11 will rescue us from that sort of trouble.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nealrichardson
Copy link
Member Author

@romainfrancois this is ready for (and seriously needs) your review. Tests should be passing now.

Copy link
Contributor

@romainfrancois romainfrancois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from following up on your hint, it LGTM.

names = function() Schema__field_names(self),
names = function() {
out <- Schema__field_names(self)
# Hack: Rcpp should set the encoding
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe cpp11 will rescue us from that sort of trouble.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah maybe some sort of Rf_mkCharUtf8() or Rf_mkUtf8()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think R offers api for this

@nealrichardson nealrichardson deleted the r-utf8 branch June 26, 2020 15:19
@nealrichardson
Copy link
Member Author

Thanks, I think we should just revisit further work once cpp11 is available, see how much of that is handled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants