Using UTF-8 in String ctor #280

thirdwing · 2015-03-19T22:54:26Z

This is for #189. The String ctor now uses UTF-8.

There seems to be much more work needed if we want to support different encodings in String.

Feel free to close this PR.
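The PR's diff is not reproduced here. "Uses UTF-8" most plausibly means the constructor marks the resulting string as UTF-8 (in R's C API, the encoding mark a CHARSXP carries, e.g. `CE_UTF8` rather than `CE_NATIVE`). The sketch below is a standalone model under that assumption, not Rcpp code; `Encoding`, `LabelledString` and `is_valid_utf8` are hypothetical names. It shows that such a constructor only labels the bytes and does not convert them, so the label is wrong whenever the caller passes bytes in a non-UTF-8 native encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical labels, loosely mirroring R's cetype_t values CE_NATIVE / CE_UTF8.
enum class Encoding { Native, UTF8 };

// Returns true if `s` is well-formed UTF-8 (rejects overlong forms,
// UTF-16 surrogates and code points above U+10FFFF).
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0, n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }           // plain ASCII byte
        std::size_t len;
        unsigned int cp;
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                         // stray continuation or invalid lead byte
        if (i + len > n) return false;             // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false; // expected a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// Hypothetical string type whose constructor declares the input to be UTF-8.
// It copies the bytes unchanged: the declaration is a label, not a conversion.
struct LabelledString {
    std::string bytes;
    Encoding enc;
    explicit LabelledString(const char* s) : bytes(s), enc(Encoding::UTF8) {}

    // The UTF-8 label is only truthful if the bytes really are UTF-8.
    bool label_is_consistent() const {
        return enc != Encoding::UTF8 || is_valid_utf8(bytes);
    }
};
```

For example, `"caf\xC3\xA9"` (UTF-8 "café") keeps a consistent label, while `"caf\xE9"` (the same word in Latin-1 or Windows-1252) ends up mislabeled.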

eddelbuettel · 2015-03-21T18:35:51Z

Looks good to me, though UTF-8 support probably needs a lot more work in more places.

Any thoughts on how to get from here to there?

Any seconds on whether to fold this in, @kevinushey or @jjallaire?

kevinushey · 2015-03-21T21:22:08Z

I think this is a step in the right direction, although I think we need to find a compromise between:

- String objects that just store some bytes, alongside whatever declared encoding they came with, and
- String objects that are always internally stored as UTF-8, but are converted back to the appropriate locale on demand.

I think performance-intensive applications will want to avoid round-trip translations between various encodings, so I am somewhat hesitant to accept this PR right away. Thoughts, @jjallaire?
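The two designs above can be contrasted with a small standalone model. Assumptions, all hypothetical: the "native" encoding is Latin-1, so the conversions can be hand-written instead of going through iconv or R's `Rf_translateChar()` / `Rf_translateCharUTF8()`, and a global counter `g_conversions` records each transcoding call. `TaggedString` models the bytes-plus-declared-encoding design and `Utf8String` the always-UTF-8 design.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical labels, loosely mirroring R's CE_NATIVE / CE_UTF8.
enum class Encoding { Native, UTF8 };

// Counts transcoding calls so the cost of each design is visible.
int g_conversions = 0;

// Assumption: native == Latin-1, i.e. every byte is one code point U+0000..U+00FF.
std::string latin1_to_utf8(const std::string& in) {
    ++g_conversions;
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {  // two-byte UTF-8 sequence for U+0080..U+00FF
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Inverse of the above; throws for anything outside U+0000..U+00FF.
std::string utf8_to_latin1(const std::string& in) {
    ++g_conversions;
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size()) {
            unsigned char c2 = static_cast<unsigned char>(in[i + 1]);
            if ((c2 & 0xC0) != 0x80) throw std::runtime_error("invalid UTF-8");
            out.push_back(static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F)));
            ++i;
        } else {
            throw std::runtime_error("not representable in Latin-1");
        }
    }
    return out;
}

// Design 1: keep the bytes plus their declared encoding; convert only on request.
struct TaggedString {
    std::string bytes;
    Encoding enc;
    std::string as(Encoding want) const {
        if (want == enc) return bytes;  // no work when encodings already match
        return want == Encoding::UTF8 ? latin1_to_utf8(bytes) : utf8_to_latin1(bytes);
    }
};

// Design 2: always store UTF-8; convert on the way in and again on native output.
struct Utf8String {
    std::string utf8;
    Utf8String(const std::string& s, Encoding enc)
        : utf8(enc == Encoding::UTF8 ? s : latin1_to_utf8(s)) {}
    std::string as(Encoding want) const {
        return want == Encoding::UTF8 ? utf8 : utf8_to_latin1(utf8);
    }
};
```

Passing a native string through unchanged costs zero conversions with `TaggedString` but two with `Utf8String` (in and back out), which is the round-trip cost mentioned above. In exchange, design 2 gives every consumer a single known internal encoding, while design 1 pushes encoding awareness onto each consumer.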

jjallaire · 2015-03-22T10:29:28Z

The other problem with this is that on Windows the OS interfaces, R itself, and many other libraries assume that narrow-character (char) strings use the system encoding. If we start auto-converting to UTF-8 we will surely break things that currently work.

So any conversion to UTF-8 will need to be explicit (e.g. an encoding parameter on the constructor, a static construction function, etc.).
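An explicit opt-in along these lines could look like the sketch below. The class `sketch::String`, its `Encoding` parameter and the `from_utf8` factory are hypothetical names, not Rcpp API. The point is that the default keeps the current native-encoding behavior, and UTF-8 is assumed only when the caller says so. In R's own C API the corresponding explicit step is `Rf_mkCharCE(s, CE_UTF8)`, as opposed to `Rf_mkChar(s)`, which marks the string as native.

```cpp
#include <string>
#include <utility>

namespace sketch {

// Hypothetical labels, loosely mirroring R's CE_NATIVE / CE_UTF8.
enum class Encoding { Native, UTF8 };

class String {
public:
    // Encoding parameter on the constructor; the default preserves the
    // existing assumption that char strings are in the system encoding.
    explicit String(std::string bytes, Encoding enc = Encoding::Native)
        : bytes_(std::move(bytes)), enc_(enc) {}

    // Static construction function: the caller states explicitly that
    // the bytes are already UTF-8.
    static String from_utf8(std::string bytes) {
        return String(std::move(bytes), Encoding::UTF8);
    }

    const std::string& bytes() const { return bytes_; }
    Encoding encoding() const { return enc_; }

private:
    std::string bytes_;  // stored unchanged; no implicit transcoding
    Encoding enc_;       // declared encoding of bytes_
};

}  // namespace sketch
```

Because nothing changes unless a caller passes `Encoding::UTF8` or uses `from_utf8`, existing code that relies on the system encoding keeps working.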


thirdwing · 2015-03-22T14:24:29Z

Thanks @jjallaire and @kevinushey. There are many things I didn't know before.

Qiang Kou added 2 commits March 19, 2015 12:16

String with UTF-8 (5fd31ce)

string utf-8 (c7b58b2)

thirdwing closed this Mar 22, 2015

eddelbuettel mentioned this pull request Jun 17, 2016: Add RCPP_USING_UTF8_ERROR_STRING macro to use UTF-8 encoding exception string in R. #493 (merged)
