diffobj doesn't ignore encodings #144
Thanks for reporting. Comparisons are done on string-pool memory addresses, so any string that gets a new address is considered different. We could translate everything to a common encoding, although that has a cost. Since …
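A minimal R sketch of why an address-based comparison treats these strings as different: the same text stored as UTF-8 and as latin1 has different bytes and a different encoding mark, so R's string cache holds two separate entries. Base `identical()` still reports them as equal, because it falls back to comparing the text after translation. The variable names are illustrative only.

```r
# Same text, two encodings
x <- "fa\u00e7ile"                      # \u escape marks the string as UTF-8
y <- iconv(x, from = "UTF-8", to = "latin1")
Encoding(y) <- "latin1"                 # make the declared encoding explicit

Encoding(c(x, y))  # "UTF-8" "latin1"
charToRaw(x)       # 66 61 c3 a7 69 6c 65 (c3 a7 is the UTF-8 cedilla)
charToRaw(y)       # 66 61 e7 69 6c 65    (e7 is the latin1 cedilla)

# Different bytes mean different string-cache entries (different addresses),
# but identical() compares the translated text and returns TRUE
identical(x, y)
x == y
```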
I think you should consider always re-encoding to UTF-8 (or some other common encoding), since R very rarely distinguishes between the same string in different encodings (i.e., see …)
That `identical` does the re-encoding is certainly a strong argument for doing the same. I'll look into it next time I update the package; if you have a pressing need for this change, let me know (worst case, you can re-encode the inputs yourself in the meantime).
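The interim workaround of re-encoding the inputs yourself can be sketched as follows, assuming both inputs are character vectors. `to_utf8()` is a hypothetical helper written for this example, not part of diffobj.

```r
# Hypothetical helper: re-encode every element to UTF-8 before diffing
to_utf8 <- function(x) {
  stopifnot(is.character(x))
  enc2utf8(x)  # latin1-marked strings are converted and marked UTF-8
}

a <- "fa\u00e7ile"                           # UTF-8
b <- iconv(a, from = "UTF-8", to = "latin1")
Encoding(b) <- "latin1"                      # same text, latin1 bytes

a2 <- to_utf8(a)
b2 <- to_utf8(b)

# After normalization both strings have the same bytes and the same mark
identical(charToRaw(a2), charToRaw(b2))  # TRUE
Encoding(b2)                             # "UTF-8"
```

The normalized vectors could then be passed to `diffobj::diffChr(a2, b2)`. Since both now carry identical bytes and the same encoding mark, they should resolve to the same string-cache entry, so an address-based comparison would no longer flag them as different.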
I don't think we can avoid `enc2utf8`, because there is no cheap way to distinguish between "unknown" and ASCII encoding: if we have strings in both "latin1" and ASCII in a non-latin1/UTF-8 locale, we must assume there could be some non-ASCII in the "unknown" ones. However, the cost seems to be minimal:
Created on 2020-03-29 by the reprex package (v0.3.0)
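The following R sketch is not the reprex from the comment above and does not reproduce its numbers. It only illustrates the two points made there, assuming a session where `\u` escapes and `iconv` to latin1 behave as usual: ASCII strings always report an "unknown" encoding, and the cost of blanket `enc2utf8()` can be timed directly. The vector size is arbitrary and timings will vary by machine.

```r
# ASCII strings are never marked, so they report "unknown", the same label
# R uses for native-encoded strings whose encoding it does not record
Encoding(c("abc", "fa\u00e7ile"))  # "unknown" "UTF-8"

# A large mixed vector: ASCII plus latin1-marked non-ASCII strings
x <- rep(c("abc", "fa\u00e7ile"), 5e5)
y <- iconv(x, from = "UTF-8", to = "latin1")
Encoding(y) <- "latin1"  # has no effect on the ASCII elements

# Time blanket re-encoding to UTF-8 (result varies by machine)
elapsed <- system.time(y8 <- enc2utf8(y))[["elapsed"]]
elapsed
```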