GetStringUTFChars() does not actually return modified UTF-8 #283
Comments
It could be that the native debugger in Android Studio is showing you the wrong thing. I have written a small C++ function to test it:
And I am using it from Java:
I see the following in the log:
Putting a breakpoint in my C++ code, I get about the same picture. By the way, I am using NDK r10e.
I don't think the native debugger would ever do something like that, no matter how buggy it could be: it's just a
(sorry, i thought i'd commented on this when it was first raised and i asked our Java runtime folks to improve the docs...) This is actually WAI (working as intended) for Android. Android has always done the standard UTF-8 thing for emoji. It was a long time ago, but iirc they tried both and "standard" broke fewer apps' assumptions than "modified". I'll ping that team again about getting the docs improved...
i've raised internal bug http://b/35469153 to track improving the documentation.
Could you please fix the link? 😕
it's internal to google. that link is for our benefit when we're trying to see whether the docs have actually been fixed next time we're pinged on this bug... :-)
i don't think so, though it might differ between platform versions. i don't know what dalvik did, but art should aiui always use regular UTF-8.
@enh: unfortunately "has always done the standard UTF-8 for emoji" sounds like a quote from G. Orwell. This commit suggests that until 2014 4-byte UTF-8 sequences could not work, and after that they were internally converted to surrogate pairs. Even today JNI checks whether |bytes| is valid modified UTF-8 but also accepts 4 byte UTF sequences in place of encoded surrogate pairs. @kneth: which platform did you use for testing?
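To illustrate the lenient behavior described in the comment above (accepting a 4-byte sequence where modified UTF-8 would use an encoded surrogate pair), here is a minimal Java sketch. It is not the ART implementation: the class name `LenientMutf8` and the method `decode` are hypothetical, and the decoder does not validate continuation bytes or guard against truncated input.

```java
final class LenientMutf8 {
    // Hypothetical helper: decodes modified UTF-8, and additionally accepts
    // standard 4-byte UTF-8 sequences, turning them into surrogate pairs.
    // No validation of continuation bytes; truncated input will throw
    // ArrayIndexOutOfBoundsException.
    static String decode(byte[] in) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < in.length) {
            int b0 = in[i] & 0xFF;
            if (b0 < 0x80) {
                // 1-byte form (ASCII).
                sb.append((char) b0);
                i += 1;
            } else if ((b0 & 0xE0) == 0xC0) {
                // 2-byte form; modified UTF-8 also encodes U+0000 as C0 80.
                sb.append((char) (((b0 & 0x1F) << 6) | (in[i + 1] & 0x3F)));
                i += 2;
            } else if ((b0 & 0xF0) == 0xE0) {
                // 3-byte form; surrogate halves arrive this way in modified UTF-8.
                sb.append((char) (((b0 & 0x0F) << 12)
                        | ((in[i + 1] & 0x3F) << 6)
                        | (in[i + 2] & 0x3F)));
                i += 3;
            } else if ((b0 & 0xF8) == 0xF0) {
                // 4-byte form from standard UTF-8: appendCodePoint emits
                // the corresponding surrogate pair.
                int cp = ((b0 & 0x07) << 18)
                        | ((in[i + 1] & 0x3F) << 12)
                        | ((in[i + 2] & 0x3F) << 6)
                        | (in[i + 3] & 0x3F);
                sb.appendCodePoint(cp);
                i += 4;
            } else {
                throw new IllegalArgumentException("Unexpected lead byte at index " + i);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, both `F0 9F 90 B1` and `ED A0 BD ED B0 B1` decode to the same two-`char` Java string.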
I was running in an x86 emulator API 22.
(doc issue isn't fixed, but that's tracked elsewhere and there's nothing to do from the NDK side)
Description
I defined a String in Java like this:
And passed it to a native function... Inside it I called
env->GetStringUTFChars(myJstring, nullptr)
and inspected the return value in Android Studio:
According to the docs, the string should be in Modified UTF-8 format, so the cat character should take 6 bytes, but it takes 4 instead, as in standard UTF-8.
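For reference, the 6-byte versus 4-byte difference can be reproduced on a desktop JVM. This is a minimal sketch, assuming U+1F431 (cat face) as a stand-in for the character in the original string, which is not shown here. The class name `Mutf8Demo` is hypothetical. `DataOutputStream.writeUTF` is documented to write modified UTF-8 after a 2-byte length prefix, so the sketch strips that prefix.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Mutf8Demo {
    // Standard UTF-8: a supplementary character becomes one 4-byte sequence.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8: each surrogate half is encoded as its own 3-byte sequence.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        // Drop the 2-byte length prefix that writeUTF emits.
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Formats bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cat = "\uD83D\uDC31"; // U+1F431, assumed example character
        System.out.println("standard: " + hex(standardUtf8(cat))); // F0 9F 90 B1
        System.out.println("modified: " + hex(modifiedUtf8(cat))); // ED A0 BD ED B0 B1
    }
}
```

Comparing the bytes returned by GetStringUTFChars against these two outputs shows which encoding a given runtime actually produced.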
JNI docs on Modified UTF-8: https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/types.html#modified_utf_8_strings
JNI docs on GetStringUTFChars: https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#GetStringUTFChars
Environment Details