iOS app: can't use the Emoji keyboard #733
Comments
Looking into this now. I think also non-BMP characters and even flags (which actually are two code points) used to work at some time, but this has regressed.
Would be interesting to know what ends up in the hidden text area.
@mmeeks I am on it already. The handling of non-BMP characters is fairly broken in the code. Confusion between code points and UTF-16 code units etc. Am fixing.
The problem is that getValueAsCodePoints(), as its name says, returns an array of integers that are Unicode code points (thus can be larger than 65535, especially for emojis and interesting scripts), not UTF-16 code units (which are always less than 65536). Still, the code uses the JS function fromCharCode() on these numbers. That function expects UTF-16 code units, and it apparently simply truncates the argument to 16 bits.
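The truncation is easy to see in any JS console; a minimal illustration (using U+1F3A8, the Artist Palette emoji, as the example code point):

```javascript
// String.fromCharCode() applies ToUint16 to each argument, so a
// supplementary-plane code point is silently truncated to 16 bits.
var palette = 0x1F3A8; // U+1F3A8 ARTIST PALETTE (🎨)

// Truncated: 0x1F3A8 & 0xFFFF === 0xF3A8, an unrelated private-use
// area character, not the emoji.
var wrong = String.fromCharCode(palette);

// Correct: encode the code point as a UTF-16 surrogate pair first.
var right = String.fromCharCode(0xD83C, 0xDFA8);

console.log(wrong === '\uF3A8');        // true
console.log(right === '\uD83C\uDFA8');  // true
```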
Another bug is that the code that combines a surrogate pair into a code point used the offset 0x100000 where 0x10000 is correct.
Our existing function getValueAsCodePoints() returns an array of integers that are Unicode code points, as its name says. (Code points can be larger than 65535, especially for emojis and interesting scripts.) It does not return an array of UTF-16 code units (which are always less than 65536). Still, the code used the JavaScript function String.fromCharCode() on the elements of the returned array. That function expects UTF-16 code units, and it simply truncates the argument to 16 bits. This obviously leads to very wrong results.

There is a function String.fromCodePoint() that would work, but it isn't present in MSIE so we can't use it. Instead, introduce a new function codePointsToString() that works properly, producing surrogate pairs as necessary.

Additionally, the code in getValueAsCodePoints() that combines a surrogate pair to a code point used 0x100000 instead of 0x10000. Had this code been tested at all, one wonders.

Also, add some more debug output to the affected functions, bypassed with if (false).

Fixes #733
Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476
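A sketch of what the commit describes, for both directions; this is not the actual committed code, just the standard UTF-16 algorithm under the names the commit message uses (surrogatePairToCodePoint() is a hypothetical name for the combining step inside getValueAsCodePoints()):

```javascript
// Convert an array of Unicode code points to a JS (UTF-16) string,
// producing surrogate pairs as necessary. Works without
// String.fromCodePoint(), which MSIE lacks.
function codePointsToString(codePoints) {
	var result = '';
	for (var i = 0; i < codePoints.length; i++) {
		var cp = codePoints[i];
		if (cp <= 0xFFFF) {
			// BMP code point: a single UTF-16 code unit.
			result += String.fromCharCode(cp);
		} else {
			// Supplementary plane: emit a surrogate pair. Note the
			// offset 0x10000, not 0x100000 as in the fixed bug.
			cp -= 0x10000;
			result += String.fromCharCode(0xD800 + (cp >> 10),
			                              0xDC00 + (cp & 0x3FF));
		}
	}
	return result;
}

// The inverse: combine a surrogate pair back into a code point,
// which is what getValueAsCodePoints() needs to do.
function surrogatePairToCodePoint(high, low) {
	return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
}
```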
Thanks a lot for the improvements @tml1024! 6.4.1 (3) is on par with the functionality we had in older versions! One remaining issue I see is that some emoticons are split up into their individual parts, so the "male painter" (which is one emoticon) becomes two emoticons after inserting: guy + color palette.
For the future me:
Looking into the "male artist" issue now. I fear that is something that will need to be implemented in LibreOffice core, though, and probably should be seen as a missing feature, not a bug.
But sure, to the end-user it looks like a bug, as that clever emoji that is a combination of multiple code points with a zero-width joiner looks like any other emoji when they are choosing one. The end-user cannot know which emojis are implemented in which way. It is very surprising to the end-user when instead of a painter they get a head and a palette...
And it can be even more complicated; one can add in a skin tone, for instance. What looks like a single emoji (glyph) 👨🏼‍🎨 consists of four code points: 👨 Man, 🏼 Medium-Light Skin Tone, Zero Width Joiner, and 🎨 Artist Palette. See https://emojipedia.org/man-artist-medium-light-skin-tone/
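That four-code-point structure can be verified directly in JS (writing the string as explicit UTF-16 code units, since String.fromCodePoint() is unavailable in MSIE; Array.from iterates by code point in ES6-capable engines):

```javascript
// 👨🏼‍🎨 "man artist: medium-light skin tone" spelled out as its
// four code points, written as explicit UTF-16 code units:
var manArtist = '\uD83D\uDC68' + // U+1F468 MAN
                '\uD83C\uDFFC' + // U+1F3FC MEDIUM-LIGHT SKIN TONE
                '\u200D'       + // U+200D  ZERO WIDTH JOINER
                '\uD83C\uDFA8';  // U+1F3A8 ARTIST PALETTE

console.log(manArtist.length);             // 7 UTF-16 code units
console.log(Array.from(manArtist).length); // 4 code points
```

So a renderer that does not understand ZWJ sequences shows up to four separate glyphs where the user expects one.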
For web-based Collabora Online, where LibreOffice core runs on Linux, the required but possibly missing functionality would probably be in some external library, perhaps cairo. (Not entirely clear to me.) On iOS (and macOS) we don't use cairo, but system APIs, so there it might actually be easier to make it "just work". But as it doesn't "just work" on iOS currently, it is possible that some changes are required in LibreOffice core, too. And maybe for instance in the ICU external library.
Filed upstream bug https://bugs.documentfoundation.org/show_bug.cgi?id=138481
Interesting: In LibreOffice, on macOS, if I use the Apple Color Emoji font, it works! So possibly it will work in the iOS app, too, if I just in some way make sure that font gets used. (The user is of course not expected to explicitly select an emoji font for emojis; it should happen automagically.)
Suggested fix in https://gerrit.libreoffice.org/c/core/+/106632 . Let's see what the reviewers say, if anything.
(See above-mentioned Gerrit change for more detailed technical discussion. This is a very complex issue, and digging into it reveals some very dirty bits in LibreOffice...)
Describe the bug
On the Apple smart keyboard (attached hardware keyboard) there is a "globe" key in the lower left corner. Using this key, the keyboard selection on the screen pops up. When choosing the "Emoji" keyboard, a dialog pops up on the screen where the user can choose an emoticon which will then be inserted into the text.
So far this works; however, after choosing an emoticon the wrong text is inserted: the emoticon itself is not inserted, but is instead split up into multiple separate characters which are inserted (not sure how to explain that better).
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The chosen emoticon should be inserted into the text.
Actual behavior
The multi-code-point emoticon is instead split up into multiple characters which are inserted into the text.
Screenshots
Tablet
Additional context
This used to work in previous versions of the app.
Maybe one of those bugs and their corresponding commits are useful:
https://bugs.documentfoundation.org/show_bug.cgi?id=128069
https://bugs.documentfoundation.org/show_bug.cgi?id=124350