iOS app: can't use the Emoji keyboard #733

nikslor · 2020-11-20T06:40:43Z

Describe the bug
On the Apple smart keyboard (attached hardware keyboard) there is a "globe" key in the lower left corner. Using this key, the keyboard selection on the screen pops up. When choosing the "Emoji" keyboard, a dialog pops up on the screen where the user can choose an emoticon which will then be inserted into the text.

So far this works; however after choosing an emoticon a wrong text input happens - not the emoticon is inserted, but the byte sequence is split up into multiple characters which are inserted (not sure how to explain that better).

To Reproduce
Steps to reproduce the behavior:

Open a writer document using the iOS app
Push the "globe" key on the attached Apple smart keyboard
Use the arrow keys to select the "Emoji" keyboard
Choose an emoticon from the list

Expected behavior
The chosen emoticon should be inserted into the text.

Actual behavior
The multibyte emoticon is instead split up into multiple characters which are inserted into the text

Screenshots

Tablet

OS: iPad OS 14.2
Version: 6.4.1 (1)

Additional context
This used to work in previous versions of the app.
Maybe one of those bugs and their corresponding commits are useful:
https://bugs.documentfoundation.org/show_bug.cgi?id=128069
https://bugs.documentfoundation.org/show_bug.cgi?id=124350

tml1024 · 2020-11-24T08:46:36Z

Looking into this now. I think also non-BMP characters and even flags (which actually are two code points) used to work at some time, but this has regressed.

mmeeks · 2020-11-24T11:04:32Z

Would be interesting to know what ends up in the hidden text area.
Nicolas - can you tripple click on help->about and select 'Show Clipboard' - and see what shows up in the text area that should become visible as an overlay nearby the cursor =)
Quite possibly the hand-crafted UTF-16 parser we need for IE11 has not had enough testing ;-) cf. TextInput.js getValueAsCodePoints - and would appreciate that.
Also - quite possibly - we need a richer text-input area, that can accept different fonts - I notice some other code around the place can accept font change via the input field (!?) which is a bit of an odd concept, but may be needed - although, clearly transferring any missing fonts from client -> server is a massive can of worms ... hmm =)

tml1024 · 2020-11-24T11:26:36Z

@mmeeks I am on it already. The handling of non-BMP characters is fairly broken in the code. Confusion between code points and UTF-16 code units etc. Am fixing.

tml1024 · 2020-11-24T11:30:16Z

The problem is that getValueAsCodePoints(), as its name says, returns an array of integers that are Unicode code points (thus can be larger than 65536, especially for emojis and interesting scripts). Not UTF-16 code units (which are always less than 65536). Still the code uses the JS function fromCharCode() on these numbers. That function expects UTF-16 code units, and it apparently simply truncates the argument to 16 bits.

tml1024 · 2020-11-24T13:28:45Z

Another bug is that the code tries to convert from a surrogate pair to code point using the expression code = high + code - 0xdc00 + 0x100000; The 0x100000 should be 0x10000.

Our existing function getValueAsCodePoints() returns an array of integers that are Unicode code points, as its name says. (Code points can be larger than 65535, especially for emojis and interesting scripts). It does not return an array of UTF-16 code units (which are always less than 65536). Still, the code used the JavaScript function String.fromCharCode() on the elements of the returned array. That function expects UTF-16 code units, and it simply truncates the argument to 16 bits. This obviously leads to very wrong results. There is a function String.fromCodePoint() that would work, but it isn't present in MSIE so we can't use it. Instead, introduce a new function codePointsToString() that works properly, producing surrogate pairs as necessary. Additionally, the code in getValueAsCodePoints() that combines a surrogate pair to a code point used 0x100000 instead of 0x10000. Had this code been tested at all, one wonders. Also, add some more debug output to the affected functions, bypassed with if (false). Fixes #733 Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476

Our existing function getValueAsCodePoints() returns an array of integers that are Unicode code points, as its name says. (Code points can be larger than 65535, especially for emojis and interesting scripts). It does not return an array of UTF-16 code units (which are always less than 65536). Still, the code used the JavaScript function String.fromCharCode() on the elements of the returned array. That function expects UTF-16 code units, and it simply truncates the argument to 16 bits. This obviously leads to very wrong results. There is a function String.fromCodePoint() that would work, but it isn't present in MSIE so we can't use it. Instead, introduce a new function codePointsToString() that works properly, producing surrogate pairs as necessary. Additionally, the code in getValueAsCodePoints() that combines a surrogate pair to a code point used 0x100000 instead of 0x10000. Had this code been tested at all, one wonders. Also, add some more debug output to the affected functions, bypassed with if (false). Fixes #733 Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476 Signed-off-by: Tor Lillqvist <tml@collabora.com>

nikslor · 2020-11-24T19:05:18Z

Thanks a lot for the improvements @tml1024! 6.4.1 (3) is on par with the functionality we had in older versions!

One remaining issue I see is, that some emoticons are split up into their individual parts so the "male painter" (which is one emotion) becomes two emoticons after inserting: guy + color palette.

nikslor · 2020-11-24T19:05:56Z

For the future me:
ffmpeg -i foo.mp4 -vf "fps=10,scale=800:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" -loop 0 foo.gif

tml1024 · 2020-11-25T09:47:57Z

Looking into the "male artist" issue now. I fear that is something that will need to be implemented in LibreOffice core, though, and probably should be seen as a missing feature, not a bug.

tml1024 · 2020-11-25T09:58:07Z

But sure, to the end-user it looks like a bug, as that clever emoji that is a combination of multiple code points with a zero-width-joiner looks like any other emoji when they are choosing one. The end-user can not know which emojis are implemented in which way. it is very surprising to the end-user when instead of a painter they get a head and a palette...

tml1024 · 2020-11-25T10:01:31Z

And it can be even more complicated, one can add in a skin tone, for instance: https://emojipedia.org/man-artist-medium-light-skin-tone/ . What looks like a single emoji (glyph) 👨🏼‍🎨 consists of four code points: 👨 Man, 🏼 Medium-Light Skin Tone, ‍ Zero Width Joiner and 🎨 Artist Palette. See https://emojipedia.org/man-artist-medium-light-skin-tone/

tml1024 · 2020-11-25T10:03:42Z

For web-based Collabora Online, where LibreOffice core runs on Linux, the required but possibly missing functionality would probably be in some external library, perhaps cairo. (Not entirely clear to me.) On iOS (and macOS) we don't use cairo, but system APIs, so there actually it might be easier to make it "just work". But as it doesn't "just work" on iOS currently, it is possible that some changes are required in LibreOffice core, too. And maybe for instance in the ICU external library.

tml1024 · 2020-11-25T10:50:56Z

Filed upstream bug https://bugs.documentfoundation.org/show_bug.cgi?id=138481

tml1024 · 2020-11-25T13:10:57Z

Interesting: In LibreOffice, on macOS, if I use the Apple Color Emoji font, it works! So possibly it will work in the iOS app, too, if I just in some way make sure that font gets used. (The user is of course not expected to explicitly select an emoji font for emojis; it should happen automagically.)

tml1024 · 2020-11-25T20:18:22Z

Suggested fix in https://gerrit.libreoffice.org/c/core/+/106632 . Let's see what the reviewers say, if anything.

tml1024 · 2020-11-26T12:17:27Z

(See above-mentioned Gerrit change for more detailed technical discussion. This is a very complex issue, and digging into it reveals some very dirty bits in LibreOffice...)

Our existing function getValueAsCodePoints() returns an array of integers that are Unicode code points, as its name says. (Code points can be larger than 65535, especially for emojis and interesting scripts). It does not return an array of UTF-16 code units (which are always less than 65536). Still, the code used the JavaScript function String.fromCharCode() on the elements of the returned array. That function expects UTF-16 code units, and it simply truncates the argument to 16 bits. This obviously leads to very wrong results. There is a function String.fromCodePoint() that would work, but it isn't present in MSIE so we can't use it. Instead, introduce a new function codePointsToString() that works properly, producing surrogate pairs as necessary. Additionally, the code in getValueAsCodePoints() that combines a surrogate pair to a code point used 0x100000 instead of 0x10000. Had this code been tested at all, one wonders. Also, add some more debug output to the affected functions, bypassed with if (false). Fixes #733 Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476 Signed-off-by: Tor Lillqvist <tml@collabora.com>

Our existing function getValueAsCodePoints() returns an array of integers that are Unicode code points, as its name says. (Code points can be larger than 65535, especially for emojis and interesting scripts). It does not return an array of UTF-16 code units (which are always less than 65536). Still, the code used the JavaScript function String.fromCharCode() on the elements of the returned array. That function expects UTF-16 code units, and it simply truncates the argument to 16 bits. This obviously leads to very wrong results. There is a function String.fromCodePoint() that would work, but it isn't present in MSIE so we can't use it. Instead, introduce a new function codePointsToString() that works properly, producing surrogate pairs as necessary. Additionally, the code in getValueAsCodePoints() that combines a surrogate pair to a code point used 0x100000 instead of 0x10000. Also, add some more debug output to the affected functions, bypassed with if (false). Fixes #733 Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476 Signed-off-by: Tor Lillqvist <tml@collabora.com>

Our existing function getValueAsCodePoints() returns an array of integers that are Unicode code points, as its name says. (Code points can be larger than 65535, especially for emojis and interesting scripts). It does not return an array of UTF-16 code units (which are always less than 65536). Still, the code used the JavaScript function String.fromCharCode() on the elements of the returned array. That function expects UTF-16 code units, and it simply truncates the argument to 16 bits. This obviously leads to very wrong results. There is a function String.fromCodePoint() that would work, but it isn't present in MSIE so we can't use it. Instead, introduce a new function codePointsToString() that works properly, producing surrogate pairs as necessary. Additionally, the code in getValueAsCodePoints() that combines a surrogate pair to a code point used 0x100000 instead of 0x10000. Had this code been tested at all, one wonders. Also, add some more debug output to the affected functions, bypassed with if (false). Fixes #733 Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476 Signed-off-by: Tor Lillqvist <tml@collabora.com>

Our existing function getValueAsCodePoints() returns an array of integers that are Unicode code points, as its name says. (Code points can be larger than 65535, especially for emojis and interesting scripts). It does not return an array of UTF-16 code units (which are always less than 65536). Still, the code used the JavaScript function String.fromCharCode() on the elements of the returned array. That function expects UTF-16 code units, and it simply truncates the argument to 16 bits. This obviously leads to very wrong results. There is a function String.fromCodePoint() that would work, but it isn't present in MSIE so we can't use it. Instead, introduce a new function codePointsToString() that works properly, producing surrogate pairs as necessary. Additionally, the code in getValueAsCodePoints() that combines a surrogate pair to a code point used 0x100000 instead of 0x10000. Had this code been tested at all, one wonders. Also, add some more debug output to the affected functions, bypassed with if (false). Fixes CollaboraOnline#733 Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476 Signed-off-by: Tor Lillqvist <tml@collabora.com>

nikslor added bug Something isn't working ios Specific to the Collabora Office iOS app, or to online accessed from i(Pad)OS devices unconfirmed labels Nov 20, 2020

nikslor assigned tml1024 and thebearon Nov 20, 2020

thebearon removed their assignment Nov 24, 2020

thebearon removed the unconfirmed label Nov 24, 2020

tml1024 mentioned this issue Nov 24, 2020

Fix various problems in handling of non-BMP code points #760

Merged

6 tasks

tml1024 mentioned this issue Nov 27, 2020

Fix various problems in handling of non-BMP code points (cherry-pick from co-6-4) #798

Merged

6 tasks

tml1024 closed this as completed in #798 Nov 27, 2020

tml1024 mentioned this issue Nov 27, 2020

Fix various problems in handling of non-BMP code points #806

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

iOS app: can't use the Emoji keyboard #733

iOS app: can't use the Emoji keyboard #733

nikslor commented Nov 20, 2020

tml1024 commented Nov 24, 2020

mmeeks commented Nov 24, 2020

tml1024 commented Nov 24, 2020

tml1024 commented Nov 24, 2020

tml1024 commented Nov 24, 2020

nikslor commented Nov 24, 2020

nikslor commented Nov 24, 2020

tml1024 commented Nov 25, 2020

tml1024 commented Nov 25, 2020 •

edited

tml1024 commented Nov 25, 2020

tml1024 commented Nov 25, 2020 •

edited

tml1024 commented Nov 25, 2020

tml1024 commented Nov 25, 2020 •

edited

tml1024 commented Nov 25, 2020

tml1024 commented Nov 26, 2020 •

edited

iOS app: can't use the Emoji keyboard #733

iOS app: can't use the Emoji keyboard #733

Comments

nikslor commented Nov 20, 2020

tml1024 commented Nov 24, 2020

mmeeks commented Nov 24, 2020

tml1024 commented Nov 24, 2020

tml1024 commented Nov 24, 2020

tml1024 commented Nov 24, 2020

nikslor commented Nov 24, 2020

nikslor commented Nov 24, 2020

tml1024 commented Nov 25, 2020

tml1024 commented Nov 25, 2020 • edited

tml1024 commented Nov 25, 2020

tml1024 commented Nov 25, 2020 • edited

tml1024 commented Nov 25, 2020

tml1024 commented Nov 25, 2020 • edited

tml1024 commented Nov 25, 2020

tml1024 commented Nov 26, 2020 • edited

tml1024 commented Nov 25, 2020 •

edited

tml1024 commented Nov 25, 2020 •

edited

tml1024 commented Nov 25, 2020 •

edited

tml1024 commented Nov 26, 2020 •

edited