Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iOS app: can't use the Emoji keyboard #733

Closed
nikslor opened this issue Nov 20, 2020 · 15 comments · Fixed by #798
Closed

iOS app: can't use the Emoji keyboard #733

nikslor opened this issue Nov 20, 2020 · 15 comments · Fixed by #798
Assignees
Labels
bug Something isn't working ios Specific to the Collabora Office iOS app, or to online accessed from i(Pad)OS devices

Comments

@nikslor
Copy link

nikslor commented Nov 20, 2020

Describe the bug
On the Apple smart keyboard (attached hardware keyboard) there is a "globe" key in the lower left corner. Using this key, the keyboard selection on the screen pops up. When choosing the "Emoji" keyboard, a dialog pops up on the screen where the user can choose an emoticon which will then be inserted into the text.

So far this works; however after choosing an emoticon a wrong text input happens - not the emoticon is inserted, but the byte sequence is split up into multiple characters which are inserted (not sure how to explain that better).

To Reproduce
Steps to reproduce the behavior:

  1. Open a writer document using the iOS app
  2. Push the "globe" key on the attached Apple smart keyboard
  3. Use the arrow keys to select the "Emoji" keyboard
  4. Choose an emoticon from the list

Expected behavior
The chosen emoticon should be inserted into the text.

Actual behavior
The multibyte emoticon is instead split up into multiple characters which are inserted into the text

Screenshots
emoticons_split_up

Tablet

  • OS: iPad OS 14.2
  • Version: 6.4.1 (1)

Additional context
This used to work in previous versions of the app.
Maybe one of those bugs and their corresponding commits are useful:
https://bugs.documentfoundation.org/show_bug.cgi?id=128069
https://bugs.documentfoundation.org/show_bug.cgi?id=124350

@nikslor nikslor added bug Something isn't working ios Specific to the Collabora Office iOS app, or to online accessed from i(Pad)OS devices unconfirmed labels Nov 20, 2020
@tml1024
Copy link
Contributor

tml1024 commented Nov 24, 2020

Looking into this now. I think also non-BMP characters and even flags (which actually are two code points) used to work at some time, but this has regressed.

@thebearon thebearon removed their assignment Nov 24, 2020
@mmeeks
Copy link
Contributor

mmeeks commented Nov 24, 2020

Would be interesting to know what ends up in the hidden text area.
Nicolas - can you tripple click on help->about and select 'Show Clipboard' - and see what shows up in the text area that should become visible as an overlay nearby the cursor =)
Quite possibly the hand-crafted UTF-16 parser we need for IE11 has not had enough testing ;-) cf. TextInput.js getValueAsCodePoints - and would appreciate that.
Also - quite possibly - we need a richer text-input area, that can accept different fonts - I notice some other code around the place can accept font change via the input field (!?) which is a bit of an odd concept, but may be needed - although, clearly transferring any missing fonts from client -> server is a massive can of worms ... hmm =)

@tml1024
Copy link
Contributor

tml1024 commented Nov 24, 2020

@mmeeks I am on it already. The handling of non-BMP characters is fairly broken in the code. Confusion between code points and UTF-16 code units etc. Am fixing.

@tml1024
Copy link
Contributor

tml1024 commented Nov 24, 2020

The problem is that getValueAsCodePoints(), as its name says, returns an array of integers that are Unicode code points (thus can be larger than 65536, especially for emojis and interesting scripts). Not UTF-16 code units (which are always less than 65536). Still the code uses the JS function fromCharCode() on these numbers. That function expects UTF-16 code units, and it apparently simply truncates the argument to 16 bits.

@tml1024
Copy link
Contributor

tml1024 commented Nov 24, 2020

Another bug is that the code tries to convert from a surrogate pair to code point using the expression code = high + code - 0xdc00 + 0x100000; The 0x100000 should be 0x10000.

tml1024 pushed a commit that referenced this issue Nov 24, 2020
Our existing function getValueAsCodePoints() returns an array of
integers that are Unicode code points, as its name says. (Code points
can be larger than 65535, especially for emojis and interesting
scripts). It does not return an array of UTF-16 code units (which are
always less than 65536).

Still, the code used the JavaScript function String.fromCharCode() on
the elements of the returned array. That function expects UTF-16 code
units, and it simply truncates the argument to 16 bits. This obviously
leads to very wrong results.

There is a function String.fromCodePoint() that would work, but it
isn't present in MSIE so we can't use it. Instead, introduce a new
function codePointsToString() that works properly, producing surrogate
pairs as necessary.

Additionally, the code in getValueAsCodePoints() that combines a
surrogate pair to a code point used 0x100000 instead of 0x10000. Had
this code been tested at all, one wonders.

Also, add some more debug output to the affected functions, bypassed
with if (false).

Fixes #733

Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476
tml1024 pushed a commit that referenced this issue Nov 24, 2020
Our existing function getValueAsCodePoints() returns an array of
integers that are Unicode code points, as its name says. (Code points
can be larger than 65535, especially for emojis and interesting
scripts). It does not return an array of UTF-16 code units (which are
always less than 65536).

Still, the code used the JavaScript function String.fromCharCode() on
the elements of the returned array. That function expects UTF-16 code
units, and it simply truncates the argument to 16 bits. This obviously
leads to very wrong results.

There is a function String.fromCodePoint() that would work, but it
isn't present in MSIE so we can't use it. Instead, introduce a new
function codePointsToString() that works properly, producing surrogate
pairs as necessary.

Additionally, the code in getValueAsCodePoints() that combines a
surrogate pair to a code point used 0x100000 instead of 0x10000. Had
this code been tested at all, one wonders.

Also, add some more debug output to the affected functions, bypassed
with if (false).

Fixes #733

Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476
Signed-off-by: Tor Lillqvist <tml@collabora.com>
tml1024 pushed a commit that referenced this issue Nov 24, 2020
Our existing function getValueAsCodePoints() returns an array of
integers that are Unicode code points, as its name says. (Code points
can be larger than 65535, especially for emojis and interesting
scripts). It does not return an array of UTF-16 code units (which are
always less than 65536).

Still, the code used the JavaScript function String.fromCharCode() on
the elements of the returned array. That function expects UTF-16 code
units, and it simply truncates the argument to 16 bits. This obviously
leads to very wrong results.

There is a function String.fromCodePoint() that would work, but it
isn't present in MSIE so we can't use it. Instead, introduce a new
function codePointsToString() that works properly, producing surrogate
pairs as necessary.

Additionally, the code in getValueAsCodePoints() that combines a
surrogate pair to a code point used 0x100000 instead of 0x10000. Had
this code been tested at all, one wonders.

Also, add some more debug output to the affected functions, bypassed
with if (false).

Fixes #733

Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476
Signed-off-by: Tor Lillqvist <tml@collabora.com>
@nikslor
Copy link
Author

nikslor commented Nov 24, 2020

Thanks a lot for the improvements @tml1024! 6.4.1 (3) is on par with the functionality we had in older versions!

One remaining issue I see is, that some emoticons are split up into their individual parts so the "male painter" (which is one emotion) becomes two emoticons after inserting: guy + color palette.

emoticons_split_up_2

@nikslor
Copy link
Author

nikslor commented Nov 24, 2020

For the future me:
ffmpeg -i foo.mp4 -vf "fps=10,scale=800:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" -loop 0 foo.gif

@tml1024
Copy link
Contributor

tml1024 commented Nov 25, 2020

Looking into the "male artist" issue now. I fear that is something that will need to be implemented in LibreOffice core, though, and probably should be seen as a missing feature, not a bug.

@tml1024
Copy link
Contributor

tml1024 commented Nov 25, 2020

But sure, to the end-user it looks like a bug, as that clever emoji that is a combination of multiple code points with a zero-width-joiner looks like any other emoji when they are choosing one. The end-user can not know which emojis are implemented in which way. it is very surprising to the end-user when instead of a painter they get a head and a palette...

@tml1024
Copy link
Contributor

tml1024 commented Nov 25, 2020

And it can be even more complicated, one can add in a skin tone, for instance: https://emojipedia.org/man-artist-medium-light-skin-tone/ . What looks like a single emoji (glyph) 👨🏼‍🎨 consists of four code points: 👨 Man, 🏼 Medium-Light Skin Tone, ‍ Zero Width Joiner and 🎨 Artist Palette. See https://emojipedia.org/man-artist-medium-light-skin-tone/

@tml1024
Copy link
Contributor

tml1024 commented Nov 25, 2020

For web-based Collabora Online, where LibreOffice core runs on Linux, the required but possibly missing functionality would probably be in some external library, perhaps cairo. (Not entirely clear to me.) On iOS (and macOS) we don't use cairo, but system APIs, so there actually it might be easier to make it "just work". But as it doesn't "just work" on iOS currently, it is possible that some changes are required in LibreOffice core, too. And maybe for instance in the ICU external library.

@tml1024
Copy link
Contributor

tml1024 commented Nov 25, 2020

@tml1024
Copy link
Contributor

tml1024 commented Nov 25, 2020

Interesting: In LibreOffice, on macOS, if I use the Apple Color Emoji font, it works! So possibly it will work in the iOS app, too, if I just in some way make sure that font gets used. (The user is of course not expected to explicitly select an emoji font for emojis; it should happen automagically.)

@tml1024
Copy link
Contributor

tml1024 commented Nov 25, 2020

Suggested fix in https://gerrit.libreoffice.org/c/core/+/106632 . Let's see what the reviewers say, if anything.

@tml1024
Copy link
Contributor

tml1024 commented Nov 26, 2020

(See above-mentioned Gerrit change for more detailed technical discussion. This is a very complex issue, and digging into it reveals some very dirty bits in LibreOffice...)

tml1024 pushed a commit that referenced this issue Nov 27, 2020
Our existing function getValueAsCodePoints() returns an array of
integers that are Unicode code points, as its name says. (Code points
can be larger than 65535, especially for emojis and interesting
scripts). It does not return an array of UTF-16 code units (which are
always less than 65536).

Still, the code used the JavaScript function String.fromCharCode() on
the elements of the returned array. That function expects UTF-16 code
units, and it simply truncates the argument to 16 bits. This obviously
leads to very wrong results.

There is a function String.fromCodePoint() that would work, but it
isn't present in MSIE so we can't use it. Instead, introduce a new
function codePointsToString() that works properly, producing surrogate
pairs as necessary.

Additionally, the code in getValueAsCodePoints() that combines a
surrogate pair to a code point used 0x100000 instead of 0x10000. Had
this code been tested at all, one wonders.

Also, add some more debug output to the affected functions, bypassed
with if (false).

Fixes #733

Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476
Signed-off-by: Tor Lillqvist <tml@collabora.com>
tml1024 pushed a commit that referenced this issue Nov 27, 2020
Our existing function getValueAsCodePoints() returns an array of
integers that are Unicode code points, as its name says. (Code points
can be larger than 65535, especially for emojis and interesting
scripts). It does not return an array of UTF-16 code units (which are
always less than 65536).

Still, the code used the JavaScript function String.fromCharCode() on
the elements of the returned array. That function expects UTF-16 code
units, and it simply truncates the argument to 16 bits. This obviously
leads to very wrong results.

There is a function String.fromCodePoint() that would work, but it
isn't present in MSIE so we can't use it. Instead, introduce a new
function codePointsToString() that works properly, producing surrogate
pairs as necessary.

Additionally, the code in getValueAsCodePoints() that combines a
surrogate pair to a code point used 0x100000 instead of 0x10000. Had
this code been tested at all, one wonders.

Also, add some more debug output to the affected functions, bypassed
with if (false).

Fixes #733

Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476
Signed-off-by: Tor Lillqvist <tml@collabora.com>
tml1024 pushed a commit that referenced this issue Nov 27, 2020
Our existing function getValueAsCodePoints() returns an array of
integers that are Unicode code points, as its name says. (Code points
can be larger than 65535, especially for emojis and interesting
scripts). It does not return an array of UTF-16 code units (which are
always less than 65536).

Still, the code used the JavaScript function String.fromCharCode() on
the elements of the returned array. That function expects UTF-16 code
units, and it simply truncates the argument to 16 bits. This obviously
leads to very wrong results.

There is a function String.fromCodePoint() that would work, but it
isn't present in MSIE so we can't use it. Instead, introduce a new
function codePointsToString() that works properly, producing surrogate
pairs as necessary.

Additionally, the code in getValueAsCodePoints() that combines a
surrogate pair to a code point used 0x100000 instead of 0x10000. Had
this code been tested at all, one wonders.

Also, add some more debug output to the affected functions, bypassed
with if (false).

Fixes #733

Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476
Signed-off-by: Tor Lillqvist <tml@collabora.com>
tml1024 pushed a commit that referenced this issue Nov 27, 2020
Our existing function getValueAsCodePoints() returns an array of
integers that are Unicode code points, as its name says. (Code points
can be larger than 65535, especially for emojis and interesting
scripts). It does not return an array of UTF-16 code units (which are
always less than 65536).

Still, the code used the JavaScript function String.fromCharCode() on
the elements of the returned array. That function expects UTF-16 code
units, and it simply truncates the argument to 16 bits. This obviously
leads to very wrong results.

There is a function String.fromCodePoint() that would work, but it
isn't present in MSIE so we can't use it. Instead, introduce a new
function codePointsToString() that works properly, producing surrogate
pairs as necessary.

Additionally, the code in getValueAsCodePoints() that combines a
surrogate pair to a code point used 0x100000 instead of 0x10000.

Also, add some more debug output to the affected functions, bypassed
with if (false).

Fixes #733

Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476
Signed-off-by: Tor Lillqvist <tml@collabora.com>
tml1024 pushed a commit that referenced this issue Nov 27, 2020
Our existing function getValueAsCodePoints() returns an array of
integers that are Unicode code points, as its name says. (Code points
can be larger than 65535, especially for emojis and interesting
scripts). It does not return an array of UTF-16 code units (which are
always less than 65536).

Still, the code used the JavaScript function String.fromCharCode() on
the elements of the returned array. That function expects UTF-16 code
units, and it simply truncates the argument to 16 bits. This obviously
leads to very wrong results.

There is a function String.fromCodePoint() that would work, but it
isn't present in MSIE so we can't use it. Instead, introduce a new
function codePointsToString() that works properly, producing surrogate
pairs as necessary.

Additionally, the code in getValueAsCodePoints() that combines a
surrogate pair to a code point used 0x100000 instead of 0x10000.

Also, add some more debug output to the affected functions, bypassed
with if (false).

Fixes #733

Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476
Signed-off-by: Tor Lillqvist <tml@collabora.com>
tml1024 pushed a commit that referenced this issue Nov 28, 2020
Our existing function getValueAsCodePoints() returns an array of
integers that are Unicode code points, as its name says. (Code points
can be larger than 65535, especially for emojis and interesting
scripts). It does not return an array of UTF-16 code units (which are
always less than 65536).

Still, the code used the JavaScript function String.fromCharCode() on
the elements of the returned array. That function expects UTF-16 code
units, and it simply truncates the argument to 16 bits. This obviously
leads to very wrong results.

There is a function String.fromCodePoint() that would work, but it
isn't present in MSIE so we can't use it. Instead, introduce a new
function codePointsToString() that works properly, producing surrogate
pairs as necessary.

Additionally, the code in getValueAsCodePoints() that combines a
surrogate pair to a code point used 0x100000 instead of 0x10000.

Also, add some more debug output to the affected functions, bypassed
with if (false).

Fixes #733

Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476
Signed-off-by: Tor Lillqvist <tml@collabora.com>
thais-dev pushed a commit that referenced this issue Dec 2, 2020
Our existing function getValueAsCodePoints() returns an array of
integers that are Unicode code points, as its name says. (Code points
can be larger than 65535, especially for emojis and interesting
scripts). It does not return an array of UTF-16 code units (which are
always less than 65536).

Still, the code used the JavaScript function String.fromCharCode() on
the elements of the returned array. That function expects UTF-16 code
units, and it simply truncates the argument to 16 bits. This obviously
leads to very wrong results.

There is a function String.fromCodePoint() that would work, but it
isn't present in MSIE so we can't use it. Instead, introduce a new
function codePointsToString() that works properly, producing surrogate
pairs as necessary.

Additionally, the code in getValueAsCodePoints() that combines a
surrogate pair to a code point used 0x100000 instead of 0x10000. Had
this code been tested at all, one wonders.

Also, add some more debug output to the affected functions, bypassed
with if (false).

Fixes #733

Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476
Signed-off-by: Tor Lillqvist <tml@collabora.com>
Ezinnem pushed a commit to Ezinnem/online that referenced this issue Dec 22, 2020
Our existing function getValueAsCodePoints() returns an array of
integers that are Unicode code points, as its name says. (Code points
can be larger than 65535, especially for emojis and interesting
scripts). It does not return an array of UTF-16 code units (which are
always less than 65536).

Still, the code used the JavaScript function String.fromCharCode() on
the elements of the returned array. That function expects UTF-16 code
units, and it simply truncates the argument to 16 bits. This obviously
leads to very wrong results.

There is a function String.fromCodePoint() that would work, but it
isn't present in MSIE so we can't use it. Instead, introduce a new
function codePointsToString() that works properly, producing surrogate
pairs as necessary.

Additionally, the code in getValueAsCodePoints() that combines a
surrogate pair to a code point used 0x100000 instead of 0x10000. Had
this code been tested at all, one wonders.

Also, add some more debug output to the affected functions, bypassed
with if (false).

Fixes CollaboraOnline#733

Change-Id: Id50d05ac95285edc93f1e3f5a2538a0732186476
Signed-off-by: Tor Lillqvist <tml@collabora.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ios Specific to the Collabora Office iOS app, or to online accessed from i(Pad)OS devices
Projects
None yet
4 participants