Emoji sequences counted as multiple characters instead of a single character #4945

Closed
p6rt opened this issue Dec 27, 2015 · 3 comments

@p6rt p6rt commented Dec 27, 2015

Migrated from rt.perl.org#127047 (status was 'resolved')

Searchable as RT127047$

@p6rt p6rt commented Dec 27, 2015

From @lizmat

[22:52:03] <AlexDaniel> folks, so am I getting it right that emoji skin color modifier should kinda combine with the previous character? So that we get 1 when we do .chars?
[22:52:33] <lizmat> AlexDaniel: it probably should, yes
[22:53:54] <AlexDaniel> what does the unicode say about this? Should it actually be treated as one grapheme? That would make sense
[22:55:10] <ShimmerFairy> AlexDaniel: AFAICT Unicode's current definition of "extended grapheme cluster" may not cover the skin color modifiers; if so, feel free to file a bug with them :)
[22:55:35] <AlexDaniel> “When a human emoji is not immediately followed by a emoji modifier character, it should use a generic, non-realistic skin tone, such as: #3399CC”
[22:55:41] <AlexDaniel> this is just nuts…
[22:56:17] <AlexDaniel> Ox0dea: they are wrong. Throw a combiner into them and they'll say the same thing
[22:56:29] <AlexDaniel> RabidGravy: it's fucking blue
[22:57:19] <chansen_> AlexDaniel: why do you think I ask you? Unicode is insane! ;o)
[22:57:39] <Ox0dea> AlexDaniel: You're right.
[22:57:58] <blub> thats a pretty blue
[22:58:55] <AlexDaniel> “As to hair color, dark hair tends to be more neutral, because people of every skin tone can have black (or very dark brown) hair—however, there is no requirement for any particular hair color. One exception is PERSON WITH BLOND HAIR, which needs to have blond hair regardless of skin tone.”
[22:59:06] <AlexDaniel> Just read this, it is hilarious: http://unicode.org/reports/tr51/#Emoji_Modifiers_Table
[22:59:35] <Zoffix> lol
[22:59:35] <Skarsnik> rofl unicode
[23:01:25] <ShimmerFairy> it looks like the skin tone modifiers are Grapheme_Base, and not Grapheme_Extend like I'd expect, for some odd reason
[23:01:28] <mort96> yes, actually
[23:01:31] <ChoHag> That's a rhetorical question.
[23:01:42] <Zoffix> Skarsnik, what's the use case?
[23:01:51] <lizmat> m: my $f = "\x1F466\x1F3FE"; say "$f $f.chars()" # this feels like a bug
[23:01:52] <+camelia> rakudo-moar 9441bb: OUTPUT«👦🏾 2␤»
[23:01:52] <AlexDaniel> “In real multi-person groupings, the members may have a variety of skin tones.” – does it mean that KISS character should have one person black and another one white?
[23:04:20] <AlexDaniel> ok, I give up, I have no idea how this could possibly work
[23:04:22] <mort96> plus, how are you supposed to get notifications? You could have something listen to a bell character, but you wouldn't get the text and sender in the notification
[23:04:30] <lizmat> AlexDaniel: will submit rakudobug
[23:04:45] <AlexDaniel> lizmat: oh, nice

FWIW, on http://unicode.org/reports/tr51/ , paragraph 2.2.3 states: "A supported emoji modifier sequence should be treated as a single grapheme cluster for editing purposes (cursor movement, deletion, etc.); word break, line break, etc." So it feels to me that this should say 1 instead of 2:

my $f = "\x1F466\x1F3FE"; say "$f $f.chars()"

Liz
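
For anyone reproducing this, here is a minimal sketch that separates the code point count from the grapheme count. It uses only the built-in `.codes`, `.chars` and `.ords` methods; the comments on the `.chars` value are not guaranteed output, since the result depends on the Rakudo version (it was 2 at the time of this report, per the camelia output in the log).

```raku
# U+1F466 BOY followed by U+1F3FE EMOJI MODIFIER FITZPATRICK TYPE-5
my $f = "\x[1F466]\x[1F3FE]";

# Number of Unicode code points in the string: two were written above.
say $f.codes;

# Number of graphemes. This is the value the report is about:
# 2 on the Rakudo from this report, 1 if the modifier sequence
# is treated as a single grapheme cluster as TR51 recommends.
say $f.chars;

# The individual code points in hex, to confirm what the string holds.
say $f.ords.map(*.base(16)).join(' ');
```

Comparing `.codes` with `.chars` is a quick way to see whether a given sequence is being clustered: if the two numbers are equal for a base emoji plus modifier, the modifier is still being counted as its own grapheme.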

@p6rt p6rt commented Jul 8, 2017

From @samcv

The emoji mentioned here now pass. Some emoji with skin modifiers do not pass yet, but those include ZWJ, so I am going to close this issue.

We pass all of the Emoji v4 which are emoji-sequences.

@p6rt p6rt commented Jul 8, 2017

@samcv - Status changed from 'new' to 'resolved'

@p6rt p6rt closed this as completed Jul 8, 2017
@p6rt p6rt added the uni label Jan 5, 2020