Support For Emoji Modifiers #3975

carloabelli · 2020-07-15T14:49:19Z

Now that there is emoji support, support for emoji modifiers would be nice as well. For example, the 👋🏻 emoji does not render correctly: it renders as its component parts instead of as a single glyph.

System

OS: Linux
Version: alacritty 0.4.3
Linux: wayland (sway)

kchibisov · 2020-07-15T17:00:27Z

In general, combining such characters doesn't work in terminal emulators, since you can't keep the same width after combining them. If anything, it should be done by combining them somehow and then centering the glyph in the old (before combination) number of cells.

So the only thing we can do is implement that "hack"; other than that, there's nothing we can do about it.

I'll keep this issue open for implementing the workaround I'm suggesting, since it could make sense: certain emojis, like flags, do combine while preserving total width.
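
The centering workaround described above comes down to simple arithmetic. This is a minimal sketch under stated assumptions, not Alacritty code: `center_in_cells` and the pixel cell width are hypothetical names and values chosen for illustration.

```python
def center_in_cells(reserved_cells: int, glyph_cells: int, cell_px: int) -> int:
    """Return the x offset in pixels that centers a combined glyph.

    reserved_cells: cells occupied before combination (sum of per-codepoint widths)
    glyph_cells:    cells the combined glyph actually needs
    cell_px:        width of one grid cell in pixels (assumed value)
    """
    if glyph_cells > reserved_cells:
        raise ValueError("combined glyph must fit in the reserved cells")
    free_px = (reserved_cells - glyph_cells) * cell_px
    return free_px // 2


# Assumption: the base emoji and its modifier each reserved 2 cells (4 total),
# while the combined glyph only needs 2 cells.
offset = center_in_cells(reserved_cells=4, glyph_cells=2, cell_px=10)
```

The grid layout the application computed stays intact; only the position of the glyph inside the reserved cells changes.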

carloabelli · 2020-07-15T17:06:56Z

It seems that at least some terminal emulators support this behavior. See neovim/neovim#7151. I'm not exactly sure how they do it, but it does not appear to be the hack you described.

kchibisov · 2020-07-15T17:24:45Z

The only terminal I'm aware of that supports such things is kitty, and it's broken in an unusable way when it comes to them. The thing is that such terminals combine the characters without preserving the width, and so break the applications you're using.

For example, in kitty, I've moved my cursor to the start of the input in a prompt with ctrl + a, and you may see that it's over the prompt itself, which is unusable.

On the other hand, in alacritty the emojis are not combined, but you have a usable state.

That's more noticeable in applications like weechat, since once someone posts such an emoji into it everything goes wild, and content is not aligned anymore.

The reason it's happening is that applications rely on wcwidth-like functions, so a terminal that uses wcswidth-like functions breaks the implicit synchronization. It could also be broken by mismatches between wcwidth implementations, but that happens far less frequently than wcswidth/wcwidth mismatches, and wcwidth is way simpler and more performant to use.
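
To make the mismatch concrete, here is a rough sketch of how an application that sums per-codepoint widths ends up out of sync with a terminal that merges the sequence. `cell_width` is a simplified stand-in for wcwidth built on Python's `unicodedata` tables, not the libc function, and the terminal-side width of 2 is an assumption about a terminal that combines the pair.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Simplified stand-in for wcwidth(3) for a single codepoint."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters such as ZWJ
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters
    return 1


def per_codepoint_width(text: str) -> int:
    """Width an application expects when it sums per-codepoint widths."""
    return sum(cell_width(ch) for ch in text)


wave = "\U0001F44B\U0001F3FB"  # waving hand + light skin tone modifier
app_expects = per_codepoint_width(wave)  # both codepoints are wide on their own
terminal_uses = 2  # assumption: a terminal that merges the pair into one emoji
drift = app_expects - terminal_uses  # columns the cursor is now off by
```

Every such emoji on a line adds to `drift`, which is why prompts and aligned UIs such as weechat end up misplaced.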

christianparpart · 2020-07-15T18:21:47Z

wc(s)width MUST die! If you only look at single codepoints, you are doomed to not support Unicode. Unicode (user-perceived) characters can be more than one codepoint. And kitty is doing a decent job at finding out which codepoints belong to which grid cell.

The problem is that many apps rely on wcwidth, and this API is broken by definition.
Everyone will tell you that you must use wcwidth to not break other apps. And while they are right on this one, they prevent moving on and keeping up with the (Unicode) standards. If we all keep insisting on using old broken APIs, just to not break other apps, then NO ONE will ever move forward. But who will be pushing?

It's a job nobody will thank you for. And everyone thinks they know more than everyone else, but in the end, we still keep relying on old APIs. That's a sad story.

What do we need to fix this?

Terminals need to advertise that they are doing Unicode-conformant grapheme segmentation, for example by adding an ID to the response to a DA1 VT sequence.

Apps could request that to know how to count (by codepoint or by grapheme cluster). But that's just one of a few ways to at least start tackling this issue.
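
A sketch of what the application side of such a DA1-based check might look like. The DA1 query and the general reply shape (`CSI ? Ps ; Ps ; ... c`) are standard; `GRAPHEME_FEATURE_ID` is a made-up placeholder, since no such ID is defined anywhere in this discussion.

```python
import re
from typing import Set

DA1_REQUEST = "\x1b[c"  # Primary Device Attributes query sent by the app

# Hypothetical attribute number meaning "grapheme segmentation supported".
# This value is an assumption for illustration only.
GRAPHEME_FEATURE_ID = 9999

_DA1_REPLY = re.compile(r"\x1b\[\?([0-9;]*)c")


def parse_da1(reply: str) -> Set[int]:
    """Return the set of attribute numbers in a DA1 reply."""
    match = _DA1_REPLY.fullmatch(reply)
    if match is None:
        raise ValueError("not a DA1 reply")
    return {int(part) for part in match.group(1).split(";") if part}


def counts_by_grapheme(reply: str) -> bool:
    """True if the terminal advertised the (hypothetical) feature ID."""
    return GRAPHEME_FEATURE_ID in parse_da1(reply)
```

An app would fall back to per-codepoint counting whenever `counts_by_grapheme` returns False, which keeps terminals without the feature working as before.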

P.S. The Unicode spec recommends rendering emoji symbols in square form to keep them consistent with where they came from (Japanese telcos). That basically means 2 columns in the terminal, regardless of how many codepoints your emoji codepoint sequence needs to be complete. But implementing that is certainly not a one-day task, I know that firsthand. Also, it breaks the alacritty mantra of only caring about being the fastest terminal, and such a feature requires proper text shaping (see: harfbuzz), which is a costly operation. Check "blink's text stack" to read about how web browsers deal with that without losing too much performance.

kchibisov · 2020-07-15T19:07:26Z

> wc(s)width MUST die! If you only look at single codepoints, you are doomed to not support Unicode. Unicode (user-perceived) characters can be more than one codepoint. And kitty is doing a decent job at finding out which codepoints belong to which grid cell.

Yeah, but wcwidth makes so many things simple. If you treat everything like Unicode grapheme clusters, it'll be a bit of a pain to work with tbqh.

> The problem is that many apps rely on wcwidth, and this API is broken by definition.
> Everyone will tell you that you must use wcwidth to not break other apps. And while they are right on this one, they prevent moving on and keeping up with the (Unicode) standards. If we all keep insisting on using old broken APIs, just to not break other apps, then NO ONE will ever move forward. But who will be pushing?

You're free to push and standardize, but the thing is that because you work on a grid, you'll never have proper Unicode support no matter how hard you try, and updating the rendering of each app on earth will take years. So yeah, you're free to push and come up with some standards; however, there's no real benefit for the majority of users in proper Unicode, since you work with ASCII most of the time (you don't program in emojis with skin tones, do you?).

Also, most folks care about work being done, and not about restarting their terminal due to stupid chars in a prompt, because you went completely out-of-sync.

> Terminals need to advertise that they are doing Unicode-conformant grapheme segmentation, for example by adding an ID to the response to a DA1 VT sequence.

Yeah, but I'm not sure that there's a thing to do that? Also, what should terminals do when a certain application doesn't work on a grapheme cluster level? I'd assume everything will go to hell, or you'd have to recompute and reflow your entire grid? I'm not talking about alt-screen applications here fwiw. In alt-screen you don't reflow and rely on the application's reflow (at least that's how the majority of apps do it), but you'd need a switch for 'different unicode handling', which could be annoying as hell.

> Apps could request that to know how to count (by codepoint or by grapheme cluster). But that's just one of a few ways to at least start tackling this issue.

Yeah, and you should likely have a different approach to storing text then. In general, grid-based storage won't work if you want to support toggling between grapheme clusters and the normal wcwidth approach. It's possible to do so, just not that trivial. One more point is that you should change the rendering based on that, but it's not a big deal, I'd assume.

> That basically means 2 columns in the terminal, regardless of how many codepoints your emoji codepoint sequence needs to be complete.

Yeah, and they define certain characters as width 2 in their table; however, certain emojis even have width 1, but they are not emojis historically, it's just that some fonts decided to make certain glyphs colored.

> Also, it breaks the alacritty mantra of only caring about being the fastest terminal, and such a feature requires proper text shaping (see: harfbuzz), which is a costly operation.

The point is that there's always a way to make things performant, it just requires time and effort to do so.

kchibisov · 2020-07-15T19:13:25Z

> wc(s)width MUST die! If you only look at single codepoints, you are doomed to not support Unicode. Unicode (user-perceived) characters can be more than one codepoint. And kitty is doing a decent job at finding out which codepoints belong to which grid cell.

And also, just to clarify: my point is that you should move forward, and what kitty is trying to do is likely what everyone can start doing in the future, but don't enable it by default, or at least provide a way to disable that behavior. Having certain things as a 'tech preview' in a codebase isn't something bad. I don't like that, though in this particular case it's nice for major applications to have an environment to test a migration to, or the addition of, new ways to handle rendering. If you force that behavior on users, everyone will be mad and you'll just scare folks away because their applications are no longer able to work with a 'fancy terminal'.

christianparpart · 2020-07-15T19:31:49Z

I didn't mean to push for full Unicode support, but I would love to see us at least stop being afraid of moving away from wcwidth, as you will never be able to move forward with that legacy on your back.

Moving forward in small steps is better than forcing users to be mad at you. That's why I said, and still think, that it's best if the VTE can advertise the feature, which is disabled by default but can be enabled (like you said) as a tech preview, so others can get a feel for it.
Also there is some kind of communication in terminal-wg going on already - just very slowly. :)

chrisduerr · 2020-07-15T20:12:54Z

If you want to support grapheme clusters right now, you shouldn't change the widths of glyphs or you're just causing more trouble than it's worth really. There are some terminals already doing it, I know macOS is way too much into that iirc, but it just makes the entire situation worse and not better.

If you look for ways to detect and standardize, then the feature reporting proposal is probably your best idea. I've actually requested that handling of this is removed from the initial proposal, since it's certainly not an easy topic, but I'm sure if the feature reporting spec is ever supported by terminals, then having proper feature detection for it will be easier at least.

But of course feature detection is not the only problem, since you absolutely want to be backwards compatible here. There's no way all applications will support this and there's also no way all terminals will get patches for this. Having grapheme cluster support also shouldn't be a requirement anyways, since it's such an unnecessarily complicated topic. So you'd probably need something more like a private mode to enable grapheme clustering and feature detection to figure out if it is available.
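
As a sketch of the "private mode plus detection" idea: DECSET/DECRST and the DECRQM/DECRPM query pair are existing VT mechanisms, but `GRAPHEME_MODE` below is a placeholder number, since the thread does not name one.

```python
import re

# Hypothetical private mode number for "grapheme clustering".
# This value is an assumption for illustration only.
GRAPHEME_MODE = 12345

_DECRPM = re.compile(r"\x1b\[\?(\d+);(\d)\$y")


def decset(mode: int) -> str:
    """Sequence that enables a private mode (CSI ? Pm h)."""
    return f"\x1b[?{mode}h"


def decrst(mode: int) -> str:
    """Sequence that disables a private mode (CSI ? Pm l)."""
    return f"\x1b[?{mode}l"


def decrqm(mode: int) -> str:
    """Sequence that asks the terminal for a private mode's state."""
    return f"\x1b[?{mode}$p"


def mode_recognized(reply: str, mode: int) -> bool:
    """True if a DECRPM reply says the terminal knows this mode.

    A status of 0 in the reply means "not recognized".
    """
    match = _DECRPM.fullmatch(reply)
    if match is None or int(match.group(1)) != mode:
        return False
    return match.group(2) != "0"
```

An application would send `decrqm(GRAPHEME_MODE)`, and only emit `decset(GRAPHEME_MODE)` and switch its own width counting if the reply is recognized; otherwise it keeps the legacy behavior, which preserves backwards compatibility.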

But even with all that you haven't yet figured out how to synchronize the wcwidth functions between terminal and client, since they might not always update at the same schedule. So now one would have to downgrade to the version of the other to get perfect support, which would likely be a two way communication which is always a terrible idea. You could of course also query the terminal, but that is terribly, terribly slow.

justinlovinger · 2020-10-13T22:48:00Z

Could we treat emoji modifiers like ligatures (once that is supported)? One cell for emoji and a second cell for modifier, but display the modified emoji.

chrisduerr · 2020-10-13T23:13:43Z

No, but depending on the ligature implementation it might still work. But we do not have ligatures, so it's a bit pointless to speculate on that.

foodornt · 2021-05-17T12:09:36Z

Anything in work?

christianparpart · 2021-06-13T22:40:47Z

> Anything in work?

Fun times. I have just found this ticket by accident and realized i was actively talking in it already. :-)

Without any ugly ads, @ authors of alacritty: I have actually implemented multi-codepoint grapheme cluster support (that includes all kinds of emoji, such as those requiring seven codepoints), ligatures, text reflow and Sixel. So all these are points I feel people try to argue that it is not possible. Kitty has also improved quite a bit on emoji variation selectors as well as ligatures. Text reflow does not break because of such emoji. My point is, this and just this (i.e., I am not talking about bidi or Arabic/Hebrew) can actually work in the terminal quite nicely. Sure, zsh doesn't play nice with such ZWJ sequences, but it didn't play nice with them on other terminals either.

Also, it is not about filling the text screen with colorful emoji. But i think nobody disagrees that small little icons on the left vertical line in Vim indicating various types of diagnostics is certainly helpful and not distracting :-)

P.s. Up until now i had no negative feedback because i do support that (even though it is enabled by default, cannot be disabled and cannot be detected) :)

chrisduerr · 2021-06-13T22:49:46Z

> So all these are points i feel people try to argue that it is not possible.

I'm not sure anyone would ever argue they're not possible to implement. It's just that a lot of people aren't interested in them, myself included. Though I do think that joining grapheme clusters together without leaving space is something that applications at this point should not have to handle.

> But i think nobody disagrees that small little icons on the left vertical line in Vim indicating various types of diagnostics is certainly helpful and not distracting :-)

I disagree. In fact I have all diagnostics in vim that show in the gutter disabled for exactly this reason.

christianparpart · 2021-06-14T04:53:37Z

> > So all these are points i feel people try to argue that it is not possible.

> I'm not sure anyone would ever argue they're not possible to implement. It's just that a lot of people aren't interested in them, myself included. Though I do think that joining grapheme clusters together without leaving space is something that applications at this point should not have to handle.

Good morning. I know apps like doing these workarounds. And i absolutely agree here.
Apps that would like to fix that could indeed query a, say, DA1 to find out if they don't need to anymore.
But before apps can do so, TEs should provide that. And as long as we are not willing to, apps won't either. :)
It's a chicken-and-egg problem.

So if I come up with a little formal spec and two terminals willing to support that, would you then do, too? - as it seems we both don't like apps doing so?

> > But i think nobody disagrees that small little icons on the left vertical line in Vim indicating various types of diagnostics is certainly helpful and not distracting :-)

> I disagree. In fact I have all diagnostics in vim that show in the gutter disabled for exactly this reason.

I would be interested in what these are. However, I am sure one can get it generally working as of today (it's working in the vim I use and see), but maybe that is even more of a reason for my above proposal, which I hope you would join. :)

chrisduerr · 2021-06-14T08:33:06Z

> So if I come up with a little formal spec and two terminals willing to support that, would you then do, too? - as it seems we both don't like apps doing so?

As I've said, I have little interest in this feature.

christianparpart · 2021-06-14T08:56:47Z

> > So if I come up with a little formal spec and two terminals willing to support that, would you then do, too? - as it seems we both don't like apps doing so?

> As I've said, I have little interest in this feature.

I respect that. But just for political correctness: because you think it might (actually: will, to some degree) degrade performance, or because you simply don't need that?

chrisduerr · 2021-06-14T08:59:00Z

Both. I don't need it and doing it would not be straight-forward since it's a bit tricky because of performance. Obviously the easier something is to do, the more likely I'd be to look at it despite not needing it myself. But it's not like it would be that difficult to perform some form of grapheme clustering.

Iron-E · 2021-10-23T13:03:25Z

> i have actually implemented multi Codepoint grapheme cluster support

@christianparpart would that cover #50 as well? That feature definitely would be a popular use of such an implementation.

christianparpart · 2021-10-27T21:34:08Z

Hey @Iron-E. Sorry for the delayed reply. In a TE, all codepoints come in sequentially; these can be grouped into grapheme clusters, and every grapheme cluster in a TE should be put into a single grid cell. No matter how complex your emoji is, and regardless of how many ZWJ joiners it uses, it'll still be a single grapheme cluster and hence land in the same grid cell. So yeah, that does cover the motivation behind #50. :)
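
The grouping step described above can be sketched roughly as follows. This is a deliberately simplified approximation, not the full UAX #29 algorithm and not code from any of the terminals mentioned: it only handles ZWJ, emoji skin tone modifiers, variation selector 16 and combining marks, and it joins anything that follows a ZWJ.

```python
import unicodedata
from typing import List

ZWJ = "\u200d"   # zero width joiner
VS16 = "\ufe0f"  # variation selector 16 (emoji presentation)


def is_emoji_modifier(ch: str) -> bool:
    """Skin tone modifiers U+1F3FB..U+1F3FF."""
    return 0x1F3FB <= ord(ch) <= 0x1F3FF


def extends_cluster(ch: str) -> bool:
    """True if this codepoint attaches to the previous cluster."""
    return (
        ch in (ZWJ, VS16)
        or is_emoji_modifier(ch)
        or unicodedata.category(ch) in ("Mn", "Me")
    )


def clusters(text: str) -> List[str]:
    """Group sequentially arriving codepoints into simplified clusters."""
    out: List[str] = []
    join_next = False  # set after a ZWJ so the next codepoint is glued on
    for ch in text:
        if out and (join_next or extends_cluster(ch)):
            out[-1] += ch
        else:
            out.append(ch)
        join_next = ch == ZWJ
    return out


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # man ZWJ woman ZWJ girl
cells = clusters("a" + family + "b")
```

Each element of `cells` would then be stored in one grid cell entry, however many codepoints it contains.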

EDIT: Ah, sorry, the other way around. I thought #50 was about emoji, sorry. The thing with ligatures is that they are - as far as programming ligatures go at least - NOT a single grapheme cluster. For example, <= is two clusters, one for < and another one for =, thus occupying two grid cells. But that does not hold you back from rendering them as a single glyph (!).

What you'd need to do is adjust the rendering stack to not treat every grid cell individually and render them in a dumb way, but rather segment each line by at least word and Unicode script (and emoji presentation), so that you can feed such so-called runs to your text shaper (e.g. via harfbuzz). What comes out is a sequence of glyph indices into the font you also passed to the harfbuzz call (e.g. hb_shape). Now, if that font does contain ligatures, harfbuzz will know about it and yield one single glyph that is (in the context of programming ligatures) double the width. Since you provided cluster indices to your hb_shape call, you are very well aware that after such a programming ligature glyph you need to advance the rendering pen by 2 grid cell widths instead of 1. So sorry, grapheme clustering does not help in this case, but I gave a good explanation (hopefully!) of how to implement this. I wrote this down in a blog article in much more detail than here. You may want to have a read?
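
The pen-advance step at the end of that explanation can be sketched without calling harfbuzz at all. The `ShapedGlyph` type and the glyph IDs below are made up to mimic shaper output; the sketch assumes a left-to-right run whose cluster values were set to grid cell indices before shaping.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShapedGlyph:
    glyph_id: int  # index into the font, as a shaper would return it
    cluster: int   # index of the first grid cell this glyph covers


def cell_advances(glyphs: List[ShapedGlyph], run_cells: int) -> List[int]:
    """Number of grid cells to advance the pen after each glyph.

    A glyph spans from its own cluster index up to the next glyph's
    cluster index, or to the end of the run for the last glyph.
    """
    advances = []
    for i, glyph in enumerate(glyphs):
        if i + 1 < len(glyphs):
            next_cluster = glyphs[i + 1].cluster
        else:
            next_cluster = run_cells
        advances.append(next_cluster - glyph.cluster)
    return advances


# The run "a<=b" occupies 4 cells. Assumption: the font has a "<=" ligature,
# so the shaper returned 3 glyphs instead of 4.
shaped = [ShapedGlyph(68, 0), ShapedGlyph(900, 1), ShapedGlyph(69, 3)]
advances = cell_advances(shaped, run_cells=4)
```

The ligature glyph gets an advance of 2 cells while its neighbours get 1, so the grid itself is unchanged and only the rendering differs.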

Reference: https://dev.to/christianparpart/look-into-a-terminal-emulator-s-text-stack-3poe

Iron-E · 2021-10-27T21:39:04Z

~~That's awesome to hear! I wonder if Chris would be more inclined to include such a feature knowing it could resolve one of this repo's highly rated issues, though he may have already known about it.~~

~~Thanks for making some headway on #50 either way— that's been one of my most-wanted for a while~~

Edit: saw below. Thanks for the pointers! I'll do some reading.

If it's performance that's the concern, I suppose there isn't that much to be done on changing minds on the matter (since, as you said, any additional processing will make performance worse).

christianparpart · 2021-10-27T21:43:02Z

@Iron-E sorry, i just updated my text. please F5 :-)

christianparpart · 2021-10-27T21:46:57Z

> That's awesome to hear! I wonder if Chris would be more inclined to include such a feature knowing it could resolve one of this repo's highly rated issues, though he may have already known about it.

I don't quite think so, as it will in fact drain rendering performance (as far as ligatures and emoji handling go). He was already quite clear on his standpoint. You can, however, minimize the impact with caching, and always optimize for the standard case.
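
A rough sketch of "cache, and optimize for the standard case". Everything here is hypothetical: `expensive_shape` is a placeholder for a real shaper call, and the fast path assumes that runs of plain ASCII letters, digits and spaces never form ligatures in the chosen font.

```python
from functools import lru_cache
from typing import Tuple


def expensive_shape(run: str) -> Tuple[int, ...]:
    """Placeholder for a real shaper call; returns fake glyph IDs."""
    return tuple(ord(ch) for ch in run)


@lru_cache(maxsize=4096)
def shape_cached(run: str) -> Tuple[int, ...]:
    """Shape a run once and reuse the result on later frames."""
    return expensive_shape(run)


def is_simple(run: str) -> bool:
    """Standard case: only ASCII letters, digits and spaces."""
    return all(ch.isascii() and (ch.isalnum() or ch == " ") for ch in run)


def shape(run: str) -> Tuple[int, ...]:
    if is_simple(run):
        # Fast path: one glyph per codepoint, no shaper and no cache lookup.
        return tuple(ord(ch) for ch in run)
    return shape_cached(run)
```

Only runs containing punctuation, emoji or other non-trivial text ever reach the shaper, and repeated runs are served from the cache.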

> Thanks for making some headway on #50 either way, that's been one of my most-wanted for a while

ZWJ emoji as well as programming ligatures are no rocket science at all. They do require head-work, yes, but it's not cognitively hard. Multiple well known TEs can render these already. You may want to look over the fence then? :-)

senpai now correctly computes the width of unicode strings, such as sequences of emoji interleaved with Zero-Width-Joiners, and the editor goes from cluster to cluster, instead of codepoint (rune) to codepoint. This breaks users on terminal that do not support grapheme clusters, such as alacritty[0]. See http://unicode.org/reports/tr29/ and https://github.com/rivo/uniseg [0] alacritty/alacritty#3975

kchibisov added P - low S - render enhancement S - font labels Jul 15, 2020

hhirtz mentioned this issue Nov 6, 2021: Support grapheme clusters hhirtz/senpai#6 (Draft, 4 tasks)

EdmundsEcho mentioned this issue Dec 26, 2021: Uncertain if the font prop accepts a list/sequence of font family values; attempt to fix emoji for macos #5715 (Closed)

akrifari mentioned this issue Dec 28, 2021: Search results overflowing into search input field ibhagwan/fzf-lua#275 (Closed)

ibhagwan mentioned this issue Dec 28, 2021: Search results overflow into search input field on some terminals when text contains emojis junegunn/fzf#2697 (Open, 10 tasks)

kevenwyld mentioned this issue Aug 4, 2022: FZF list buggy sometimes pystardust/ytfzf#379 (Closed)

kchibisov mentioned this issue Jul 22, 2023: Incorrect positioning of combining diacritical marks #7103 (Closed)

chrisduerr mentioned this issue Jul 24, 2023: Not all emojis render correctly #7114 (Closed)

fee1-dead mentioned this issue Dec 19, 2023: Support for Font Ligatures using harfbuzz #5696 (Draft)

shreevatsa mentioned this issue Jun 23, 2024: Support for Indic scripts contour-terminal/contour#1533 (Closed)
