Normalize Emoji variant selectors in identifiers? #31588

Open
Keno opened this issue Apr 2, 2019 · 8 comments
Labels
domain:unicode Related to unicode characters and encodings

Comments

@Keno
Member

Keno commented Apr 2, 2019

Today I came across the existence of emoji variant selectors (Unicode calls them variation selectors). Basically, the Unicode standard already had a bunch of symbols that were a little like emoji but not colorful, so it added a combining character to make those colorful, and similarly one to make emoji non-colorful. At the moment, we consider these significant in identifiers, so we allow things like:
[Screenshot: Screen Shot 2019-04-02 at 5 03 27 PM]

We should make a decision on whether we want to normalize out this distinction in our identifier normalization.
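
To make the distinction concrete, here is a minimal sketch of the code points involved, written in Python rather than Julia purely for illustration. It assumes the telephone symbol U+260E as the example character, and `strip_variation_selectors` is a hypothetical helper showing one thing "normalizing out" could mean; it is not something Julia's identifier normalization does.

```python
# U+FE0E (VS15) requests text presentation, U+FE0F (VS16) emoji presentation.
TEXT_SELECTOR = "\ufe0e"
EMOJI_SELECTOR = "\ufe0f"

bare = "\u260e"                    # BLACK TELEPHONE with no selector
as_text = bare + TEXT_SELECTOR     # explicit text presentation
as_emoji = bare + EMOJI_SELECTOR   # explicit emoji presentation


def strip_variation_selectors(name: str) -> str:
    """Hypothetical normalization: drop VS15/VS16 so all spellings compare equal."""
    return name.replace(TEXT_SELECTOR, "").replace(EMOJI_SELECTOR, "")


# While selectors are treated as significant, the three spellings are
# three different strings, and so three different identifiers.
distinct = len({bare, as_text, as_emoji})

# After stripping, they collapse to a single spelling.
collapsed = len({strip_variation_selectors(s) for s in (bare, as_text, as_emoji)})
```

Whether the normalized form should be the bare character or one of the explicit presentation sequences is a separate design choice that this sketch does not settle.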

@Keno
Member Author

Keno commented Apr 3, 2019

A related issue: Should we change the \:emoji: completions to include the emoji variant selectors? E.g. \:phone: completes to the non-emoji variant.

@JeffBezanson JeffBezanson added the domain:unicode Related to unicode characters and encodings label Apr 3, 2019
@stevengj
Member

stevengj commented Apr 3, 2019

I feel like emoji normalization is something we should leave to the Unicode consortium — it seems like they should really fix this in NFC, and it's not worth the effort for us to use a custom normalization here.

For tab completion we can do whatever we want, of course.
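
The point about NFC can be checked with a short sketch. It is in Python rather than Julia, for illustration only, and uses the standard `unicodedata` module with U+260E as an assumed example character. As far as I know, none of the four standard normalization forms removes a variation selector, so the two spellings stay distinct after normalizing.

```python
import unicodedata

emoji_phone = "\u260e\ufe0f"   # BLACK TELEPHONE + VS16 (emoji presentation)
plain_phone = "\u260e"         # BLACK TELEPHONE with no selector

# Apply each standard normalization form and record whether VS16 survives.
survives = {
    form: "\ufe0f" in unicodedata.normalize(form, emoji_phone)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# Even after NFC, the two spellings still compare unequal.
still_different = (
    unicodedata.normalize("NFC", emoji_phone)
    != unicodedata.normalize("NFC", plain_phone)
)
```

Unifying the two spellings would therefore need a custom step on top of NFC, which is the extra effort being weighed here.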

@StefanKarpinski
Sponsor Member

I agree. The main concern for identifier normalization is when two different identifiers are both easy to input and hard to distinguish. That doesn’t seem to be the case here. If the Unicode consortium decides to normalize these then we can follow suit.

@Keno
Member Author

Keno commented Apr 3, 2019

> That doesn’t seem to be the case here.

Isn't it? iTerm2 has decent Unicode support, but half the other software I tried (including all editors we support) either renders them the same or renders one or the other as a replacement character. As for inputting them, if I Google "telephone emoji" I get to https://emojipedia.org/black-telephone/ and if I copy that, I get the emoji variant, which is different from what you get by doing \:phone:<tab>. Worse, if I accidentally backspace on one of the emoji variant identifiers in Sublime, I get the non-emoji variant (which may or may not be rendered the same).

@StefanKarpinski
Sponsor Member

The most conservative option would be to reject modified emoji altogether.
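
A minimal sketch of what such a rejection could look like, in Python for illustration only. The function name `check_identifier` and the choice to cover only VS15 (U+FE0E) and VS16 (U+FE0F) are assumptions made for this example, not a description of how Julia's parser works.

```python
# Assumption: only the two emoji-related selectors are rejected.
VARIATION_SELECTORS = {"\ufe0e", "\ufe0f"}  # VS15 and VS16


def check_identifier(name: str) -> None:
    """Hypothetical check: raise ValueError if the name carries VS15 or VS16."""
    for index, ch in enumerate(name):
        if ch in VARIATION_SELECTORS:
            raise ValueError(
                f"variation selector U+{ord(ch):04X} at position {index} "
                "is not allowed in identifiers"
            )


# The bare symbol passes; the emoji-presentation spelling is rejected.
check_identifier("\u260e")
try:
    check_identifier("\u260e\ufe0f")
    rejected = False
except ValueError:
    rejected = True
```

Under this rule only the bare code point would be a valid spelling, so the emoji-presentation form that sites like Emojipedia hand out would fail to parse instead of silently creating a second identifier.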

@Keno
Member Author

Keno commented Apr 3, 2019

True, but for a number of emoji (☎️ being an example), the rendering that people identify with the emoji is the one that has the variant selector.

@stevengj
Member

stevengj commented Apr 3, 2019

These are hardly the confusable characters that I would worry about most (as opposed to, say, Α vs. A), given that the emoji tab completion was added as an April Fool's joke (#10709) and emoji variables are mostly a party trick in Julia rather than a practical programming style. Let the Unicode Consortium Emoji Emporium worry about this.

xkcd comic

@StefanKarpinski
Sponsor Member

> Emoji Emporium

😂 too true
