
FR: use CLDR Character Annotation “Keywords” too when searching characters/emoji #11

Closed
blueset opened this issue Dec 18, 2019 · 2 comments

Comments

@blueset

blueset commented Dec 18, 2019

The Unicode CLDR Character Annotations provide a list of keywords for some characters (especially emoji), intended to improve searching for them.

The remaining phrases are keywords (labels), separated by “|”. The keywords plus the words in the short name are typically used for search and predictive typing.
— CLDR Character Annotations description

I'd like to suggest also including these keywords when searching for both Unicode characters and emoji.

A list of these annotations can be found here:
https://www.unicode.org/cldr/charts/36/annotations/romance.html

Computer-friendly character annotation data in XML for each language can be found here: https://github.com/unicode-org/cldr/tree/master/common/annotations

@arp242
Owner

arp242 commented Dec 18, 2019

Yeah, it's in the TODO: https://github.com/arp242/uni/blob/master/TODO#L5

It's not as easy as "just include the CLDR data", since a lot of it is kind of junky IMHO. Many of the basic smileys have keywords such as "mouth" or "eye", so the list needs some filtering. Maybe there's a better list of keywords somewhere; I don't know what GitHub uses for its :emoji:-style shortcodes (I'm pretty sure I saw a list for that somewhere at some point).

I probably won't work on this any time soon, but will happily review and merge PRs if anyone contributes.

arp242 mentioned this issue Jan 11, 2020
arp242 closed this as completed in f802368 Jan 2, 2021
@arp242
Owner

arp242 commented Jan 2, 2021

It now includes the CLDR data in the default output, with duplicate words omitted (i.e. there's no point in adding "face" if the emoji's name is already "grinning face"), but you can add `%(cldr_full)` if you want them anyway. The keywords are also searched by default now with `uni e smile`; use `uni e name:smile` to search only in the name.
