Hexcode compatible with OpenMoji #190

Closed
lucianmarin opened this issue Oct 9, 2021 · 3 comments

@lucianmarin

lucianmarin commented Oct 9, 2021

def hexcode(emoji):
  codes = [hex(ord(e))[2:].upper() for e in emoji]
  return "-".join(codes)

Can we add hexcode to emoji.EMOJI_DATA?

The emoji in OpenMoji (https://github.com/hfg-gmuend/openmoji) are identified by their hexcode.

I use the emoji package in production for Subreply. I intend to add OpenMoji as soon as they are production ready.

@cvzi
Contributor

cvzi commented Oct 13, 2021

Does that mean you would like to replace a :emoji: with something like <img src="openmoji/HEXCODE.png">?

I wonder whether we should hardcode the hex codes in the EMOJI_DATA dict. They are very easy to generate at runtime with your function, so generating them at runtime may make more sense.
I see they have a JSON file with all their emoji at https://github.com/hfg-gmuend/openmoji/blob/master/data/openmoji.json
The question is how similar our emoji are to the OpenMoji data. Will the hexcode() function work for every emoji, or are there some emoji that need adjustment or need to be matched by hand? Emoji that contain invisible characters, or modifiers such as skin color, could be a particular problem.
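To make the concern concrete, here is a minimal sketch of what the proposed function returns for sequences that contain a modifier or an invisible character. The three sample emoji are chosen only for illustration; whether OpenMoji uses exactly these strings as its hexcodes is what a test against openmoji.json would have to confirm.

```python
def hexcode(emoji_str):
    # Same approach as the function proposed above:
    # one uppercase hex value per code point, joined with "-".
    return "-".join(format(ord(ch), "X") for ch in emoji_str)

# Thumbs up with medium skin tone: base emoji + skin tone modifier.
thumbs_up = "\U0001F44D\U0001F3FD"
# Woman technologist: woman + zero width joiner (U+200D, invisible) + laptop.
technologist = "\U0001F469\u200D\U0001F4BB"
# Red heart: U+2764 followed by the invisible variation selector U+FE0F.
red_heart = "\u2764\uFE0F"

for sample in (thumbs_up, technologist, red_heart):
    print(hexcode(sample))
# 1F44D-1F3FD
# 1F469-200D-1F4BB
# 2764-FE0F
```

Every code point, visible or not, ends up in the result, so a data set that leaves out U+FE0F for some emoji would not match without extra handling.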

@lucianmarin
Author

That's what I mean. emoji.hexcode(string) is a better implementation indeed. A test can be run on openmoji.json.

@cvzi
Contributor

cvzi commented Oct 15, 2021

I created a script to test it:
https://replit.com/@cuzi/emoji-to-Openmoji#main.py

main.py
print("############## main.py ###############")
import emoji
import requests
import html

def hexcode(emoji):
    #  rjust(4, '0') is necessary to convert "2A" to "002A"
    codes = [hex(ord(e))[2:].upper().rjust(4, '0') for e in emoji]
    return "-".join(codes)

# Try to match all emoji from EMOJI_DATA to Openmoji:
openmoji = requests.get("https://github.com/hfg-gmuend/openmoji/raw/master/data/openmoji.json").json()
hexToOpenmoji = {value["hexcode"]: value for value in openmoji}
emojiToOpenmoji = {}
print("Following emoji couldn't be found in Openmoji:")
for emj in emoji.EMOJI_DATA:
    found = False
    if hexcode(emj) in hexToOpenmoji:
        emojiToOpenmoji[emj] = hexToOpenmoji[hexcode(emj)]
        found = True
    elif emj[-1] == '\ufe0f':
        # Remove the emoji variant selector U+FE0F and try again
        emj_no_variant = emj[0:-1]
        if hexcode(emj_no_variant) in hexToOpenmoji:
            emojiToOpenmoji[emj] = hexToOpenmoji[hexcode(emj_no_variant)]
            found = True
    else:
        # Append the emoji variant selector U+FE0F and try again
        emj_emoji_variant = emj + '\ufe0f'
        if hexcode(emj_emoji_variant) in hexToOpenmoji:
            emojiToOpenmoji[emj] = hexToOpenmoji[hexcode(emj_emoji_variant)]
            found = True

    if not found and emoji.EMOJI_DATA[emj]['status'] == emoji.STATUS['fully_qualified']:
        print(f"E{emoji.EMOJI_DATA[emj]['E']} {emoji.EMOJI_DATA[emj]['en']} {hexcode(emj)} {emj}")

print("###########################")

def replace_fct(emj, emj_data):
    if emj in emojiToOpenmoji:
        alt = html.escape(emj)
        title = html.escape(emj_data['en'])
        src = html.escape(emojiToOpenmoji[emj]["hexcode"]) + ".svg"
        return f'<img src="{src}" alt="{alt}" title="{title}">'
    else:
        return "Unsupported emoji"

print(emoji.emojize("a lion in html: :lion:", version=-1, handle_version=replace_fct))

For some emoji it is necessary to remove or add the variant selector U+FE0F to find the emoji.
With that modification, the script can match all emoji that are fully-qualified by Unicode, except for the newest ones.
The script lists every emoji it cannot match; they are all part of Unicode 14.0 (E14), which OpenMoji doesn't include yet (hfg-gmuend/openmoji#344).
So generating the hexcode at runtime is definitely an option instead of hard-coding it.
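
As a rough sketch of what such a runtime lookup could look like without the network request, the fallback from the script can be folded into one helper. The name find_hexcode and the small known set are hypothetical stand-ins for illustration; in practice the set would be built from the hexcode values in openmoji.json.

```python
VARIATION_SELECTOR = "\ufe0f"

def hexcode(emoji_str):
    # Zero-pad to at least four digits so "*" becomes "002A", as in the script above.
    return "-".join(format(ord(ch), "04X") for ch in emoji_str)

def find_hexcode(emoji_str, known_hexcodes):
    """Return the matching hexcode from known_hexcodes, or None if there is none."""
    # Try the emoji as-is first, then with U+FE0F removed or appended.
    candidates = [emoji_str]
    if emoji_str.endswith(VARIATION_SELECTOR):
        candidates.append(emoji_str[:-1])
    else:
        candidates.append(emoji_str + VARIATION_SELECTOR)
    for candidate in candidates:
        code = hexcode(candidate)
        if code in known_hexcodes:
            return code
    return None

# Hypothetical stand-in for the hexcodes loaded from openmoji.json.
known = {"1F981", "2764"}

print(find_hexcode("\U0001F981", known))    # lion, direct match
print(find_hexcode("\u2764\ufe0f", known))  # red heart, matches after dropping U+FE0F
print(find_hexcode("\U0001FAE0", known))    # not in the stand-in set
# 1F981
# 2764
# None
```

Returning None for unmatched emoji leaves it to the caller to decide on a fallback, much like the "Unsupported emoji" branch in replace_fct.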
