Turkish Language #231
I will check it out. For cases like 🏴☠️, where the same emoji exists with different code points, we can probably just use the translation of the other emoji.
Edit: Regarding the emoji that only differ in the suffix
In general, the Turkish Unicode translation data may simply be incomplete (I have not checked). I noticed that the Unicode translations lag behind the emoji releases for many languages. Currently, untranslated emoji are skipped:
Lines 200 to 201 in b27cf78
But that could be changed. Alternatively, replace_emoji() could be used instead of demojize(); you could then fall back to the English name (or to a custom translation) if there is no translation in EMOJI_DATA.
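As a rough sketch of the "use the translation of the other emoji" idea: the function below looks for an entry that differs only by U+FE0F (variation selector-16) and reuses its translation, falling back to the English name otherwise. The name `translated_name` and the `SAMPLE_DATA` dictionary are hypothetical and only mimic the shape of `emoji.EMOJI_DATA`; the Turkish string is an illustrative placeholder, not a value taken from the library.

```python
VS16 = "\ufe0f"  # VARIATION SELECTOR-16, requests emoji presentation


def translated_name(emj, emj_data, all_data, lang="tr"):
    """Return the `lang` name of `emj`, borrowing it from a U+FE0F variant if needed."""
    if lang in emj_data:
        return emj_data[lang]
    # Same emoji, different code points: compare the sequences without U+FE0F
    bare = emj.replace(VS16, "")
    for other, other_data in all_data.items():
        if other != emj and other.replace(VS16, "") == bare and lang in other_data:
            return other_data[lang]
    # Last resort: the English name
    return emj_data["en"]


# Hypothetical stand-in for emoji.EMOJI_DATA (placeholder values)
SAMPLE_DATA = {
    "\U0001F3F4\u200D\u2620\uFE0F": {"en": ":pirate_flag:"},
    "\U0001F3F4\u200D\u2620": {"en": ":pirate_flag:", "tr": ":korsan_bayrağı:"},
    "\U0001F9D7": {"en": ":person_climbing:"},
}

for emj, data in SAMPLE_DATA.items():
    print(repr(emj), "->", translated_name(emj, data, SAMPLE_DATA))
```

With the real library, this could presumably be plugged in as `emoji.replace_emoji(text, lambda e, d: translated_name(e, d, emoji.EMOJI_DATA))`. The linear scan runs for every untranslated emoji, so a lookup table built once up front would be preferable for large inputs.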
@AliNajafi1998 Could you upload the scraped data from Emojiterra (or the merged DB)? I would like to see what is still missing.
A lot of the emoji that are not translated are the ones with components, that is, skin color and hair color. For example:

import emoji

# Make a list of all the component names, like 'dark_skin_tone' or 'red_hair'
all_components = []
for emj in emoji.EMOJI_DATA:
    if emoji.EMOJI_DATA[emj]["status"] == emoji.STATUS["component"]:
        all_components.append(emoji.EMOJI_DATA[emj]["en"][1:-1])
# Longest first, so 'medium-light_skin_tone' is removed before 'light_skin_tone'
all_components = sorted(all_components, key=len, reverse=True)


def repl_fct(emj, emj_data):
    # Use the Turkish name if there is one
    if "tr" in emj_data:
        return emj_data["tr"]
    # Remove the components from the name
    # e.g. :person_medium-light_skin_tone_red_hair: -> :person:
    name = emj_data["en"][1:-1]
    for component in all_components:
        name = name.replace(component, "")
    name = f":{name.strip('_')}:"
    if name != emj_data["en"]:
        # Check if the name without components has a translation
        for emj in emoji.EMOJI_DATA:
            if emoji.EMOJI_DATA[emj]["en"] == name and "tr" in emoji.EMOJI_DATA[emj]:
                return emoji.EMOJI_DATA[emj]["tr"]
    # Return the English name as a last resort
    return emj_data["en"]


text = """
Dark skin climber :man_climbing_dark_skin_tone:
Ginger person :person_medium-light_skin_tone_red_hair:
"""
print(text)
text = emoji.emojize(text)
print(text)
decoded = emoji.replace_emoji(text, repl_fct)
print(decoded)
@cvzi
You can download the scraped data from here: emojitera.json
Good idea, but for
Hi @cvzi,
I am trying to demojize emoji for the Turkish language. Following your docs, I added Turkish as a language and I am using it; however, some emoji are missing.
For example,
u'\U0001F3F4\U0000200D\U00002620\U0000FE0F'
-> 🏴☠️
there is no Turkish equivalent text for that.
But there is text for
u'\U0001F3F4\U0000200D\U00002620'
-> 🏴☠
I am actually working with Twitter data, so I need to demojize the tweets.
I want to know what I can do about this problem. I should mention that I also scraped the emoji from the Emojiterra website to merge into the DB; that resolved some issues, but some emoji are still missing.
All I want is to demojize the Twitter-supported emoji.
Best!
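
For reference, a small standard-library sketch that lists the code points of the two sequences quoted above. It suggests that they differ only in the trailing U+FE0F (variation selector-16). The helper name `code_points` is made up for this illustration.

```python
import unicodedata

with_vs16 = "\U0001F3F4\U0000200D\U00002620\U0000FE0F"
without_vs16 = "\U0001F3F4\U0000200D\U00002620"


def code_points(s):
    """List each character as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]


for label, seq in (("with U+FE0F", with_vs16), ("without U+FE0F", without_vs16)):
    print(label, code_points(seq))

# Removing the variation selector makes the two sequences identical
print(with_vs16.replace("\ufe0f", "") == without_vs16)
```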