New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Taking a crack at issue #14 #17
Conversation
""" | ||
|
||
# via Martijn Pieters, http://stackoverflow.com/questions/26568722/remove-unicode-emoji-using-re-in-python | ||
if PY2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BrendanAdkins The u"string"
syntax is supported by both Python 2 and 3 so you can drop the check here and the else
below.
@BrendanAdkins 👍 Looks good so far! I like the @carpedm20 Any comments? How do you feel about the |
@geowurster Good call on the unit tests! You'll be glad to know they fail spectacularly on adjacent emoji. This is going to be a hex math thing, isn't it? I too could use some help with the regex here. I have to say I don't grasp the rationale for adding aliases to the demojize function--I can't see a use case for it, and it seems like it would be violating least-astonishment just for the sake of symmetry. |
Looks like the regex doesn't include some character ranges that are treated as emoji but not technically defined as emoji--working on tracking those down. |
@BrendanAdkins Good point about supporting aliases. The user gets an arbitrary emoji back so there really isn't a need to support the aliases. Unfortunately I can't offer any help with the adjacent emoji issue, but I'll ask some of my co-workers. |
These flag emoji are going to be the death of me. |
@BrendanAdkins Thanks for doing the initial R&D on this! |
I'm pretty new to Python, but this seems to work. Having trouble safely ensuring that a given string sent to demojize is interpreted as Unicode, so right now I've just added that as a requirement going in.