
Support emoji in tweets #43

Open: wants to merge 2 commits into base: master


@ndw

commented Aug 31, 2019

Emoji are common in tweets and the current version of GetOldTweets3 discards them. This patch (which I confess is a tiny bit crude) finds the emoji images in each tweet and replaces them with the corresponding Unicode character.
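
For readers who want the idea in code, here is a minimal sketch of that replacement step. It is not the code from the patch. It assumes the tweet HTML marks each emoji as an `<img>` tag whose `class` contains `Emoji` and whose `alt` attribute holds the Unicode character; the function name `replace_emoji_images` is hypothetical.

```python
import html
import re

# Assumption: emoji appear as <img class="Emoji ..." alt="<character>" ...>.
EMOJI_IMG = re.compile(r'<img\b[^>]*\bclass="[^"]*\bEmoji\b[^"]*"[^>]*>')
ALT_ATTR = re.compile(r'\balt="([^"]*)"')


def replace_emoji_images(tweet_html: str) -> str:
    """Replace each emoji <img> tag with the text of its alt attribute."""

    def to_unicode(match: "re.Match[str]") -> str:
        alt = ALT_ATTR.search(match.group(0))
        # Drop the image entirely if it carries no alt text.
        return html.unescape(alt.group(1)) if alt else ""

    return EMOJI_IMG.sub(to_unicode, tweet_html)
```

Called on a tweet body, this leaves ordinary text alone and swaps only the tags it recognizes as emoji images. A regular expression is a crude tool for HTML; a real parser would be more robust if the markup varies.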

@ndw

Author

commented Aug 31, 2019

Hang on. This PR doesn't deal with multi-character emoji (yet).

@ndw

Author

commented Sep 8, 2019

OK, emoji turn out to be tricky. They can be encoded as long Unicode ligatures (several code points that render as a single glyph). That's fine for some applications, but perhaps not all. In the end, I added an --emoji option to the command line, with three values:

  • ignore (the default) discards them, as the current version does
  • unicode replaces them with their Unicode characters
  • named replaces them with "Emoji[Name of emoji]"
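
The three modes can be sketched roughly as follows. This is an illustration, not the code in the PR: the function `render_emoji` and its parameters are hypothetical, and it assumes each emoji is available as a hyphen-separated string of hex code points (for example `1f468-200d-1f469`) plus a human-readable name.

```python
def render_emoji(code_points: str, name: str, mode: str = "ignore") -> str:
    """Return the text that stands in for one emoji under the given mode.

    code_points: hyphen-separated hex code points (an assumed input format).
    name: human-readable emoji name (an assumed input).
    mode: "ignore", "unicode", or "named", mirroring the --emoji values.
    """
    if mode == "ignore":
        # Default: drop the emoji, matching the pre-patch behaviour.
        return ""
    if mode == "unicode":
        # Join every code point so multi-character emoji stay intact.
        return "".join(chr(int(cp, 16)) for cp in code_points.split("-"))
    if mode == "named":
        return f"Emoji[{name}]"
    raise ValueError(f"unknown emoji mode: {mode!r}")
```

For example, `render_emoji("1f600", "Grinning face", "named")` returns `Emoji[Grinning face]`, while the `unicode` mode returns the single character U+1F600.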

Along the way, I also added code to replace links with their original text. Twitter runs all links and images through their own resolver; I don't think that's either useful or interesting in general.
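
As a rough sketch of the link handling (again, not the patch itself): the version below assumes each shortened link is an `<a>` tag carrying the original address in a `data-expanded-url` attribute, and it reads "original text" as meaning that address. The function name `restore_links` is hypothetical.

```python
import html
import re

# Assumption: resolver links look like
#   <a href="https://t.co/..." data-expanded-url="<original URL>" ...>text</a>
LINK = re.compile(
    r'<a\b[^>]*\bdata-expanded-url="([^"]*)"[^>]*>.*?</a>',
    re.DOTALL,
)


def restore_links(tweet_html: str) -> str:
    """Replace each resolver link with the original URL it points to."""
    return LINK.sub(lambda m: html.unescape(m.group(1)), tweet_html)
```

Anchors without the assumed attribute are left untouched, so hashtags and mentions would pass through unchanged.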

Hope this is of interest.
