A set of scripts to capture geocoded tweets and order them by the emojis they use.
Start by running harvest.py
- it will listen to the twitter
firehose 1% stream for geocoded tweets and dump them into
a .twitter_cache
file. Chances are that by the time you
read this, twitter has stopped supporting this API and if not
you need to run this script for quite a while. A week would
be a good start.
Next up, run summarize.py
to get produce the tweets.txt
file, which has one tweet per line. This makes for slightly
faster processing.
Then run split_by_emoji.py
to extract a huge json document
called split_emojis.json
which has for each emoji a list of
coordinates and timestamps - the coordinates are on a 3600x1800
grid while the timestamps are in