A set of scripts to capture geocoded tweets and order them by the emojis they use.
Start by running
harvest.py. It listens to the Twitter
firehose 1% stream for geocoded tweets and dumps them into the
.twitter_cache file. Chances are that by the time you
read this, Twitter has stopped supporting this API. If it still works,
you need to run this script for quite a while. A week would
be a good start.
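As a rough illustration of the harvesting step, the sketch below filters geocoded tweets out of a stream of JSON lines and appends them to a cache. It is not the code in harvest.py: the function names `is_geocoded` and `dump_geocoded` are hypothetical, the one-JSON-object-per-line cache layout is an assumption, and the live stream is replaced by a small in-memory sample. The only thing taken from Twitter itself is the `coordinates` field of the classic tweet object, which is either null or a GeoJSON point with `[longitude, latitude]`.

```python
import io
import json


def is_geocoded(tweet):
    """Return True if the tweet carries an exact GeoJSON point."""
    coords = tweet.get("coordinates")
    return isinstance(coords, dict) and coords.get("type") == "Point"


def dump_geocoded(lines, cache):
    """Append each geocoded tweet in `lines` to `cache`, one JSON object per line.

    Returns the number of tweets written.
    """
    written = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # streams may send blank keep-alive lines
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed payloads
        if isinstance(tweet, dict) and is_geocoded(tweet):
            cache.write(json.dumps(tweet) + "\n")
            written += 1
    return written


# Stand-in for the live stream: two tweets, a keep-alive and a garbage line.
sample_stream = [
    '{"id": 1, "text": "hello", "coordinates": null}',
    "",
    '{"id": 2, "text": "hi", "coordinates": {"type": "Point", "coordinates": [4.9, 52.37]}}',
    "not json",
]

# Assumed cache layout; swap in open(".twitter_cache", "a") to write to disk.
cache = io.StringIO()
count = dump_geocoded(sample_stream, cache)
```

Writing one object per line keeps the cache appendable, so a run that is interrupted after a few days can simply be restarted without rewriting what was already collected.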
Next up, run
summarize.py to produce the
file, which has one tweet per line. This makes for slightly
split_by_emoji.py to extract a huge JSON document,
split_emojis.json, which holds for each emoji a list of
coordinates and timestamps. The coordinates are on a 3600x1800
grid while the timestamps are in