Download scripts for distributing twitter data.
LICENSE
README.md
download_tweets.py
download_tweets_api.py
testIndices.py

Semeval Twitter data download script

Downloads tweets that are distributed as IDs only, to protect user privacy. Uses the format of the SemEval Twitter sentiment analysis dataset.

Prerequisites:

sixohsix/twitter and tqdm/tqdm:

easy_install twitter
easy_install tqdm

Usage:

The first time you run the script, it should open a web browser, ask you to log in to Twitter, and show a PIN for you to enter at the prompt the script displays.

  1. Log in to Twitter with your user name in your default browser.
  2. Run the script like this to authorize it and obtain your credentials: python download_tweets_api.py --dist=tweeti-a.dist.tsv
  3. Download the tweets:
python download_tweets_api.py --dist=tweeti-a.dist.tsv --output=downloaded.tsv

Note: downloading the SemEval sentiment analysis training dataset takes about 18 hours.

Restarting after a partial download:

If the script hangs partway through a download, use the --partial argument to specify the file containing the partially downloaded results.
The script can then pick up from those results instead of starting from scratch:

python download_tweets_api.py --dist=tweeti-a.dist.tsv --partial=downloaded.tsv --output=downloaded2.tsv
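The idea behind resuming can be sketched as follows. This is only an illustration, not the code in download_tweets_api.py: it assumes that the first tab-separated column of both files is the tweet ID, and the helper names (tweet_id, load_done_ids, pending_lines) are hypothetical.

```python
def tweet_id(line):
    """Return the first tab-separated field (assumed to be the tweet ID)."""
    return line.split("\t", 1)[0].strip()


def load_done_ids(partial_lines):
    """Collect the IDs that already appear in the partial output."""
    return {tweet_id(line) for line in partial_lines if line.strip()}


def pending_lines(dist_lines, done_ids):
    """Yield the lines of the ID-only file that still need downloading."""
    for line in dist_lines:
        if line.strip() and tweet_id(line) not in done_ids:
            yield line
```

Because open file objects iterate over lines, the same helpers could be called with open("downloaded.tsv") and open("tweeti-a.dist.tsv") to list what remains to be fetched.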

Task A Mention Test Script

To print out the mentions and annotations from task A you can use the testIndices.py script like so:

python testIndices.py downloaded.tsv

This just prints out the mentions with sentiment annotations for easier inspection.
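For readers who want to see what such an inspection involves, here is a minimal sketch. It is not the code in testIndices.py. It assumes each downloaded line has six tab-separated columns (tweet ID, user ID, start token index, end token index, sentiment, tweet text), that the indices are zero-based and inclusive, and that tokens are separated by whitespace. The function name extract_mention is hypothetical.

```python
def extract_mention(line):
    """Return (sentiment, mention text) for one downloaded task A line.

    Assumed columns: tweet ID, user ID, start index, end index,
    sentiment, tweet text. Indices are assumed zero-based and inclusive.
    """
    fields = line.rstrip("\n").split("\t")
    start, end = int(fields[2]), int(fields[3])
    sentiment, text = fields[4], fields[5]
    tokens = text.split()
    return sentiment, " ".join(tokens[start:end + 1])


if __name__ == "__main__":
    # Hypothetical example line, not taken from the dataset.
    sample = "1\t10\t2\t3\tpositive\tI really love this phone\n"
    print(extract_mention(sample))
```

If the real files use a different column order or index convention, only the field positions and the slice need to change.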

Notes:

  • You may need to manually change the link that is printed out for authorization to use https:// instead of http://
  • The time on your computer needs to be set accurately. Thanks to Canberk for noting this on the email list.