Semeval Twitter data download script
Downloads tweets that are distributed as IDs only, to protect privacy. It uses the format of the Semeval Twitter sentiment analysis dataset.
easy_install twitter
easy_install tqdm
The first time you run the script, it should open a web browser, ask you to log in to Twitter, and show a PIN for you to enter at a prompt generated by the script.
- Log in to Twitter with your user name in your default browser.
- Run the script like this to download your credentials:
python download_tweets_api.py --dist=tweeti-a.dist.tsv
- Download tweets like so:
python download_tweets_api.py --dist=tweeti-a.dist.tsv --output=downloaded.tsv
- Note that it takes about 18 hours to download the Semeval sentiment analysis training dataset.
Restarting after a partial download:
If the script hangs partway through the download, use the --partial argument to specify the file containing the partially downloaded results.
This way you won't have to start from scratch again:
python download_tweets_api.py --dist=tweeti-a.dist.tsv --partial=downloaded.tsv --output=downloaded2.tsv
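The idea behind resuming can be sketched roughly as follows. This is only an illustration, not the actual code in download_tweets_api.py: the function names load_downloaded_ids and remaining_rows are hypothetical, and the sketch assumes that both files are tab-separated with the tweet ID in the first column.

```python
import csv


def load_downloaded_ids(partial_path, id_column=0):
    """Collect the tweet IDs already present in a partial output file.

    Assumption: the file is tab-separated and the tweet ID is in
    column `id_column` (first column by default).
    """
    done = set()
    with open(partial_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if row:
                done.add(row[id_column])
    return done


def remaining_rows(dist_path, done_ids, id_column=0):
    """Yield rows of the distribution file that still need downloading."""
    with open(dist_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if row and row[id_column] not in done_ids:
                yield row
```

With helpers like these, a download loop would only request the tweets returned by remaining_rows and copy the rows from the partial file into the new output unchanged.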
Task A Mention Test Script
To print out the mentions and annotations from task A you can use the
testIndices.py script like so:
python testIndices.py downloaded.tsv
This just prints out the mentions with sentiment annotations for easier inspection.
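As a rough illustration of what printing a mention involves, the sketch below pulls the annotated span out of one downloaded line. It is not the code of testIndices.py. It assumes each line holds six tab-separated fields (tweet ID, user ID, start index, end index, sentiment, tweet text) and that the indices are inclusive positions of whitespace-separated tokens; the function names are hypothetical.

```python
def extract_mention(text, start, end):
    """Return tokens start..end (inclusive) of a whitespace-tokenized tweet."""
    tokens = text.split()
    return " ".join(tokens[start:end + 1])


def parse_line(line):
    """Split one downloaded line and return (mention, sentiment).

    Assumed column order: tweet ID, user ID, start, end, sentiment, text.
    """
    fields = line.rstrip("\n").split("\t")
    _tweet_id, _user_id, start, end, sentiment, text = fields[:6]
    return extract_mention(text, int(start), int(end)), sentiment


if __name__ == "__main__":
    sample = "1\tu1\t1\t2\tpositive\tI love this phone\n"
    mention, sentiment = parse_line(sample)
    print(mention, sentiment, sep="\t")
```

If the column order in your file differs, only the unpacking line in parse_line needs to change.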
- You may need to manually change the link that is printed out for authorization to use https:// instead of http://
- The time on your computer needs to be set accurately. Thanks to Canberk for noting this on the email list.