README.md

Authors:

adrian dot benton at gmail dot com

mpaul39 at gmail dot com

hancock dot braden at gmail dot com

mark at dredze dot com

This distribution contains tweet IDs for each dataset reported in:

Adrian Benton, Michael J. Paul, Braden Hancock, Mark Dredze.
Collective Supervision of Topic Models for Predicting Surveys with Social Media.
Thirtieth AAAI Conference on Artificial Intelligence, 2016.

as well as predictions of support for universal background checks, as displayed in Figure 4. The files map to the datasets described in the paper as follows:

  • input.guncontrol.allfeatures.onlyids.txt.gz: Guns
  • input.tobacco.allfeatures.onlyids.txt.gz: Smoking
  • input.vaccine.allfeatures.onlyids.txt.gz: Vaccines

Each file is tab-separated with the following columns:

  • tweet ID
  • hashtag-based PRO/ANTI-issue score (not used in the paper)
  • state-level survey score
  • county-level census score

All of these scores are z-score normalized. Refer to the paper for semantics of each form of supervision.
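
As a rough illustration of the layout described above, the following Python sketch reads one of the gzipped files into typed rows. It is not part of the original distribution. The names `TweetRow`, `parse_rows`, and `read_ids_file` are hypothetical, and the sketch assumes that the files have no header row, that every line has exactly four tab-separated fields, and that every score parses as a float. Check these assumptions against the actual files before relying on it.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple


class TweetRow(NamedTuple):
    """One line of an *.onlyids.txt.gz file (hypothetical container)."""
    tweet_id: str         # kept as a string so large IDs are never rounded
    hashtag_score: float  # hashtag-based PRO/ANTI-issue score
    survey_score: float   # state-level survey score
    census_score: float   # county-level census score


def parse_rows(lines: Iterable[str]) -> Iterator[TweetRow]:
    """Parse tab-separated lines into TweetRow records.

    Assumes no header row and exactly four columns per line.
    Blank lines are skipped; anything else malformed raises ValueError.
    """
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(
                f"line {line_number}: expected 4 fields, got {len(fields)}"
            )
        tweet_id, hashtag, survey, census = fields
        yield TweetRow(tweet_id, float(hashtag), float(survey), float(census))


def read_ids_file(path: str) -> Iterator[TweetRow]:
    """Stream rows from a gzipped, tab-separated ID file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from parse_rows(handle)
```

For example, `read_ids_file("input.guncontrol.allfeatures.onlyids.txt.gz")` would yield one row per tweet in the Guns dataset. Because the scores are already z-score normalized, the sketch does not rescale them.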

Predictions of proportion supporting universal background checks per state are in ubc_regression_predictions.txt. These are the values used to generate Figure 4.

If you would like access to the text associated with each of these tweet IDs, please email adrian dot benton at gmail dot com

Due to the Twitter terms of service, we can make the text of only 50K tweets available per day (and cannot let you clone the entire repository). If you use these data, please cite:

Adrian Benton, Michael J. Paul, Braden Hancock, Mark Dredze.
Collective Supervision of Topic Models for Predicting Surveys with Social Media.
Thirtieth AAAI Conference on Artificial Intelligence, 2016.

