No description, website, or topics provided.
Python
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
brat_format
README.rst
sentences_with_tags.py
split_train_test.py
test.txt
train.txt

README.rst

Movie reviews, annotated for emotion classification

This repository holds annotated training and test data used in the experiments reported on in

L. Buitinck, J. van Amerongen, E. Tan and M. de Rijke (2015). Multi-emotion detection in user-generated reviews. Proc. 37th European Conference on Information Retrieval (ECIR).

The data consists of amateur movie reviews, scraped from IMDB and annotated for the project Searching Public Discourse of the University of Amsterdam (UvA) and the Netherlands eScience Center (NLeSC). We (the authors, UvA and NLeSC) do not hold the copyrights to the original texts. You may use this data for academic/research purposes.

Format

The revies are in the directory brat_format/. The original text is in the *.txt files. Annotations are in the corresponding .ann files, in the offset format produced by Brat.

Preprocessed versions of the data, split for training and testing a classifier, can be found in the files train.txt and test.txt. These contain one sentence per line, with labels at the end of each line. A single space separates the labels from the text. Multiple labels are separated by underscores. Where a sentence received no label, the string None appears. (No label means no emotions assigned by the annotator; all sentences have been annotated.)

The files were converted to the final format by:

python $BRAT/tools/sentencesplit.py < ${review}.txt > ${review}.sentences
python sentences_with_tags.py ${review}.sentences > ${review}.senttag

(where $BRAT is the Brat source directory).

Training/test set splitting is done by split_train_test.py.

Citing

If you use this data for your own experiments, please cite us:

@inproceedings{buitinck2015multi,
  author = {Lars Buitinck and Jesse van Amerongen and Ed Tan
            and Maarten de Rijke},
  title = {Multi-emotion detection in user-generated reviews},
  booktitle = {Proc. European Conference on Information Retrieval (ECIR)},
  year = 2015
}