This is a dataset of learner sentences annotated with ordinal labels for grammaticality. More detail about the creation of the dataset can be found in Heilman et al. (2014).
Annotation File Format
The gug_annotations.tsv file contains the annotated dataset. There are 6 columns:
- Id: the id for the annotated sentence
- Sentence: the sentence that was annotated
- Expert Judgement: the annotation assigned by the expert judge
- Crowd Flower Judgements: the 5 annotations assigned through crowd-sourcing
- Average: the average of all 6 judgements
- Dataset: whether the sentence and its annotation were part of the train, dev or test set. The sentences annotated as "Other" by our expert were not submitted for crowd-sourcing.
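As a rough illustration of the column layout above, the following Python sketch reads tab-separated rows and groups them by the Dataset column. It rests on several assumptions that this README does not state: that the file begins with a header row using exactly the column names listed above, that the Dataset values are spelled "train", "dev" and "test", and that the five crowd-sourced judgements are stored in a single comma-separated field. The helper names (read_annotations, group_by_dataset) and the sample rows are made up for illustration and are not taken from the real file.

```python
import csv
import io
from collections import defaultdict

# Assumption: these are the header names, matching the column list above.
COLUMNS = ["Id", "Sentence", "Expert Judgement",
           "Crowd Flower Judgements", "Average", "Dataset"]


def read_annotations(handle, has_header=True):
    """Return a list of dicts, one per annotated sentence."""
    reader = csv.DictReader(
        handle,
        # If the file has no header row, fall back to the names in COLUMNS.
        fieldnames=None if has_header else COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in sentences literally
    )
    return list(reader)


def group_by_dataset(rows):
    """Group rows by the value of their Dataset column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Dataset"]].append(row)
    return dict(groups)


# Made-up rows for illustration only; the real file's values and the exact
# formatting of the crowd-sourced judgements may differ.
SAMPLE = (
    "\t".join(COLUMNS) + "\n"
    'x1\tHe said "hello" to me .\t4\t4,4,3,4,4\t3.83\ttrain\n'
    "x2\tShe go to school yesterday .\t3\t3,3,2,3,3\t2.83\tdev\n"
    "x3\tThe book on table are .\t2\t2,2,2,3,2\t2.17\ttest\n"
)

rows = read_annotations(io.StringIO(SAMPLE))
for name, members in group_by_dataset(rows).items():
    print(name, len(members))

# With the real file (the encoding is an assumption):
# with open("gug_annotations.tsv", newline="", encoding="utf-8") as handle:
#     rows = read_annotations(handle)
```

Quoting is disabled because learner sentences may contain quotation marks that should be read as ordinary characters rather than as field delimiters.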
The gug_instructions_with_sample_survey.pdf file contains the full set of instructions given to annotators, as well as an example annotation task. The annotation scheme contains 5 categories:
- 4. Perfect
- 3. Comprehensible
- 2. Somewhat Comprehensible
- 1. Incomprehensible
- O. Other/Incomplete
The following paper should be cited in any publications that use this dataset:
Michael Heilman, Aoife Cahill, Nitin Madnani, Melissa Lopez, Matthew Mulholland and Joel Tetreault (2014). Predicting Grammaticality on an Ordinal Scale. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, MD.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.