Phase 1 Development Corpus
This repository contains the development corpus for SANTA, the shared task for systematic analysis of narrative texts through annotation. Details about this project can be found here.
The corpus has been compiled to cover as many relevant phenomena as possible. It is heterogeneous with respect to genre, publication date and text length. Still, representativeness (whatever that means for literature) was not a guiding principle. All texts are available in English and German. Some texts are translations from a third language.
The maximum length of the texts in this corpus is 2000 words. Since this limitation entails a bias with respect to the use of narrative levels, we have also included longer texts, which we make available in shortened versions. For these, we removed passages that do not substantially affect the overall narrative level structure.
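To check a text against the 2000-word limit, something like the following minimal sketch could be used. It assumes that a "word" is a whitespace-delimited token, which is not necessarily the counting method used when the corpus was compiled; the names MAX_WORDS, word_count and within_limit are hypothetical and not part of this repository.

```python
MAX_WORDS = 2000  # length limit stated in this README


def word_count(text: str) -> int:
    """Count whitespace-delimited tokens (an assumed definition of 'word')."""
    return len(text.split())


def within_limit(text: str, limit: int = MAX_WORDS) -> bool:
    """Return True if the text has at most `limit` words."""
    return word_count(text) <= limit
```

Counts from a different tokenizer (for example, one that splits off punctuation) may differ slightly, so texts close to the limit should be checked with care.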