The Genoese Ligurian Treebank is a small, manually annotated collection of contemporary Ligurian prose. The treebank focuses on written Genoese, the koiné variety of Ligurian associated with today's literary, journalistic and academic ligurophone sphere.
This dataset is the first dependency treebank of Ligurian. The materials span several genres and were drawn from a wide range of sources in order to reflect variation in syntax and register.
The largest source of material is the fiction domain, represented by excerpts from three novels by contemporary authors or translators. We also include a news article, a current affairs article, a passage from the Bible, two entries from the Ligurian Wikipedia, a number of example sentences from a grammar book, and the transcript of a short radio broadcast. All these documents make up the test split of the dataset. The training split consists of translations of the sentences from the Cairo CICLing Corpus.
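To check how much material each split contains, the sentences and syntactic words in a CoNLL-U file can be counted with a few lines of Python. The following is a minimal sketch that relies only on the general CoNLL-U layout (blank line between sentences, `#` comment lines, tab-separated token lines, range IDs such as `1-2` for multiword tokens). The function name `count_sentences_and_words` and the sample text are illustrative assumptions, not material from the treebank.

```python
from typing import Tuple


def count_sentences_and_words(conllu_text: str) -> Tuple[int, int]:
    """Count sentences and syntactic words in CoNLL-U formatted text."""
    sentences = 0
    words = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) are not tokens.
            continue
        token_id = line.split("\t", 1)[0]
        if not in_sentence:
            # The first token line after a blank line starts a new sentence.
            sentences += 1
            in_sentence = True
        # Plain integer IDs are syntactic words; range IDs ("1-2") and
        # empty-node IDs ("1.1") are skipped.
        if token_id.isdigit():
            words += 1
    return sentences, words


# Hypothetical sample with placeholder forms, for illustration only.
SAMPLE = (
    "# sent_id = demo-1\n"
    "1\tplaceholder\t_\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "2\tone\t_\tNUM\t_\t_\t1\tnummod\t_\t_\n"
    "\n"
    "# sent_id = demo-2\n"
    "1-2\tab\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "1\ta\t_\tADP\t_\t_\t3\tcase\t_\t_\n"
    "2\tb\t_\tDET\t_\t_\t3\tdet\t_\t_\n"
    "3\tc\t_\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

sentence_count, word_count = count_sentences_and_words(SAMPLE)
```

For a real split, the text would be read from the corresponding `.conllu` file with `open(path, encoding="utf-8").read()`; the exact file names are not stated here and would need to be checked in the release.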
We are deeply grateful to the publisher De Ferrari Editore and the editor of the newspaper O Stafî for allowing the computational use of some of their written materials for this treebank.
- (citation)
- 2021-05-15 v2.8
  - Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.9
License: C-UDA 1.0
Includes text: yes
Genre: nonfiction fiction news wiki bible spoken grammar-examples
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Lusito, Stefano; Maillard, Jean
Contributing: elsewhere
Contact: stefano.lusito@uibk.ac.at
===============================================================================
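
The metadata block consists of `Key: value` pairs enclosed between two `===` delimiter lines, so it can be read programmatically. The following is a minimal sketch, assuming one pair per line; the function name `parse_ud_metadata` is hypothetical and this is not the official Universal Dependencies tooling.

```python
from typing import Dict


def parse_ud_metadata(readme_text: str) -> Dict[str, str]:
    """Extract key/value pairs from a README's machine-readable metadata block."""
    metadata: Dict[str, str] = {}
    inside = False
    for line in readme_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("==="):
            if "Machine-readable metadata" in stripped:
                # Opening delimiter of the block.
                inside = True
            elif inside:
                # Closing delimiter: stop reading.
                break
            continue
        if inside and ":" in stripped:
            # Split on the first colon only, so values may contain colons.
            key, value = stripped.split(":", 1)
            metadata[key.strip()] = value.strip()
    return metadata


# Sample built from a few fields of this README, one pair per line.
SAMPLE_README = """Some introductory prose.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.9
License: C-UDA 1.0
Genre: nonfiction fiction news wiki bible spoken grammar-examples
Contributors: Lusito, Stefano; Maillard, Jean
===============================================================================
"""

meta = parse_ud_metadata(SAMPLE_README)
genres = meta["Genre"].split()
```

Multi-valued fields such as `Genre` are whitespace-separated, and `Contributors` uses semicolons between names, so each needs its own splitting step after parsing.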