The Greek UD treebank is derived from the Greek Dependency Treebank (http://gdt.ilsp.gr), a resource developed and maintained by researchers at the Institute for Language and Speech Processing/Athena R.C. (http://www.ilsp.gr).
The Greek UD treebank consists of 2,521 sentences (61,673 tokens). The data in the current release derive from primary texts that are in the public domain, including Wikinews articles and European Parliament sessions. The treebank is licensed under the terms of Creative Commons Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0).
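Sentence and token counts like those quoted above can be checked against a CoNLL-U file with a short script. The following is a minimal sketch, not the procedure used to produce the official figures. It assumes that "tokens" means syntactic words (lines with a plain integer ID), so multiword token ranges such as `1-2` and empty nodes such as `1.1` are skipped. The function name `count_conllu` and the sample data are hypothetical and are not taken from the treebank.

```python
def count_conllu(text):
    """Return (sentence_count, word_count) for CoNLL-U formatted text.

    Assumption: a "word" is any line whose ID column is a plain integer.
    Multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1")
    are not counted.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) carry no tokens.
            continue
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            words += 1
            if not in_sentence:
                # First word after a blank line starts a new sentence.
                sentences += 1
                in_sentence = True
    return sentences, words


# Toy data for illustration only: nine placeholder columns follow each ID.
_BLANK = "\t".join(["_"] * 9)
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    "1-2\t" + _BLANK,   # multiword token range, skipped
    "1\t" + _BLANK,
    "2\t" + _BLANK,
    "3\t" + _BLANK,
    "",
    "# sent_id = toy-2",
    "1\t" + _BLANK,
    "2\t" + _BLANK,
    "",
])

sentence_count, word_count = count_conllu(SAMPLE)
```

To apply it to a real file, read the file as UTF-8 and pass the contents to the function, for example `count_conllu(open(path, encoding="utf-8").read())`, where `path` points to one of the treebank's `.conllu` files. If the result differs from the published figures, the counting convention for multiword tokens is the first thing to check.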
The morphological and syntactic annotation of the Greek UD treebank was originally created through a semi-automatic conversion of the PDT-style annotations in the GDT data. For the 2.1 release, the syntactic annotation was produced by manually correcting several constructions in the UD annotation; UD is now the only scheme used for manual syntactic annotation of new data added to the resource. Harmonization with UD v2 is a work in progress.
We wish to thank all contributors to the original annotation efforts. A large part of those annotations was the work of students of the postgraduate programme Technoglossia IV, organised by the Institute for Language and Speech Processing, the University of Athens and the National Technical University of Athens.
Prokopis Prokopidis and Haris Papageorgiou. Universal Dependencies for Greek. In Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017), pages 102-106, Gothenburg, Sweden, May 2017.
Prokopis Prokopidis, Elina Desypri, Maria Koutsombogera, Haris Papageorgiou, and Stelios Piperidis. Theoretical and Practical Issues in the Construction of a Greek Dependency Treebank. In Montserrat Civit, Sandra Kubler, and Ma. Antonia Marti, editors, Proceedings of The Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005), pages 149-160, Barcelona, Spain, December 2005. Universitat de Barcelona.
- Train/dev/test sets: 1662/403/456 sentences, 41212/10139/10422 tokens
- Repository renamed from UD_Greek to UD_Greek-GDT.
- Fixed issues concerning specific constructions (most of them related to `flat` in multi-token names and to `parataxis` in reported speech)
- Fixed issues concerning specific constructions (most of them related to deprels of pronouns and to oblique dependents in constructions with copulas)
- Semi-automatic conversion to UD v2.0
- New data split into train/dev/test sets
- Fixed issues in conversion of POS for articles
- Added PronType and NumType attribute/value pairs to certain types of determiners and numerals
- Improved conversion of abbreviations
- Fixed issues in conversion of determiner POS and adjective degree
- Initial release of automatic conversion to UD