Summary

UD Welsh-CCG (Corpws Cystrawennol y Gymraeg) is a treebank of Welsh, annotated according to the Universal Dependencies guidelines.
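UD treebanks are distributed in the tab-separated, ten-column CoNLL-U format, so the annotation can be inspected with a few lines of code. The following is a minimal sketch, not part of the treebank tooling: the function name `count_feature` and the two-token sample are hypothetical and hand-written for illustration (the sample is not taken from the treebank). It counts the values of one morphological feature, here the `Mutation` feature mentioned in the changelog.

```python
from collections import Counter


def count_feature(conllu_text, feature="Mutation"):
    """Count the values of one FEATS attribute in CoNLL-U text.

    Hypothetical helper for illustration; not part of the treebank.
    """
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        token_id, feats = cols[0], cols[5]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token_id or "." in token_id:
            continue
        if feats == "_":
            continue
        for pair in feats.split("|"):
            name, _, value = pair.partition("=")
            if name == feature:
                counts[value] += 1
    return counts


# Hand-written illustrative fragment (an assumption, not treebank data).
SAMPLE = "\n".join([
    "# text = i Gymru",
    "\t".join(["1", "i", "i", "ADP", "_", "_", "2", "case", "_", "_"]),
    "\t".join(["2", "Gymru", "Cymru", "PROPN", "_", "Mutation=SM", "0", "root", "_", "_"]),
    "",
])

print(count_feature(SAMPLE))
```

To run this on real data, read one of the treebank's `.conllu` files into a string and pass it in place of `SAMPLE`.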

Introduction

Most of the annotated sentences come from the Welsh Wikipedia. Some sentences have been taken from the Corpus of the Welsh Assembly, from websites of Welsh-speaking organisations (Cymdeithas yr Iaith Gymraeg, University of Wales), from news sources (y Golwg, local Welsh-language newspapers, BBC Cymru) and from Welsh-language blogs. A few example sentences are taken from Welsh grammars (Gramadeg Cymraeg Cyfoes; Gareth King, Modern Welsh).

Acknowledgements

If you use this treebank in your work, please cite:

@inproceedings{heinecke2019,
  author = {Heinecke, Johannes and Tyers, Francis M.},
  title = {{Development of a Universal Dependencies treebank for Welsh}},
  year = {2019},
  booktitle = {{Proceedings of the Celtic Language Technology Workshop}},
  publisher = {European Association for Machine Translation},
  address = {Dublin},
  pages = {21--31},
  url = {https://www.aclweb.org/anthology/W19-6904},
}

Changelog

2019-05-15 v2.4

  • initial version

2019-05-30

  • mutations corrected (or feature Mutation=)

2019-09-15

  • lemma for conjunction "ac" normalised to "a", number + "o" + noun corrected

=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.4
License: CC BY-SA 4.0
Includes text: yes
Genre: grammar-examples wiki nonfiction fiction news
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: manual native
Contributors: Heinecke, Johannes; Tyers, Francis; 
Contributing: elsewhere
Contact: johannes.heinecke@orange.com
===============================================================================