The Uyghur UD treebank is based on the Uyghur Dependency Treebank (UDT), created at Xinjiang University in Ürümqi, China.
The sentences come from literary texts / reading material for primary and middle school, including stories, records and reports.
- 2018-04-15 v2.2
  - Repository renamed from UD_Uyghur to UD_Uyghur-UDT.
  - Added new manually checked data (Marhaba Eli); dev cut at 900 sentences, additional 1656 sentences go to train.
  - Added morphological analysis from Apertium (Fran Tyers and Dan Zeman); OOV = 26%.
- 2017-03-01 v2.0
  - Converted to UD v2 guidelines (Dan Zeman).
  - Added new manually checked data (Marhaba Eli).
  - Re-split to achieve 10K test tokens (first 900 sentences); rest is dev, no train now.
- 2016-11-01 v1.4
  - Initial release.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v1.4
License: CC BY-SA 4.0
Includes text: yes
Genre: fiction
Lemmas: automatic
UPOS: manual native
XPOS: automatic with corrections
Features: automatic
Relations: manual native
Contributors: Eli, Marhaba; Zeman, Daniel; Tyers, Francis
Contributing: elsewhere
Contact: email@example.com
===============================================================================
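
The metadata block is meant to be read by scripts. The following is a minimal sketch of how it might be parsed, assuming the block is laid out with one `Key: value` pair per line between the two `===` delimiter lines. The function name `parse_ud_metadata` and the abbreviated `sample` string are hypothetical and used only for illustration; this is not the official UD tooling.

```python
def parse_ud_metadata(readme_text):
    """Collect the 'Key: value' pairs of the machine-readable metadata block.

    Assumption: one pair per line, enclosed by two lines starting with '==='.
    """
    metadata = {}
    inside = False
    for line in readme_text.splitlines():
        stripped = line.strip()
        # The opening delimiter carries the block title.
        if stripped.startswith("===") and "Machine-readable metadata" in stripped:
            inside = True
            continue
        # Any later '===' line closes the block.
        if inside and stripped.startswith("==="):
            break
        if inside and ":" in stripped:
            key, value = stripped.split(":", 1)
            metadata[key.strip()] = value.strip()
    return metadata


# Abbreviated sample (hypothetical input) with a few of the fields.
sample = """\
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v1.4
License: CC BY-SA 4.0
Genre: fiction
Contributors: Eli, Marhaba; Zeman, Daniel; Tyers, Francis
===============================================================================
"""

meta = parse_ud_metadata(sample)
# Contributors are separated by ';', each written as 'Surname, Given name'.
contributors = [name.strip() for name in meta["Contributors"].split(";")]
```

Splitting on the first colon only keeps values that themselves contain a colon intact, and splitting contributors on `;` rather than `,` preserves the `Surname, Given name` form.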