The Uyghur UD treebank is based on the Uyghur Dependency Treebank (UDT), created at Xinjiang University in Ürümqi, China.


The sentences come from literary texts and reading material for primary and middle schools, including stories, records, and reports.


  • 2018-04-15 v2.2
    • Repository renamed from UD_Uyghur to UD_Uyghur-UDT.
    • Added new manually checked data (Marhaba Eli); dev cut at 900 sentences, the additional 1656 sentences go to train.
    • Added morphological analysis from Apertium (Fran Tyers and Dan Zeman); OOV = 26%.
  • 2017-03-01 v2.0
    • Converted to UD v2 guidelines (Dan Zeman).
    • Added new manually checked data (Marhaba Eli).
    • Re-split to achieve 10K test tokens (first 900 sentences); rest is dev, no train now.
  • 2016-11-01 v1.4
    • Initial release.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v1.4
License: CC BY-SA 4.0
Includes text: yes
Genre: fiction
Lemmas: automatic
UPOS: manual native
XPOS: automatic with corrections
Features: automatic
Relations: manual native
Contributors: Eli, Marhaba; Zeman, Daniel; Tyers, Francis
Contributing: elsewhere
