

The Uyghur UD treebank is based on the Uyghur Dependency Treebank (UDT), created at Xinjiang University in Ürümqi, China.


The sentences come from literary texts and reading material for primary and middle schools, including stories, records and reports.


  • 2018-04-15 v2.2
    • Repository renamed from UD_Uyghur to UD_Uyghur-UDT.
    • Added new manually checked data (Marhaba Eli); dev cut at 900 sentences, additional 1656 sentences go to train.
    • Added morphological analysis from Apertium (Fran Tyers and Dan Zeman); OOV = 26%.
  • 2017-03-01 v2.0
    • Converted to UD v2 guidelines (Dan Zeman).
    • Added new manually checked data (Marhaba Eli).
    • Re-split to achieve 10K test tokens (first 900 sentences); rest is dev, no train now.
  • 2016-11-01 v1.4
    • Initial release.
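
The metadata block that follows is a list of plain `key: value` lines, so it can be read with a few lines of code. The sketch below is only an illustration, not an official UD tool: the function names `parse_ud_metadata` and `contributors` are hypothetical, and it assumes that the block starts at a line beginning with `===` that contains the words "Machine-readable metadata", and that each entry is split at its first colon.

```python
def parse_ud_metadata(readme_text):
    """Collect the key/value lines that follow the metadata marker line.

    Assumption: the marker line starts with '===' and mentions
    'Machine-readable metadata'; everything before it is ignored.
    """
    metadata = {}
    in_block = False
    for line in readme_text.splitlines():
        if line.startswith("===") and "Machine-readable metadata" in line:
            in_block = True
            continue
        # Skip lines before the marker and lines that are not key/value pairs.
        if not in_block or ":" not in line:
            continue
        key, value = line.split(":", 1)  # split at the first colon only
        metadata[key.strip()] = value.strip()
    return metadata


def contributors(metadata):
    """Split the semicolon-separated 'Contributors' value into names."""
    raw = metadata.get("Contributors", "")
    return [name.strip() for name in raw.split(";") if name.strip()]
```

Because names are written as `Surname, Given name`, the contributor list is split on semicolons rather than commas.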
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v1.4
License: CC BY-SA 4.0
Includes text: yes
Genre: fiction
Lemmas: automatic
UPOS: manual native
XPOS: automatic with corrections
Features: automatic
Relations: manual native
Contributors: Eli, Marhaba; Zeman, Daniel; Tyers, Francis
Contributing: elsewhere