
Summary

The Uyghur UD treebank is based on the Uyghur Dependency Treebank (UDT), created at Xinjiang University in Ürümqi, China.

Introduction

The sentences come from literary texts and reading material for primary and middle schools, including stories, records, and reports.

Changelog

  • 2018-04-15 v2.2
    • Repository renamed from UD_Uyghur to UD_Uyghur-UDT.
    • Added new manually checked data (Marhaba Eli); dev cut at 900 sentences, additional 1656 sentences go to train.
    • Added morphological analysis from Apertium (Fran Tyers and Dan Zeman); OOV = 26%.
  • 2017-03-01 v2.0
    • Converted to UD v2 guidelines (Dan Zeman).
    • Added new manually checked data (Marhaba Eli).
    • Re-split to achieve 10K test tokens (the first 900 sentences); the rest is dev, with no train set for now.
  • 2016-11-01 v1.4
    • Initial release.
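
The positional split described in the changelog can be illustrated with a minimal sketch. It assumes the data is a single CoNLL-U text in which sentences are separated by blank lines, and that the v2.2 split takes the first 900 sentences as test, the next 900 as dev, and the remainder as train; the changelog does not spell out this ordering, so treat it as an assumption. The function names `read_sentences` and `split_by_position` are hypothetical and are not part of any UD tool.

```python
from typing import Dict, List


def read_sentences(conllu_text: str) -> List[str]:
    """Return the sentence blocks of a CoNLL-U text (blank-line separated)."""
    sentences: List[str] = []
    current: List[str] = []
    for line in conllu_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the current sentence block.
            sentences.append("\n".join(current))
            current = []
    if current:
        # The last block may lack a trailing blank line.
        sentences.append("\n".join(current))
    return sentences


def split_by_position(
    sentences: List[str], test_size: int = 900, dev_size: int = 900
) -> Dict[str, List[str]]:
    """Split by sentence position: test first, then dev, then train.

    The ordering test -> dev -> train is an assumption made for this sketch.
    """
    return {
        "test": sentences[:test_size],
        "dev": sentences[test_size:test_size + dev_size],
        "train": sentences[test_size + dev_size:],
    }
```

Calling `split_by_position(read_sentences(text))` on the full text would return three lists of sentence blocks; each can be written back out by joining the blocks with a blank line between them.
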
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v1.4
License: CC BY-SA 4.0
Includes text: yes
Genre: fiction
Lemmas: automatic
UPOS: manual native
XPOS: automatic with corrections
Features: automatic
Relations: manual native
Contributors: Eli, Marhaba; Zeman, Daniel; Tyers, Francis
Contributing: elsewhere
Contact: marhaba@xju.edu.cn
===============================================================================