Uyghur data.
not-to-release
.gitignore
CONTRIBUTING.md
LICENSE.txt
README.md
eval.log
stats.xml
ug_udt-ud-dev.conllu
ug_udt-ud-test.conllu
ug_udt-ud-train.conllu

Summary

The Uyghur UD treebank is based on the Uyghur Dependency Treebank (UDT), created at Xinjiang University in Ürümqi, China.

Introduction

The sentences come from literary texts and reading material for primary and middle school, including stories, records and reports.

Changelog

  • 2018-04-15 v2.2
    • Repository renamed from UD_Uyghur to UD_Uyghur-UDT.
    • Added new manually checked data (Marhaba Eli); dev cut at 900 sentences; the additional 1656 sentences go to train.
    • Added morphological analysis from Apertium (Fran Tyers and Dan Zeman); OOV = 26%.
  • 2017-03-01 v2.0
    • Converted to UD v2 guidelines (Dan Zeman).
    • Added new manually checked data (Marhaba Eli).
    • Re-split to achieve 10K test tokens (first 900 sentences); rest is dev, no train now.
  • 2016-11-01 v1.4
    • Initial release.
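
The split sizes mentioned in the changelog (sentence and token counts per file) can be checked with a few lines of Python. The following is a minimal sketch, not part of the official UD tooling: the function name `count_conllu` and the sample data are made up for illustration, and it assumes the standard CoNLL-U layout (blank line between sentences, `#` comment lines, tab-separated token lines whose first column is the ID).

```python
import io


def count_conllu(stream):
    """Return (sentences, words) for a CoNLL-U text stream.

    Only lines with a plain integer ID are counted as words;
    multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "2.1")
    are skipped.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for line in stream:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment (sent_id, text, ...).
            continue
        token_id = line.split("\t", 1)[0]
        in_sentence = True
        if token_id.isdigit():
            words += 1
    # Handle a final sentence that is not followed by a blank line.
    if in_sentence:
        sentences += 1
    return sentences, words


# Hypothetical placeholder data, NOT taken from the treebank.
sample = (
    "# sent_id = demo-1\n"
    "1-2\tab\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "1\ta\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tb\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "\n"
    "# sent_id = demo-2\n"
    "1\tc\t_\t_\t_\t_\t_\t_\t_\t_\n"
)

print(count_conllu(io.StringIO(sample)))  # (2, 3)
```

To run it on a real file, pass an open file object instead of the sample, for example `count_conllu(open("ug_udt-ud-dev.conllu", encoding="utf-8"))`.
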
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v1.4
License: CC BY-SA 4.0
Includes text: yes
Genre: fiction
Lemmas: automatic
UPOS: manual native
XPOS: automatic with corrections
Features: automatic
Relations: manual native
Contributors: Eli, Marhaba; Zeman, Daniel; Tyers, Francis
Contributing: elsewhere
Contact: marhaba@xju.edu.cn
===============================================================================
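
The metadata block above is a plain list of `Key: value` lines between two `===` delimiter lines, so it is easy to read programmatically. The sketch below is one possible way to do that; the function name `parse_ud_metadata` is hypothetical and this is not the parser used by the UD infrastructure itself. The sample text is an excerpt of the block above.

```python
def parse_ud_metadata(readme_text):
    """Extract 'Key: value' pairs from the machine-readable metadata block."""
    meta = {}
    inside = False
    for line in readme_text.splitlines():
        if line.startswith("==="):
            if "Machine-readable metadata" in line:
                # Opening delimiter of the block.
                inside = True
            elif inside:
                # Closing delimiter: stop reading.
                break
            continue
        if inside and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta


readme_excerpt = """\
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v1.4
License: CC BY-SA 4.0
Genre: fiction
Contributors: Eli, Marhaba; Zeman, Daniel; Tyers, Francis
===============================================================================
"""

meta = parse_ud_metadata(readme_excerpt)
# Contributors are separated by semicolons, each written as "Last, First".
contributors = [name.strip() for name in meta["Contributors"].split(";")]
print(meta["License"])
print(contributors)
```
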