Estonian Treebank in form of Universal Dependencies

EstSyntax/EstUD

EstUD

Estonian Treebank in form of Universal Dependencies

License: CC BY-SA 4.0

Version 2.8 Enhanced

This is an experimental version of the official UD 2.8 Estonian treebanks. Enhanced dependencies have been added as follows:

  • Empty nodes for elided predicates - manually
  • Propagation of incoming dependencies to conjuncts - automatically using Treex software
  • Propagation of outgoing dependencies from conjuncts - automatically using Treex
  • Additional subject relations for control and raising constructions - automatically using Treex
  • Coreference in relative clause constructions - manually
  • Modifier labels that contain the preposition or other case-marking information - automatically using Treex
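As a rough illustration of where these enhancements end up, the sketch below reads the DEPS column (the ninth field) of a CoNLL-U file, where enhanced dependencies are stored as `head:relation` pairs separated by `|`, and flags empty nodes by their decimal IDs (such as `8.1`). The sample sentence is a hypothetical English toy example, not taken from the treebank, and the function names are illustrative only.

```python
def parse_deps(deps_field):
    """Split a DEPS value such as '2:nsubj|4:nsubj:xsubj' into (head, relation) pairs."""
    if deps_field == "_":
        return []
    pairs = []
    for item in deps_field.split("|"):
        # Only the first colon separates head from relation; the relation
        # itself may contain colons (e.g. 'nsubj:xsubj').
        head, _, relation = item.partition(":")
        pairs.append((head, relation))
    return pairs


def read_enhanced(conllu_text):
    """Return (token_id, form, enhanced_deps, is_empty_node) for each token line."""
    rows = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        token_id = cols[0]
        if "-" in token_id:
            continue  # skip multiword token ranges such as '3-4'
        is_empty_node = "." in token_id  # empty nodes use decimal IDs
        rows.append((token_id, cols[1], parse_deps(cols[8]), is_empty_node))
    return rows


# Hypothetical toy sentence (assumption: not from the Estonian treebank).
# The controlled subject 'She' gets an additional enhanced relation to 'sing'.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "She", "she", "PRON", "_", "_", "2", "nsubj", "2:nsubj|4:nsubj:xsubj", "_"],
    ["2", "wants", "want", "VERB", "_", "_", "0", "root", "0:root", "_"],
    ["3", "to", "to", "PART", "_", "_", "4", "mark", "4:mark", "_"],
    ["4", "sing", "sing", "VERB", "_", "_", "2", "xcomp", "2:xcomp", "_"],
])

for token_id, form, deps, is_empty in read_enhanced(SAMPLE):
    print(token_id, form, deps, "(empty node)" if is_empty else "")
```

Head IDs are kept as strings here because an enhanced head may itself be an empty node with a decimal ID.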

Version 2.2

Versions 2.0 and later are available at http://universaldependencies.org/ (https://github.com/UniversalDependencies/UD_Estonian-EDT/ ; the master branch holds the stable version and the dev branch holds the newest one).

Version 1.4

The Estonian Universal Dependencies treebank is developed at the University of Tartu. The release currently available for download (as of December 2016) comprises 23 documents, ca. 339,000 tokens in 24,752 sentences.

The treebank was created by automatically converting the Estonian Dependency Treebank (https://github.com/EstSyntax/EDT) to Universal Dependencies. Later, some relations were reannotated manually. Some sentences that belong to the test set have been removed temporarily. Please contact the contributor if you need them before February 2017.

The documentation is available in English (Estonian_UD_2016.pdf) and in Estonian (Eesti_UD_2016.pdf).


The Estonian treebank annotated with Universal Dependencies is developed at the University of Tartu. The currently available version (December 2016) consists of 23 text files, ca. 339,000 words, and 24,752 sentences.

The Estonian Universal Dependencies treebank was obtained by automatically converting the Estonian Dependency Treebank (EDT); afterwards, the dependency labels between clauses were corrected manually.

The syntactic transformations are performed automatically using a dedicated set of Constraint Grammar rules (320 rules). The quality of the transformations was evaluated on the 3,000-word corpus file ilu_indrikson.tasak: LA 0.991, LAS 0.985, UA 0.992.
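The three scores can be computed per token from gold and automatically produced annotations. The sketch below assumes the usual readings of the abbreviations, which the README does not spell out: UA as the share of tokens with the correct head, LA as the share with the correct relation label, and LAS as the share with both correct. The function name and the toy data are hypothetical.

```python
def attachment_scores(gold, predicted):
    """Compute UA, LA and LAS from parallel lists of (head, relation) pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty annotation")
    total = len(gold)
    # Unlabeled attachment: head matches, label ignored.
    ua = sum(g[0] == p[0] for g, p in zip(gold, predicted)) / total
    # Label accuracy: relation label matches, head ignored.
    la = sum(g[1] == p[1] for g, p in zip(gold, predicted)) / total
    # Labeled attachment: both head and label match.
    las = sum(g == p for g, p in zip(gold, predicted)) / total
    return {"UA": ua, "LA": la, "LAS": las}


# Hypothetical four-token example: one wrong label, one wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "obl")]
predicted = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "obl")]
print(attachment_scores(gold, predicted))
```

Under these definitions LAS can never exceed UA or LA, which is consistent with the reported figures.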

The most recent documentation is in the file Eesti_UD_2016.pdf.

The original documentation is in the directory vers1_0:

  • UDmargendus_est.pdf - description of the annotation, in Estonian
  • Estonianrelations.pdf - description of the annotation, in English
  • udteisendustekirjeldus.pdf - description of the transformations and an evaluation of their quality
  • puudepangaallikad.pdf - description and size of the source files
