
Summary

A Lithuanian treebank with manually annotated dependencies. Morphology was annotated with the Morphological Annotator by CCL, Vytautas Magnus University (http://tekstynas.vdu.lt/), followed by manual disambiguation. A pilot version, which includes news and an essay by Tomas Venclova, is available here.

Changelog

  • 2018-04-15 v2.2
    • Repository renamed from UD_Lithuanian to UD_Lithuanian-HSE.
  • 2017-11-15 v2.1
    • No changes.
  • 2017-03-01 v2.0
    • Initial UD release.

BASIC STATISTICS

  • Tree count: 263
  • Word count: 5356
  • Token count: 5356
  • Dep. relations: 37, of which 5 language specific
  • POS tags: 16
  • Category=value feature pairs: 43
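As a rough illustration of how tree, word, and token counts like these can be derived, here is a minimal sketch that assumes the data is stored in the standard CoNLL-U format used by UD treebanks. The function name `conllu_stats`, the helper `row`, and the toy sentences are hypothetical and are not taken from this treebank or its tooling.

```python
def conllu_stats(text):
    """Count trees, syntactic words, and surface tokens in CoNLL-U text."""
    trees = words = tokens = 0
    covered_until = 0      # last word ID covered by the current multiword token
    in_sentence = False
    for line in text.splitlines():
        line = line.strip()
        if not line:                 # a blank line ends the current sentence
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):     # sentence-level comment
            continue
        token_id = line.split("\t")[0]
        if not in_sentence:
            in_sentence = True
            trees += 1
        if "." in token_id:          # empty node: neither a word nor a token
            continue
        if "-" in token_id:          # multiword token range such as "1-2"
            tokens += 1
            covered_until = int(token_id.split("-")[1])
            continue
        words += 1
        if int(token_id) > covered_until:
            tokens += 1              # word not covered by a multiword token
    return {"trees": trees, "words": words, "tokens": tokens}


def row(token_id, form, head):
    """Build one 10-column CoNLL-U line with unused fields left as '_'."""
    return "\t".join([str(token_id), form, "_", "_", "_", "_", str(head), "_", "_", "_"])


# Toy input made up for this example (not real treebank data).
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    row(1, "a", 2), row(2, "b", 0), row(3, ".", 2),
    "",
    "# sent_id = toy-2",
    row(1, "c", 0), row(2, ".", 1),
    "",
])

print(conllu_stats(SAMPLE))
```

Under this counting scheme, words and tokens differ only when multiword tokens are present, so the identical word and token counts reported here are consistent with a treebank that has none.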

DATA SETS

The treebank was split into training (about 60%), testing (about 20%), and development (about 20%) sets.

training set:

  • Tree count: 153
  • Word count: 3210
  • Token count: 3210
  • Dep. relations: 36, of which 5 language specific
  • POS tags: 16
  • Category=value feature pairs: 43

dev set:

  • Tree count: 55
  • Word count: 1086
  • Token count: 1086
  • Dep. relations: 33, of which 5 language specific
  • POS tags: 16
  • Category=value feature pairs: 40

test set:

  • Tree count: 55
  • Word count: 1060
  • Token count: 1060
  • Dep. relations: 32, of which 3 language specific
  • POS tags: 16
  • Category=value feature pairs: 39
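The per-set figures can be checked against the overall statistics and the stated 60/20/20 split with a few lines of arithmetic. The sketch below uses only the tree and word counts reported in this README; the names `SPLITS`, `TOTAL`, and `split_shares` are illustrative.

```python
# Tree and word counts copied from the statistics in this README.
SPLITS = {
    "train": {"trees": 153, "words": 3210},
    "dev": {"trees": 55, "words": 1086},
    "test": {"trees": 55, "words": 1060},
}
TOTAL = {"trees": 263, "words": 5356}


def split_shares(splits, key):
    """Return each set's share of the summed `key` counts, in percent."""
    total = sum(counts[key] for counts in splits.values())
    return {name: round(100 * counts[key] / total, 1) for name, counts in splits.items()}


for key in ("trees", "words"):
    summed = sum(counts[key] for counts in SPLITS.values())
    print(key, "sum matches total:", summed == TOTAL[key], split_shares(SPLITS, key))
```

By word count the shares come out at 59.9%, 20.3%, and 19.8% for the training, development, and test sets, and by tree count at 58.2%, 20.9%, and 20.9%, so the split is approximately 60/20/20 on either measure.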

=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.0
License: CC BY-SA 4.0
Includes text: yes
Genre: nonfiction news
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: manual native
Contributors: Lyashevskaya, Olga; Sichinava, Dmitry
Contributing: elsewhere
Contact: olesar@yandex.ru
===============================================================================