CONTRIBUTING.md
LICENSE.txt
README.md
eval.log
orv_torot-ud-dev.conllu
orv_torot-ud-test.conllu
orv_torot-ud-train.conllu
stats.xml


Summary

UD_Old_Russian-TOROT is a conversion of a selection of the Old East Slavonic and Middle Russian data in the Tromsø Old Russian and OCS Treebank (TOROT), which was originally annotated in PROIEL dependency format.

Introduction

UD_Old_Russian-TOROT is a conversion of a selection of the Old East Slavonic and Middle Russian data in the Tromsø Old Russian and OCS Treebank (TOROT), which is maintained at UiT The Arctic University of Norway. The treebank is manually annotated, with some automatic preprocessing, in the PROIEL dependency format. New texts are still being added. The treebank contains texts from a variety of mediaeval and early modern genres, such as chronicles, legal documents, lives of saints and correspondence. Treebank releases are available from https://github.com/torottreebank/treebank-releases. This conversion is based on the 20190505 release.

Data splits

The test set consists of Afanasij Nikitin 5 and 19; birchbark letters 497, 502 and 902; Russkaja pravda 1-6; Novgorod First Chronicle (Synodal ms.), entries for 6642-6646; Primary Chronicle (Laurentian ms.), Introduction, the entry for 6463 and the Instruction of Vladimir Monomakh; Suzdal Chronicle (Laurentian ms.), entry for 6657; Zadonshchina; Tale of the Fall of Constantinople, chapters 1-2; Uspenskij sbornik, Life of Feodosij Pečerskij 25-26.

The development set consists of Afanasij Nikitin 6 and 20; birchbark letters 644 and 682; Russkaja pravda 8-9; Novgorod First Chronicle (Synodal ms.), entries for 6675-6681 and 6717; Primary Chronicle (Laurentian ms.), the two entries for 6453 and the entry for 6494; Suzdal Chronicle (Laurentian ms.), entries for 6659 and 662; The Tale of Dracula; Tale of the Fall of Constantinople, chapter 3; Uspenskij sbornik, Life of Feodosij Pečerskij 27-28.

The training set consists of larger, continuous excerpts from Afanasij Nikitin, Russkaja pravda, Novgorod First Chronicle (Synodal ms.), Primary Chronicle (Laurentian ms.), Suzdal Chronicle (Laurentian ms.), Tale of the Fall of Constantinople and Uspenskij sbornik, as well as a further selection of birchbark letters and The Tale of Igor's Campaign.
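
To check the size of each split locally, a minimal Python sketch such as the following may help. It assumes the three .conllu files from this repository sit in the current directory and follow the standard CoNLL-U layout (comment lines starting with `#`, blank lines between sentences, tab-separated token lines). The names `count_conllu`, `summarize` and `SPLITS` are hypothetical helpers introduced here for illustration; they are not part of the treebank or of any UD tool.

```python
from pathlib import Path
from typing import Iterable, Tuple

# File names as they appear in this repository (assumed to be in the
# current working directory).
SPLITS = {
    "train": "orv_torot-ud-train.conllu",
    "dev": "orv_torot-ud-dev.conllu",
    "test": "orv_torot-ud-test.conllu",
}


def count_conllu(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentences, words) for an iterable of CoNLL-U lines.

    Only lines whose ID column is a plain integer are counted as words;
    multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1")
    are skipped.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment, e.g. "# sent_id = ...".
            continue
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            words += 1
        in_sentence = True
    # Count a final sentence that is not followed by a blank line.
    if in_sentence:
        sentences += 1
    return sentences, words


def summarize(directory: str = ".") -> None:
    """Print sentence and word counts for each split found in directory."""
    for split, filename in SPLITS.items():
        path = Path(directory) / filename
        if not path.exists():
            print(f"{split}: {filename} not found")
            continue
        with path.open(encoding="utf-8") as handle:
            sentences, words = count_conllu(handle)
        print(f"{split}: {sentences} sentences, {words} words")


if __name__ == "__main__":
    summarize()
```

Run from the repository root, the script prints one line per split. The counts it reports depend on the release that is checked out, so none are quoted here.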

Acknowledgments

The conversion was performed using a script written by Dag Haug and modified by Hanne Eckhoff. The texts were annotated and reviewed by a brilliant team of annotators, who are acknowledged in the original treebank files and who deserve to be thanked profusely.

References

Hanne Martine Eckhoff and Aleksandrs Berdičevskis. 2015. 'Linguistics vs. digital editions: The Tromsø Old Russian and OCS Treebank'. Scripta & e-scripta 14–15, pp. 9–25.

=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.4
License: CC BY-NC-SA 3.0
Includes text: yes
Genre: nonfiction legal
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Eckhoff, Hanne
Contributing: elsewhere
Contact: hanne.eckhoff@mod-langs.ox.ac.uk
===============================================================================