Summary

UD_Old_Russian-RNC is a sample of the Middle Russian corpus (1300-1700), a part of the Russian National Corpus (RNC). The data were originally annotated according to the RNC and extended UD-Russian morphological schemas and the UD 2.4 dependency schema.

Introduction

The Middle Russian Corpus (http://ruscorpora.ru/search-mid_rus.html) is part of the Russian National Corpus, where it belongs to the collection of historical corpora [Sichinava 2014]. The list of part-of-speech and core grammatical tags is available at https://github.com/olesar/UD_MidRussian/blob/master/MidRussianUD.md; that document also shows the mapping between the RNC and UD tags. The annotation project is maintained by the Vinogradov Institute of the Russian Language RAS (Moscow) in collaboration with researchers and students of the National Research University Higher School of Economics (Moscow) and Lomonosov Moscow State University.
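As a quick way to inspect the tag inventory in practice, the sketch below counts UPOS tags in a CoNLL-U file. It is an illustration only, not part of the project's tooling: it assumes the data follow the standard ten-column CoNLL-U layout used by UD treebanks, and the names `read_conllu` and `upos_counts` are hypothetical helpers introduced here.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of token dicts (hypothetical helper)."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) are skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            # Skip lines that do not have exactly ten tab-separated columns.
            continue
        sentence.append(dict(zip(CONLLU_FIELDS, cols)))
    if sentence:
        yield sentence


def upos_counts(sentences: Iterable[List[Dict[str, str]]]) -> Counter:
    """Count UPOS tags over ordinary tokens only (hypothetical helper)."""
    counts: Counter = Counter()
    for sent in sentences:
        for tok in sent:
            # Plain integer IDs are regular words; ranges such as "1-2"
            # (multiword tokens) and decimals such as "1.1" (empty nodes)
            # are left out of the count.
            if tok["id"].isdigit():
                counts[tok["upos"]] += 1
    return counts
```

To use it, open a CoNLL-U file with `encoding="utf-8"` and pass the file object to `read_conllu`, then pass the result to `upos_counts`. The same token dicts also expose the `xpos` and `feats` columns, which can be compared against the RNC-to-UD mapping described in the tag documentation.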

Acknowledgments

We are immensely grateful to Irina Juryeva, Roman Ilushin, Maria Skachedubova, and Elizaveta Bunina, who contributed to the annotation of the original Middle Russian Corpus data. We would like to thank Dmitri Sitchinava, Anna Pichhadze, Alexandr Moldovan, Vladimir Plungian, Roman Krivko, Yves Scherrer, Achim Rabus, and Hanne Eckhoff for fruitful discussions and advice.

References

  • Lyashevskaya, Olga (submitted), A reusable tagset for the morphologically rich language in change: a case of Middle Russian.

  • [ru] Lyashevskaya, Olga (2018), A test dataset for the automatic morphological analysis of the Middle Russian texts [Testovaja kollektsija dlja zadach avtomaticheskogo morfologicheskogo analiza tekstov starorusskoj pis’mennosti]. In: The academic heritage of V.A. Bogoroditsky and the modern vector of research of the Kazan linguistic school [Nauchnoje nasledije V.A. Bogoroditskogo i sovremennyj vektor issledovanij Kazanskoj lingvisticheskoj shkoly], Works and materials of int. conf., Kazan: Kazan University, pp. 131–135.

  • Lyashevskaya, Olga (2018), A frequency grammatical dictionary of Russian, 1300-1700 (based on the RNC Middle Russian Corpus). Online edition: http://ru-eval.ru/hist/freq-15-17/.

  • [ru] Sichinava D. V. (2014), Historical corpora of the Russian National Corpus as a tool for diachronic grammatical studies [Istoricheskie korpusa Natsional’nogo korpusa russkogo jazyka kak instrument diakhronicheskikh issledovanij grammatiki]. In: Baranov V. A., Zheljazkova V., Lavretiev A. M. (eds.), Textual heritage and information technologies. El’Manuscript–2014 [Pismenoto nasledstvo i informatsionnite tekhnologii. El’Manuscript–2014]. Proceedings of the 5th International research conference. Sofia, Izhevsk, 2014.

=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.4
License: CC BY-SA 4.0
Includes text: yes
Genre: legal nonfiction
Lemmas: manual native
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: manual native
Contributors: Lyashevskaya, Olga
Contributing: elsewhere
Contact: olesar@yandex.ru
===============================================================================