
Erme Universal Dependencies annotated texts Moksha is the origin of UD_Moksha-JR, a treebank with CoNLL-U annotation for texts in the Moksha language. It originally consisted of a sample from a number of fiction authors writing originals in Moksha.


This is a collection of sentences from almost entirely original Moksha-language literary sources dating back to the 1880s with Universal Dependencies (UD) annotations. It has been constructed in alignment with parallel work on Erzya language Universal Dependencies.

There are also about 20 parallel sentences translated by Marina Levina from the Erzya and Russian texts: and

The sent_id attribute value is not randomized in works published earlier than 1938. Developing UD documentation can be found at for Erzya.
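As a minimal sketch of how the sent_id values can be inspected, the snippet below collects them from CoNLL-U lines. It relies only on the standard CoNLL-U convention that sentence-level metadata is stored in comment lines of the form `# sent_id = ...`. The function name `read_sent_ids` and the sample identifiers and tokens are hypothetical placeholders, not data from the treebank.

```python
import re

# CoNLL-U stores sentence metadata in comment lines such as "# sent_id = ...".
SENT_ID_LINE = re.compile(r"^#\s*sent_id\s*=\s*(.+?)\s*$")


def read_sent_ids(lines):
    """Yield the sent_id value of every sentence in an iterable of CoNLL-U lines."""
    for line in lines:
        match = SENT_ID_LINE.match(line)
        if match:
            yield match.group(1)


# Hypothetical sample; identifiers and tokens are placeholders, not treebank data.
sample = [
    "# sent_id = hypothetical-id-1",
    "# text = placeholder",
    "1\tplaceholder\tplaceholder\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
    "# sent_id = hypothetical-id-2",
    "# text = placeholder",
    "1\tplaceholder\tplaceholder\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
]

sent_ids = list(read_sent_ids(sample))
```

The same function accepts an open file object, since iterating over a text file yields its lines one by one.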


The original annotation was performed by Jack Rueter at the University of Helsinki with the help of Marina Levina at the Mordvin Languages Department of the Mordovian State University im. P.N. Ogariova. It used morphological tools originally built in a project funded by the Kone Foundation «Language Programme», «Creation of Morphological Parsers for Minority Finno-Ugrian Languages» (2013–2014), with the linguistic work of Merja Salo, and facilitated at the Norwegian Arctic University in Tromsø. Work with the Moksha treebank builds upon previous experience with the UD_Erzya-JR treebank and on continued consultations and discussions with Francis Tyers, Tommi Pirinen and Jonathan Washington. Without the Moksha writers themselves, however, we would be nowhere…

Annotation work proceeds in parallel with finite-state transducer development by Nadjezhda Kabaeva, Marina Levina and Jack Rueter in the GiellaLT infrastructure, which also provides Constraint Grammar disambiguation of the morphological analysis.


If you use this data set in an academic publication, I would be ever so grateful if you cited it as follows:

Jack Rueter. (2018, January 20). Erme UD Moksha (Version v1.0)


About the authors

  • Кузнецов, Юрий 1975: Сембось ушеткшни киста. Саранск.
  • Mishanina, V. I. (Мишанина, В. И.) 1972: Лиендень очконяса. Мокша №3, 38–39. Саранск. (MishaninaValentina_LiendenyOchkonyasa_Moksha-1972-No2-pp38-39) (Мордовиянь Кадошкина аймаконь Адаж веле)
    • Мокшень кяль. Синтаксис : учебник / аноклаф-тиф Н. С. Алямкинонь и О. Е. Поляковонь профессорхнень вятемаснон ала. -- Саранск : Изд-во Мордов. ун-та, 2008. -- 200 с. -- На морд.-мокша яз.
  • Pyanzin, Fyodor (Пьянзин, Фёдор) 1991: Седить тердеманц коряс: Повесть. -- Саранск: Мордов. кн. изд-во, 1991. -- 120 с. Мордов.-мокша яз.

In release 2.7, additional example sentences used in the Moksha-language grammar Мокшень кяль, синтаксис: учебник (2008) were included. These sentences are marked with sent_id values that contain the components MKS:2008:page:n-th sentence:original author. It is hoped that the inclusion of these sentences will help cover various grammatical phenomena in Moksha syntax. When referring to these sentences, we advise you to also cite the original source:

  • Алямкин, Н. С. (гл. ред.); Гришунина, В. П.; Иванова, Г. С.; Кабаева, Н. Ф.; Кулакова, Н. А.; Левина, М. З.; Поляков, О. Е. (гл. ред.); Рогожина, В. Ф.; Седова, П. Е. 2008: Мокшень кяль. Синтаксис : учебник [Moksha language. Syntax: reader]. -- Саранск : Изд-во Мордов. ун-та.
  • Alâmkin, N. S. (chief ed.); Grishunina, V. P.; Ivanova, G. S.; Kabaeva, N. F.; Kulakova, N. A.; Levina, M. Z.; Polâkov, O. E. (chief ed.); Rogozhina, V. F.; Sedova, P. E. 2008: Mokshen' kâl'. Sintaksis: uchebnik [Moksha language. Syntax: reader]. -- Saransk : Izd-vo Mordov. un-ta.
  • Kehayov, Petar 2020: Between facts and speech acts: The conditional and conditional-conjunctive in Moksha Mordvin. Linguistica Uralica LVI 2020 1
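The sent_id components described for the grammar examples (MKS:2008:page:n-th sentence:original author) can be split apart programmatically, for instance to group sentences by original author when citing. The following is only a sketch: it assumes the components are separated by colons in exactly that order, and the names `MksSentId` and `parse_mks_sent_id` as well as the sample identifier are hypothetical.

```python
from typing import NamedTuple, Optional


class MksSentId(NamedTuple):
    """Components of a grammar-example sent_id (assumed colon-separated)."""
    source: str    # "MKS"
    year: str      # "2008"
    page: str      # page in the grammar
    sentence: str  # n-th sentence on that page
    author: str    # original author of the quoted sentence


def parse_mks_sent_id(sent_id: str) -> Optional[MksSentId]:
    """Return the parsed components, or None if sent_id is not an MKS identifier."""
    # Split at most four times so any further colon stays in the author field.
    parts = sent_id.split(":", 4)
    if len(parts) != 5 or parts[0] != "MKS":
        return None
    return MksSentId(*parts)


# Hypothetical identifier, built only from the pattern described above.
parsed = parse_mks_sent_id("MKS:2008:12:3:HypotheticalAuthor")
```

Identifiers from the literary sources do not follow this pattern, so the function returns `None` for them rather than raising an error.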


  • 2024-04-29
    • Add compound:nn
  • 2023-10-29
    • Add CCC variant to train.conllu for minimal data split
  • 2023-04-29
    • Work with Valency, Diminutive
    • New lines from Moksha syntaxis (2008)
  • 2022-10-31
    • Grammar research input
    • Migrating :lmp, :lto, :lfrom into :lmod, and :comp to :cmp
  • 2022-04-29
    • Deprel correction and documentation
    • Troubleshooting in dependencies
  • 2021-10-31
    • Auxiliary, feature and deprel documentation
    • Example sentences with conditional from Kehayov (2020).
  • 2021-04-29
    • Auxiliary, feature and deprel documentation
    • Continued annotation of Moksha syntaxis (2008).
    • Language tag systematization.
  • 2020-11-15 v2.7
    • Adding more example sentences quoted in Moksha syntaxis (2008).
  • 2020-05-15 v2.6
    • Adding example sentences quoted in Moksha syntaxis (2008).
    • Expanding advmod:mmod, :lmod, :tmod and adding NameTypes.
  • 2019-11-15 v2.5
    • Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.5
License: CC BY-SA 4.0
Includes text: yes
Genre: nonfiction news
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Rueter, Jack; Levina, Maria; Kabaeva, Nadezhda; Molnár, Judit; Alnajjar, Khalid
Contributing: here