Old Church Slavonic data from the PROIEL project.
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
.gitignore
CONTRIBUTING.md
LICENSE.txt
README.md
cu_proiel-ud-dev.conllu
cu_proiel-ud-test.conllu
cu_proiel-ud-train.conllu
eval.log
stats.xml

README.md

Summary

The Old Church Slavonic (OCS) UD treebank is based on the Old Church Slavonic data from the PROIEL treebank and contains the text of the Codex Marianus New Testament translation.

Introduction

The Old Church Slavonic (OCS) UD treebank is based on the Old Church Slavonic data from the PROIEL treebank, which is maintained at the Department of Philosophy, Classics, History of Arts and Ideas at the University of Oslo. The conversion is based on the 20180408 release of the PROIEL treebank available from https://github.com/proiel/proiel-treebank/releases. The original annotators are acknowledged in the files available there. The conversion code is available in the Rubygem proiel-cli, https://github.com/proiel/proiel-cli.

The treebank contains the text of the Codex Marianus New Testament translation. The original annotation guidelines are available at http://folk.uio.no/daghaug/syntactic_guidelines.pdf. The text and tokenization comes from Corpus Cyrillo-Methodianum Helsingiense http://www.helsinki.fi/slaavilaiset/ccmh/marianus.html.

Acknowledgements

The data have been automatically converted to the UD scheme by Dag Haug. Thanks to all the original annotators!

Data splits

The development set consists of Matthew 5, 6, 21, and 24, Mark 5 and 6, Luke 4, 7, 8 and 18, and John 10, 11, 17, 19. The test data consists of Matthew 7, 8, 22, 23 and 25, Mark 7, 8 and 15, Luke 5, 9, 10 and 19, and John 12, 13, 18, 21.

References

Dag T. T. Haug and Marius L. Jøhndal. 2008. 'Creating a Parallel Treebank of the Old Indo-European Bible Translations'. In Caroline Sporleder and Kiril Ribarov (eds.). Proceedings of the Second Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2008) (2008), pp. 27-34.

Changelog

V2.2 Repository renamed from UD_Old_Church_Slavonic to UD_Old_Church_Slavonic-PROIEL. V2.0 The treebank was converted to UDv2 and the data splits were changed.

=== Machine-readable metadata (DO NOT REMOVE!) ================================ Data available since: UD v1.2 License: CC BY-NC-SA 3.0 Includes text: yes Genre: bible Lemmas: converted from manual UPOS: converted from manual XPOS: manual native Features: converted from manual Relations: converted from manual Contributors: Haug, Dag Contributing: elsewhere Contact: daghaug@ifikk.uio.no