RusDraCor

Corpus Description

We are building a Russian Drama Corpus with files encoded in TEI P5. To date, the corpus comprises 212 plays from ilibrary, Wikisource, РВБ (Russian Virtual Library), lib.ru, ФЕБ (Fundamental Electronic Library of Russian Literature and Folklore), СовЛит and Wikilivres, which we converted to TEI and then corrected and enhanced. More plays will follow.

If you want to cite the corpus, please use this publication:

Fischer, Frank, et al. (2019). Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama. In Proceedings of DH2019: "Complexities", Utrecht University, doi:10.5281/zenodo.4284002.

RusDraCor was first presented on June 29, 2017, at the Corpora 2017 conference in St. Petersburg (our slides are available online), then on July 11, 2017, at the "Digitizing the Stage" conference in Oxford, and on November 14, 2017, at the TEI 2017 conference in Victoria. The social network data we extract from the plays can also be explored on our website dracor.org/rus or in our Shiny app.

If you just want to download the corpus in its current state in XML-TEI, do this:

# the TEI files end up in rusdracor/tei (GitHub no longer supports svn export)
git clone --depth 1 https://github.com/dracor-org/rusdracor.git

API

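An easy way to download the network data (rather than the TEI files themselves) is to use our API (see the API documentation on dracor.org). If you have jq installed, the following loop fetches the list of plays in the corpus and saves each play's network data as a CSV file: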

for play in $(curl -s 'https://dracor.org/api/corpora/rus' | jq -r '.dramas[].name'); do
    wget -O "$play.csv" "https://dracor.org/api/corpora/rus/play/$play/networkdata/csv"
done
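If you prefer to stay in R, the same download can be sketched there. This is a minimal sketch, assuming the endpoint paths and the `dramas`/`name` fields used in the shell loop above (API as of this README); the helper names `network_csv_url` and `fetch_network_data` are hypothetical, and the jsonlite package is an extra dependency not otherwise used here.

```r
# Hypothetical helpers; endpoint paths and the "dramas"/"name" fields are
# copied from the shell loop above and may change with the API.

# Build the network-data CSV URL for one play
network_csv_url <- function(play) {
  paste0("https://dracor.org/api/corpora/rus/play/", play, "/networkdata/csv")
}

# Fetch all play names, then read each play's network CSV into a data.table.
# Returns a list of data.tables named by play; sends one HTTP request per play.
# Requires the jsonlite and data.table packages (called via their namespaces).
fetch_network_data <- function() {
  corpus <- jsonlite::fromJSON("https://dracor.org/api/corpora/rus")
  plays <- corpus$dramas$name  # array of objects is simplified to a data frame
  tables <- lapply(plays, function(play) data.table::fread(network_csv_url(play)))
  names(tables) <- plays
  tables
}
```

Calling `networks <- fetch_network_data()` keeps everything in memory instead of writing one CSV file per play, which is convenient for further analysis in R.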

The API info page is at https://dracor.org/api/info.

Simple Visualisation with R

To take a first look at how the number of speakers per play is distributed over time, you can load the metadata table into R:

library(data.table)
library(ggplot2)
# load the corpus metadata (one row per play)
rusdracor <- fread("https://dracor.org/api/corpora/rus/metadata.csv")
# scatterplot: normalized year vs. number of speakers
ggplot(rusdracor, aes(x = yearNormalized, y = numOfSpeakers)) + geom_point()

Result:

[Figure: number of speakers per play by normalized year (numOfSpeakers.png)]

Here is a barplot showing the number of plays per decade:

[Figure: number of plays per decade (playsPerDecade.png)]
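A barplot of this kind can be derived from the same metadata table. The following is a minimal sketch, not necessarily how playsPerDecade.png was made: it assumes that decades are formed by rounding `yearNormalized` down to a multiple of 10 and that plays without a normalized year are skipped; the function names `count_plays_per_decade` and `plot_plays_per_decade` are hypothetical.

```r
# Count plays per decade: 1836 -> 1830, 1901 -> 1900; NA years are dropped.
count_plays_per_decade <- function(years) {
  decades <- floor(years[!is.na(years)] / 10) * 10
  counts <- table(decades)  # sorted by decade
  data.frame(decade = as.numeric(names(counts)), plays = as.integer(counts))
}

# Draw the barplot from a metadata table with a yearNormalized column.
# ggplot2 is called via its namespace so the counting helper works without it.
plot_plays_per_decade <- function(metadata) {
  counts <- count_plays_per_decade(metadata$yearNormalized)
  ggplot2::ggplot(counts, ggplot2::aes(x = factor(decade), y = plays)) +
    ggplot2::geom_col() +
    ggplot2::labs(x = "Decade", y = "Number of plays")
}
```

With `rusdracor` loaded as in the scatterplot example, `plot_plays_per_decade(rusdracor)` draws the chart. Counting first and plotting with `geom_col()` keeps the per-decade numbers available as a small table for inspection.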

(README last updated on July 26, 2021.)
