This repository curates the data of the Spoken corpus of Bashkir (Rakhmetovo and Baimovo) and provides an alternative way to access the corpus data locally. The data is stored in data_oral_bashkir_corpus.csv, which has 36545 rows and 14 columns:
filename
time_start
time_end
speaker
recorded
sentence_id
text
translation
word_forms
morphonology
gloss
language
dataset_creator
dataset_provider
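
For local access, the file can be read with Python's standard library. The following is a minimal sketch rather than an official loader. It assumes that the file is comma-separated, UTF-8 encoded, and has a header row using exactly the column names listed above. The helper names `load_corpus` and `rows_per_speaker` are hypothetical and are not part of this repository.

```python
import csv
from collections import Counter

# Column names as listed in this README. That they appear verbatim as the
# CSV header row is an assumption.
EXPECTED_COLUMNS = [
    "filename", "time_start", "time_end", "speaker", "recorded",
    "sentence_id", "text", "translation", "word_forms", "morphonology",
    "gloss", "language", "dataset_creator", "dataset_provider",
]


def load_corpus(path="data_oral_bashkir_corpus.csv"):
    """Read the corpus file into a list of dicts, one dict per CSV row."""
    # Assumption: UTF-8 encoding and the default comma delimiter.
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in EXPECTED_COLUMNS if name not in header]
        if missing:
            raise ValueError(f"Columns missing from {path}: {missing}")
        return list(reader)


def rows_per_speaker(rows):
    """Count how many rows are attributed to each value of `speaker`."""
    return Counter(row["speaker"] for row in rows)
```

With the CSV file in the working directory, `rows = load_corpus()` followed by `rows_per_speaker(rows).most_common(5)` would list the five speakers with the most rows. All values are read as strings, so fields such as `time_start` and `time_end` need explicit conversion before any arithmetic.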
The corpus contains oral texts in Bashkir recorded in 2011–2017 in the villages of Rakhmetovo and Baimovo of the Abzelilovsky district in the Republic of Bashkortostan. These villages are located in the Kubelyak dialect zone. The Kubelyak dialect belongs to the Southeastern subbranch of the Eastern dialect of Bashkir (with some traits of the Southern dialect). The texts of the corpus are close to standard Bashkir, although they manifest some phonetic and morphophonological dialectal features.
If you use data from the Spoken corpus of Bashkir in your research, please cite as follows:
Maria Ovsjannikova, Sergey Say, Ekaterina Aplonova, Anna Smetina, Elena Sokur. Spoken corpus of Bashkir (Rakhmetovo and Baimovo). St. Petersburg: Institute for linguistic studies; Moscow: Linguistic Convergence Laboratory, NRU HSE. (Available online at http://lingconlab.ru/spoken_bashkir/, accessed on ...)
For questions about the corpus data, contact one of the following people or open an issue in this repository:
masha.ovsjannikova@gmail.com (Maria Ovsjannikova)
serjozhka@yahoo.com (Sergey Say)
For questions about the search platform, contact the following person or open an issue in the platform's own repository:
elena.o.sokur@gmail.com (Elena Sokur)