
Spoken corpus of Bashkir (Rakhmetovo and Baimovo) Data Repository


This repository curates the data from the Spoken corpus of Bashkir (Rakhmetovo and Baimovo). It also offers an alternative way to access the corpus data locally. The data is stored in data_oral_bashkir_corpus.csv, which has 36545 rows and the following 14 columns:

  • filename
  • time_start
  • time_end
  • speaker
  • recorded
  • sentence_id
  • text
  • translation
  • word_forms
  • morphonology
  • gloss
  • language
  • dataset_creator
  • dataset_provider
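Python's standard csv module can read the file. The snippet below is a minimal sketch. This README does not state several details, so the sketch assumes them: the CSV is UTF-8 encoded, comma-delimited, and has a header row with exactly the column names listed above. The helper names load_corpus and group_by_file are hypothetical and are not part of this repository.

```python
import csv
from collections import defaultdict

# Column names as listed in this README (assumed to appear verbatim in the header row).
EXPECTED_COLUMNS = [
    "filename", "time_start", "time_end", "speaker", "recorded",
    "sentence_id", "text", "translation", "word_forms", "morphonology",
    "gloss", "language", "dataset_creator", "dataset_provider",
]


def load_corpus(path="data_oral_bashkir_corpus.csv"):
    """Hypothetical helper: read the corpus CSV into a list of dicts.

    Assumes UTF-8, comma-delimited, with a header row.
    Raises ValueError if any expected column is missing from the header.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return list(reader)


def group_by_file(rows):
    """Hypothetical helper: collect rows that share the same `filename` value.

    Groups keep the order in which filenames first appear in the CSV.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["filename"]].append(row)
    return dict(groups)
```

For example, `rows = load_corpus()` followed by `len(rows)` should give 36545 if the local file matches the description above. `group_by_file(rows)` then gathers the sentences that share the same filename value. The sketch uses plain dicts rather than a dataframe library so that it needs nothing beyond the standard library.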

About corpus

The corpus contains oral texts in Bashkir recorded in 2011–2017 in the villages of Rakhmetovo and Baimovo of the Abzelilovsky district in the Republic of Bashkortostan. These villages are located in the Kubelyak dialect zone. The Kubelyak dialect belongs to the Southeastern subbranch of the Eastern dialect of Bashkir (with some traits of the Southern dialect). The texts of the corpus are close to standard Bashkir, although they manifest some phonetic and morphophonological dialectal features.

How to cite the corpus and the data

If you use data from the Spoken corpus of Bashkir in your research, please cite as follows:

Maria Ovsjannikova, Sergey Say, Ekaterina Aplonova, Anna Smetina, Elena Sokur. Spoken corpus of Bashkir (Rakhmetovo and Baimovo). St. Petersburg: Institute for Linguistic Studies; Moscow: Linguistic Convergence Laboratory, NRU HSE. (Available online at http://lingconlab.ru/spoken_bashkir/, accessed on ...)

For questions about the corpus data, contact the people below or open an issue in this repository:

masha.ovsjannikova@gmail.com (Maria Ovsjannikova)

serjozhka@yahoo.com (Sergey Say)

For questions about the search platform, contact the person below or open an issue in the platform's own repository:

elena.o.sokur@gmail.com (Elena Sokur)