LingConLab/data_oral_khakas_corpus

The Spoken corpus of the dialects of Khakas Data Repository

This repository curates the data from the Spoken corpus of the dialects of Khakas and provides an alternative way to access the corpus data locally. The data is stored in data_oral_khakas_corpus.csv, which has 85107 rows and 14 columns:

  • filename
  • time_start
  • time_end
  • speaker
  • recorded
  • sentence_id
  • text
  • translation
  • word_forms
  • morphonology
  • gloss
  • language
  • dataset_creator
  • dataset_provider
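
The column list above can be checked when loading the file locally. The following is a minimal sketch using only the Python standard library. It assumes the file is comma-delimited and UTF-8 encoded, which this README does not state; `load_corpus` and `rows_by_speaker` are hypothetical helper names, not part of the repository.

```python
import csv

# Column names as listed in this README.
EXPECTED_COLUMNS = [
    "filename", "time_start", "time_end", "speaker", "recorded",
    "sentence_id", "text", "translation", "word_forms", "morphonology",
    "gloss", "language", "dataset_creator", "dataset_provider",
]


def load_corpus(path, delimiter=",", encoding="utf-8"):
    """Read the corpus CSV into a list of dicts, one per row.

    The delimiter and encoding defaults are assumptions; adjust them
    if the file turns out to use something else.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        header = reader.fieldnames or []
        # Fail early if the header does not contain the documented columns.
        missing = [name for name in EXPECTED_COLUMNS if name not in header]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        return list(reader)


def rows_by_speaker(rows):
    """Group rows by the value of their 'speaker' column."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["speaker"], []).append(row)
    return grouped
```

Calling `load_corpus("data_oral_khakas_corpus.csv")` from the repository root should return one dictionary per row if the assumptions above hold. All values are read as strings, so fields such as time_start and time_end would need explicit conversion before any arithmetic.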

About corpus

The Spoken corpus of the dialects of Khakas contains transcribed and annotated texts synchronized with the audio. The texts were recorded in the 21st century from speakers born between 1916 and 1985, during various expeditions from Moscow to the Republic of Khakassia. All texts are translated into Russian. The texts were analyzed with an automatic parser, then edited and synchronized with the audio using the ELAN software.

How to cite the corpus and the data

If you use data from the Spoken corpus of the dialects of Khakas in your research, please cite as follows:

Vera Maltseva, Elena Sokur. Spoken corpus of the dialects of Khakas. Moscow: Institute of Linguistics; Moscow: Linguistic Convergence Laboratory, NRU HSE. (Available online at http://lingconlab.ru/spoken_khakas/, accessed on ....)

For questions about the corpus data, write to the address below or open an issue in this repository:

malt.wh@gmail.com (Vera Maltseva)

For questions about the search platform, write to the address below or open an issue in the platform's own repository:

elena.o.sokur@gmail.com (Elena Sokur)