Bangla-Language-Processing/Bangla-Speech-Corpora

Bangla Speech Corpora

A cleaned Bangla speech corpus, developed in 2009 specifically for Bangla text-to-speech (TTS). It was originally hosted on SourceForge.

Characteristics of the corpus

This dataset consists of three corpora, each developed for a different purpose:

  • “Corpus for acoustic analysis” was developed for acoustic analysis of the Bangla phoneme inventory.
  • “Diphone corpus” was developed for diphone-concatenation-based speech synthesis.
  • “Continuous speech corpus” was developed for intonation modeling and unit-selection-based speech synthesis.

Other characteristics include:

  • Application areas: Speech synthesis, phonetic research, and speech recognition.
  • Spoken content: Selected using two approaches, domain and phonological distribution.
  • Professional recording studio: Needed to obtain a clean acoustic signal from which reliable acoustic information can be extracted.
  • Speaking style: Continuous read speech.
  • Manual segmentation: This requires a significant amount of effort, but it ensures the accuracy of the labeling.
  • Recording setup: Supervised onsite recording.

Download

Because the corpora are large (4.4 GB), the data is hosted on Mendeley and also remains available on SourceForge.

Option 1: Download from the Mendeley page.

Option 2: Download from SourceForge.

Please cite this paper:

Firoj Alam, S. M. Murtoza Habib, Dil Afroza Sultana and Mumit Khan, Development of Annotated Bangla Speech Corpora, Spoken Language Technologies for Under-resourced Languages (SLTU’10), vol. 1, pp. 35-41, Penang, Malaysia, May 3-5, 2010.

@inproceedings{alam2010development,
  title={Development of annotated Bangla speech corpora},
  author={Alam, Firoj and Habib, SM Murtoza and Sultana, Dil Afroza and Khan, Mumit},
  booktitle={Spoken Language Technologies for Under-Resourced Languages},
  pages={35--41},
  address={Penang, Malaysia},
  year={2010}
}
