czyzi0/the-mc-speech-dataset

The MC Speech Dataset

This is a public domain speech dataset consisting of 24018 short audio clips of a single speaker reading sentences in Polish. A transcription is provided for each clip. The clips have a total length of more than 22 hours.
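As a rough sanity check on what "short" means here, the two figures above give a lower bound on the mean clip length. The snippet is a minimal sketch that uses only the numbers quoted in this README. The variable names are illustrative, and because the total is stated only as "more than 22 hours", the real mean is somewhat higher.

```python
# Figures quoted in the README; the total duration is a lower bound.
NUM_CLIPS = 24018
MIN_TOTAL_HOURS = 22

# Convert hours to seconds, then divide by the number of clips.
min_total_seconds = MIN_TOTAL_HOURS * 3600
min_mean_clip_seconds = min_total_seconds / NUM_CLIPS

print(f"Mean clip length is at least {min_mean_clip_seconds:.2f} s")
```

So a typical clip is a few seconds long, roughly one short sentence.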

The texts are in the public domain. The audio was recorded in 2021-22 as part of my master's thesis and is also in the public domain.

The dataset is available at:

If you use this dataset, please cite:

@mastersthesis{mcspeech,
  title={Analiza porównawcza korpusów nagrań mowy dla celów syntezy mowy w języku polskim},
  author={Czyżnikiewicz, Mateusz},
  year={2022},
  month={December},
  school={Warsaw University of Technology},
  type={Master's thesis},
  doi={10.13140/RG.2.2.26293.24800},
  note={Available at \url{http://dx.doi.org/10.13140/RG.2.2.26293.24800}},
}

Also, if you find this resource helpful, kindly consider leaving a ⭐.
