The MC Speech Dataset

This is a public domain speech dataset consisting of 24,018 short audio clips of a single speaker reading sentences in Polish. A transcription is provided for each clip. The clips have a total length of more than 22 hours.
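
As a rough sanity check on the figures above, the clip count and total length give a lower bound on the average clip duration. The sketch below only uses the two numbers stated in this README; the constant and function names are illustrative and not part of the dataset.

```python
# Figures taken from the dataset description above.
NUM_CLIPS = 24018        # number of audio clips
MIN_TOTAL_HOURS = 22     # total length is "more than" this many hours


def min_average_clip_seconds(num_clips: int = NUM_CLIPS,
                             total_hours: float = MIN_TOTAL_HOURS) -> float:
    """Return a lower bound on the mean clip duration in seconds."""
    return total_hours * 3600 / num_clips


if __name__ == "__main__":
    # The real average is somewhat higher, since the total exceeds 22 hours.
    print(f"Average clip length is at least {min_average_clip_seconds():.2f} s")
```

So a typical clip is a little over three seconds long, which is consistent with single short sentences.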

The texts are in the public domain. The audio was recorded in 2021-22 as part of my master's thesis and is also in the public domain.

The dataset is available at:

If you use this dataset, please cite:

@masterthesis{mcspeech,
  title={Analiza porównawcza korpusów nagrań mowy dla celów syntezy mowy w języku polskim},
  author={Czyżnikiewicz, Mateusz},
  year={2022},
  month={December},
  school={Warsaw University of Technology},
  type={Master's thesis},
  doi={10.13140/RG.2.2.26293.24800},
  note={Available at \url{http://dx.doi.org/10.13140/RG.2.2.26293.24800}},
}

Also, if you find this resource helpful, kindly consider leaving a ⭐.
