This repository contains the official resources for the paper:
"Thai Speech Spoofing Detection Dataset with Variations in Speaking Styles"
The Chula Spoofed Speech (CSS) dataset is a large-scale Thai spoofed speech dataset consisting of 1,332,120 utterances, including both bona fide and synthetic speech. Synthetic samples are generated using five high-quality TTS systems, and every synthetic utterance uses the same text as its corresponding real recording. The dataset covers a wide range of speaker ages and speaking styles, making it suitable for research in anti-spoofing and robust speech modeling.
Due to agreements with the voice actors, this dataset is only available for research purposes on a case-by-case basis. We provide:
- 📁 Sample Data (available in this repository under data_samples/)
Inside the data_samples/ folder, you will find two directories:
- Bona fide/: real human speech
- Spoofed/: synthetic speech generated by TTS systems
data_samples/
├── Bona fide/
│   ├── Casual/
│   │   └── *.wav    # WAV files in casual speaking style
│   ├── Excited/
│   │   └── *.wav    # WAV files in excited speaking style
│   └── Formal/
│       └── *.wav    # WAV files in formal speaking style
└── Spoofed/
    ├── Casual/
    │   └── *.wav    # Spoofed WAV files in casual speaking style
    ├── Excited/
    │   └── *.wav    # Spoofed WAV files in excited speaking style
    └── Formal/
        └── *.wav    # Spoofed WAV files in formal speaking style
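To work with the samples programmatically, the layout above can be turned into a flat list of labeled files. The following is a minimal sketch, not part of the official release: the function name `index_samples` and the label strings `"bonafide"` and `"spoof"` are hypothetical choices made for illustration, and the script assumes it is run from the repository root with the directory names spelled exactly as shown in the tree.

```python
from pathlib import Path

# Directory names come from the tree above; the label strings on the
# right-hand side are illustrative and not defined by the dataset.
LABELS = {"Bona fide": "bonafide", "Spoofed": "spoof"}
STYLES = ("Casual", "Excited", "Formal")


def index_samples(root):
    """Return (path, label, style) tuples for every .wav file under root."""
    root = Path(root)
    rows = []
    for class_dir, label in LABELS.items():
        for style in STYLES:
            style_dir = root / class_dir / style
            if not style_dir.is_dir():
                continue  # skip folders missing from a partial checkout
            for wav in sorted(style_dir.glob("*.wav")):
                rows.append((wav, label, style))
    return rows


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    for path, label, style in index_samples("data_samples"):
        print(f"{label}\t{style}\t{path}")
```

Each tuple pairs a file path with its class and speaking style, which is usually enough to build a train or evaluation manifest for an anti-spoofing model. Files that do not end in `.wav` are ignored.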
If you would like full access to the complete CSS dataset, please contact the author directly with a brief description of your research purpose.
📧 Contact: ekapol.c@chula.ac.th
📄 Access policy: For academic research only.
This research was jointly supported by the PMU-C grant (C05F660049) and Amity Accentix Co., Ltd.
We would also like to express our sincere gratitude to all the voice actors who generously contributed their time and talent, making this project possible. The following individuals graciously provided their voices: สิรภพ ศรีเสาวนันท์, นิติ ธีรวิโรจน์, โสมฤทัย สอดส่อง, วริศรา เผือกผ่อง, พีรกานต์ สยาม, มธุรส นิ่มวิจิตร, ศิรสา ชลายนานนท์, จตุรงค์ ยามีภักดี, สุธีวัฒน์ ภิญโญ, นนที ถาวรพันธุ์, อนุชิต มณีชัย, ธรรมพร คำเคน, สิรภพ บาฬี, ชาณิช ชโลธรกรธวัช, ณัฐพงศ์ สุทธิไชย, อาทิตยา คำภีระ, and ปิยนุช ตันตระกูล, as well as others who chose to remain anonymous. Their contributions were essential to the success of this work.