CSS - Chula Spoofed Speech Dataset

Overview

This repository contains the official resources for the paper:

"Thai Speech Spoofing Detection Dataset with Variations in Speaking Styles"

The Chula Spoofed Speech (CSS) dataset is a large-scale Thai spoofed speech dataset consisting of 1,332,120 utterances, including both bona fide and synthetic speech. Synthetic samples are generated using five high-quality TTS systems, and each synthetic utterance uses the same text as a real recording. The dataset covers a wide range of speaker ages and speaking styles, making it suitable for research in anti-spoofing and robust speech modeling.

Demo

🎧 👉 Demo Page (Audio Samples)

📂 Dataset Access

Due to agreements with the voice actors, this dataset is available only for research purposes, on a case-by-case basis. We provide:

📁 Sample Data (available in this repository under data_samples/)

Inside the data_samples/ folder, you will find two directories:

Bona fide/ — real human speech
Spoofed/ — synthetic speech generated by TTS systems

data_samples/
├── Bona fide/
│   ├── Casual/
│   │   └── *.wav           # WAV files in casual speaking style
│   ├── Excited/
│   │   └── *.wav           # WAV files in excited speaking style
│   └── Formal/
│       └── *.wav           # WAV files in formal speaking style
└── Spoofed/
    ├── Casual/
    │   └── *.wav           # Spoofed WAV files in casual speaking style
    ├── Excited/
    │   └── *.wav           # Spoofed WAV files in excited speaking style
    └── Formal/
        └── *.wav           # Spoofed WAV files in formal speaking style
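The layout above can be indexed with a few lines of Python. This is a minimal sketch, not an official loader: the function name `index_samples` and the label strings `bonafide` and `spoof` are illustrative assumptions, and only the folder names are taken from the tree.

```python
from pathlib import Path

# Folder names follow the tree above; the label strings are illustrative assumptions.
LABEL_DIRS = {"Bona fide": "bonafide", "Spoofed": "spoof"}
STYLES = ("Casual", "Excited", "Formal")


def index_samples(root):
    """Return a list of (label, style, path) for every WAV file under root."""
    root = Path(root)
    rows = []
    for dir_name, label in LABEL_DIRS.items():
        for style in STYLES:
            style_dir = root / dir_name / style
            if not style_dir.is_dir():
                continue  # skip folders that are missing from a partial checkout
            for wav in sorted(style_dir.glob("*.wav")):
                rows.append((label, style, wav))
    return rows


if __name__ == "__main__":
    for label, style, wav in index_samples("data_samples"):
        print(label, style, wav.name)
```

Each returned tuple pairs a file with its class and speaking style, which is usually enough to build a train or evaluation manifest for a spoofing detector.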

If you would like full access to the complete CSS dataset, please contact the author directly with a brief description of your research purpose.

📧 Contact: ekapol.c@chula.ac.th

📄 Access policy: For academic research only.

Acknowledgement

This research was jointly supported by the PMU-C grant (C05F660049) and Amity Accentix Co., Ltd.

We would also like to express our sincere gratitude to all the voice actors who generously contributed their time and talent, making this project possible. The following individuals graciously provided their voices: สิรภพ ศรีเสาวนันท์, นิติ ธีรวิโรจน์, โสมฤทัย สอดส่อง, วริศรา เผือกผ่อง, พีรกานต์ สยาม, มธุรส นิ่มวิจิตร, ศิรสา ชลายนานนท์, จตุรงค์ ยามีภักดี, สุธีวัฒน์ ภิญโญ, นนที ถาวรพันธุ์, อนุชิต มณีชัย, ธรรมพร คำเคน, สิรภพ บาฬี, ชาณิช ชโลธรกรธวัช, ณัฐพงศ์ สุทธิไชย, อาทิตยา คำภีระ, and ปิยนุช ตันตระกูล, as well as others who chose to remain anonymous. Their contributions were essential to the success of this work.
