An official repository of CSS: Chula Spoofed Speech (CSS) dataset — a large-scale Thai spoofed speech dataset.

SLSCU/CSS

CSS - Chula Spoofed Speech Dataset

Overview

This repository contains the official resources for the paper:

"Thai Speech Spoofing Detection Dataset with Variations in Speaking Styles"

The Chula Spoofed Speech (CSS) dataset is a large-scale Thai spoofed speech dataset consisting of 1,332,120 utterances, including both bona fide and synthetic speech. The synthetic samples are generated with five high-quality TTS systems, and every synthetic utterance uses the same text as the corresponding real recording. The dataset covers a wide range of speaker ages and speaking styles, making it suitable for research in anti-spoofing and robust speech modeling.


Demo

🎧 👉 Demo Page (Audio Samples)


📂 Dataset Access

Due to agreements with the voice actors, the full dataset is available only for research purposes, on a case-by-case basis. We provide:

  • 📁 Sample Data (available in this repository under data_samples/)

Inside the data_samples/ folder, you will find two directories:

  • Bona fide/ — real human speech
  • Spoofed/ — synthetic speech generated by TTS systems

Each directory is split by speaking style:

data_samples/
├── Bona fide/
│   ├── Casual/
│   │   └── *.wav           # WAV files in casual speaking style
│   ├── Excited/
│   │   └── *.wav           # WAV files in excited speaking style
│   └── Formal/
│       └── *.wav           # WAV files in formal speaking style
└── Spoofed/
    ├── Casual/
    │   └── *.wav           # Spoofed WAV files in casual speaking style
    ├── Excited/
    │   └── *.wav           # Spoofed WAV files in excited speaking style
    └── Formal/
        └── *.wav           # Spoofed WAV files in formal speaking style
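
The directory layout above can be turned into a labeled file list with a few lines of Python. This is a minimal sketch, not an official loader: the function name `index_samples` and the label strings `"bonafide"` and `"spoof"` are illustrative assumptions, and the code relies only on the folder names shown in the tree.

```python
from pathlib import Path

# Folder name -> label. The label strings are hypothetical, chosen for illustration.
LABELS = {"Bona fide": "bonafide", "Spoofed": "spoof"}
# Speaking-style subfolders, as shown in the directory tree.
STYLES = ("Casual", "Excited", "Formal")


def index_samples(root):
    """Return a list of (path, label, style) tuples for every WAV file under root."""
    root = Path(root)
    rows = []
    for folder, label in LABELS.items():
        for style in STYLES:
            style_dir = root / folder / style
            if not style_dir.is_dir():
                continue  # tolerate a partial copy of the sample data
            for wav in sorted(style_dir.glob("*.wav")):
                rows.append((wav, label, style))
    return rows


if __name__ == "__main__":
    from collections import Counter

    rows = index_samples("data_samples")
    counts = Counter((label, style) for _, label, style in rows)
    for (label, style), n in sorted(counts.items()):
        print(f"{label:9s} {style:8s} {n}")
```

Run from the repository root, the script prints the number of sample files per label and speaking style. Note that the `Bona fide` folder name contains a space, so building paths with `pathlib` is safer than concatenating strings in a shell command.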

If you would like full access to the complete CSS dataset, please contact the author directly with a brief description of your research purpose.

📧 Contact: ekapol.c@chula.ac.th

📄 Access policy: For academic research only.


Acknowledgement

This research was jointly supported by the PMU-C grant (C05F660049) and Amity Accentix Co., Ltd.

We would also like to express our sincere gratitude to all the voice actors who generously contributed their time and talent, making this project possible. The following individuals graciously provided their voices: สิรภพ ศรีเสาวนันท์, นิติ ธีรวิโรจน์, โสมฤทัย สอดส่อง, วริศรา เผือกผ่อง, พีรกานต์ สยาม, มธุรส นิ่มวิจิตร, ศิรสา ชลายนานนท์, จตุรงค์ ยามีภักดี, สุธีวัฒน์ ภิญโญ, นนที ถาวรพันธุ์, อนุชิต มณีชัย, ธรรมพร คำเคน, สิรภพ บาฬี, ชาณิช ชโลธรกรธวัช, ณัฐพงศ์ สุทธิไชย, อาทิตยา คำภีระ, and ปิยนุช ตันตระกูล, as well as others who chose to remain anonymous. Their contributions were essential to the success of this work.
