Free Spoken Digit Dataset (FSDD)
A simple audio/speech dataset consisting of recordings of spoken digits in
wav files at 8kHz. The recordings are trimmed so that they have near-minimal silence at the beginning and end.
FSDD is an open dataset, which means it will grow over time as data is contributed. To enable reproducibility and accurate citation in scientific journals, the dataset is versioned using
- 3 speakers
- 1,500 recordings (50 of each digit per speaker)
- English pronunciations
Files are named in the following format:
Please contribute your homemade recordings. All recordings should be mono 8kHz
wav files and be trimmed to have minimal silence. Don't forget to update
metadata.py with the speaker meta-data.
To add your data, follow the recording instructions in
and then run
split_and_label_numbers.py to make your files.
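Before submitting, it can help to confirm that a recording meets the mono 8kHz requirement stated above. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_recording` is hypothetical and is not part of this repository.

```python
import wave

EXPECTED_RATE_HZ = 8000  # sample rate required by the contribution guidelines
EXPECTED_CHANNELS = 1    # mono


def check_recording(path):
    """Return a list of problems found; an empty list means the file looks OK.

    Hypothetical helper for illustration, not part of the FSDD utilities.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected mono, found {channels} channels")
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"expected {EXPECTED_RATE_HZ} Hz, found {rate} Hz")
    return problems
```

This only inspects the wav header. It does not check whether the silence has been trimmed.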
metadata.py contains meta-data regarding the speakers' gender and accents.
Trims silence at the beginning and end of an audio file. Splits an audio file into multiple audio files by periods of silence.
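To illustrate the idea of trimming and splitting by silence, here is a minimal sketch that works on a plain list of samples normalized to the range -1.0 to 1.0. It is not the repository's implementation: the function names, the amplitude `threshold`, and the `min_gap` parameter are all assumptions chosen for illustration.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose magnitude is below threshold."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def split_on_silence(samples, threshold=0.01, min_gap=3):
    """Split samples into segments separated by at least min_gap quiet samples.

    Shorter quiet runs are kept inside a segment.
    """
    segments = []
    current = []    # samples of the segment being built
    quiet_run = 0   # number of consecutive quiet samples seen so far
    for sample in samples:
        if abs(sample) < threshold:
            quiet_run += 1
            if current:
                current.append(sample)
        else:
            quiet_run = 0
            current.append(sample)
        if current and quiet_run >= min_gap:
            # The gap is long enough: close the segment without its quiet tail.
            segments.append(trim_silence(current, threshold))
            current = []
    if current:
        segments.append(trim_silence(current, threshold))
    return segments
```

A fixed amplitude threshold is the simplest possible silence detector. Real recordings with background noise usually need an energy measure over short windows instead.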
A simple class that provides an easy-to-use API to access the data.
Used for creating spectrograms of the audio data. Spectrograms are often a useful pre-processing step.
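As a rough illustration of what a spectrogram computes, the sketch below slices a signal into overlapping frames, applies a Hann window, and takes the magnitude of a discrete Fourier transform per frame. It uses only the standard library and a direct DFT, so it is slow and meant for reading, not for production. It is not the repository's spectrogram code, and the `frame_size` and `hop` defaults are assumptions.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of DFT magnitudes (bins 0..frame_size//2) per frame."""
    # Hann window reduces spectral leakage at the frame edges.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
        for n in range(frame_size)
    ]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            # Direct DFT for bin k; an FFT library would do this much faster.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames
```

At 8kHz with `frame_size=64`, each bin spans 125 Hz, so a 1000 Hz tone shows up as a peak in bin 8.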
The test set officially consists of the first 10% of the recordings. Recordings numbered
0-4 (inclusive) are in the test set and
recordings numbered 5-49 are in the training set.
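The split rule above can be sketched as a small helper that takes a recording number and reports which set it belongs to. The function and constant names are hypothetical. Extracting the recording number from a file name is deliberately left out, since that depends on the naming format.

```python
TEST_INDEX_CUTOFF = 5  # recordings 0-4 are test, 5-49 are training


def is_test_recording(index):
    """Return True if a recording number falls in the official test set."""
    return index < TEST_INDEX_CUTOFF


def partition(indices):
    """Split recording numbers into (train, test) lists, preserving order."""
    train = [i for i in indices if not is_test_recording(i)]
    test = [i for i in indices if is_test_recording(i)]
    return train, test


train, test = partition(range(50))
print(len(train), len(test))
```

With 50 recordings numbered 0-49, this yields 5 test recordings and 45 training recordings, which matches the 10% figure.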
Made with FSDD
Did you use FSDD in a paper, project or app? Add it here!
- https://adhishthite.github.io/sound-mnist/ by Adhish Thite (https://adhishthite.github.io/)
- C#/.NET. The FSDD dataset can be used in .NET applications using the FreeSpokenDigitsDataset class included within the Accord.NET Framework. A basic example of how to perform spoken digit classification using audio MFCC features can be found here.