
Voice Activity Detection-based Speech Emotion Recognition for children with autism

(c) 2017-2020 Manuel Milling, Alice Baird: Universität Augsburg. Published under GPLv3; see the LICENSE.md file for details.

Please direct any questions or requests to Manuel Milling (milling.manuel@gmail.com).

Citing

If you use the code in this repository in your research work, you are kindly asked to acknowledge its use in your publication.

Milling, Manuel, Alice Baird, Katrin D. Bartl-Pokorny, Shuo Liu, Alyssa M. Alcorn, Jie Shen, Teresa Tavassoli et al. "Evaluating the Impact of Voice Activity Detection on Speech Emotion Recognition for Autistic Children." Frontiers in Computer Science 4 (2022): 837269.

Bibtex

```
@ARTICLE{milling2022_vad-ser,
  AUTHOR={Milling, Manuel and Baird, Alice and Bartl-Pokorny, Katrin D. and Liu, Shuo and Alcorn, Alyssa M. and Shen, Jie and Tavassoli, Teresa and Ainger, Eloise and Pellicano, Elizabeth and Pantic, Maja and Cummins, Nicholas and Schuller, Björn W.},
  TITLE={Evaluating the Impact of Voice Activity Detection on Speech Emotion Recognition for Autistic Children},
  JOURNAL={Frontiers in Computer Science},
  VOLUME={4},
  YEAR={2022},
  URL={https://www.frontiersin.org/article/10.3389/fcomp.2022.837269},
  DOI={10.3389/fcomp.2022.837269},
  ISSN={2624-9898},
}
```

Overview

This repository offers two components, described in more detail in the paper referenced above. The first is a voice activity detection (VAD) component, which can be trained specifically on vocalisations of children with autism, on general vocalisations, or on similar tasks. The second is a continuous speech emotion recognition (SER) component, intended to be used after the data has been preprocessed with the VAD system. Due to privacy regulations, the data used in the referenced paper cannot be published.

Usage

The implementation is based on Python 3.8.10 and TensorFlow 2.3.1. Additional requirements can be installed with `pip install -r requirements.txt`.

Voice Activity Detection

The RNN-LSTM model used for the VAD task can be loaded from src/models/audio_feature_models.py. The model expects input features (10 ms LLDs are recommended), which can be extracted and normalised, for instance, with the opensmile package (https://github.com/audeering/opensmile) and extract_features_command_line.py. The script main_VAD contains the code for creating the dataset-dependent data generator and the data generation, as well as for training, evaluating, and running inference with the models. A sketch of the feature-extraction step is shown below.
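A minimal sketch of the feature-extraction and inference flow, assuming the opensmile Python package is installed. The eGeMAPSv02 feature set, the normalisation statistics, and the loader name `get_vad_model` are placeholders, not the repository's actual API; see src/models/audio_feature_models.py and extract_features_command_line.py for the real entry points.

```python
# Sketch: extract 10 ms LLD features with opensmile and prepare them for the VAD model.
import numpy as np
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,  # assumed feature set; LLDs at 10 ms frames
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)
llds = smile.process_file("example.wav").to_numpy()  # shape: (frames, n_llds)

# Standardise features; in practice the statistics would come from the training set.
llds = (llds - llds.mean(axis=0)) / (llds.std(axis=0) + 1e-8)

# Hypothetical loader for the RNN-LSTM model defined in the repository:
# from src.models.audio_feature_models import get_vad_model  # placeholder name
# model = get_vad_model(input_dim=llds.shape[1])
# frame_probs = model.predict(llds[np.newaxis, ...])  # per-frame speech probabilities
```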

Speech Emotion Recognition

The SER task is implemented in a similar manner to the VAD task, but uses main_ser.py for data generation, training, and evaluation of the models. It differs from the VAD task in its features (eGeMAPS functionals over 1 s windows), its sequence-based data generation, and its CCC-based evaluation.
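Since the continuous emotion predictions are evaluated with the Concordance Correlation Coefficient (CCC), here is a minimal reference implementation of the standard CCC formula; this is a sketch of the metric itself, not code taken from main_ser.py.

```python
import numpy as np

def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Concordance Correlation Coefficient between two 1-D sequences."""
    mean_true, mean_pred = y_true.mean(), y_pred.mean()
    var_true, var_pred = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_true) * (y_pred - mean_pred))
    # CCC = 2*cov / (var_x + var_y + (mean_x - mean_y)^2)
    return 2.0 * cov / (var_true + var_pred + (mean_true - mean_pred) ** 2)
```

Unlike Pearson correlation, the CCC also penalises shifts in mean and scale between prediction and gold standard, which is why it is the common choice for continuous emotion recognition.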
