
Personalization for Robust Voice Pathology Detection in Sound Waves

The official repository for paper: Personalization for Robust Voice Pathology Detection in Sound Waves.

Accepted to the 24th INTERSPEECH Conference.

Overview

Automatic voice pathology detection from sound signals is promising for non-invasive screening and early intervention. Nevertheless, existing methods are susceptible to covariate shifts caused by background noise, human voice variation, and data selection biases, leading to severe performance degradation in real-world scenarios. Hence, we propose a non-invasive framework that contrastively learns personalization from sound waves as a pre-training step and predicts latent-space profile features through semi-supervised learning. It allows subjects from various distributions (e.g., region, gender, age) to benefit from personalized predictions for robust voice pathology detection in a privacy-preserving manner. We extensively evaluate the framework on four real-world respiratory illness datasets, including Coswara, COUGHVID, ICBHI, and our private dataset, ASound, under multiple covariate shift settings (i.e., cross-dataset), improving overall performance by up to 4.12%.

About this implementation

This implementation is based on the fairseq toolkit. Our main source code was written in the following directory:

Requirements and Installation

Please follow the fairseq installation instructions to install the framework.

Additionally, install librosa and soundfile:

pip install librosa soundfile

Training

Preparing data

Create the corresponding metadata (including the sample spectrum file path, number of frequency bands, number of time steps, label, and optional profile ID) for your dataset.

Place the meta information inside data directory.
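The two steps above can be sketched as follows. The TSV layout (column order, header names, file name) is an assumption for illustration; check the repository's data-loading code for the exact manifest format it expects.

```python
# Sketch: write a metadata manifest with the fields listed above.
# Column order and header names are assumptions, not the repo's exact format.
import csv
from pathlib import Path

records = [
    # (spectrum file path, freq bands, time steps, label, profile id - optional)
    ("spectra/subj01_cough.npy", 64, 312, "pathological", "subj01"),
    ("spectra/subj02_cough.npy", 64, 287, "healthy", "subj02"),
]

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
with open(data_dir / "train.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["path", "n_freq", "n_time", "label", "profile_id"])
    writer.writerows(records)
```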

Pre-training

Downstream task pre-training:

fairseq-hydra-train task.data=$pretrain_data --config-dir examples/RoPADet/config/pretraining --config-name general_pretrain

Profile encoder pre-training:

fairseq-hydra-train task.data=$pretrain_data --config-dir examples/RoPADet/config/pretraining --config-name discriminative

Profile encoding:

python examples/RoPADet/profiles_gen.py data/ --path $model_path

Self-training:

In each iteration, run:

python examples/RoPADet/profiles_gen.py data/ --path $teacher_path
CUDA_VISIBLE_DEVICES="3" fairseq-hydra-train task.data=$self_train_data task.profiles_path=$teacher_profile checkpoint.finetune_from_model=$pretrained_model --config-dir examples/RoPADet/config/finetuning --config-name profile_self_training
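The alternation above (regenerate teacher profiles, then fine-tune a student on them) repeats for several rounds, with each student becoming the next round's teacher. A sketch of that loop is below; checkpoint paths, the iteration count, and the shell placeholders are assumptions, not values fixed by the repository.

```python
# Sketch of the self-training loop: each round regenerates profile features
# with the current teacher, then fine-tunes a student on them.
# Checkpoint names and the number of rounds are placeholders/assumptions.
import shlex

def round_commands(teacher_ckpt: str) -> list:
    """Return the two commands for one self-training round."""
    gen = f"python examples/RoPADet/profiles_gen.py data/ --path {teacher_ckpt}"
    train = (
        "fairseq-hydra-train task.data=$self_train_data "
        "task.profiles_path=$teacher_profile "
        "checkpoint.finetune_from_model=$pretrained_model "
        "--config-dir examples/RoPADet/config/finetuning "
        "--config-name profile_self_training"
    )
    return [shlex.split(gen), shlex.split(train)]

for i in range(3):  # number of rounds is a choice, not fixed by the repo
    for cmd in round_commands(f"checkpoints/teacher_round{i}.pt"):
        print(" ".join(cmd))  # replace print with subprocess.run(cmd) to execute
```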

Fine-tuning

Without personalization:

fairseq-hydra-train task.data=$meta_dir_path model.w2v_path=$pretrained_model --config-dir examples/RoPADet/config/finetuning --config-name without_profile

With personalization:

fairseq-hydra-train task.data=$meta_dir_path model.w2v_path=$pretrained_model task.profiling=True task.profiles_path=$profile_path --config-dir examples/RoPADet/config/finetuning --config-name with_profile

Evaluation

Model without personalization:

python examples/RoPADet/eval_classifier.py data --labels label --input_file $meta_dir_path

Model with personalization:

python examples/RoPADet/eval_classifier_profile.py data --labels label --input_file $meta_dir_path --profile_path $profile_path

Data availability

Our private dataset, ASound, is available upon request for research purposes. Please send your request, including details about the intended research use and your affiliation, to tungtk2@fpt.com.

