aau-es-ml/ssl_noise-robust_kws

SSL noise robust KWS

This repository contains code for applying Data2Vec to pretrain the KWT model by Axel Berg as described in Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining. The goal was to improve the noise robustness of keyword spotting when only a small amount of labelled data is available.

Experiments are carried out on a reduced labelled setup of the Google Speech Commands V2 data set. In the reduced setup, 80% of the training set is used for unlabelled pretraining with Data2Vec, and only 20% for labelled training.

The code was developed as part of a semester project at Aalborg University, Aalborg, Denmark. The majority of the code is from, or based on, the code found in the repository data2vec-KWS.

Additionally, the Data2Vec module takes inspiration from another effort to implement Data2Vec in PyTorch found here, as well as the published code in the FAIRSEQ python library.

Setup

The codebase is implemented in Python and tested with Python 3.10. It is recommended to first set up a virtual environment, e.g. using venv (using conda is also a possibility):

python -m venv venv

The necessary Python packages to run the code are installed by running:

pip install -r requirements.txt

To download the Google Speech Commands V2 data set run the command:

bash download_gspeech_v2.sh <path/to/dataset/root>

For example:

bash download_gspeech_v2.sh google_speech_commands

The text files with the data splits of the reduced labelled setup can be generated by running:

python data2vec-KWS-main/make_data_list.py --pretrain <amount_of_train_set_for_pretrain> -v <path/to/validation_list.txt> -t <path/to/testing_list.txt> -d <path/to/dataset/root> -o <output dir>

For example:

python data2vec-KWS-main/make_data_list.py --pretrain 0.8 -v google_speech_commands/validation_list.txt -t google_speech_commands/testing_list.txt -d google_speech_commands -o google_speech_commands/_generated
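Conceptually, the `--pretrain` fraction partitions the training list into an unlabelled pretraining subset and a labelled training subset. A minimal sketch of that idea (a hypothetical helper, not the repository's actual `make_data_list.py` implementation):

```python
import random

def split_train_list(train_files, pretrain_frac=0.8, seed=0):
    # Shuffle deterministically, then cut the list at the pretrain fraction.
    # Sketch only: the repository's make_data_list.py is authoritative.
    files = list(train_files)
    random.Random(seed).shuffle(files)
    cut = int(pretrain_frac * len(files))
    return files[:cut], files[cut:]

pretrain, labelled = split_train_list([f"clip_{i}.wav" for i in range(10)])
print(len(pretrain), len(labelled))  # 8 2
```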

Google Speech Commands

The Google Speech Commands V2 data set consists of 105 829 labelled keyword sequences of approximately 1 s.

The original train:validation:test split is 80:10:10. For the experiments, 80% of the training set was used for unlabelled pretraining and the remaining 20% for labelled training. This yields the following splits:

| Split       | No. keyword examples |
|-------------|----------------------|
| Pretraining | 67 874               |
| Training    | 16 969               |
| Validation  | 9 981                |
| Testing     | 11 005               |
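These numbers are consistent with the original GSC V2 training set size (84 843 examples) and the 80/20 reduction:

```python
total = 105_829                          # labelled keywords in GSC V2
validation, testing = 9_981, 11_005
training = total - validation - testing  # 84 843 in the original split

pretrain = int(0.8 * training)           # unlabelled pretraining subset
labelled = training - pretrain           # reduced labelled training set
print(pretrain, labelled)  # 67874 16969
```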

Create noisy dataset

To generate noisy datasets, the scripts in the noise_gen folder can be used. The script create_noise_folder_snrmix.sh generates the training data, which is 50% clean data and 50% noisy data. The scripts in the test_only folder only create folders with the noisy test data. For all scripts, the paths to the clean Google Speech Commands V2 data set and to the noise data need to be set, along with the desired noise types and SNRs.
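Mixing noise at a requested SNR boils down to a standard scaling computation. A sketch of that computation in NumPy (an illustrative function, not the repository's actual scripts):

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested SNR, then add it to `clean`.
    Sketch only; the noise_gen scripts are the authoritative implementation."""
    # Tile or trim the noise to match the clean signal's length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Target noise power for the given SNR (in dB).
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    scale = np.sqrt(target_noise_power / noise_power)
    return clean + scale * noise
```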

Experiment configuration

The configurations for all experiments are stored in .yaml files, which contain everything from the data paths to hyperparameters. You will need to ensure that the paths in the configuration files match your local data paths.

Configuration files for all experiments on the above-mentioned data set are provided in the data2vec-KWS-main/KWT_configs and data2vec-KWS-main/data2vec/data2vec_configs folders. These include configurations for KWT model baselines on the reduced Speech Commands training set, Data2Vec pretraining on the Speech Commands pretraining set, and finetuning on the reduced Speech Commands training set.
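Such .yaml files can be inspected or loaded with PyYAML before launching a run. The keys in this snippet are illustrative assumptions, not the repository's actual schema:

```python
import yaml  # PyYAML

# Hypothetical example -- consult the files in KWT_configs and
# data2vec_configs for the real field names before editing paths.
example = """
data_root: google_speech_commands
train_list: google_speech_commands/_generated/training_list.txt
hparams:
  batch_size: 512
"""
cfg = yaml.safe_load(example)
print(cfg["data_root"])  # google_speech_commands
```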