Keyword-Spotting

This project is an ablation study of existing audio processing models for keyword spotting on the Speech Commands dataset. The dataset contains 65,000 one-second utterances of 30 short English words, recorded by thousands of different speakers.

Evaluation curves for all experiments can be found here:

https://wandb.ai/vbagal/speech_commands

How to reproduce the results?

Clone the repo and follow these steps.

Download the dataset using the following command:

python download_dataset.py --save_dir <path_to_save_the_dataset>

To train the baseline:

python main.py --run_name <some_name> --data_path <path_where_dataset_is_saved> --model M5 --input_type waveform --batch_size 512

To try out different approaches, use options such as model, input_type, mixup, do_aug and cyclelr. Weights are saved to a new directory named checkpoints/<some_name>.

To test the model:

python main.py --run_name <same_name_as_before> --data_path <path_where_dataset_is_saved> --ckpt_path <path_to_checkpoint> --model M5 --input_type waveform --batch_size 512 --mode test

Comparison of different approaches

Audio can be fed to the model in several forms: raw waveform, mel spectrogram, or MFCC. With MFCC or mel spectrogram input, the pipeline is the following:

With raw waveform as the input, the Wav2Vec 2.0 architecture by Facebook AI operates as below:

Method	Raw Audio	Mel Spec	MFCC	Test F1
M5	✔️	❌	❌	0.8636
ResNet-18	❌	✔️	❌	0.9246
ResNet-18	❌	❌	✔️	0.9522
EfficientNet-B2	❌	❌	✔️	0.9507
EfficientNet-B4	❌	❌	✔️	0.9558
Wav2Vec 2.0	✔️	❌	❌	0.9746
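
To make the three input types in the table above concrete, here is a minimal NumPy sketch that derives a log-mel spectrogram and MFCCs from a raw waveform. It is an illustration only and not the feature code used in this repo: the frame size (400 samples), hop (160 samples), 40 mel bands, 13 coefficients and all function names are assumptions chosen for the example, and the input is a synthetic tone rather than a real utterance.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Speech Commands clips are sampled at 16 kHz


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel_spectrogram(waveform, n_fft=400, hop=160, n_mels=40):
    """Log-mel spectrogram of a 1-D waveform, shape (n_mels, n_frames)."""
    n_frames = 1 + (len(waveform) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [waveform[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(n_mels, n_fft, SAMPLE_RATE) @ power.T  # (n_mels, n_frames)
    return np.log(mel + 1e-6)  # small offset avoids log(0)


def mfcc(log_mel, n_mfcc=13):
    """Unnormalized DCT-II along the mel axis, keeping the first n_mfcc coefficients."""
    n_mels = log_mel.shape[0]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))  # (n_mfcc, n_mels)
    return dct @ log_mel


# Synthetic one-second 440 Hz tone standing in for an utterance.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
waveform = 0.5 * np.sin(2 * np.pi * 440.0 * t)

log_mel = log_mel_spectrogram(waveform)
coeffs = mfcc(log_mel)
print(waveform.shape, log_mel.shape, coeffs.shape)
```

A model such as M5 or Wav2Vec 2.0 would consume `waveform` directly, while an image-style network such as ResNet-18 would take `log_mel` or `coeffs` as a 2-D input. In practice a library implementation of these transforms would normally be preferred over hand-written code.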

Ablation on ResNet-18

Input	Mixup	Mask Augs	Test F1
Mel Spec	❌	❌	0.9246
Mel Spec	✔️	❌	0.9283
MFCC	✔️	❌	0.9522
MFCC	❌	✔️	0.9510
MFCC	✔️	✔️	0.9410
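
For readers unfamiliar with the two augmentations in the table above, the sketch below shows what they commonly look like. It assumes that "Mixup" means the standard technique of blending two examples and their labels with a Beta-distributed weight, and that "Mask Augs" means SpecAugment-style frequency and time masking. The function names, `alpha`, and the mask widths are hypothetical values for illustration, not the settings used in this repo.

```python
import numpy as np


def mixup(rng, x_a, y_a, x_b, y_b, alpha=0.2):
    """Blend two examples and their one-hot labels with a Beta(alpha, alpha) weight."""
    lam = rng.beta(alpha, alpha)
    mixed_x = lam * x_a + (1.0 - lam) * x_b
    mixed_y = lam * y_a + (1.0 - lam) * y_b
    return mixed_x, mixed_y


def mask_spectrogram(rng, spec, max_freq_width=8, max_time_width=20):
    """Zero out one random band of frequency rows and one random band of time columns."""
    spec = spec.copy()  # leave the caller's array untouched
    n_freq, n_time = spec.shape

    f_width = rng.integers(0, max_freq_width + 1)
    f_start = rng.integers(0, n_freq - f_width + 1)
    spec[f_start : f_start + f_width, :] = 0.0

    t_width = rng.integers(0, max_time_width + 1)
    t_start = rng.integers(0, n_time - t_width + 1)
    spec[:, t_start : t_start + t_width] = 0.0
    return spec


rng = np.random.default_rng(0)
n_classes = 30  # the dataset has 30 words

# Random arrays standing in for two (n_mels, n_frames) feature maps.
spec_a = rng.standard_normal((40, 98))
spec_b = rng.standard_normal((40, 98))
label_a = np.eye(n_classes)[3]
label_b = np.eye(n_classes)[17]

mixed_spec, mixed_label = mixup(rng, spec_a, label_a, spec_b, label_b)
masked_spec = mask_spectrogram(rng, mixed_spec)
print(mixed_label[[3, 17]], masked_spec.shape)
```

The mixed label is a soft target whose two non-zero entries sum to 1, so it would be paired with a loss that accepts class probabilities rather than integer class indices.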
