Skip to content

A repo for paper: Self-supervised Spoofing Audio Detection Scheme

Notifications You must be signed in to change notification settings

Zain-Jiang/SSAD

Repository files navigation

Self-supervised Spoofing Audio Detection Scheme

This repository contains the implementations of SSAD,which is a speech waveform encoder trained in a self-supervised manner with the so called worker framework. A SSAD model can be used as a speech feature extractor or a pre-trained encoder for our spoofing audio detection task.

Requirements

  • PyTorch 1.0 or higher
  • Torchvision 0.2 or higher
  • Install the requirements from requirements.txt: pip install -r requirements.txt
  • Intall SSAD modules python: setup.py install NOTE: Edit the cupy-cuda100 requirement in the file if needed depending on your CUDA version. Defaults to 10.0 now
pip install -r requirements.txt # install the requirements
export PYTHONPATH=. #use this phrase to set the root path whenever you start

Pre-trained Model

This is the valid_loss in tensorboard:

image-20201013001612361

The pretrained SSAD encoder model has been trained for the 99 epochs (in my experience, you can trained it for 200 epochs for best appearance), and the classifier is casually trained by me. The results and URLs are as follows.

min-tDCF EER classifier's name
0.1643 6.3869 % SENet12

URLs: https://pan.baidu.com/s/1fx-fk2rOPWMNgLRlwJ33JA passcode: 4j1b

Remember to modify the path and config in the bash scripts below.

Data preparation

To make the data preparation the following files have to be provided:

  • training files list train_scp: contains a file name per line (without directory names), including .wav/mp3/etc. extension.
  • test files list test_scp: contains a wav file name per line (without directory names), including .wav/mp3/etc. extension.
  • dictionary with filename dict.npy-> integer speaker class (speaker id) correspondence (same filenames as in train/test lists).
  • data.cfg: the dataset config file
  • stats_pase+.pkl: the normalization statistics for the workers to work properly
sbatch -A yzren -p gpu --gres=gpu:1 -c 16 preprocess/preprocess.sh

#time: about 50 minutes

Train SSAD encoder

sbatch -A yzren -p gpu --gres=gpu:1 -c 16 train.sh

#time: about 2 days for 100 epochs

Extract features for classifier

sbatch -A yzren -p gpu --gres=gpu:1 -c 16 feature.sh

Train classifier

Operations for classifier should be made in the directory named ADV, and you should modify the config file: ADV/_configs/config_LA_SENet12_LPSseg_uf_seg600.json

cd ADV
sbatch -A yzren -p gpu --gres=gpu:1 -c 16 run_train.sh

Evaluation

sbatch -A yzren -p gpu --gres=gpu:1 -c 16 run_eval.sh

About

A repo for paper: Self-supervised Spoofing Audio Detection Scheme

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published