Awesome ASV Anti-Spoofing

Overview

This is a curated list of awesome ASV (Automatic Speaker Verification) anti-spoofing papers, libraries, datasets, and other resources.

The purpose of this repo is to organize the world’s resources for voice anti-spoofing, and make them universally accessible and useful.

To add items to this page, simply send a pull request. (contributing guide)

Publications

Special topics

Review & survey papers

Fast & light anti-spoofing

Anti-spoofing with phoneme

Anti-spoofing with brain

The Crux of Voice (In)Security: A Brain Study of Speaker Legitimacy Detection, 2019

Anti-spoofing with fieldprint

Anti-spoofing with articulatory gesture

Anti-spoofing with Multi-task

Anti-spoofing with smarthome

Protecting Voice Controlled Systems Using Sound Source Identification Based on Acoustic Cues, 2018

Challenges

Spoofing attack type

Self trigger attack

Replay attack

Impostor attack

Other

2021

Data Quality as Predictor of Voice Anti-Spoofing Generalization

2020

2019

2018

2017

2016

2015

2014

Anti-spoofing: voice databases

2013

Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints

1999

Vulnerability In Speaker Verification - A Study Of Technical Impostor Techniques

Software

Framework

Link	Language	Description
SpeechBrain	Python & PyTorch	SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.
SIDEKIT	Python	SIDEKIT is an open-source package that allows rapid prototyping of an end-to-end speaker recognition system.
pyAudioAnalysis	Python	Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications.
kaldi-asr	C++ & Bash	A toolkit for speech & speaker recognition, intended for use by researchers and professionals.
Alize LIA_SpkDet	C++	ALIZÉ is an open-source platform for speaker recognition. LIA_SpkSeg is the set of tools for model training, feature normalization, score normalization, etc.
SPEAR Toolkit (based on BOB)	Python	This package is part of the signal-processing and machine learning toolbox Bob.
MSR Identity Toolbox	Matlab	This toolbox contains a collection of MATLAB tools and routines that can be used for research and development in speaker recognition. PDF

Evaluation

Clustering

Link	Language	Description
sklearn.cluster	Python	scikit-learn clustering algorithms.
PLDA	Python	Probabilistic Linear Discriminant Analysis & classification, written in Python.
PLDA	C++	Open-source implementation of simplified PLDA (Probabilistic Linear Discriminant Analysis).
Auto-Tuning Spectral Clustering	Python	Auto-tuning Spectral Clustering method that does not need development set or supervised tuning.

Speaker embedding

Link	Method	Language	Description
resemble-ai/Resemblyzer	d-vector	Python & PyTorch	PyTorch implementation of generalized end-to-end loss for speaker verification, which can be used for voice cloning and diarization.
Speaker_Verification	d-vector	Python & TensorFlow	TensorFlow implementation of generalized end-to-end loss for speaker verification.
PyTorch_Speaker_Verification	d-vector	Python & PyTorch	PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al. With UIS-RNN integration.
Real-Time Voice Cloning	d-vector	Python & PyTorch	Implementation of "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" (SV2TTS) with a vocoder that works in real-time.
deep-speaker	d-vector	Python & Keras	Third party implementation of the Baidu paper Deep Speaker: an End-to-End Neural Speaker Embedding System.
x-vector-kaldi-tf	x-vector	Python & TensorFlow & Perl	TensorFlow implementation of x-vector topology on top of Kaldi recipe.
kaldi-ivector	i-vector	C++ & Perl	Extension to Kaldi implementing the standard i-vector hyperparameter estimation and i-vector extraction procedure.
voxceleb-ivector	i-vector	Perl	Voxceleb1 i-vector based speaker recognition system.
pytorch_xvectors	x-vector	Python & PyTorch	PyTorch implementation of Voxceleb x-vectors. Additionally, includes meta-learning architectures for embedding training. Evaluated with speaker diarization and speaker verification.
ASVtorch	i-vector	Python & PyTorch	ASVtorch is a toolkit for automatic speaker recognition.
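
Embeddings such as the d-vectors, x-vectors, and i-vectors produced by the tools above are commonly compared with cosine similarity. The following is a minimal sketch under that assumption, not the scoring code of any listed project; the function names and the `0.75` threshold are hypothetical and would need tuning on development data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify(enrolled, test, threshold=0.75):
    """Accept the trial when the similarity reaches the threshold.

    The default threshold is a hypothetical placeholder, not a recommended value.
    """
    return cosine_similarity(enrolled, test) >= threshold
```

In practice the two vectors would come from one of the embedding extractors in the table, and the threshold would be chosen to meet a target error rate.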

Audio feature extraction

Link	Language	Description
LibROSA	Python	Python library for audio and music analysis. https://librosa.github.io/
python_speech_features	Python	This library provides common speech features for ASR including MFCCs and filterbank energies. https://python-speech-features.readthedocs.io/en/latest/
pyAudioAnalysis	Python	Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications.

Audio data augmentation

Link	Language	Description
pyroomacoustics	Python	Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios. https://pyroomacoustics.readthedocs.io
gpuRIR	Python	Python library for Room Impulse Response (RIR) simulation with GPU acceleration
rir_simulator_python	Python	Room impulse response simulator using python

Other software

Link	Language	Description
Rawnet2	Python & Bash	End-to-End Neural Anti-spoofing.
ReMASC	Python	Realistic Replay Attack Corpus for Voice Controlled Systems.
Attentive-Filtering-Network	Python & Bash	University of Edinburgh-Johns Hopkins University's system for ASVspoof 2017 Version 2.0 dataset.

Datasets

Spoofing datasets

Audio	Type	Language	Pricing	Additional information
ASVspoof 2019	PA (16.44Gb), LA (7.116Gb)	en	Free	Evaluation Plan
ASVspoof 2017	PA-Train (200.7Mb), Dev (133.7Mb), Eval (1.065Gb)	en	Free	Evaluation Plan
SAS Corpus	LA-SS_LARGE-16k (7.591Gb), SS_LARGE-48k (7.798Gb), SS_MARY_LARGE (7.303Gb), SS_SMALL-16k (7.582Gb), SS_SMALL-48k (7.788Gb), VC_C1 (10.00Gb), VC_EVC (6.518Gb), VC_FESTVOX (10.04Gb), VC_FS (10.15Gb), VC_GMM (9.830Gb), VC_KPLS (9.703Gb), VC_LSP (9.616Gb), VC_TVC (6.489Gb), human (3.229Gb)	en	Free	LICENSE
ASVspoof 2015	LA-Data - Part aa (7.543Gb), Data - Part ab (7.543Gb), Data - Part ac (7.331Gb)	en	Free	LICENSE
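
Countermeasures trained on corpora like these are often compared by their equal error rate (EER), the operating point where the rate of accepted spoofs equals the rate of rejected bona fide trials. The sketch below is an illustrative, brute-force estimate and not the official scoring script of any challenge; it assumes that higher scores mean "more likely bona fide", and the function name is hypothetical.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumes higher scores indicate bona fide speech. Quadratic in the
    number of scores, so intended for small illustrative inputs only.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    # Extra threshold above every score, so "reject everything" is considered.
    thresholds.append(thresholds[-1] + 1.0)
    best_gap, eer = None, None
    for t in thresholds:
        # False acceptance: a spoofed trial scores at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: a bona fide trial scores below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With a finite set of scores the two error rates may never be exactly equal, so the sketch reports their average at the threshold where they are closest.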

Physical access training sets

Logical access training sets

ASV2019 Training set
ASV2015 Training set Part aa, ab, ac

Augmentation noise sources

Name	Utterances	Pricing	Additional information
AudioSet	2M	Free	A large-scale dataset of manually annotated audio events.
MUSAN	N/A	Free	MUSAN is a corpus of music, speech, and noise recordings.
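
Noise corpora like these are typically used by adding a noise recording to clean speech at a chosen signal-to-noise ratio (SNR). The following is a minimal sketch of that mixing step on plain lists of samples, assuming both signals share the same sample rate; `mix_at_snr` is a hypothetical helper, not part of any library listed here.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise so that it covers the whole utterance, then trim it.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[:len(speech)]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be silent")
    # Choose gain so that speech_power / (gain**2 * noise_power) == 10**(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

A lower `snr_db` gives a noisier mixture; drawing the value at random per utterance is one common way to diversify training data.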

Speaker Verification training sets

Name	Utterances	Speakers	Language	Pricing	Additional information
TIMIT	6K+	630	en	$250.00	Published in 1993, the TIMIT corpus of read speech is one of the earliest speaker recognition datasets.
VCTK	43K+	109	en	Free	Each speaker reads sentences mostly selected from a newspaper, plus the Rainbow Passage and an elicitation paragraph intended to identify the speaker's accent.
LibriSpeech	292K	2K+	en	Free	Large-scale (1000 hours) corpus of read English speech.
Multilingual LibriSpeech (MLS)	?	?	en, de, nl, es, fr, it, pt, pl	Free	Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages: English, German, Dutch, Spanish, French, Italian, Portuguese, and Polish.
LibriVox	180K	9K+	Multiple	Free	Free public domain audiobooks. LibriSpeech is a processed subset of LibriVox. Each original unsegmented utterance could be very long.
VoxCeleb 1&2	1M+	7K	Multiple	Free	VoxCeleb is an audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube.
The Spoken Wikipedia Corpora	5K	879	en, de, nl	Free	Volunteer readers reading Wikipedia articles.
CN-Celeb	130K+	1K	zh	Free	A Free Chinese Speaker Recognition Corpus Released by CSLT@Tsinghua University.
BookTubeSpeech	8K	8K	en	Free	Audio samples extracted from BookTube videos - videos where people share their opinions on books - from YouTube. The dataset can be downloaded using BookTubeSpeech-download.
DeepMine	540K	1850	fa, en	Unknown	A speech database in Persian and English designed to build and evaluate speaker verification, as well as Persian ASR systems.
NISP-Dataset	?	345	hi, kn, ml, ta, te (all Indian languages)	Free	This dataset contains speech recordings along with speaker physical parameters (height, weight, ... ) as well as regional information and linguistic information.

Conferences

Conference/Workshop	Frequency	Page Limit	Organization	Blind Review
ICASSP	Annual	4 + 1 (ref)	IEEE	No
InterSpeech	Annual	4 + 1 (ref)	ISCA	No
APSIPA	Annual	4 + 1 (ref)	IEEE	Yes
Odyssey	Biennial	8 + 2 (ref)	ISCA	No
SLT	Biennial	6 + 2 (ref)	IEEE	Yes
ASRU	Biennial	6 + 2 (ref)	IEEE	Yes
WASPAA	Biennial	4 + 1 (ref)	IEEE	No

Other learning materials

Books

Handbook of Biometric Anti-Spoofing

Tech blogs

Video tutorials

Products

Company	Product
Pindrop	Deep Voice Engine
ID R&D	IDLive™ Voice
VoiceAI	Voiceprint recognition API
Kriston	Voiceprint API, SDK

Project

OCTAVE


License

PHJhjpeng1992/awesome-asv-antispoofing
