PHJhjpeng1992/awesome-asv-antispoofing


Awesome ASV Anti-Spoofing

Table of contents

Overview

This is a curated list of awesome ASV (Automatic Speaker Verification) Anti-Spoofing papers, libraries, datasets, and other resources.

The purpose of this repo is to organize the world’s resources for voice anti-spoofing and make them universally accessible and useful.

To add items to this page, simply send a pull request (see the contributing guide).

Publications

Special topics

Review & survey papers

Fast & light anti-spoofing

Anti-spoofing with phoneme

Anti-spoofing with brain

Anti-spoofing with fieldprint

Anti-spoofing with articulatory gesture

Anti-spoofing with multi-task learning

Anti-spoofing with smart home

Challenges

Spoofing attack type

Self-trigger attack

Inaudible voice command attack

* IEMI Threats for Information Security: Remote Command Injection on Modern Smartphones, 2015

Hidden voice command attack

Voice conversion attack

Speech synthesis attack

Replay attack

Impostor attack

Other

2021

2020

2019

2018

2017

2016

2015

2014

2013

1999

Software

Framework

| Name | Language | Description |
|---|---|---|
| SpeechBrain | Python & PyTorch | SpeechBrain is an open-source, all-in-one speech toolkit based on PyTorch. |
| SIDEKIT | Python | SIDEKIT is an open-source package that allows rapid prototyping of an end-to-end speaker recognition system. |
| pyAudioAnalysis | Python | Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications. |
| kaldi-asr | C++ & Bash | A toolkit for speech and speaker recognition, intended for use by researchers and professionals. |
| Alize LIA_SpkDet | C++ | ALIZÉ is an open-source platform for speaker recognition. LIA_SpkDet provides the tools for model training, feature normalization, score normalization, etc. |
| SPEAR Toolkit (based on Bob) | Python | This package is part of the signal-processing and machine learning toolbox Bob. |
| MSRidentity Toolbox | Matlab | This toolbox contains a collection of MATLAB tools and routines that can be used for research and development in speaker recognition. |

Evaluation

Clustering

| Name | Language | Description |
|---|---|---|
| sklearn.cluster | Python | scikit-learn clustering algorithms. |
| PLDA | Python | Probabilistic Linear Discriminant Analysis (PLDA) and classification, written in Python. |
| PLDA | C++ | Open-source implementation of simplified PLDA (Probabilistic Linear Discriminant Analysis). |
| Auto-Tuning Spectral Clustering | Python | Auto-tuning spectral clustering method that needs neither a development set nor supervised tuning. |

Speaker embedding

| Name | Method | Language | Description |
|---|---|---|---|
| resemble-ai/Resemblyzer | d-vector | Python & PyTorch | PyTorch implementation of generalized end-to-end loss for speaker verification, which can be used for voice cloning and diarization. |
| Speaker_Verification | d-vector | Python & TensorFlow | TensorFlow implementation of generalized end-to-end loss for speaker verification. |
| PyTorch_Speaker_Verification | d-vector | Python & PyTorch | PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al., with UIS-RNN integration. |
| Real-Time Voice Cloning | d-vector | Python & PyTorch | Implementation of "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" (SV2TTS) with a vocoder that works in real time. |
| deep-speaker | d-vector | Python & Keras | Third-party implementation of the Baidu paper "Deep Speaker: an End-to-End Neural Speaker Embedding System". |
| x-vector-kaldi-tf | x-vector | Python & TensorFlow & Perl | TensorFlow implementation of the x-vector topology on top of a Kaldi recipe. |
| kaldi-ivector | i-vector | C++ & Perl | Extension to Kaldi implementing the standard i-vector hyperparameter estimation and i-vector extraction procedure. |
| voxceleb-ivector | i-vector | Perl | VoxCeleb1 i-vector based speaker recognition system. |
| pytorch_xvectors | x-vector | Python & PyTorch | PyTorch implementation of VoxCeleb x-vectors. Additionally includes meta-learning architectures for embedding training. Evaluated on speaker diarization and speaker verification. |
| ASVtorch | i-vector | Python & PyTorch | ASVtorch is a toolkit for automatic speaker recognition. |
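Most of the toolkits above map an utterance to a fixed-length embedding (d-vector, x-vector or i-vector). A simple and widely used back-end scores a verification trial by the cosine similarity between the enrollment and test embeddings. The following is a minimal pure-Python sketch, assuming the embeddings have already been extracted into plain lists. `average_embedding`, `verify` and the 0.7 threshold are hypothetical and not part of any library listed here; in practice the threshold is tuned on development data, and PLDA scoring is a common alternative back-end.

```python
import math
from typing import List, Sequence


def cosine_score(enroll: Sequence[float], test: Sequence[float]) -> float:
    """Cosine similarity between an enrollment embedding and a test embedding."""
    if len(enroll) != len(test):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(enroll, test))
    norm_e = math.sqrt(sum(a * a for a in enroll))
    norm_t = math.sqrt(sum(b * b for b in test))
    if norm_e == 0.0 or norm_t == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_e * norm_t)


def average_embedding(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average several enrollment embeddings into a single speaker model."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def verify(enroll: Sequence[float], test: Sequence[float], threshold: float = 0.7) -> bool:
    """Accept the claimed identity when the score reaches the (hypothetical) threshold."""
    return cosine_score(enroll, test) >= threshold
```

For example, `verify(average_embedding(enroll_list), test_embedding)` scores one trial against a speaker model built from several enrollment utterances. Cosine scoring on its own does not detect spoofed input; that is the job of the countermeasures collected in this list.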

Audio feature extraction

| Name | Language | Description |
|---|---|---|
| LibROSA | Python | Python library for audio and music analysis. https://librosa.github.io/ |
| python_speech_features | Python | This library provides common speech features for ASR, including MFCCs and filterbank energies. https://python-speech-features.readthedocs.io/en/latest/ |
| pyAudioAnalysis | Python | Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications. |
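MFCC and filterbank front ends such as the ones in python_speech_features and LibROSA share the same first steps: pre-emphasis, splitting the signal into short overlapping frames, windowing each frame, and placing triangular filters equally spaced on the mel scale. The pure-Python sketch below shows only those steps; it is not the code of either library, and the 0.97 pre-emphasis coefficient and HTK-style mel formula are common choices assumed here rather than values taken from this list. The FFT, filter application, log and DCT stages are omitted.

```python
import math
from typing import List, Sequence


def pre_emphasis(signal: Sequence[float], coeff: float = 0.97) -> List[float]:
    """y[n] = x[n] - coeff * x[n-1]; boosts high frequencies before spectral analysis."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - coeff * signal[n - 1] for n in range(1, len(signal))]


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping frames, dropping an incomplete final frame."""
    return [list(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming(frame_len: int) -> List[float]:
    """Hamming window applied to each frame before the FFT."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]


def hz_to_mel(freq_hz: float) -> float:
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_filters: int, low_hz: float, high_hz: float) -> List[float]:
    """n_filters + 2 frequencies equally spaced in mel: edges and centres of triangular filters."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(n_filters + 2)]
```

At 16 kHz, 25 ms windows with a 10 ms hop correspond to `frame_len=400` and `hop=160`. Each windowed frame would then go through an FFT, the triangular mel filters, a log and (for MFCCs) a DCT.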

Audio data augmentation

| Name | Language | Description |
|---|---|---|
| pyroomacoustics | Python | Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios. https://pyroomacoustics.readthedocs.io |
| gpuRIR | Python | Python library for Room Impulse Response (RIR) simulation with GPU acceleration. |
| rir_simulator_python | Python | Room impulse response simulator written in Python. |
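Room impulse responses (RIRs) produced by simulators such as pyroomacoustics, gpuRIR or rir_simulator_python are typically used for augmentation by convolving clean speech with the RIR. Below is a minimal pure-Python sketch, assuming the RIR is already available as a list of samples at the same sampling rate as the speech. `reverberate` and its peak renormalization are hypothetical choices, not behaviour of any listed library; the direct O(N·M) loop is used for clarity, whereas real pipelines would use FFT-based convolution such as `scipy.signal.fftconvolve`.

```python
from typing import List, Sequence


def convolve(signal: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Full linear convolution: returns len(signal) + len(rir) - 1 samples."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += x * h
    return out


def reverberate(signal: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Apply a room impulse response and trim the result to the original length."""
    wet = convolve(signal, rir)[: len(signal)]
    # Rescale so the reverberant copy keeps the clean signal's peak level.
    peak_in = max((abs(s) for s in signal), default=0.0)
    peak_out = max((abs(s) for s in wet), default=0.0)
    if peak_out == 0.0:
        return wet
    return [s * peak_in / peak_out for s in wet]
```

Trimming to the input length keeps utterance-level labels and any alignments unchanged, at the cost of dropping the reverberant tail.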

Other software

| Name | Language | Description |
|---|---|---|
| Rawnet2 | Python & Bash | End-to-end neural anti-spoofing. |
| ReMASC | Python | Realistic Replay Attack Corpus for Voice Controlled Systems. |
| Attentive-Filtering-Network | Python & Bash | The University of Edinburgh and Johns Hopkins University system for the ASVspoof 2017 Version 2.0 dataset. |

Datasets

Spoofing datasets

| Name | Type (download size) | Language | Pricing | Additional information |
|---|---|---|---|---|
| ASVspoof 2019 | PA (16.44 GB), LA (7.116 GB) | en | Free | Evaluation Plan |
| ASVspoof 2017 | PA: Train (200.7 MB), Dev (133.7 MB), Eval (1.065 GB) | en | Free | Evaluation Plan |
| SAS Corpus | LA: SS_LARGE-16k (7.591 GB), SS_LARGE-48k (7.798 GB), SS_MARY_LARGE (7.303 GB), SS_SMALL-16k (7.582 GB), SS_SMALL-48k (7.788 GB), VC_C1 (10.00 GB), VC_EVC (6.518 GB), VC_FESTVOX (10.04 GB), VC_FS (10.15 GB), VC_GMM (9.830 GB), VC_KPLS (9.703 GB), VC_LSP (9.616 GB), VC_TVC (6.489 GB), human (3.229 GB) | en | Free | License |
| ASVspoof 2015 | LA: Data part aa (7.543 GB), Data part ab (7.543 GB), Data part ac (7.331 GB) | en | Free | License |
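The ASVspoof evaluation plans define the official metrics for these datasets; the equal error rate (EER) of the spoofing countermeasure is one of the metrics they report. The EER is the operating point where the false rejection rate of bona fide trials equals the false acceptance rate of spoofed trials. The sketch below is not the official scoring script: `compute_eer` is a hypothetical helper that only tries thresholds at the observed scores and returns the average of the two rates at the closest crossing, so its result can differ slightly from implementations that interpolate along the ROC/DET curve.

```python
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold), assuming higher scores mean 'more likely bona fide'."""
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one score of each class")
    best = (float("inf"), 0.0, 0.0)  # (|FRR - FAR|, EER estimate, threshold)
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection: a bona fide trial scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed trial scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(frr - far)
        if gap < best[0]:
            best = (gap, (frr + far) / 2.0, t)
    return best[1], best[2]
```

The loop is quadratic in the number of trials, which is fine for illustration; scoring a full evaluation set would sort the scores once and sweep the threshold instead.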

Physical access training sets

Logical access training sets

Augmentation noise sources

| Name | Utterances | Pricing | Additional information |
|---|---|---|---|
| AudioSet | 2M | Free | A large-scale dataset of manually annotated audio events. |
| MUSAN | N/A | Free | MUSAN is a corpus of music, speech, and noise recordings. |
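Noise sources such as AudioSet and MUSAN are usually mixed into training speech at a chosen signal-to-noise ratio (SNR). The pure-Python sketch below scales a noise clip so that the ratio of average speech power to average noise power matches a target SNR in dB. `add_noise`, the looping of short noise clips and the power-based SNR definition are assumptions made for illustration, not the behaviour of any particular recipe.

```python
import math
from typing import List, Sequence


def power(x: Sequence[float]) -> float:
    """Mean power (average squared amplitude) of a signal."""
    return sum(s * s for s in x) / len(x)


def add_noise(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Mix noise into speech so that 10*log10(P_speech / P_scaled_noise) == snr_db."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop a short noise clip until it covers the whole utterance, then trim it.
    reps = math.ceil(len(speech) / len(noise))
    looped = (list(noise) * reps)[: len(speech)]
    p_speech, p_noise = power(speech), power(looped)
    if p_noise == 0.0:
        return list(speech)  # silent noise clip: nothing to add
    # Gain that makes the scaled noise power equal P_speech / 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, looped)]
```

Lower `snr_db` values produce noisier training copies; drawing the SNR per utterance from a range is a common pattern, with the range itself left as a tuning decision.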

Speaker Verification training sets

| Name | Utterances | Speakers | Language | Pricing | Additional information |
|---|---|---|---|---|---|
| TIMIT | 6K+ | 630 | en | $250.00 | Published in 1993, the TIMIT corpus of read speech is one of the earliest speaker recognition datasets. |
| VCTK | 43K+ | 109 | en | Free | Most of the read sentences were selected from a newspaper, plus the Rainbow Passage and an elicitation paragraph intended to identify the speaker's accent. |
| LibriSpeech | 292K | 2K+ | en | Free | Large-scale (1000 hours) corpus of read English speech. |
| Multilingual LibriSpeech (MLS) | ? | ? | en, de, nl, es, fr, it, pt, pl | Free | The Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. It is derived from read audiobooks from LibriVox and covers 8 languages: English, German, Dutch, Spanish, French, Italian, Portuguese and Polish. |
| LibriVox | 180K | 9K+ | Multiple | Free | Free public-domain audiobooks. LibriSpeech is a processed subset of LibriVox. Each original unsegmented utterance can be very long. |
| VoxCeleb 1&2 | 1M+ | 7K | Multiple | Free | VoxCeleb is an audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube. |
| The Spoken Wikipedia Corpora | 5K | 879 | en, de, nl | Free | Volunteer readers reading Wikipedia articles. |
| CN-Celeb | 130K+ | 1K | zh | Free | A free Chinese speaker recognition corpus released by CSLT@Tsinghua University. |
| BookTubeSpeech | 8K | 8K | en | Free | Audio samples extracted from BookTube videos on YouTube (videos where people share their opinions on books). The dataset can be downloaded using BookTubeSpeech-download. |
| DeepMine | 540K | 1850 | fa, en | Unknown | A speech database in Persian and English designed to build and evaluate speaker verification systems, as well as Persian ASR systems. |
| NISP-Dataset | ? | 345 | hi, kn, ml, ta, te (all Indian languages) | Free | This dataset contains speech recordings along with the speakers' physical parameters (height, weight, ...) as well as regional and linguistic information. |

Conferences

| Conference/Workshop | Frequency | Page limit | Organization | Blind review |
|---|---|---|---|---|
| ICASSP | Annual | 4 + 1 (ref) | IEEE | No |
| InterSpeech | Annual | 4 + 1 (ref) | ISCA | No |
| APSIPA | Annual | 4 + 1 (ref) | IEEE | Yes |
| Odyssey | Biennial | 8 + 2 (ref) | ISCA | No |
| SLT | Biennial | 6 + 2 (ref) | IEEE | Yes |
| ASRU | Biennial | 6 + 2 (ref) | IEEE | Yes |
| WASPAA | Biennial | 4 + 1 (ref) | IEEE | No |

Other learning materials

Books

Tech blogs

Video tutorials

Products

| Company | Product |
|---|---|
| Pindrop | Deep Voice Engine |
| ID R&D | IDLive™ Voice |
| VoiceAI | Voiceprint recognition API |
| Kriston | Voiceprint API, SDK |

Project

Standards
