- Overview
- Publications
- Software
- Datasets
- Conferences
- Leaderboards
- Other learning materials
- Products
- Project
- Standards
This is a curated list of awesome ASV (Automatic Speaker Verification) anti-spoofing papers, libraries, datasets, and other resources.
The purpose of this repo is to organize the world’s resources for voice anti-spoofing and make them universally accessible and useful.
To add items to this page, simply send a pull request (see the contributing guide).
- Advances in anti-spoofing: From the perspective of ASVspoof challenges, 2020
- Countermeasures to Replay Attacks: A Review, 2020
- Introduction to Voice Presentation Attack Detection and Recent Advances, 2019
- An Investigation of Deep-Learning Frameworks for Speaker Verification Anti-spoofing, 2017
- Spoofing and countermeasures for speaker verification: A survey, 2015
- Void: A fast and light voice liveness detection system, 2020
- Audio Replay Attack Detection with Deep Learning Frameworks, 2017
- Phoneme Specific Modelling and Scoring Techniques for Anti Spoofing System, 2019
- The SYSU System for the Interspeech 2015 Automatic Speaker Verification Spoofing and Countermeasures Challenge, 2015
- The Catcher in the Field: A Fieldprint based Spoofing Detection for Text-Independent Speaker Verification, 2019
- You Can Hear But You Cannot Steal: Defending against Voice Impersonation Attacks on Smartphones, 2017
- Hearing Your Voice is Not Enough: An Articulatory Gesture Based Liveness Detection for Voice Authentication, 2017
- VoiceLive: A Phoneme Localization based Liveness Detection for Voice Authentication on Smartphones, 2016
- Adversarial Multi-Task Learning for Speaker Normalization in Replay Detection, 2020
- Multi-task learning of deep neural networks for joint automatic speaker verification and spoofing detection, 2019
- Anti-Spoofing Speaker Verification System with Multi-Feature Integration and Multi-Task Learning, 2019
- Replay spoofing detection system for automatic speaker verification using multi-task learning of noise classes, 2018
- ASVspoof 2021, 2021
- ASVspoof 2019, 2019
- ASVspoof 2017, 2017
- BTAS 2016, 2016
- ASVspoof 2015, 2015
- Your Voice Assistant is Mine: How to Abuse Speakers to Steal Information and Control Your Phone, 2014
- A11y Attacks: Exploiting Accessibility in Operating Systems, 2014
- IEMI Threats for Information Security: Remote Command Injection on Modern Smartphones, 2015
Hidden voice command attack
- White-box attack
- Black-box attack
- Gray-box attack
- A study on replay attack and anti-spoofing for text-dependent speaker verification, 2014
- A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification, 2017
- Can a Professional Imitator Fool a GMM-Based Speaker Verification System?, 2005
- I-Vectors Meet Imitators: On Vulnerability of Speaker Verification Systems Against Voice Mimicry, 2013
- End-to-end anti-spoofing with RawNet2
- Residual networks for resisting noise: analysis of an embeddings-based spoofing countermeasure
- Detecting Replay Attacks Using Multi-Channel Audio: A Neural Network-Based Method
- An analysis of speaker dependent models in replay detection
- Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals
- Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection
- Voice Spoofing Detection Corpus for Single and Multi-order Audio Replays
- An Ensemble Based Approach for Generalized Detection of Spoofing Attacks to Automatic Speaker Recognizers
- Defense against adversarial attacks on spoofing countermeasures of ASV
- Multiple Points Input For Convolutional Neural Networks in Replay Attack Detection
- Auditory Inspired Spatial Differentiation for Replay Spoofing Attack Detection
- Attention-Based LSTM Algorithm for Audio Replay Detection in Noisy Environments
- Cross-domain replay spoofing attack detection using domain adversarial training
- Transmission Line Cochlear Model Based AM-FM Features for Replay Attack Detection
- Adversarial Attacks on Spoofing Countermeasures of automatic speaker verification
- Replay Spoofing Countermeasure Using Autoencoder and Siamese Network on ASVspoof 2019 Challenge
- Independent Modelling of Long and Short Term Speech Information for Replay Detection
- Voice liveness detection based on pop-noise detector with phoneme information for speaker verification
- An end-to-end spoofing countermeasure for automatic speaker verification using evolving recurrent neural networks
- Deep Siamese Architecture Based Replay Detection for Secure Voice Biometric
- Use of Claimed Speaker Models for Replay Detection
- Replay Attacks Detection Using Phase and Magnitude Features with Various Frequency Resolutions
- Performance evaluation of front- and back-end techniques for ASV spoofing detection systems based on deep features
- Modulation Dynamic Features for the Detection of Replay Attacks
- Audio Replay Attack Detection Using High-Frequency Features
- Replay Attack Detection Using DNN for Channel Discrimination
- Investigating the use of Scattering Coefficients for Replay Attack Detection
- Constant Q cepstral coefficients: a spoofing countermeasure for automatic speaker verification
- Anti-spoofing Methods for Automatic Speaker Verification System
- Overview of BTAS 2016 Speaker Anti-spoofing Competition
- Voice Liveness Detection for Speaker Verification based on a Tandem Single/Double-channel Pop Noise Detector
- Cross-Database Evaluation of Audio-Based Spoofing Detection Systems
- Spoofing detection from a feature representation perspective
- Spoofing Speech Detection using Temporal Convolutional Neural Network
- Robust Deep Feature for Spoofing Detection - The SJTU System for ASVspoof 2015 Challenge
- A Comparison of Features for Synthetic Speech Detection
Link | Language | Description |
---|---|---|
SpeechBrain | Python & PyTorch | SpeechBrain is an open-source, all-in-one speech toolkit based on PyTorch. |
SIDEKIT | Python | SIDEKIT is an open-source package that allows rapid prototyping of end-to-end speaker recognition systems. |
pyAudioAnalysis | Python | Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications. |
kaldi-asr | C++ & Bash | A toolkit for speech & speaker recognition, intended for use by researchers and professionals. |
Alize LIA_SpkDet | C++ | ALIZÉ is an open-source platform for speaker recognition. LIA_SpkDet provides tools for model training, feature normalization, score normalization, etc. |
SPEAR Toolkit (based on BOB) | Python | This package is part of the signal-processing and machine learning toolbox Bob. |
MSRidentity Toolbox | Matlab | This toolbox contains a collection of MATLAB tools and routines that can be used for research and development in speaker recognition. PDF |
- t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification [MATLAB] & [Python]
- ASVspoof 2021 Evaluation Plan
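ASVspoof evaluations report the equal error rate (EER) of the countermeasure alongside t-DCF. The following is a minimal, dependency-free EER sketch, not the official ASVspoof scoring code. It assumes that higher scores mean "more likely bona fide", and `compute_eer` is a hypothetical helper name. It tries every observed score as a threshold and keeps the point where the miss and false-alarm rates are closest.

```python
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for a spoofing countermeasure.

    Assumption: higher scores mean "more likely bona fide".
    At threshold t, a bona fide trial is missed if its score < t and a
    spoof trial is a false alarm if its score >= t.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap = float("inf")
    eer, eer_threshold = 1.0, 0.0
    # Every observed score is a candidate threshold (O(n^2), fine for a sketch).
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        p_miss = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        p_fa = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap = gap
            eer, eer_threshold = (p_miss + p_fa) / 2, t
    return eer, eer_threshold


if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.8, 0.4], [0.1, 0.5, 0.3])
    print(f"EER = {eer:.3f} at threshold {thr}")
```

For large score files, a sorted single pass (or the official evaluation scripts) is faster. The quadratic loop here is chosen only for readability.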
Link | Language | Description |
---|---|---|
sklearn.cluster | Python | scikit-learn clustering algorithms. |
PLDA | Python | Probabilistic Linear Discriminant Analysis & classification, written in Python. |
PLDA | C++ | Open-source implementation of simplified PLDA (Probabilistic Linear Discriminant Analysis). |
Auto-Tuning Spectral Clustering | Python | Auto-tuning spectral clustering method that does not need a development set or supervised tuning. |
Link | Method | Language | Description |
---|---|---|---|
resemble-ai/Resemblyzer | d-vector | Python & PyTorch | PyTorch implementation of generalized end-to-end loss for speaker verification, which can be used for voice cloning and diarization. |
Speaker_Verification | d-vector | Python & TensorFlow | TensorFlow implementation of generalized end-to-end loss for speaker verification. |
PyTorch_Speaker_Verification | d-vector | Python & PyTorch | PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al., with UIS-RNN integration. |
Real-Time Voice Cloning | d-vector | Python & PyTorch | Implementation of "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" (SV2TTS) with a vocoder that works in real time. |
deep-speaker | d-vector | Python & Keras | Third-party implementation of the Baidu paper "Deep Speaker: an End-to-End Neural Speaker Embedding System". |
x-vector-kaldi-tf | x-vector | Python & TensorFlow & Perl | TensorFlow implementation of the x-vector topology on top of a Kaldi recipe. |
kaldi-ivector | i-vector | C++ & Perl | Extension to Kaldi implementing the standard i-vector hyperparameter estimation and i-vector extraction procedure. |
voxceleb-ivector | i-vector | Perl | VoxCeleb1 i-vector based speaker recognition system. |
pytorch_xvectors | x-vector | Python & PyTorch | PyTorch implementation of VoxCeleb x-vectors. Additionally, includes meta-learning architectures for embedding training. Evaluated with speaker diarization and speaker verification. |
ASVtorch | i-vector | Python & PyTorch | ASVtorch is a toolkit for automatic speaker recognition. |
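Embeddings from the toolkits above (d-vectors, x-vectors, i-vectors) are often compared with cosine similarity, a simpler backend than PLDA. The following is a minimal sketch that assumes embeddings are plain lists of floats. `cosine_score` and `mean_embedding` are hypothetical helper names and are not part of any listed library.

```python
import math
from typing import List, Sequence


def mean_embedding(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average several enrollment embeddings into one speaker model."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("embeddings must share one dimension")
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def cosine_score(enroll: Sequence[float], test: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; higher means more likely the same speaker."""
    if len(enroll) != len(test):
        raise ValueError("dimension mismatch")
    dot = sum(a * b for a, b in zip(enroll, test))
    norm = math.sqrt(sum(a * a for a in enroll)) * math.sqrt(sum(b * b for b in test))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return dot / norm


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real d-/x-vectors typically have hundreds of dimensions.
    model = mean_embedding([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]])
    print(round(cosine_score(model, [0.85, 0.15, 0.05]), 3))
```

Cosine scores still need a decision threshold tuned on development data. PLDA replaces this step with a probabilistic model of within- and between-speaker variability.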
Link | Language | Description |
---|---|---|
LibROSA | Python | Python library for audio and music analysis. https://librosa.github.io/ |
python_speech_features | Python | This library provides common speech features for ASR including MFCCs and filterbank energies. https://python-speech-features.readthedocs.io/en/latest/ |
pyAudioAnalysis | Python | Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications. |
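To illustrate the filterbank energies and MFCCs that these libraries compute, the following stdlib-only sketch applies a mel filterbank to one frame's power spectrum, then takes a log and an unnormalized DCT-II. It assumes the common mel formula m = 2595 log10(1 + f/700) and triangular filters. All function names are hypothetical. Library implementations such as python_speech_features or LibROSA differ in details like filter normalization and DCT scaling, so the outputs will not match them exactly.

```python
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def filter_edges(n_filters: int, sample_rate: int) -> List[float]:
    """n_filters + 2 frequencies (Hz) equally spaced on the mel scale, 0 .. Nyquist."""
    step = hz_to_mel(sample_rate / 2.0) / (n_filters + 1)
    return [mel_to_hz(i * step) for i in range(n_filters + 2)]


def triangle(f: float, left: float, center: float, right: float) -> float:
    """Triangular filter weight: 0 at the left/right edges, 1 at the center."""
    if left < f <= center:
        return (f - left) / (center - left)
    if center < f < right:
        return (right - f) / (right - center)
    return 0.0


def filterbank_energies(power: Sequence[float], sample_rate: int,
                        n_filters: int = 26) -> List[float]:
    """power: one-sided power spectrum of one frame (n_fft // 2 + 1 bins)."""
    n_fft = 2 * (len(power) - 1)
    freqs = [k * sample_rate / n_fft for k in range(len(power))]
    edges = filter_edges(n_filters, sample_rate)
    return [
        sum(triangle(f, edges[m - 1], edges[m], edges[m + 1]) * p
            for f, p in zip(freqs, power))
        for m in range(1, n_filters + 1)
    ]


def cepstra(energies: Sequence[float], n_ceps: int = 13) -> List[float]:
    """Log filterbank energies followed by an unnormalized DCT-II."""
    logs = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    k_total = len(logs)
    return [
        sum(x * math.cos(math.pi * n * (k + 0.5) / k_total) for k, x in enumerate(logs))
        for n in range(n_ceps)
    ]


if __name__ == "__main__":
    flat = [1.0] * 257  # toy spectrum for a 512-point FFT at 16 kHz
    fb = filterbank_energies(flat, 16000)
    print(len(fb), [round(c, 2) for c in cepstra(fb)[:3]])
```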
Link | Language | Description |
---|---|---|
pyroomacoustics | Python | Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios. https://pyroomacoustics.readthedocs.io |
gpuRIR | Python | Python library for Room Impulse Response (RIR) simulation with GPU acceleration. |
rir_simulator_python | Python | Room impulse response simulator written in Python. |
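Replay conditions are often simulated by convolving clean speech with a room impulse response (RIR). The tools above generate physically based RIRs, for example with the image source method. As a much cruder stand-in, the following stdlib-only sketch models an RIR as exponentially decaying Gaussian noise whose amplitude falls by 60 dB after `rt60` seconds, then applies it by direct convolution. `synthetic_rir` and `convolve` are hypothetical helpers. This is a statistical toy model, not what pyroomacoustics or gpuRIR compute.

```python
import math
import random
from typing import List, Sequence


def synthetic_rir(sample_rate: int, rt60: float, seed: int = 0) -> List[float]:
    """Toy RIR: Gaussian noise under an exponential envelope that is 60 dB
    down (amplitude factor 1/1000) after rt60 seconds. Not physically based."""
    if rt60 <= 0:
        raise ValueError("rt60 must be positive")
    rng = random.Random(seed)
    n = max(1, int(rt60 * sample_rate))
    decay = math.log(1000.0) / rt60  # exp(-decay * rt60) == 1 / 1000
    rir = [rng.gauss(0.0, 1.0) * math.exp(-decay * i / sample_rate) for i in range(n)]
    rir[0] = 1.0  # treat the first tap as the direct path
    peak = max(abs(x) for x in rir)
    return [x / peak for x in rir]  # peak-normalize to 1.0


def convolve(signal: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Full linear convolution; output length is len(signal) + len(rir) - 1."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]  # 0.1 s, 440 Hz
    reverberant = convolve(tone, synthetic_rir(sr, rt60=0.2))
    print(len(tone), len(reverberant))
```

Direct convolution is O(N·M). For real utterances, FFT-based convolution (or the simulators above) is the practical choice.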
Link | Language | Description |
---|---|---|
Rawnet2 | Python & Bash | End-to-End Neural Anti-spoofing. |
ReMASC | Python | Realistic Replay Attack Corpus for Voice Controlled Systems. |
Attentive-Filtering-Network | Python & Bash | University of Edinburgh-Johns Hopkins University's system for the ASVspoof 2017 Version 2.0 dataset. |
Audio | Type | Language | Pricing | Additional information |
---|---|---|---|---|
ASVspoof 2019 | PA (16.44GB), LA (7.116GB) | en | Free | Evaluation Plan |
ASVspoof 2017 | PA-Train (200.7MB), Dev (133.7MB), Eval (1.065GB) | en | Free | Evaluation Plan |
SAS Corpus | LA-SS_LARGE-16k (7.591GB), SS_LARGE-48k (7.798GB), SS_MARY_LARGE (7.303GB), SS_SMALL-16k (7.582GB), SS_SMALL-48k (7.788GB), VC_C1 (10.00GB), VC_EVC (6.518GB), VC_FESTVOX (10.04GB), VC_FS (10.15GB), VC_GMM (9.830GB), VC_KPLS (9.703GB), VC_LSP (9.616GB), VC_TVC (6.489GB), human (3.229GB) | en | Free | LICENSE |
ASVspoof 2015 | LA-Data - Part aa (7.543GB), Data - Part ab (7.543GB), Data - Part ac (7.331GB) | en | Free | LICENSE |
- ASV2019 Training set
- ASV2015 Training set Part aa,ab,ac
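The following minimal sketch reads ASVspoof-style protocol (key) files and derives inverse-frequency class weights, since bona fide and spoof training data are usually imbalanced. It assumes whitespace-separated lines with the utterance ID in the second field and the `bonafide`/`spoof` key in the last field. Check the evaluation plan of the specific edition before relying on this layout. `parse_protocol` and `class_weights` are hypothetical helpers, and the sample lines are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_protocol(lines: Iterable[str]) -> Dict[str, str]:
    """Map utterance ID -> 'bonafide' or 'spoof'.

    Assumed layout (verify against the edition's evaluation plan):
    whitespace-separated fields, utterance ID in field 2, key in the last field.
    """
    labels: Dict[str, str] = {}
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: too few fields")
        utt_id, key = fields[1], fields[-1]
        if key not in ("bonafide", "spoof"):
            raise ValueError(f"line {line_no}: unexpected key {key!r}")
        labels[utt_id] = key
    return labels


def class_weights(labels: Dict[str, str]) -> Dict[str, float]:
    """Inverse-frequency weights: n_total / (n_classes * n_class)."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {c: total / (len(counts) * n) for c, n in counts.items()}


if __name__ == "__main__":
    # Made-up lines in the assumed layout; real files would be read with open(...).
    sample = [
        "SPK_01 UTT_0001 - - bonafide",
        "SPK_01 UTT_0002 - A01 spoof",
        "SPK_02 UTT_0003 - A02 spoof",
        "SPK_02 UTT_0004 - A03 spoof",
    ]
    print(class_weights(parse_protocol(sample)))
```

These weights can feed a weighted cross-entropy loss. The balanced focal loss paper listed above addresses the same imbalance differently.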
Name | Utterances | Pricing | Additional information |
---|---|---|---|
AudioSet | 2M | Free | A large-scale dataset of manually annotated audio events. |
MUSAN | N/A | Free | MUSAN is a corpus of music, speech, and noise recordings. |
Name | Utterances | Speakers | Language | Pricing | Additional information |
---|---|---|---|---|---|
TIMIT | 6K+ | 630 | en | $250.00 | Published in 1993, the TIMIT corpus of read speech is one of the earliest speaker recognition datasets. |
VCTK | 43K+ | 109 | en | Free | Most sentences were selected from a newspaper, plus the Rainbow Passage and an elicitation paragraph intended to identify the speaker's accent. |
LibriSpeech | 292K | 2K+ | en | Free | Large-scale (1000 hours) corpus of read English speech. |
Multilingual LibriSpeech (MLS) | ? | ? | en, de, nl, es, fr, it, pt, pl | Free | Multilingual LibriSpeech (MLS) is a large multilingual corpus suitable for speech research. It is derived from read audiobooks from LibriVox and covers 8 languages: English, German, Dutch, Spanish, French, Italian, Portuguese, and Polish. |
LibriVox | 180K | 9K+ | Multiple | Free | Free public-domain audiobooks. LibriSpeech is a processed subset of LibriVox. The original recordings are unsegmented, so individual utterances can be very long. |
VoxCeleb 1&2 | 1M+ | 7K | Multiple | Free | VoxCeleb is an audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube. |
The Spoken Wikipedia Corpora | 5K | 879 | en, de, nl | Free | Volunteer readers reading Wikipedia articles. |
CN-Celeb | 130K+ | 1K | zh | Free | A Free Chinese Speaker Recognition Corpus Released by CSLT@Tsinghua University. |
BookTubeSpeech | 8K | 8K | en | Free | Audio samples extracted from BookTube videos - videos where people share their opinions on books - from YouTube. The dataset can be downloaded using BookTubeSpeech-download. |
DeepMine | 540K | 1850 | fa, en | Unknown | A speech database in Persian and English designed to build and evaluate speaker verification, as well as Persian ASR systems. |
NISP-Dataset | ? | 345 | hi, kn, ml, ta, te (all Indian languages) | Free | This dataset contains speech recordings along with speaker physical parameters (height, weight, ... ) as well as regional information and linguistic information. |
Conference/Workshop | Frequency | Page Limit | Organization | Blind Review |
---|---|---|---|---|
ICASSP | Annual | 4 + 1 (ref) | IEEE | No |
Interspeech | Annual | 4 + 1 (ref) | ISCA | No |
APSIPA | Annual | 4 + 1 (ref) | IEEE | Yes |
Odyssey | Biennial | 8 + 2 (ref) | ISCA | No |
SLT | Biennial | 6 + 2 (ref) | IEEE | Yes |
ASRU | Biennial | 6 + 2 (ref) | IEEE | Yes |
WASPAA | Biennial | 4 + 1 (ref) | IEEE | No |
- ID R&D and Synaptics First to Deploy Voice Biometrics on NPU for Smart Home Applications by Vineet Ganju
Company | Product |
---|---|
Pindrop | Deep Voice Engine |
ID R&D | IDLive™ Voice |
VoiceAI | Voiceprint recognition API |
Kriston | Voiceprint API, SDK |