This repository contains a basic reimplementation of Codec-ASV (https://ieeexplore.ieee.org/abstract/document/10888177), a speaker recognition project that uses a neural audio compression model as its input feature.
Codec-ASV uses a private pretrained Encodec model that was not open-sourced, so some designs in this project differ from the original paper; the embedding projection and MLP parts in particular should be noted. All the essential components in the paper have been implemented, but without a complete training process. My advice is that training from the C1+F3 combination may set a nice reference, but still, more effort is needed to get better results from each combination of Embedding Init and Embedding Fuse methods.
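To make the front end concrete, here is a minimal sketch of one plausible embedding-projection design. It is NOT the paper's exact architecture: it assumes the codec emits Encodec-style RVQ code indices with one stream per codebook, uses randomly initialised embedding tables (one of several Init options), fuses by summing over codebooks (one of several Fuse options), and projects with a small MLP to the feature size consumed by the ECAPA-TDNN backbone. All layer sizes below are illustrative.

```python
# Hedged sketch of a codec-embedding front end, not the paper's exact design.
# Assumptions: codes come as [batch, n_codebooks, frames] integer indices
# (Encodec-style RVQ), tables are randomly initialised, fuse = sum, and an
# MLP projects to the ECAPA-TDNN input feature dimension.
import torch
import torch.nn as nn

class CodecEmbeddingFrontend(nn.Module):
    def __init__(self, n_codebooks=8, codebook_size=1024, emb_dim=128, feat_dim=80):
        super().__init__()
        # One embedding table per RVQ codebook (random init here; the paper
        # also studies initialising from the codec's own codebooks).
        self.tables = nn.ModuleList(
            nn.Embedding(codebook_size, emb_dim) for _ in range(n_codebooks)
        )
        # MLP projection onto the backbone's expected feature dimension.
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, feat_dim)
        )

    def forward(self, codes):
        # codes: [batch, n_codebooks, frames] integer indices from the codec.
        embs = [tab(codes[:, q]) for q, tab in enumerate(self.tables)]
        fused = torch.stack(embs, dim=0).sum(dim=0)  # sum-fuse over codebooks
        return self.mlp(fused)                       # [batch, frames, feat_dim]

frontend = CodecEmbeddingFrontend()
codes = torch.randint(0, 1024, (2, 8, 150))          # fake codec indices
feats = frontend(codes)
print(feats.shape)  # torch.Size([2, 150, 80])
```

Swapping the sum for concatenation plus a wider first MLP layer gives another Fuse variant; which combination works best is exactly what the Init/Fuse grid above is meant to explore.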
The compression model is the official release from Facebook's paper "High Fidelity Neural Audio Compression"; its GitHub page is https://github.com/facebookresearch/encodec
The backend ASV model, ECAPA-TDNN, uses the reimplementation by Dr. Tao (https://github.com/TaoRuijie/ECAPA-TDNN). It is the backbone of this project, so if you are new to ASV or have any questions about the original code, visiting his project will be helpful.
Note: this setting is based on my device; you can modify the torch and torchaudio versions to match yours.
Start by building the environment
conda create -n ECAPA python=3.7.9 anaconda
conda activate ECAPA
pip install -r requirements.txt
Or start from an existing environment
pip install -r requirements.txt
The folder 'data_preparation' includes the bash scripts I transplanted from the wespeaker project (https://github.com/wenet-e2e/wespeaker). The Hugging Face download addresses for both vox1 and vox2 are still valid; other, older links may not work. Some local modification may be necessary before use.
The training list and test trials for vox1 are in the folder 'speechlist'.
You can also follow the official code to prepare your VoxCeleb2 dataset from the 'Data preparation' part in this repository.
Dataset for training usage:
- VoxCeleb2 training set;
- MUSAN dataset;
- RIR dataset.

Dataset for evaluation:
- VoxCeleb1 test set (Vox1_O trials).
Then you can change the data paths in trainECAPAModel.py. Train the ECAPA-TDNN model end-to-end by using:
python -u trainECAPAModel.py --save_path exps/exp1
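In the upstream ECAPA-TDNN script, the data paths are usually exposed as argparse flags rather than hard-coded, so they can be overridden on the command line alongside --save_path. The sketch below illustrates that pattern; the flag names and default paths are assumptions modelled on Dr. Tao's script, so check trainECAPAModel.py for the real names and defaults.

```python
# Hedged sketch of how trainECAPAModel.py typically wires up its data paths.
# All flag names and default paths below are assumptions (hypothetical),
# modelled on the upstream ECAPA-TDNN repository; verify against the script.
import argparse

parser = argparse.ArgumentParser(description="ECAPA-TDNN trainer (sketch)")
parser.add_argument("--train_list", default="speechlist/train_list.txt")  # hypothetical default
parser.add_argument("--train_path", default="/data/voxceleb2")            # hypothetical default
parser.add_argument("--musan_path", default="/data/musan")                # hypothetical default
parser.add_argument("--rir_path",   default="/data/RIRS_NOISES")          # hypothetical default
parser.add_argument("--eval_list",  default="speechlist/veri_test.txt")   # hypothetical default
parser.add_argument("--eval_path",  default="/data/voxceleb1")            # hypothetical default
parser.add_argument("--save_path",  default="exps/exp1")

# Simulate the command line from the README: only --save_path is overridden.
args = parser.parse_args(["--save_path", "exps/exp1"])
print(args.save_path)  # exps/exp1
```

If the flags exist, pointing --train_path, --musan_path, and --rir_path at your local copies avoids editing the source at all.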
Every test_step epochs, the system will be evaluated on the Vox1_O set and the EER will be printed.
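For reference, the EER (equal error rate) is the operating point where the false-acceptance rate on impostor trials equals the false-rejection rate on target trials. Below is a minimal stdlib sketch of that definition, not the project's actual scoring code, which scans the score values as thresholds and returns the rate where the two curves cross.

```python
# Minimal EER sketch (stdlib only); illustrative, not the repo's scorer.
def compute_eer(scores, labels):
    """Approximate the equal error rate: scores are similarity values,
    labels are 1 for target (same-speaker) trials and 0 for impostors."""
    n_tar = sum(labels)
    n_imp = len(labels) - n_tar
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(scores)):
        # Accept a trial when its score is >= threshold.
        far = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= thr) / n_imp
        frr = sum(1 for s, l in zip(scores, labels) if l == 1 and s < thr) / n_tar
        if abs(far - frr) < best_gap:           # closest FAR/FRR crossing
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Perfectly separable toy trials -> EER of 0.0
print(compute_eer([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], [1, 1, 1, 0, 0, 0]))  # 0.0
```

Real toolkits interpolate the ROC between thresholds, so their EER can differ slightly from this discrete scan on small trial lists.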
The result will be saved in saved_exps/exp1/score.txt
The model will be saved in saved_exps/exp1/model