BetterZane/Codec-ASV_reimplementation

A basic reimplementation of the paper "Codec-ASV: Exploring Neural Audio Codec For Speaker Representation Learning".

Introduction

This repository contains a basic reimplementation of Codec-ASV (https://ieeexplore.ieee.org/abstract/document/10888177), a speaker recognition project that uses a neural audio compression model as the input feature extractor.

Codec-ASV uses a private pretrained Encodec model that was not open-sourced, so some designs in this project differ from the original paper; in particular, note the embedding-projection and MLP parts. All the essential components of the paper have been implemented, but a complete training run has not. My advice is that training from C1+F3 should give a good reference point, but more effort is still needed to get good results from each combination of Embedding Init and Embedding Fuse methods.
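Since the exact embedding-projection design is this project's own approximation rather than the paper's released code, here is a minimal PyTorch sketch of the general idea: one embedding table per residual codebook, fused by summation. The sizes K, V, D and the sum-fusion choice are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions): K residual codebooks of size V,
# each mapped to a D-dimensional embedding.
K, V, D = 8, 1024, 128
embeds = nn.ModuleList(nn.Embedding(V, D) for _ in range(K))

# Mock codec output: (batch, codebooks, frames) of discrete code indices.
codes = torch.randint(0, V, (1, K, 50))

# Look up each codebook's embedding and fuse by summation across codebooks.
fused = torch.stack(
    [embeds[k](codes[:, k]) for k in range(K)], dim=0
).sum(dim=0)
print(fused.shape)  # torch.Size([1, 50, 128])
```

The fused (batch, frames, D) tensor is then what a backend like ECAPA-TDNN would consume in place of filterbank features.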

The compression model is the official release from Facebook's paper "High Fidelity Neural Audio Compression"; its GitHub page is https://github.com/facebookresearch/encodec.

The backend ASV model, ECAPA-TDNN, uses Dr. Tao Ruijie's reimplementation (https://github.com/TaoRuijie/ECAPA-TDNN). It is the backbone of this project, so if you are new to ASV or have questions about the original code, visiting his project would be helpful.

Dependencies

Note: these versions match my device; adjust the torch and torchaudio versions to suit yours.

Start from building the environment

conda create -n ECAPA python=3.7.9 anaconda
conda activate ECAPA
pip install -r requirements.txt

Start from the existing environment

pip install -r requirements.txt

Data preparation

Downloading from huggingface

The folder 'data_preparation' contains the bash scripts I transplanted from the wespeaker project (https://github.com/wenet-e2e/wespeaker). The Hugging Face download links for both vox1 and vox2 are still valid; other, older links may not work. Some local modification may be necessary before use.

The training list and test trials for vox1 are in the folder 'speechlist'
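For illustration, VoxCeleb-style trial lists are commonly one trial per line in the form `<label> <enroll_utt> <test_utt>`; the actual files in 'speechlist' may use a different layout, so treat this parser as a hypothetical sketch:

```python
def read_trials(path):
    """Parse a trial list where each line is '<label> <enroll> <test>'.

    Assumes the common VoxCeleb trial format; adapt the split/field
    order if the files in 'speechlist' differ.
    """
    trials = []
    with open(path) as f:
        for line in f:
            label, enroll, test = line.split()
            trials.append((int(label), enroll, test))
    return trials
```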

Another reference

You can also follow the official code to prepare your VoxCeleb2 dataset, using the 'Data preparation' section of Dr. Tao's ECAPA-TDNN repository.

Datasets used for training:

  1. VoxCeleb2 training set;

  2. MUSAN dataset;

  3. RIR dataset.

Datasets used for evaluation:

  1. VoxCeleb1 test set for Vox1_O

  2. VoxCeleb1 train set for Vox1_E and Vox1_H (Optional)
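MUSAN noise and RIR reverberation are standard training-time augmentations in this pipeline. The snippet below is a generic numpy sketch of mixing additive noise at a target SNR, not the repository's exact augmentation code:

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix noise into speech at a given SNR in dB (generic sketch)."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    # Scale the noise so that speech_power / noise_power hits the target SNR.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

RIR augmentation would instead convolve the speech with a measured room impulse response (e.g. via `scipy.signal.fftconvolve`).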

Training

Change the data paths in trainECAPAModel.py, then train the ECAPA-TDNN model end-to-end with:

python -u trainECAPAModel.py --save_path exps/exp1 

Every test_step epochs, the system is evaluated on the Vox1_O set and the EER is printed.

The results will be saved in exps/exp1/score.txt, and the model checkpoints in exps/exp1/model.
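If you want to check trial scores offline, EER can be computed from the scores and labels. This is a generic numpy sketch, not the repository's scoring code:

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal error rate: the point where false-accept rate equals
    false-reject rate when sweeping the decision threshold."""
    order = np.argsort(scores)[::-1]          # accept highest scores first
    labels = np.asarray(labels)[order]
    tar = np.cumsum(labels) / labels.sum()    # true-accept rate
    far = np.cumsum(1 - labels) / (1 - labels).sum()  # false-accept rate
    frr = 1 - tar                             # false-reject rate
    idx = np.argmin(np.abs(far - frr))        # threshold where FAR ≈ FRR
    return (far[idx] + frr[idx]) / 2
```

A lower EER means better speaker verification; Vox1_O results around 1% EER are typical for strong ECAPA-TDNN baselines on standard features.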
