BetterZane/Codec-ASV_reimplementation

A basic reimplementation of the paper "Codec-ASV: Exploring Neural Audio Codec For Speaker Representation Learning".

Introduction

This repository contains a basic reimplementation of Codec-ASV (https://ieeexplore.ieee.org/abstract/document/10888177), a speaker recognition project that uses a neural audio compression model as the input feature extractor.

Codec-ASV uses a private pretrained Encodec model that was not open-sourced, so some designs in this project differ from the original paper; in particular, note the embedding-projection and MLP parts. All the essential components of the paper have been implemented, but a complete training run has not. My advice is that training from C1+F3 should give a good reference point, but more effort is still needed to get good results from each combination of Embedding Init and Embedding Fuse methods.
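Since the exact embedding-projection design is this project's own approximation rather than the paper's released code, here is a minimal PyTorch sketch of the general idea: one embedding table per residual codebook, fused by summation. The sizes K, V, D and the sum-fusion choice are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions): K residual codebooks of size V,
# each mapped to a D-dimensional embedding.
K, V, D = 8, 1024, 128
embeds = nn.ModuleList(nn.Embedding(V, D) for _ in range(K))

# Mock codec output: (batch, codebooks, frames) of discrete code indices.
codes = torch.randint(0, V, (1, K, 50))

# Look up each codebook's embedding and fuse by summation across codebooks.
fused = torch.stack(
    [embeds[k](codes[:, k]) for k in range(K)], dim=0
).sum(dim=0)
print(fused.shape)  # torch.Size([1, 50, 128])
```

The fused (batch, frames, D) tensor is then what a backend like ECAPA-TDNN would consume in place of filterbank features.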

The compression model is the official release from Facebook's paper "High Fidelity Neural Audio Compression"; its GitHub page is https://github.com/facebookresearch/encodec.

The backend ASV model, ECAPA-TDNN, uses Dr. Tao Ruijie's reimplementation (https://github.com/TaoRuijie/ECAPA-TDNN). It is the backbone of this project, so if you are new to ASV or have questions about the original code, visiting his project would be helpful.

Dependencies

Note: these versions match my device; adjust the torch and torchaudio versions to suit yours.

Start from building the environment

conda create -n ECAPA python=3.7.9 anaconda
conda activate ECAPA
pip install -r requirements.txt

Start from the existing environment

pip install -r requirements.txt

Data preparation

Downloading from huggingface

The folder 'data_preparation' contains the bash scripts I transplanted from the wespeaker project (https://github.com/wenet-e2e/wespeaker). The Hugging Face download links for both vox1 and vox2 are still valid; other, older links may not work. Some local modification may be necessary before use.

The training list and test trials for vox1 are in the folder 'speechlist'
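For illustration, VoxCeleb-style trial lists are commonly one trial per line in the form `<label> <enroll_utt> <test_utt>`; the actual files in 'speechlist' may use a different layout, so treat this parser as a hypothetical sketch:

```python
def read_trials(path):
    """Parse a trial list where each line is '<label> <enroll> <test>'.

    Assumes the common VoxCeleb trial format; adapt the split/field
    order if the files in 'speechlist' differ.
    """
    trials = []
    with open(path) as f:
        for line in f:
            label, enroll, test = line.split()
            trials.append((int(label), enroll, test))
    return trials
```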

Another reference

You can also follow the official code to prepare your VoxCeleb2 dataset, using the 'Data preparation' section of Dr. Tao's ECAPA-TDNN repository.

Datasets used for training:

  1. VoxCeleb2 training set;

  2. MUSAN dataset;

  3. RIR dataset.

Datasets used for evaluation:

  1. VoxCeleb1 test set for Vox1_O

  2. VoxCeleb1 train set for Vox1_E and Vox1_H (Optional)
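MUSAN noise and RIR reverberation are standard training-time augmentations in this pipeline. The snippet below is a generic numpy sketch of mixing additive noise at a target SNR, not the repository's exact augmentation code:

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix noise into speech at a given SNR in dB (generic sketch)."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    # Scale the noise so that speech_power / noise_power hits the target SNR.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

RIR augmentation would instead convolve the speech with a measured room impulse response (e.g. via `scipy.signal.fftconvolve`).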

Training

Change the data paths in trainECAPAModel.py, then train the ECAPA-TDNN model end-to-end with:

python -u trainECAPAModel.py --save_path exps/exp1 

Every test_step epochs, the system is evaluated on the Vox1_O set and the EER is printed.

The results will be saved in exps/exp1/score.txt, and the model checkpoints in exps/exp1/model.
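If you want to check trial scores offline, EER can be computed from the scores and labels. This is a generic numpy sketch, not the repository's scoring code:

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal error rate: the point where false-accept rate equals
    false-reject rate when sweeping the decision threshold."""
    order = np.argsort(scores)[::-1]          # accept highest scores first
    labels = np.asarray(labels)[order]
    tar = np.cumsum(labels) / labels.sum()    # true-accept rate
    far = np.cumsum(1 - labels) / (1 - labels).sum()  # false-accept rate
    frr = 1 - tar                             # false-reject rate
    idx = np.argmin(np.abs(far - frr))        # threshold where FAR ≈ FRR
    return (far[idx] + frr[idx]) / 2
```

A lower EER means better speaker verification; Vox1_O results around 1% EER are typical for strong ECAPA-TDNN baselines on standard features.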
