Audio-visual voice biometrics is an audio-visual speaker recognition task that leverages both auditory and visual speech in a video. Portrait- and linguistic-based speaker characteristics are extracted via temporal dynamics modeling, combining the conventional speaker recognition and lip biometrics tasks.
This is the official implementation of the ICASSP 2023 paper "Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification".
Please refer to ./preprocessing to extract lip regions for the training and test datasets.
After obtaining the lip data of the training and test sets, you can run ./main_audiovisuallip_DATASET_CM.py for training and testing by simply switching the stage in the code. Before doing so, be sure to adapt ./conf/config_audiovisuallip_DATASET_new.yaml to your own configuration.
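As a rough illustration, a configuration file of this kind typically holds dataset paths and training hyperparameters. The keys and values below are hypothetical placeholders, not the actual schema of conf/config_audiovisuallip_DATASET_new.yaml; check the file in this repository for the real field names:

```yaml
# Hypothetical sketch of the config layout; the real keys in
# conf/config_audiovisuallip_DATASET_new.yaml may differ.
data:
  train_audio_dir: /path/to/train/audio    # your audio data
  train_lip_dir: /path/to/train/lips       # lips produced by ./preprocessing
  test_trials: /path/to/test/trials.txt    # verification trial list
train:
  batch_size: 128
  lr: 0.001
  epochs: 50
```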
![](https://private-user-images.githubusercontent.com/45690014/241733104-d70da2de-c2f8-417f-999d-2d9778ba719a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NTIzNDAsIm5iZiI6MTcyMTQ1MjA0MCwicGF0aCI6Ii80NTY5MDAxNC8yNDE3MzMxMDQtZDcwZGEyZGUtYzJmOC00MTdmLTk5OWQtMmQ5Nzc4YmE3MTlhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzIwVDA1MDcyMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQ3NDIwZTQ5ZjAzZTlmNjEyN2ZhYzI5NjdjNTFhZjFjNGNiY2NiNjJhNzY3OTA0MmQ5YWNmODQ1NDZkOGUyMjImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.j7GGjkaXyW6eqPU6AdAmPFqSKpaaoN2Nh2Uk27_XnxI)
![](https://private-user-images.githubusercontent.com/45690014/241733117-f548da67-f55a-4af0-9ec4-8a85b7ceff73.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NTIzNDAsIm5iZiI6MTcyMTQ1MjA0MCwicGF0aCI6Ii80NTY5MDAxNC8yNDE3MzMxMTctZjU0OGRhNjctZjU1YS00YWYwLTllYzQtOGE4NWI3Y2VmZjczLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzIwVDA1MDcyMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZkMWRiNTMwMmNkY2E3Zjk0ZjBjNjZhODcwNTk1NDlhY2U0ZDdiZDI4MDM0NDE2YzIxZjM4NjQwMmIxZGNmOTgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.9HqSu1TRN3zZdh7mxIjz5KemhbxHX08NvHsvAibMkso)
You can find the pretrained audio-only and visual-only models here: https://drive.google.com/drive/folders/1IalsNtmDH-qFnfgmn_O92J1MUHCaQepl?usp=sharing
AVLip:
```
@inproceedings{liu2023cross,
  title={Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification},
  author={Liu, Meng and Lee, Kong Aik and Wang, Longbiao and Zhang, Hanyi and Zeng, Chang and Dang, Jianwu},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
```
DeepLip:
```
@inproceedings{liu2021deeplip,
  title={DeepLip: A Benchmark for Deep Learning-Based Audio-Visual Lip Biometrics},
  author={Liu, Meng and Wang, Longbiao and Lee, Kong Aik and Zhang, Hanyi and Zeng, Chang and Dang, Jianwu},
  booktitle={2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  pages={122--129},
  year={2021},
  organization={IEEE}
}
```