In this paper, we propose a two-stage speech recognition model. In the first stage, the target voice is separated from background noise with the help of the corresponding visual information of lip movements, so that the model can 'listen' clearly. In the second stage, the audio modality is combined with the visual modality again by a multi-modality speech recognition (MSR) sub-network to better understand the speech, further improving the recognition rate.
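In repository terms, the first stage is the audio enhancement (AE) sub-network and the second is the MSR sub-network. The following is only a minimal sketch of that two-stage data flow in TensorFlow 1.x; the layer choices, feature dimensions, and function names (ae_subnetwork, msr_subnetwork) are illustrative placeholders rather than the networks defined in this repository.

import tensorflow as tf

def ae_subnetwork(noisy_spec, lip_feats):
    # Stage 1 (sketch): fuse the noisy spectrogram with visual lip features
    # and predict a mask that keeps the target voice and suppresses noise.
    fused = tf.concat([noisy_spec, lip_feats], axis=-1)
    hidden = tf.layers.dense(fused, 512, activation=tf.nn.relu)
    mask = tf.layers.dense(hidden, 257, activation=tf.nn.sigmoid)
    return noisy_spec * mask  # enhanced audio

def msr_subnetwork(enhanced_spec, lip_feats, num_classes=40):
    # Stage 2 (sketch): combine the enhanced audio with the visual modality
    # again and emit per-frame character logits for recognition.
    fused = tf.concat([enhanced_spec, lip_feats], axis=-1)
    hidden = tf.layers.dense(fused, 512, activation=tf.nn.relu)
    return tf.layers.dense(hidden, num_classes)

# Hypothetical shapes: [batch, time, 257] spectrogram, [batch, time, 512] lip features.
noisy_spec = tf.placeholder(tf.float32, [None, None, 257])
lip_feats = tf.placeholder(tf.float32, [None, None, 512])
logits = msr_subnetwork(ae_subnetwork(noisy_spec, lip_feats), lip_feats)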
First, clone the repository:
git clone https://github.com/JackSyu/AE-MSR.git
Then create a data folder:
cd AE-MSR && mkdir data
Python 3.5
TensorFlow 1.12.0
CUDA 9.0 or higher
MATLAB (optional)
LRS3:
Download the LRS3 dataset or use your own data.
Extract the video frames and crop the lip region.
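As a rough illustration of this step, the sketch below extracts frames with OpenCV and crops a mouth region from the lower half of a detected face box. It is an assumption-laden stand-in: the actual pipeline (and the optional MATLAB dependency) may use facial landmarks and a different crop size.

import os
import cv2

def extract_lip_frames(video_path, out_dir, size=112):
    # Sketch only: detect the face per frame, crop the mouth region
    # (lower half of the face box), and save it as an image.
    os.makedirs(out_dir, exist_ok=True)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, 1.3, 5)
        if len(faces) > 0:
            x, y, w, h = faces[0]
            mouth = frame[y + h // 2:y + h, x:x + w]  # lower half of the face
            cv2.imwrite(os.path.join(out_dir, "%05d.jpg" % idx),
                        cv2.resize(mouth, (size, size)))
        idx += 1
    cap.release()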
cd preprocessing
python dataset_tfrecord_trainval.py
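This script packs the processed data into TFRecord files for training and validation. A minimal sketch of such a conversion in TensorFlow 1.x is shown below; the feature keys ("video", "audio", "label"), encodings, and output path are assumptions, not necessarily the format produced by dataset_tfrecord_trainval.py.

import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def write_example(writer, lip_frames_bytes, audio_bytes, transcript):
    # Sketch: pack one utterance (lip frames, audio, transcript) into a tf.train.Example.
    example = tf.train.Example(features=tf.train.Features(feature={
        "video": _bytes_feature(lip_frames_bytes),
        "audio": _bytes_feature(audio_bytes),
        "label": _bytes_feature(transcript.encode("utf-8")),
    }))
    writer.write(example.SerializeToString())

with tf.python_io.TFRecordWriter("data/train.tfrecord") as writer:
    # Call write_example(writer, ...) for every training utterance.
    pass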
We train the audio enhancement (AE) sub-network and the MSR sub-network separately:
python Train_Audio_Visual_Speech_Enhancement.py
python Train_Audio_Visual_Speech_Recognition.py
Then we freeze the AE sub-network and run the subsequent joint training:
python Train_AE_MSR.py
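In TensorFlow 1.x, freezing a sub-network during joint training is usually done by restricting the optimizer's var_list to the variables that should keep updating. The sketch below shows only that pattern; the scope name audio_enhancement, the dummy loss, and the learning rate are placeholders, and Train_AE_MSR.py handles this internally.

import tensorflow as tf

with tf.variable_scope("audio_enhancement"):
    ae_w = tf.get_variable("w", shape=[4, 4])   # frozen during joint training
with tf.variable_scope("msr"):
    msr_w = tf.get_variable("w", shape=[4, 4])  # keeps training

loss = tf.reduce_mean(tf.square(tf.matmul(ae_w, msr_w)))  # dummy joint loss
# Only update variables outside the (assumed) audio_enhancement scope.
trainable = [v for v in tf.trainable_variables()
             if not v.name.startswith("audio_enhancement/")]
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=trainable)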
To test the jointly trained model:
python Test_AE_MSR.py
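Recognition quality on LRS3 is typically reported as word error rate (WER). For reference, a self-contained word-level WER computation looks like the following; this is the standard edit-distance definition, not necessarily the exact scoring code used by the test script.

def wer(reference, hypothesis):
    # Word error rate: word-level edit distance divided by the reference length.
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("set the lamp on the table", "set a lamp on the table"))  # 1/6, about 0.167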
If you find our code useful, please consider citing:
@InProceedings{Xu_2020_CVPR,
  author    = {Xu, Bo and Lu, Cheng and Guo, Yandong and Wang, Jacob},
  title     = {Discriminative Multi-Modality Speech Recognition},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2020}
}