This repository contains the code for BFRNet, introduced in the paper "Filter-Recovery Network for Multi-Speaker Audio-Visual Speech Separation" by Haoyue Cheng, Zhaoyang Liu, Wayne Wu and Limin Wang.
- Download the VoxCeleb2 test mixture lists from the following link:
https://pan.xunlei.com/s/VNXTbMyuZOijYSvNAFJPmVOvA1?pwd=wxtt#
- Create a directory named "voxceleb2" under the BFRNet root directory and move the mixture-list files into it.
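For reference, a minimal sketch of reading such a mixture list. This assumes one mixture per line with the component-utterance paths separated by whitespace; the actual field layout of the downloaded lists may differ, and the paths below are made up:

```python
import tempfile
from pathlib import Path

def read_mixture_list(path):
    """Parse a mixture-list file: one mixture per line, fields separated by
    whitespace. NOTE: the exact field layout is an assumption, not documented
    by this repository."""
    rows = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if fields:
            rows.append(fields)
    return rows

# Toy example with invented utterance paths:
demo = Path(tempfile.mkdtemp()) / "demo_2mix.txt"
demo.write_text("spkA/vid1/clip1.wav spkB/vid2/clip1.wav\n")
pairs = read_mixture_list(demo)
print(pairs)  # [['spkA/vid1/clip1.wav', 'spkB/vid2/clip1.wav']]
```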
# Directory structure of the VoxCeleb2 dataset:
# ├── VoxCeleb2
# │   ├── [mp4] (contains the face tracks)
# │   │   ├── [train]
# │   │   │   └── [spk_id]
# │   │   │       └── [video_id]
# │   │   │           └── [clip_id]
# │   │   │               └── .mp4 files
# │   │   └── [val]
# │   └── [mouth] (contains the audio files and mouth ROI files)
# │       ├── [train]
# │       │   └── [spk_id]
# │       │       └── [video_id]
# │       │           └── [clip_id]
# │       │               └── .h5 files, .wav files
# │       └── [val]
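Because the [mp4] and [mouth] subtrees mirror each other, a dataset index can pair each face-track clip with its audio and mouth-ROI files by path. A hedged sketch, assuming the .wav/.h5 files share the file stem of the corresponding .mp4 (the tree above does not pin down the file names):

```python
import tempfile
from pathlib import Path

def index_voxceleb2(root, split="train"):
    """Pair each face track under mp4/<split>/<spk_id>/<video_id>/<clip_id>/
    with the same-named .wav and .h5 under mouth/. File-name matching by stem
    is an assumption."""
    root = Path(root)
    samples = []
    for mp4_path in sorted((root / "mp4" / split).glob("*/*/*/*.mp4")):
        rel = mp4_path.relative_to(root / "mp4")
        wav = (root / "mouth" / rel).with_suffix(".wav")
        h5 = (root / "mouth" / rel).with_suffix(".h5")
        samples.append({"mp4": mp4_path, "wav": wav, "h5": h5,
                        "complete": wav.exists() and h5.exists()})
    return samples

# Build a tiny fake tree to demonstrate (speaker/video/clip ids are made up):
root = Path(tempfile.mkdtemp()) / "VoxCeleb2"
clip = Path("id00001") / "vid001" / "00001"
(root / "mp4" / "train" / clip).mkdir(parents=True)
(root / "mp4" / "train" / clip / "00001.mp4").touch()
(root / "mouth" / "train" / clip).mkdir(parents=True)
(root / "mouth" / "train" / clip / "00001.wav").touch()
(root / "mouth" / "train" / clip / "00001.h5").touch()
samples = index_voxceleb2(root)
print(len(samples), samples[0]["complete"])  # 1 True
```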
# Directory structure of the LRS2/LRS3 dataset:
# ├── lrs2/lrs3
# │   └── [main] (contains the face tracks, audio files, and mouth ROI files)
# │       └── [video_id]
# │           └── .wav files, .npz files, .mp4 files
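For LRS2/LRS3, the audio and mouth-ROI files for one video sit side by side under [main]. A minimal loading sketch, assuming 16-bit mono .wav audio and an .npz whose array is stored under the key "data" (both the key name and the array layout are assumptions):

```python
import os
import tempfile
import wave
import numpy as np

def load_lrs_sample(main_dir, video_id):
    """Load one LRS2/LRS3 item from [main]: the audio track (.wav) and the
    mouth-ROI array (.npz). The npz key 'data' is an assumption."""
    with wave.open(os.path.join(main_dir, video_id + ".wav"), "rb") as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    mouth = np.load(os.path.join(main_dir, video_id + ".npz"))["data"]
    return audio, mouth

# Create a dummy sample to demonstrate the expected pairing:
main = tempfile.mkdtemp()
with wave.open(os.path.join(main, "v0.wav"), "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(16000)   # 1 second of silence at 16 kHz
    w.writeframes(np.zeros(16000, dtype=np.int16).tobytes())
np.savez(os.path.join(main, "v0.npz"),
         data=np.zeros((25, 96, 96), dtype=np.uint8))  # 25 mouth-ROI frames
audio, mouth = load_lrs_sample(main, "v0")
print(audio.shape, mouth.shape)  # (16000,) (25, 96, 96)
```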
- Please contact chenghaoyue98@gmail.com to obtain the datasets.
- Train the model with Slurm:
GPUS=[GPUS] GPUS_PER_NODE=[GPUS_PER_NODE] bash train_slurm.sh [PARTITION] [JOB_NAME]
- Or train with torch.distributed:
NNODES=[NNODES] GPUS_PER_NODE=[GPUS_PER_NODE] bash train_dist.sh [JOB_NAME]
- Download the pre-trained networks from the following link:
https://drive.google.com/drive/folders/1J0qxFMb7NVbsXQwM4HiOJ1u7MI0pUquO
- Create a directory named "checkpoints" under the BFRNet root directory and move the pre-trained models into it.
- Evaluate the models on the VoxCeleb2 unseen_2mix test set:
mix_number=2 test_file="anno/unseen_2mix.txt" bash test.sh inference_unseen_2mix
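Separation quality on such test sets is commonly reported as SI-SDR (scale-invariant signal-to-distortion ratio); whether test.sh reports exactly this variant is not stated here. A minimal NumPy sketch of the metric:

```python
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR in dB between an estimated and a reference signal.
    Both signals are zero-meaned, and the estimate is projected onto the
    reference, so rescaling the estimate does not change the score."""
    ref = ref - ref.mean()
    est = est - est.mean()
    s_target = np.dot(est, ref) / (np.dot(ref, ref) + eps) * ref
    e_noise = est - s_target
    return 10 * np.log10((np.dot(s_target, s_target) + eps) /
                         (np.dot(e_noise, e_noise) + eps))

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
perfect = si_sdr(2.0 * ref, ref)                              # very large dB:
noisy = si_sdr(ref + 0.1 * rng.standard_normal(16000), ref)   # roughly 20 dB
print(perfect > 100, 15 < noisy < 25)
```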