Audiovisual-Synthesis

Unsupervised Any-to-many Audiovisual Synthesis via Exemplar Autoencoders

Kangle Deng, Aayush Bansal, Deva Ramanan

project page / demo / arXiv

This repo provides a PyTorch implementation of our work.

Acknowledgements: This code borrows heavily from Auto-VC and Tacotron.

Summary Video

Dependencies

First, make sure ffmpeg is installed on your machine.

Then, run: pip install -r requirements.txt
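For example, on a Debian/Ubuntu machine (the package-manager command below is an assumption about your platform, not part of this repo), the full setup would be:

sudo apt-get install ffmpeg
pip install -r requirements.txt

You can verify the ffmpeg install afterwards with ffmpeg -version.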

Data

We provide our CelebAudio Dataset at link.

Train

Voice Conversion

Check 'scripts/train_audio.sh' for an example of training a Voice-Conversion model. Make sure the directory 'logs' exists.

Generally, run:

python train_audio.py --data_path PATH_TO_TRAINING_DATA --experiment_name EXPERIMENT_NAME --save_freq SAVE_FREQ --test_path_A PATH_TO_TEST_AUDIO --test_path_B PATH_TO_TEST_AUDIO --batch_size BATCH_SIZE --save_dir PATH_TO_SAVE_MODEL
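For instance, a concrete invocation (all paths and hyperparameter values below are illustrative placeholders, not files shipped with the repo) might look like:

python train_audio.py --data_path data/speaker_A --experiment_name vc_speaker_A --save_freq 1000 --test_path_A test/sample_A.wav --test_path_B test/sample_B.wav --batch_size 8 --save_dir checkpoints/vc_speaker_A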

Audiovisual Synthesis

Check 'scripts/train_audiovisual.sh' for an example of training an Audiovisual-Synthesis model. We usually train an audiovisual model based on a pretrained audio model.

1-stage generation -- video resolution: 256 * 256

Generally, run:

python train_audiovisual.py --video_path PATH_TO_TRAINING_DATA --experiment_name EXPERIMENT_NAME --save_freq SAVE_FREQ --test_path PATH_TO_TEST_AUDIO --batch_size BATCH_SIZE --save_dir PATH_TO_SAVE_MODEL --use_256 --load_model LOAD_MODEL_PATH

2-stage generation -- video resolution: 512 * 512

If you want the video resolution to be 512 * 512, use the StackGAN-style 2-stage generation.

Generally, run:

python train_audiovisual.py --video_path PATH_TO_TRAINING_DATA --experiment_name EXPERIMENT_NAME --save_freq SAVE_FREQ --test_path PATH_TO_TEST_AUDIO --batch_size BATCH_SIZE --save_dir PATH_TO_SAVE_MODEL --residual --load_model LOAD_MODEL_PATH
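As a concrete sketch (all paths and values are illustrative placeholders), a 2-stage run fine-tuned from a pretrained audio model might look like:

python train_audiovisual.py --video_path data/speaker_A_video --experiment_name av_speaker_A --save_freq 1000 --test_path test/sample.wav --batch_size 4 --save_dir checkpoints/av_speaker_A --residual --load_model checkpoints/vc_speaker_A/model.pt

Passing --use_256 in place of --residual gives the 1-stage 256 * 256 variant shown above.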

Test

Voice Conversion

Check 'scripts/test_audio.sh' for an example of testing a Voice-Conversion model.

To convert a wav file using a trained model, run:

python test_audio.py --model PATH_TO_MODEL --wav_path PATH_TO_INPUT --output_file PATH_TO_OUTPUT
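For example, with placeholder paths (the checkpoint filename is illustrative):

python test_audio.py --model checkpoints/vc_speaker_A/model.pt --wav_path input.wav --output_file output.wav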

Audiovisual Synthesis

Check 'scripts/test_audiovisual.sh' for an example of testing an Audiovisual-Synthesis model.

1-stage generation -- video resolution: 256 * 256

python test_audiovisual.py --load_model PATH_TO_MODEL --wav_path PATH_TO_INPUT --output_file PATH_TO_OUTPUT --use_256 

2-stage generation -- video resolution: 512 * 512

python test_audiovisual.py --load_model PATH_TO_MODEL --wav_path PATH_TO_INPUT --output_file PATH_TO_OUTPUT --residual
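For example, with placeholder paths (the checkpoint filename and the .mp4 output extension are assumptions, not confirmed by the repo), a 2-stage model would be tested as:

python test_audiovisual.py --load_model checkpoints/av_speaker_A/model.pt --wav_path input.wav --output_file output.mp4 --residual

Use --use_256 in place of --residual when testing a 1-stage model.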