Co-Separating Sounds of Visual Objects

Ruohan Gao¹ and Kristen Grauman¹,²
¹UT Austin, ²Facebook AI Research
In International Conference on Computer Vision (ICCV), 2019

If you find our code or project useful in your research, please cite:

 @inproceedings{gao2019coseparation,
   title = {Co-Separating Sounds of Visual Objects},
   author = {Gao, Ruohan and Grauman, Kristen},
   booktitle = {ICCV},
   year = {2019}
 }

Generate noisy object detections

We use the public PyTorch implementation of Faster R-CNN (https://github.com/jwyang/faster-rcnn.pytorch) to train an object detector with a ResNet-101 backbone. The detector is trained on ∼30k images of 15 object categories from the Open Images dataset: Banjo, Cello, Drum, Guitar, Harp, Harmonica, Oboe, Piano, Saxophone, Trombone, Trumpet, Violin, Flute, Accordion, and Horn. The pre-trained detector is shared on Google Drive. Please refer to https://github.com/jwyang/faster-rcnn.pytorch for instructions on how to use the pre-trained detector or train a new detector on categories of interest to you.

Use the pre-trained detector to generate object detections for both the training and testing sets, and save the detection results of each video as one .npy file under /your_data_root/detection_results/. See the supplementary material of the paper for how we reduce the noise in the obtained detections.
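
The per-video storage step can be sketched as follows. This is only an illustration, not the repository's code: the helper name `save_video_detections`, the array layout (one row per detection holding frame index, class id, confidence, and box corners), and the confidence threshold are all assumptions. Check the repository's data-loading code for the layout that training actually expects.

```python
import os
import tempfile

import numpy as np


def save_video_detections(detections, video_name, out_dir, min_confidence=0.0):
    """Save one video's detections as <out_dir>/<video_name>.npy.

    `detections` is an (N, 7) array-like. The column layout
    [frame_index, class_id, confidence, x1, y1, x2, y2] is a hypothetical
    convention chosen for this sketch.
    """
    detections = np.asarray(detections, dtype=np.float32).reshape(-1, 7)
    # Drop low-confidence boxes. This is a simple illustrative filter, not
    # the noise-reduction procedure from the supplementary material.
    kept = detections[detections[:, 2] >= min_confidence]
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, video_name + ".npy")
    np.save(out_path, kept)
    return out_path
```

For example, `save_video_detections(dets, "video1", "/your_data_root/detection_results/")` would write one file for the video named `video1`.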

Co-Separation training

Use the following command to train your co-separation model:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train.py --name audioVisual --hdf5_path /your_root/hdf5/soloduet/ --scene_path /your_root/hdf5/ADE.h5 --gpu_ids 0,1,2,3,4,5,6,7 --batchSize 80 --nThreads 32 --display_freq 10 --save_latest_freq 500 --niter 1 --validation_freq 200 --validation_batches 20 --num_batch 35000 --lr_steps 15000 30000 --classifier_loss_weight 0.05 --coseparation_loss 1 --unet_num_layers 7 --lr_visual 0.00001 --lr_unet 0.0001 --lr_classifier 0.0001 --weighted_loss --visual_pool conv1x1 --optimizer adam --log_freq True --with_additional_scene_image --tensorboard True --validation_visualization True |& tee -a log.txt
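
When training on a machine with a different number of GPUs, `CUDA_VISIBLE_DEVICES` and `--gpu_ids` need to stay consistent. Below is a minimal sketch of a launcher that derives both from a GPU count. The function names `gpu_settings` and `launch_training` are hypothetical, and the sketch assumes, as in the command above, that both values list the same device indices starting from 0. Whether `--batchSize` or the learning rates should change with fewer GPUs is not covered here.

```python
import os
import subprocess


def gpu_settings(num_gpus):
    """Return (env, args) that select GPUs 0..num_gpus-1."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    ids = ",".join(str(i) for i in range(num_gpus))
    # Copy the current environment and set the visible devices.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=ids)
    args = ["--gpu_ids", ids]
    return env, args


def launch_training(num_gpus, extra_args):
    """Run train.py with consistent GPU settings.

    `extra_args` holds the remaining flags from the command above,
    for example ["--name", "audioVisual", "--unet_num_layers", "7"].
    """
    env, gpu_args = gpu_settings(num_gpus)
    cmd = ["python", "train.py"] + gpu_args + list(extra_args)
    return subprocess.run(cmd, env=env, check=True)
```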

Co-Separation testing

Use the following command to mix and separate two solo videos using your trained co-separation model or the shared model pre-trained on the MUSIC dataset:

python test.py --video1_name video1_name --video2_name video2_name --visual_pool conv1x1 --unet_num_layers 7 --data_path /your_data_root/MUSIC_dataset/solo/ --weights_visual pretrained_models/audioVisual/visual_latest.pth --weights_unet pretrained_models/audioVisual/unet_latest.pth --weights_classifier pretrained_models/audioVisual/classifier_latest.pth  --num_of_object_detections_to_use 5 --with_additional_scene_image --scene_path /your_root/hdf5/ADE.h5 --output_dir_root results/
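
To evaluate many mixtures, the command above can be repeated over pairs of solo videos. The sketch below only assembles the argument lists and runs nothing. The function name `build_test_commands` and its parameters are hypothetical, while every flag and default value is copied from the command above.

```python
import itertools

# Flags shared by every run, copied from the test command above.
COMMON_ARGS = [
    "--visual_pool", "conv1x1",
    "--unet_num_layers", "7",
    "--weights_visual", "pretrained_models/audioVisual/visual_latest.pth",
    "--weights_unet", "pretrained_models/audioVisual/unet_latest.pth",
    "--weights_classifier", "pretrained_models/audioVisual/classifier_latest.pth",
    "--num_of_object_detections_to_use", "5",
    "--with_additional_scene_image",
]


def build_test_commands(video_names, data_path, scene_path,
                        output_dir_root="results/"):
    """Return one test.py argument list per unordered pair of videos."""
    commands = []
    for video1, video2 in itertools.combinations(video_names, 2):
        cmd = [
            "python", "test.py",
            "--video1_name", video1,
            "--video2_name", video2,
            "--data_path", data_path,
            "--scene_path", scene_path,
            "--output_dir_root", output_dir_root,
        ] + COMMON_ARGS
        commands.append(cmd)
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.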

Acknowledgements

Thanks to Dongguang You for help with the initial experimental setup. Portions of the code are adapted from the 2.5D Visual Sound implementation (https://github.com/facebookresearch/2.5D-Visual-Sound) and the Sound-of-Pixels implementation (https://github.com/hangzhaomit/Sound-of-Pixels). Please also refer to the original licenses of these projects.

License

The code in this repository is CC BY 4.0 licensed, as found in the LICENSE file.
