Multi Modal Sound Source Separation

This repository contains a machine learning model for sound source separation. The model uses visual and audio features to separate the individual audio sources in a single duet video. Its structure is similar to the model introduced in the paper https://arxiv.org/pdf/1804.03160.pdf.

This project was done as part of a bachelor's thesis studying modern approaches to sound source separation.

Getting Started

Follow the instructions below to run the model on your own data. The data used in the study is the dataset available from https://github.com/roudimit/MUSIC_dataset.

Prerequisites

Anaconda or another virtual environment manager.
Video data in which each video has a single music/sound source (preferably the same data used in the study).
The Python libraries listed in the requirements.txt file.

Installing

Clone the repository to your local computer, then set the desired settings in arguments.py.

Training the networks

Before training the networks, store the data and define the path to it. To train the audio model, set "trainAudio" to True and run "python main.py". To train only the visual network, set "trainAudio" to False and "trainFrame" to True.

Evaluating the model

After the networks have been trained, evaluate the model by setting "evalCombined" to True and running the main script. To evaluate performance on duet videos, run the script with "evalDuet" set to True.
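The four settings named in the training and evaluation sections act as mode switches for the main script. The sketch below illustrates one way such flags could be resolved into a single run mode. It is not the code in arguments.py or main.py: the Settings class, the select_mode function, the mode strings, and the precedence order (training flags checked before evaluation flags) are all assumptions made for illustration. Only the four flag names come from this README.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Flag names are taken from the README; the defaults are assumptions.
    trainAudio: bool = False
    trainFrame: bool = False
    evalCombined: bool = False
    evalDuet: bool = False


def select_mode(settings: Settings) -> str:
    """Return a hypothetical run mode for the given flags.

    Assumed precedence: trainAudio, trainFrame, evalCombined, evalDuet.
    This mirrors the README's note that visual-only training requires
    trainAudio to be False, but the real script may behave differently.
    """
    if settings.trainAudio:
        return "train_audio"
    if settings.trainFrame:
        return "train_frame"
    if settings.evalCombined:
        return "eval_combined"
    if settings.evalDuet:
        return "eval_duet"
    raise ValueError("no mode enabled: set one of the four flags to True")


if __name__ == "__main__":
    # Visual-only training: trainAudio off, trainFrame on.
    print(select_mode(Settings(trainAudio=False, trainFrame=True)))
```

Checking the flags in a fixed order makes it explicit what happens when more than one flag is True, which the README does not specify.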

Model Output

The model outputs two spectrograms, each of which can be converted back to a separated audio track. For training, the audio of two videos is combined and the mixture is used as input; as output, the model separates the two videos' audio from the mixture. When the model is used on duet audio, the input is a video with two instruments playing and the output is a separate track for each instrument.
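The training setup described above (mix two sources, then recover each one) can be illustrated with binary masks on toy magnitude spectrograms. This is a minimal NumPy sketch, not the repository's implementation: the function names binary_masks and apply_mask are hypothetical, the mixture is approximated as the sum of the two magnitude spectrograms, and the mask definition (a bin belongs to whichever source is louder there) is one common choice that may differ from what the model actually uses.

```python
import numpy as np


def binary_masks(mag_a: np.ndarray, mag_b: np.ndarray):
    """Ground-truth binary masks for two magnitude spectrograms.

    A time-frequency bin is assigned to source A where A is at least as
    loud as B, and to source B otherwise (an assumed definition).
    """
    mask_a = (mag_a >= mag_b).astype(np.float32)
    mask_b = 1.0 - mask_a
    return mask_a, mask_b


def apply_mask(mix_mag: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the mixture bins selected by the mask."""
    return mix_mag * mask


# Toy spectrograms: 2 frequency bins x 3 time frames.
mag_a = np.array([[4.0, 0.0, 1.0], [0.5, 3.0, 0.0]], dtype=np.float32)
mag_b = np.array([[1.0, 2.0, 1.0], [2.0, 0.5, 0.0]], dtype=np.float32)

# Assumption: the mixture spectrogram is the sum of the magnitudes.
mix_mag = mag_a + mag_b

mask_a, mask_b = binary_masks(mag_a, mag_b)
est_a = apply_mask(mix_mag, mask_a)
est_b = apply_mask(mix_mag, mask_b)

print(mask_a)
print(est_a)
```

In this sketch the two masks are complementary, so the two estimates add back up to the mixture. To turn a masked magnitude spectrogram into audio, a typical approach is to combine it with the phase of the mixture and apply an inverse short-time Fourier transform; whether this repository does exactly that is not stated here.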

Training Output

Example training sample output:

Input audio mixture spectrogram:

Instrument 1 Frame:

Instrument 1 Ground Truth Audio Spectrogram Binary Mask:

Instrument 1 Predicted Audio Spectrogram Binary Mask:

Instrument 2 Frame:

Instrument 2 Ground Truth Audio Spectrogram Binary Mask:

Instrument 2 Predicted Audio Spectrogram Binary Mask:

Duet Output

Two duet outputs:

Authors

Petteri Nuotiomaa (GitHub: Notsk1)

Acknowledgments

As mentioned above, the techniques used were inspired by many sources, most prominently the paper https://arxiv.org/pdf/1804.03160.pdf.
