# Multi-Modal Sound Source Separation

This repository contains a machine learning model for sound source separation. The model uses visual and audio features to separate the individual sources in a single duet video. Its architecture is similar to the model introduced in the paper https://arxiv.org/pdf/1804.03160.pdf.

This project was done as part of a bachelor's thesis studying modern approaches to sound source separation.

## Getting Started

Follow the instructions below to run the model on your own data. The study used the MUSIC dataset, downloadable from https://github.com/roudimit/MUSIC_dataset.

### Prerequisites

- Anaconda or another virtual environment manager.
- Video data with a single music/sound source per video (preferably the same data used in the study).
- The Python libraries listed in the requirements.txt file.

### Installing

Clone the repository to your local machine and then configure the desired settings in arguments.py.
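The steps above might look as follows; this is a setup sketch, not a tested script — the repository URL is inferred from the author's GitHub username, and the environment name and Python version are placeholders:

```shell
# Clone the repository (URL assumed from the author's GitHub username)
git clone https://github.com/Notsk1/SoundSeparationImplementation.git
cd SoundSeparationImplementation

# Create and activate an isolated environment (Anaconda, per the prerequisites)
conda create -n soundsep python=3.9 -y
conda activate soundsep

# Install the Python dependencies listed in requirements.txt
pip install -r requirements.txt
```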

## Training the networks

Before training the networks, the data must be stored locally and the path to it defined. Train the audio model by running `python main.py` with the setting `trainAudio` set to True. To train only the visual network, set `trainAudio` to False and `trainFrame` to True.

## Evaluating the model

After the networks have been trained, evaluate the model by setting `evalCombined` to True and running the main script. To evaluate performance on duet videos, run the script with `evalDuet` set to True.
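The settings referenced above (`trainAudio`, `trainFrame`, `evalCombined`, `evalDuet`) live in arguments.py. As a hypothetical illustration of how such boolean settings might be declared — the actual arguments.py in this repository may differ — an argparse-style sketch could look like this:

```python
# Hypothetical sketch of the boolean settings referenced in this README.
# The real arguments.py may declare them differently.
import argparse

def get_args(argv=None):
    parser = argparse.ArgumentParser(description="Sound source separation settings")
    for flag in ("trainAudio", "trainFrame", "evalCombined", "evalDuet"):
        # String-to-bool flags, so you can pass e.g. --trainAudio True
        parser.add_argument(f"--{flag}", type=lambda s: s.lower() == "true",
                            default=False)
    return parser.parse_args(argv)

args = get_args(["--trainAudio", "True"])
print(args.trainAudio, args.trainFrame)  # True False
```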

## Model Output

The model outputs two spectrograms that can be converted back into separated audio tracks. During training, the audio from two single-source videos is mixed, the model is trained on the mixture, and its output separates the two videos' audio from that mixture. When used on a duet, the input is a single video with two instruments playing and the output is each instrument's track separately.
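Converting a masked spectrogram back into audio can be sketched as follows. This is a minimal illustration with synthetic sine tones and SciPy's STFT, assuming an ideal binary mask; the repository's actual transform parameters and mask predictions are not shown here:

```python
# Sketch: turning a masked mixture spectrogram back into a separated track.
import numpy as np
from scipy.signal import stft, istft

sr = 16000
t = np.arange(sr) / sr
source1 = np.sin(2 * np.pi * 440 * t)    # stand-in for instrument 1
source2 = np.sin(2 * np.pi * 1760 * t)   # stand-in for instrument 2
mixture = source1 + source2              # training mixture of the two audios

# Spectrograms of the mixture and of each clean source
_, _, S_mix = stft(mixture, fs=sr, nperseg=1024)
_, _, S1 = stft(source1, fs=sr, nperseg=1024)
_, _, S2 = stft(source2, fs=sr, nperseg=1024)

# Ideal binary mask: 1 where instrument 1 dominates the mixture
mask1 = (np.abs(S1) > np.abs(S2)).astype(float)

# Apply the mask and invert the STFT to recover instrument 1's waveform
_, separated1 = istft(S_mix * mask1, fs=sr, nperseg=1024)
```

In practice the model predicts `mask1` from the video frame and the mixture spectrogram instead of computing it from the clean sources.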

## Training Output

Example training sample output:

Input audio mixture spectrogram:

InputSpec

Instrument 1 Frame:

frame0

Instrument 1 Ground Truth Audio Spectrogram Binary Mask:

GTMask0

Instrument 1 Predicted Audio Spectrogram Binary Mask:

outputMask0

Instrument 2 Frame:

frame1

Instrument 2 Ground Truth Audio Spectrogram Binary Mask:

GTMask1

Instrument 2 Predicted Audio Spectrogram Binary Mask:

outputMask1
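The ground-truth binary masks shown above assign each time-frequency bin to whichever source is stronger there. A minimal NumPy illustration with stand-in magnitude spectrograms (the shapes are arbitrary, not the repository's actual dimensions):

```python
# Sketch: ground-truth binary masks from two source magnitude spectrograms.
import numpy as np

rng = np.random.default_rng(0)
S1 = rng.random((512, 256))   # stand-in magnitude spectrogram, instrument 1
S2 = rng.random((512, 256))   # stand-in magnitude spectrogram, instrument 2

# Each bin belongs to the dominant source
gt_mask1 = (S1 >= S2).astype(np.float32)
gt_mask2 = 1.0 - gt_mask1     # the two masks partition the mixture
```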

## Duet Output

Two duet outputs:

duet_output

## Authors

Petteri Nuotiomaa (GitHub: Notsk1)

## Acknowledgments

As mentioned above, the techniques used were inspired by many sources, most prominently the paper https://arxiv.org/pdf/1804.03160.pdf.
