Overview:
This project uses the Open-Unmix source separation architecture
to stream separated audio to the loudspeakers while the input file is still being processed.
The original Open-Unmix repository can be found at https://github.com/sigsep/open-unmix-pytorch.
Open-Unmix uses three bidirectional LSTM layers to estimate a spectral mask for its target source.
The final separation is produced by Wiener filtering the original mixed signal with that estimated mask.
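As a rough illustration of mask-based separation, the sketch below applies a ratio (Wiener-style) mask to the mixture's STFT. The single-target soft mask and the mask_model callable are simplifying assumptions for illustration; Open-Unmix itself applies a full multichannel Wiener filter over all estimated sources.

    import numpy as np
    from scipy.signal import stft, istft

    def separate(mixture, mask_model, fs=44100, nperseg=4096):
        """Apply a model-estimated spectral mask to a mono mixture."""
        # Complex spectrogram X[frequency, time] of the mixed signal
        _, _, X = stft(mixture, fs=fs, nperseg=nperseg)
        # mask_model is a hypothetical callable returning the target's
        # estimated magnitude spectrogram (what the LSTM network produces)
        target_mag = mask_model(np.abs(X))
        # Soft ratio mask: estimated target energy over mixture energy
        mask = target_mag**2 / (np.abs(X)**2 + 1e-10)
        # Filter the complex mixture and invert back to the time domain
        _, estimate = istft(mask * X, fs=fs, nperseg=nperseg)
        return estimate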
The online, streaming version was achieved by training unidirectional LSTM models,
which require no future context, and by implementing a producer-consumer multithreading system in Python.
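The threading pattern can be pictured as in the sketch below: a producer thread reads and enqueues audio blocks while a consumer dequeues, separates, and plays them. This is a minimal sketch; read_block, separate, and play are hypothetical placeholders, not the actual functions in unmix_stream.py.

    import queue
    import threading

    SENTINEL = None                    # marks the end of the stream
    blocks = queue.Queue(maxsize=8)    # bounded queue applies backpressure

    def produce(read_block):
        """Producer: read successive audio blocks and enqueue them."""
        while (block := read_block()) is not None:   # hypothetical reader
            blocks.put(block)          # blocks here if the consumer lags
        blocks.put(SENTINEL)

    def consume(separate, play):
        """Consumer: dequeue blocks, separate them, and play the result."""
        while (block := blocks.get()) is not SENTINEL:
            play(separate(block))      # hypothetical separation/playback

    # The two loops run concurrently, e.g.:
    #   threading.Thread(target=produce, args=(read_block,)).start()
    #   consume(separate, play)
    # so playback can begin before the whole file has been read.

Because the queue is bounded, the file reader cannot run arbitrarily far ahead of the separation model, which keeps memory use flat during long streams.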
The 'models' folder contains trained models for sung-vocal and spoken-speech targets.
These were uploaded with Git LFS, so Git LFS may be required to fetch them locally.
The sung-vocals model was trained on the MUSDB dataset;
the speech model was trained on a subset of 7,000 examples
from Mozilla's Common Voice dataset and 7,000 urban-noise samples from the UrbanSound8K dataset.
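One plausible way to build noisy-speech training pairs from these two corpora is sketched below, under the assumption that each clean Common Voice clip was mixed with an UrbanSound8K clip at a chosen signal-to-noise ratio; the exact recipe used for this project is not documented here.

    import numpy as np

    def mix_pair(speech, noise, snr_db=5.0):
        """Mix a clean speech clip with a noise clip at a target SNR (dB)."""
        noise = np.resize(noise, speech.shape)   # loop/trim noise to length
        speech_pow = np.mean(speech**2)
        noise_pow = np.mean(noise**2) + 1e-10
        # Scale the noise so speech power / scaled noise power matches snr_db
        gain = np.sqrt(speech_pow / (noise_pow * 10**(snr_db / 10)))
        return speech + gain * noise             # mixture; speech is the target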
Examples:
Given the provided models, the program can separate sung vocals from a musical mix
or speech from environmental noise.
For music files, either the sung vocals or the backing instruments may be extracted:
python3 unmix_stream.py path_to_music_file.wav acapella
python3 unmix_stream.py path_to_music_file.wav instrumental
python3 unmix_stream.py path_to_noisy_speech_file.wav speech
References:
Stöter, F.-R., Uhlich, S., Liutkus, A., & Mitsufuji, Y. (2019). Open-Unmix - A Reference Implementation for Music Source Separation. Journal of Open Source Software, 4(41), 1667.
Open-Unmix repository: https://github.com/sigsep/open-unmix-pytorch
Mozilla (2017). Mozilla Common Voice. https://commonvoice.mozilla.org/
Salamon, J., Jacoby, C., & Bello, J. P. (2014, November). A dataset and taxonomy for urban sound research. In Proceedings of the 22nd ACM International Conference on Multimedia (pp. 1041-1044). ACM.
UrbanSound8K dataset: https://urbansounddataset.weebly.com/urbansound8k.html