Autoregressive Generation and Source Separation of Multi-source Raw Music using Residual Quantization

This project implements an autoregressive, transformer-based generative model trained on the discrete codes produced by a neural audio codec based on the residual-quantized variational autoencoder (RQ-VAE) architecture. The model can generate new music and separate audio sources, a step toward a general audio model. It is trained on the Slakh dataset.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

This project requires Python and Conda. Install both before continuing.

Preparing Data

This step is optional: it is required only if you want to train a model from scratch or reproduce the results. Detailed instructions for downloading the Slakh dataset can be found here.

Installing

To set up the environment for this project, follow these steps:

Clone the repository:

git clone <repository_url>

Navigate to the project directory.

Create a new Conda environment using the environment.yml file:

conda env create -f environment.yml

Activate the new Conda environment:

conda activate torch

Usage

This section of the README walks through how to train, test, and sample from the models. You can train, test, or sample from both the RQ-VAE and the RQTransformer. The first is used to separate sources and the second to generate new music.

Train

To train the RQ-VAE, open config.py, set config.IS_TRAINING and config.IS_TRAINING_AE to True, and then run the following command:

python main.py

To train the RQTransformer, you need to download the RQ-VAE checkpoint, set config.IS_TRAINING_AE to False, and then run the following command:

python main.py

Training is integrated with wandb. To log the training information to your wandb account, set the login key and the project name in run.py, and then set config.IS_WANDB to True.
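As a rough illustration of the wandb setup, the sketch below logs in and starts a run only when logging is enabled. The helper name init_wandb, its parameters, and the use of the WANDB_API_KEY environment variable are assumptions made for this example; the actual code in run.py may be organized differently.

```python
import os


def init_wandb(is_wandb, project_name, api_key=None):
    """Hypothetical helper: start a wandb run, or return None when logging is off.

    is_wandb plays the role of config.IS_WANDB; project_name and api_key stand in
    for the values that this README says are set in run.py.
    """
    if not is_wandb:
        return None
    # Imported lazily so the sketch still runs when wandb is not installed.
    import wandb

    # Fall back to the WANDB_API_KEY environment variable if no key is passed.
    wandb.login(key=api_key or os.environ.get("WANDB_API_KEY"))
    return wandb.init(project=project_name)


if __name__ == "__main__":
    # With logging disabled, nothing is sent to wandb.
    print(init_wandb(False, "my-project"))
```

Reading the key from an environment variable keeps it out of version control, which may be preferable to hard-coding it in a tracked file.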

Test

To test the RQ-VAE, that is, its source separation performance, you can download the pretrained model here. To test the RQTransformer, you can download the pretrained model here. Note that testing the RQTransformer requires both models, because the RQTransformer uses the quantized representations of the RQ-VAE. After you have downloaded the files, unzip them and move them to data/checkpoints/RQ-VAE or data/checkpoints/RQTransformer, respectively.
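The unzip-and-move step can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name install_checkpoint and the archive file name in the usage comment are hypothetical, and only the two target directories come from this README.

```python
import zipfile
from pathlib import Path


def install_checkpoint(archive_path, model_name, root="data/checkpoints"):
    """Extract a downloaded checkpoint archive into <root>/<model_name>.

    Returns the sorted names of the entries found in the target directory.
    """
    if model_name not in ("RQ-VAE", "RQTransformer"):
        raise ValueError(f"unknown model name: {model_name}")
    target = Path(root) / model_name
    # Create data/checkpoints/<model_name> if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return sorted(p.name for p in target.iterdir())


# Example usage (the archive name is hypothetical):
# install_checkpoint("RQ-VAE.zip", "RQ-VAE")
```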

To test the RQ-VAE, open config.py, set only config.IS_TESTING and config.IS_TRAINING_AE to True, and then run the following command:

python main.py

To test the RQTransformer, open config.py, set only config.IS_TESTING to True, and then run the following command:

python main.py

Separation

The provided RQ-VAE checkpoint reaches a test SI-SDRi (scale-invariant signal-to-distortion ratio improvement) of 11.4916 on the source separation task. In data/separations you can find some separation examples, named recon_*, together with the corresponding original sources.
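For readers unfamiliar with the metric, the sketch below computes SI-SDR and SI-SDRi as they are commonly defined: the estimate is projected onto the target, the energy of that projection is compared with the energy of the remainder, and the improvement is measured relative to the unprocessed mixture. It is a dependency-free illustration on short Python lists, not the evaluation code used in this repository; the function names and the small eps constant are assumptions.

```python
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB for two equal-length signals."""
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    # Optimal scaling of the target, which makes the metric invariant to gain.
    scale = dot / (target_energy + eps)
    projection = [scale * t for t in target]
    noise = [e - p for e, p in zip(estimate, projection)]
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(n * n for n in noise)
    return 10 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: gain of the estimate over using the raw mixture as the estimate."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


if __name__ == "__main__":
    target = [1.0, 0.0, -1.0, 0.0]
    mixture = [1.0, 1.0, -1.0, 1.0]    # target plus an interfering source
    estimate = [1.0, 0.5, -1.0, 0.5]   # interference halved by the separator
    print(si_sdr_improvement(estimate, mixture, target))
```

In this toy case, halving the interference amplitude reduces its energy by a factor of four, so the improvement is about 6.02 dB.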

Generation

To generate new music, set config.IS_SAMPLING to True and run the following command:

python main.py

Note that the RQTransformer does not yet perform well, so the generated music will not be pleasant to listen to. If you have ideas for improvement, please let me know.
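Since every mode is launched with the same python main.py command, the flag combinations described in the Usage section can be summarized in one place. The table below is a hypothetical sketch: the flag names come from this README, but the MODES dictionary and the flags_for helper do not exist in the repository. It also assumes that config.IS_TRAINING stays True when training the RQTransformer and that all other flags are False when sampling, which the text above does not state explicitly.

```python
# Flag names taken from this README; everything else here is illustrative.
FLAG_NAMES = ("IS_TRAINING", "IS_TRAINING_AE", "IS_TESTING", "IS_SAMPLING")

# For each mode, the set of flags that should be True in config.py.
MODES = {
    "train_rqvae": {"IS_TRAINING", "IS_TRAINING_AE"},
    "train_rqtransformer": {"IS_TRAINING"},  # assumption: IS_TRAINING stays True
    "test_rqvae": {"IS_TESTING", "IS_TRAINING_AE"},
    "test_rqtransformer": {"IS_TESTING"},
    "sample": {"IS_SAMPLING"},  # assumption: all other flags are False
}


def flags_for(mode):
    """Return a dict mapping every flag name to True or False for the given mode."""
    enabled = MODES[mode]
    return {name: name in enabled for name in FLAG_NAMES}


if __name__ == "__main__":
    for mode in MODES:
        print(mode, flags_for(mode))
```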

Other implementations

VQ-VAE and VAE are also implemented. The best test SI-SDRi reached is 6.8866 for the VQ-VAE and -3.2843 for the VAE. As expected, both underperform the RQ-VAE.
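One intuition for the difference between VQ and RQ is that residual quantization refines a single-codebook approximation in several stages, each stage quantizing the error left by the previous ones. The toy sketch below shows this on a scalar with hand-picked codebooks; it is an illustration of the general technique only. The models in this repository operate on learned vector codebooks, and the function names and values here are assumptions.

```python
def nearest(codebook, value):
    """Return the index of the codebook entry closest to value."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))


def residual_quantize(value, codebooks):
    """Quantize value in stages; each stage encodes the remaining residual.

    Returns the list of chosen indices (one per stage) and the final approximation.
    """
    codes = []
    approximation = 0.0
    residual = value
    for codebook in codebooks:
        index = nearest(codebook, residual)
        codes.append(index)
        approximation += codebook[index]
        # The next stage only sees what is still unexplained.
        residual = value - approximation
    return codes, approximation


if __name__ == "__main__":
    codebooks = [
        [-1.0, 0.0, 1.0],      # coarse stage (alone, this is plain VQ)
        [-0.25, 0.0, 0.25],    # finer stage
        [-0.05, 0.0, 0.05],    # finest stage
    ]
    print(residual_quantize(0.83, codebooks[:1]))
    print(residual_quantize(0.83, codebooks))
```

With only the first codebook, 0.83 is approximated by 1.0; with all three stages, the approximation moves to roughly 0.8, so the error shrinks as stages are added.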

Report

If you want more details on the tasks, the model architectures, and the experiments, you can read a short report.

LeonardoBerti00/Source-Separation-of-Multi-source-Music-using-Residual-Quantization
