**This notebook was developed for the project of Deep Learning & Applied AI @Sapienza [2022/2023]**

**Task**: Students will be required to compare the music source separation performance of
the LQ-VAE model and the LASS model by re-training both models on publicly
available datasets. They will then train a model with a loss function as in LQ-VAE, but
using the technique of counting occurrences in the model codebook at inference time, as
is done in LASS. The project aims to assess whether this hybrid approach can lead to
better separation performance while maintaining efficiency at inference time.

I could not carry out this task on my own hardware, so the following code is structured to run on **Google Colab**'s T4 GPU. The service is extremely useful, but it comes with time restrictions (around 3 hours of GPU time per day).

The project consists of three models:

1.   [LQVAE](https://github.com/michelemancusi/LQVAE-separation)  - [paper](https://arxiv.org/abs/2110.05313)
2.   [LASS](https://github.com/gladia-research-group/latent-autoregressive-source-separation)   - [paper](https://arxiv.org/abs/2301.08562)
3.   [HYBRID](https://github.com/Pieroni1704202/LQVAE-LASS-hybrid/tree/main)     (LASS+LQVAE)

All models derive their architecture from the paper [Jukebox: A Generative Model for Music](https://arxiv.org/abs/2005.00341).

One of the first challenges was correctly installing the environment for Jukebox. The code of all the models requires an older version of Python (3.7). Usually this is easily handled with a conda environment, but Google Colab does not fully support conda environments. The workaround adopted here was to install an earlier Miniconda release whose base installation already ships the required Python version.

The data used to train the models comes from the [Synthesized Lakh (Slakh) Dataset](http://www.slakh.com/), restricted to two instruments: bass and drums. Only 600 audio files (22 kHz) were extracted from the entire dataset, 300 for bass and 300 for drums, and they were mixed pairwise to form 300 mixtures; this simple process is implemented in *'slakh_scrape.py'*. The mixtures and sources (bass and drums) were then split into 210 for training and 90 for testing. The small number of samples is due to the lack of computational resources.
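The pairwise mixing and the 210/90 split described above can be sketched as follows. This is a minimal illustration, not the contents of *'slakh_scrape.py'*: it assumes the bass and drums stems are already loaded as mono NumPy arrays at the same sample rate, and the helper names `mix_pair` and `train_test_split`, the track ids and the random seed are hypothetical.

```python
import numpy as np

def mix_pair(bass: np.ndarray, drums: np.ndarray) -> np.ndarray:
    """Mix two mono stems by summing them sample by sample.

    Both stems are trimmed to the shorter length; no gain normalization is
    applied (an assumption, the original script may rescale the mixture).
    """
    n = min(len(bass), len(drums))
    return bass[:n] + drums[:n]

def train_test_split(track_ids, n_train=210, seed=0):
    """Shuffle the track ids reproducibly and split them into train and test lists."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(track_ids))
    shuffled = [track_ids[i] for i in order]
    return shuffled[:n_train], shuffled[n_train:]

# Usage: 300 hypothetical track ids -> 210 for training, 90 for testing
track_ids = [f"Track{i:05d}" for i in range(1, 301)]
train_ids, test_ids = train_test_split(track_ids)
print(len(train_ids), len(test_ids))  # 210 90
```

Splitting by track id, rather than by individual file, keeps each mixture together with its own bass and drums sources on the same side of the split.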

Once the training and evaluation of LASS and LQVAE were completed, work moved on to building the hybrid model. The model includes a VQ-VAE with a post-quantization linearization term enforced in its loss, which imposes an algebraic structure on the latent space, as in LQVAE. However, the likelihood is not modeled through a σ-isotropic Gaussian; instead, as in LASS, it is modeled through discrete conditionals.  
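To make the idea of a post-quantization linearization more concrete, the sketch below computes one possible penalty of this kind: the squared distance between the latent of the mixture (standing in for the encoder output of `x1 + x2`) and the sum of the quantized latents of the two sources. It is an illustrative assumption, not the exact loss used by LQVAE or by the hybrid model; `quantize` and `linearity_penalty` are hypothetical helpers working on NumPy arrays of shape `(T, D)` with a codebook of shape `(K, D)`.

```python
import numpy as np

def quantize(z: np.ndarray, codebook: np.ndarray):
    """Nearest-neighbour vector quantization of z (T, D) with a codebook (K, D)."""
    dist = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (T, K)
    idx = dist.argmin(axis=1)
    return codebook[idx], idx

def linearity_penalty(z_mix: np.ndarray, z_1: np.ndarray, z_2: np.ndarray,
                      codebook: np.ndarray) -> float:
    """Hypothetical linearity term: mean squared gap between the mixture latent
    and the sum of the quantized source latents."""
    q_1, _ = quantize(z_1, codebook)
    q_2, _ = quantize(z_2, codebook)
    return float(((z_mix - (q_1 + q_2)) ** 2).mean())

# Usage: the sources quantize to [1, 0] and [0, 1], so a mixture latent of [1, 1] costs nothing
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
z_1, z_2 = np.array([[0.9, 0.1]]), np.array([[0.1, 0.8]])
print(linearity_penalty(np.array([[1.0, 1.0]]), z_1, z_2, codebook))  # 0.0
```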




## MPI and Conda installation 💻

In [None]:
!sudo apt-get update
!sudo apt-get install -y mpich

In [None]:
%env PYTHONPATH=

In [None]:
%%bash
MINICONDA_INSTALLER_SCRIPT=Miniconda3-py37_4.12.0-Linux-x86_64.sh
MINICONDA_PREFIX=/usr/local
wget https://repo.continuum.io/miniconda/$MINICONDA_INSTALLER_SCRIPT
chmod +x $MINICONDA_INSTALLER_SCRIPT
./$MINICONDA_INSTALLER_SCRIPT -b -f -p $MINICONDA_PREFIX

In [None]:
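import sys

# Make the packages installed by Miniconda's Python 3.7 importable (only once, even if the cell is re-run)
CONDA_SITE_PACKAGES = "/usr/local/lib/python3.7/site-packages"
if CONDA_SITE_PACKAGES not in sys.path:
    sys.path.append(CONDA_SITE_PACKAGES)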

In [None]:
import sys
sys.path

In [None]:
!which conda # should return /usr/local/bin/conda

In [None]:
!python --version

## Jukebox-environment 👷

In [None]:
!conda install mpi4py==3.0.3 -y
!pip install ffmpeg-python==0.2.0
!conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch -y

The project was developed on Google Colab, with Google Drive serving as its disk space.

In [None]:
from google.colab import drive
GDRIVE_DIR = '/content/drive'

drive.mount(GDRIVE_DIR, force_remount=True)

Mounted at /content/drive


**Run this cell to select either LASS, LQVAE, or HYBRID**


In [None]:
# %cd '/content/drive/MyDrive/Deep_Learning/LQVAE-LASS-hybrid'

############### OR

# %cd '/content/drive/MyDrive/Deep_Learning/LQVAE-separation'

############### OR

%cd '/content/drive/MyDrive/Deep_Learning/latent-autoregressive-source-separation/lass_audio'

/content/drive/MyDrive/Deep_Learning/latent-autoregressive-source-separation-main/lass_audio


Install the required libraries

In [None]:
!pip install -r requirements.txt

Install the Jukebox implementation selected above 🎶

In [None]:
!pip install -e .

In [None]:
!pip install av==8.1.0
!pip install tensorboardX

Install and log in to WANDB ⚖

In [None]:
!pip install wandb -qU

In [None]:
%env WANDB__EXECUTABLE=/usr/local/bin/python
%env WANDB_API_KEY=################################

In [None]:
!wandb login

# 1.LQVAE 🔵

## LQVAE training

In [None]:
!mpiexec -n 1 python jukebox/train.py --hps=vqvae --sample_length=131072 --bs=2 --audio_files_dir=../data/train/mix --labels=False --train --test --aug_shift --aug_blend --name=lq_vae --test_audio_files_dir=../data/test/mix

## Prior training

LQVAE - PRIOR - BASS

In [None]:
!mpiexec -n 1 python jukebox/train.py --hps=vqvae,small_prior,all_fp16,cpu_ema --name=pior_source --audio_files_dir=../data/train/bass --test_audio_files_dir=../data/test/bass --labels=False --train --test --aug_shift --aug_blend --prior --levels=3 --level=2 --weight_decay=0.01 --min_duration=24 --sample_length=524288 --bs=8 --n_ctx=8192 --sample=True --restore_vqvae=./logs/lq_vae/checkpoint_step_19160.pth.tar --restore_prior=./logs/pior_source/checkpoint_latest.pth.tar

LQVAE - PRIOR - DRUMS

In [None]:
!mpiexec -n 1 python jukebox/train.py --hps=vqvae,small_prior,all_fp16,cpu_ema --name=prior_drums --audio_files_dir=../data/train/drums --test_audio_files_dir=../data/test/drums --labels=False --train --test --aug_shift --aug_blend --prior --levels=3 --level=2 --weight_decay=0.01 --min_duration=24 --sample_length=524288 --bs=8 --n_ctx=8192 --sample=True --restore_vqvae=./logs/lq_vae/checkpoint_step_19160.pth.tar --restore_prior=./logs/prior_drums/checkpoint_latest.pth.tar

## Codebook precalc

In [None]:
%cd '/content/drive/MyDrive/Deep_Learning/LQVAE-separation/script'

In [None]:
!python codebook_precalc.py --save_path=../logs/codebook_sum_precalc.pt --restore_vqvae=../logs/lq_vae/checkpoint_step_19160.pth.tar --raw_to_tokens=64 --l_bins=2048 --sample_rate=22050 --commit=1.0 --emb_width=64
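For intuition on what a precomputed codebook table can be used for, the sketch below builds, for a toy codebook, the σ-isotropic Gaussian likelihood of a mixture code `k` given two source codes `i` and `j`, centred on the sum of their embeddings `e_i + e_j`. It is an assumption-based illustration and not the contents of `codebook_precalc.py`; the function name and the softmax normalization over `k` are hypothetical choices. With `--l_bins=2048` a dense `K × K × K` table would have about 8.6 billion entries, so this dense form is only practical for tiny codebooks.

```python
import numpy as np

def gaussian_log_likelihood(codebook: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Toy table of log p(m = k | z1 = i, z2 = j), shape (K, K, K).

    Each entry scores codebook vector e_k with an isotropic Gaussian of
    standard deviation `sigma` centred on e_i + e_j, then normalizes over k.
    """
    sums = codebook[:, None, :] + codebook[None, :, :]        # (K, K, D): e_i + e_j
    diff = sums[:, :, None, :] - codebook[None, None, :, :]   # (K, K, K, D)
    logits = -(diff ** 2).sum(axis=-1) / (2 * sigma ** 2)     # (K, K, K)
    logits -= logits.max(axis=-1, keepdims=True)               # numerical stability
    return logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

# Usage with a 4-entry, 2-dimensional toy codebook
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
table = gaussian_log_likelihood(codebook, sigma=0.5)
print(table.shape, int(table[1, 2].argmax()))  # (4, 4, 4) 3
```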

## Evaluation

### Bayesian Inference

In [None]:
!pip install ipykernel

In [None]:
%cd '/content/drive/MyDrive/Deep_Learning/LQVAE-separation/script'

/content/drive/MyDrive/Deep_Learning/LQVAE-separation-master/LQVAE-separation-master/script


In [None]:
!python bayesian_inference.py --shift=5 --path_1=../../data/test/drums/Track00001_1.wav --path_2=../../data/test/bass/Track00001_1.wav --restore_vqvae=../logs/lq_vae/checkpoint_step_19160.pth.tar --restore_priors '../logs/prior_drums/checkpoint_latest.pth.tar' '../logs/pior_source/checkpoint_latest.pth.tar' --sum_codebook=../logs/codebook_sum_precalc.pt --save_path ./results

### Bayesian test

This sequence of cells runs the evaluation of LQVAE: a single three-second chunk is extracted from each of twenty mixtures, and each chunk is then separated using the method described in this [paper](https://arxiv.org/abs/2110.05313). The separated sources are compared with the originals using the Signal-to-Distortion Ratio (SDR), the metric also used in both the LASS and LQVAE papers.

The scores are lower than those reported in the original paper, which can be attributed to the shorter training time of the model: as mentioned above, GPU access is restricted to about three hours per day, which is also why only twenty chunks were used to evaluate the model.
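As a reference for the metric, the sketch below extracts a three-second chunk at 22050 Hz and computes a plain SDR in decibels, 10·log10(‖s‖² / ‖s − ŝ‖²). The evaluation scripts may use a different SDR variant (for example BSS Eval's SDR, which tolerates some filtering distortion), so numbers from this helper are not guaranteed to match theirs; `extract_chunk`, `sdr` and the `eps` guard are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 22050   # sample rate used throughout the notebook
CHUNK_SECONDS = 3     # length of the evaluated chunk

def extract_chunk(wave: np.ndarray, start_s: float,
                  sr: int = SAMPLE_RATE, seconds: int = CHUNK_SECONDS) -> np.ndarray:
    """Return `seconds` of audio starting at `start_s` seconds."""
    start = int(start_s * sr)
    return wave[start:start + seconds * sr]

def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Plain signal-to-distortion ratio in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    signal = np.sum(reference ** 2)
    distortion = np.sum((reference - estimate) ** 2)
    return float(10 * np.log10((signal + eps) / (distortion + eps)))

# Usage on synthetic data: an estimate off by 10% of the reference gives 20 dB
ref = extract_chunk(np.ones(10 * SAMPLE_RATE), start_s=2.0)
print(ref.shape, round(sdr(ref, 0.9 * ref), 2))  # (66150,) 20.0
```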


In [None]:
%cd '/content/drive/MyDrive/Deep_Learning/LQVAE-separation'

/content/drive/MyDrive/Deep_Learning/LQVAE-separation-master/LQVAE-separation-master


In [None]:
!mpiexec -n 1 python ./script/bayesian_test.py

# 2.LASS  🔴

protobuf needs to be downgraded before training the VQ-VAE

In [None]:
!pip install protobuf==3.20

Install diba

In [None]:
#install diba
%cd '/content/drive/MyDrive/Deep_Learning/latent-autoregressive-source-separation/diba'
!pip install .
%cd '/content/drive/MyDrive/Deep_Learning/latent-autoregressive-source-separation/lass_audio'

## VQ-VAE training

Train the Jukebox VQ-VAE

In [None]:
!mpiexec -n 1 python ./jukebox/train.py --hps=vqvae --sample_length=131072 --bs=2 --audio_files_dir=../data/train/mix --labels=False --train --test --aug_shift --aug_blend --name=vq_vae --test_audio_files_dir=../data/test/mix

## Prior training

TRAIN PRIOR BASS LASS

In [None]:
!mpiexec -n 1 python ./jukebox/train.py --hps=vqvae,small_prior,all_fp16,cpu_ema --name=lass_prior_bass --audio_files_dir=../data/train/bass --test_audio_files_dir=../data/test/bass --labels=False --train --test --aug_shift --aug_blend --prior --levels=3 --level=2 --weight_decay=0.01 --min_duration=24 --sample_length=524288 --bs=8 --n_ctx=8192 --sample=True --restore_vqvae=./logs/vq_vae/checkpoint_step_19160.pth.tar

TRAIN PRIOR DRUMS LASS

In [None]:
!mpiexec -n 1 python ./jukebox/train.py --hps=vqvae,small_prior,all_fp16,cpu_ema --name=lass_prior_drums --audio_files_dir=../data/train/drums --test_audio_files_dir=../data/test/drums --labels=False --train --test --aug_shift --aug_blend --prior --levels=3 --level=2 --weight_decay=0.01 --min_duration=24 --sample_length=524288 --bs=8 --n_ctx=8192 --sample=True --restore_vqvae=./logs/vq_vae/checkpoint_step_19160.pth.tar

## Train Sums

### Copy all the sources into the same directory

In [None]:
!mkdir -p /content/drive/MyDrive/Deep_Learning/data/train_sums
!find /content/drive/MyDrive/Deep_Learning/data/train/drums -name "*.wav" -exec sh -c 'cp "$1" "/content/drive/MyDrive/Deep_Learning/data/train_sums/drums_$(basename "$1")"' _ {} \;

In [None]:
!find /content/drive/MyDrive/Deep_Learning/data/train/bass -name "*.wav" -exec sh -c 'cp "$1" "/content/drive/MyDrive/Deep_Learning/data/train_sums/bass_$(basename "$1")"' _ {} \;
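The same copy can also be done from Python, which makes it easier to create the destination folder and to check what was copied. This is a minimal sketch using only the standard library; `copy_with_prefix` is a hypothetical helper that mirrors the `find ... -exec cp` cells (recursive search, `<prefix>_<basename>` naming, later files with the same name overwrite earlier ones), and the demo runs on a temporary directory with placeholder files.

```python
import shutil
import tempfile
from pathlib import Path

def copy_with_prefix(src_dir, dst_dir, prefix: str) -> list:
    """Copy every .wav file under src_dir (recursively) to dst_dir as '<prefix>_<name>'."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)   # create train_sums if it does not exist yet
    copied = []
    for wav in sorted(Path(src_dir).rglob("*.wav")):
        target = dst / f"{prefix}_{wav.name}"
        shutil.copy2(wav, target)
        copied.append(target)
    return copied

# Self-contained demo on a temporary directory tree
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "train" / "drums" / "sub"
    src.mkdir(parents=True)
    (src / "Track00001_1.wav").write_bytes(b"RIFF")   # placeholder bytes, not real audio
    copied = copy_with_prefix(Path(tmp) / "train" / "drums", Path(tmp) / "train_sums", "drums")
    demo_names = [p.name for p in copied]
print(demo_names)  # ['drums_Track00001_1.wav']
```

On Colab it would be called with the notebook's paths, e.g. `copy_with_prefix('/content/drive/MyDrive/Deep_Learning/data/train/drums', '/content/drive/MyDrive/Deep_Learning/data/train_sums', 'drums')`, and likewise for `bass`.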

### Run train sums

This step computes an approximation of the distribution of sums of latent codes in the VQ-VAE. From iteration 9000 onward the script ran out of memory, so manual restarts were required.

In [None]:
!mpiexec -n 1 python ./lass/train_sums.py --epochs=100  --vqvae-path=./logs/vq_vae/checkpoint_step_19160.pth.tar --audio-files-dir=../data/train_sums  --output-dir=./logs/vqvae_sum_distribution --sample-length=5.944308 --sample-rate=22050 --save-iters=250
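The idea behind this step, estimating the discrete likelihood p(m = k | z1 = i, z2 = j) by counting which mixture code appears together with each pair of source codes, can be illustrated with a sparse counter. This is a simplified sketch, not the implementation of `train_sums.py`: it assumes the three token sequences are already aligned 1-D integer arrays, the helper names are hypothetical, and the additive smoothing constant `alpha` is an arbitrary choice. A sparse counter is used because a dense table over 2048 codes would have 2048³ entries.

```python
from collections import Counter

import numpy as np

def count_sum_codes(codes_1: np.ndarray, codes_2: np.ndarray, codes_mix: np.ndarray) -> Counter:
    """Count (i, j, k) triples: source-1 code i and source-2 code j observed
    at the same time step as mixture code k. Inputs are aligned 1-D int arrays."""
    return Counter(zip(codes_1.tolist(), codes_2.tolist(), codes_mix.tolist()))

def likelihood(counts: Counter, i: int, j: int, num_codes: int, alpha: float = 1e-3) -> np.ndarray:
    """Smoothed estimate of p(m = k | z1 = i, z2 = j) for every k.
    Scans all counts for clarity; a real implementation would index by (i, j)."""
    row = np.full(num_codes, alpha)
    for (a, b, k), c in counts.items():
        if a == i and b == j:
            row[k] += c
    return row / row.sum()

# Usage with three toy time steps and a 3-entry codebook
counts = count_sum_codes(np.array([0, 0, 1]), np.array([1, 1, 1]), np.array([2, 2, 0]))
print(counts[(0, 1, 2)], likelihood(counts, 0, 1, num_codes=3).argmax())  # 2 2
```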

## Evaluation



This sequence of cells runs the evaluation of LASS: a single three-second chunk is extracted from each of twenty mixtures, and each chunk is then separated using the method described in this [paper](https://arxiv.org/abs/2301.08562). The separated sources are compared with the originals using the Signal-to-Distortion Ratio (SDR), the metric also used in both the LASS and LQVAE papers.

The scores are lower than those reported in the original paper, which can be attributed to the shorter training time of the model: as mentioned above, GPU access is restricted to about three hours per day, which is also why only twenty chunks were used to evaluate the model.
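The way LASS combines the two priors with the discrete likelihood can be illustrated for a single time step: among all source-code pairs (i, j), pick the one maximizing log p1(i) + log p2(j) + log p(m | i, j) for the observed mixture code m. This greedy one-step version is a simplification assumed for illustration only; the method described in the LASS paper searches over whole token sequences using the autoregressive priors rather than taking a single argmax, and `greedy_step` is a hypothetical helper.

```python
import numpy as np

def greedy_step(log_prior_1: np.ndarray, log_prior_2: np.ndarray,
                log_lik: np.ndarray):
    """Return the pair (i, j) maximizing log p1(i) + log p2(j) + log p(m | i, j).

    log_prior_1, log_prior_2: shape (K,), next-token log-probabilities of the two priors.
    log_lik: shape (K, K), entry [i, j] = log p(m | z1 = i, z2 = j) for the observed m.
    """
    scores = log_prior_1[:, None] + log_prior_2[None, :] + log_lik
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return int(i), int(j), float(scores[i, j])

# Usage: uniform priors over 3 codes, and a likelihood that strongly favours the pair (1, 2)
K = 3
log_prior = np.log(np.full(K, 1.0 / K))
log_lik = np.full((K, K), -10.0)
log_lik[1, 2] = 0.0
print(greedy_step(log_prior, log_prior, log_lik)[:2])  # (1, 2)
```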


In [None]:
!mpiexec -n 1 python ./lass/separate.py

# 3.HYBRID 🟣

This model is a hybrid of LASS and LQVAE: the idea is to enforce a post-quantization linearization in the loss of the VQ-VAE, as in LQVAE, and to model the likelihood function with discrete conditionals, as in LASS.

## Copy files and run train sums

In [None]:
!mkdir -p /content/drive/MyDrive/Deep_Learning/data/train_sums
!find /content/drive/MyDrive/Deep_Learning/data/train/drums -name "*.wav" -exec sh -c 'cp "$1" "/content/drive/MyDrive/Deep_Learning/data/train_sums/drums_$(basename "$1")"' _ {} \;

In [None]:
!find /content/drive/MyDrive/Deep_Learning/data/train/bass -name "*.wav" -exec sh -c 'cp "$1" "/content/drive/MyDrive/Deep_Learning/data/train_sums/bass_$(basename "$1")"' _ {} \;

In [None]:
!mpiexec -n 1 python ./lass/train_sums.py --epochs=100  --vqvae-path=./logs/lq_vae/checkpoint_step_19160.pth.tar --audio-files-dir=../data/train_sums  --output-dir=./logs/vqvae_sum_distribution --sample-length=5.944308 --sample-rate=22050 --save-iters=250
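The value `--sample-length=5.944308` appears to be the VQ-VAE window of 131072 samples expressed in seconds at 22050 Hz, while the priors use 524288 samples (about 23.78 s), presumably why `--min_duration=24` is used for prior training. The quick check below reproduces the arithmetic; the helper names are hypothetical.

```python
SAMPLE_RATE = 22050  # --sample_rate / --sample-rate used in the notebook

def samples_to_seconds(n_samples: int, sr: int = SAMPLE_RATE) -> float:
    return n_samples / sr

def seconds_to_samples(seconds: float, sr: int = SAMPLE_RATE) -> int:
    return round(seconds * sr)

vqvae_window = samples_to_seconds(131072)  # VQ-VAE --sample_length
prior_window = samples_to_seconds(524288)  # prior --sample_length
print(f"{vqvae_window:.6f} s, {prior_window:.2f} s")  # 5.944308 s, 23.78 s
print(seconds_to_samples(5.944308))                    # 131072
```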

## Evaluation

This sequence of cells runs the evaluation of LQVAE-LASS-hybrid: a single three-second chunk is extracted from each of twenty mixtures, and each chunk is then separated. The separated sources are compared with the originals using the Signal-to-Distortion Ratio (SDR), the metric also used in both the LASS and LQVAE papers.

The scores are lower than those reported in the original LASS and LQVAE papers, which can be attributed to the shorter training time of the model: as mentioned above, GPU access is restricted to about three hours per day, which is also why only twenty chunks were used to evaluate the model.

In [None]:
!mpiexec -n 1 python ./script/bayesian_test.py