## RAVE
**RAVE** is a variational autoencoder for fast and high-quality neural audio synthesis by Antoine Caillon and Philippe Esling.
\
\
[Article on arXiv](https://arxiv.org/abs/2111.05011) & [Source code on GitHub](https://github.com/acids-ircam/RAVE)

---
<small>*This notebook sets up RAVE training on kaggle.com using GPUs. It includes additional Augment Audio Tools scripts by Materialvision.*
\
*Last updated: 26.05.2023*
\
*Author: Martin Heinze*</small>

# Install Conda, RAVE, dependencies

In [None]:
#Install Conda
!mkdir -p /kaggle/temp
%cd /kaggle/temp
!curl -L https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh -o miniconda.sh
!chmod +x miniconda.sh
!sh miniconda.sh -b -p /kaggle/temp/miniconda

#Upgrade ipython ipykernel, install ffmpeg
!/kaggle/temp/miniconda/bin/pip install --upgrade ipython ipykernel
!/kaggle/temp/miniconda/bin/conda install ffmpeg --yes

#Install RAVE
#Pin to a specific release if necessary using 'acids-rave==tag.version'
!/kaggle/temp/miniconda/bin/pip install acids-rave==2.1.18

#Uninstall the cuBLAS package (nvidia_cublas_cu11), which otherwise conflicts with the environment on Kaggle.
!/kaggle/temp/miniconda/bin/pip uninstall nvidia_cublas_cu11 --yes

# Install and run Audio Augmentation Scripts

Taken from https://github.com/materialvision/augment_audio_tools:

This script provides a set of audio augmentations for machine learning purposes, in particular useful for the RAVE model. It allows you to process audio files by changing their speed, resampling, splitting stereo files into mono, adding silence, and creating chunks.
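To make three of these operations concrete, here is a minimal, hedged sketch in plain Python. It is not the code from augment_audio_tools: the function names are hypothetical, the assumption that silence and chunk lengths are given in seconds is ours, and speed change and resampling are left out because they need a proper resampler.

```python
from typing import List, Sequence, Tuple


def split_stereo(frames: Sequence[Tuple[float, float]]) -> Tuple[List[float], List[float]]:
    """Separate (left, right) frames into two mono channels."""
    left = [l for l, _ in frames]
    right = [r for _, r in frames]
    return left, right


def add_silence(samples: Sequence[float], seconds: float, sample_rate: int) -> List[float]:
    """Append `seconds` of digital silence (zeros) to a mono signal."""
    return list(samples) + [0.0] * int(seconds * sample_rate)


def make_chunks(samples: Sequence[float], chunk_seconds: float, sample_rate: int) -> List[List[float]]:
    """Cut a mono signal into consecutive chunks; the last one may be shorter."""
    size = int(chunk_seconds * sample_rate)
    if size <= 0:
        raise ValueError("chunk_seconds * sample_rate must be at least 1 sample")
    return [list(samples[i:i + size]) for i in range(0, len(samples), size)]


# Toy example with an unrealistically low sample rate of 4 Hz.
frames = [(0.1, -0.1)] * 10
left, right = split_stereo(frames)
padded = add_silence(left, seconds=0.5, sample_rate=4)
chunks = make_chunks(padded, chunk_seconds=1.0, sample_rate=4)
print(len(left), len(padded), len(chunks))
```

In the real scripts these steps operate on audio files on disk; the sketch only shows the shape of the transformations.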

For documentation, see https://github.com/materialvision/augment_audio_tools#usage

***These scripts are optional for RAVE training. By default (and for initial training), the section below is enabled. Disable it if you want to train on your dataset without preprocessing by these scripts, or if you want to resume your training.***

In [None]:
#Install Audio Augmentation Script libraries
#
!/kaggle/temp/miniconda/bin/pip install numpy soundfile resampy
%cd /kaggle/temp/
!git clone https://github.com/materialvision/augment_audio_tools

#Process Audio files
%cd /kaggle/temp/augment_audio_tools/
!mkdir -p /kaggle/temp/dataset_augmented
!/kaggle/temp/miniconda/bin/python augment_audio_speed.py /kaggle/input/<yourdataset> /kaggle/temp/dataset_augmented --split_stereo --add_silence 5 --chunk_duration 15 --speed_change 0.3

# Preprocess the dataset and start initial training

Preprocessing is needed once per training dataset. In the section below, the metadata from preprocessing is stored in '/kaggle/working/meta'. The input path is the augmented dataset in '/kaggle/temp/dataset_augmented/'. If you skipped the augmentation scripts, use the filepath to your dataset on Kaggle instead, which is usually '/kaggle/input/yourdataset'.
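The preprocessing cell in this section reads from the augmented folder, while a run without augmentation would read from the raw dataset. A minimal sketch of that choice follows; the helper name and the list of audio suffixes are assumptions for illustration, not part of RAVE.

```python
from pathlib import Path

# Assumption: file suffixes we treat as audio when checking the augmented folder.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac", ".aif", ".aiff", ".ogg"}


def pick_input_path(augmented_dir: str, raw_dir: str) -> Path:
    """Return the augmented folder if it holds audio files, else the raw dataset."""
    augmented = Path(augmented_dir)
    if augmented.is_dir() and any(
        p.suffix.lower() in AUDIO_SUFFIXES for p in augmented.rglob("*")
    ):
        return augmented
    return Path(raw_dir)


# On Kaggle, with the paths used in this notebook:
input_path = pick_input_path("/kaggle/temp/dataset_augmented", "/kaggle/input/yourdataset")
print(input_path)
```

The printed path is what you would pass to `--input_path` in `rave preprocess`.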

During training, the output is stored inside '/kaggle/working/runs' in a folder named 'yourtrainingname' followed by an underscore and a 10-character string. '/kaggle/working/' also contains a 'status' folder with 'data.mdb' and 'lock.mdb' files.
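Because the run folder name ends in a random string, it can help to look it up instead of typing it. The sketch below assumes the layout described above and picks the most recently modified match; `find_latest_run` is a hypothetical helper, not a RAVE function.

```python
from pathlib import Path
from typing import Optional


def find_latest_run(runs_root: str, name: str) -> Optional[Path]:
    """Return the newest '<name>_<suffix>' folder in runs_root, or None."""
    root = Path(runs_root)
    if not root.is_dir():
        return None
    candidates = [p for p in root.glob(f"{name}_*") if p.is_dir()]
    if not candidates:
        return None
    # Newest by modification time; assumes the latest run is the one you want.
    return max(candidates, key=lambda p: p.stat().st_mtime)


run_dir = find_latest_run("/kaggle/working/runs", "yourtrainingname")
print(run_dir)  # None if no matching run folder exists yet
```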

***For initial training, the section below is enabled. Disable it when you want to resume your training.***

In [None]:
#Preprocess dataset
!mkdir -p /kaggle/working/meta
!/kaggle/temp/miniconda/bin/rave preprocess --input_path /kaggle/temp/dataset_augmented/ --output_path /kaggle/working/meta/

#Initiate training
#Use config parameters as preferred for your training. The setup below is only an example. For configuration options, see https://github.com/acids-ircam/RAVE#training
%cd /kaggle/working
!/kaggle/temp/miniconda/bin/rave train --config v2.gin --config wasserstein --db_path /kaggle/working/meta/ --name yourtrainingname --override CAPACITY=128 --override PHASE_1_DURATION=400000

# Resume training
To resume training RAVE on Kaggle, you can turn the output data of the earlier run into a new dataset from the Output tab of your notebook. Then add the new dataset to the notebook, disable the **audio augmentation** and **preprocess and training** sections above, and adjust the section below to match your initial training details.

***For initial training, the section below is disabled. To resume your training, enable it and disable the initial training section before running the notebook.***
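The resume command in this section points `--ckpt` at a `version_<XX>/checkpoints/` folder inside the run directory. As a hedged sketch, the helper below (hypothetical, not part of RAVE) finds the checkpoints folder with the highest version number, assuming that is the one you want to resume from.

```python
import re
from pathlib import Path
from typing import Optional


def find_checkpoint_dir(run_dir: str) -> Optional[Path]:
    """Return '<run_dir>/version_<N>/checkpoints' for the highest N, or None."""
    versions = []
    for p in Path(run_dir).glob("version_*"):
        match = re.fullmatch(r"version_(\d+)", p.name)
        if match and (p / "checkpoints").is_dir():
            versions.append((int(match.group(1)), p / "checkpoints"))
    if not versions:
        return None
    # Compare numerically so that version_10 ranks above version_2.
    return max(versions, key=lambda v: v[0])[1]


# The run folder name is a placeholder; replace it with your own.
print(find_checkpoint_dir("/kaggle/working/runs/yourtrainingname_xxxxxxxxxx"))
```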

In [None]:
##Copy contents of earlier training to /kaggle/working folder.
#!cp -r /kaggle/input/<rootfolderofyourearliertraining>/* /kaggle/working

#%cd /kaggle/working/

##Resume training
##Use the same config parameters as in your earlier training.
#!/kaggle/temp/miniconda/bin/rave train --config v2.gin --config wasserstein --db_path /kaggle/working/meta/ --name yourtrainingname --override CAPACITY=128 --override PHASE_1_DURATION=400000 --ckpt /kaggle/working/runs/<yourtrainingnamewitharandomstringintheend>/version_<XX>/checkpoints/

# Download your Kaggle Data
The Kaggle API allows downloading your training data from your notebooks conveniently. You can find information on setup and CLI commands here: https://github.com/Kaggle/kaggle-api
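As a small sketch, the Kaggle CLI's `kaggle kernels output` command can be wrapped in Python to fetch a notebook's output files. This assumes the `kaggle` package is installed and an API token is configured on your machine; the kernel slug and destination folder below are placeholders.

```python
import subprocess
from typing import List


def kernel_output_command(kernel: str, dest: str) -> List[str]:
    """Build the CLI call that downloads a notebook's output files to dest."""
    return ["kaggle", "kernels", "output", kernel, "-p", dest]


def download_kernel_output(kernel: str, dest: str) -> None:
    """Run the download; raises CalledProcessError if the CLI call fails."""
    subprocess.run(kernel_output_command(kernel, dest), check=True)


# Placeholder slug in the form '<username>/<notebook-slug>'.
cmd = kernel_output_command("yourusername/your-rave-notebook", "./rave_output")
print(" ".join(cmd))
# download_kernel_output("yourusername/your-rave-notebook", "./rave_output")
```

The actual download call is left commented out so the snippet can be read and run without credentials.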

# Export model
After your training is finished, you can export the model as a TorchScript (.ts) file.
\
\
***The section below is disabled. Training does not finish automatically, so you have to run the model export manually, either in this notebook/runtime or locally on your machine. If you export locally, make sure the setup is the same (e.g. RAVE version, dependencies, libraries).***

In [None]:
#Model export. Use '--streaming' flag to export a model capable of real time processing and '--stereo' for a model with two decoders.

#!/kaggle/temp/miniconda/bin/rave export --run /kaggle/working/runs/<yourtrainingnamewitharandomstringintheend> --streaming --stereo