# RAVE

**RAVE** is a variational autoencoder for fast and high-quality neural audio synthesis by Antoine Caillon and Philippe Esling. [Article on arXiv](https://arxiv.org/abs/2111.05011) & [Source code on GitHub](https://github.com/acids-ircam/RAVE)

This notebook supports Victor Shepardson's [fork of RAVE](https://github.com/victor-shepardson/RAVE) which introduces a few great features and configurations for training a model. 

----

Notebook author: [Martin Heinze](https://github.com/devstermarts)

Last updated: 13.01.2025

## Install Miniconda, RAVE, dependencies

In [None]:
#Install Miniconda
!mkdir -p /kaggle/temp
%cd /kaggle/temp
!curl -L https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh -o miniconda.sh
!chmod +x miniconda.sh
!sh miniconda.sh -b -p /kaggle/temp/miniconda

#Upgrade ipython ipykernel, install ffmpeg
!/kaggle/temp/miniconda/bin/pip install --upgrade ipython ipykernel
!/kaggle/temp/miniconda/bin/conda install ffmpeg --yes

In [None]:
#Install RAVE fork
%cd /kaggle/temp
!git clone https://github.com/victor-shepardson/RAVE.git
!RAVE_VERSION=2.4.0b CACHED_CONV_VERSION=2.6.0b /kaggle/temp/miniconda/bin/pip install -e RAVE

## Preprocess training and validation dataset
Preprocessing is required once per training. In the section below, the preprocessed data is stored in '/kaggle/working/processed'. Set the input path to the location of your dataset, which is usually '/kaggle/input/your_audio_dataset'.
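
Before preprocessing, it can help to confirm that the input path actually contains audio files. The following is a minimal sketch and not part of RAVE: the helper name `list_audio_files` and the extension list are assumptions, so adjust them to the formats in your dataset.

```python
from pathlib import Path

# Assumption: common audio extensions; adjust to match your dataset.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".aif", ".aiff"}

def list_audio_files(folder):
    """Return all audio files below 'folder' (searched recursively), sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        # A missing or mistyped input path yields an empty list instead of an error.
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )

# Placeholder path from this notebook; replace with your own dataset path.
files = list_audio_files("/kaggle/input/your_audio_dataset")
print(f"Found {len(files)} audio files")
```

If the count is 0, check the dataset name under '/kaggle/input' before running the preprocessing cell.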

In [None]:
#Preprocess dataset
!mkdir -p /kaggle/working/processed
!/kaggle/temp/miniconda/bin/rave preprocess \
--input_path /kaggle/input/your_audio_dataset/ \
--output_path /kaggle/working/processed/ \
--channels 2 #no. of audio channels 1=mono; 2=stereo...

The fork allows replacing the default validation dataset (X% of the training dataset) with a custom validation dataset, which needs to be preprocessed separately and is passed to training later with the '--val_db_path' flag. If you want to keep the default setting, delete the 'Preprocess validation dataset' cell below and remove the flag from the training and resume sections. \
Note that a dedicated validation dataset has no impact on the training itself, but it gives you more control: you can pick representative items from your training dataset instead of relying on a random selection.

In [None]:
#Preprocess validation dataset
!mkdir -p /kaggle/working/validation
!/kaggle/temp/miniconda/bin/rave preprocess \
--input_path /kaggle/input/your_validation_dataset \
--output_path /kaggle/working/validation/ \
--channels 2 #no. of audio channels 1=mono; 2=stereo...

## Start initial training
During training, the output is stored inside '/kaggle/working/runs' in a folder named 'your_training_name' followed by an underscore and a 10-character string. '/kaggle/working/' also contains a 'status' folder with 'data.mdb' and 'lock.mdb' files. \
Note that there are many training configuration options, some of which cannot be combined. The documentation on this topic is sparse, so you may want to check the code yourself or ask the creators.
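
Because the run folder name ends in a generated string, you may want to look it up rather than type it by hand. This is a small sketch under the naming scheme described above; `find_run_dirs` is a hypothetical helper, not part of RAVE.

```python
from pathlib import Path

def find_run_dirs(runs_root, training_name):
    """Return run folders named '<training_name>_<suffix>', most recently modified first."""
    root = Path(runs_root)
    if not root.is_dir():
        return []
    prefix = training_name + "_"
    candidates = [p for p in root.iterdir() if p.is_dir() and p.name.startswith(prefix)]
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)

# Placeholder values from this notebook; replace with your own training name.
for run_dir in find_run_dirs("/kaggle/working/runs", "your_training_name"):
    print(run_dir)
```

Note that a prefix match also returns runs whose name merely starts with the same string, so use distinct training names if you keep several runs side by side.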

***The section below is enabled for initial training. Disable it when you want to resume your training.***

In [None]:
#Initiate training
#Use config parameters as preferred for your training.
#The setup below is an example. For configuration options check https://github.com/victor-shepardson/RAVE
#The fork comes with additional training parameters and options. Check train.py for details.
#--val_db_path: only needed if you use a separate, preprocessed validation dataset; remove the line otherwise.
#--channels: no. of audio channels 1=mono; 2=stereo...
#Keep comments out of the command itself: text after a trailing backslash breaks the line continuation.
%cd /kaggle/working
!/kaggle/temp/miniconda/bin/rave train \
--config v2.gin \
--config wasserstein \
--db_path /kaggle/working/processed/ \
--name your_training_name \
--override CAPACITY=128 \
--override PHASE_1_DURATION=2000000 \
--val_db_path /kaggle/working/validation/ \
--channels 2

## Resume training
To conveniently resume training RAVE on Kaggle, you can turn the output data of the first run into a new dataset from the output tab of your notebook. Then add the new dataset to the notebook, disable the **Preprocess training and validation dataset** and **Start initial training** sections above, and adjust the section below to match your initial training configuration.

To continue training after further runs, update this dataset by creating a new version from the output of the latest run. Make sure to check for updates on the dataset in your notebook before resuming the training.

***When you want to resume your training, disable the initial training section above before running the notebook.***
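
The resume cell expects the path to a specific checkpoint file. The following sketch picks the most recently modified '.ckpt' file below a run folder. `latest_checkpoint` is a hypothetical helper, not part of RAVE, and it assumes that the newest file by modification time is the one you want to continue from.

```python
from pathlib import Path

def latest_checkpoint(run_dir):
    """Return the most recently modified .ckpt file below 'run_dir', or None if there is none."""
    root = Path(run_dir)
    if not root.is_dir():
        return None
    checkpoints = list(root.rglob("*.ckpt"))
    if not checkpoints:
        return None
    return max(checkpoints, key=lambda p: p.stat().st_mtime)

# Placeholder path from this notebook; replace with your actual run folder.
print(latest_checkpoint("/kaggle/working/runs/your_training_name_with_a_random_string_in_the_end"))
```

Copying files can change their modification times, so double-check the printed path against the checkpoint you intend to resume from before passing it to '--ckpt'.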

In [None]:
#Copy contents of earlier training to /kaggle/working folder.
!cp -r /kaggle/input/root_folder_of_your_earlier_training/* /kaggle/working
%cd /kaggle/working/

#Resume training
#Use config parameters as used in your earlier training.
#--val_db_path: only needed if you use a separate, preprocessed validation dataset; remove the line otherwise.
#--channels: no. of audio channels 1=mono; 2=stereo...
#--ckpt: point to a specific checkpoint file to continue training.
#Keep comments out of the command itself: text after a trailing backslash breaks the line continuation.
!/kaggle/temp/miniconda/bin/rave train \
--config v2.gin \
--config wasserstein \
--db_path /kaggle/working/processed/ \
--name your_training_name \
--override CAPACITY=128 \
--override PHASE_1_DURATION=2000000 \
--val_db_path /kaggle/working/validation/ \
--channels 2 \
--ckpt /kaggle/working/runs/your_training_name_with_a_random_string_in_the_end/version_X/checkpoints/epochXXX.ckpt

## Export model
After your training is finished, you can export the model as a TorchScript (.ts) file.

***For the export, start a notebook session (do not save and run this notebook). In the session, make sure to run the install cells for Miniconda, RAVE and dependencies before running the export.***

In [None]:
#Model export.
#Use the '--streaming' flag to export a model capable of real-time processing.
#For more export configurations and options check https://github.com/victor-shepardson/RAVE -> export.py for details.
#--channels: no. of audio channels 1=mono; 2=stereo...
#Keep comments out of the command itself: text after a trailing backslash breaks the line continuation.

#Export model
!/kaggle/temp/miniconda/bin/rave export \
--run /kaggle/input/root_folder_of_your_earlier_training/runs/your_training_name_with_a_random_string_in_the_end \
--streaming \
--channels 2 \
--output /kaggle/working/ \
--named your-model-name