<img src="https://raw.githubusercontent.com/Torsion-Audio/Scyclone/main/assets/pictures/interface.png" width="100%"/>

__Scyclone__ is an audio plugin developed by **Torsion Audio** that uses neural timbre transfer to offer a new approach to audio production. The plugin builds upon [RAVE](https://github.com/acids-ircam/RAVE) (Antoine Caillon et al.), a realtime audio variational autoencoder, and supports neural timbre transfer in both single and couple inference modes.

This enables a new artificial layering technique that can be applied to the incoming signal to create richer drum palettes and fuller atmospheres, or simply to transfer the timbre of the raw signal to another sound palette. To give further control over the behaviour and output of the neural networks, the plugin includes signal processing modules that let the user shape, control and embellish the source and target timbres separately.

Scyclone comes with two pre-trained models: **funk_drums**, trained on four hours of data inspired by the captivating sounds of vintage drum breaks, and **Djembe**, trained on five hours of carefully compiled Djembe recordings.


This notebook allows you to train your own models on custom datasets and use them as presets inside the plugin. Although we encourage the interested reader to study the RAVE [article](https://arxiv.org/pdf/2111.05011.pdf) and visit the well-written [GitHub](https://github.com/acids-ircam/RAVE) repository, this notebook gathers all the information required to successfully train a model without any prior knowledge. The notebook is structured as follows:

\



*   **Pre-requisites**
*   **Dataset**
*   **Training**
*   **Export**


\


There are several options available for training machine learning models, including both local machines and cloud platforms. For simplicity's sake, we have chosen Google Colab as it offers a convenient all-in-one solution. It's important to mention that our selection of Google Colab does not imply any endorsement of Google or any other cloud computing service provider.

<img src="https://raw.githubusercontent.com/Torsion-Audio/Scyclone/main/assets/pictures/logo.png" alt="alt" width="1%"/>  **`Torsion Audio`**














# 1. Pre-requisites

To upload your customized dataset and save training checkpoints, you can use Google Drive. If you do not have a Google Drive account yet, follow the instructions on this link:
> https://accounts.google.com/AccountChooser?service=writely

Once you have set up an account, log in to Colab and run the cell below to mount the drive in this session. To run a cell, simply press the *play* button on the left-hand side of the cell.




In [None]:
#@title 1.1 Mount Google Drive
from google.colab import drive
drive.mount('content')

Mounted at content




Model training is a resource-intensive process that demands significant time and computational power. Machine learning practitioners often rely on the accelerated computing capabilities of GPUs to expedite model acquisition and optimization. Google Colab addresses this need by providing access to GPUs, allowing users to leverage their computational capabilities. The availability and duration of GPU access in Google Colab may vary depending on the user's subscription method or plan.

\

> ### 1.2 Purchase Compute Units
To access faster GPUs, we need to purchase compute units. Compute units are the currency Google Colab uses to measure GPU utilisation. To purchase compute units:
```
1. Click on the RAM/Disk button on the top right-hand side of this notebook
2. Under "Resources", click on "Learn more."
3. Follow the instructions and purchase compute units
```
To provide you with some insight, here is the estimated number of compute units spent to train the **funk_drums** preset in Scyclone:
```
GPU Type:     A100
C Units:      750
Usage rate:   ~13 C Units/hr
Runtime:      60hrs
Dataset size: 3:31:00
```
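
As a rough sanity check of the figures above, the total can be approximated as the usage rate multiplied by the runtime. The sketch below is only an estimate under that assumption; the function name is illustrative, and actual rates depend on the GPU type and on Colab's current pricing.

```python
def estimate_compute_units(units_per_hour: float, runtime_hours: float) -> float:
    """Approximate total compute units as usage rate multiplied by runtime."""
    return units_per_hour * runtime_hours


# Figures reported above for the funk_drums preset (A100, ~13 units/hr, 60 hrs)
estimated = estimate_compute_units(13, 60)
print(f"Estimated compute units: {estimated:.0f}")
```

With the approximate rate of 13 units per hour this gives about 780 units, which is in the same range as the 750 reported above.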

\
Please note that the GPU type and dataset size influence the **usage rate** and the **compute units** required to train a model. You can also start a training run without purchasing any compute units. In this case Google assigns you an interruptible medium-class GPU, which greatly increases the time it takes to obtain an optimal-sounding model.

Once you have purchased enough compute units, you are all set to move on to the next step. You can see your available compute units under **Resources** on the top right-hand side of this notebook.

We will now install the libraries and dependencies required for training. You can **Install Dependencies** by running its respective cell below. This might take several minutes.

In [None]:
#@title 1.3 Install Dependencies

!curl -L https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh -o miniconda.sh
!chmod +x miniconda.sh
!sh miniconda.sh -b -p /content/miniconda
!/content/miniconda/bin/pip install --quiet acids-rave==2.1.7 #test without forcing the version
!/content/miniconda/bin/pip install --quiet pytorch_lightning==1.9.0
!/content/miniconda/bin/pip install cached-conv==2.4.1
!/content/miniconda/bin/pip install onnx
!/content/miniconda/bin/pip install --quiet --upgrade ipython ipykernel
!/content/miniconda/bin/conda install ffmpeg -y
!/content/miniconda/bin/pip install effortless_config
!apt-get install -y unzip
# The Train cell reads its config from /content/Scyclone-AI
!git clone https://github.com/Torsion-Audio/Scyclone-AI.git
!nvidia-smi

# 2. Dataset


> ### 2.1 Compiling Datasets
In our experience, percussive sound datasets are the most compatible with RAVE timbre transfer models. To help the model learn from shared sound structures, make sure the sounds in the dataset have similar characteristics. This can be accomplished by focusing on specific percussion instruments or instrument groups. Please note that this shouldn't discourage you from experimenting with datasets other than those with percussive characteristics.

> Data can be acquired by recording sounds or by gathering samples from production libraries or musical compositions. These samples may consist of individual percussive sounds (one-shots) or longer recordings (loops and musical pieces). To ensure optimal training outcomes, it is advisable to curate a well-balanced collection of sounds within the dataset and to avoid recordings with excessive room reverberation.

> Furthermore, the size of the dataset should be proportional to its diversity. For datasets comprising single percussion instruments, a minimum duration of 3 hours is recommended, whereas larger sound palettes should encompass at least 6 hours of audio material.
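
To check a dataset against the duration guideline above, the total length of its audio files can be summed before uploading. The following is a minimal sketch that assumes the dataset consists of PCM `.wav` files readable by Python's standard `wave` module; the function names are illustrative and not part of RAVE or Scyclone.

```python
import wave
from pathlib import Path


def total_wav_duration_hours(folder: str) -> float:
    """Sum the duration of all .wav files under `folder`, in hours."""
    total_seconds = 0.0
    for path in Path(folder).rglob("*.wav"):
        try:
            with wave.open(str(path), "rb") as wav:
                total_seconds += wav.getnframes() / wav.getframerate()
        except (wave.Error, EOFError):
            # The standard-library reader cannot parse every WAV variant
            # (for example 32-bit float files), so such files are skipped
            print(f"Skipped unreadable file: {path}")
    return total_seconds / 3600


def meets_recommendation(hours: float, single_instrument: bool = True) -> bool:
    """Compare a duration against the 3-hour / 6-hour guideline from the text."""
    return hours >= (3.0 if single_instrument else 6.0)
```

For example, `meets_recommendation(total_wav_duration_hours("/path/to/your/dataset"))` reports whether a single-instrument dataset reaches the recommended three hours. Other formats such as MP3 or FLAC would need a different reader.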

> ### 2.2 Upload Dataset
Once you have your dataset zipped and stored on your local drive, you can easily upload it to Google Drive by following these steps:
```
1. On the top left-hand side of this page, click on the "folder" icon and open the browser
2. Navigate to /content/MyDrive/
3. Right-click on "MyDrive" folder and create a "New Folder"
4. Rename the folder to "scyclone"
5. In the same directory, create two new folders and rename them to "dataset" and "ckpt"
6. Drag and Drop the .zip dataset into the "dataset" folder
```
This might take several minutes depending on the size of your dataset. You'll know the upload has been successful once the **blue circular progress bar** on the bottom-right of the browser vanishes.
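
As an alternative to creating the folders by hand, the same layout can be created from a code cell. This is a minimal sketch, assuming Drive is mounted as in cell 1.1 so that `/content/content/MyDrive` is the Drive root; the function name is illustrative.

```python
import os


def create_scyclone_folders(drive_root: str) -> list:
    """Create scyclone/dataset and scyclone/ckpt under `drive_root`."""
    created = []
    for sub in ("dataset", "ckpt"):
        path = os.path.join(drive_root, "scyclone", sub)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created


# In Colab, with Drive mounted as in cell 1.1, this would be called as:
# create_scyclone_folders("/content/content/MyDrive")
```

The dataset archive still has to be uploaded into the `dataset` folder afterwards.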



> ### 2.3 Unzip Data
To unzip, you need to provide the paths **data_zip** and **dataset_dir**. The **Unzip data** cell creates a folder named **dataset_unzip** inside the **dataset** folder to hold the extracted files. The paths should look like this:
```
data_zip:     /content/content/MyDrive/scyclone/dataset/NameOfYourDataset.zip
dataset_dir:  /content/content/MyDrive/scyclone/dataset/dataset_unzip/
```
Copy and paste the directories to the **Unzip data** cell and run it. This might also take several minutes.
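
Before extracting a large archive, it can help to confirm that it actually contains audio files. The sketch below uses Python's standard `zipfile` module; the list of extensions is an assumption for illustration and does not necessarily match what RAVE's preprocessing accepts.

```python
import zipfile

# Assumed set of audio extensions, for illustration only
AUDIO_EXTENSIONS = (".wav", ".mp3", ".flac", ".ogg", ".aif", ".aiff")


def list_audio_in_zip(zip_path: str) -> list:
    """Return the names of archive members that look like audio files."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            name
            for name in archive.namelist()
            # Skip the metadata entries that macOS adds to zip archives
            if not name.startswith("__MACOSX/")
            and name.lower().endswith(AUDIO_EXTENSIONS)
        ]
```

Calling `len(list_audio_in_zip(data_zip))` gives a quick count of the audio files that will be extracted.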



In [None]:
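#@title Unzip data
import os

# Create the extraction folder; exist_ok avoids an error when the cell is re-run
os.makedirs("/content/content/MyDrive/scyclone/dataset/dataset_unzip", exist_ok=True)

data_zip = ""     #@param {type:"string"}
dataset_dir = ""  #@param {type:"string"}

# The $ prefix makes IPython substitute the Python variables into the commands
%cd $dataset_dir

!unzip $data_zip -d $dataset_dir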

# 3. Training

> RAVE comes with different architectures and training methods. In our experiments we found **v1** and **onnx** to be more stable and better suited to our use case than the other architectures. You are welcome to experiment with the architectures and configurations as you see fit. However, please bear in mind that configurations other than the ones mentioned below might lead to a model that is unsuitable for Scyclone.
```
1. Set the name of the training. This could be the preset name
2. Set the dataset_dir to the directory where you have unzipped the dataset
3. Set the save_dir to the directory where you want to save the checkpoints
4. Architecture allows you to choose the RAVE architecture you want to train. We recommend v1 or onnx
5. Scyclone currently handles a sampling rate of 48 kHz. Therefore, we set "sr" to 48000
```
The variables in the **Train** cell should look similar to:
```
name = "preset_name"          
dataset_dir = "/content/content/MyDrive/scyclone/dataset/dataset_unzip/"    
save_dir = "/content/content/MyDrive/scyclone/ckpt/"       
architecture = "v1"
sr = 48000   
```
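
Since a training run can take many hours, it may be worth checking these variables before starting. The following is a minimal sketch of such a check; the function name and messages are illustrative and not part of RAVE or Scyclone.

```python
import os


def check_train_settings(name: str, dataset_dir: str, save_dir: str, sr: int) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if not name.strip():
        problems.append("name is empty")
    if not os.path.isdir(dataset_dir):
        problems.append(f"dataset_dir does not exist: {dataset_dir}")
    if not os.path.isdir(save_dir):
        problems.append(f"save_dir does not exist: {save_dir}")
    if sr != 48000:
        problems.append(f"sr is {sr}, but Scyclone expects 48000")
    return problems
```

Running `check_train_settings(name, dataset_dir, save_dir, sr)` with the values you intend to use and printing the result shows any issue to fix before launching the training.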





> Run the cell to initiate the training.


In [None]:
#@title Train


name = ""           #@param {type:"string"}
dataset_dir = ""    #@param {type:"string"}
save_dir = ""       #@param {type:"string"}
architecture = "v1" #@param ["v1"]
sr = 48000          #@param [48000]

#set the architecture
if architecture == 'v1':
  architecture = '/content/Scyclone-AI/architectures/scyclone-config-v1.gin'


%cd /content/
!mkdir -p dataset
%cd $save_dir
preprocessed_dataset = "/content/dataset"


!/content/miniconda/bin/rave preprocess --input_path $dataset_dir --output_path $preprocessed_dataset --sampling_rate $sr
!/content/miniconda/bin/rave train --config $architecture --db_path $preprocessed_dataset --name $name --override LATENT_SIZE=16 --override CAPACITY=32

If the training was interrupted or you had to pause it, you can use the **Resume** cell to continue from the last checkpoint.
When doing so, please ensure that all parameters and hyperparameters in the **Resume** cell are identical to the ones in the **Train** cell. 
Supposing:

**name = funk_drums** 

Checkpoints should be saved like this:


```
./path/to/checkpoint_folder/runs/funk_drums_123456/version_0/checkpoints/last.ckpt
./path/to/checkpoint_folder/runs/funk_drums_123456/version_0/checkpoints/last-v1.ckpt

```


The numbers next to the name in the directory path are randomly generated identifiers and may differ from the ones above. Copy the path to **last-v1.ckpt** or **last.ckpt**, paste it into the **resume_dir** field and run the cell. Every time you continue an interrupted training, you will also receive a new **version** number, under which the new checkpoints will be saved.
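
Because each resumed run adds a new version folder, finding the most recent checkpoint by hand can become tedious. The sketch below searches the folder layout shown above and returns the most recently modified checkpoint file. It assumes checkpoints follow the `runs/<name>_<id>/version_<n>/checkpoints/last*.ckpt` pattern from the example; the function name is illustrative.

```python
import glob
import os
from typing import Optional


def find_latest_checkpoint(save_dir: str, name: str) -> Optional[str]:
    """Return the most recently modified last*.ckpt for `name`, or None."""
    pattern = os.path.join(
        glob.escape(save_dir),
        "runs",
        glob.escape(name) + "_*",  # run folder: name plus generated identifier
        "version_*",
        "checkpoints",
        "last*.ckpt",
    )
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    # The newest file by modification time is taken as the latest checkpoint
    return max(candidates, key=os.path.getmtime)
```

The returned path can then be pasted into **resume_dir**. Picking the newest file by modification time is a heuristic, so it is worth confirming that the path belongs to the run you intend to continue.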



In [None]:
#@title Resume 

name = ""           #@param {type:"string"}
dataset_dir = ""    #@param {type:"string"}
save_dir = ""       #@param {type:"string"}
architecture = "v1" #@param ["v1"]
sr = 48000       #@param [48000]
resume_dir = "" #@param {type:"string"}


if architecture == 'v1':
  architecture = '/content/Scyclone-AI/architectures/scyclone-config-v1.gin'


%cd /content/
!mkdir -p dataset
%cd $save_dir
preprocessed_dataset = "/content/dataset"


# !/content/miniconda/bin/rave train --helpfull
!/content/miniconda/bin/rave preprocess --input_path $dataset_dir --output_path $preprocessed_dataset --sampling_rate $sr
!/content/miniconda/bin/rave train --config $architecture --db_path $preprocessed_dataset --name $name --ckpt $resume_dir --override LATENT_SIZE=16 --override CAPACITY=32

> ### 3.1 Training done?

> Many factors determine the number of iterations required to obtain an acceptable-sounding model, and discussing them is beyond the scope of this notebook. However, to provide some insight, here are the dataset sizes and the number of epochs and iterations of two models we trained:
```
name:    FUNK_DRUMS
size:    3:31:00 hrs
epochs:  ~6360
Iters:   ~3.5 Million
```
```
name:   DJEMBE
size:   3:48:00 hrs
epochs: ~5200
Iters:  ~3 Million
```

> Feel free to experiment by training the model for more or fewer iterations than the ones mentioned above.
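
To relate the epoch and iteration counts above, the number of iterations per epoch can be approximated by dividing one by the other. This is a small arithmetic sketch using the approximate figures reported above; the function name is illustrative.

```python
def iterations_per_epoch(total_iterations: float, epochs: float) -> float:
    """Approximate the number of training iterations in one epoch."""
    return total_iterations / epochs


# Approximate figures reported above for the two bundled models
for label, iters, epochs in [("FUNK_DRUMS", 3.5e6, 6360), ("DJEMBE", 3.0e6, 5200)]:
    print(f"{label}: ~{iterations_per_epoch(iters, epochs):.0f} iterations per epoch")
```

Both models come out at roughly 550 to 580 iterations per epoch, which can help translate the epoch counter shown during training into an iteration count for datasets of a similar size.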

# 4. Export

Once the training is finished, copy the path to the run directory, paste it into the **run_dir** field below and run the cell. The path should look similar to:


```
./path/to/checkpoint_folder/runs/funk_drums_123456

```

If you have experienced a disconnection from the runtime, it is necessary to remount Google Drive and reinstall the dependencies that are essential for exporting an **.onnx** model and performing the conversion to **.ort** format.
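
The export step derives the model file names from the run directory. The sketch below mirrors that logic so the expected file locations can be checked in advance. It assumes, as the export cell does, that the `.onnx` file is written into the run directory under the run's name; the function name is illustrative.

```python
import os


def export_paths(run_dir: str) -> dict:
    """Derive the model name and expected file paths from a run directory."""
    # normpath drops a trailing slash so the last path component is never empty
    model_name = os.path.basename(os.path.normpath(run_dir))
    return {
        "model_name": model_name,
        "onnx": os.path.join(run_dir, model_name + ".onnx"),
        "ort": os.path.join(run_dir, model_name + ".ort"),
    }


paths = export_paths("./path/to/checkpoint_folder/runs/funk_drums_123456")
print(paths["model_name"])
```

After the export cell has run, `os.path.exists(paths["ort"])` indicates whether the converted model is where the download step expects it.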

In [None]:
#@title 4.0 Remount Google Drive and reinstall dependencies

from google.colab import drive
drive.mount('content')

!curl -L https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh -o miniconda.sh
!chmod +x miniconda.sh
!sh miniconda.sh -b -p /content/miniconda
!/content/miniconda/bin/pip install --quiet acids-rave==2.1.7 #test without forcing the version
!/content/miniconda/bin/pip install --quiet pytorch_lightning==1.9.0
!/content/miniconda/bin/pip install cached-conv==2.4.1
!/content/miniconda/bin/pip install onnx
!/content/miniconda/bin/pip install --quiet --upgrade ipython ipykernel
!/content/miniconda/bin/conda install ffmpeg -y
!/content/miniconda/bin/pip install effortless_config
!git clone https://github.com/Torsion-Audio/Scyclone-AI.git

In [None]:
#@title 4.1 Export Model
import os

#Export .onnx
run_dir = "/content/content/MyDrive/RAVE_TRAINING/onnx/jungle/runs/jungle_final_917935e402/runs/jungle_final_917935e402" #@param {type:"string"}
!/content/miniconda/bin/rave export_onnx --run $run_dir


#install .onnx required for ort conversion
!pip install onnxruntime==1.14.1
!pip install onnx==1.13.1


#paths for onnx to ort conversion
# normpath drops a trailing slash so the run name is never empty
model_name = os.path.basename(os.path.normpath(run_dir))
onnx_model_path = os.path.join(run_dir, model_name + '.onnx')
ort_save_path  = run_dir


# Make sure the directory that will hold the converted .ort model exists
!mkdir -p "$ort_save_path"
!python -m onnxruntime.tools.convert_onnx_models_to_ort "$onnx_model_path" --enable_type_reduction


print('model exported successfully')

In [None]:
#@title 4.2 Download Model

from google.colab import files
files.download(ort_save_path + "/{}.ort".format(model_name))


You have now successfully exported the model!
<img src="https://raw.githubusercontent.com/Torsion-Audio/Scyclone-AI/main/assets/load_model.png" width="100%"/>



```
1. Save the .ort file to your local drive
2. Open Scyclone
3. Hover over one of the network nodes and select the preset loader icon on the network arm
4. Select the trained .ort model and click "open"
5. Now you have the model imported and it's ready for synthesis!

```

