# AFTER: Audio Features Transfer and Exploration in Real-time

[AFTER](https://github.com/acids-ircam/AFTER) is a diffusion-based generative model that creates new audio by blending two sources: one audio stream to set the style or timbre, and another input (either audio or MIDI) to shape the structure over time.

This repository is a real-time implementation of the [research paper](https://arxiv.org/abs/2408.00196) *Combining audio control and style transfer using latent diffusion* by Nils Demerlé, P. Esling, G. Doras, and D. Genova.

----

Please note that this notebook currently covers *only* training **AFTER audio-to-audio models** using **pre-trained autoencoders** (e.g. RAVE models).

To use this notebook, you therefore need:
* an autoencoder model exported **without streaming** for preprocessing and training
* the same autoencoder model exported **with streaming** enabled for model export 
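
The two requirements above can be checked before any long-running cell is started. The following is a minimal sketch, not part of AFTER: the helper name `check_autoencoder_exports` is hypothetical, and the two paths are the placeholder paths used in this notebook's cells. It only checks that both files exist and are not the same file; it does not inspect the models.

```python
from pathlib import Path


def check_autoencoder_exports(non_streaming: str, streaming: str) -> list:
    """Return a list of problems; an empty list means both files were found.

    Hypothetical helper: it checks file presence only, not model contents.
    """
    problems = []
    for label, raw in (("non-streaming", non_streaming), ("streaming", streaming)):
        if not Path(raw).is_file():
            problems.append(f"{label} export not found: {raw}")
    if Path(non_streaming) == Path(streaming):
        problems.append("both paths point to the same file; two separate exports are expected")
    return problems


# Placeholder paths from this notebook; replace them with your own dataset paths.
for problem in check_autoencoder_exports(
    "/kaggle/input/your-rave-model-without-streaming.ts",
    "/kaggle/input/your-rave-model-WITH-streaming.ts",
):
    print("WARNING:", problem)
```

An empty result only means the files are present. Whether each export was actually made with or without streaming cannot be told from the path alone.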

----

Notebook author: [Martin Heinze](https://github.com/devstermarts)

Last updated: 16.04.2025

## Setup runtime

Set up Miniconda with Python 3.11, then install AFTER from GitHub on the runtime. 

In [None]:
#Install Miniconda

!mkdir -p /kaggle/temp
%cd /kaggle/temp
!curl -L https://repo.anaconda.com/miniconda/Miniconda3-py311_24.11.1-0-Linux-x86_64.sh -o miniconda.sh
!chmod +x miniconda.sh
!sh miniconda.sh -b -p /kaggle/temp/miniconda
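
As an optional sanity check, the interpreter installed by the cell above can be asked for its version. This is a small sketch with a hypothetical helper name (`python_version`). It assumes the installer placed the interpreter at `/kaggle/temp/miniconda/bin/python`, which follows from the `-p` prefix used above.

```python
import subprocess
import sys


def python_version(executable: str) -> tuple:
    """Ask the given interpreter for its (major, minor) version."""
    out = subprocess.run(
        [executable, "-c", "import sys; print(sys.version_info[0], sys.version_info[1])"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    major, minor = out.split()
    return int(major), int(minor)


# Version of the interpreter running this notebook kernel (not the Miniconda one):
print(python_version(sys.executable))

# In the notebook, point the helper at the Miniconda interpreter instead:
# print(python_version("/kaggle/temp/miniconda/bin/python"))
```

For the `py311` installer used here, the second call would be expected to report Python 3.11.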

In [None]:
#Install AFTER

%cd /kaggle/temp
!git clone -b wip https://github.com/devstermarts/AFTER.git
%cd /kaggle/temp/AFTER
!/kaggle/temp/miniconda/bin/pip install -e .

## Preprocess dataset for audio-to-audio model

Preprocessing needs to be done once per training. You can use either an autoencoder trained with the AFTER source code (not covered here) or another pre-trained autoencoder such as a RAVE model. This notebook assumes that you already have a pre-trained autoencoder at hand. Note that both preprocessing and the subsequent training require the autoencoder exported **without streaming**.
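
Before preprocessing, it can help to confirm that the input folder actually contains audio. The sketch below counts files by extension. The helper name and the extension list are assumptions made for illustration; they do not describe which formats `after prepare_dataset` accepts.

```python
from collections import Counter
from pathlib import Path

# Assumed set of common audio extensions; adjust it to your dataset.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".aif", ".aiff"}


def count_audio_files(folder: str) -> Counter:
    """Count files under `folder` (recursively) per lowercase audio extension."""
    counts = Counter()
    root = Path(folder)
    if not root.is_dir():
        return counts  # a missing folder simply yields no files
    for path in root.rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in AUDIO_EXTENSIONS:
            counts[suffix] += 1
    return counts


# Placeholder path from this notebook; replace it with your own input folder.
counts = count_audio_files("/kaggle/input/your-audio-folder")
print(dict(counts) if counts else "No audio files found")
```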

In [None]:
#AFTER dataset preprocessing

!mkdir -p /kaggle/temp/processed

!/kaggle/temp/miniconda/bin/after prepare_dataset \
--input_path /kaggle/input/your-audio-folder \
--output_path /kaggle/temp/processed \
--emb_model_path /kaggle/input/your-rave-model-without-streaming.ts \
--gpu 0

## Train audio-to-audio model

The section below covers training an audio-to-audio AFTER model. Again, use your pre-trained autoencoder model **without streaming** here.

To resume training in a later session, first turn the output data of the completed run into a new dataset (from the output tab of your notebook) and add that dataset to this notebook. Then uncomment the copy command below to copy your earlier training output into the runtime's working directory, and set the `--restart` flag in the training cell to the step count of the checkpoint you want to resume from.
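
To find a step count for resuming, the copied training output can be scanned for checkpoint files. The sketch below is an assumption-heavy illustration, not AFTER's own logic: it assumes checkpoints are `.pt` or `.ckpt` files whose names contain the step as a number, and the helper name `latest_checkpoint_step` is hypothetical. Check the actual file names in your output folder before relying on it.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint_step(run_dir: str) -> Optional[int]:
    """Return the largest step number found in checkpoint file names, or None.

    Assumption: checkpoint files end in .pt or .ckpt and carry the step
    count as the last number in their file name.
    """
    steps = []
    for path in Path(run_dir).rglob("*"):
        if path.is_file() and path.suffix in {".pt", ".ckpt"}:
            numbers = re.findall(r"\d+", path.stem)
            if numbers:
                steps.append(int(numbers[-1]))
    return max(steps) if steps else None


# Output folder used by the training cell in this notebook.
print(latest_checkpoint_step("/kaggle/working/after_runs"))
```

If the function returns `None`, no matching files were found, which may simply mean the naming assumption does not hold for your run.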

In [None]:
#Copy files to /kaggle/working folder to continue training from an earlier checkpoint.
#!cp -r /kaggle/input/your-earlier-training-output/* /kaggle/working

In [None]:
#AFTER model training. Use --restart flag when you continue an earlier training. 

!/kaggle/temp/miniconda/bin/after train  \
--name your-training-name \
--db_path /kaggle/temp/processed \
--emb_model_path /kaggle/input/your-rave-model-without-streaming.ts \
--config base \
--out_path /kaggle/working/after_runs/ #\
#--restart steps-of-latest-checkpoint 
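
As an alternative to commenting the `--restart` line in and out, the training command can be assembled in Python. This is a minimal sketch using only the flags shown in the cell above. The function name `build_train_command` and its parameters are hypothetical, and the values are the same placeholders as above.

```python
import shlex


def build_train_command(name, db_path, emb_model_path, out_path,
                        config="base", restart=None,
                        after_bin="/kaggle/temp/miniconda/bin/after"):
    """Build the `after train` argument list; add --restart only when a step is given."""
    cmd = [
        after_bin, "train",
        "--name", name,
        "--db_path", db_path,
        "--emb_model_path", emb_model_path,
        "--config", config,
        "--out_path", out_path,
    ]
    if restart is not None:
        cmd += ["--restart", str(restart)]
    return cmd


cmd = build_train_command(
    name="your-training-name",
    db_path="/kaggle/temp/processed",
    emb_model_path="/kaggle/input/your-rave-model-without-streaming.ts",
    out_path="/kaggle/working/after_runs/",
    restart=None,  # set to a checkpoint step (an integer) to resume
)
print(shlex.join(cmd))

# To actually start training from Python instead of a `!` cell:
# import subprocess; subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to confirm the paths and the restart step before committing GPU time.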

## Export audio-to-audio model

To export your AFTER model for real-time use with nn~, you need the autoencoder exported with **streaming enabled** (as opposed to preprocessing and training, where you used the model *without* streaming).

In [None]:
#AFTER model export

!/kaggle/temp/miniconda/bin/after export \
--model_path your-training-name \
--emb_model_path /kaggle/input/your-rave-model-WITH-streaming.ts \
--step 800000