# Stable Audio Tools

**Stable Audio Tools** is an open-source library from Stability AI for training generative audio models and running inference with them. It supports text-to-audio generation, audio-to-audio prompting, and audio inpainting.

Source code: https://github.com/Stability-AI/stable-audio-tools

-----

Notebook author: [Martin Heinze](https://github.com/devstermarts)

Last updated: 17.01.2025

## Setup

In [None]:
#Install miniconda (Python 3.10)

!mkdir -p /kaggle/temp/
%cd /kaggle/temp

!curl -L https://repo.anaconda.com/miniconda/Miniconda3-py310_23.10.0-1-Linux-x86_64.sh -o miniconda.sh
!chmod +x miniconda.sh
!sh miniconda.sh -b -p /kaggle/temp/miniconda

In [None]:
#Install SAT from GitHub

%cd /kaggle/temp

!git clone https://github.com/Stability-AI/stable-audio-tools.git
%cd stable-audio-tools
!/kaggle/temp/miniconda/bin/pip install .

In [None]:
#Force reinstall of compatible versions of torch and numpy, then install flash-attn - placeholder for a more elegant version later.

!/kaggle/temp/miniconda/bin/pip install --force-reinstall torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1
!/kaggle/temp/miniconda/bin/pip install --force-reinstall "numpy==1.*"
!/kaggle/temp/miniconda/bin/pip install flash-attn
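
After the reinstall it can be useful to confirm which versions actually ended up in the Miniconda environment, since the notebook kernel and `/kaggle/temp/miniconda` are separate Python installations. The following is a minimal sketch, not part of SAT: the helper name `installed_versions` is hypothetical, and the cell falls back to the kernel's own interpreter if the Miniconda path does not exist.

```python
import json
import subprocess
import sys
from pathlib import Path

# Small probe script that runs inside the target interpreter.
# Package names are passed as command-line arguments.
_PROBE = """
import json, sys
from importlib import metadata
found = {}
for name in sys.argv[1:]:
    try:
        found[name] = metadata.version(name)
    except metadata.PackageNotFoundError:
        found[name] = None
print(json.dumps(found))
"""


def installed_versions(python_bin, packages):
    """Return {package: version or None} as seen by the given interpreter."""
    result = subprocess.run(
        [str(python_bin), "-c", _PROBE, *packages],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Interpreter path taken from the setup cells above; fall back to the
# current kernel when running outside Kaggle.
MINICONDA_PYTHON = Path("/kaggle/temp/miniconda/bin/python3")
python_bin = MINICONDA_PYTHON if MINICONDA_PYTHON.exists() else Path(sys.executable)

packages = ["torch", "torchvision", "torchaudio", "numpy", "flash-attn"]
for name, version in installed_versions(python_bin, packages).items():
    print(f"{name}: {version or 'not installed'}")
```

If a package is reported as not installed or shows an unexpected version, rerun the pip cell above before starting a training run.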

### Set up wandb

To set up the connection to your wandb dashboard, add a secret in the Kaggle notebook editor via Add-ons -> Secrets.
Replace YOUR-SECRET-NAME in the code below with the identifier you've chosen for your API key.

In [None]:
!/kaggle/temp/miniconda/bin/pip install wandb weave

In [None]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("YOUR-SECRET-NAME")

import os
os.environ['WANDB_API_KEY'] = secret_value_0

In [None]:
!/kaggle/temp/miniconda/bin/wandb login

## Autoencoder training

Set up your training data configuration (passed via `--dataset-config` below) as a .json file, as detailed in SAT's documentation, and provide it to the notebook, e.g. in a separate Kaggle dataset. Note that this configuration also contains the path to your training data set.
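
As a rough illustration of what such a .json file might contain, the sketch below writes a minimal configuration from Python. The key names (`dataset_type`, `datasets`, `id`, `path`, `random_crop`) are an assumption based on SAT's dataset documentation and should be checked against the version you installed; the helper `write_dataset_config` and all paths are hypothetical.

```python
import json
from pathlib import Path


def write_dataset_config(config_path, audio_dir, dataset_id="my_audio"):
    """Write a minimal dataset config for a local audio folder and return it."""
    config = {
        # Assumed key names, see SAT's dataset documentation.
        "dataset_type": "audio_dir",
        "datasets": [
            {"id": dataset_id, "path": str(audio_dir)},
        ],
        "random_crop": True,
    }
    config_path = Path(config_path)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=4))
    return config


# Example call (placeholder paths, adjust to your own data):
# write_dataset_config("/kaggle/working/training-config.json",
#                      "/kaggle/input/your-audio-dataset/")
```

Writing the file to `/kaggle/working/` instead of uploading it as a separate dataset is only one option; either way, the resulting path is what goes into `--dataset-config`.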

Choose the architecture you want to train (e.g. encodec_musicgen_rvq.json) from the set of configs in the `stable_audio_tools/configs/model_configs/autoencoders/` folder of the SAT package.
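
To see which configs are available before picking one, the folder can be listed from Python. This is a small sketch with hypothetical helper names; the top-level keys read in `summarize_model_config` (`model_type`, `sample_rate`, `sample_size`, `audio_channels`) are assumptions about SAT's model config layout, and missing keys simply come back as `None`.

```python
import json
from pathlib import Path


def list_model_configs(config_dir):
    """Return the sorted file names of all .json configs in a folder."""
    return sorted(p.name for p in Path(config_dir).glob("*.json"))


def summarize_model_config(config_file):
    """Return a few top-level fields of a model config (None if absent)."""
    config = json.loads(Path(config_file).read_text())
    keys = ("model_type", "sample_rate", "sample_size", "audio_channels")
    return {key: config.get(key) for key in keys}


# Folder as cloned in the setup cells above.
CONFIG_DIR = Path(
    "/kaggle/temp/stable-audio-tools/stable_audio_tools/configs/model_configs/autoencoders"
)
if CONFIG_DIR.exists():
    for name in list_model_configs(CONFIG_DIR):
        print(name, summarize_model_config(CONFIG_DIR / name))
```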

In [None]:
%cd /kaggle/temp/stable-audio-tools

!/kaggle/temp/miniconda/bin/python3 train.py \
--dataset-config /path/to/your/training-config.json \
--model-config ./stable_audio_tools/configs/model_configs/autoencoders/encodec_musicgen_rvq.json \
--name your-training-name \
--batch-size 8 \
--checkpoint-every 1500 \
--num-workers 4 \
--save-dir /kaggle/working/ \
#--ckpt-path /path/to/your/checkpoint/epoch=1000-step=100000.ckpt
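
The commented-out `--ckpt-path` line above has to be toggled by hand when resuming from a checkpoint. As an alternative, the same command can be assembled in Python so that the flag is only added when a checkpoint is given. This is a sketch under the assumption that `train.py` accepts exactly the flags used in the cell above; the helper `build_train_command` is hypothetical and the paths are the same placeholders as before.

```python
import shlex


def build_train_command(
    dataset_config,
    model_config,
    name,
    save_dir,
    batch_size=8,
    checkpoint_every=1500,
    num_workers=4,
    ckpt_path=None,
    python_bin="/kaggle/temp/miniconda/bin/python3",
):
    """Build the train.py argument list; --ckpt-path is added only if set."""
    cmd = [
        python_bin, "train.py",
        "--dataset-config", str(dataset_config),
        "--model-config", str(model_config),
        "--name", name,
        "--batch-size", str(batch_size),
        "--checkpoint-every", str(checkpoint_every),
        "--num-workers", str(num_workers),
        "--save-dir", str(save_dir),
    ]
    if ckpt_path is not None:
        cmd += ["--ckpt-path", str(ckpt_path)]
    return cmd


cmd = build_train_command(
    dataset_config="/path/to/your/training-config.json",
    model_config="./stable_audio_tools/configs/model_configs/autoencoders/encodec_musicgen_rvq.json",
    name="your-training-name",
    save_dir="/kaggle/working/",
)
# Print the command for inspection before launching it.
print(shlex.join(cmd))

# To launch from the notebook (run from the cloned repository):
# import subprocess
# subprocess.run(cmd, cwd="/kaggle/temp/stable-audio-tools", check=True)
```

Passing the arguments as a list avoids shell line-continuation pitfalls such as a missing space before a trailing backslash.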
