Live Music Diffusion Models

Training and inference code for Live Music Diffusion Models (LMDMs): streaming, autoregressive music diffusion models. LMDMs generate audio block by block over a sliding context window, which makes live generation possible. Huge shout-out to the Stable Audio folks, whose work this codebase draws heavily on.

This is our public-facing code repo. For access to the development code used during the project, please reach out to znovack@ucsd.edu or brade@mit.edu.

Install

$ pip install .

Requires PyTorch 2.5+ (for Flash / Flex Attention). Developed against Python 3.10.

Models

Two attention regimes, each available as a plain finetune or as an ARC-forcing model:

Config	Attention	Type
`saos_encdec.json`	enc-dec (bidirectional context)	finetune
`saos_block_causal.json`	block-causal (sliding-window causal)	finetune
`saos_arc_forcing_encdec.json`	enc-dec	ARC-forcing
`saos_arc_forcing_block_causal.json`	block-causal	ARC-forcing

Configs live in stable_audio_tools/configs/model_configs/txt2audio/.
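To make the block-causal (sliding-window causal) regime concrete, here is a small pure-Python sketch of such an attention mask. It is an illustration under assumptions, not the mask this codebase builds: it assumes tokens are grouped into consecutive blocks, that attention is bidirectional within a block and causal across blocks, and that each block sees at most `window_blocks` blocks (itself included). The function name and `window_blocks` parameter are hypothetical. The True-means-may-attend convention matches the boolean `attn_mask` convention of PyTorch's `scaled_dot_product_attention`.

```python
from typing import List


def block_causal_mask(num_tokens: int, block_size: int, window_blocks: int) -> List[List[bool]]:
    """Return mask[q][k] == True if query token q may attend to key token k.

    Hypothetical sliding-window block-causal pattern (assumption, see lead-in):
    - tokens are grouped into consecutive blocks of `block_size`;
    - a query sees every token in its own block (bidirectional within a block);
    - it also sees the previous `window_blocks - 1` blocks, never future blocks.
    """
    mask = []
    for q in range(num_tokens):
        q_block = q // block_size
        row = []
        for k in range(num_tokens):
            k_block = k // block_size
            # Causal at block granularity, limited to a window of `window_blocks` blocks.
            row.append(0 <= q_block - k_block < window_blocks)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # 6 tokens, blocks of 2 tokens, each block sees itself plus one previous block.
    for row in block_causal_mask(num_tokens=6, block_size=2, window_blocks=2):
        print("".join("x" if allowed else "." for allowed in row))
```

With 6 tokens, blocks of 2 and a 2-block window, the rows print as `xx....`, `xx....`, `xxxx..`, `xxxx..`, `..xxxx`, `..xxxx`: the first block attends only to itself, and the third block has already dropped the first one out of its window.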

Training

python train.py \
    --model-config stable_audio_tools/configs/model_configs/txt2audio/<config>.json \
    --dataset-config <your_dataset>.json \
    --pretrained-ckpt-path /path/to/base.ckpt \
    --save-dir ./checkpoints \
    --batch-size 40 --precision 16-mixed --name <run-name>
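If you launch runs from a Python script (for example, a sweep), the command above can be assembled as an argv list. This sketch uses only the flags shown in that command, and the defaults simply mirror its example values. The helper name, the dataset config filename, and the run name are hypothetical placeholders.

```python
import shlex
from typing import List

CONFIG_DIR = "stable_audio_tools/configs/model_configs/txt2audio"


def build_train_command(
    model_config: str,
    dataset_config: str,
    pretrained_ckpt: str,
    run_name: str,
    save_dir: str = "./checkpoints",
    batch_size: int = 40,
    precision: str = "16-mixed",
) -> List[str]:
    """Hypothetical helper: assemble the train.py invocation as an argv list."""
    return [
        "python", "train.py",
        "--model-config", model_config,
        "--dataset-config", dataset_config,
        "--pretrained-ckpt-path", pretrained_ckpt,
        "--save-dir", save_dir,
        "--batch-size", str(batch_size),
        "--precision", precision,
        "--name", run_name,
    ]


stage1 = build_train_command(
    model_config=f"{CONFIG_DIR}/saos_block_causal.json",
    dataset_config="my_dataset.json",  # hypothetical dataset config
    pretrained_ckpt="/path/to/base.ckpt",
    run_name="lmdm-finetune",          # hypothetical run name
)
print(shlex.join(stage1))
# To actually launch the run: subprocess.run(stage1, check=True)
```

Passing an argv list (rather than one shell string) to `subprocess.run` avoids quoting problems with paths that contain spaces. How the ARC-forcing stage is pointed at the finetuned LMDM checkpoint is not spelled out here. Reusing `--pretrained-ckpt-path` for that is an assumption to check against train.py.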

Training should proceed in two stages:

1. Finetune: use saos_encdec.json or saos_block_causal.json, initialized from your favorite pretrained music diffusion model (e.g., SAO or SAO-Small). This stage mirrors standard diffusion finetuning and has the same overall memory cost.
2. ARC-forcing: use saos_arc_forcing_encdec.json or saos_arc_forcing_block_causal.json, initialized from the LMDM you finetuned in stage 1. ARC configs set training.arc.self_forcing and pull the teacher/discriminator from the base model; the attention regime is set by training.inpainting.mask_kwargs.context_router_attention_pattern. Memory usage in this stage grows with the rollout length, so budget GPU memory accordingly.
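Before launching a run, it can help to confirm which stage and attention regime a model config targets. This sketch reads the two key paths named above from a parsed JSON config. It assumes training.arc.self_forcing is a boolean that ARC configs set to true, and the example dict is a hypothetical, heavily trimmed config, not one of the shipped files.

```python
import json
from typing import Any, Dict, Mapping


def get_path(cfg: Mapping[str, Any], dotted: str, default: Any = None) -> Any:
    """Follow a dotted key path (e.g. 'training.arc.self_forcing') through nested dicts."""
    node: Any = cfg
    for key in dotted.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return default
        node = node[key]
    return node


def describe_model_config(cfg: Mapping[str, Any]) -> Dict[str, Any]:
    """Report whether a config is ARC-forcing and which attention pattern it uses."""
    return {
        # Assumption: ARC configs set this flag to a truthy value.
        "arc_forcing": bool(get_path(cfg, "training.arc.self_forcing", False)),
        "attention": get_path(
            cfg, "training.inpainting.mask_kwargs.context_router_attention_pattern"
        ),
    }


def load_config(path: str) -> Dict[str, Any]:
    """Load a model config JSON file, e.g. one from the txt2audio config directory."""
    with open(path) as f:
        return json.load(f)


example = {  # hypothetical, heavily trimmed config
    "training": {
        "arc": {"self_forcing": True},
        "inpainting": {"mask_kwargs": {"context_router_attention_pattern": "block-causal"}},
    }
}
print(describe_model_config(example))  # {'arc_forcing': True, 'attention': 'block-causal'}
```

Missing keys fall back to `False` / `None` instead of raising, so the same check also runs on the plain finetune configs.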

See train.sh for an end-to-end launch example. Training defaults are in defaults.ini.

Inference

Streaming block-autoregressive (block-AR) generation is handled by generate_diffusion_cond_blockar, which denoises one block of block_size at a time over a sliding context window and can optionally reuse a KV cache for fast streaming. Set context_router_attention_pattern to match the model's attention regime ("enc-dec" or "block-causal") and pass use_kv_cache=True for streaming.
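The streaming loop can be illustrated with a toy, pure-Python sketch of block-AR generation over a sliding window. It is not the signature or implementation of generate_diffusion_cond_blockar. `toy_denoise` is a hypothetical stand-in for the whole diffusion sampler, `toy_encode` is a hypothetical stand-in for computing a context block's keys/values, and blocks are short lists of floats rather than latents. The sketch shows the idea a KV cache exploits: each block's context features are computed once and reused while the block stays in the window, rather than recomputed at every step. This assumes the cached features do not depend on the block's position within the window.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Block = List[float]

# Counts how often context features are computed (toy stand-in for K/V projection).
calls = {"encode": 0}


def toy_encode(block: Block) -> float:
    """Hypothetical stand-in for computing a context block's keys/values."""
    calls["encode"] += 1
    return sum(block)


def toy_denoise(context_features: Sequence[float]) -> Block:
    """Hypothetical stand-in for the diffusion sampler: the next block (2 samples)
    depends only on the features of the blocks currently in the context window."""
    return [1.0 + sum(context_features)] * 2


def stream_blocks(
    denoise_block: Callable[[Sequence[float]], Block],
    encode_block: Callable[[Block], float],
    num_blocks: int,
    context_blocks: int,
    use_kv_cache: bool = True,
) -> List[Block]:
    """Generate num_blocks blocks, one at a time, over a sliding context window."""
    cache: Deque[float] = deque(maxlen=context_blocks)   # cached features of recent blocks
    window: Deque[Block] = deque(maxlen=context_blocks)  # raw recent blocks (no cache)
    output: List[Block] = []
    for _ in range(num_blocks):
        if use_kv_cache:
            context = list(cache)  # reuse features computed in earlier steps
        else:
            context = [encode_block(b) for b in window]  # recompute every step
        new_block = denoise_block(context)
        output.append(new_block)
        # deque(maxlen=...) evicts the oldest entry, which slides the window forward.
        if use_kv_cache:
            cache.append(encode_block(new_block))  # encode once, reuse while in window
        else:
            window.append(new_block)
    return output


if __name__ == "__main__":
    print(stream_blocks(toy_denoise, toy_encode, num_blocks=4, context_blocks=2))
```

With `num_blocks=4` and `context_blocks=2`, both modes return `[[1.0, 1.0], [3.0, 3.0], [9.0, 9.0], [25.0, 25.0]]`. The cached run calls `toy_encode` 4 times (once per block) and the uncached run 5 times. That gap widens as the window and the number of generated blocks grow.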

A runnable end-to-end example (loading a checkpoint, building conditioning, calling the function, and decoding) is in notebooks/inference.ipynb.

Roadmap

Sketch-control training
More detailed accompaniment training support
ONNX export pipeline
Interface setup

Citation

If you use this repo, please cite:

@article{novack2026lmdm,
  title         = {Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators},
  author        = {Novack, Zachary and Brade, Stephen and Kim, Haven and Flores Garc{\'i}a, Hugo and Shikarpur, Nithya and Talegaonkar, Chinmay and Kim, Suwan and Chen, Valerie K. and McAuley, Julian and Berg-Kirkpatrick, Taylor and Huang, Cheng-Zhi Anna},
  journal       = {arXiv preprint arXiv:2605.22717},
  year          = {2026},
  archivePrefix = {arXiv},
  eprint        = {2605.22717},
  primaryClass  = {cs.SD},
  url           = {https://arxiv.org/abs/2605.22717}
}
