Official repository for the paper
"Neural Proxies for Sound Synthesizers: Learning Perceptually Informed Preset Representations"
published in the Journal of the Audio Engineering Society (JAES).
This repository provides:
- Dataset generation for synthesizer presets
- Training of neural proxies (preset encoders)
- Evaluation on a sound-matching downstream task
→ Audio examples are available on the project website.
→ The repository for the audio model evaluation can be found here.
→ The published version of the paper is available on JAES's website here, while the Author's Accepted Manuscript (AAM) is available on arXiv.
Main dependencies:
- PyTorch + Lightning
- DawDreamer (VST rendering)
- WandB (logging)
- Optuna (HPO)
- Hydra (config management)
See requirements.txt for the full list.
Clone the repo and install via pip or Docker.
→ See Installation & environment setup for details.
Currently, the following synthesizers are supported:
→ See Adding synthesizers for instructions on integrating new ones.
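For context, rendering a preset to audio with DawDreamer (used here for dataset generation) generally follows the pattern sketched below; the plugin path, MIDI note, and durations are placeholders rather than the repository's actual rendering code.

```python
# Minimal DawDreamer rendering sketch; plugin path and parameters are placeholders.
import dawdreamer as daw

SAMPLE_RATE, BLOCK_SIZE = 44_100, 512
engine = daw.RenderEngine(SAMPLE_RATE, BLOCK_SIZE)
synth = engine.make_plugin_processor("synth", "/path/to/synth.vst3")  # placeholder path

synth.add_midi_note(60, 100, 0.0, 2.0)   # note, velocity, start (s), duration (s)
engine.load_graph([(synth, [])])
engine.render(4.0)                        # render 4 s to leave room for the release tail
audio = engine.get_audio()                # numpy array, shape (channels, samples)
```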
Wrappers for the following audio models are available in the src/models/audio/ directory:
- EfficientAT (used in the paper)
- Torchopenl3
- PaSST
- Audio-MAE
- Mel features
→ See Adding audio models for integration instructions.
→ The code for the audio model evaluation can be found in its corresponding repository.
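As a rough illustration of how these wrappers are typically used, the sketch below computes embeddings for a batch of audio; the class name `EfficientATWrapper`, its constructor, and its call signature are assumptions and may differ from the actual implementation.

```python
# Hypothetical usage sketch for one of the audio model wrappers; the class name
# `EfficientATWrapper` and its call signature are assumptions.
import torch

from src.models.audio import EfficientATWrapper  # hypothetical import

model = EfficientATWrapper()
model.eval()

audio = torch.randn(4, 32_000)  # batch of four 1-second mono clips at 32 kHz
with torch.no_grad():
    embeddings = model(audio)   # expected shape: (batch, embedding_dim)
print(embeddings.shape)
```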
An overview of the implemented neural proxies can be found in src/models/preset/model_zoo.py.
Download pretrained checkpoints here and place them in checkpoints/.
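As an illustration, instantiating a preset encoder from the model zoo and loading a pretrained checkpoint might look like the following; the factory function `get_model`, its argument, and the checkpoint filename are assumptions rather than the repository's actual API.

```python
# Hypothetical sketch: building a neural proxy from the model zoo and loading a
# pretrained checkpoint. The factory function, its argument, and the checkpoint
# filename are assumptions, not the repository's actual API.
import torch

from src.models.preset import model_zoo

proxy = model_zoo.get_model("tfm")  # hypothetical factory; see model_zoo.py for actual names
ckpt = torch.load("checkpoints/tfm.ckpt", map_location="cpu")  # hypothetical filename
proxy.load_state_dict(ckpt["state_dict"])  # Lightning checkpoints store weights under "state_dict"
proxy.eval()
```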
See Datasets for download links and generation instructions for the synthetic and hand-crafted preset datasets.
This repository provides the following experiments:
- Training and evaluation of synthesizer proxies.
- Hyperparameter optimization (HPO) with Optuna.
- Sound-matching downstream tasks (finetuning + estimator network).
→ See Experiments for scripts, configs, and usage examples.
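Because the experiments are configured with Hydra, a configuration can also be composed programmatically for quick inspection. The sketch below uses Hydra's compose API; the config directory, config name, and override groups are assumptions about how the configs are organized.

```python
# Sketch: composing an experiment config with Hydra's compose API.
# The config path, config name, and override groups below are assumptions.
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(version_base=None, config_path="configs"):
    cfg = compose(config_name="train", overrides=["model=tfm", "task=sound_matching"])

print(OmegaConf.to_yaml(cfg))
```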
Detailed step-by-step instructions for replicating the results from the paper, including model evaluation and visualization scripts, can be found in Reproducibility.
@article{combes2025neural,
  author  = {Combes, Paolo and Weinzierl, Stefan and Obermayer, Klaus},
  journal = {Journal of the Audio Engineering Society},
  title   = {Neural Proxies for Sound Synthesizers: Learning Perceptually Informed Preset Representations},
  year    = {2025},
  volume  = {73},
  number  = {9},
  pages   = {561--577},
  month   = {September},
}

Special shout-out to Joseph Turian for his initial guidance on the topic and overall methodology, and to Gwendal Le Vaillant for the useful discussion on SPINVAE, from which the transformer-based preset encoder is inspired.