An optimal transport inspired loss function for improving frequency localization in differentiable DSP

This is the official repository for the paper "Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and Spectral Optimal Transport" by Bernardo Torres, Geoffroy Peeters, and Gaël Richard. Check out the poster here.

We introduce a loss function, inspired by optimal transport, that compares spectra horizontally. It computes the one-dimensional Wasserstein distance between the spectra of two signals, which measures how far energy must be moved along the frequency axis to turn one spectrum into the other.
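When both spectra are sampled on the same frequency grid, the 1D Wasserstein-1 distance reduces to the area between their cumulative distributions. A minimal NumPy sketch of that idea, assuming magnitude spectra normalized to unit mass (the function name is ours, not the repository's API):

```python
import numpy as np


def spectral_w1_shared_grid(freqs, mag_a, mag_b):
    """Wasserstein-1 distance (in Hz) between two magnitude spectra on one grid.

    freqs: increasing bin frequencies in Hz, shape (n_bins,).
    mag_a, mag_b: non-negative magnitudes, shape (n_bins,).
    """
    # Normalize each spectrum so it can be read as a distribution over frequency.
    p = mag_a / mag_a.sum()
    q = mag_b / mag_b.sum()
    # Difference of the two CDFs, evaluated at each bin.
    cdf_diff = np.cumsum(p - q)
    # The CDF difference is constant between consecutive bins, so the
    # integral of |F_a - F_b| is a sum of rectangle areas.
    return float(np.sum(np.abs(cdf_diff[:-1]) * np.diff(freqs)))


freqs = np.arange(0.0, 500.0, 10.0)  # 10 Hz bins
a = np.zeros_like(freqs)
a[10] = 1.0  # all energy at 100 Hz
b = np.zeros_like(freqs)
b[13] = 1.0  # all energy at 130 Hz
print(spectral_w1_shared_grid(freqs, a, b))  # 30.0
```

Because the result is expressed in Hz, moving the second peak further away increases the loss proportionally, which is what makes it informative about frequency.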

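[Figure: the proposed loss compares spectra horizontally, along the frequency axis]

[Figure: the Multi-Scale Spectral loss and other losses compare spectra vertically]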

By computing the gradient of this loss function with respect to the parameters of a signal processor (such as a sinusoidal oscillator), we can improve frequency localization/estimation compared to traditional vertical spectral losses such as the Multi-Scale Spectral loss.
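A toy illustration of the difference (our own NumPy sketch, not the paper's experiment): Gaussian bumps stand in for the spectrum of a sinusoid, and a plain bin-wise squared error stands in for vertical losses; it is not the actual Multi-Scale Spectral loss. As the estimated peak moves away from the target, the 1D Wasserstein distance (W1) grows with the frequency offset, while the bin-wise error stops changing once the peaks no longer overlap, so it carries no information about which direction the frequency should move.

```python
import numpy as np


def w1(freqs, p, q):
    """Wasserstein-1 distance between two spectra on a shared grid (in Hz)."""
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(np.abs(np.cumsum(p - q))[:-1] * np.diff(freqs)))


def binwise_sq_error(p, q):
    """Simplified 'vertical' loss: squared error between normalized bins."""
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum((p - q) ** 2))


def gaussian_peak(freqs, center, width=5.0):
    """Smooth stand-in for the spectrum of a single sinusoid."""
    return np.exp(-0.5 * ((freqs - center) / width) ** 2)


freqs = np.arange(0.0, 1000.0, 0.5)  # 0.5 Hz bins
target = gaussian_peak(freqs, 300.0)
results = {}
for offset in (10.0, 50.0, 150.0, 300.0):
    estimate = gaussian_peak(freqs, 300.0 + offset)
    results[offset] = (w1(freqs, estimate, target), binwise_sq_error(estimate, target))
    print(f"offset {offset:5.0f} Hz  W1 = {results[offset][0]:6.1f}  "
          f"bin-wise = {results[offset][1]:.4f}")
```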

Summary

Spectral Optimal Transport: We use a loss function inspired by optimal transport to compare the spectra of two signals. In the paper, we test it on an autoencoding task that aims both to estimate the parameters of a harmonic synthesizer (fundamental frequency and amplitudes) and to obtain good reconstructions.

The loss function is largely based on POT's (Python Optimal Transport) implementation of the 1D Wasserstein distance, which computes the quantile functions of the spectra (illustrated below).
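For reference, here is a plain-NumPy sketch of the quantile-function approach used by POT's `ot.wasserstein_1d`, rewritten for illustration (the names are ours, and the repository's losses.py may differ, e.g. by working on batched PyTorch tensors so gradients flow). Frequencies play the role of sample positions and normalized magnitudes the role of weights:

```python
import numpy as np


def quantile_function(qs, cum_weights, values):
    """Generalized inverse CDF: smallest value whose cumulative weight >= q."""
    idx = np.searchsorted(cum_weights, qs, side="left")
    return values[np.clip(idx, 0, len(values) - 1)]


def wasserstein_1d(u_values, v_values, u_weights, v_weights, p=1):
    """Return W_p**p between two weighted 1D distributions (W1 itself for p=1)."""
    # Sort positions (frequencies) and carry their weights (magnitudes) along.
    u_order, v_order = np.argsort(u_values), np.argsort(v_values)
    u_values, u_weights = u_values[u_order], u_weights[u_order]
    v_values, v_weights = v_values[v_order], v_weights[v_order]
    # Cumulative distributions of the normalized spectra.
    u_cdf = np.cumsum(u_weights) / u_weights.sum()
    v_cdf = np.cumsum(v_weights) / v_weights.sum()
    # Evaluate both quantile functions at every breakpoint of either CDF.
    qs = np.sort(np.concatenate([u_cdf, v_cdf]))
    u_quantiles = quantile_function(qs, u_cdf, u_values)
    v_quantiles = quantile_function(qs, v_cdf, v_values)
    # Width of each quantile interval (the first one starts at 0).
    delta = np.diff(np.concatenate([[0.0], qs]))
    return float(np.sum(delta * np.abs(u_quantiles - v_quantiles) ** p))


# Two spectra with different supports: the second partial moved from 200 to 250 Hz.
freqs_a, mags_a = np.array([100.0, 200.0]), np.array([1.0, 1.0])
freqs_b, mags_b = np.array([100.0, 250.0]), np.array([1.0, 1.0])
print(wasserstein_1d(freqs_a, freqs_b, mags_a, mags_b))  # 25.0
```

Working with quantile functions rather than a shared grid is what allows the two spectra to have different supports: only the sorted frequencies and their cumulative weights are needed.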

Lightweight pitch estimator: Our encoder uses a lightweight architecture (46K parameters) based on PESTO to estimate the parameters of a harmonic synthesizer (fundamental frequency and amplitudes).
Differentiable DSP: Our decoder is a harmonic synthesizer from DDSP that generates audio from fundamental frequency and amplitude parameters. Even though the decoder is not trained, automatic differentiation lets us compute the gradient of the loss function w.r.t. its input parameters (fundamental frequency and amplitudes).
Toy autoencoding task: [Figure: toy autoencoding task]
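The decoder in the paper is DDSP's harmonic synthesizer. As a rough sketch of what such a synthesizer computes (a simplified NumPy version of our own that omits details of DDSP's implementation, such as upsampling frame-rate controls to audio rate), each harmonic k is a sinusoid at k·f0 whose phase is the running integral of the fundamental frequency. Written with differentiable tensor operations instead of NumPy, the same computation lets gradients flow back to f0 and the amplitudes:

```python
import numpy as np


def harmonic_synth(f0, amplitudes, sample_rate=16000):
    """Additive harmonic synthesis.

    f0: fundamental frequency in Hz per sample, shape (n_samples,).
    amplitudes: amplitude of each harmonic per sample, shape (n_samples, n_harmonics).
    """
    harmonic_numbers = np.arange(1, amplitudes.shape[1] + 1)
    # Phase of the fundamental = running integral of its frequency.
    phase = 2 * np.pi * np.cumsum(f0) / sample_rate
    # Zero out harmonics at or above Nyquist to avoid aliasing.
    harmonic_freqs = f0[:, None] * harmonic_numbers[None, :]
    amplitudes = np.where(harmonic_freqs < sample_rate / 2, amplitudes, 0.0)
    # Harmonic k has phase k * phase; sum all partials into one signal.
    return np.sum(amplitudes * np.sin(phase[:, None] * harmonic_numbers), axis=1)


sr = 16000
f0 = np.full(sr, 220.0)  # one second at a constant 220 Hz
amps = np.tile(1.0 / np.arange(1, 9), (sr, 1))  # 8 harmonics with 1/k rolloff
audio = harmonic_synth(f0, amps, sr)
spectrum = np.abs(np.fft.rfft(audio))
print(np.argmax(spectrum) * sr / len(audio))  # 220.0
```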

Some things to be careful about: SOT can be quite sensitive to spectral leakage and noise, so it works best when your spectra have clear modes rather than being mostly noise. This is mostly due to normalization: since the spectra are normalized before being compared, energy from leakage or a noise floor claims part of the total mass and must be transported too. You can threshold the noise or remove low-amplitude points entirely; SOT works even if your spectra have different supports :) (they just have to live in the same metric space, with each amplitude value associated with a frequency value).
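To illustrate this (a NumPy sketch; `drop_low_bins` and its `rel_threshold` are hypothetical, not part of the repository), a flat noise floor of only 1% of the peak still ends up holding most of the normalized mass, so W1 becomes large even though the partial is in the right place. Discarding low-amplitude bins leaves spectra with different supports, which the quantile-based W1 handles directly:

```python
import numpy as np


def w1(u_values, u_weights, v_values, v_weights):
    """W1 between weighted 1D distributions via quantile functions."""
    u_order, v_order = np.argsort(u_values), np.argsort(v_values)
    u_values, u_cdf = u_values[u_order], np.cumsum(u_weights[u_order]) / u_weights.sum()
    v_values, v_cdf = v_values[v_order], np.cumsum(v_weights[v_order]) / v_weights.sum()
    qs = np.sort(np.concatenate([u_cdf, v_cdf]))
    u_q = u_values[np.clip(np.searchsorted(u_cdf, qs), 0, len(u_values) - 1)]
    v_q = v_values[np.clip(np.searchsorted(v_cdf, qs), 0, len(v_values) - 1)]
    return float(np.sum(np.diff(np.concatenate([[0.0], qs])) * np.abs(u_q - v_q)))


def drop_low_bins(freqs, mags, rel_threshold=0.05):
    """Hypothetical helper: keep bins with magnitude >= rel_threshold * max."""
    keep = mags >= rel_threshold * mags.max()
    return freqs[keep], mags[keep]


freqs = np.arange(0.0, 4000.0, 10.0)
clean = np.zeros_like(freqs)
clean[30] = 1.0          # single partial at 300 Hz
noisy = clean + 0.01     # same partial plus a flat noise floor (80% of the mass)

raw = w1(freqs, noisy, freqs, clean)
thresholded = w1(*drop_low_bins(freqs, noisy), *drop_low_bins(freqs, clean))
print(f"raw: {raw:.1f} Hz, thresholded: {thresholded:.1f} Hz")
```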

Data Description

The synthetic data used for training, evaluation, and testing is available here. You can download it and put the file 40_1950_4096_04_1_4000_8_1_harmonic.pth in a data subfolder. You can use PreloadedSinusoidDataModule in synthetic_data.py to load it easily. Code for generating the data is also available.

Running Experiments

We recommend installing the dependencies in a virtual environment. We provide the conda environment file we used (environment.yml). If your CUDA version is compatible, you can create and activate the environment with the following commands:

conda env create -f environment.yml
conda activate sot

If you use a different environment, please check your PyTorch Lightning and Lightning CLI versions for compatibility: their APIs have changed a lot recently, and some modifications might be needed to run the code.
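A quick way to see which versions are installed, so you can compare them with the ones pinned in environment.yml. This is a small standard-library sketch; the package list is an assumption: `lightning` and `pytorch-lightning` are the two distribution names Lightning has been published under, and `jsonargparse` is the library Lightning CLI builds on.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of relevant distributions; adjust to match environment.yml.
PACKAGES = ("torch", "lightning", "pytorch-lightning", "jsonargparse")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```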

Paper experiments

Configuration files are available in the paper-experiments folder. Each subfolder corresponds to an experiment described in the paper, and contains configuration files for each of the 5 runs (one per random seed).

To run an experiment, run the following command:

python train.py --config paper-experiments/<experiment_name>/<run>/train_config.yaml
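To launch every run of every experiment in sequence, a small sketch that walks the paper-experiments/<experiment_name>/<run>/train_config.yaml layout (the helper names are ours; it simply calls train.py with --config once per configuration file, exactly like the command above):

```python
import subprocess
import sys
from pathlib import Path


def find_train_configs(root="paper-experiments"):
    """All configs laid out as <root>/<experiment_name>/<run>/train_config.yaml."""
    return sorted(Path(root).glob("*/*/train_config.yaml"))


def build_commands(configs):
    """One `python train.py --config <file>` call per configuration file."""
    return [[sys.executable, "train.py", "--config", str(cfg)] for cfg in configs]


if __name__ == "__main__":
    for command in build_commands(find_train_configs()):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)  # stop at the first failing run
```

Using `sys.executable` ensures the runs use the same Python interpreter (and therefore the same environment) as the launcher script.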

Reproducing results from the paper

We provide pre-trained checkpoints for each experiment here. Download the file, extract it (about 2.6 GB), and put it under this_project_folder/checkpoints/. The structure should look like this:

checkpoints
├── MSS-LIN
│   ├── $run_$seed
│   │   ├── checkpoints
│   │   │   ├── checkpoint_name.ckpt
│   │   ├── train_config.yaml
│   ├── ...
├── SOT-2048
│   ├── $run_$seed
│   │   ├── checkpoints
│   │   │   ├── checkpoint_name.ckpt
│   │   ├── train_config.yaml
│   ├── ...
...
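Before running the evaluation, a quick way to confirm that the extracted files match this layout (a hypothetical helper, not part of the repository):

```python
from pathlib import Path


def list_checkpoints(root="checkpoints"):
    """Map (experiment, run) -> (list of .ckpt files, path to train_config.yaml).

    Assumes the layout:
    <root>/<experiment>/<run>/train_config.yaml
    <root>/<experiment>/<run>/checkpoints/*.ckpt
    """
    runs = {}
    for config in sorted(Path(root).glob("*/*/train_config.yaml")):
        run_dir = config.parent
        ckpts = sorted((run_dir / "checkpoints").glob("*.ckpt"))
        runs[(run_dir.parent.name, run_dir.name)] = (ckpts, config)
    return runs


if __name__ == "__main__":
    for (experiment, run), (ckpts, _config) in list_checkpoints().items():
        status = f"{len(ckpts)} checkpoint(s)" if ckpts else "MISSING checkpoint"
        print(f"{experiment}/{run}: {status}")
```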

To reproduce the result table from the paper, run:

python eval_paper.py

Citation

If you find our work useful or use it in your research, you can cite it using:

@inproceedings{torres2024unsupervised,
  title={Unsupervised harmonic parameter estimation using differentiable DSP and spectral optimal transport},
  author={Torres, Bernardo and Peeters, Geoffroy and Richard, Ga{\"e}l},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1176--1180},
  year={2024},
  organization={IEEE}
}
