Multiresolution Alignment for Multiple Unsynchronized Audio Sequences using Sequential Monte Carlo Samplers

Citation

If you use this source code, please consider citing the following papers:

D. Basaran et al., Multiresolution alignment for multiple unsynchronized audio sequences using sequential Monte Carlo samplers, Digital Signal Processing (2017), https://doi.org/10.1016/j.dsp.2017.10.024

Bibtex

	@article{BASARAN2017,
	title = "Multiresolution alignment for multiple unsynchronized audio sequences using sequential Monte Carlo samplers",
	journal = "Digital Signal Processing",
	year = "2017",
	issn = "1051-2004",
	doi = "https://doi.org/10.1016/j.dsp.2017.10.024",
	url = "http://www.sciencedirect.com/science/article/pii/S1051200417302518",
	author = "Dogac Basaran and Ali Taylan Cemgil and Emin Anarim",
	keywords = "Multiple audio alignment, Multiresolution alignment, Audio fingerprint, Bayesian inference, Sequential Monte Carlo samplers, Sequential alignment"
	}

Dogac Basaran, Ali Taylan Cemgil, Emin Anarim, Multiresolution alignment for multiple unsynchronized audio sequences using Sequential Monte Carlo samplers, SoftwareX, 12 December 2017, ISSN 2352-7110, https://doi.org/10.1016/j.softx.2017.11.006.

Bibtex

    @article{Basaran2017,
    title = "Multiresolution alignment for multiple unsynchronized audio sequences using Sequential Monte Carlo samplers",
    journal = "SoftwareX",
    year = "2017",
    issn = "2352-7110",
    doi = "https://doi.org/10.1016/j.softx.2017.11.006",
    url = "https://www.sciencedirect.com/science/article/pii/S235271101730064X",
    author = "Dogac Basaran and Ali Taylan Cemgil and Emin Anarim",
    keywords = "Multiple audio alignment, Multiresolution alignment, Audio fingerprint, Bayesian inference, Sequential Monte Carlo samplers, Sequential alignment"
    }

Software

The software consists of three separate parts:

1- Multiresolution multiple audio alignment software.

2- Fingerprinting based multiple audio alignment software (Baseline).

3- Evaluation.

Here, we demonstrate the software using the audio dataset from the GT_090912 event of the Jiku dataset (available here). Note that the audio files follow a naming convention for microphones with multiple recordings; the audio filenames in the provided link have been modified accordingly.

Multiresolution Multiple Audio Alignment Software

This is the main software, which computes the alignment estimates of unsynchronized audio files using the SMC-based multiresolution multiple audio alignment method. The software is written in MATLAB and is located under the folder "/SMC_based_alignment". The documentation of the software is available here.

Fingerprinting Based Multiple Audio Alignment Software (Baseline)

As a baseline, we use a fingerprinting based alignment approach. The fingerprinting software is obtained from

D. Ellis, Robust Landmark based Audio Fingerprinting, web source, available (2009). URL http://labrosa.ee.columbia.edu/matlab/fingerprint/

The code for the fingerprinting software is available under the subfolder "/Fingerprinting_based_alignment/fingerprint_labrosa".

Note that this software is not directly applicable to the alignment setting, since it is a query-by-example audio fingerprinting system. For alignment purposes, we simply count the number of exact hash (fingerprint) matches between each pair of sequences. We then threshold this count to decide whether the two sequences match. The time information of the matching hashes is then used to compute the relative offset between the sequences. A more detailed description can be found in Sec. 4.1 of the DSP manuscript.
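The pairwise decision described above can be sketched as follows. This is a minimal Python illustration, not the repository's MATLAB code. It assumes each sequence has already been reduced to a list of (hash, time) pairs; the function name `pairwise_offset`, the parameter `min_matches`, and the choice to count only the matches that agree on the most common time difference are all assumptions made for the example.

```python
from collections import Counter


def pairwise_offset(landmarks_a, landmarks_b, min_matches=5):
    """Decide whether two sequences match and estimate their relative offset.

    landmarks_a, landmarks_b: lists of (hash_value, time) tuples (hypothetical format).
    Returns (is_match, offset, n_matches), where offset = time_in_a - time_in_b.
    """
    # Index the times at which each hash occurs in sequence A.
    times_a = {}
    for hash_value, t_a in landmarks_a:
        times_a.setdefault(hash_value, []).append(t_a)

    # Every exact hash match votes for one time difference.
    votes = Counter()
    for hash_value, t_b in landmarks_b:
        for t_a in times_a.get(hash_value, ()):
            votes[t_a - t_b] += 1

    if not votes:
        return False, None, 0

    # Assumption: the offset is the most voted time difference, and the
    # threshold is applied to the number of matches supporting it.
    offset, n_matches = votes.most_common(1)[0]
    is_match = n_matches >= min_matches
    return is_match, (offset if is_match else None), n_matches


seq_a = [(1, 10), (2, 12), (3, 15), (4, 20)]
seq_b = [(1, 3), (2, 5), (3, 8), (9, 1)]
result = pairwise_offset(seq_a, seq_b, min_matches=3)
print(result)
```

With this convention, a positive offset means that sequence B starts that many time units after sequence A.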

Evaluation

The evaluation software is written in Python 2.7 and is located under the folder "/Evaluation". The documentation of the software is available here.

The ground-truth for the Jiku dataset is given in the "Jiku_GT_090912.xml" file available under the subfolder "/Evaluation/ground_truth". The ground-truth is obtained from

M. Guggenberger, M. Lux, L. Boszormenyi, A Synchronization Ground Truth for the Jiku Mobile Video Dataset

The "Jiku_GT_090912.xml" file contains the offset of each sequence on the universal timeline. This is not compatible with the evaluation procedure, which considers relative offsets between sequences. Hence, we reformat the ground-truth information with the "groundtruth.py" file into a text file, "ground_truth.txt", available under "/Evaluation/ground_truth".
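The idea behind this conversion can be sketched as below. This is not the code of "groundtruth.py"; it only illustrates how absolute positions on a shared timeline turn into pairwise relative offsets. The input layout (a dictionary of start time and duration per recording, in seconds), the recording names, and the sign convention are assumptions made for the example.

```python
from itertools import combinations


def relative_offsets(absolute):
    """Convert absolute timeline positions into pairwise relative offsets.

    absolute: dict mapping a recording name to (start, duration) in seconds
              on the shared timeline (hypothetical layout).
    Returns a dict mapping (name_a, name_b) to start_b - start_a when the two
    recordings overlap in time, and to None when they do not overlap.
    """
    result = {}
    for name_a, name_b in combinations(sorted(absolute), 2):
        start_a, dur_a = absolute[name_a]
        start_b, dur_b = absolute[name_b]
        # Length of the shared time interval; non-positive means no overlap.
        overlap = min(start_a + dur_a, start_b + dur_b) - max(start_a, start_b)
        result[(name_a, name_b)] = (start_b - start_a) if overlap > 0 else None
    return result


timeline = {"mic1": (0.0, 60.0), "mic2": (12.5, 60.0), "mic3": (100.0, 30.0)}
pairs = relative_offsets(timeline)
for pair, offset in sorted(pairs.items()):
    print(pair, offset)
```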

Note that the "groundtruth.py" file needs to be run only once. Once the "ground_truth.txt" file is created, there is no need to run it again.

The baseline method requires a threshold to decide whether two sequences match. A grid search over threshold values is performed with the "fingerprinting_evaluation.py" module, and the threshold that yields the highest accuracy is chosen for comparison. The software prints the best accuracy with the respective threshold value and plots a figure with two subplots: the first shows the accuracy for each threshold, and the second shows the precision, recall and F-measure values for each threshold.
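The selection step of such a grid search can be sketched as follows. This is an illustration written for Python 3, not the code of "fingerprinting_evaluation.py". It assumes that, for each threshold, the counts of true positives, false positives, true negatives and false negatives are already available, and it uses the standard definitions of the four metrics; the function names and the example counts are made up.

```python
def scores(tp, fp, tn, fn):
    """Standard accuracy, precision, recall and F-measure from decision counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def best_threshold(counts_by_threshold):
    """Return (threshold, metrics) for the threshold with the highest accuracy."""
    table = {thr: scores(*counts) for thr, counts in counts_by_threshold.items()}
    best = max(table, key=lambda thr: table[thr]["accuracy"])
    return best, table[best]


# Made-up counts (tp, fp, tn, fn) for three candidate thresholds.
example_counts = {5: (8, 4, 6, 2), 10: (7, 1, 9, 3), 20: (4, 0, 10, 6)}
thr, metrics = best_threshold(example_counts)
print(thr, metrics)
```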

How to use the software

To demonstrate the software, follow these steps:

1- Download and decompress the Jiku dataset using the provided download link.

2- Download and decompress the project from the GitHub repo.

3- To run SMC based multiresolution multiple audio alignment software, simply run

/path/to/project/SMC_based_alignment/SMC_demonstration.m 

and choose /path/to/audio_data as input in the browse menu. The resulting alignment estimates will be written to a text file under

/path/to/project/Evaluation/SMC_offset_estimation_results 

with the naming convention

offset_estimation_SMC_result_<dd>_<mm>_<yyyy>_<hh>h_<mm>m.txt

4- To evaluate the estimation results, simply run

/path/to/project/Evaluation/compute_accuracy.py

and choose /path/to/project/Evaluation/SMC_offset_estimation_results/result_file to compute the accuracy, precision, recall and F-measure scores.

5- To run the fingerprinting based multiple audio alignment system, simply run

/path/to/project/Fingerprinting_based_alignment/fingerprinting_based_audio_alignment.m

and choose /path/to/audio_data as input in the browse menu. The estimation results are written to a separate text file for each threshold under

/path/to/project/Evaluation/fingerprinting_offset_estimation_results 

with the naming convention

offset_estimation_fingerprinting_thr_<value>_result.txt

6- To evaluate the estimation results, simply run

/path/to/project/Evaluation/fingerprinting_evaluation.py

The best threshold (highest accuracy) will be computed and the respective evaluation metrics will be printed on the screen.
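When several result files accumulate, the two filename conventions used in the steps above can be parsed to find the most recent SMC result or to recover the threshold of a fingerprinting result. The sketch below is not part of the repository. It assumes that the date and time fields are zero-padded to the widths suggested by the placeholders and that the threshold value is a plain number; the helper names are hypothetical.

```python
import re
from datetime import datetime

# offset_estimation_SMC_result_<dd>_<mm>_<yyyy>_<hh>h_<mm>m.txt
SMC_PATTERN = re.compile(
    r"offset_estimation_SMC_result_(\d{2})_(\d{2})_(\d{4})_(\d{2})h_(\d{2})m\.txt$"
)
# offset_estimation_fingerprinting_thr_<value>_result.txt
FP_PATTERN = re.compile(
    r"offset_estimation_fingerprinting_thr_(\d+(?:\.\d+)?)_result\.txt$"
)


def smc_timestamp(filename):
    """Return the datetime encoded in an SMC result filename, or None."""
    match = SMC_PATTERN.search(filename)
    if match is None:
        return None
    day, month, year, hour, minute = (int(g) for g in match.groups())
    return datetime(year, month, day, hour, minute)


def fingerprinting_threshold(filename):
    """Return the threshold encoded in a fingerprinting result filename, or None."""
    match = FP_PATTERN.search(filename)
    return float(match.group(1)) if match else None


def latest_smc_result(filenames):
    """Return the SMC result filename with the most recent timestamp, or None."""
    dated = [(smc_timestamp(name), name) for name in filenames]
    dated = [(stamp, name) for stamp, name in dated if stamp is not None]
    return max(dated)[1] if dated else None


names = [
    "offset_estimation_SMC_result_11_12_2017_23h_05m.txt",
    "offset_estimation_SMC_result_12_12_2017_09h_30m.txt",
    "notes.txt",
]
print(latest_smc_result(names))
```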
