
A fundamental frequency estimation algorithm using features from the magnitude and phase spectrogram.


bastibe/MAPS-Scripts


This repository is part of the accepted dissertation "Pitch of Voiced Speech in the Short-Time Fourier Transform: Algorithms, Ground Truths, and Evaluation Methods", on the topic of fundamental frequency estimation.
© 2020, Bastian Bechtold, Jade Hochschule & Carl von Ossietzky Universität Oldenburg, Germany.

MAPS: A Fundamental Frequency Estimation Algorithm for Speech in the Magnitude and Phase Spectrogram


This repository contains source code for MAPS, the Magnitude and Phase Spectrogram fundamental frequency estimation algorithm.

Implementations of the algorithm are provided in Python (maps_f0.py), Matlab (maps_f0.m), and Julia (maps_f0.jl). MAPS is provided under the terms of the GPLv3 license. PEFAC [1], RAPT [2], and YIN [3] are covered by their respective licenses.

Additionally, MAPS Evaluation.ipynb contains a reproducible research notebook for comparing MAPS to the well-known fundamental frequency estimation algorithms PEFAC [1], RAPT [2], and YIN [3] on the PTDB-TUG [4] speech corpus and the QUT-NOISE [5] noise corpus.
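
Comparing fundamental frequency estimators against a ground truth requires an error metric. The sketch below shows one commonly used metric, the gross pitch error: the fraction of frames voiced in both tracks whose estimate deviates from the ground truth by more than 20 %. It is an illustration only and not necessarily the metric used in the notebook. The function name `gross_pitch_error`, the convention that a value of 0 marks an unvoiced frame, and the example tracks are all assumptions made for this sketch.

```python
def gross_pitch_error(true_f0, est_f0, tolerance=0.2):
    """Fraction of mutually voiced frames whose estimate is off by more than
    `tolerance` (relative to the ground truth).

    Assumed convention: f0 <= 0 marks an unvoiced frame.
    """
    if len(true_f0) != len(est_f0):
        raise ValueError("tracks must have the same number of frames")

    # Only frames that both tracks consider voiced are compared.
    voiced = [(t, e) for t, e in zip(true_f0, est_f0) if t > 0 and e > 0]
    if not voiced:
        return float("nan")  # no comparable frames

    errors = sum(1 for t, e in voiced if abs(e - t) > tolerance * t)
    return errors / len(voiced)


# Hypothetical f0 tracks in Hz, one value per frame (0 = unvoiced).
true_f0 = [0.0, 120.0, 122.0, 125.0, 0.0, 200.0]
est_f0 = [0.0, 121.0, 244.0, 124.0, 110.0, 0.0]

# Three frames are voiced in both tracks; one of them is an octave error.
print(gross_pitch_error(true_f0, est_f0))
```

Restricting the comparison to frames that are voiced in both tracks keeps voicing decisions out of the pitch error; voicing errors would need a separate measure.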

References:

  1. Sira Gonzalez and Mike Brookes. PEFAC - A Pitch Estimation Algorithm Robust to High Levels of Noise. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(2):518–530, February 2014.
  2. David Talkin. A robust algorithm for pitch tracking (RAPT). In Speech Coding and Synthesis, pages 495–518, 1995.
  3. Alain de Cheveigné and Hideki Kawahara. YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4):1917, 2002.
  4. Gregor Pirker, Michael Wohlmayr, Stefan Petrik, and Franz Pernkopf. A Pitch Tracking Corpus with Evaluation on Multipitch Tracking Scenario. page 4, 2011.
  5. David B. Dean, Sridha Sridharan, Robert J. Vogt, and Michael W. Mason. The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms. Proceedings of Interspeech 2010, 2010.