pitch-detection

Pitch, chroma, chord and key detection. YIN, McLeod, pYIN, HPS, cepstrum, SWIPE, autocorrelation, AMDF, NNLS chroma, chord templates, Krumhansl-Schmuckler.

Pitch

YIN — cumulative mean normalized difference
McLeod — normalized square difference (MPM)
pYIN — probabilistic YIN with Beta prior
Autocorrelation — normalized autocorrelation
AMDF — average magnitude difference

Spectral pitch

HPS — harmonic product spectrum
Cepstrum — real cepstrum peak picking
SWIPE — sawtooth waveform inspired estimator

Harmony

Chroma — PCP / NNLS pitch-class profiles
Chord — template matching + Viterbi smoothing
Key — Krumhansl-Schmuckler key finding

Install

npm install pitch-detection

Usage

import { yin, mcleod, chroma, chord, key } from 'pitch-detection'

let fs = 44100
let frame = new Float32Array(2048)  // fill from your audio source

// pitch
let result = yin(frame, { fs })
// → { freq: 440.1, clarity: 0.97 }  or  null

// chroma → chord → key
let c = chroma(frame, { fs, method: 'nnls' })
let ch = chord(c)
// → { root: 0, quality: 'maj', label: 'C', confidence: 0.92 }
let k = key(c)
// → { tonic: 0, mode: 'major', label: 'C', confidence: 0.85, scores: [...] }

Works in Node.js and the browser. No Web Audio API needed — operates on raw Float32Array samples.

Sliding windows — call repeatedly as new samples arrive:

let hop = 512
for (let i = 0; i + 2048 <= samples.length; i += hop) {
  let frame = samples.subarray(i, i + 2048)
  let result = yin(frame, { fs })
  if (result) console.log(i / fs, result.freq.toFixed(1))
}

Full pipeline — pitch → chroma → chord → key on a sequence of frames:

import { chroma, chord, smoothChords, key } from 'pitch-detection'

let frames = []
for (let i = 0; i + 4096 <= samples.length; i += 2048) {
  frames.push(chroma(samples.subarray(i, i + 4096), { fs, method: 'nnls' }))
}
let chords = smoothChords(frames, { selfProb: 0.5 })
// → [{ root: 0, quality: 'maj', label: 'C' }, ...]
let k = key(frames)
// → { tonic: 0, mode: 'major', label: 'C', confidence: 0.85, scores: [...] }

API

All pitch algorithms return { freq, clarity } | null:

  • freq — fundamental frequency in Hz
  • clarity — algorithm-specific confidence in [0, 1]
  • null — no periodic structure found (silence, noise, polyphony)

Time-domain algorithms (YIN, McLeod, pYIN, autocorrelation, AMDF) accept any buffer length. Spectral algorithms (HPS, cepstrum, SWIPE, chroma) require power-of-2 length.


YIN

de Cheveigné & Kawahara, 2002. The reference algorithm for monophonic pitch estimation. Most cited, most tested, most robust.

import yin from 'pitch-detection/yin.js'

let result = yin(samples, { fs: 44100 })
Param      Default  Description
fs         44100    Sample rate (Hz)
threshold  0.15     CMND threshold — lower = stricter, fewer detections

Use when: General-purpose monophonic pitch tracking — speech, singing, solo instruments. The most reliable choice when in doubt.
Not for: Polyphonic audio (returns dominant or null), real-time with hard latency budgets (needs full window).
Ref: de Cheveigné & Kawahara, "YIN, a fundamental frequency estimator for speech and music", JASA 2002.
Complexity: $O(N^2/4)$ — two nested passes over half the window.
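The core fits in a short function. A minimal sketch, assuming nothing beyond the formulas above (illustrative only, not this library's code; a production implementation would add refinements such as parabolic interpolation of the dip):

```javascript
// Difference function d(τ), cumulative mean normalized difference d'(τ),
// then the first dip below the threshold gives the period.
function yinSketch(x, fs, threshold = 0.15) {
  const n = Math.floor(x.length / 2)
  const d = new Float64Array(n)
  for (let tau = 1; tau < n; tau++) {
    for (let j = 0; j < n; j++) {
      const diff = x[j] - x[j + tau]
      d[tau] += diff * diff
    }
  }
  // d'(τ) = d(τ) · τ / Σ_{j=1..τ} d(j), with d'(0) = 1
  const dn = new Float64Array(n)
  dn[0] = 1
  let sum = 0
  for (let tau = 1; tau < n; tau++) {
    sum += d[tau]
    dn[tau] = sum > 0 ? d[tau] * tau / sum : 1
  }
  for (let tau = 2; tau < n; tau++) {
    if (dn[tau] < threshold) {
      // walk down to the local minimum of this dip
      while (tau + 1 < n && dn[tau + 1] < dn[tau]) tau++
      return { freq: fs / tau, clarity: 1 - dn[tau] }
    }
  }
  return null
}
```

On a 441 Hz sine at 44.1 kHz the first dip lands at lag 100, i.e. 441 Hz exactly; real signals need the sub-lag interpolation this sketch omits for comparable accuracy.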

McLeod

McLeod & Wyvill, 2005. Normalized square difference with smarter peak picking. Handles smaller windows — good for vibrato and fast pitch changes.

import mcleod from 'pitch-detection/mcleod.js'

let result = mcleod(samples, { fs: 44100 })
Param      Default  Description
fs         44100    Sample rate (Hz)
threshold  0.9      Peak selection threshold as fraction of global max

Use when: Vibrato tracking, small hop sizes, singing voice where YIN occasionally double-triggers.
Not for: Highly noisy signals (the NSDF peak criterion rejects noise less aggressively than YIN's CMND threshold).
Ref: McLeod & Wyvill, "A smarter way to find pitch", ICMC 2005.
Complexity: $O(N^2/4)$ — same asymptotic cost as YIN.
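The NSDF and a simplified version of the peak picking can be sketched as follows (illustrative; the published MPM selects key maxima between zero crossings and interpolates, which this sketch skips):

```javascript
// n(τ) = 2·Σ x[j]x[j+τ] / Σ (x[j]² + x[j+τ]²), then take the first local
// maximum whose value is within `threshold` of the best peak value.
function nsdfSketch(x, fs, threshold = 0.9) {
  const n = Math.floor(x.length / 2)
  const nsdf = new Float64Array(n)
  for (let tau = 0; tau < n; tau++) {
    let r = 0, m = 0
    for (let j = 0; j < n; j++) {
      r += x[j] * x[j + tau]
      m += x[j] * x[j] + x[j + tau] * x[j + tau]
    }
    nsdf[tau] = m > 0 ? 2 * r / m : 0
  }
  let tau = 1
  while (tau < n && nsdf[tau] > 0) tau++            // skip the lag-0 lobe
  const peaks = []
  for (; tau + 1 < n; tau++) {
    if (nsdf[tau] > 0 && nsdf[tau] > nsdf[tau - 1] && nsdf[tau] >= nsdf[tau + 1]) {
      peaks.push(tau)
    }
  }
  if (peaks.length === 0) return null
  const max = Math.max(...peaks.map(t => nsdf[t]))
  const pick = peaks.find(t => nsdf[t] >= threshold * max)
  return { freq: fs / pick, clarity: nsdf[pick] }
}
```

Preferring the first peak that is close to the global maximum, rather than the global maximum itself, is what lets MPM-style pickers avoid sub-octave lags.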

pYIN

Mauch & Dixon, 2014. Probabilistic YIN — runs YIN at multiple thresholds weighted by a Beta(2, 18) prior, producing a distribution over candidate pitches instead of a single hard pick. More robust than YIN on ambiguous frames.

import pyin from 'pitch-detection/pyin.js'

let result = pyin(samples, { fs: 44100 })
// → { freq: 440.1, clarity: 0.92, candidates: [{ freq: 440.1, prob: 0.85 }, ...] }
Param    Default  Description
fs       44100    Sample rate (Hz)
minFreq  50       Minimum detectable frequency (Hz)
maxFreq  2000     Maximum detectable frequency (Hz)

Use when: Ambiguous pitched content — breathy vocals, noisy recordings, or when you need a pitch posterior for downstream HMM tracking.
Not for: Clean signals where YIN already works well (pYIN is ~10× slower due to multi-threshold sweep).
Ref: Mauch & Dixon, "pYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions", ICASSP 2014.
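The threshold-distribution idea, sketched on a precomputed CMND curve d'(τ) (illustrative; `pyinCandidates` is a hypothetical helper, and the full algorithm also decodes candidates across frames with an HMM):

```javascript
// Sweep many thresholds; for each, take the first dip of the CMND curve,
// and weight that candidate by the Beta(2, 18) prior on the threshold.
function pyinCandidates(cmnd, nThresholds = 100) {
  const betaWeight = s => s * Math.pow(1 - s, 17)   // Beta(2,18) pdf, unnormalized
  const mass = new Map()
  let total = 0
  for (let i = 1; i <= nThresholds; i++) {
    const s = i / nThresholds
    let tau = -1
    for (let t = 2; t < cmnd.length; t++) {
      if (cmnd[t] < s) {                            // first dip below s
        while (t + 1 < cmnd.length && cmnd[t + 1] < cmnd[t]) t++
        tau = t
        break
      }
    }
    if (tau < 0) continue                           // this threshold found nothing
    const w = betaWeight(s)
    mass.set(tau, (mass.get(tau) || 0) + w)
    total += w
  }
  return [...mass.entries()]
    .map(([tau, w]) => ({ tau, prob: w / total }))
    .sort((a, b) => b.prob - a.prob)
}
```

Low thresholds carry most of the prior mass, so shallow early dips only win when they are genuinely deep, which is exactly the robustness gain over a single hard threshold.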

Autocorrelation

Normalized autocorrelation — the simplest pitch estimator. Educational baseline.

import autocorrelation from 'pitch-detection/autocorrelation.js'

let result = autocorrelation(samples, { fs: 44100 })
Param      Default  Description
fs         44100    Sample rate (Hz)
threshold  0.5      Minimum normalized autocorrelation value to accept

Use when: Learning, quick prototypes, signals with strong dominant periodicity and low noise.
Not for: Production — octave errors are common without additional heuristics.
Ref: Rabiner, "Use of autocorrelation analysis for pitch detection", IEEE TASSP 1977.
Complexity: $O(N^2/4)$.
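The whole estimator is a few loops. An illustrative sketch (it takes the first rising lag that clears the threshold rather than the global maximum, a cheap guard against sub-octave picks; the library's peak selection may differ):

```javascript
// Normalized cross-correlation per lag, then the first lag that rises
// above the threshold, walked up to its local maximum.
function autocorrSketch(x, fs, threshold = 0.5) {
  const n = Math.floor(x.length / 2)
  const rn = new Float64Array(n)
  let e0 = 0
  for (let j = 0; j < n; j++) e0 += x[j] * x[j]
  for (let tau = 1; tau < n; tau++) {
    let r = 0, e1 = 0
    for (let j = 0; j < n; j++) {
      r += x[j] * x[j + tau]
      e1 += x[j + tau] * x[j + tau]
    }
    rn[tau] = r / (Math.sqrt(e0 * e1) || 1)
  }
  for (let tau = 2; tau < n; tau++) {
    if (rn[tau] > threshold && rn[tau] >= rn[tau - 1]) {
      while (tau + 1 < n && rn[tau + 1] > rn[tau]) tau++
      return { freq: fs / tau, clarity: rn[tau] }
    }
  }
  return null
}
```

Even with this guard, harmonically complex or noisy material will confuse it, which is why the section above recommends it as a baseline only.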

AMDF

Ross et al., 1974. Average Magnitude Difference Function — the classical predecessor to YIN. Measures average absolute difference between a signal and its delayed copy; minima indicate periodicity.

import amdf from 'pitch-detection/amdf.js'

let result = amdf(samples, { fs: 44100 })
Param      Default  Description
fs         44100    Sample rate (Hz)
minFreq    50       Minimum detectable frequency (Hz)
maxFreq    2000     Maximum detectable frequency (Hz)
threshold  0.3      Normalized AMDF dip threshold

Use when: Low-complexity environments, embedded systems. Simpler and cheaper than YIN (no squaring, no cumulative normalization).
Not for: Noisy signals — lacks YIN's cumulative normalization that suppresses octave errors.
Ref: Ross et al., "Average magnitude difference function pitch extractor", IEEE TASSP 1974.
Complexity: $O(N^2/4)$.
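A compact sketch (illustrative; `amdfSketch` and its normalized-dip acceptance rule are assumptions, not the library's exact logic): absolute differences instead of products, and a minimum instead of a peak.

```javascript
function amdfSketch(x, fs, { minFreq = 50, maxFreq = 2000, threshold = 0.3 } = {}) {
  const n = Math.floor(x.length / 2)
  const tauMin = Math.max(2, Math.floor(fs / maxFreq))
  const tauMax = Math.min(n - 1, Math.ceil(fs / minFreq))
  const d = new Float64Array(tauMax + 1)
  let dMax = 0, best = tauMin
  for (let tau = tauMin; tau <= tauMax; tau++) {
    let s = 0
    for (let j = 0; j < n; j++) s += Math.abs(x[j] - x[j + tau])
    d[tau] = s / n                              // average magnitude difference
    if (d[tau] > dMax) dMax = d[tau]
    if (d[tau] < d[best]) best = tau            // deepest dip so far
  }
  const dip = dMax > 0 ? d[best] / dMax : 1     // dip depth relative to the range
  return dip < threshold ? { freq: fs / best, clarity: 1 - dip } : null
}
```

No multiplications in the inner loop is the whole point: on hardware without a fast multiplier this is markedly cheaper than YIN's squared differences.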


HPS

Schroeder, 1968. Harmonic Product Spectrum — multiplies the spectrum by its downsampled copies so that harmonic peaks align at the fundamental. Robust to the missing-fundamental problem.

import hps from 'pitch-detection/hps.js'

let result = hps(samples, { fs: 44100 })
Param      Default  Description
fs         44100    Sample rate (Hz)
harmonics  5        Number of harmonic products
minFreq    50       Minimum detectable frequency (Hz)
maxFreq    4000     Maximum detectable frequency (Hz)
cents      10       Candidate spacing in cents
threshold  0.1      Minimum clarity to accept

Use when: Harmonic-rich signals (guitar, piano, brass). Naturally handles missing fundamentals.
Not for: Pure sinusoids (only one harmonic), very noisy signals.
Ref: Schroeder, "Period histogram and product spectrum", JASA 1968.
Requires: Power-of-2 window length.
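The product over downsampled spectra can be sketched as follows, using a naive O(N²) DFT so the snippet stays self-contained (the library uses an FFT). Log magnitudes are summed instead of multiplied, which is equivalent for the argmax and avoids underflow; the clarity estimate is omitted.

```javascript
function hpsSketch(x, fs, { harmonics = 5, minFreq = 50, maxFreq = 4000 } = {}) {
  const n = x.length
  const half = n / 2
  const mag = new Float64Array(half)
  for (let k = 0; k < half; k++) {              // naive DFT magnitude spectrum
    let re = 0, im = 0
    for (let j = 0; j < n; j++) {
      const a = -2 * Math.PI * k * j / n
      re += x[j] * Math.cos(a)
      im += x[j] * Math.sin(a)
    }
    mag[k] = Math.hypot(re, im)
  }
  const kMin = Math.max(1, Math.round(minFreq * n / fs))
  const kMax = Math.min(Math.floor(half / harmonics), Math.round(maxFreq * n / fs))
  let bestK = kMin, bestP = -Infinity
  for (let k = kMin; k <= kMax; k++) {
    let p = 0                                   // Σ log|X(h·k)|: harmonics must
    for (let h = 1; h <= harmonics; h++) {      // all align for a high score
      p += Math.log(mag[h * k] + 1e-12)
    }
    if (p > bestP) { bestP = p; bestK = k }
  }
  return { freq: bestK * fs / n }
}
```

Because every harmonic bin enters the product, a missing fundamental barely hurts the score, while a candidate an octave too low is punished by its empty odd-harmonic bins.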

Cepstrum

Noll, 1967. Real cepstrum — $c(\tau) = \text{IFFT}(\log |\text{FFT}(x)|)$. A peak at quefrency $\tau$ corresponds to period $\tau$ in the time domain.

import cepstrum from 'pitch-detection/cepstrum.js'

let result = cepstrum(samples, { fs: 44100 })
Param      Default  Description
fs         44100    Sample rate (Hz)
minFreq    50       Minimum detectable frequency (Hz)
maxFreq    2000     Maximum detectable frequency (Hz)
threshold  0.3      Minimum clarity to accept

Use when: Harmonic signals where you want a clean spectral-domain method. Good pedagogical complement to time-domain algorithms.
Not for: Low-pitched signals (quefrency resolution is limited by window length).
Ref: Noll, "Cepstrum pitch determination", JASA 1967.
Requires: Power-of-2 window length.
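An illustrative sketch with naive transforms (the library uses FFTs; for a real input the inverse transform of the log magnitude reduces to a cosine sum, evaluated here only over the quefrency range of interest):

```javascript
// c(τ) = IFFT(log|FFT(x)|); the peak quefrency τ is the period in samples.
function cepstrumSketch(x, fs, { minFreq = 50, maxFreq = 2000 } = {}) {
  const n = x.length
  const logMag = new Float64Array(n)
  for (let k = 0; k < n; k++) {                 // naive DFT, log magnitude
    let re = 0, im = 0
    for (let j = 0; j < n; j++) {
      const a = -2 * Math.PI * k * j / n
      re += x[j] * Math.cos(a)
      im += x[j] * Math.sin(a)
    }
    logMag[k] = Math.log(Math.hypot(re, im) + 1e-12)
  }
  const qMin = Math.max(2, Math.floor(fs / maxFreq))
  const qMax = Math.min(n / 2 - 1, Math.ceil(fs / minFreq))
  let best = qMin, bestV = -Infinity
  for (let q = qMin; q <= qMax; q++) {
    let c = 0
    // real, symmetric log spectrum: inverse transform is a cosine sum
    for (let k = 0; k < n; k++) c += logMag[k] * Math.cos(2 * Math.PI * k * q / n)
    if (c / n > bestV) { bestV = c / n; best = q }
  }
  return { freq: fs / best }
}
```

The quefrency grid is integer samples, which is exactly the resolution limit noted above: low pitches have long periods whose neighbors differ by only a few Hz, while high pitches are quantized coarsely.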

SWIPE

Camacho & Harris, 2008. SWIPE' (Sawtooth Waveform Inspired Pitch Estimator, prime harmonics). Measures spectral similarity between the window and a sawtooth template whose lobes sit at prime harmonics. More accurate than HPS on clean instrumental signals; robust against octave errors because only prime harmonics contribute.

Simplified single-window form: uses one FFT instead of the multi-resolution loudness pyramid of the original paper — sufficient for stationary windows.

import swipe from 'pitch-detection/swipe.js'

let result = swipe(samples, { fs: 44100 })
Param      Default  Description
fs         44100    Sample rate (Hz)
minFreq    60       Minimum detectable frequency (Hz)
maxFreq    4000     Maximum detectable frequency (Hz)
cents      10       Candidate spacing in cents
threshold  0.15     Minimum clarity to accept

Use when: Clean instrumental signals, studio recordings, where sub-Hz accuracy matters.
Not for: Very noisy or reverberant signals (single-window form lacks multi-resolution robustness of the full SWIPE').
Ref: Camacho & Harris, "A sawtooth waveform inspired pitch estimator for speech and music", JASA 2008.
Requires: Power-of-2 window length.


Chroma

Fujishima, 1999 (PCP) / Mauch & Dixon, 2010 (NNLS). Chroma feature — a 12-D vector where each bin holds the energy attributed to one pitch class (C, C#, ..., B).

import chroma from 'pitch-detection/chroma.js'

// PCP — classical spectral folding
let c = chroma(samples, { fs: 44100 })

// NNLS — nonnegative least squares (cleaner for polyphonic audio)
let c2 = chroma(samples, { fs: 44100, method: 'nnls' })

PCP (default)

Each spectral bin is mapped to its nearest pitch class and squared magnitudes are accumulated. Simple and fast.
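The folding step can be sketched like this (illustrative; `foldToChroma` is a hypothetical helper operating on a precomputed magnitude spectrum, not the library's internal code):

```javascript
// Map each bin's center frequency to its nearest pitch class, accumulate
// |X(k)|², then L1-normalize. Pitch class 0 = C.
function foldToChroma(mag, fs, n, { minFreq = 65, maxFreq = 2093 } = {}) {
  const pcp = new Float64Array(12)
  for (let k = 1; k < mag.length; k++) {
    const f = k * fs / n
    if (f < minFreq || f > maxFreq) continue
    const midi = Math.round(12 * Math.log2(f / 440) + 69)  // nearest MIDI note
    const pc = ((midi % 12) + 12) % 12
    pcp[pc] += mag[k] * mag[k]
  }
  const sum = pcp.reduce((a, b) => a + b, 0) || 1
  for (let i = 0; i < 12; i++) pcp[i] /= sum
  return pcp
}
```

The weakness visible here is that every overtone of a note lands on its own pitch class (the 3rd harmonic a fifth up, the 5th a major third up), which is the octave/harmonic smearing that NNLS chroma is designed to undo.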

NNLS

Fits the observed $\sqrt{\text{spectrum}}$ as a nonnegative combination of synthetic pitch-tone profiles (fundamental plus geometrically decaying overtones, Gaussian lobes in log-frequency with σ = 0.5 semitones). Uses multiplicative NMF updates: $a \leftarrow a \cdot (D^\top s) / (D^\top D a + \varepsilon)$. Suppresses octave and harmonic confusion on polyphonic audio.

Pitch dictionary covers MIDI 24–96 (C1–C7) with configurable harmonics per tone.
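The multiplicative update above, on a toy problem (illustrative; `nnlsSketch` is a hypothetical name and the dimensions are tiny, whereas the real dictionary spans MIDI 24–96 with harmonic note profiles):

```javascript
// Solve s ≈ D·a with a ≥ 0 via the multiplicative update
// a ← a · (Dᵀs) / (Dᵀ(D a) + ε), which preserves nonnegativity.
function nnlsSketch(D, s, { iterations = 30, eps = 1e-9 } = {}) {
  const rows = D.length, cols = D[0].length
  const a = new Float64Array(cols).fill(1 / cols)
  for (let it = 0; it < iterations; it++) {
    const Da = new Float64Array(rows)
    for (let r = 0; r < rows; r++)
      for (let c = 0; c < cols; c++) Da[r] += D[r][c] * a[c]
    for (let c = 0; c < cols; c++) {
      let num = 0, den = 0
      for (let r = 0; r < rows; r++) {
        num += D[r][c] * s[r]        // (Dᵀs)_c
        den += D[r][c] * Da[r]       // (DᵀD a)_c
      }
      a[c] *= num / (den + eps)
    }
  }
  return a
}
```

Since each activation is only ever scaled by a nonnegative ratio, no projection step is needed; spurious activations decay toward zero instead of going negative.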

Param       Default  Description
fs          44100    Sample rate (Hz)
method      'pcp'    'pcp' or 'nnls'
minFreq     65       Min frequency for PCP mapping (~C2)
maxFreq     2093     Max frequency for PCP mapping (~C7)
harmonics   8        Overtones per pitch (NNLS only)
iterations  30       NMF iterations (NNLS only)

Returns: Float64Array(12), L1-normalized.

Use when: Building chord/key detectors, music information retrieval, audio fingerprinting. NNLS for polyphonic; PCP for speed.
Ref (PCP): Fujishima, "Realtime chord recognition of musical sound", ICMC 1999.
Ref (NNLS): Mauch & Dixon, "Approximate Note Transcription for the Improved Identification of Difficult Chords", ISMIR 2010.
Requires: Power-of-2 window length.

Chord

Fujishima, 1999 (templates) / Viterbi smoothing. Classifies chroma frames as one of 24 major/minor triads via cosine similarity with binary templates.

import chord, { TEMPLATES, smooth as smoothChords } from 'pitch-detection/chord.js'

// single frame
let c = chord(chromaVec)
// → { root: 0, quality: 'maj', label: 'C', confidence: 0.92 }

// smoothed sequence
let chords = smoothChords(chromaFrames, { selfProb: 0.5 })
// → [{ root: 0, quality: 'maj', label: 'C' }, ...]

chord(chromaVec, opts)

Cosine similarity against 24 binary templates (12 major + 12 minor triads). Returns the best match with confidence score.

Param          Default  Description
minConfidence  0.3      Below this, returns quality 'N' (no chord)

Returns: { root, quality, label, confidence } where quality is 'maj', 'min', or 'N'.
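Template construction and matching can be sketched as follows (illustrative; the minor label scheme 'Cm' and the exact shape of the 'N' result are assumptions, not the library's documented output):

```javascript
const NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

// 24 binary templates: root, third (major 4 / minor 3 semitones), fifth.
function makeTemplates() {
  const out = []
  for (let root = 0; root < 12; root++) {
    for (const [quality, third] of [['maj', 4], ['min', 3]]) {
      const vec = new Float64Array(12)
      vec[root] = vec[(root + third) % 12] = vec[(root + 7) % 12] = 1
      out.push({ root, quality, label: NAMES[root] + (quality === 'min' ? 'm' : ''), vec })
    }
  }
  return out
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < 12; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
  return dot / (Math.sqrt(na * nb) || 1)
}

function matchChord(chromaVec, { minConfidence = 0.3 } = {}) {
  let best = null
  for (const t of makeTemplates()) {
    const c = cosine(chromaVec, t.vec)
    if (!best || c > best.confidence) best = { ...t, confidence: c }
  }
  if (best.confidence < minConfidence) {
    return { quality: 'N', label: 'N', confidence: best.confidence }
  }
  return best
}
```

Cosine similarity makes the match insensitive to overall chroma energy, so quiet and loud frames score alike; only the shape of the 12-D vector matters.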

smooth(frames, opts)

Viterbi decoding with a sticky self-transition prior. Observation log-likelihood = 8 × cosine similarity (temperature 8 gives reasonably sharp distributions).

Param     Default  Description
selfProb  0.5      Self-transition probability (higher = smoother)

Returns: { root, quality, label }[] — one chord per frame.
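The smoothing step, reduced to a generic sticky-transition Viterbi over precomputed emission log-scores (illustrative; in the library the scores would be 8 × cosine similarity against the 24 templates):

```javascript
// emissions: frames × states matrix of log-scores. Staying in a state costs
// log(selfProb); switching costs log((1 - selfProb) / (S - 1)).
function viterbiSticky(emissions, selfProb = 0.5) {
  const S = emissions[0].length
  const logSelf = Math.log(selfProb)
  const logOther = Math.log((1 - selfProb) / (S - 1))
  let score = emissions[0].slice()
  const back = []
  for (let f = 1; f < emissions.length; f++) {
    const next = new Array(S), ptr = new Array(S)
    for (let s = 0; s < S; s++) {
      let bi = 0, bv = -Infinity
      for (let p = 0; p < S; p++) {
        const v = score[p] + (p === s ? logSelf : logOther)
        if (v > bv) { bv = v; bi = p }
      }
      next[s] = bv + emissions[f][s]
      ptr[s] = bi
    }
    score = next
    back.push(ptr)
  }
  let s = score.indexOf(Math.max(...score))     // best final state,
  const path = [s]                              // then backtrace
  for (let f = back.length - 1; f >= 0; f--) { s = back[f][s]; path.unshift(s) }
  return path
}
```

With a sticky prior, a single weakly supported flip in the middle of a run is overridden, where per-frame argmax would follow it.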

TEMPLATES

Exported array of 24 chord templates: { root, quality, label, vec } where vec is a Float64Array(12) with 1 on chord tones.

Use when: Quick chord labeling from chroma features. Combine with NNLS chroma for best results.
Ref: Fujishima, "Realtime chord recognition of musical sound", ICMC 1999.

Key

Krumhansl & Schmuckler. Detects musical key from chroma via Pearson correlation against 24 rotated major/minor key profiles (Krumhansl-Kessler probe-tone ratings).

import key, { KK_MAJOR, KK_MINOR } from 'pitch-detection/key.js'

let k = key(chromaVec)
// → { tonic: 0, mode: 'major', label: 'C', confidence: 0.85, scores: [...] }

// from multiple frames (averages internally)
let k2 = key(chromaFrames)
Param    Default                               Description
profile  { major: KK_MAJOR, minor: KK_MINOR }  Custom key profiles

Returns: { tonic, mode, label, confidence, scores } where scores is all 24 keys sorted descending.

Exported profiles

  • KK_MAJOR — Krumhansl-Kessler major profile: [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
  • KK_MINOR — Krumhansl-Kessler minor profile: [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

Use when: Key detection for music analysis, automatic transposition, music information retrieval.
Ref: Krumhansl, Cognitive Foundations of Musical Pitch, Oxford 1990.
Ref: Temperley, "What's Key for Key?", Music Perception 1999.
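The scoring loop can be sketched directly from the published profile values (illustrative; the field names below are hypothetical and differ from the library's { tonic, mode, label, confidence, scores } shape):

```javascript
const KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
const KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

function pearson(a, b) {
  const n = a.length
  const ma = a.reduce((x, y) => x + y) / n
  const mb = b.reduce((x, y) => x + y) / n
  let num = 0, da = 0, db = 0
  for (let i = 0; i < n; i++) {
    num += (a[i] - ma) * (b[i] - mb)
    da += (a[i] - ma) ** 2
    db += (b[i] - mb) ** 2
  }
  return num / (Math.sqrt(da * db) || 1)
}

// Correlate the chroma vector against all 24 rotations of the two profiles.
function findKey(chromaVec) {
  const scores = []
  for (let tonic = 0; tonic < 12; tonic++) {
    for (const [mode, profile] of [['major', KK_MAJOR], ['minor', KK_MINOR]]) {
      // rotate so the profile's first element lines up with the tonic
      const rotated = profile.map((_, i) => profile[((i - tonic) + 12) % 12])
      scores.push({ tonic, mode, r: pearson(chromaVec, rotated) })
    }
  }
  scores.sort((a, b) => b.r - a.r)
  return scores[0]
}
```

Pearson correlation subtracts the means, so the uniform part of the chroma (broadband noise, drones) cancels and only the relative emphasis of pitch classes decides the key.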


Comparison

Pitch algorithms

                     YIN         McLeod      pYIN        AMDF        HPS            Cepstrum     SWIPE
Domain               time        time        time        time        spectral       spectral     spectral
Accuracy             ★★★★★       ★★★★        ★★★★★       ★★★         ★★★★           ★★★          ★★★★★
Noise robustness     ★★★★★       ★★★★        ★★★★★       ★★★         ★★★            ★★★          ★★★★
Octave errors        rare        rare        rare        common      rare           occasional   rare
Missing fundamental  no          no          no          no          yes            yes          yes
Min window           ~4 periods  ~2 periods  ~4 periods  ~4 periods  power of 2     power of 2   power of 2
Best for             general     vibrato     ambiguous   embedded    harmonic-rich  pedagogical  studio
