Pitch, chroma, chord and key detection. YIN, McLeod, pYIN, HPS, cepstrum, SWIPE, autocorrelation, AMDF, NNLS chroma, chord templates, Krumhansl-Schmuckler.
- YIN — cumulative mean normalized difference
- HPS — harmonic product spectrum
- Chroma — PCP / NNLS pitch-class profiles
```sh
npm install pitch-detection
```

```js
import { yin, mcleod, chroma, chord, key } from 'pitch-detection'

let fs = 44100
let frame = new Float32Array(2048) // fill from your audio source

// pitch
let result = yin(frame, { fs })
// → { freq: 440.1, clarity: 0.97 } or null

// chroma → chord → key
let c = chroma(frame, { fs, method: 'nnls' })
let ch = chord(c)
// → { root: 0, quality: 'maj', label: 'C', confidence: 0.92 }
let k = key(c)
// → { tonic: 0, mode: 'major', label: 'C', confidence: 0.85, scores: [...] }
```

Works in Node.js and browser. No Web Audio API needed — operates on raw `Float32Array` samples.
Sliding windows — call repeatedly as new samples arrive:

```js
let hop = 512
for (let i = 0; i + 2048 <= samples.length; i += hop) {
  let frame = samples.subarray(i, i + 2048)
  let result = yin(frame, { fs })
  if (result) console.log(i / fs, result.freq.toFixed(1))
}
```

Full pipeline — chroma → chord → key on a sequence of frames:
```js
import { chroma, chord, smoothChords, key } from 'pitch-detection'

let frames = []
for (let i = 0; i + 4096 <= samples.length; i += 2048) {
  frames.push(chroma(samples.subarray(i, i + 4096), { fs, method: 'nnls' }))
}
let chords = smoothChords(frames, { selfProb: 0.5 })
// → [{ root: 0, quality: 'maj', label: 'C' }, ...]
let k = key(frames)
// → { tonic: 0, mode: 'major', label: 'C', confidence: 0.85, scores: [...] }
```

All pitch algorithms return `{ freq, clarity } | null`:

- `freq` — fundamental frequency in Hz
- `clarity` — algorithm-specific confidence in `[0, 1]`
- `null` — no periodic structure found (silence, noise, polyphony)
Time-domain algorithms (YIN, McLeod, pYIN, autocorrelation, AMDF) accept any buffer length. Spectral algorithms (HPS, cepstrum, SWIPE, chroma) require power-of-2 length.
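A quick way to validate the power-of-2 requirement before calling a spectral algorithm (a small sketch, not part of the package API):

```js
// a positive length is a power of two iff it has exactly one set bit
const isPow2 = n => n > 0 && (n & (n - 1)) === 0

isPow2(2048) // true — fine for HPS, cepstrum, SWIPE, chroma
isPow2(1000) // false — time-domain algorithms only
```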
de Cheveigné & Kawahara, 2002. The reference algorithm for monophonic pitch estimation. Most cited, most tested, most robust.
```js
import yin from 'pitch-detection/yin.js'

let result = yin(samples, { fs: 44100 })
```

| Param | Default | Description |
|---|---|---|
| `fs` | 44100 | Sample rate (Hz) |
| `threshold` | 0.15 | CMND threshold — lower = stricter, fewer detections |
Use when: General-purpose monophonic pitch tracking — speech, singing, solo instruments. The most reliable choice when in doubt.
Not for: Polyphonic audio (returns dominant or null), real-time with hard latency budgets (needs full window).
Ref: de Cheveigné & Kawahara, "YIN, a fundamental frequency estimator for speech and music", JASA 2002.
Complexity:
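The CMND core can be sketched in a few lines. This is illustrative only (a hypothetical `yinSketch`, not the package's implementation): difference function, cumulative mean normalization, then the first dip below the threshold gives the period.

```js
function yinSketch(frame, fs, threshold = 0.15) {
  const maxTau = Math.floor(frame.length / 2)
  // difference function d(tau)
  const d = new Float64Array(maxTau)
  for (let tau = 1; tau < maxTau; tau++) {
    for (let i = 0; i < maxTau; i++) {
      const diff = frame[i] - frame[i + tau]
      d[tau] += diff * diff
    }
  }
  // cumulative mean normalized difference d'(tau)
  const cmnd = new Float64Array(maxTau)
  cmnd[0] = 1
  let running = 0
  for (let tau = 1; tau < maxTau; tau++) {
    running += d[tau]
    cmnd[tau] = (d[tau] * tau) / running
  }
  // first dip below the threshold, walked to its local minimum
  for (let tau = 2; tau < maxTau; tau++) {
    if (cmnd[tau] < threshold) {
      while (tau + 1 < maxTau && cmnd[tau + 1] < cmnd[tau]) tau++
      return { freq: fs / tau, clarity: 1 - cmnd[tau] }
    }
  }
  return null // no periodic structure found
}
```

A production version would add parabolic interpolation around the chosen lag for sub-sample precision.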
McLeod & Wyvill, 2005. Normalized square difference with smarter peak picking. Handles smaller windows — good for vibrato and fast pitch changes.
```js
import mcleod from 'pitch-detection/mcleod.js'

let result = mcleod(samples, { fs: 44100 })
```

| Param | Default | Description |
|---|---|---|
| `fs` | 44100 | Sample rate (Hz) |
| `threshold` | 0.9 | Peak selection threshold as fraction of global max |
Use when: Vibrato tracking, small hop sizes, singing voice where YIN occasionally double-triggers.
Not for: Highly noisy signals (NSDF is less thresholded than YIN's CMND).
Ref: McLeod & Wyvill, "A smarter way to find pitch", ICMC 2005.
Complexity:
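The NSDF and its peak picking can be sketched as follows (hypothetical `mcleodSketch`, not the package code; the paper adds parabolic interpolation on top):

```js
// normalized square difference function:
// nsdf(tau) = 2 * acf(tau) / (m(0) + m(tau)), bounded to [-1, 1]
function nsdf(frame, maxTau) {
  const out = new Float64Array(maxTau)
  for (let tau = 0; tau < maxTau; tau++) {
    let acf = 0, norm = 0
    for (let i = 0; i + tau < frame.length; i++) {
      acf += frame[i] * frame[i + tau]
      norm += frame[i] ** 2 + frame[i + tau] ** 2
    }
    out[tau] = norm > 0 ? (2 * acf) / norm : 0
  }
  return out
}

function mcleodSketch(frame, fs, threshold = 0.9) {
  const n = nsdf(frame, Math.floor(frame.length / 2))
  // skip the tau=0 lobe: wait for the first non-positive value
  let start = 1
  while (start < n.length && n[start] > 0) start++
  // collect local maxima, then take the first within `threshold` of the best
  const peaks = []
  for (let tau = start + 1; tau < n.length - 1; tau++) {
    if (n[tau] > n[tau - 1] && n[tau] >= n[tau + 1] && n[tau] > 0) peaks.push(tau)
  }
  if (peaks.length === 0) return null
  const max = Math.max(...peaks.map(t => n[t]))
  const tau = peaks.find(t => n[t] >= threshold * max)
  return { freq: fs / tau, clarity: n[tau] }
}
```

Picking the *first* peak within a fraction of the global maximum (rather than the global maximum itself) is what suppresses octave-down errors.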
Mauch & Dixon, 2014. Probabilistic YIN — runs YIN at multiple thresholds weighted by a Beta(2, 18) prior, producing a distribution over candidate pitches instead of a single hard pick. More robust than YIN on ambiguous frames.
```js
import pyin from 'pitch-detection/pyin.js'

let result = pyin(samples, { fs: 44100 })
// → { freq: 440.1, clarity: 0.92, candidates: [{ freq: 440.1, prob: 0.85 }, ...] }
```

| Param | Default | Description |
|---|---|---|
| `fs` | 44100 | Sample rate (Hz) |
| `minFreq` | 50 | Minimum detectable frequency (Hz) |
| `maxFreq` | 2000 | Maximum detectable frequency (Hz) |
Use when: Ambiguous pitched content — breathy vocals, noisy recordings, or when you need a pitch posterior for downstream HMM tracking.
Not for: Clean signals where YIN already works well (pYIN is ~10× slower due to multi-threshold sweep).
Ref: Mauch & Dixon, "pYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions", ICASSP 2014.
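The threshold sweep can be sketched as follows (hypothetical helpers, not the package code). Here `cmnd` stands for a precomputed cumulative-mean-normalized difference curve (see the YIN section); the full pYIN additionally decodes the resulting per-frame distributions with an HMM.

```js
// Beta(2, 18) density (up to normalization) over candidate thresholds
function betaWeights(thresholds, a = 2, b = 18) {
  const w = thresholds.map(s => s ** (a - 1) * (1 - s) ** (b - 1))
  const sum = w.reduce((x, y) => x + y, 0)
  return w.map(x => x / sum)
}

// each threshold votes for the first CMND dip below it; the vote
// carries that threshold's prior weight, yielding a lag distribution
function pyinVotes(cmnd, thresholds, weights) {
  const votes = new Map() // lag -> accumulated probability
  thresholds.forEach((s, k) => {
    for (let tau = 2; tau < cmnd.length; tau++) {
      if (cmnd[tau] < s) {
        votes.set(tau, (votes.get(tau) || 0) + weights[k])
        break
      }
    }
  })
  return votes
}
```

Because the Beta(2, 18) prior concentrates mass on low thresholds, deep dips collect most of the probability even when a shallower dip occurs at a shorter lag.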
Normalized autocorrelation — the simplest pitch estimator. Educational baseline.
```js
import autocorrelation from 'pitch-detection/autocorrelation.js'

let result = autocorrelation(samples, { fs: 44100 })
```

| Param | Default | Description |
|---|---|---|
| `fs` | 44100 | Sample rate (Hz) |
| `threshold` | 0.5 | Minimum normalized autocorrelation value to accept |
Use when: Learning, quick prototypes, signals with strong dominant periodicity and low noise.
Not for: Production — octave errors are common without additional heuristics.
Ref: Rabiner, "Use of autocorrelation analysis for pitch detection", IEEE TASSP 1977.
Complexity:
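The whole idea fits in one function (a hypothetical `acfSketch` for illustration; `minFreq`/`maxFreq` bounds are assumptions, the package documents only `fs` and `threshold` for this algorithm):

```js
function acfSketch(frame, fs, { minFreq = 50, maxFreq = 2000, threshold = 0.5 } = {}) {
  let energy = 0
  for (const x of frame) energy += x * x
  if (energy === 0) return null
  const minTau = Math.floor(fs / maxFreq)
  const maxTau = Math.min(Math.ceil(fs / minFreq), frame.length - 1)
  let bestTau = minTau, bestR = -Infinity
  for (let tau = minTau; tau <= maxTau; tau++) {
    // correlation of the signal with itself delayed by tau,
    // crudely normalized by the total energy
    let r = 0
    for (let i = 0; i + tau < frame.length; i++) r += frame[i] * frame[i + tau]
    r /= energy
    if (r > bestR) { bestR = r; bestTau = tau }
  }
  return bestR >= threshold ? { freq: fs / bestTau, clarity: bestR } : null
}
```

The octave-error problem is visible here: every integer multiple of the true period is also a correlation peak, and nothing in the plain ACF prefers the first one.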
Ross et al., 1974. Average Magnitude Difference Function — the classical predecessor to YIN. Measures average absolute difference between a signal and its delayed copy; minima indicate periodicity.
```js
import amdf from 'pitch-detection/amdf.js'

let result = amdf(samples, { fs: 44100 })
```

| Param | Default | Description |
|---|---|---|
| `fs` | 44100 | Sample rate (Hz) |
| `minFreq` | 50 | Minimum detectable frequency (Hz) |
| `maxFreq` | 2000 | Maximum detectable frequency (Hz) |
| `threshold` | 0.3 | Normalized AMDF dip threshold |
Use when: Low-complexity environments, embedded systems. Simpler and cheaper than YIN (no squaring, no cumulative normalization).
Not for: Noisy signals — lacks YIN's cumulative normalization that suppresses octave errors.
Ref: Ross et al., "Average magnitude difference function pitch extractor", IEEE TASSP 1974.
Complexity:
Schroeder, 1968. Harmonic Product Spectrum — multiplies the spectrum by its downsampled copies so that harmonic peaks align at the fundamental. Robust to the missing-fundamental problem.
```js
import hps from 'pitch-detection/hps.js'

let result = hps(samples, { fs: 44100 })
```

| Param | Default | Description |
|---|---|---|
| `fs` | 44100 | Sample rate (Hz) |
| `harmonics` | 5 | Number of harmonic products |
| `minFreq` | 50 | Minimum detectable frequency (Hz) |
| `maxFreq` | 4000 | Maximum detectable frequency (Hz) |
| `cents` | 10 | Candidate spacing in cents |
| `threshold` | 0.1 | Minimum clarity to accept |
Use when: Harmonic-rich signals (guitar, piano, brass). Naturally handles missing fundamentals.
Not for: Pure sinusoids (only one harmonic), very noisy signals.
Ref: Schroeder, "Period histogram and product spectrum", JASA 1968.
Requires: Power-of-2 window length.
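The core idea, sketched on a precomputed magnitude spectrum (`mag` of length N/2 is a hypothetical input, e.g. from the fourier-transform package listed below; this is not the package's implementation, which searches at `cents` resolution rather than whole bins):

```js
function hpsSketch(mag, fs, N, { harmonics = 5, minFreq = 50, maxFreq = 4000 } = {}) {
  const binHz = fs / N
  const lo = Math.max(1, Math.floor(minFreq / binHz))
  // require all harmonic products to stay inside the spectrum
  const hi = Math.min(Math.ceil(maxFreq / binHz), Math.floor((mag.length - 1) / harmonics))
  let bestBin = lo, bestP = -Infinity
  for (let b = lo; b <= hi; b++) {
    // sum of logs = log of the harmonic product (numerically safer);
    // harmonics of a fundamental at bin b all land on multiples of b
    let p = 0
    for (let h = 1; h <= harmonics; h++) p += Math.log(mag[b * h] + 1e-12)
    if (p > bestP) { bestP = p; bestBin = b }
  }
  return { freq: bestBin * binHz }
}
```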
Noll, 1967. Real cepstrum — the inverse transform of the log-magnitude spectrum. The regularly spaced partials of a harmonic signal produce a ripple in the log spectrum, which shows up as a peak at the quefrency of the period.
```js
import cepstrum from 'pitch-detection/cepstrum.js'

let result = cepstrum(samples, { fs: 44100 })
```

| Param | Default | Description |
|---|---|---|
| `fs` | 44100 | Sample rate (Hz) |
| `minFreq` | 50 | Minimum detectable frequency (Hz) |
| `maxFreq` | 2000 | Maximum detectable frequency (Hz) |
| `threshold` | 0.3 | Minimum clarity to accept |
Use when: Harmonic signals where you want a clean spectral-domain method. Good pedagogical complement to time-domain algorithms.
Not for: Low-pitched signals (quefrency resolution is limited by window length).
Ref: Noll, "Cepstrum pitch determination", JASA 1967.
Requires: Power-of-2 window length.
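The pipeline can be sketched with naive O(N²) DFTs (hypothetical `cepstrumSketch`, not the package code, which uses FFTs for both transforms):

```js
function cepstrumSketch(frame, fs, { minFreq = 50, maxFreq = 2000 } = {}) {
  const N = frame.length
  // 1) log-magnitude spectrum (naive DFT)
  const logMag = new Float64Array(N)
  for (let k = 0; k < N; k++) {
    let re = 0, im = 0
    for (let n = 0; n < N; n++) {
      const w = (2 * Math.PI * k * n) / N
      re += frame[n] * Math.cos(w)
      im -= frame[n] * Math.sin(w)
    }
    logMag[k] = Math.log(Math.hypot(re, im) + 1e-12)
  }
  // 2) real cepstrum = inverse transform of logMag; since logMag is
  // real and even, a plain cosine sum suffices. Peak quefrency = period.
  const minQ = Math.max(2, Math.floor(fs / maxFreq))
  const maxQ = Math.min(Math.ceil(fs / minFreq), N >> 1)
  let bestQ = minQ, bestC = -Infinity
  for (let q = minQ; q <= maxQ; q++) {
    let c = 0
    for (let k = 0; k < N; k++) c += logMag[k] * Math.cos((2 * Math.PI * k * q) / N)
    if (c > bestC) { bestC = c; bestQ = q }
  }
  return { freq: fs / bestQ, quefrency: bestQ }
}
```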
Camacho & Harris, 2008. SWIPE' (Sawtooth Waveform Inspired Pitch Estimator, prime harmonics). Measures spectral similarity between the window and a sawtooth template whose lobes sit at prime harmonics. More accurate than HPS on clean instrumental signals; robust against octave errors because only prime harmonics contribute.
Simplified single-window form: uses one FFT instead of the multi-resolution loudness pyramid of the original paper — sufficient for stationary windows.
```js
import swipe from 'pitch-detection/swipe.js'

let result = swipe(samples, { fs: 44100 })
```

| Param | Default | Description |
|---|---|---|
| `fs` | 44100 | Sample rate (Hz) |
| `minFreq` | 60 | Minimum detectable frequency (Hz) |
| `maxFreq` | 4000 | Maximum detectable frequency (Hz) |
| `cents` | 10 | Candidate spacing in cents |
| `threshold` | 0.15 | Minimum clarity to accept |
Use when: Clean instrumental signals, studio recordings, where sub-Hz accuracy matters.
Not for: Very noisy or reverberant signals (single-window form lacks multi-resolution robustness of the full SWIPE').
Ref: Camacho & Harris, "A sawtooth waveform inspired pitch estimator for speech and music", JASA 2008.
Requires: Power-of-2 window length.
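The prime-harmonic idea can be shown with a toy scoring function on a precomputed magnitude spectrum (a hypothetical simplification; the real kernel correlates square-rooted spectra against smooth cosine lobes, not single bins):

```js
// score a candidate f0 by summing square-rooted magnitudes at its
// first and prime harmonics
const PRIMES = [1, 2, 3, 5, 7, 11, 13]
function primeScore(mag, binHz, f0) {
  let score = 0
  for (const p of PRIMES) {
    const bin = Math.round((f0 * p) / binHz)
    if (bin >= mag.length) break
    score += Math.sqrt(mag[bin]) // sqrt compression, as in SWIPE's kernel
  }
  return score
}
```

An octave-down error would need energy at half-integer multiples of the true f0, and an octave-up error keeps only every other harmonic; either way most prime multiples miss, so the true f0 scores highest.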
Fujishima, 1999 (PCP) / Mauch & Dixon, 2010 (NNLS). Chroma feature — a 12-D vector where each bin holds the energy attributed to one pitch class (C, C#, ..., B).
```js
import chroma from 'pitch-detection/chroma.js'

// PCP — classical spectral folding
let c = chroma(samples, { fs: 44100 })

// NNLS — nonnegative least squares (cleaner for polyphonic audio)
let c2 = chroma(samples, { fs: 44100, method: 'nnls' })
```

PCP — each spectral bin is mapped to its nearest pitch class and squared magnitudes are accumulated. Simple and fast.

NNLS — fits the observed spectrum as a nonnegative combination of per-note templates, then folds the note activations into pitch classes. The pitch dictionary covers MIDI 24–96 (C1–C7) with configurable harmonics per tone.
| Param | Default | Description |
|---|---|---|
| `fs` | 44100 | Sample rate (Hz) |
| `method` | 'pcp' | `'pcp'` or `'nnls'` |
| `minFreq` | 65 | Min frequency for PCP mapping (~C2) |
| `maxFreq` | 2093 | Max frequency for PCP mapping (~C7) |
| `harmonics` | 8 | Overtones per pitch (NNLS only) |
| `iterations` | 30 | NMF iterations (NNLS only) |
Returns: Float64Array(12), L1-normalized.
Use when: Building chord/key detectors, music information retrieval, audio fingerprinting. NNLS for polyphonic; PCP for speed.
Ref (PCP): Fujishima, "Realtime chord recognition of musical sound", ICMC 1999.
Ref (NNLS): Mauch & Dixon, "Approximate Note Transcription for the Improved Identification of Difficult Chords", ISMIR 2010.
Requires: Power-of-2 window length.
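The PCP folding step can be sketched on a precomputed magnitude spectrum (`pcpSketch` and the `mag` input are hypothetical; not the package code):

```js
function pcpSketch(mag, fs, N, { minFreq = 65, maxFreq = 2093 } = {}) {
  const out = new Float64Array(12)
  const binHz = fs / N
  for (let k = 1; k < mag.length; k++) {
    const f = k * binHz
    if (f < minFreq || f > maxFreq) continue
    // nearest pitch class: semitones above C0 (≈ 16.3516 Hz), mod 12
    const pc = ((Math.round(12 * Math.log2(f / 16.3516)) % 12) + 12) % 12
    out[pc] += mag[k] * mag[k] // accumulate squared magnitude
  }
  const sum = out.reduce((a, b) => a + b, 0) || 1
  return out.map(v => v / sum) // L1-normalized, like the package output
}
```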
Fujishima, 1999 (templates) / Viterbi smoothing. Classifies chroma frames as one of 24 major/minor triads via cosine similarity with binary templates.
```js
import chord, { TEMPLATES, smooth as smoothChords } from 'pitch-detection/chord.js'

// single frame
let c = chord(chromaVec)
// → { root: 0, quality: 'maj', label: 'C', confidence: 0.92 }

// smoothed sequence
let chords = smoothChords(chromaFrames, { selfProb: 0.5 })
// → [{ root: 0, quality: 'maj', label: 'C' }, ...]
```

Cosine similarity against 24 binary templates (12 major + 12 minor triads). Returns the best match with confidence score.

| Param | Default | Description |
|---|---|---|
| `minConfidence` | 0.3 | Below this, returns quality 'N' (no chord) |
Returns: { root, quality, label, confidence } where quality is 'maj', 'min', or 'N'.
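Template matching is simple enough to sketch end to end (hypothetical `chordSketch`, not the package code):

```js
const NOTE = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

// 24 binary triad templates: 1 on chord tones, 0 elsewhere
function triadTemplates() {
  const out = []
  for (let root = 0; root < 12; root++) {
    for (const [quality, ivs] of [['maj', [0, 4, 7]], ['min', [0, 3, 7]]]) {
      const vec = new Float64Array(12)
      for (const iv of ivs) vec[(root + iv) % 12] = 1
      out.push({ root, quality, label: NOTE[root] + (quality === 'min' ? 'm' : ''), vec })
    }
  }
  return out
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < 12; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
  return dot / (Math.sqrt(na * nb) || 1)
}

function chordSketch(chromaVec, minConfidence = 0.3) {
  let best = null, bestSim = -Infinity
  for (const t of triadTemplates()) {
    const sim = cosine(chromaVec, t.vec)
    if (sim > bestSim) { bestSim = sim; best = t }
  }
  if (bestSim < minConfidence) return { root: -1, quality: 'N', label: 'N', confidence: bestSim }
  return { root: best.root, quality: best.quality, label: best.label, confidence: bestSim }
}
```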
Viterbi decoding with a sticky self-transition prior. Observation log-likelihood = 8 × cosine similarity (temperature 8 gives reasonably sharp distributions).
| Param | Default | Description |
|---|---|---|
| `selfProb` | 0.5 | Self-transition probability (higher = smoother) |
Returns: { root, quality, label }[] — one chord per frame.
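A sticky-Viterbi decoder over per-frame state scores can be sketched as follows (hypothetical `viterbiSticky` taking log-likelihood rows as input; not the package code, which computes the observations from cosine similarity internally):

```js
function viterbiSticky(scores, selfProb = 0.5) {
  const T = scores.length, S = scores[0].length
  const selfLog = Math.log(selfProb)
  const switchLog = Math.log((1 - selfProb) / (S - 1))
  let prev = scores[0].slice()
  const back = []
  for (let t = 1; t < T; t++) {
    // with a uniform switch probability, the best predecessor is either
    // the same state (stay) or the globally best previous state (move)
    let bestPrev = 0
    for (let s = 1; s < S; s++) if (prev[s] > prev[bestPrev]) bestPrev = s
    const cur = new Array(S), bp = new Array(S)
    for (let s = 0; s < S; s++) {
      const stay = prev[s] + selfLog
      const move = prev[bestPrev] + switchLog
      if (stay >= move) { cur[s] = stay + scores[t][s]; bp[s] = s }
      else { cur[s] = move + scores[t][s]; bp[s] = bestPrev }
    }
    back.push(bp)
    prev = cur
  }
  // backtrack from the best final state
  let s = 0
  for (let i = 1; i < S; i++) if (prev[i] > prev[s]) s = i
  const path = [s]
  for (let t = back.length - 1; t >= 0; t--) { s = back[t][s]; path.unshift(s) }
  return path
}
```

A high `selfProb` makes a one-frame detour cost two switch penalties, so brief blips get absorbed into the surrounding chord.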
Exported array of 24 chord templates: { root, quality, label, vec } where vec is a Float64Array(12) with 1 on chord tones.
Use when: Quick chord labeling from chroma features. Combine with NNLS chroma for best results.
Ref: Fujishima, "Realtime chord recognition of musical sound", ICMC 1999.
Krumhansl & Schmuckler. Detects musical key from chroma via Pearson correlation against 24 rotated major/minor key profiles (Krumhansl-Kessler probe-tone ratings).
```js
import key, { KK_MAJOR, KK_MINOR } from 'pitch-detection/key.js'

let k = key(chromaVec)
// → { tonic: 0, mode: 'major', label: 'C', confidence: 0.85, scores: [...] }

// from multiple frames (averages internally)
let k2 = key(chromaFrames)
```

| Param | Default | Description |
|---|---|---|
| `profile` | `{ major: KK_MAJOR, minor: KK_MINOR }` | Custom key profiles |
Returns: { tonic, mode, label, confidence, scores } where scores is all 24 keys sorted descending.
- `KK_MAJOR` — Krumhansl-Kessler major profile: `[6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]`
- `KK_MINOR` — Krumhansl-Kessler minor profile: `[6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]`
Use when: Key detection for music analysis, automatic transposition, music information retrieval.
Ref: Krumhansl, Cognitive Foundations of Musical Pitch, Oxford 1990.
Ref: Temperley, "What's Key for Key?", Music Perception 1999.
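The correlation search can be sketched as follows (hypothetical `keySketch` returning only the best key, unlike the package's `key()`; profiles copied from the exports above):

```js
const KK_MAJ = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
const KK_MIN = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

function pearson(a, b) {
  const n = a.length
  const ma = a.reduce((x, y) => x + y, 0) / n
  const mb = b.reduce((x, y) => x + y, 0) / n
  let num = 0, da = 0, db = 0
  for (let i = 0; i < n; i++) {
    num += (a[i] - ma) * (b[i] - mb)
    da += (a[i] - ma) ** 2
    db += (b[i] - mb) ** 2
  }
  return num / Math.sqrt(da * db)
}

function keySketch(chromaVec) {
  let best = null
  for (let tonic = 0; tonic < 12; tonic++) {
    for (const [mode, profile] of [['major', KK_MAJ], ['minor', KK_MIN]]) {
      // rotate the profile so its first element lands on the tonic
      const rotated = profile.map((_, i) => profile[(((i - tonic) % 12) + 12) % 12])
      const r = pearson(chromaVec, rotated)
      if (!best || r > best.r) best = { tonic, mode, r }
    }
  }
  return best
}
```

Pearson correlation subtracts the means first, so the result is insensitive to how the chroma vector is normalized.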
| | YIN | McLeod | pYIN | AMDF | HPS | Cepstrum | SWIPE |
|---|---|---|---|---|---|---|---|
| Domain | time | time | time | time | spectral | spectral | spectral |
| Accuracy | ★★★★★ | ★★★★ | ★★★★★ | ★★★ | ★★★★ | ★★★ | ★★★★★ |
| Noise robustness | ★★★★★ | ★★★★ | ★★★★★ | ★★★ | ★★★ | ★★★ | ★★★★ |
| Octave errors | rare | rare | rare | common | rare | occasional | rare |
| Missing fundamental | no | no | no | no | yes | yes | yes |
| Min window | ~4 periods | ~2 periods | ~4 periods | ~4 periods | power of 2 | power of 2 | power of 2 |
| Best for | general | vibrato | ambiguous | embedded | harmonic-rich | pedagogical | studio |
- fourier-transform — FFT used by spectral algorithms
- beat-detection — onset detection, tempo estimation, beat tracking
- digital-filter — filter design and processing
- time-stretch — time stretching and pitch shifting
- pitch-shift — pitch shifting algorithms