Surfaced as PR #82 review thread COPILOT-2 (resolved as out-of-scope for #78). The complaint is correct and pre-existing — should have been synced as part of M14 (PR #48, 2026-05-01) and was not.
Drift between spec and implementation
docs/spec.md §3.1, lines 110–113:
1. **Resample** — convert to 16,000 Hz (SoX `rate` with VHQ quality, or `torchaudio.functional.resample`)
2. **Downmix** — stereo → mono (average channels)
3. **Spectral filter** — low-pass at 7,500 Hz to remove irrelevant high-frequency noise from budget sensors (Butterworth order 4)
4. **Denoising** — spectral subtraction (Wiener filtering) to remove electrical hum; parameterize noise profile from silent leading segment
Actual implementation in synthbanshee/augment/preprocessing.py (post-M14):
| step | spec (stale) | code (current) |
|---|---|---|
| 1 | SoX or torchaudio | `scipy.signal.resample_poly` (CLAUDE.md forbids torchaudio) |
| 2 | average channels | `data.mean(axis=1)` if multichannel (match) |
| 3 | 7.5 kHz low-pass Butterworth-4 | 80 Hz high-pass Butterworth-2 (M14 removed the LPF entirely; LPF at 7.5 kHz on 16 kHz audio is just below Nyquist and was removing real signal) |
| 4 | Wiener spectral subtraction (default on) | Optional Wiener via `PreprocessingConfig.wiener_denoise`, default `False` (M14 changed default; on-by-default Wiener over-smoothed clean TTS) |
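For orientation, the "code (current)" column can be sketched roughly as follows. This is an illustrative sketch, not a copy of `preprocessing.py`: `SketchConfig` and `preprocess_sketch` are hypothetical names, the `(samples, channels)` input layout is inferred from `data.mean(axis=1)`, and `scipy.signal.wiener` is only an assumed stand-in for whatever Wiener implementation the repo actually uses.

```python
from dataclasses import dataclass
from math import gcd
from typing import Optional

import numpy as np
from scipy import signal

TARGET_SR = 16_000  # Hz, target rate from step 1


@dataclass
class SketchConfig:
    """Hypothetical stand-in for PreprocessingConfig (only the flag named above)."""
    wiener_denoise: bool = False


def preprocess_sketch(data, sr: int, cfg: Optional[SketchConfig] = None) -> np.ndarray:
    cfg = cfg or SketchConfig()
    x = np.asarray(data, dtype=np.float64)

    # Step 1: polyphase resampling along the sample axis.
    if sr != TARGET_SR:
        g = gcd(TARGET_SR, sr)
        x = signal.resample_poly(x, TARGET_SR // g, sr // g, axis=0)

    # Step 2: downmix (samples, channels) to mono by channel averaging.
    if x.ndim == 2:
        x = x.mean(axis=1)

    # Step 3: 80 Hz high-pass, Butterworth order 2, second-order-sections form.
    sos = signal.butter(2, 80.0, btype="highpass", fs=TARGET_SR, output="sos")
    x = signal.sosfilt(sos, x)

    # Step 4: optional Wiener denoising, off by default.
    # Assumption: scipy.signal.wiener stands in for the repo's own filter.
    if cfg.wiener_denoise:
        x = signal.wiener(x)

    return x
```

With the default config, step 4 is skipped entirely, which mirrors the post-M14 default described in the table.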
Why this matters
Anyone reading spec.md §3.1 to understand the pipeline gets pre-M14 information. Step 5 (loudness, just rewritten in #82) is in sync; steps 1–4 are not. This is exactly the paper-vs-reality drift that lets future audits over-trust either source.
Suggested fix
Rewrite §3.1 steps 1–4 to match current preprocessing.py:
1. **Resample** — convert to 16 kHz with `scipy.signal.resample_poly`
(polyphase filter; `torchaudio` is forbidden in this repo, see AGENTS.md).
2. **Downmix** — stereo → mono via channel averaging.
3. **High-pass filter at 80 Hz** (Butterworth order 2, sos form) to remove
DC and sub-bass rumble that small phone microphones cannot capture.
Note: M14 (PR #48) removed the legacy 7.5 kHz low-pass filter. At a
16 kHz sample rate (8 kHz Nyquist) it was destroying sibilants and breathiness cues.
4. **Wiener denoising** — *optional*, controlled by
`PreprocessingConfig.wiener_denoise` (default `False`). M14 changed the
default because Wiener on clean TTS output over-smooths high-frequency
transients. Enable only for clips with real added noise (Tier B/C
after acoustic augmentation).
Plus a sentence at the top of §3.1 noting that authoritative pipeline order lives in synthbanshee/augment/preprocessing.py:preprocess() so future drift is harder to introduce silently.
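One optional way to make such drift louder is a small check that flags pre-M14 wording if it reappears in the spec. The sketch below is only an illustration: `STALE_PHRASES` and `find_stale_phrases` are hypothetical names, and the phrases are taken from the stale steps 1–4 quoted above, so they would need adjusting to the real spec text.

```python
# Hypothetical phrases lifted from the stale steps 1-4 quoted in this issue.
STALE_PHRASES = (
    "SoX",
    "torchaudio.functional.resample",
    "low-pass at 7,500 Hz",
    "Butterworth order 4",
)


def find_stale_phrases(spec_text: str) -> list:
    """Return the pre-M14 phrases that still appear in the given spec text."""
    return [phrase for phrase in STALE_PHRASES if phrase in spec_text]
```

A check like this could read `docs/spec.md` in a test and fail when the returned list is non-empty; whether that is worth wiring into CI is a judgment call for the maintainers.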
References
- `synthbanshee/augment/preprocessing.py:preprocess()`: authoritative source.