
docs(spec): §3.1 preprocessing pipeline steps 1–4 still describe pre-M14 implementation #84

@shaypal5

Description


Surfaced as PR #82 review thread COPILOT-2 (resolved as out of scope for #78). The complaint is correct and predates #82: §3.1 should have been synced as part of M14 (PR #48, 2026-05-01), but was not.

Drift between spec and implementation

docs/spec.md §3.1, lines 110–113:

1. **Resample** — convert to 16,000 Hz (SoX `rate` with VHQ quality, or `torchaudio.functional.resample`)
2. **Downmix** — stereo → mono (average channels)
3. **Spectral filter** — low-pass at 7,500 Hz to remove irrelevant high-frequency noise from budget sensors (Butterworth order 4)
4. **Denoising** — spectral subtraction (Wiener filtering) to remove electrical hum; parameterize noise profile from silent leading segment

Actual implementation in synthbanshee/augment/preprocessing.py (post-M14):

| Step | Spec (stale) | Code (current) |
|------|--------------|----------------|
| 1 | SoX or `torchaudio` | `scipy.signal.resample_poly` (CLAUDE.md forbids `torchaudio`) |
| 2 | Average channels | `data.mean(axis=1)` if multichannel (matches spec) |
| 3 | 7.5 kHz low-pass, Butterworth order 4 | 80 Hz high-pass, Butterworth order 2 (M14 removed the LPF entirely; an LPF at 7.5 kHz on 16 kHz audio sits just below the 8 kHz Nyquist limit and was removing real signal) |
| 4 | Wiener spectral subtraction (on by default) | Optional Wiener via `PreprocessingConfig.wiener_denoise`, default `False` (M14 changed the default; on-by-default Wiener over-smoothed clean TTS) |
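To make the step 3 change concrete, the sketch below compares the magnitude response of the legacy 7.5 kHz low-pass with the current 80 Hz high-pass at the 16 kHz target rate. It assumes both filters are designed with `scipy.signal.butter` in second-order-sections form; `gain_db` is a hypothetical helper, and neither filter object is taken from preprocessing.py.

```python
import numpy as np
from scipy.signal import butter, sosfreqz

FS = 16_000  # target sample rate after resampling (Hz)

# Legacy (pre-M14) spec: 7.5 kHz low-pass, Butterworth order 4 (design details assumed).
legacy_lpf = butter(4, 7_500, btype="lowpass", fs=FS, output="sos")

# Current (post-M14) behaviour per the table: 80 Hz high-pass, Butterworth order 2.
current_hpf = butter(2, 80, btype="highpass", fs=FS, output="sos")


def gain_db(sos: np.ndarray, freq_hz: float) -> float:
    """Hypothetical helper: magnitude response of an SOS filter at one frequency, in dB."""
    _, h = sosfreqz(sos, worN=[freq_hz], fs=FS)
    return float(20 * np.log10(np.abs(h[0])))


# Print both responses at a few probe frequencies (avoid exactly 8 kHz, where the LPF gain is 0).
for f in (20, 80, 1_000, 4_000, 7_500, 7_900):
    print(f"{f:>5} Hz   LPF {gain_db(legacy_lpf, f):8.2f} dB   HPF {gain_db(current_hpf, f):8.2f} dB")
```

With SciPy's bilinear-transform design, each filter is exactly -3 dB at its cutoff. The low-pass then falls steeply toward a zero at 8 kHz, so its damage is concentrated in the top half-kilohertz of the band, while the high-pass leaves the speech region around 1 kHz essentially untouched and attenuates 20 Hz rumble by more than 20 dB.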

Why this matters

Anyone reading spec.md §3.1 to understand the pipeline gets pre-M14 information. Step 5 (loudness, just rewritten in #82) is in sync; steps 1–4 are not. This is exactly the paper-vs-reality drift that lets future audits over-trust either source.

Suggested fix

Rewrite §3.1 steps 1–4 to match current preprocessing.py:

1. **Resample** — convert to 16 kHz with `scipy.signal.resample_poly`
   (polyphase filter; `torchaudio` is forbidden in this repo, see AGENTS.md).
2. **Downmix** — stereo → mono via channel averaging.
3. **High-pass filter at 80 Hz** (Butterworth order 2, sos form) to remove
   DC and sub-bass rumble that small phone microphones cannot capture.
   Note: M14 (PR #48) removed the legacy 7.5 kHz low-pass filter. At a
   16 kHz sample rate (8 kHz Nyquist) it sat just below Nyquist and was
   destroying sibilant and breathiness cues.
4. **Wiener denoising** (*optional*), controlled by
   `PreprocessingConfig.wiener_denoise` (default `False`). M14 changed the
   default because Wiener filtering of clean TTS output over-smooths
   high-frequency transients. Enable it only for clips with real added noise
   (Tier B/C after acoustic augmentation).
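For readers who want to check the rewritten steps against code, the sketch below strings steps 1–4 together with SciPy. It is not `preprocess()` from preprocessing.py: `PreprocessingConfigSketch` and `preprocess_steps_1_to_4` are hypothetical names (only the `wiener_denoise` field comes from this issue), causal `sosfilt` rather than zero-phase filtering is an assumption, and `scipy.signal.wiener` (a local time-domain Wiener filter) merely stands in for whatever denoiser the repo actually uses.

```python
from dataclasses import dataclass
from math import gcd
from typing import Optional

import numpy as np
from scipy.signal import butter, resample_poly, sosfilt, wiener


@dataclass
class PreprocessingConfigSketch:
    """Hypothetical stand-in for PreprocessingConfig; only wiener_denoise is from the issue."""
    target_sr: int = 16_000
    hpf_cutoff_hz: float = 80.0
    hpf_order: int = 2
    wiener_denoise: bool = False  # post-M14 default: off


def preprocess_steps_1_to_4(
    data: np.ndarray, sr: int, cfg: Optional[PreprocessingConfigSketch] = None
) -> np.ndarray:
    """Sketch of spec §3.1 steps 1-4; `data` is (n_samples,) or (n_samples, n_channels)."""
    cfg = cfg or PreprocessingConfigSketch()
    x = np.asarray(data, dtype=np.float64)

    # 1. Resample with a polyphase filter, using the smallest integer up/down ratio.
    if sr != cfg.target_sr:
        g = gcd(sr, cfg.target_sr)
        x = resample_poly(x, cfg.target_sr // g, sr // g, axis=0)

    # 2. Downmix: average channels if the input is multichannel.
    if x.ndim == 2:
        x = x.mean(axis=1)

    # 3. High-pass (80 Hz, Butterworth order 2) in second-order sections.
    sos = butter(cfg.hpf_order, cfg.hpf_cutoff_hz, btype="highpass",
                 fs=cfg.target_sr, output="sos")
    x = sosfilt(sos, x)

    # 4. Optional denoising, off by default. scipy.signal.wiener is only a stand-in here.
    if cfg.wiener_denoise:
        x = wiener(x)

    return x


# Usage: 1 s of synthetic 44.1 kHz stereo noise becomes 16,000 mono samples.
rng = np.random.default_rng(0)
mono_16k = preprocess_steps_1_to_4(rng.standard_normal((44_100, 2)), 44_100)
print(mono_16k.shape)  # (16000,)
```

Resampling before downmixing follows the order of the list; because both operations are linear and the resampler treats each channel identically, swapping them gives the same result up to floating-point error while resampling one channel instead of two.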

Also add a sentence at the top of §3.1 stating that the authoritative pipeline order lives in `synthbanshee/augment/preprocessing.py:preprocess()`, so future drift is harder to introduce silently.
