v0.2.0 — Real-time mic → WAV, with VAD segments and cleaner NR ✨🎤
What’s new
- Millisecond-first API: Simple timing controls; safe integer-millisecond buffering ⏱️
- Stream-to-file: Continuous 16‑bit WAV writing with periodic processing 💾
- Optional VAD: Emit speech segments to a queue and/or save per-segment WAVs 🗣️➡️📂
- Noise reduction first: NR applied before gain for consistent loudness 🔇➜🔊
- Observability: Switched to `logging` for friendly, toggleable output 🧭
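The sketch below shows stream-to-file writing with periodic processing and a final flush. It assumes mono int16 buffers and the default sizes from Key API, and uses only the standard-library `wave` module, whereas the package itself runs a float32 pipeline. `BatchedWavWriter`, `write_buffer`, and the `process` callback are hypothetical names, not the package's API.

```python
import sys
import wave
from array import array
from typing import BinaryIO, Callable, Union


class BatchedWavWriter:
    """Hypothetical helper (not part of input_audio): writes mono int16 audio
    to a WAV file, processing it in batches of batch_process_ms."""

    def __init__(
        self,
        target: Union[str, BinaryIO],
        sample_rate: int = 16000,
        buffer_size: int = 512,
        batch_process_ms: int = 320,
        process: Callable[[array], array] = lambda batch: batch,
    ) -> None:
        # Validate before opening the file so a bad config leaves nothing behind.
        if buffer_size <= 0 or (buffer_size * 1000) % sample_rate != 0:
            raise ValueError("each buffer must span a whole number of milliseconds")
        buffer_ms = buffer_size * 1000 // sample_rate
        if batch_process_ms % buffer_ms != 0:
            raise ValueError("batch_process_ms must be a multiple of the buffer duration")
        self.batch_samples = batch_process_ms * sample_rate // 1000
        self.process = process
        self.pending = array("h")  # int16 samples waiting for the next batch
        self.wav = wave.open(target, "wb")
        self.wav.setnchannels(1)
        self.wav.setsampwidth(2)  # 16-bit PCM
        self.wav.setframerate(sample_rate)

    def write_buffer(self, samples: array) -> None:
        self.pending.extend(samples)
        # Process and write every full batch that has accumulated so far.
        while len(self.pending) >= self.batch_samples:
            batch = self.pending[: self.batch_samples]
            del self.pending[: self.batch_samples]
            self._emit(batch)

    def close(self) -> None:
        # Final flush: a partial batch is still processed and written.
        if self.pending:
            self._emit(self.pending)
            self.pending = array("h")
        self.wav.close()

    def _emit(self, batch: array) -> None:
        out = array("h", self.process(batch))  # copy so pending data is untouched
        if sys.byteorder == "big":
            out.byteswap()  # WAV PCM samples are little-endian
        self.wav.writeframes(out.tobytes())
```

With the defaults, feeding twelve 512-sample buffers produces one full 5,120-sample batch (320 ms at 16 kHz) while streaming. The remaining 1,024 samples are written by `close()`.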
Breaking changes
- Renamed `keep_before_speech_ms` → `pre_speech_padding_ms` and `keep_after_speech_ms` → `post_speech_padding_ms` for clarity ✅
- Removed the VAD-specific `sample_rate`/`buffer_size` settings (VAD now shares the audio settings) 🧹
- Signature: `input_audio(output_audio_filepath, ...)` now takes the output path as a positional argument and returns `b""`; its useful outputs are side effects (the WAV file and any VAD outputs) ⚠️
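Callers upgrading from 0.1.0 could absorb the renames with a small shim. The sketch below assumes the old names were passed as keyword arguments and that `sample_rate`/`buffer_size` were accepted by the VAD config in 0.1.0. `migrate_vad_kwargs` is a hypothetical helper, not something the package ships.

```python
import warnings
from typing import Any, Dict

# 0.1.0 -> 0.2.0 keyword renames, taken from these release notes.
_RENAMES = {
    "keep_before_speech_ms": "pre_speech_padding_ms",
    "keep_after_speech_ms": "post_speech_padding_ms",
}
# Assumed 0.1.0 VAD-only settings; in 0.2.0 the VAD shares the audio settings.
_REMOVED = ("sample_rate", "buffer_size")


def migrate_vad_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical shim: return a copy of kwargs using 0.2.0 VAD names."""
    migrated = dict(kwargs)
    for old, new in _RENAMES.items():
        if old in migrated:
            if new in migrated:
                raise TypeError(f"got both {old!r} and {new!r}")
            warnings.warn(f"{old!r} was renamed to {new!r} in 0.2.0",
                          DeprecationWarning, stacklevel=2)
            migrated[new] = migrated.pop(old)
    for name in _REMOVED:
        if name in migrated:
            warnings.warn(f"{name!r} is no longer a VAD setting; set it on the audio config",
                          DeprecationWarning, stacklevel=2)
            del migrated[name]
    return migrated
```

In a caller this could look like `VADConfig(**migrate_vad_kwargs(old_kwargs))`.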
Key API
- `AudioConfig`: `sample_rate=16000`, `channels=1`, `buffer_size=512`, `batch_process_ms=320`, `gain_db=20.0`
- `VADConfig`: `threshold=0.5`, `pre_speech_padding_ms=300`, `post_speech_padding_ms=500`
- `NoiseReductionConfig`: `prop_decrease=0.8`, etc.
- Constraints enforced:
  - `buffer_size * 1000 % sample_rate == 0` (each buffer lasts a whole number of milliseconds)
  - `batch_process_ms` is a multiple of the buffer duration
  - `AudioConfig.sample_rate == NoiseReductionConfig.sample_rate`
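The three constraints can be read as a validator. Below is a minimal sketch using standard-library dataclasses as a stand-in for the package's pydantic models. `AudioSettings`, `NoiseReductionSettings`, and `check_compatible` are hypothetical names, only the fields the checks need are modelled, and the noise-reduction `sample_rate` default is an assumption. With the listed values, each 512-sample buffer at 16 kHz lasts exactly 32 ms, and 320 ms is 10 buffers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSettings:
    """Hypothetical stand-in for AudioConfig, checking the listed constraints."""
    sample_rate: int = 16000
    channels: int = 1
    buffer_size: int = 512
    batch_process_ms: int = 320
    gain_db: float = 20.0

    def __post_init__(self) -> None:
        # Each buffer must last a whole (positive) number of milliseconds.
        if self.buffer_size <= 0 or (self.buffer_size * 1000) % self.sample_rate != 0:
            raise ValueError("buffer_size * 1000 must be divisible by sample_rate")
        # Periodic processing must line up with buffer boundaries.
        if self.batch_process_ms % self.buffer_ms != 0:
            raise ValueError("batch_process_ms must be a multiple of the buffer duration")

    @property
    def buffer_ms(self) -> int:
        return self.buffer_size * 1000 // self.sample_rate


@dataclass(frozen=True)
class NoiseReductionSettings:
    """Hypothetical stand-in for NoiseReductionConfig (defaults assumed)."""
    sample_rate: int = 16000
    prop_decrease: float = 0.8


def check_compatible(audio: AudioSettings, nr: NoiseReductionSettings) -> None:
    # Noise reduction must run at the capture sample rate.
    if audio.sample_rate != nr.sample_rate:
        raise ValueError("AudioConfig.sample_rate must equal NoiseReductionConfig.sample_rate")
```

For example, `AudioSettings(sample_rate=44100, buffer_size=441)` passes (10 ms buffers, and 320 is a multiple of 10). Keeping `buffer_size=512` at 44.1 kHz fails, because 512,000 is not divisible by 44,100.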
Tiny example
```python
from input_audio import input_audio, VADConfig

input_audio(
    "./recordings/full.wav",
    enable_vad=True,
    vad_config=VADConfig(pre_speech_padding_ms=300, post_speech_padding_ms=500),
    vad_dirpath="./tmp_vad",  # optional per-segment WAVs
    max_recording_duration_ms=10_000,  # stop after 10 s
)
```
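The padding parameters control how much audio around detected speech ends up in each segment. The sketch below shows one common way to apply them, assuming one speech probability per buffer (for example from a VAD model) and 32 ms buffers. `segment_speech` is a hypothetical function, and this is not necessarily how the package implements segmentation. Recent non-speech buffers are kept in a bounded `collections.deque`, and a segment closes only after `post_speech_padding_ms` of consecutive non-speech.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple


def segment_speech(
    probs: Sequence[float],
    buffer_ms: int = 32,
    threshold: float = 0.5,
    pre_speech_padding_ms: int = 300,
    post_speech_padding_ms: int = 500,
) -> List[Tuple[int, int]]:
    """Hypothetical sketch: return padded (start, end) buffer indices, end exclusive."""
    pre_n = math.ceil(pre_speech_padding_ms / buffer_ms)    # buffers kept before onset
    post_n = math.ceil(post_speech_padding_ms / buffer_ms)  # silent buffers before closing
    history: deque = deque(maxlen=pre_n)  # indices of the most recent non-speech buffers
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    silent_run = 0
    for i, p in enumerate(probs):
        speech = p >= threshold
        if start is None:
            if speech:
                # Onset: reach back over the buffered pre-speech history.
                start = history[0] if history else i
                silent_run = 0
            else:
                history.append(i)
        elif speech:
            silent_run = 0
        else:
            silent_run += 1
            if silent_run >= post_n:
                # Enough trailing silence: close the segment, keeping the padding.
                segments.append((start, i + 1))
                start = None
                history.clear()
    if start is not None:
        segments.append((start, len(probs)))  # flush a segment still open at the end
    return segments
```

With the defaults, 300 ms of pre-padding rounds up to 10 buffers and 500 ms of post-padding rounds up to 16 buffers.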
Quality improvements
- Fade-in/out at segment boundaries to reduce clicks 🎛️
- Accurate float32 pipeline; robust int16 conversion for WAV 📈
- Final flush ensures no frames left behind on exit ✅
- Safe gain with clipping protection 🔧
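The safe-gain, fade, and int16 bullets combine into a short post-processing step. Below is a minimal sketch in plain Python, assuming float samples in [-1.0, 1.0]. The real pipeline works on float32 arrays, and `apply_gain_db`, `apply_fades`, and `to_int16` are hypothetical names. Per these notes, noise reduction (not shown) would run before the gain stage.

```python
from typing import List, Sequence


def apply_gain_db(samples: Sequence[float], gain_db: float) -> List[float]:
    """Hypothetical: scale by gain_db decibels, then clip to [-1.0, 1.0]."""
    factor = 10.0 ** (gain_db / 20.0)  # amplitude ratio: +20 dB is x10
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def apply_fades(samples: Sequence[float], fade_len: int) -> List[float]:
    """Hypothetical: linear fade-in and fade-out over fade_len samples at each edge."""
    out = list(samples)
    n = min(fade_len, len(out) // 2)  # never let the two fades overlap
    for i in range(n):
        ramp = i / n  # 0.0 at the very edge, rising toward 1.0 inward
        out[i] *= ramp
        out[-1 - i] *= ramp
    return out


def to_int16(samples: Sequence[float]) -> List[int]:
    """Hypothetical: clip, scale to 16-bit PCM range, and round to integers."""
    return [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
```

Clipping after the gain keeps loud peaks at full scale instead of letting them wrap around when converted to int16.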
Dependencies
- Added: `pydantic`, `durl-py`, and explicit `numpy`, `torch`, `torchaudio`, `logging-bullet-train`
- Removed: `soundfile`
- Bumped many versions; see `requirements*.txt` and `pyproject.toml` 📦
Docs & meta
- README rewritten: Quick Start, API reference, constraints, and examples 📝
- Version bump: `0.1.0` → `0.2.0`
- Dev: added `.envrc`; updated `.gitignore` (`tmp**`) 🧰
Reviewer tips
- Focus on `input_audio/__init__.py`: the new streaming loop, periodic processing, VAD emission, and config models
- Try a quick 5-second recording:

```python
from input_audio import input_audio

input_audio("./recordings/quick.wav", max_recording_duration_ms=5000)
```

- An optional VAD run writes segments into `./tmp_vad` and/or the queue 🎯
Why this change
- Cleaner, more predictable timing; clearer names; better defaults; a safer audio pipeline; easier integration into streaming apps 🚀
Full Changelog: v0.1.0...v0.2.0