Release v0.2.0 · allen2c/input_audio

v0.2.0 — Real-time mic → WAV, with VAD segments and cleaner NR ✨🎤

What’s new
- Millisecond-first API: Simple timing controls; safe integer-millisecond buffering ⏱️
- Stream-to-file: Continuous 16‑bit WAV writing with periodic processing 💾
- Optional VAD: Emit speech segments to a queue and/or save per-segment WAVs 🗣️➡️📂
- Noise reduction first: NR applied before gain for consistent loudness 🔇➜🔊
- Observability: Switched to logging for friendly, toggleable output 🧭
Breaking changes
- Renamed keep_before_speech_ms → pre_speech_padding_ms and keep_after_speech_ms → post_speech_padding_ms for clarity ✅
- Removed VAD-specific sample_rate/buffer_size (VAD shares audio settings) 🧹
- Signature: input_audio(output_audio_filepath, ...) now takes the output path as its first positional argument and returns b""; results are delivered as side effects (the WAV file and any VAD outputs) ⚠️
Key API
- AudioConfig: sample_rate=16000, channels=1, buffer_size=512, batch_process_ms=320, gain_db=20.0
- VADConfig: threshold=0.5, pre_speech_padding_ms=300, post_speech_padding_ms=500
- NoiseReductionConfig: prop_decrease=0.8, etc.
- Constraints enforced:
  - buffer_size * 1000 % sample_rate == 0
  - batch_process_ms is a multiple of the buffer duration (buffer_size * 1000 / sample_rate, i.e. 32 ms with the defaults)
  - AudioConfig.sample_rate == NoiseReductionConfig.sample_rate
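The constraints above can be checked by hand before building a config. The following is a minimal sketch of that arithmetic, not the library's own validation code; the helper names (buffer_duration_ms, check_timing, ms_to_samples) are hypothetical and exist only for illustration.

```python
def buffer_duration_ms(buffer_size: int, sample_rate: int) -> int:
    """Return the buffer length in whole milliseconds, or raise if it is fractional."""
    if (buffer_size * 1000) % sample_rate != 0:
        raise ValueError("buffer_size * 1000 must be divisible by sample_rate")
    return buffer_size * 1000 // sample_rate


def check_timing(
    sample_rate: int = 16000,
    buffer_size: int = 512,
    batch_process_ms: int = 320,
    nr_sample_rate: int = 16000,
) -> int:
    """Check the three listed constraints and return the number of buffers per batch."""
    buf_ms = buffer_duration_ms(buffer_size, sample_rate)
    if batch_process_ms % buf_ms != 0:
        raise ValueError("batch_process_ms must be a multiple of the buffer duration")
    if nr_sample_rate != sample_rate:
        raise ValueError("audio and noise-reduction sample rates must match")
    return batch_process_ms // buf_ms


def ms_to_samples(ms: int, sample_rate: int) -> int:
    """Convert a millisecond duration (e.g. VAD padding) to a sample count."""
    return ms * sample_rate // 1000


if __name__ == "__main__":
    print(buffer_duration_ms(512, 16000))  # buffer duration with the defaults
    print(check_timing())                  # buffers per 320 ms batch
    print(ms_to_samples(300, 16000))       # samples in pre_speech_padding_ms
```

With the defaults listed under Key API, one buffer lasts 32 ms and a 320 ms batch holds 10 buffers, so every timing value stays an integer number of milliseconds.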
Tiny example

from input_audio import input_audio, VADConfig

input_audio(
    "./recordings/full.wav",
    enable_vad=True,
    vad_config=VADConfig(pre_speech_padding_ms=300, post_speech_padding_ms=500),
    vad_dirpath="./tmp_vad",  # optional per-segment WAVs
    max_recording_duration_ms=10_000,
)

Quality improvements
- Fade-in/out at segment boundaries to reduce clicks 🎛️
- Accurate float32 pipeline; robust int16 conversion for WAV 📈
- Final flush ensures no frames left behind on exit ✅
- Safe gain with clipping protection 🔧
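To make the items above concrete, here is a rough sketch of what gain with clipping protection, boundary fades, and float-to-int16 conversion can look like. It is an illustration under stated assumptions, not the code in input_audio/__init__.py: the function names are hypothetical, the hard clip to [-1.0, 1.0] and the linear fade ramp are assumed choices, and plain Python lists stand in for the float32 arrays the library works with.

```python
import array


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain_clipped(samples: list[float], gain_db: float) -> list[float]:
    """Scale samples by gain_db, then hard-clip to [-1.0, 1.0] (assumed strategy)."""
    factor = db_to_linear(gain_db)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def apply_fades(samples: list[float], fade_len: int) -> list[float]:
    """Apply a linear fade-in and fade-out of up to fade_len samples at each end."""
    n = min(fade_len, len(samples) // 2)
    out = list(samples)
    for i in range(n):
        ramp = i / n  # 0.0 at the segment edge, rising toward 1.0
        out[i] *= ramp
        out[-1 - i] *= ramp
    return out


def float_to_int16(samples: list[float]) -> array.array:
    """Clamp to [-1.0, 1.0] and convert to 16-bit signed PCM values."""
    return array.array(
        "h", (int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples)
    )


if __name__ == "__main__":
    boosted = apply_gain_clipped([0.01, 0.05, -0.5], gain_db=20.0)
    print(float_to_int16(apply_fades(boosted, fade_len=1)).tolist())
```

The default gain_db=20.0 corresponds to a 10x amplitude factor, which is why some form of clipping protection matters before the int16 conversion.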
Dependencies
- Added: pydantic, durl-py, explicit numpy, torch, torchaudio, logging-bullet-train
- Removed: soundfile
- Bumped many versions; see requirements*.txt and pyproject.toml 📦
Docs & meta
- README rewritten: Quick Start, API reference, constraints, and examples 📝
- Version bump: 0.1.0 → 0.2.0
- Dev: added .envrc, updated .gitignore (tmp**) 🧰
Reviewer tips
- Focus on input_audio/__init__.py: new streaming loop, periodic processing, VAD emission, and config models
- Try:

from input_audio import input_audio
input_audio("./recordings/quick.wav", max_recording_duration_ms=5000)

With VAD enabled, a run writes speech segments into ./tmp_vad and/or pushes them to a queue 🎯
Why this change
- Cleaner, predictable timing; clearer names; better defaults; safer audio pipeline; easier integration in streaming apps 🚀

What's Changed

Dev/v0.2.0 by @allen2c in #1

New Contributors

@allen2c made their first contribution in #1

Full Changelog: v0.1.0...v0.2.0
