v0.2.0

@allen2c allen2c released this 14 Aug 16:46
3a9c91e

v0.2.0 — Real-time mic → WAV, with VAD segments and cleaner NR ✨🎤

  • What’s new

    • Millisecond-first API: Simple timing controls; safe integer-millisecond buffering ⏱️
    • Stream-to-file: Continuous 16‑bit WAV writing with periodic processing 💾
    • Optional VAD: Emit speech segments to a queue and/or save per-segment WAVs 🗣️➡️📂
    • Noise reduction first: NR applied before gain for consistent loudness 🔇➜🔊
    • Observability: Switched to logging for friendly, toggleable output 🧭
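The stream-to-file idea can be illustrated with the standard-library `wave` module. This is only a sketch of incremental 16-bit WAV writing, not the package's actual streaming loop. The helper name `write_chunks_as_wav` and the silent stand-in buffers are assumptions made for illustration.

```python
import io
import struct
import wave


def write_chunks_as_wav(target, chunks, sample_rate=16000, channels=1):
    """Write int16 sample chunks to a WAV container as they arrive (hypothetical helper)."""
    frames_written = 0
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            # Pack little-endian signed 16-bit samples, one chunk at a time.
            wav_file.writeframes(struct.pack(f"<{len(chunk)}h", *chunk))
            frames_written += len(chunk) // channels
    return frames_written


# Stand-in for microphone buffers: three 512-sample blocks of silence.
fake_buffers = [[0] * 512 for _ in range(3)]
wav_bytes = io.BytesIO()
total_frames = write_chunks_as_wav(wav_bytes, fake_buffers)
```

Passing a file path instead of the `io.BytesIO` object would write the same data to disk.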
  • Breaking changes

    • Renamed keep_before_speech_ms → pre_speech_padding_ms and keep_after_speech_ms → post_speech_padding_ms for clarity ✅
    • Removed VAD-specific sample_rate/buffer_size (VAD shares audio settings) 🧹
    • Signature: input_audio(output_audio_filepath, ...) now takes the output path as its first positional argument and returns b""; results are delivered as side effects (the WAV file and the VAD outputs) ⚠️
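For callers that still hold old-style VAD keyword arguments, the renames and removals above could be applied with a small shim like the one below. This is a hypothetical migration helper, not part of the package. It assumes the removed fields are named exactly `sample_rate` and `buffer_size`, as the notes suggest.

```python
# Old keyword -> new keyword, as listed in the breaking changes.
RENAMED_VAD_KWARGS = {
    "keep_before_speech_ms": "pre_speech_padding_ms",
    "keep_after_speech_ms": "post_speech_padding_ms",
}
# Fields the notes say were removed from the VAD settings (assumed names).
REMOVED_VAD_KWARGS = {"sample_rate", "buffer_size"}


def migrate_vad_kwargs(old_kwargs):
    """Return a copy of old-style VAD keyword arguments using the v0.2.0 names."""
    migrated = {}
    for key, value in old_kwargs.items():
        if key in REMOVED_VAD_KWARGS:
            continue  # VAD now shares the audio settings, so drop these
        migrated[RENAMED_VAD_KWARGS.get(key, key)] = value
    return migrated


old = {"threshold": 0.5, "keep_before_speech_ms": 300, "sample_rate": 16000}
new = migrate_vad_kwargs(old)
```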
  • Key API

    • AudioConfig: sample_rate=16000, channels=1, buffer_size=512, batch_process_ms=320, gain_db=20.0
    • VADConfig: threshold=0.5, pre_speech_padding_ms=300, post_speech_padding_ms=500
    • NoiseReductionConfig: prop_decrease=0.8, etc.
    • Constraints enforced:
      • buffer_size * 1000 % sample_rate == 0
      • batch_process_ms is a multiple of buffer duration
      • AudioConfig.sample_rate == NoiseReductionConfig.sample_rate
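The timing constraints can be checked by hand with the defaults listed above. The sketch below restates the first two rules in plain Python. The function names are hypothetical, and the package's own validation may be implemented differently.

```python
def buffer_duration_ms(buffer_size, sample_rate):
    """Return the buffer length in whole milliseconds, or raise if it is fractional."""
    if (buffer_size * 1000) % sample_rate != 0:
        raise ValueError("buffer_size does not map to an integer number of milliseconds")
    return buffer_size * 1000 // sample_rate


def check_timing(buffer_size=512, sample_rate=16000, batch_process_ms=320):
    """Return (buffer duration in ms, buffers per processing batch)."""
    duration = buffer_duration_ms(buffer_size, sample_rate)
    if batch_process_ms % duration != 0:
        raise ValueError("batch_process_ms must be a multiple of the buffer duration")
    return duration, batch_process_ms // duration


# Defaults: 512 samples at 16 kHz is 32 ms, and 320 ms is 10 such buffers.
default_timing = check_timing()  # (32, 10)
```

With these defaults, a buffer_size of 500 would fail the first rule (31.25 ms per buffer), and a buffer_size of 480 would fail the second (30 ms does not divide 320 ms).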
  • Tiny example

from input_audio import input_audio, VADConfig

input_audio(
    "./recordings/full.wav",
    enable_vad=True,
    vad_config=VADConfig(pre_speech_padding_ms=300, post_speech_padding_ms=500),
    vad_dirpath="./tmp_vad",  # optional per-segment WAVs
    max_recording_duration_ms=10_000,
)
  • Quality improvements

    • Fade-in/out at segment boundaries to reduce clicks 🎛️
    • Accurate float32 pipeline; robust int16 conversion for WAV 📈
    • Final flush ensures no frames left behind on exit ✅
    • Safe gain with clipping protection 🔧
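To illustrate what gain with clipping protection and int16 conversion can look like, here is a minimal pure-Python sketch. It assumes float samples in the range [-1.0, 1.0] and scales by 32767, which is one common convention. The release notes do not confirm the exact method used, and all function names here are hypothetical.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain_clipped(samples, gain_db):
    """Scale float samples and clamp them to [-1.0, 1.0] to avoid overflow."""
    factor = db_to_linear(gain_db)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def float_to_int16(samples):
    """Map float samples in [-1.0, 1.0] to signed 16-bit integers (assumed 32767 scaling)."""
    return [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]


# With gain_db=20.0 the linear factor is 10, so 0.5 would overflow without clamping.
boosted = apply_gain_clipped([0.0, 0.5, -0.5], 20.0)
pcm = float_to_int16(boosted)
```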
  • Dependencies

    • Added: pydantic, durl-py, explicit numpy, torch, torchaudio, logging-bullet-train
    • Removed: soundfile
    • Bumped many versions; see requirements*.txt and pyproject.toml 📦
  • Docs & meta

    • README rewritten: Quick Start, API reference, constraints, and examples 📝
    • Version bump: 0.1.0 → 0.2.0
    • Dev: added .envrc, updated .gitignore (tmp**) 🧰
  • Reviewer tips

    • Focus on input_audio/__init__.py: new streaming loop, periodic processing, VAD emission, and config models
    • Try:
from input_audio import input_audio
input_audio("./recordings/quick.wav", max_recording_duration_ms=5000)
    • An optional VAD run writes segments to ./tmp_vad and/or a queue 🎯

  • Why this change

    • Cleaner, predictable timing; clearer names; better defaults; safer audio pipeline; easier integration in streaming apps 🚀

Full Changelog: v0.1.0...v0.2.0