v4.1.0 — Audio quality + read cache

MKS-01 released this 27 Jun 20:02
11754cc

v4.1.0 improves audio quality, synthesis performance, and developer experience.

Audio quality

  • Degenerate-chunk guard — all-silence chunks retry synthesis once before being dropped
  • Crossfade joins — 100 ms linear fade-out at chunk tails smooths the voiced→silence transition, completing the post-processing chain alongside peak normalization
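The two audio-quality steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function names, the silence threshold, the 24 kHz sample rate, and the use of plain float lists are all assumptions made for the example.

```python
from typing import Callable, List, Optional

SAMPLE_RATE = 24_000  # assumed; the release notes do not state the sample rate
FADE_MS = 100         # fade length taken from the release notes


def is_silent(chunk: List[float], threshold: float = 1e-4) -> bool:
    """Return True when every sample is at or below the (assumed) threshold."""
    return all(abs(s) <= threshold for s in chunk)


def synthesize_with_guard(
    synthesize: Callable[[str], List[float]], text: str
) -> Optional[List[float]]:
    """Call the hypothetical synthesize() and retry once if the result is all silence.

    Returns None when both attempts are silent, so the caller can drop the chunk.
    """
    for _ in range(2):  # first attempt plus one retry
        chunk = synthesize(text)
        if chunk and not is_silent(chunk):
            return chunk
    return None


def fade_out_tail(
    chunk: List[float], sample_rate: int = SAMPLE_RATE, fade_ms: int = FADE_MS
) -> List[float]:
    """Apply a linear fade-out over the last fade_ms of the chunk."""
    n = min(len(chunk), sample_rate * fade_ms // 1000)
    out = list(chunk)
    start = len(out) - n
    for i in range(n):
        # Gain falls linearly and reaches exactly 0.0 on the final sample.
        out[start + i] *= 1.0 - (i + 1) / n
    return out
```

Returning None from the guard keeps the "drop" decision with the caller, which is one way to leave the rest of the chain (fade, then peak normalization) unaware of retries.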

Performance

  • Read cache — re-reads skip the entire pipeline (fetch → summarize → synthesize). Cache key (url, mode, voice, llm_model) with a composite index; only hits when the WAV still exists on disk
  • Faster synthesis — fp32 → bf16 default (~6% faster), chunk cap 280 → 400 chars (~30% fewer CSM prefills), sampler cached per (temperature, top_k)
  • New llm_model column on the reads table (auto-migrated)
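As a rough sketch of how the read cache and the llm_model migration could fit together, assuming the reads table lives in SQLite: the storage engine, every column other than llm_model, the index name, and the helper names open_reads_db and cached_wav are hypothetical and not taken from the project.

```python
import os
import sqlite3
from typing import Optional


def open_reads_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database, add llm_model if it is missing, and ensure the index."""
    conn = sqlite3.connect(path)
    # Assumed pre-4.1.0 schema, used only to make the example self-contained.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reads ("
        "id INTEGER PRIMARY KEY, url TEXT NOT NULL, mode TEXT NOT NULL, "
        "voice TEXT NOT NULL, wav_path TEXT NOT NULL)"
    )
    # Auto-migration: add the column only when an older table lacks it.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(reads)")}
    if "llm_model" not in columns:
        conn.execute("ALTER TABLE reads ADD COLUMN llm_model TEXT")
    # Composite index covering the full cache key.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_reads_cache_key "
        "ON reads (url, mode, voice, llm_model)"
    )
    conn.commit()
    return conn


def cached_wav(
    conn: sqlite3.Connection, url: str, mode: str, voice: str, llm_model: str
) -> Optional[str]:
    """Return the cached WAV path, or None on a miss or when the file is gone."""
    row = conn.execute(
        "SELECT wav_path FROM reads "
        "WHERE url = ? AND mode = ? AND voice = ? AND llm_model = ? "
        "ORDER BY id DESC LIMIT 1",
        (url, mode, voice, llm_model),
    ).fetchone()
    if row is not None and os.path.exists(row[0]):
        return row[0]
    return None
```

A None result would send the request through the full fetch, summarize, synthesize pipeline; checking the file on disk keeps a deleted WAV from producing a stale hit.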

CLI

  • Generation timer — player shows "Xs to generate" for live reads
  • Library UI revamp — inline mode · duration · words · date per row, space to preview audio without leaving the library, enter for the full player
  • Venv auto-detect — server spawns via .venv/bin/python3 -m readback (no activation needed); startup stderr captured
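The venv auto-detect behaviour might look something like the sketch below. It assumes a Python launcher and a POSIX venv layout; the helper names and the fallback to the current interpreter are illustrative guesses, and the actual CLI may be implemented differently.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def server_command(project_root: Path) -> List[str]:
    """Build the server command, preferring the project's .venv interpreter."""
    venv_python = project_root / ".venv" / "bin" / "python3"
    # Assumed fallback: use the current interpreter when no venv is present.
    python = str(venv_python) if venv_python.exists() else sys.executable
    return [python, "-m", "readback"]


def spawn_server(project_root: Path) -> subprocess.Popen:
    """Start the server with stderr piped so startup errors can be shown."""
    return subprocess.Popen(
        server_command(project_root),
        cwd=project_root,
        stderr=subprocess.PIPE,
        text=True,
    )
```

Calling the venv's interpreter directly is what makes activation unnecessary: the interpreter under .venv/bin resolves its own site-packages.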

Tests, CI & docs

  • Test suite trimmed 59 → 38; new docs/TESTS.md catalogue; CI JUnit summaries
  • All doc surfaces synced; JOURNEY.md + finetune README rewritten
  • CLI screenshots refreshed for v4.1.0

Full changelog: #20