Releases: MKS-01/readback
v4.1.0 — Audio quality + read cache
v4.1.0 — audio quality, synthesis performance, and developer experience.
Audio quality
- Degenerate-chunk guard — all-silence chunks retry synthesis once before being dropped
- Crossfade joins — 100 ms linear fade-out at chunk tails smooths the voiced→silence transition, completing the post-processing chain alongside peak normalization
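The two guards above can be sketched roughly as follows. This is a minimal illustration, not readback's actual code: it assumes mono float samples in a plain Python list and a 24 kHz sample rate, and the names `fade_out_tail`, `is_silent`, and `synthesize_with_guard` are hypothetical.

```python
def fade_out_tail(samples, sample_rate=24_000, fade_ms=100):
    """Return a copy of `samples` with a linear fade-out over the last `fade_ms`."""
    n_fade = min(len(samples), int(sample_rate * fade_ms / 1000))
    out = list(samples)
    start = len(out) - n_fade
    for i in range(n_fade):
        # Gain ramps linearly and reaches 0.0 on the final sample.
        out[start + i] *= 1.0 - (i + 1) / n_fade
    return out


def is_silent(samples, threshold=1e-4):
    """Treat a chunk as degenerate when every sample is below `threshold` (assumed value)."""
    return all(abs(s) < threshold for s in samples)


def synthesize_with_guard(synthesize, text):
    """Call `synthesize` up to twice; return None (drop the chunk) if both tries are silent."""
    for _ in range(2):
        audio = synthesize(text)
        if not is_silent(audio):
            return audio
    return None
```

Here `synthesize` stands in for whatever callable produces a chunk's samples; passing it in keeps the retry logic testable without a model.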
Performance
- Read cache — re-reads skip the entire pipeline (fetch → summarize → synthesize). Cache key `(url, mode, voice, llm_model)` with a composite index; only hits when the WAV still exists on disk
- Faster synthesis — fp32 → bf16 default (~6% faster), chunk cap 280 → 400 chars (~30% fewer CSM prefills), sampler cached per (temperature, top_k)
- New `llm_model` column on the `reads` table (auto-migrated)
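A read-cache lookup along these lines might look like the sketch below. The `reads` table and the `(url, mode, voice, llm_model)` key come from the notes above; the `wav_path` column, the index name, and both function names are assumptions made for illustration.

```python
import os
import sqlite3


def open_library(path=":memory:"):
    """Open the library and make sure the table and composite index exist."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS reads ("
        " id INTEGER PRIMARY KEY, url TEXT, mode TEXT, voice TEXT,"
        " llm_model TEXT, wav_path TEXT)"  # wav_path is a hypothetical column
    )
    # Composite index covering the whole cache key.
    db.execute(
        "CREATE INDEX IF NOT EXISTS idx_reads_cache"
        " ON reads (url, mode, voice, llm_model)"
    )
    return db


def cache_lookup(db, url, mode, voice, llm_model):
    """Return the cached WAV path, or None on a miss or when the file is gone."""
    row = db.execute(
        "SELECT wav_path FROM reads"
        " WHERE url = ? AND mode = ? AND voice = ? AND llm_model = ?"
        " ORDER BY id DESC LIMIT 1",
        (url, mode, voice, llm_model),
    ).fetchone()
    if row is None or not row[0] or not os.path.exists(row[0]):
        return None
    return row[0]
```

Checking the file on disk at lookup time means a deleted WAV simply falls through to a normal read instead of returning a dangling path.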
CLI
- Generation timer — player shows "Xs to generate" for live reads
- Library UI revamp — inline `mode · duration · words · date` per row, space to preview audio without leaving the library, enter for the full player
- Venv auto-detect — server spawns via `.venv/bin/python3 -m readback` (no activation needed); startup stderr captured
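The venv auto-detect idea can be illustrated with a short sketch. The real CLI is not written in Python, so this only mirrors the logic; the `server_command` name and the fallback to the current interpreter are assumptions.

```python
import sys
from pathlib import Path


def server_command(project_root):
    """Build the server launch command, preferring the project's own venv."""
    venv_python = Path(project_root) / ".venv" / "bin" / "python3"
    # Assumed fallback: use the running interpreter when no venv is present.
    python = str(venv_python) if venv_python.exists() else sys.executable
    return [python, "-m", "readback"]
```

Calling the venv's interpreter directly is what removes the need to activate the environment first.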
Tests, CI & docs
- Test suite trimmed 59 → 38; new `docs/TESTS.md` catalogue; CI JUnit summaries
- All doc surfaces synced; JOURNEY.md + finetune README rewritten
- CLI screenshots refreshed for v4.1.0
Full changelog: #20
v4.0.0 — Full MLX LLM stack
Summary LLM and vision OCR now run in-process via mlx-lm + mlx-vlm on Apple Silicon, unifying with CSM-1B TTS under one framework. Ollama removed — no external daemon needed.
What changed
- `OllamaConfig` → `LLMConfig`; config key `ollama:` → `llm:` (old key auto-migrated)
- `ollama` dependency replaced by `mlx-lm` + `mlx-vlm`
- +25–30% generation speed
- Model discovery scans HF cache instead of Ollama API
- Default model: `mlx-community/Qwen3.5-9B-4bit`
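The `ollama:` → `llm:` auto-migration could work roughly like this sketch, which operates on an already-parsed config dict. The function name `migrate_config`, the `model` sub-key used in the usage note, and the rule that an explicit `llm:` section wins are all assumptions.

```python
def migrate_config(raw):
    """Return a copy of `raw` with a legacy top-level `ollama` section renamed to `llm`."""
    cfg = dict(raw)
    if "ollama" in cfg:
        legacy = cfg.pop("ollama")
        # Assumed precedence: keep an explicit `llm` section if one already exists.
        cfg.setdefault("llm", legacy)
    return cfg
```

For example, `migrate_config({"ollama": {"model": "x"}})` yields `{"llm": {"model": "x"}}` and leaves the input dict untouched.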
v3.7.0 — Design system consistency pass
Presentational-only release — no protocol, API, or config changes.
What's new
- Shared token layer — canonical CSS tokens (`src/design-system/tokens/`) for colors, typography, spacing, and motion
- Design system viewer — single-page browser (`src/design-system/index.html`) with 9 component specimens and 3 interactive UI kits (Terminal, Dashboard, Landing)
- Dashboard + landing page now import from / inline the same token set — consistent palette, type scale, and motion curves across all surfaces
- README updated with two full-page design system screenshots
No breaking changes. WS protocol and CLI unchanged.
v3.6.0 — Optimisation + UI polish
What's new
Performance
- Default model → `qwen3.5:9b` — faster with comparable quality; `LLMClient` reused across the pipeline
- Timing instrumentation — server logs per-read timings; `done` WS payload includes `timings`
- `set_temperature` / `swap_voice` off the thread pool — plain attribute mutations, no `asyncio.to_thread`
- Model recommendation prefers default family — `/model` recommends `qwen3.5:27b` (same family as default)
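Per-read timing of this kind can be collected with a small context manager, sketched below. Only the `done` message and its `timings` field come from the notes; the `Timings` class, the stage names, and the `"type"` key of the payload are assumptions.

```python
import time
from contextlib import contextmanager


class Timings:
    """Collect wall-clock seconds per pipeline stage for one read."""

    def __init__(self):
        self.stages = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the stage raises, so partial reads still log timings.
            self.stages[name] = round(time.perf_counter() - start, 3)


def done_payload(timings):
    """Build a hypothetical `done` message carrying the collected timings."""
    return {"type": "done", "timings": dict(timings.stages)}
```

A read would wrap each step, for instance `with t.stage("fetch"): ...`, and send `done_payload(t)` once synthesis finishes.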
Player fix
- Pause/resume: kill+restart replaces SIGSTOP/SIGCONT — eliminates CoreAudio buffer bleed and rapid-toggle audio pops
CLI UI polish
- Responsive progress bars (fill terminal width)
- Transcript scroll window (12-line cap with auto-follow)
- Structured `/help` component (commands + player keys, colored and aligned)
- Cleaner `/model` list (no emoji, aligned columns, `vision` as text tag)
- `/lib` library: selected-item-only metadata + summary preview, friendly dates
- `/lib` shortcut in intro hints
- Dashboard: pixel wordmark image (matches CLI block-art identity)
Dashboard polish
- `:focus-visible`, `::selection`, custom scrollbar (parity with landing page)
- Hover effects gated behind `@media (hover: hover)` for mobile/Pi
- Press states on sort buttons
Landing page
- 5-slide CLI stepper (dashboard moved to own section)
- "Beyond the terminal" section: dashboard + PiZoW home server side-by-side, clickable
- Slide crossfade uses blur to mask overlap
- Staggered card entrance
- Tighter hero pitch copy
Tooling
- Unified Bun commands — `bun run dev`, `start`, `build` work in both CLI and dashboard
- Drop redundant `bun install` (Bun auto-installs)
- Incremental `sync-pi.sh` with `.last-sync` marker
- New skills: `ghost-design-system`, `drive-cli`, `refresh-screenshots`
Full Changelog: v3.5.0...v3.6.0
v3.5.0 — Image OCR, book scans, map-reduce summaries, source-aware tones
What's new
- Image OCR — drop an image path; Ollama vision extracts the text and reads it aloud (`_ocr_via_ollama`, auto `pick_vision_model`)
- Multi-page / book scans — a folder or glob of page photos is OCR'd in filename order and stitched into one continuous document (`fetch_multi_page`)
- Map-reduce summarization — long scans summarize end-to-end instead of truncating; `_batches` → condense → combine, recursion depth ≤ 3
- Source-aware tones — URL reads as a livelier article (temp 0.8); image/folder reads as a measured book (temp 0.6) that opens by naming its chapter/topic. Auto by source, no new commands
Details
- New `pipeline/tones.py`: `Tone` dataclass, `ARTICLE` / `BOOK` instances, `classify_source`, `tone_for`
- `extract.py`: `_book_title_from_text` derives chapter/topic from first OCR lines; HEIC/TIFF/BMP/WebP → JPEG via `sips`
- `summarize.py`: `system` param threaded through `_summarize_once` / `_map_reduce`; per-batch progress via WS
- `set_temperature` on `Synthesizer` + `CsmEngine` for per-read delivery tuning
- CLI input guard extended for absolute paths, globs, and tilde paths
- Tests: `test_tones.py`, `test_summarize_batches.py`
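A tone module with the names listed above might be shaped like this sketch. The names `Tone`, `ARTICLE`, `BOOK`, `classify_source`, and `tone_for` and the two temperatures come from these notes; the dataclass fields, the return values of `classify_source`, and the URL test are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tone:
    name: str           # hypothetical field
    temperature: float  # sampling temperature applied for the read


ARTICLE = Tone("article", 0.8)  # livelier delivery for URL reads
BOOK = Tone("book", 0.6)        # measured delivery for image/folder reads


def classify_source(source):
    """Return "url" for http(s) links, otherwise "scan" (image path, folder, or glob)."""
    return "url" if source.lower().startswith(("http://", "https://")) else "scan"


def tone_for(source):
    """Pick the tone automatically from the kind of source."""
    return ARTICLE if classify_source(source) == "url" else BOOK
```

Because the tone is derived from the input itself, no extra command or flag is needed to switch between the two.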
Full Changelog: v3.3.0...v3.5.0
v3.3.0 — codeword voice, loudness normalization, instant CLI quit
Minor release — additive behavior + a default-config change; WS/CLI protocols unchanged.
TTS / voice
- New `codeword` clone voice replaces `kay` — CSM-bootstrapped reference (self-generated from a one-off clone, so no source audio is retained; `ref_text` exactly matches what's spoken). Default `temperature` bumped to 0.7.
- Loudness normalization — `_peak_normalize` scales every read to 0.95. CSM matches the energy of its reference clip, so clone voices previously read ~18 dB quieter than the built-ins; now every voice lands at the same level.
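Peak normalization to a 0.95 ceiling can be written in a few lines. The sketch below assumes float samples in a plain list; `peak_normalize` is a stand-in name and may differ from the project's `_peak_normalize` in signature and array type.

```python
def peak_normalize(samples, target=0.95):
    """Scale `samples` so the largest absolute value equals `target`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # empty or all-silence: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]
```

Since every read is scaled to the same peak, a quiet clone voice and a louder built-in end up at the same level regardless of the reference clip's energy.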
CLI
- Instant quit — `stopServer` SIGKILLs the spawned server outright. The old SIGTERM-then-busy-wait paid ~1.5 s on every quit: uvicorn's graceful shutdown hangs on the open `/ws`, and the synchronous `Bun.sleepSync` busy-wait blocked the very event loop Bun needs to reap the child. Now ~1 ms.
Docs
- doc-sync (`kay → codeword`, `_peak_normalize`, SIGKILL shutdown), ROADMAP updates, and a regenerated landing-page + README demo read in the codeword voice.
🤖 Generated with Claude Code
v3.2.0
Highlights
Pi deployment — The server now runs read-only on a Raspberry Pi host while the Mac stays the generation box (CSM-1B + Ollama). Live on the home network under PiZoW (PM2-managed, reboot-safe).
- `scripts/deploy-pi.sh` — one-command deployment to Pi (build dashboard → rsync source+dist → venv+pip → PM2 start/restart)
- `scripts/sync-pi.sh` — sync WAVs + SQLite DB Mac→Pi on demand (keeps audio in sync across restarts)
- `requirements-pi.txt` — lightweight deps (no csm-mlx/MLX)
- `config.pi.example.yaml` — Pi config template (built-in speaker, same relative reader paths)
- `.env.example` — Pi host/port/user config for deploy scripts
- Mobile-responsive dashboard (existing Vue UI auto-adapts)
- Landing-page refresh: network feature callout, Pi redirect, hero refresh
All MLX/CSM-1B imports stay lazy — server boots on Pi without them. WS protocol + CLI protocol unchanged.
Install & upgrade
pip install -e .
cd src/cli && bun install && bun run start  # unchanged

For Pi deployment, see SETUP.md section "Deploy to Pi".
v3.1.0 — Animation pass + landing redesign
What's new
A UI/UX polish pass across both web surfaces — purposeful animations guided by Emil Kowalski's design engineering principles, a fully redesigned landing page, and a refreshed dashboard screenshot. No protocol, API, or config changes.
Dashboard
- Staggered card entrance via Vue `<TransitionGroup>` — cards fade + slide up with a per-card `--i` delay (capped at 8 so a full page doesn't drag)
- Smooth accordion player — `grid-template-rows: 0fr` ↔ `1fr` so the expanded player animates to its real height with no `max-height` jump
- Correct easing — `--ease-out: cubic-bezier(0.23,1,0.32,1)` for entrances, `--ease-drawer: cubic-bezier(0.32,0.72,0,1)` for the accordion; no spring/bounce on functional UI
- 8px corner radius on search box, sort toggle, cards panel, play button, skip controls
- Gentle reduced-motion — keeps opacity fades, drops all movement (slide, accordion height, press scale)
Landing page
- De-boxed layout — structure with whitespace + h2 hairlines; only the screenshot frame and terminal features block stay framed
- Story-grounded hero — "Read it back to me." with copy pulled from the project origin, not generic SaaS taglines
- Terminal features listing — `readback --features` faux shell block replaces the old grid
- Trimmed to hook-and-redirect — four sections cut (flow diagram, quick-start, timeline, architecture stack) → single Dive-in band with GitHub links
- rAF stepper replacing `setInterval` — drives the screenshot crossfade progress bar (`transform: scaleX`) without timer drift
- Mobile fix — waveform player stacks to two rows on ≤600px (was overflowing at 375px)
Other
- Refreshed dashboard screenshot (1500×968, 1.550:1 — fills the demo frame edge-to-edge)
- All four version anchors bumped: `pyproject.toml`, `src/readback/__init__.py`, `src/cli/package.json`, `src/dashboard/package.json`
Upgrading
Pull and rebuild the dashboard + CLI binary:
git pull
cd src/dashboard && bun run build && cd ../.. # rebuild dashboard dist/
cd src/cli && ./install.sh && cd ../..  # rebuild CLI binary (banner shows v3.1.0)

Server restart picks up the new dashboard automatically.
v3.0.0 — library dashboard + persistence
A new web dashboard to replay past reads, backed by an on-device read library. It's a separate, model-free replay UI — the terminal CLI and the WebSocket protocol are unchanged.
✨ Highlights
- Read library (SQLite). Every synthesized read is now saved — title, summary, source URL, voice, duration, timestamp — in a local stdlib-`sqlite3` library. Read-only paged REST: `GET /api/library` (search + sort + paginate), `GET /api/library/{id}`, `DELETE /api/library/{id}`.
- Web dashboard (Vue 3) — search, sort, and replay any past read with a full player (seek, ±5 s, pause/replay) and a word-by-word synced transcript; delete removes the row and its audio. Same terminal aesthetic as the CLI.
- Generate once, replay anytime. The heavy LLM + neural-TTS work runs on demand; replaying a saved read is light and model-free — so the dashboard stays tiny.
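The search + sort + paginate behaviour behind `GET /api/library` could be backed by a query like the sketch below. The table name `reads`, the column names, the sort keys, and the `list_reads` function are assumptions; only the stdlib `sqlite3` backing and the stored fields are taken from the notes.

```python
import sqlite3

# Whitelist of sort keys: user input is never interpolated into SQL directly.
SORT_COLUMNS = {"date": "created_at", "title": "title", "duration": "duration"}


def list_reads(db, q="", sort="date", page=1, per_page=20):
    """Return one page of reads whose title or summary contains `q`."""
    order = SORT_COLUMNS.get(sort, "created_at")
    like = f"%{q}%"
    offset = (max(page, 1) - 1) * per_page
    rows = db.execute(
        "SELECT id, title, voice, duration FROM reads"
        " WHERE title LIKE ? OR summary LIKE ?"
        f" ORDER BY {order} DESC LIMIT ? OFFSET ?",
        (like, like, per_page, offset),
    ).fetchall()
    return [dict(zip(("id", "title", "voice", "duration"), r)) for r in rows]
```

Nothing here touches the LLM or TTS models, which fits the point that replay stays light.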
⚠️ Breaking change
- Audio + library location. Generated audio and the new SQLite library are written to a configurable folder via `reader.output_dir` / `reader.library_db`, moved out of the old `~/.readback/reader/` default. Upgrading? Move your existing WAVs and point those config keys wherever you like (or set them back to `~/.readback/reader` in `config.yaml`).
Details: PR #12 · compare v2.0.0...v3.0.0
v2.0.0 — CLI-only pivot
CLI-only — the web UI is gone, long live the terminal
readback v2.0.0 commits fully to the terminal. The React/Vite browser UI — the project's original interface — has been removed, the Python server is now a pure WS/API backend for the CLI, and the whole repo has been restructured around that reality.
Breaking changes
- Browser UI removed — `GET /` returns 404; the server speaks only `/ws`, `GET /api/config`, `GET /api/models`, and `/audio`
- TLS flags removed — `--auto-cert` / `--cert` / `--key` are gone (they existed for LAN browser access); `cryptography` dependency dropped
- Import paths changed — `readback.reader.*` → `readback.pipeline.*`, `readback.web.*` → `readback.server.*`
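For code that still imports from the old packages, the two renames above can be applied mechanically. The helper below is a hypothetical convenience, not something shipped with readback; it rewrites only the two package prefixes named in the list.

```python
import re

# Old package prefix -> new package prefix, as listed in the breaking changes.
RENAMES = {
    "readback.reader": "readback.pipeline",
    "readback.web": "readback.server",
}
_PATTERN = re.compile(r"\b(readback\.(?:reader|web))\b")


def rewrite_imports(source):
    """Return `source` with the pre-2.0 package prefixes replaced."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)
```

For example, `rewrite_imports("from readback.web import app")` returns `"from readback.server import app"`, where `app` is just a placeholder name.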
Repo restructure
- `src/` layout — the Python package, CLI, voice clips, and LoRA pipeline now live under `src/` (`src/readback`, `src/cli`, `src/voice`, `src/finetune`)
- `docs/` — `ARCHITECTURE.md`, `SETUP.md`, `PLAN.md`, and all media collected under one roof; README gained a Documentation index
- Install is now just `pip install -e .` + `cd src/cli && ./install.sh` — no node_modules, no build step before server start
Also
- `docs/SETUP.md` and `docs/ARCHITECTURE.md` rewritten for the CLI-only era
- Last web-era references purged from every doc surface
Full changelog: v1.1.0...v2.0.0
