Releases: MKS-01/readback

v4.1.0 — Audio quality + read cache

@MKS-01 MKS-01 released this 27 Jun 20:02
11754cc

v4.1.0 — audio quality, synthesis performance, and developer experience.

Audio quality

  • Degenerate-chunk guard — all-silence chunks retry synthesis once before being dropped
  • Crossfade joins — 100 ms linear fade-out at chunk tails smooths the voiced→silence transition, completing the post-processing chain alongside peak normalization
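
As a rough illustration of the two audio-quality steps above, the sketch below retries synthesis once when a chunk is all silence and applies a 100 ms linear fade-out to the chunk tail. It works on plain lists of float samples. The function names, the 24 kHz sample rate, and the silence threshold are assumptions for illustration, not readback's actual code.

```python
from typing import Callable, List, Optional

SAMPLE_RATE = 24_000   # assumed sample rate in Hz
FADE_MS = 100          # fade length taken from the release note
SILENCE_EPS = 1e-4     # assumed threshold for "all silence"


def is_silent(samples: List[float], eps: float = SILENCE_EPS) -> bool:
    """True when no sample exceeds eps in magnitude."""
    return all(abs(s) <= eps for s in samples)


def fade_out_tail(samples: List[float], sample_rate: int = SAMPLE_RATE,
                  fade_ms: int = FADE_MS) -> List[float]:
    """Return a copy whose last fade_ms milliseconds ramp linearly to zero."""
    n = min(len(samples), sample_rate * fade_ms // 1000)
    out = list(samples)
    start = len(out) - n
    for i in range(n):
        # Gain ramps down linearly and reaches 0.0 at the final sample.
        out[start + i] *= 1.0 - (i + 1) / n
    return out


def synthesize_with_guard(synth: Callable[[str], List[float]],
                          text: str) -> Optional[List[float]]:
    """Retry once if the chunk is all silence; return None if it is still silent."""
    for _ in range(2):
        samples = synth(text)
        if not is_silent(samples):
            return fade_out_tail(samples)
    return None
```

A caller would skip any chunk for which `synthesize_with_guard` returns `None` and concatenate the rest.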

Performance

  • Read cache — re-reads skip the entire pipeline (fetch → summarize → synthesize). Cache key (url, mode, voice, llm_model) with a composite index; only hits when the WAV still exists on disk
  • Faster synthesis — fp32 → bf16 default (~6% faster), chunk cap 280 → 400 chars (~30% fewer CSM prefills), sampler cached per (temperature, top_k)
  • New llm_model column on the reads table (auto-migrated)
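
The read-cache lookup described above can be sketched with the standard-library sqlite3 module. Only the `reads` table name, the `llm_model` column, and the four-part key come from the notes; the other column names, the index name, and the helper functions are hypothetical.

```python
import os
import sqlite3
from typing import Optional

# Assumed schema: only `reads` and `llm_model` are named in the release notes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reads (
    id        INTEGER PRIMARY KEY,
    url       TEXT NOT NULL,
    mode      TEXT NOT NULL,
    voice     TEXT NOT NULL,
    llm_model TEXT NOT NULL,
    wav_path  TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_reads_cache_key
    ON reads (url, mode, voice, llm_model);
"""


def open_library(path: str = ":memory:") -> sqlite3.Connection:
    """Open the library database and make sure the table and index exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def cached_wav(conn: sqlite3.Connection, url: str, mode: str,
               voice: str, llm_model: str) -> Optional[str]:
    """Return the newest matching WAV path, but only if the file still exists."""
    row = conn.execute(
        "SELECT wav_path FROM reads "
        "WHERE url = ? AND mode = ? AND voice = ? AND llm_model = ? "
        "ORDER BY id DESC LIMIT 1",
        (url, mode, voice, llm_model),
    ).fetchone()
    if row and os.path.exists(row[0]):
        return row[0]
    return None  # miss: run fetch, summarize, and synthesize as usual
```

Checking the file on disk before reporting a hit means a deleted WAV falls through to a normal read instead of returning a dangling path.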

CLI

  • Generation timer — player shows "Xs to generate" for live reads
  • Library UI revamp — inline mode · duration · words · date per row, space to preview audio without leaving the library, enter for the full player
  • Venv auto-detect — server spawns via .venv/bin/python3 -m readback (no activation needed); startup stderr captured

Tests, CI & docs

  • Test suite trimmed 59 → 38; new docs/TESTS.md catalogue; CI JUnit summaries
  • All doc surfaces synced; JOURNEY.md + finetune README rewritten
  • CLI screenshots refreshed for v4.1.0

Full changelog: #20

v4.0.0 — Full MLX LLM stack

@MKS-01 MKS-01 released this 20 Jun 16:28

Summary LLM and vision OCR now run in-process via mlx-lm + mlx-vlm on Apple Silicon, unifying with CSM-1B TTS under one framework. Ollama removed — no external daemon needed.

What changed

  • OllamaConfig → LLMConfig; config key ollama: → llm: (old key auto-migrated)
  • ollama dependency replaced by mlx-lm + mlx-vlm
  • +25–30% generation speed
  • Model discovery scans HF cache instead of Ollama API
  • Default model: mlx-community/Qwen3.5-9B-4bit
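
One way the old-key migration could work is sketched below on an already-parsed config dictionary. The function name and the rule that an existing `llm` section takes precedence are assumptions; the notes only say the old key is auto-migrated.

```python
from typing import Any, Dict


def migrate_llm_key(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with a legacy top-level `ollama` section moved to `llm`."""
    migrated = dict(config)
    if "ollama" in migrated:
        legacy = migrated.pop("ollama")
        # Assumed policy: an explicit `llm` section wins over the legacy one.
        migrated.setdefault("llm", legacy)
    return migrated
```

The input dictionary is left untouched, so the caller can decide whether to write the migrated form back to disk.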

v3.7.0 — Design system consistency pass

@MKS-01 MKS-01 released this 20 Jun 06:47
5a85801

Presentational-only release — no protocol, API, or config changes.

What's new

  • Shared token layer — canonical CSS tokens (src/design-system/tokens/) for colors, typography, spacing, and motion
  • Design system viewer — single-page browser (src/design-system/index.html) with 9 component specimens and 3 interactive UI kits (Terminal, Dashboard, Landing)
  • Dashboard + landing page now import from / inline the same token set — consistent palette, type scale, and motion curves across all surfaces
  • README updated with two full-page design system screenshots

No breaking changes. WS protocol and CLI unchanged.

v3.6.0 — Optimisation + UI polish

@MKS-01 MKS-01 released this 17 Jun 21:45
2fd61b7

What's new

Performance

  • Default model → qwen3.5:9b — faster with comparable quality; LLMClient reused across the pipeline
  • Timing instrumentation — server logs per-read timings; done WS payload includes timings
  • set_temperature / swap_voice off the thread pool — plain attribute mutations, no asyncio.to_thread
  • Model recommendation prefers default family: /model recommends qwen3.5:27b (same family as default)
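
Per-read timing instrumentation of the kind listed above could look roughly like this. The `Timings` class, the stage names, and the `type` field of the payload are hypothetical; the notes only state that the done WS payload includes timings.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class Timings:
    """Accumulates wall-clock seconds per named pipeline stage."""

    def __init__(self) -> None:
        self.seconds: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised.
            elapsed = time.perf_counter() - start
            self.seconds[name] = self.seconds.get(name, 0.0) + elapsed


def done_payload(timings: Timings) -> dict:
    """Build an assumed `done` message carrying rounded per-stage timings."""
    rounded = {name: round(sec, 3) for name, sec in timings.seconds.items()}
    return {"type": "done", "timings": rounded}
```

Each pipeline step would run inside `with timings.stage("fetch"):` (or a similar name), and repeated stages add up under the same key.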

Player fix

  • Pause/resume: kill+restart replaces SIGSTOP/SIGCONT — eliminates CoreAudio buffer bleed and rapid-toggle audio pops

CLI UI polish

  • Responsive progress bars (fill terminal width)
  • Transcript scroll window (12-line cap with auto-follow)
  • Structured /help component (commands + player keys, colored and aligned)
  • Cleaner /model list (no emoji, aligned columns, vision as text tag)
  • /lib library: selected-item-only metadata + summary preview, friendly dates
  • /lib shortcut in intro hints
  • Dashboard: pixel wordmark image (matches CLI block-art identity)

Dashboard polish

  • :focus-visible, ::selection, custom scrollbar (parity with landing page)
  • Hover effects gated behind @media (hover: hover) for mobile/Pi
  • Press states on sort buttons

Landing page

  • 5-slide CLI stepper (dashboard moved to own section)
  • "Beyond the terminal" section: dashboard + PiZoW home server side-by-side, clickable
  • Slide crossfade uses blur to mask overlap
  • Staggered card entrance
  • Tighter hero pitch copy

Tooling

  • Unified Bun commands: bun run dev, start, build work in both CLI and dashboard
  • Drop redundant bun install (Bun auto-installs)
  • Incremental sync-pi.sh with .last-sync marker
  • New skills: ghost-design-system, drive-cli, refresh-screenshots

Full Changelog: v3.5.0...v3.6.0

v3.5.0 — Image OCR, book scans, map-reduce summaries, source-aware tones

@MKS-01 MKS-01 released this 17 Jun 19:28
c1bb357

What's new

  • Image OCR — drop an image path; Ollama vision extracts the text and reads it aloud (_ocr_via_ollama, auto pick_vision_model)
  • Multi-page / book scans — a folder or glob of page photos is OCR'd in filename order and stitched into one continuous document (fetch_multi_page)
  • Map-reduce summarization — long scans summarize end-to-end instead of truncating; _batches → condense → combine, recursion depth ≤ 3
  • Source-aware tones — URL reads as a livelier article (temp 0.8); image/folder reads as a measured book (temp 0.6) that opens by naming its chapter/topic. Auto by source, no new commands
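
The batch, condense, combine flow with its depth cap can be sketched as a small recursive function. The names, the character-based batching, and the truncation at the depth cap are assumptions; `summarize` stands in for whatever LLM call the project uses.

```python
from typing import Callable, List

MAX_DEPTH = 3  # recursion cap taken from the release note


def batches(text: str, max_chars: int) -> List[str]:
    """Split text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def map_reduce(text: str, summarize: Callable[[str], str],
               max_chars: int = 4000, depth: int = 0) -> str:
    """Summarize text of any length by condensing batches, then combining."""
    if len(text) <= max_chars or depth >= MAX_DEPTH:
        # Either the text fits in one call or the depth cap was hit;
        # at the cap this sketch truncates (an assumed fallback).
        return summarize(text[:max_chars])
    condensed = [summarize(piece) for piece in batches(text, max_chars)]  # map
    combined = "\n".join(condensed)                                       # reduce input
    return map_reduce(combined, summarize, max_chars, depth + 1)
```

The depth cap guarantees termination even when the summarizer fails to shrink its input.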

Details

  • New pipeline/tones.py: Tone dataclass, ARTICLE / BOOK instances, classify_source, tone_for
  • extract.py: _book_title_from_text derives chapter/topic from first OCR lines; HEIC/TIFF/BMP/WebP → JPEG via sips
  • summarize.py: system param threaded through _summarize_once / _map_reduce; per-batch progress via WS
  • set_temperature on Synthesizer + CsmEngine for per-read delivery tuning
  • CLI input guard extended for absolute paths, globs, and tilde paths
  • Tests: test_tones.py, test_summarize_batches.py
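
A minimal sketch of how a tone could be picked by source follows. The names `Tone`, `ARTICLE`, `BOOK`, `classify_source`, and `tone_for` and the two temperatures come from the notes; the dataclass fields, the return values of `classify_source`, and the URL-prefix rule are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tone:
    name: str            # assumed field
    temperature: float   # per-read sampling temperature


ARTICLE = Tone("article", 0.8)  # URL reads: livelier delivery
BOOK = Tone("book", 0.6)        # image / folder reads: measured delivery


def classify_source(source: str) -> str:
    """Assumed rule: http(s) URLs are 'url'; any other input is a local 'scan'."""
    if source.lower().startswith(("http://", "https://")):
        return "url"
    return "scan"


def tone_for(source: str) -> Tone:
    """Map a source string to its tone with no user-facing switch."""
    return ARTICLE if classify_source(source) == "url" else BOOK
```

For example, `tone_for("~/scans/ch1/*.jpg").temperature` is 0.6 under these assumptions.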

Full Changelog: v3.3.0...v3.5.0

v3.3.0 — codeword voice, loudness normalization, instant CLI quit

@MKS-01 MKS-01 released this 15 Jun 21:23

Minor release — additive behavior + a default-config change; WS/CLI protocols unchanged.

TTS / voice

  • New codeword clone voice replaces kay — CSM-bootstrapped reference (self-generated from a one-off clone, so no source audio is retained; ref_text exactly matches what's spoken). Default temperature bumped to 0.7.
  • Loudness normalization: _peak_normalize scales every read to a peak of 0.95. CSM matches the energy of its reference clip, so clone voices previously read ~18 dB quieter than the built-ins; now every voice lands at the same level.
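
Peak normalization to 0.95 can be sketched in a few lines on a list of float samples. This is an illustrative version, not the project's `_peak_normalize`; the silence handling and the `gain_db` helper are assumptions.

```python
import math
from typing import List

TARGET_PEAK = 0.95  # target level from the release note


def peak_normalize(samples: List[float], target: float = TARGET_PEAK) -> List[float]:
    """Scale samples so the largest absolute value equals target."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all silence: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]


def gain_db(peak_before: float, target: float = TARGET_PEAK) -> float:
    """Gain in decibels needed to raise peak_before to target."""
    return 20.0 * math.log10(target / peak_before)
```

Under this definition, a clip that peaks 18 dB below the target (about 0.12 instead of 0.95) receives roughly 8x linear gain.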

CLI

  • Instant quit: stopServer SIGKILLs the spawned server outright. The old SIGTERM-then-busy-wait cost ~1.5 s on every quit: uvicorn's graceful shutdown hangs on the open /ws, and the synchronous Bun.sleepSync busy-wait blocked the very event loop Bun needs to reap the child. Now ~1 ms.

Docs

  • doc-sync (kay → codeword, _peak_normalize, SIGKILL shutdown), ROADMAP updates, and a regenerated landing-page + README demo read in the codeword voice.

🤖 Generated with Claude Code

v3.2.0

@MKS-01 MKS-01 released this 14 Jun 21:13
fd58e20

Highlights

Pi deployment — The server now runs read-only on a Raspberry Pi host while the Mac stays the generation box (CSM-1B + Ollama). Live on the home network under PiZoW (PM2-managed, reboot-safe).

  • scripts/deploy-pi.sh — one-command deployment to Pi (build dashboard → rsync source+dist → venv+pip → PM2 start/restart)
  • scripts/sync-pi.sh — sync WAVs + SQLite DB Mac→Pi on demand (keeps audio in sync across restarts)
  • requirements-pi.txt — lightweight deps (no csm-mlx/MLX)
  • config.pi.example.yaml — Pi config template (built-in speaker, same relative reader paths)
  • .env.example — Pi host/port/user config for deploy scripts
  • Mobile-responsive dashboard (existing Vue UI auto-adapts)
  • Landing-page refresh: network feature callout, Pi redirect, hero refresh

All MLX/CSM-1B imports stay lazy — server boots on Pi without them. WS protocol + CLI protocol unchanged.

Install & upgrade

pip install -e .
cd src/cli && bun install && bun run start    # unchanged

For Pi deployment, see SETUP.md section "Deploy to Pi".

v3.1.0 — Animation pass + landing redesign

@MKS-01 MKS-01 released this 14 Jun 18:25
119156c

What's new

A UI/UX polish pass across both web surfaces — purposeful animations guided by Emil Kowalski's design engineering principles, a fully redesigned landing page, and a refreshed dashboard screenshot. No protocol, API, or config changes.

Dashboard

  • Staggered card entrance via Vue <TransitionGroup> — cards fade + slide up with a per-card --i delay (capped at 8 so a full page doesn't drag)
  • Smooth accordion player: transitions grid-template-rows between 0fr and 1fr so the expanded player animates to its real height with no max-height jump
  • Correct easing: --ease-out: cubic-bezier(0.23,1,0.32,1) for entrances, --ease-drawer: cubic-bezier(0.32,0.72,0,1) for the accordion; no spring/bounce on functional UI
  • 8px corner radius on search box, sort toggle, cards panel, play button, skip controls
  • Gentle reduced-motion — keeps opacity fades, drops all movement (slide, accordion height, press scale)

Landing page

  • De-boxed layout — structure with whitespace + h2 hairlines; only the screenshot frame and terminal features block stay framed
  • Story-grounded hero — "Read it back to me." with copy pulled from the project origin, not generic SaaS taglines
  • Terminal features listing: a faux readback --features shell block replaces the old grid
  • Trimmed to hook-and-redirect — four sections cut (flow diagram, quick-start, timeline, architecture stack) → single Dive-in band with GitHub links
  • rAF stepper replacing setInterval — drives the screenshot crossfade progress bar (transform: scaleX) without timer drift
  • Mobile fix — waveform player stacks to two rows on ≤600px (was overflowing at 375px)

Other

  • Refreshed dashboard screenshot (1500×968, 1.550:1 — fills the demo frame edge-to-edge)
  • All four version anchors bumped: pyproject.toml, src/readback/__init__.py, src/cli/package.json, src/dashboard/package.json

Upgrading

Pull and rebuild the dashboard + CLI binary:

git pull
cd src/dashboard && bun run build && cd ../..   # rebuild dashboard dist/
cd src/cli && ./install.sh && cd ../..           # rebuild CLI binary (banner shows v3.1.0)

Server restart picks up the new dashboard automatically.

v3.0.0 — library dashboard + persistence

@MKS-01 MKS-01 released this 13 Jun 15:58
f4899bc

A new web dashboard to replay past reads, backed by an on-device read library. It's a separate, model-free replay UI — the terminal CLI and the WebSocket protocol are unchanged.

✨ Highlights

  • Read library (SQLite). Every synthesized read is now saved (title, summary, source URL, voice, duration, timestamp) in a local stdlib-sqlite3 library. Paged REST API: GET /api/library (search + sort + paginate), GET /api/library/{id}, DELETE /api/library/{id}.
  • Web dashboard (Vue 3) — search, sort, and replay any past read with a full player (seek, ±5 s, pause/replay) and a word-by-word synced transcript; delete removes the row and its audio. Same terminal aesthetic as the CLI.
  • Generate once, replay anytime. The heavy LLM + neural-TTS work runs on demand; replaying a saved read is light and model-free — so the dashboard stays tiny.
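
The search, sort, and paginate behaviour behind GET /api/library could be backed by a query like the one below. The table name, column names, sort keys, and default page size are assumptions for illustration; the notes do not describe the schema or the query parameters.

```python
import sqlite3
from typing import List, Tuple

# Whitelist of sortable columns (assumed names); user input is never interpolated.
SORT_COLUMNS = {"date": "created_at", "title": "title", "duration": "duration"}


def list_reads(conn: sqlite3.Connection, search: str = "", sort: str = "date",
               descending: bool = True, page: int = 1,
               page_size: int = 20) -> List[Tuple]:
    """Return one page of reads whose title or summary contains `search`."""
    column = SORT_COLUMNS.get(sort, "created_at")
    direction = "DESC" if descending else "ASC"
    pattern = f"%{search}%"
    offset = (max(page, 1) - 1) * page_size
    return conn.execute(
        "SELECT id, title, duration, created_at FROM reads "
        "WHERE title LIKE ? OR summary LIKE ? "
        f"ORDER BY {column} {direction} LIMIT ? OFFSET ?",
        (pattern, pattern, page_size, offset),
    ).fetchall()
```

Mapping the sort key through a fixed dictionary keeps the ORDER BY clause safe, since SQL placeholders cannot stand in for column names.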

⚠️ Breaking change

  • Audio + library location. Generated audio and the new SQLite library are written to a configurable folder via reader.output_dir / reader.library_db, moved out of the old ~/.readback/reader/ default. Upgrading? Move your existing WAVs and point those config keys wherever you like (or set them back to ~/.readback/reader in config.yaml).

Details: PR #12 · compare v2.0.0...v3.0.0

v2.0.0 — CLI-only pivot

@MKS-01 MKS-01 released this 12 Jun 15:35
0e18e3c

CLI-only — the web UI is gone, long live the terminal

readback v2.0.0 commits fully to the terminal. The React/Vite browser UI — the project's original interface — has been removed, the Python server is now a pure WS/API backend for the CLI, and the whole repo has been restructured around that reality.

Breaking changes

  • Browser UI removed: GET / returns 404; the server speaks only /ws, GET /api/config, GET /api/models, and /audio
  • TLS flags removed: --auto-cert / --cert / --key are gone (they existed for LAN browser access); cryptography dependency dropped
  • Import paths changed: readback.reader.* → readback.pipeline.*, readback.web.* → readback.server.*

Repo restructure

  • src/ layout — the Python package, CLI, voice clips, and LoRA pipeline now live under src/ (src/readback, src/cli, src/voice, src/finetune)
  • docs/ARCHITECTURE.md, SETUP.md, PLAN.md, and all media collected under one roof; README gained a Documentation index
  • Install is now just pip install -e . + cd src/cli && ./install.sh — no node_modules, no build step before server start

Also

  • docs/SETUP.md and docs/ARCHITECTURE.md rewritten for the CLI-only era
  • Last web-era references purged from every doc surface

Full changelog: v1.1.0...v2.0.0

🤖 Generated with Claude Code