Releases · MKS-01/readback

Release list

v4.1.0 — Audio quality + read cache (Latest)

MKS-01 released this 27 Jun 20:02

Tag v4.1.0 · commit 11754cc

v4.1.0 — audio quality, synthesis performance, and developer experience.

Audio quality

Degenerate-chunk guard — all-silence chunks retry synthesis once before being dropped
Crossfade joins — 100 ms linear fade-out at chunk tails smooths the voiced→silence transition, completing the post-processing chain alongside peak normalization
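A minimal sketch of these two audio-quality steps, assuming audio arrives as lists of float samples at 24 kHz. The sample rate, the silence threshold, and every function name here are assumptions rather than readback's code. A chunk that comes back all-silent is re-synthesized once and then dropped, and each kept chunk gets a 100 ms linear fade-out on its tail.

```python
from typing import Callable, Optional, Sequence

SAMPLE_RATE = 24_000  # assumption: 24 kHz mono output
FADE_MS = 100         # from the release note: 100 ms linear fade-out
SILENCE_EPS = 1e-4    # hypothetical threshold for "all silence"


def is_silent(samples: Sequence[float], eps: float = SILENCE_EPS) -> bool:
    """True when every sample is below eps in magnitude (an all-silence chunk)."""
    return all(abs(s) < eps for s in samples)


def synthesize_with_guard(
    synth: Callable[[str], list[float]], text: str
) -> Optional[list[float]]:
    """Synthesize one chunk; retry once if it is all silence, then drop it (None)."""
    for _attempt in range(2):  # first attempt + one retry
        audio = synth(text)
        if not is_silent(audio):
            return audio
    return None


def fade_out_tail(
    samples: Sequence[float], sr: int = SAMPLE_RATE, fade_ms: int = FADE_MS
) -> list[float]:
    """Return a copy with a linear fade-out over the last fade_ms milliseconds."""
    out = list(samples)
    n = min(len(out), sr * fade_ms // 1000)  # number of tail samples to fade
    start = len(out) - n
    for i in range(n):
        # gain steps linearly from just under 1.0 down to exactly 0.0 at the last sample
        out[start + i] *= 1.0 - (i + 1) / n
    return out
```

Capping the guard at a single retry, rather than looping, keeps one pathological chunk from stalling a whole read.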

Performance

Read cache — re-reads skip the entire pipeline (fetch → summarize → synthesize). Cache key (url, mode, voice, llm_model) with a composite index; only hits when the WAV still exists on disk
Faster synthesis — fp32 → bf16 default (~6% faster), chunk cap 280 → 400 chars (~30% fewer CSM prefills), sampler cached per (temperature, top_k)
New llm_model column on the reads table (auto-migrated)
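The cache lookup and the llm_model auto-migration could look roughly like this with Python's sqlite3 module. Only the reads table, the llm_model column, and the (url, mode, voice, llm_model) key come from the notes; the other columns, the index name, and both helpers are hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import Optional

# Hypothetical pre-4.1 schema; the real reads table has more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reads (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    mode TEXT NOT NULL,
    voice TEXT NOT NULL,
    wav_path TEXT NOT NULL
);
"""


def migrate(conn: sqlite3.Connection) -> None:
    """Idempotently add llm_model and the composite cache-key index."""
    conn.executescript(SCHEMA)
    cols = {row[1] for row in conn.execute("PRAGMA table_info(reads)")}  # row[1] = column name
    if "llm_model" not in cols:
        conn.execute("ALTER TABLE reads ADD COLUMN llm_model TEXT")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_reads_cache_key "
        "ON reads (url, mode, voice, llm_model)"
    )
    conn.commit()


def cache_lookup(
    conn: sqlite3.Connection, url: str, mode: str, voice: str, llm_model: str
) -> Optional[str]:
    """Return the newest matching WAV path, but only if the file still exists."""
    row = conn.execute(
        "SELECT wav_path FROM reads "
        "WHERE url = ? AND mode = ? AND voice = ? AND llm_model = ? "
        "ORDER BY id DESC LIMIT 1",
        (url, mode, voice, llm_model),
    ).fetchone()
    if row is not None and Path(row[0]).exists():
        return row[0]
    return None  # miss: run fetch -> summarize -> synthesize
```

Checking the file on disk at lookup time means a WAV deleted outside the app simply turns into a cache miss instead of a broken replay.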

CLI

Generation timer — player shows "Xs to generate" for live reads
Library UI revamp — inline mode · duration · words · date per row, space to preview audio without leaving the library, enter for the full player
Venv auto-detect — server spawns via .venv/bin/python3 -m readback (no activation needed); startup stderr captured

Tests, CI & docs

Test suite trimmed 59 → 38; new docs/TESTS.md catalogue; CI JUnit summaries
All doc surfaces synced; JOURNEY.md + finetune README rewritten
CLI screenshots refreshed for v4.1.0

Full changelog: #20


v4.0.0 — Full MLX LLM stack

MKS-01 released this 20 Jun 16:28

Tag v4.0.0 · commit b62914e

Summary LLM and vision OCR now run in-process via mlx-lm + mlx-vlm on Apple Silicon, unifying with CSM-1B TTS under one framework. Ollama removed — no external daemon needed.

What changed

OllamaConfig → LLMConfig; config key ollama: → llm: (old key auto-migrated)
ollama dependency replaced by mlx-lm + mlx-vlm
+25–30% generation speed
Model discovery scans HF cache instead of Ollama API
Default model: mlx-community/Qwen3.5-9B-4bit
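A sketch of the two mechanical pieces of this release: renaming a legacy ollama: section to llm:, and discovering local models by scanning the Hugging Face hub cache, which stores each model repo as a folder named models--{org}--{name}. The function names and the plain-dict config are assumptions, not readback's actual code.

```python
from pathlib import Path


def migrate_config(cfg: dict) -> dict:
    """Return a copy with a legacy top-level `ollama:` section renamed to `llm:`."""
    cfg = dict(cfg)
    if "ollama" in cfg and "llm" not in cfg:
        cfg["llm"] = cfg.pop("ollama")
    return cfg


def cached_models(hub_dir: Path) -> list[str]:
    """List repo ids (org/name) found in a Hugging Face hub cache directory."""
    if not hub_dir.is_dir():
        return []
    names = []
    for entry in sorted(hub_dir.iterdir()):
        # model repos are cached as models--{org}--{name}; datasets/spaces are skipped
        if entry.is_dir() and entry.name.startswith("models--"):
            org_and_name = entry.name[len("models--"):]
            names.append(org_and_name.replace("--", "/", 1))
    return names
```

On a default install the hub cache usually lives at ~/.cache/huggingface/hub, unless HF_HOME or HF_HUB_CACHE points elsewhere.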


v3.7.0 — Design system consistency pass

MKS-01 released this 20 Jun 06:47

Tag v3.7.0 · commit 5a85801

Presentational-only release — no protocol, API, or config changes.

What's new

Shared token layer — canonical CSS tokens (src/design-system/tokens/) for colors, typography, spacing, and motion
Design system viewer — single-page browser (src/design-system/index.html) with 9 component specimens and 3 interactive UI kits (Terminal, Dashboard, Landing)
Dashboard + landing page now import from / inline the same token set — consistent palette, type scale, and motion curves across all surfaces
README updated with two full-page design system screenshots

No breaking changes. WS protocol and CLI unchanged.


v3.6.0 — Optimisation + UI polish

MKS-01 released this 17 Jun 21:45

Tag v3.6.0 · commit 2fd61b7

What's new

Performance

Default model → qwen3.5:9b — faster with comparable quality; LLMClient reused across the pipeline
Timing instrumentation — server logs per-read timings; done WS payload includes timings
set_temperature / swap_voice off the thread pool — plain attribute mutations, no asyncio.to_thread
Model recommendation prefers default family — /model recommends qwen3.5:27b (same family as default)
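Per-read timings can be collected with a small context manager wrapped around each pipeline stage, as in this sketch. The ReadTimings class, the stage names, and the shape of the done payload are hypothetical; only the idea of logging per-read timings and attaching them to the done WS message comes from the notes.

```python
import time
from contextlib import contextmanager


class ReadTimings:
    """Collects wall-clock seconds per pipeline stage for one read (hypothetical helper)."""

    def __init__(self) -> None:
        self.stages: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # recorded even if the stage raises, so failed reads still log timings
            self.stages[name] = round(time.perf_counter() - start, 3)


def done_payload(title: str, timings: ReadTimings) -> dict:
    """Hypothetical shape of a `done` WebSocket message that carries timings."""
    return {"type": "done", "title": title, "timings": dict(timings.stages)}
```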

Player fix

Pause/resume: kill+restart replaces SIGSTOP/SIGCONT — eliminates CoreAudio buffer bleed and rapid-toggle audio pops

CLI UI polish

Responsive progress bars (fill terminal width)
Transcript scroll window (12-line cap with auto-follow)
Structured /help component (commands + player keys, colored and aligned)
Cleaner /model list (no emoji, aligned columns, vision as text tag)
/lib library: selected-item-only metadata + summary preview, friendly dates
/lib shortcut in intro hints
Dashboard: pixel wordmark image (matches CLI block-art identity)

Dashboard polish

:focus-visible, ::selection, custom scrollbar (parity with landing page)
Hover effects gated behind @media (hover: hover) for mobile/Pi
Press states on sort buttons

Landing page

5-slide CLI stepper (dashboard moved to own section)
"Beyond the terminal" section: dashboard + PiZoW home server side-by-side, clickable
Slide crossfade uses blur to mask overlap
Staggered card entrance
Tighter hero pitch copy

Tooling

Unified Bun commands — bun run dev, start, build work in both CLI and dashboard
Drop redundant bun install (Bun auto-installs)
Incremental sync-pi.sh with .last-sync marker
New skills: ghost-design-system, drive-cli, refresh-screenshots
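The incremental-sync idea reduces to "select files modified after the marker's mtime, transfer them, then touch the marker". The real sync-pi.sh is a shell script; this Python version and its helper names are hypothetical, and the transfer step itself (rsync or similar) is left out.

```python
from pathlib import Path


def files_since_last_sync(root: Path, marker: Path) -> list[Path]:
    """Files under root modified after the marker's mtime (every file if no marker yet)."""
    since = marker.stat().st_mtime if marker.exists() else 0.0
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p != marker and p.stat().st_mtime > since
    )


def touch_marker(marker: Path) -> None:
    """Bump the marker's mtime to now, so the next run only sees newer files."""
    marker.touch()
```

Touching the marker only after a successful transfer means an interrupted sync is simply retried in full on the next run.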

Full Changelog: v3.5.0...v3.6.0


v3.5.0 — Image OCR, book scans, map-reduce summaries, source-aware tones

MKS-01 released this 17 Jun 19:28

Tag v3.5.0 · commit c1bb357

What's new

Image OCR — drop an image path; Ollama vision extracts the text and reads it aloud (_ocr_via_ollama, auto pick_vision_model)
Multi-page / book scans — a folder or glob of page photos is OCR'd in filename order and stitched into one continuous document (fetch_multi_page)
Map-reduce summarization — long scans summarize end-to-end instead of truncating; _batches → condense → combine, recursion depth ≤ 3
Source-aware tones — URL reads as a livelier article (temp 0.8); image/folder reads as a measured book (temp 0.6) that opens by naming its chapter/topic. Auto by source, no new commands
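The map-reduce summary can be sketched as a recursive function: split into batches, condense each batch, join the notes, and recurse on the result until it fits or the depth cap is hit. The summarize callable stands in for the LLM call, and the naive fixed-size character batching is an assumption (readback's _batches may well split on page or paragraph boundaries).

```python
from typing import Callable


def batches(text: str, max_chars: int) -> list[str]:
    """Split text into consecutive slices of at most max_chars characters (naive)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def map_reduce_summarize(
    text: str,
    summarize: Callable[[str], str],
    max_chars: int = 8000,  # hypothetical per-call budget
    depth: int = 0,
    max_depth: int = 3,     # from the release note: recursion depth <= 3
) -> str:
    # Base case: the text fits in one call, or the depth cap is reached and the
    # remainder is summarized in a single call.
    if len(text) <= max_chars or depth >= max_depth:
        return summarize(text)
    notes = [summarize(b) for b in batches(text, max_chars)]  # map: condense each batch
    combined = "\n".join(notes)                               # reduce: combine the notes
    return map_reduce_summarize(combined, summarize, max_chars, depth + 1, max_depth)
```

Usage: map_reduce_summarize(ocr_text, llm_summarize) with any function that maps a string to a shorter string; the depth cap guarantees termination even when the summaries fail to shrink.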

Details

New pipeline/tones.py: Tone dataclass, ARTICLE / BOOK instances, classify_source, tone_for
extract.py: _book_title_from_text derives chapter/topic from first OCR lines; HEIC/TIFF/BMP/WebP → JPEG via sips
summarize.py: system param threaded through _summarize_once / _map_reduce; per-batch progress via WS
set_temperature on Synthesizer + CsmEngine for per-read delivery tuning
CLI input guard extended for absolute paths, globs, and tilde paths
Tests: test_tones.py, test_summarize_batches.py
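A sketch of what the tones module could look like. It borrows the names Tone, ARTICLE, BOOK, classify_source, and tone_for from the notes, along with the 0.8 and 0.6 temperatures, but the dataclass fields, the prompt strings, the source categories, and the fallback for unrecognised sources are all guesses.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Tone:
    name: str
    temperature: float  # per-read TTS temperature
    instructions: str   # hypothetical: extra system-prompt text for the summarizer


ARTICLE = Tone("article", 0.8, "Deliver this as a lively spoken article.")
BOOK = Tone("book", 0.6, "Deliver this as a measured book reading; open by naming the chapter or topic.")

# Hypothetical extension list, based on the formats the notes mention.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".heic", ".tif", ".tiff", ".bmp", ".webp"}


def classify_source(source: str) -> str:
    """Bucket an input as url / glob / folder / image / other."""
    if source.startswith(("http://", "https://")):
        return "url"
    if any(ch in source for ch in "*?["):
        return "glob"
    path = Path(source).expanduser()
    if path.is_dir():
        return "folder"
    if path.suffix.lower() in IMAGE_EXTS:
        return "image"
    return "other"


def tone_for(source: str) -> Tone:
    # Scans (image, folder, glob) read as a book; URLs and anything else as an article.
    # The "anything else" fallback is an assumption.
    return BOOK if classify_source(source) in {"image", "folder", "glob"} else ARTICLE
```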

Full Changelog: v3.3.0...v3.5.0


v3.3.0 — codeword voice, loudness normalization, instant CLI quit

MKS-01 released this 15 Jun 21:23

Tag v3.3.0 · commit 7fa3b04

Minor release — additive behavior + a default-config change; WS/CLI protocols unchanged.

TTS / voice

New codeword clone voice replaces kay — CSM-bootstrapped reference (self-generated from a one-off clone, so no source audio is retained; ref_text exactly matches what's spoken). Default temperature bumped to 0.7.
Loudness normalization — _peak_normalize scales every read to a peak amplitude of 0.95. CSM matches the energy of its reference clip, so clone voices previously read ~18 dB quieter than the built-ins; now every voice lands at the same level.
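Peak normalization amounts to one gain factor per read. A minimal sketch, assuming float samples in [-1, 1]; peak_normalize and gain_db are hypothetical stand-ins, not the project's _peak_normalize. For scale, a clip peaking around 0.12 needs roughly 7.9x gain, about 18 dB, to reach 0.95.

```python
import math
from typing import Sequence

TARGET_PEAK = 0.95  # from the release note


def peak_normalize(samples: Sequence[float], target: float = TARGET_PEAK) -> list[float]:
    """Scale samples so the largest absolute value equals target."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]


def gain_db(gain: float) -> float:
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(gain)
```

Matching peaks tracks perceived loudness only loosely; an RMS- or LUFS-based target would equalize more closely, at the cost of needing a limiter to avoid clipping.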

CLI

Instant quit — stopServer SIGKILLs the spawned server outright. The old SIGTERM-then-busy-wait paid ~1.5 s on every quit: uvicorn's graceful shutdown hangs on the open /ws, and the synchronous Bun.sleepSync busy-wait blocked the very event loop Bun needs to reap the child. Now ~1 ms.

Docs

doc-sync (kay → codeword, _peak_normalize, SIGKILL shutdown), ROADMAP updates, and a regenerated landing-page + README demo read in the codeword voice.

🤖 Generated with Claude Code


v3.2.0

MKS-01 released this 14 Jun 21:13

Tag 3.2.0 · commit fd58e20

Highlights

Pi deployment — The server now runs read-only on a Raspberry Pi host while the Mac stays the generation box (CSM-1B + Ollama). Live on the home network under PiZoW (PM2-managed, reboot-safe).

scripts/deploy-pi.sh — one-command deployment to Pi (build dashboard → rsync source+dist → venv+pip → PM2 start/restart)
scripts/sync-pi.sh — sync WAVs + SQLite DB Mac→Pi on demand (keeps audio in sync across restarts)
requirements-pi.txt — lightweight deps (no csm-mlx/MLX)
config.pi.example.yaml — Pi config template (built-in speaker, same relative reader paths)
.env.example — Pi host/port/user config for deploy scripts
Mobile-responsive dashboard (existing Vue UI auto-adapts)
Landing-page refresh: network feature callout, Pi redirect, hero refresh

All MLX/CSM-1B imports stay lazy — server boots on Pi without them. WS protocol + CLI protocol unchanged.
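Keeping heavy imports lazy usually means importing inside the code path that needs them, not at module top level. A sketch using importlib; load_optional and synthesis_available are hypothetical helpers, and mlx.core is used only as an example of a module the Pi does not have.

```python
import importlib
from functools import lru_cache
from types import ModuleType
from typing import Optional


@lru_cache(maxsize=None)
def load_optional(module_name: str) -> Optional[ModuleType]:
    """Import a module on first use; return None when it isn't installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # includes ModuleNotFoundError
        return None


def synthesis_available() -> bool:
    # Only touched when a generation request arrives, so a replay-only host
    # (e.g. the Pi, which installs requirements-pi.txt) boots without MLX.
    return load_optional("mlx.core") is not None
```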

Install & upgrade

pip install -e .
cd src/cli && bun install && bun run start    # unchanged

For Pi deployment, see SETUP.md section "Deploy to Pi".


v3.1.0 — Animation pass + landing redesign

MKS-01 released this 14 Jun 18:25

Tag v3.1.0 · commit 119156c

What's new

A UI/UX polish pass across both web surfaces — purposeful animations guided by Emil Kowalski's design engineering principles, a fully redesigned landing page, and a refreshed dashboard screenshot. No protocol, API, or config changes.

Dashboard

Staggered card entrance via Vue <TransitionGroup> — cards fade + slide up with a per-card --i delay (capped at 8 so a full page doesn't drag)
Smooth accordion player — grid-template-rows: 0fr↔1fr so the expanded player animates to its real height with no max-height jump
Correct easing — --ease-out: cubic-bezier(0.23,1,0.32,1) for entrances, --ease-drawer: cubic-bezier(0.32,0.72,0,1) for the accordion; no spring/bounce on functional UI
8px corner radius on search box, sort toggle, cards panel, play button, skip controls
Gentle reduced-motion — keeps opacity fades, drops all movement (slide, accordion height, press scale)

Landing page

De-boxed layout — structure with whitespace + h2 hairlines; only the screenshot frame and terminal features block stay framed
Story-grounded hero — "Read it back to me." with copy pulled from the project origin, not generic SaaS taglines
Terminal features listing — readback --features faux shell block replaces the old grid
Trimmed to hook-and-redirect — four sections cut (flow diagram, quick-start, timeline, architecture stack) → single Dive-in band with GitHub links
rAF stepper replacing setInterval — drives the screenshot crossfade progress bar (transform: scaleX) without timer drift
Mobile fix — waveform player stacks to two rows on ≤600px (was overflowing at 375px)

Other

Refreshed dashboard screenshot (1500×968, 1.550:1 — fills the demo frame edge-to-edge)
All four version anchors bumped: pyproject.toml, src/readback/__init__.py, src/cli/package.json, src/dashboard/package.json

Upgrading

Pull and rebuild the dashboard + CLI binary:

git pull
cd src/dashboard && bun run build && cd ../..   # rebuild dashboard dist/
cd src/cli && ./install.sh && cd ../..           # rebuild CLI binary (banner shows v3.1.0)

Server restart picks up the new dashboard automatically.


v3.0.0 — library dashboard + persistence

MKS-01 released this 13 Jun 15:58

Tag v3.0.0 · commit f4899bc

A new web dashboard to replay past reads, backed by an on-device read library. It's a separate, model-free replay UI — the terminal CLI and the WebSocket protocol are unchanged.

✨ Highlights

Read library (SQLite). Every synthesized read is now saved — title, summary, source URL, voice, duration, timestamp — in a local SQLite library built on Python's stdlib sqlite3. Read-only paged REST: GET /api/library (search + sort + paginate), GET /api/library/{id}, DELETE /api/library/{id}.
Web dashboard (Vue 3) — search, sort, and replay any past read with a full player (seek, ±5 s, pause/replay) and a word-by-word synced transcript; delete removes the row and its audio. Same terminal aesthetic as the CLI.
Generate once, replay anytime. The heavy LLM + neural-TTS work runs on demand; replaying a saved read is light and model-free — so the dashboard stays tiny.
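A paged, searchable, sortable query behind an endpoint like GET /api/library might look like this sketch. The column names (created_at in particular), the parameter names, and the response shape are hypothetical. Because SQL identifiers can't be bound as parameters, the sort column is checked against a whitelist before it is interpolated.

```python
import sqlite3

SORTABLE = {"created_at", "title", "duration"}  # hypothetical column names
COLUMNS = ("id", "title", "duration", "created_at")


def library_page(
    conn: sqlite3.Connection,
    q: str = "",
    sort: str = "created_at",
    order: str = "desc",
    page: int = 1,
    per_page: int = 20,
) -> dict:
    if sort not in SORTABLE:
        raise ValueError(f"unsupported sort column: {sort}")
    direction = "DESC" if order.lower() == "desc" else "ASC"
    like = f"%{q}%"  # substring search; % and _ inside q are not escaped in this sketch
    where = "WHERE title LIKE ? OR summary LIKE ?"
    total = conn.execute(f"SELECT COUNT(*) FROM reads {where}", (like, like)).fetchone()[0]
    offset = (max(page, 1) - 1) * per_page
    rows = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM reads {where} "
        f"ORDER BY {sort} {direction} LIMIT ? OFFSET ?",
        (like, like, per_page, offset),
    ).fetchall()
    return {"items": [dict(zip(COLUMNS, r)) for r in rows], "total": total, "page": page}
```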

⚠️ Breaking change

Audio + library location. Generated audio and the new SQLite library are written to a configurable folder via reader.output_dir / reader.library_db, moved out of the old ~/.readback/reader/ default. Upgrading? Move your existing WAVs and point those config keys wherever you like (or set them back to ~/.readback/reader in config.yaml).

Details: PR #12 · compare v2.0.0...v3.0.0


v2.0.0 — CLI-only pivot

MKS-01 released this 12 Jun 15:35

Tag v2.0.0 · commit 0e18e3c

CLI-only — the web UI is gone, long live the terminal

readback v2.0.0 commits fully to the terminal. The React/Vite browser UI — the project's original interface — has been removed, the Python server is now a pure WS/API backend for the CLI, and the whole repo has been restructured around that reality.

Breaking changes

Browser UI removed — GET / returns 404; the server speaks only /ws, GET /api/config, GET /api/models, and /audio
TLS flags removed — --auto-cert / --cert / --key are gone (they existed for LAN browser access); cryptography dependency dropped
Import paths changed — readback.reader.* → readback.pipeline.*, readback.web.* → readback.server.*
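Downstream code that imports the old paths can be updated mechanically. A sketch of a regex-based rewrite, assuming only the two package prefixes listed above changed; rewrite_imports is a hypothetical helper, and word boundaries keep names like readback.readers or readback.webhook untouched.

```python
import re

RENAMES = {
    "readback.reader": "readback.pipeline",
    "readback.web": "readback.server",
}
# \b on both sides: match whole dotted prefixes only, not e.g. "readback.readers".
_PATTERN = re.compile(r"\b(readback\.(?:reader|web))\b")


def rewrite_imports(source: str) -> str:
    """Replace v1 import prefixes with their v2 equivalents."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)
```

Usage: read each .py file, pass its text through rewrite_imports, and write it back; review the diff afterwards, since strings and comments that mention the old paths are rewritten too.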

Repo restructure

src/ layout — the Python package, CLI, voice clips, and LoRA pipeline now live under src/ (src/readback, src/cli, src/voice, src/finetune)
docs/ — ARCHITECTURE.md, SETUP.md, PLAN.md, and all media collected under one roof; README gained a Documentation index
Install is now just pip install -e . + cd src/cli && ./install.sh — no node_modules, no build step before server start

Also

docs/SETUP.md and docs/ARCHITECTURE.md rewritten for the CLI-only era
Last web-era references purged from every doc surface

Full changelog: v1.1.0...v2.0.0

🤖 Generated with Claude Code
