Releases: dudarenok-maker/Castwright
Castwright v1.8.0
The open-beta release. Castwright reaches more machines this cycle — an early AMD GPU preview and a one-click Pinokio install — on top of a deep round of analysis honesty, multilingual depth, and GPU-contention resilience that keeps long runs upright on an 8 GB card.
✨ Headline features
🟧 AMD GPU support — early preview (new)
Castwright can reach for an AMD GPU when it finds one, with a safety net under it.
- Auto-detect with CPU fallback — ROCm / DirectML is detected on its own; if the GPU path isn't ready it quietly falls back to the processor so the app still runs (#818).
- Accelerator control — an ACCELERATOR knob + in-app picker, with the resolved per-engine profile surfaced on /health. Kokoro stays on CPU under DirectML (a documented DirectML limitation, not a fault). NVIDIA and Apple Silicon remain the smoothest paths.
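The detect-then-fall-back behaviour described above can be sketched roughly as follows. This is an illustration only, not Castwright's code: the function names, the backend labels, and the `auto` value are assumptions. Only the ACCELERATOR knob, the ROCm / DirectML backends, and the Kokoro-on-CPU rule under DirectML come from the notes.

```python
from typing import Dict, Iterable

# Hypothetical backend labels; the notes only name ROCm, DirectML and CPU.
GPU_BACKENDS = ("rocm", "directml")


def resolve_accelerator(requested: str, detected: Iterable[str]) -> str:
    """Pick a backend, falling back to CPU when the GPU path is not ready."""
    available = set(detected)
    if requested == "auto":
        for backend in GPU_BACKENDS:
            if backend in available:
                return backend
        return "cpu"
    # An explicit request is honoured only if that backend was detected.
    return requested if requested in available else "cpu"


def per_engine_profile(backend: str, engines: Iterable[str]) -> Dict[str, str]:
    """Resolve a per-engine device map, keeping Kokoro on CPU under DirectML."""
    profile = {}
    for engine in engines:
        if backend == "directml" and engine == "kokoro":
            profile[engine] = "cpu"
        else:
            profile[engine] = backend
    return profile
```

Keeping the resolution in one pure function makes it easy to surface the same resolved profile on a health endpoint and in a picker.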
📦 One-click install with Pinokio (new)
A self-contained conda install (ops-16) built from the latest published release — no terminal, nothing to install by hand — landing in the same guided first-run as the desktop installers (#821).
🧠 Pick the model that reads your book (new)
The local analyzer is now yours to choose (plan 221).
- Installed-only model picker — pick any Ollama model you've pulled per run; not-yet-pulled curated models install from the Model Manager (#851, #859, #860).
- Honest residency + label — warm/residency and the analysing chip key to the model actually doing the reading, not the configured default; an ANALYZER_KEEP_ALIVE knob; per-phase model support.
🩺 Honest engine health + one-tap Repair
The Model Manager stops showing a hopeful green light (plan 220).
- True per-engine health — package / weights / integrity, with a "Needs repair" badge and a one-click Repair + sidecar restart (#837).
- Re-tiered engines — Qwen → standard (GPU), Coqui → opt-in, Whisper = base; fail-open readiness + diagnostics.
🌍 Multilingual & attribution (plan 221)
- Cyrillic, end to end — character names, ids and cross-book keys handle non-Latin scripts (#852).
- Steadier non-English attribution — a deterministic narrator-default heuristic, a Russian dash-dialogue preamble guard, and script-aware attribution + ASR normalizers (#852, #824).
- Localized cast review — language-aware minor-cast fold buckets, so a Russian book's grouped roles read in Russian (e.g. Незнакомый Парень / Незнакомая Девушка) instead of English (#856).
📊 Analysing view — honesty & live progress
- Truthful progress — a per-chapter section sub-bar and counts, live ETA refinement, and a model-label chip that mirrors the server-resolved analyzer model (#841, #864, #826).
- Reload-proof — a reconnecting bridge so refreshing the page no longer blanks the elapsed timer (#869).
- Big-chapter handling — Stage-1 chunking for oversized chapters and cast-detection name-fidelity guards (#825, #827).
🎮 GPU residency & resilience (plan 222)
- Waits its turn instead of crashing — withGpuLoad does an atomic evict+verify+load and refuses on a busy card (409) rather than OOM-crashing; a top-bar "GPU busy · N waiting" pill says why (#840, #841).
- Smarter residency — a VRAM-threshold policy keeps the analyzer resident across the analysis loop on a GPU; voice-design and generation preload run through the same gated path.
- VRAM telemetry substrate — passive, env-gated per-engine VRAM sampling (fs-45, record-only; MB-accounting deferred) with a clean-process gate (#861, #863).
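A minimal sketch of the "refuse instead of crash" idea behind a gated GPU load is shown below. It is not the project's withGpuLoad: the class name, the injected VRAM probe, and returning 409 when the post-evict check fails are assumptions made for illustration. Only the evict, verify, load sequence and the 409 on a busy card come from the notes.

```python
import threading
from typing import Callable, Tuple


class GpuGate:
    """Hypothetical single-slot gate: one load at a time, others are refused."""

    def __init__(self, free_vram_mb: Callable[[], int]):
        self._lock = threading.Lock()
        self._free_vram_mb = free_vram_mb  # injected probe (assumption)

    def with_gpu_load(self, needed_mb: int, evict: Callable[[], None],
                      load: Callable[[], None]) -> Tuple[int, str]:
        # Refuse immediately if another load already holds the card.
        if not self._lock.acquire(blocking=False):
            return 409, "gpu busy"
        try:
            evict()
            # Verify the eviction freed enough memory before loading.
            if self._free_vram_mb() < needed_mb:
                return 409, "not enough free VRAM after evict"
            load()
            return 200, "loaded"
        finally:
            self._lock.release()
```

Because the three steps run under one lock, a second caller can never start loading between another caller's evict and load.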
🎙️ Voice design & casting
- A/B compare modal fixed — portaled out of the clip-path drawer so it no longer renders clipped; Play-current resolves the Qwen voice and shows the descriptor on Side A; play errors surface (#832, #834).
- Age made audible — Qwen voice-design personas describe age acoustically, not just as a label (#831).
🎧 Listening & companion
- Offline waveform — downloaded chapters persist their peaks, so the phone scrubber stays drawn with no signal.
🏗️ Under the hood
- Kokoro uses the NVIDIA GPU — forces onnxruntime-gpu via an ORT swap (not kokoro-onnx[gpu]); a failed swap is fatal, so it can't silently run on the CPU (#828).
- Analyzer tolerates stray model keys instead of failing the run (#839).
- Test resilience — test:server auto-retries once on a vitest fork-pool worker-crash (#850); the analysis-pipelining rolling-roster CPU-contention timeout flake is quarantined in CI (#875).
- Release hygiene — .gitignore now covers the renamed castwright-workspace/ (#867); a CodeQL workflow and a pre-push commit-subject guard land (#858).
- Docs & Help — CODE_OF_CONDUCT, a repo-opening-public checklist, repo legal pointers, plan-221/222 reconciliations, and new offline Help topics for analysis model-reload / "GPU busy" and an engine that reads "Needs repair".
Full changelog: v1.7.0...v1.8.0
Castwright v1.7.0
The Castwright release. The project grew up this cycle: a new name and identity,
a companion app for your phone and car, guided first-run setup, a one-click sample
book, and first-class support for Apple Silicon Macs — on top of a deep round of
generation-quality and reliability work.
⚠️ Upgrade note: 1.6.0 installs cannot self-upgrade across the rename.
Alpha installs reinstall fresh as Castwright. Your library/data directories move
to the new Castwright paths on first run.
✨ Headline features
📱 A mobile companion app (new)
Castwright now has a native Flutter companion for Android (and iOS), paired to your
desktop server over your home network — no cloud, no account.
- One-scan pairing — pair by scanning a compact QR code; the channel is
certificate-pinned and protected by a short-lived pairing code, with per-device
tokens you can revoke (#562, #565–#567, #679, #696, #591).
- Take your library offline — delta sync downloads books for offline listening,
with range-resume, atomic swaps, accounting and automatic eviction (#572, #573,
#606, #614, #616).
- Native player — lock-screen / media-key control, per-book resume, two-way
resume sync between phone and desktop, per-chapter waveforms (#575, #576, #604, #610).
- Listen in the car — Android Auto / CarPlay in-car browser with a downloaded-only
2-tab "This Book / Library" layout and current-chapter highlighting (#588, app-9).
- Browse, search & continue — author → series → book hierarchy, a home shelf with
"Continue listening", and multi-book switching (#577, #582, #615, #618).
- Stream over LAN for instant play without a full download (#589).
- Distribution — signed release APK with an alpha channel, plus a Google Play
AAB lane (#586, #777); the APK is also offered as a download from the desktop app (#661).
🚀 First-run setup & onboarding (new)
Getting from a fresh clone to a working install is now guided end-to-end.
- First-run setup wizard (fs-21) — a five-step wizard that checks everything
Castwright needs, installs the default Kokoro voice engine in-app, bootstraps the
Python sidecar venv, and runs a two-tier smoke test — with plain-language remediation
when something's missing (#744, #748, #749, #750, #751).
- In-app guided tour (fe-38) — a spotlight walkthrough of the core flow, launchable
any time from the top-bar ? menu (#765, #772).
- In-app Help / troubleshooting view (fe-29) with a unified, friendly analysis-failure
taxonomy so errors tell you what to do next (#740, #741).
📖 Try it in one click (new)
A bundled original sample — The Coalfall Commission, a 2-chapter / 14-character
showcase — ships with every voice pre-designed, so you can generate and hear a full-cast
performance before importing anything of your own (#727, #728). You can also replace the
manuscript on an existing book while preserving its designed cast (#724).
🍎 Runs on a Mac
First-class Apple Silicon support — the sidecar auto-detects Metal (mps), with graceful
CUDA → mps → CPU fallback and cross-platform launch scripts. Intel Macs work too (#702, #703).
🎙️ Voice design & casting
- Design full cast — one click designs a bespoke Qwen voice for every "Needs voice"
character as a background, reload-resilient job (#637).
- Single voice design is now background-survivable with honest live progress (#639).
- Per-book bulk emotion-variant design with a per-character cast-table glyph strip,
and emotion variants that travel across linked books in a series (#687).
- Has / Needs emotion-variant filters and per-card "Needs variant" badges (#642, #643).
- Per-quote emotion loop completed — detect-emotions + UX (fs-33/fs-34) (#596).
- Cross-book voice reuse hardened — match by stable voice id when names drift, scope
generic role-names (e.g. "Narrator") to the same series, and stop cross-series mismatches
on the confirm screen (#634, #681, #689, #693, #694).
🔊 Generation quality & reliability
- Per-sentence QA gate — every line is acoustically checked and (with srv-31)
transcript-verified via Whisper ASR before a chapter is assembled, with automatic
re-record of the broken ones (#513, #526, #531, #646).
- Generation stall protection — long runs ride out sidecar recycles and recover on
their own (defense-in-depth, three waves) (#673, #677).
- Voice-design VRAM contention robustness — engine mutual-exclusion, liveness timeouts,
honest progress (#685).
- Attribution coverage guards for large chapters — re-split under-budget stage-2
chapters, recover dropped speakers, preserve tagged speakers (#516, #520, #532, #609, #678).
- Golden-audio regression harness (ops-11) — opt-in acoustic regression gate (#527).
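The per-sentence QA gate in this section follows a check-then-re-record pattern, which might look something like the sketch below. The function name, the `max_attempts` limit, and the injected `synth` / `passes_qa` callables are hypothetical. In the real pipeline the check is described as an acoustic test plus Whisper ASR transcript verification, both of which are stood in for here by a single callable.

```python
from typing import Callable, List


def render_with_qa(sentences: List[str],
                   synth: Callable[[str], bytes],
                   passes_qa: Callable[[str, bytes], bool],
                   max_attempts: int = 3) -> List[bytes]:
    """Render each sentence, re-recording any clip that fails the QA check."""
    clips = []
    for text in sentences:
        for _ in range(max_attempts):
            clip = synth(text)
            if passes_qa(text, clip):
                break
        else:
            # Every attempt failed; surface it instead of assembling bad audio.
            raise RuntimeError(f"QA failed after {max_attempts} attempts: {text!r}")
        clips.append(clip)
    return clips
```

Only clips that passed the check reach the returned list, so chapter assembly never sees a known-bad sentence.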
🎧 Listening experience (web)
- Continue-listening rail + reading-stats dashboard (fs-15 / fs-16) — your shelf
remembers what you were mid-way through; #/stats shows streaks and hours, fed by a
wall-clock accumulator and offline buffering (#783, #792).
- Listen download section finalized with truthful, store-level export progress (#675).
- Real per-chapter waveform bars in the Listen view (fe-33) (#585).
⚙️ Models, settings & covers
- In-app Model Manager (fs-23) — load/unload engines and per-model Ollama residency
from the app (#581, #766).
- In-app Advanced Settings — ~70 model/generation/QA knobs with an env-precedence
resolver and drift-guarded .env.example (#669).
- Multi-source cover search — OpenLibrary + Apple + Google, with free-text matching
and per-source badges (#697).
- Device ground-truth on /health plus a diagnostics panel (side-14) (#718).
🔌 Sync & server infrastructure
- LAN security — opt-in shared-secret token guard and per-device tokens with revoke
(srv-20 / srv-33) (#561, #564, #591).
- Sync primitives — stable per-chapter UUIDs, a delta-friendly per-chapter sync
manifest with durations, and guarded listen-progress writes (srv-32/34/35) (#558, #569, #570, #601).
- Merge journal for deterministic alias un-linking (srv-1) (#793).
🏗️ Under the hood
- Castwright rebrand end-to-end — package names, release artifact
(castwright-vX.Y.Z.zip), data dirs, startup banner, in-app /about page, branded
narrator-credit default, and self-hosted General Sans + Lora fonts (no runtime
font CDN) (#623, #629, #653, #657, #660, #698, #713).
- Public-readiness docs + licensing (FSL-1.1-ALv2) (#663, #664).
- Dependency majors round 3 (#712).
Full changelog: v1.6.0...v1.7.0
Castwright 1.6.0
Seed release for in-app self-upgrade. From here, future versions install
themselves from a hand-delivered bundle (Account → Application updates), on top
of a round of reliability, observability and listening polish.
⚠️ Upgrade note: the jump into 1.6.0 is manual; 1.6.0 → 1.7.0 is the
first self-upgrade.
✨ Headline features
🚀 Update from inside the app (new)
One-click cross-version upgrades for hand-delivered alpha bundles (fs-1).
- Versioned-directory install layout — releases/vX.Y.Z/ + a stable
launch.mjs + shared workspace/ / venv/ / models/ siblings, so the
running release is never touched and rollback is just not flipping the pointer.
- Safe migrations — a boot coordinator backs up every workspace JSON before
migrating; a top-bar version pill + what's-new banner surface the running version.
🌍 Multi-language — Russian (new)
A book-level BCP-47 language field, end to end (fs-2, language half).
- Cyrillic auto-detection + a confirm-step selector; designed Qwen voices
speak the book's language; Cyrillic-aware analyzer token estimates and
per-language attribution; a Listen language badge.
- Never-cross-language invariant force-routes non-English books to Qwen
(Kokoro is English-only) and blocks any silent cross-language fallback. English
books are byte-identical to before.
🩺 Admin watch console (new)
The former dev-only Worktrees view is now an all-users Admin console at
#/admin (fs-18).
- Health board — green / amber / red on GPU / VRAM, TTS sidecar + resident
models, analyzer connectivity, ffmpeg, free disk, from a new GET /api/diagnostics,
plus generation throughput. The top-bar pill carries a health status dot.
🎧 Listening experience
- Auto-advance / continuous playback — the mini-player advances to the next
chapter and keeps playing, behind a default-on toggle (Account → Advanced), so
a book plays hands-free end to end (fe-23).
- Skip forward / back — intra-chapter ±15s / ±30s seek in the mini-player
with rebindable shortcuts (defaults J / L) and configurable deltas (fe-24).
🔊 Generation quality & reliability
- Post-synthesis audio QA — each finished chapter gets a cheap automated
check (near-silent / clipped / truncated / runaway duration) and an advisory
"Suspect" badge with the reason in Generate + Listen, so garbled or empty
renders are flagged before the listener hits them (srv-27).
- Pre-flight disk-space guard — free space is checked against an estimate
before a run or export and a warning is surfaced when it's tight (configurable
to block) — no more failing 40 chapters into a run (srv-28).
- Silent generation stalls could leave a chapter showing a misleading
"Queued" forever. Now a per-chapter no-progress watchdog records a real
failure, leak-saturated orphan sidecars are no longer adopted, and the
supervisor health-polls adopted sidecars.
- A Qwen reload doubled VRAM into the Windows sysmem-spill stall. Now the
reload no longer doubles VRAM and the recycle watchdog also keys on reserved VRAM.
- "Design & compare" broke on a missing voice or a design-model race. Now it
no longer breaks, and the persona prompt aligns with the official VoiceDesign format.
- Ungenerated chapters masqueraded as playable in the Listen view. Now Play /
Share are gated on generated audio.
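The post-synthesis audio QA listed in this section names four failure classes: near-silent, clipped, truncated, and runaway duration. A cheap check of that kind could be sketched as below. Every threshold and the function signature are illustrative assumptions, not the shipped values, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
from typing import Optional, Sequence


def audio_qa(samples: Sequence[float], sample_rate: int,
             expected_sec: float) -> Optional[str]:
    """Return a reason string for a suspect chapter, or None if it looks fine.

    All thresholds are illustrative assumptions, not the shipped values.
    """
    if not samples:
        return "truncated"
    peak = max(abs(s) for s in samples)
    if peak < 0.01:
        return "near-silent"
    clipped = sum(1 for s in samples if abs(s) >= 0.999)
    if clipped / len(samples) > 0.01:
        return "clipped"
    duration = len(samples) / sample_rate
    if duration < 0.5 * expected_sec:
        return "truncated"
    if duration > 2.0 * expected_sec:
        return "runaway duration"
    return None
```

Returning the reason rather than a boolean matches the advisory badge described above, which shows why a chapter was flagged.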
⚙️ Settings, observability & models
- Plain-language failure messages — recurring failures (sidecar down, VRAM
spill, rate-limit, OOM, disk-full, model-not-loaded, timeouts) now show a
human-readable message plus a "what to do next" line instead of a raw error
string (fs-19).
- Per-run resource telemetry — per-chapter RTF, VRAM, host RAM and wall-time
are logged and charted in a new Admin "Resource trends" panel for
perf-regression visibility (fs-20).
- Auto-backup of state.json — scheduled per-book snapshots on a
configurable cadence with retention + one-click restore from Account (srv-2).
- Power-user tuning — rebindable shortcuts, accessibility toggles
(high-contrast + larger text) and an autosave-debounce knob, device-local (fe-2).
- In-app Coqui XTTS v2 installer, plus an A/B current-vs-proposed voice
audition in the Qwen voice-design flow.
🏗️ Under the hood
- Dependency major upgrades — React 18→19, Vite 5→8 (Rolldown), Vitest 2→4,
react-router 6→7, TypeScript 5→6 (plan 167); Zod 3→4, Express 4→5, pdfjs-dist
4→5, Tailwind 3→4 (plan 170); GitHub Actions runners off the deprecated Node-20 majors.
- CI cost — path-filtered per-PR verify, draft-by-default + integration
batching, a doc-only CI fast-path, and a local pre-push guard that refuses
force-push / deletion of main.
- Release pipeline — three-way version lockstep (root + server + sidecar
version.py), a cross-OS gate fired before tagging, and RELEASE_NOTES.md
baked into the zip.
Full changelog: v1.5.1...v1.6.0
Castwright 1.5.1
A stability + hardening release. The bulk of this release hardens the
Qwen3-TTS default from v1.5.0: it drives long-run sidecar memory pressure down
to a survivable, self-recovering state, makes a Qwen→Kokoro fallback loud
instead of silent, finishes the reused-voice / persona consistency work, and
closes a default-mode LAN exposure. No data migration required.
✨ Headline features
🔌 Default-bind to loopback (new)
A security fix that closes the 2026-05-31 review's top findings.
- In default mode the server now binds 127.0.0.1 only, so the unauthenticated
API and the /workspace static mount (manuscripts, audio, state.json /
cast.json) are no longer reachable from other machines on a shared Wi-Fi.
The opt-in npm run start:lan mobile flow is unchanged; BIND_HOST=0.0.0.0
restores all-interface HTTP.
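The bind policy above amounts to "loopback unless told otherwise". The sketch below illustrates that policy in Python; the real server is a Node process, so the function name and the handling of blank values are assumptions. Only BIND_HOST, 127.0.0.1 and 0.0.0.0 are taken from the notes.

```python
from typing import Mapping


def resolve_bind_host(env: Mapping[str, str]) -> str:
    """Default to loopback; BIND_HOST opts in to another interface."""
    # A missing or blank value is treated as "not set" (an assumption).
    return env.get("BIND_HOST", "").strip() or "127.0.0.1"
```

For example, `resolve_bind_host({})` yields the loopback address, while `resolve_bind_host({"BIND_HOST": "0.0.0.0"})` restores the all-interface bind.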
▶️ Resume generation (new)
- A one-click way to continue a book whose run was interrupted (queued chapters
left over, nothing in flight). Opening a book still never auto-starts
generation — this is the explicit recovery affordance.
🔊 Generation reliability
- Long Qwen runs climbed host RAM until the server was OOM-killed mid-book.
Now bounded by host-RAM reclaim on model unload, an RSS / committed watchdog, a
/debug/memory readout, and a process-recycle keyed on committed-private memory.
- A crashed sidecar used to drop the in-flight book. Now it respawns on
unexpected exit, the readiness gate polls through a respawn instead of failing
fast, and the in-flight + queued chapters started during a recycle drain are
recovered. The server is authoritative for queue completion.
- Recycles interrupted a chapter mid-render. Now crossing the memory
soft-threshold drains and recycles cleanly between chapters, so a long book
rides out the pressure without a dropped chapter.
- A Qwen book could silently downgrade to Kokoro. Now a /health handshake
plus a loud per-chapter fallback gate (with a resident-model pill showing every
loaded model) means a fallback is always visible, never silent.
- A local Qwen timeout was misreported as "Gemini rate-limited" and halted
the whole book. Now a stalled chapter waits on the readiness gate and
re-renders, then skips non-fatally if it can't recover.
- Non-narration chapters no longer queue or hang the parallel synth tail.
- CUDA-fragmentation OOM fixed via expandable_segments.
🎙️ Voice & cast
- Reused characters lost their designed voice / persona. Now reused
characters keep their bespoke voice and persona — the voice / voiceStyle are
denormalised at the link and auto-match write sites, and the designed persona
is shown for reused characters.
🎧 Listening & recovery
- A failed chapter forgot its state on reload. Now it shows "Failed · reason"
with Retry after a reload, plus a per-row "Generate this chapter" escape hatch
and a "Generated " line on done rows.
- Per-book state.json auto-backup — a scheduled background sweep snapshots
each book's state.json (daily / weekly, newest-N retained) with a manual
restore picker in Account.
- Crash diagnostics — FATAL crashes and unhandled rejections are captured; a
startup port collision now prints an actionable message instead of a cryptic stack.
- Assorted dark-mode contrast and cast-row layout fixes.
🏗️ Under the hood
- Qwen performance — token-budget batch packing is now the default (cap 32 /
budget 3600) plus TF32 + high fp32-matmul precision; an overnight full-book run
held aggregate RTF ≈ 1.04 (~realtime).
- Pre-commit scope filter + GPU-contention throttle — a staged-diff scope
filter skips out-of-scope test legs, and a soft nvidia-smi probe lowers test
concurrency when a run is hammering the box.
- Test reliability — broke a tts/index ↔ provider import cycle that
intermittently failed the cross-OS gate; pinned with a re-export identity guard.
Per-chapter RTF history table in the developer Worktrees view.
- Archived 56 shipped feature plans; filed a security review and its follow-up backlog.
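The token-budget batch packing mentioned in this section (cap 32 / budget 3600) can be read as a greedy packer that closes a batch when either limit would be exceeded. The sketch below assumes the budget is a plain sum of per-item token counts and that items are packed in order. Neither detail is confirmed by the notes, and the function name is hypothetical.

```python
from typing import List


def pack_batches(token_counts: List[int], cap: int = 32,
                 budget: int = 3600) -> List[List[int]]:
    """Greedily pack item indices into batches under a size cap and token budget."""
    batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, tokens in enumerate(token_counts):
        # Start a new batch when adding this item would break either limit.
        if current and (len(current) >= cap or used + tokens > budget):
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += tokens
    if current:
        batches.append(current)
    return batches
```

An item larger than the whole budget still gets a batch of its own, so nothing is dropped.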
Full changelog: v1.5.0...v1.5.1
Castwright 1.5.0
The largest TTS round since the engine first shipped. v1.5.0 adds Qwen3-TTS
— a bespoke per-character voice engine that designs a unique voice from each
cast member's persona instead of picking from a preset catalogue, caches the
embedding, and reuses it across the book and series for vocal consistency.
Generation moves onto a single persisted cross-book queue, cast management gains
"rebaseline the series" + cross-book duplicate review, and the whole synthesis
path reports live real-time-factor (RTF) telemetry.
⚠️ Upgrade note: the GEN_CHAPTER_CONCURRENCY env var was retired and
renamed to GEN_WORKERS (default 2) — rename it in your server/.env. No
BookStateJson schema change; .queue.json is created on first enqueue;
legacy single-field overrideTtsVoice rows migrate lazily on read. See
INSTALL.md "v1.4.0 → v1.5.0 notes".
✨ Headline features
🎙️ Qwen3-TTS bespoke voices (new)
A new local engine that designs a unique voice per character from the cast
persona rather than picking from a preset catalogue (plan 108).
- Design → clone → cache → reuse — the VoiceDesign 1.7B model synthesises a
calibration reference from the persona, the Base 0.6B model auditions and
renders, and the designed embedding is cached and reused for every line that
character speaks across the book and series. A Gemini-backed persona generator
fills Character.voiceStyle to seed the design.
- Default when installed — Qwen becomes the default engine for new books once
installed (resolved live; an explicit Account pick is honoured forever). Install
in one click from Account → Models; ~5 GB of weights download on demand and
aren't bundled in the zip.
- Per-character engine mixing — each cast member carries a per-engine
overrideTtsVoices: { coqui?, kokoro?, gemini?, qwen? } map, so one book can
mix a designed Qwen principal against a Kokoro narrator.
- Graceful Kokoro fallback — a Qwen render with no designed voice (or when
the engine isn't installed / loaded) renders in Kokoro instead of failing,
shown as "Fallback (Kokoro)" (plan 130).
- Under the hood — batched forward passes with length-bucketing
(QWEN_BATCH_SIZE, default 8), an SDPA attention path + prompt cache, an
optional FlashAttention-2 wheel for Windows, and a VoiceDesign model that frees
~4–5 GB on idle (QWEN_DESIGN_IDLE_TTL) (plans 112, 113, 115, 117, 128).
🗂️ Persistent cross-book generation queue (new)
Generation is now driven by a single durable queue that survives restarts
(plans 102, 111).
- <workspace>/.queue.json is the sole source of truth — crash-orphaned entries
reset to queued on boot, failed chapters persist as "Failed" with one-click Retry.
- A bounded worker pool runs N chapters concurrently (GEN_WORKERS, default 2).
- A global queue modal + top-bar chip show every book's queued / in-progress /
done / failed rows, with drag-to-reorder, Clear-queue, and a force-remove escape
hatch for stuck in_progress entries.
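The boot-time recovery described for the queue file could look roughly like this. The entry shape (a JSON list of objects with a `status` field) and the write-then-rename step are assumptions for illustration. Only the file's role, the `in_progress` status, and the reset-to-queued rule come from the notes.

```python
import json
from pathlib import Path
from typing import Dict, List


def recover_queue(queue_path: Path) -> List[Dict]:
    """Load the queue and reset crash-orphaned in_progress entries to queued."""
    if not queue_path.exists():
        return []
    entries = json.loads(queue_path.read_text(encoding="utf-8"))
    for entry in entries:
        if entry.get("status") == "in_progress":
            entry["status"] = "queued"
    # Write to a temp file, then rename, so a crash never leaves a half-written queue.
    tmp = queue_path.with_suffix(".tmp")
    tmp.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    tmp.replace(queue_path)
    return entries
```

Failed entries are left untouched, which is what lets them persist as "Failed" until the user retries them.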
📊 Live RTF telemetry (new)
- The synthesis path reports live per-batch real-time-factor up the stack
(sidecar → server → frontend) so a deployer can watch how fast their GPU is
rendering — surfaced in the generation UI and structured logs (plan 127).
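For readers unfamiliar with the metric, one common convention defines real-time factor as synthesis time divided by the duration of audio produced, so values near 1.0 mean roughly real-time rendering. The notes do not state which convention Castwright uses, so the definition and function names below are assumptions.

```python
from typing import Iterable, Tuple


def real_time_factor(synth_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of audio produced (assumed convention)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synth_seconds / audio_seconds


def aggregate_rtf(batches: Iterable[Tuple[float, float]]) -> float:
    """Aggregate (synth_seconds, audio_seconds) pairs by summing both sides."""
    pairs = list(batches)
    total_synth = sum(s for s, _ in pairs)
    total_audio = sum(a for _, a in pairs)
    return real_time_factor(total_synth, total_audio)
```

Summing both sides before dividing weights each batch by its audio length, which avoids short batches skewing an average of per-batch ratios.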
🎙️ Voice & cast management
- Rebaseline the series — a modal designs bespoke Qwen voices for the
principal cast with a current-vs-proposed audition before regenerating,
collapsing recurring members by name / alias (plans 95, 96, 99, 101).
- Cross-book duplicate review — hydrates both casts and lets you merge a
duplicated character / link a shared voice from the Voices pill, with the link +
aliases carried on the Voice payload so the warning stays gone on reload.
- Rename + alias promotion — cast members can be renamed and an alias promoted
to the primary name (logged as a name_change event); aliases are editable with
a reattribute-lines modal.
- Voice status leads — a new "Sampled" tier joins Matched / Generated; the
cast table sorts by line count; toggle chips filter by voice-matching status
(multi-select OR, live counts), so undesigned voices stop getting lost.
- One-chapter A/B preview — audition a profile-change regen on a single
chapter before committing the full regenerate.
🔊 Generation performance
- Within-chapter sentence parallelism overlaps synthesis across a chapter's
sentences (plan 107); a GPU-arbitration semaphore (GPU_CONCURRENCY, with
optional VRAM-weighted budgeting via GPU_VRAM_BUDGET) keeps parallel sessions
and the analyzer from double-booking an 8 GB card (plans 100, 108).
- Per-character progress is now monotonic and exact under parallel synth — no
counter that jumps backwards or double-counts when several chapters are in flight.
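VRAM-weighted budgeting, as mentioned for the GPU-arbitration semaphore, can be pictured as a semaphore whose permits are megabytes rather than slots. The class below is a hypothetical sketch under that reading. The name, the blocking behaviour, and the per-job cost argument are assumptions, not the project's implementation.

```python
import threading
from contextlib import contextmanager


class VramBudget:
    """Hypothetical weighted semaphore: jobs reserve MB against a fixed budget."""

    def __init__(self, budget_mb: int):
        self._budget_mb = budget_mb
        self._used_mb = 0
        self._cond = threading.Condition()

    @contextmanager
    def reserve(self, cost_mb: int):
        if cost_mb > self._budget_mb:
            raise ValueError("job can never fit in the budget")
        with self._cond:
            # Block until enough of the budget is free for this job.
            self._cond.wait_for(lambda: self._used_mb + cost_mb <= self._budget_mb)
            self._used_mb += cost_mb
        try:
            yield
        finally:
            with self._cond:
                self._used_mb -= cost_mb
                self._cond.notify_all()
```

With an 8000 MB budget, a 5000 MB job and a 3000 MB job can run together, while a third job waits until one of them releases its share.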
🩺 Analyzer & ingest
- Multi-model analysing view — a per-phase model chip shows the effective
Phase 0 / Phase 1 model with an inline swap control, plus a sticky status bar
(plan 94). Phase 0a keeps narrator-only named characters so they aren't dropped.
- Low-confidence triage gains sticky next / previous navigation with a
per-chapter badge; the Voice Drift Detector can be scoped to a single character.
- EPUB / MOBI robustness — namespace-prefixed OPF EPUBs recover through a
raw-zip fallback that also reads NCX titles; DRM-protected MOBI uploads return a
clean 415 with an actionable message.
🎧 Listening fixes
- MP3 chapters showed the wrong / zero duration and couldn't be scrubbed —
libmp3lame wasn't emitting the Xing VBR header. Now chapters are seekable MP3
with the Xing header, the mini-player trusts the server durationSec, and
scripts/rexing-existing.mjs re-headers old files (plan 109).
- A fully-rendered chapter showed 00:00 in the chapter list until reload —
the chapter_complete duration was dropped by the parallel-chapter coalesce.
Now chapter_complete carries durationSec directly, with a library-scan backfill.
- The Loudness Report Card flagged nearly every chapter as ~6 LU off-target
even though the audio was correctly normalised — the .lufs.json stored the
pre-normalisation measurement. Now the post-pass measurement is persisted and
scripts/relufs-existing.mjs refreshes old chapters (plan 77).
- The Listen row time didn't track the player and a stale "Resume" pill stayed
visible during playback. Now the row time live-syncs and the resume pill hides.
- Per-phase analyzer model picks had no effect — the selection was unreachable
and the chips showed a fabricated name. Now the picks drive each phase and the
chips show the truly-effective model (plan 88).
- The "carried in from prior books" pill over-counted (a 4th-book series
claimed 136 carried-in characters) and bloated the Phase 0a prompt ~4×. Now the
roster dedupes by name / alias.
- "Apply all N matches" auto-ticked profile-sync for every match. Now it only
auto-ticks sync for lower-confidence rows (< 0.9).
- The low-confidence reassign picker was clipped / closed mid-gesture / invisible
in dark mode. Now it portals to the body with viewport-aware flipping,
dismisses on outside-click / Esc, and has a contrasted dark surface (plan 90).
- The Voice Drift Detector showed a raw workspace slug and listed the same
chapter multiple times. Now it shows the clean book title and rolls events up
per chapter with an accurate flagged count (plan 91).
- Navigating between views left the new view mid-scroll. Now the window scrolls
to the top on every hash-route change.
🏗️ Under the hood
- CI cost reduction — doc-only skip + path-filtered per-PR verify +
cross-OS consolidation into cross-os.yml (plans 101, 103); draft-PRs-by-default,
--changed-scoped vitest legs, per-job timeouts, integration-branch batching (plan 118).
- Cross-OS gate on the release cut — bump-version.mjs fires cross-os.yml
on origin/main and BLOCKS on green before the tag is created; --skip-cross-os
is the escape hatch (plan 127).
- Test-harness tiers — new test:server-slow (timeout-prone files pinned to
one fork) and test:e2e:visual (chromium, --workers=1) so font-hinting drift
can't race the parallel battery; 42 committed Linux baseline PNGs.
- Dependency / security bumps — multer 1→2 (upload-route CVE), ESLint 8→9
flat-config, jsdom / archiver.
- Build-version footer — every view stamps the running build (e.g.
v1.5.0 (a1b2c3d)) so a deployer can confirm an upgraded bundle extracted (plan 124).
- Per-user settings moved to ~/.audiobook-generator/user-settings.json so
multiple checkouts share one config. New repair scripts under scripts/
(rexing-existing, relufs-existing, recover-missing-character, …).
Full changelog: v1.4.0...v1.5.0
Castwright 1.4.0
The largest round since the v1.3.1 listening overhaul. The app now drives on
phone + tablet over LAN HTTPS, chapter audio is loudness-normalised against EBU
R128 with a per-chapter drift report card, AAC/M4A and Opus join MP3 as
first-class codecs, the analyzer pipelines a two-model split that nearly doubles
quota, and generation parallelises across chapters.
✨ Headline features
📱 Mobile + tablet over LAN HTTPS (new)
A six-wave round bringing the app to phone + tablet (plan 81).
- One-command LAN bootstrap — mkcert drops a local root CA on every dev box;
npm run install:cert-mobile prints LAN URL + QR + per-OS steps; dev:lan /
start:lan serve HMR Vite + Node at https://0.0.0.0:5173 / :8443.
- Three viewport tiers — <640px phone (single-column, drawers + bottom
sheets, full-screen modals), 640–1024px tablet (two-column, dialog modals,
right drawer), ≥1024px desktop (three-pane). Every view re-laid out.
- Touch-equivalence rule — every desktop drag / hover affordance ships a tap
replacement (tap-to-assign voice pills, PointerEvent manuscript boundaries
covering mouse + touch + pen); hover labels stay faintly visible via a new
coarse-pointer: Tailwind variant; controls ≥ 44×44 px per WCAG 2.5.5.
🔊 EBU R128 loudness + new codecs (new)
- Two-pass loudnorm targeting -16 LUFS / 11 LU / -1.5 dBTP on every
newly-rendered chapter (AUDIO_LOUDNORM_ENABLED), surfaced as a colour-coded
per-row drift pill + an expandable report card with sparkline (plans 71, 77).
- AAC/M4A + Opus join MP3 as first-class chapter codecs via
BookStateJson.audioFormat, with matching export shapes (plan 72).
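A two-pass run of ffmpeg's loudnorm filter generally works by measuring first and then feeding those measurements back in. The sketch below only builds the two filter strings for the targets quoted above. How Castwright actually invokes ffmpeg is not stated in the notes, and the function names are hypothetical. The `measured` keys follow the JSON that loudnorm prints when `print_format=json` is set.

```python
from typing import Mapping

TARGET = {"I": -16.0, "LRA": 11.0, "TP": -1.5}  # targets quoted in the notes


def first_pass_filter() -> str:
    """Measurement pass: ffmpeg prints the measured loudness as JSON."""
    return "loudnorm=I={I}:LRA={LRA}:TP={TP}:print_format=json".format(**TARGET)


def second_pass_filter(measured: Mapping[str, str]) -> str:
    """Normalisation pass, fed with the first pass's measurements."""
    base = "loudnorm=I={I}:LRA={LRA}:TP={TP}".format(**TARGET)
    measured_part = (
        ":measured_I={input_i}:measured_LRA={input_lra}"
        ":measured_TP={input_tp}:measured_thresh={input_thresh}"
        ":offset={target_offset}:linear=true"
    ).format(**measured)
    return base + measured_part
```

Each string would be passed to ffmpeg's `-af` option, with the first pass writing to a null output so that only the measurement is produced.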
🩺 Pipelined two-model analyzer (new)
- Phase 0 (cast detection, gemma-4-31b-it) and Phase 1 (attribution,
gemini-3.1-flash-lite) run in parallel with a configurable min-lag; the two
phases hit independent rate-limit buckets, so effective quota nearly doubles.
Legacy single-model path preserved verbatim when unset (plan 88).
📖 Library & manuscript
- Library search + table view — debounced title / author search, a tag-chip
filter row (tags persist on BookStateJson.tags), and a card↔table view toggle
with a series-grouped dense table (plans 73, 76).
- Portable book bundle — GET /export/portable streams a single .zip
(state + manuscript + audio + cover + change-log + MANIFEST); POST /import/portable
accepts it with rename / replace / skip conflict modes (plan 75).
- Manuscript re-upload diff — a side-by-side sentence-level diff gates
re-upload before any state mutation, warning when chapter title overrides won't
match the new content (plans 74, 84).
- Per-chapter rename — a pencil affordance on every chapter row, persisted with
a sticky titleOverridden flag that survives heuristic refresh-titles passes (plan 78).
- Low-confidence triage polish — J / K jump to next / previous misattribution
and auto-open the inspector; a typeahead picker materialises a missing
series-mate from the prior roster via POST /cast/add-from-roster (plan 90).
🩺 Cast drift & multi-book
- Drift modal collapses ~300 events into ~6–18 cards — one card per
(book × character × snapshot) instead of one per event (~7,200 DOM nodes → ~200), with
bulk Regen-all / Dismiss-all / Auto-regen-all (plan 91).
- Background drift polling — a bulk GET /api/revisions?bookIds=... + a
two-tier poller (30s active, 120s background) surfaces Book B's drift in Book A's
modal within ~2 min, active-book latency unchanged (plan 83).
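Collapsing hundreds of drift events into a handful of cards is essentially a group-by on the (book × character × snapshot) key. A minimal sketch follows. The event field names `bookId`, `characterId` and `snapshotId` are hypothetical, chosen only to illustrate the grouping.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]


def group_drift_events(events: List[dict]) -> Dict[Key, List[dict]]:
    """Collapse raw drift events into one card per (book, character, snapshot)."""
    cards: Dict[Key, List[dict]] = defaultdict(list)
    for event in events:
        key = (event["bookId"], event["characterId"], event["snapshotId"])
        cards[key].append(event)
    return dict(cards)
```

Each card keeps its underlying events, so a bulk action on a card can still fan out to every event it represents.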
🔊 Generation
- Bounded worker pool over chapters via GEN_CHAPTER_CONCURRENCY (default 2),
with per-chapter SSE tracks kept isolated (plan 87).
- Export queue Retry + Download wired — failed rows re-fire the original POST;
done rows without a signed URL stream directly (plan 82).
- Kokoro stop pill in the top bar frees ~1 GB VRAM for an XTTS warm or a
heavier analyzer model without restarting the sidecar.
🎧 Generation fixes
- Edited speaker reassignments / split sentences were silently dropped on
regenerate because the analyser cache wasn't rebuilt after a manuscript edit. Now
regenerate applies the manuscript-edits overlay before synth (plan 80).
- Chapters had no audible boundary and titles weren't spoken: players
cross-faded straight from one into the next. Now each chapter opens with its
title voiced in the narrator voice + a baked-in inter-chapter silence (plan 101).
- Force-regen on a range re-ran the catch-up replay against the just-chosen
chapters, racing the new run. Now the replay skips in-scope chapters so the
user-selected scope wins.
- The generation pill froze at its last snapshot and never drained after
completion. Now it drains to zero across excluded + idle gaps.
- Two tabs on one workspace fanned out idle cross-tab updates. Now the
broadcast layer diffs snapshots before posting and debounces phase progress.
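The "diff snapshots before posting" step in the last fix can be illustrated with a shallow comparison that suppresses unchanged posts. A minimal sketch: `shallowEqual` and `makeDiffingPoster` are hypothetical names, the snapshot is assumed to be a flat object, and the debounce half of the fix is left out.

```typescript
type Snapshot = Record<string, unknown>;

// True when both snapshots have the same keys with identical (===) values.
function shallowEqual(a: Snapshot, b: Snapshot): boolean {
  const aKeys = Object.keys(a);
  if (aKeys.length !== Object.keys(b).length) return false;
  return aKeys.every(
    (k) => Object.prototype.hasOwnProperty.call(b, k) && a[k] === b[k]
  );
}

// Wrap a `post` function so an unchanged snapshot is never sent twice in a row.
function makeDiffingPoster(post: (s: Snapshot) => void) {
  let last: Snapshot | undefined;
  return (next: Snapshot): boolean => {
    if (last !== undefined && shallowEqual(last, next)) return false;
    last = next;
    post(next);
    return true; // reports whether a message actually went out
  };
}
```

An idle tab that keeps producing identical snapshots then generates no cross-tab traffic at all.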
🏗️ Under the hood
- Frontend perf pass: broadcast-middleware shallow-diffs activeStream,
shallow-equality selector wraps, and route-level React.lazy drop the main
bundle 410 → 345 kB (gzip 108 → 91 kB) (plan 89); manuscript / confirm-cast /
listen lists virtualise via @tanstack/react-virtual above their cutoffs (plans 92, 93).
- Exports moved out of the hidden jail to <bookDir>/exports/<slug>.<ext>,
with a sync-folder "Test" probe and widened rename retries for Drive / OneDrive (plan 79).
- CI Node 20 → 22 → 24; scripts/wt-merge.mjs reconciliation helper (plan 85);
dev-only worktree dashboard at #/worktrees (plan 86); ~14 new e2e specs +
the responsive harness; LAN-cert bootstrap scripts + Playwright mobile / tablet projects.
Full changelog: v1.3.1...v1.4.0
Castwright 1.3.1
The biggest single step since the analysis pipeline was rebuilt in phases.
New capabilities across listening, voice, cast and onboarding; dark mode
stabilised; the install path is now end-to-end in-app for the alpha audience;
restructure + TTS hardened for long structured manuscripts.
✨ Headline features
🎧 Listening surface overhaul (new)
- Playback speed picker (0.75×–2×) persisted per book, user-placed markers
(note / rerecord) with a sidebar, and a sleep timer with countdown presets +
end-of-chapter mode (plan 53).
- True RMS waveform peaks computed at encode time and persisted (plan 56);
per-book resume bookmarks (plan 47); collapsible editorial Notes card (plan 67).
- Share a 30-second clip of any chapter as MP3 (ffmpeg -ss / -t / -c copy,
no re-encode) (plan 69); mint a slugged streaming link for the whole book's
M4B (plan 68); M4B + MP3-ZIP download tiles (plan 57).
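The clip export names three ffmpeg options: -ss (start offset), -t (duration), and -c copy (stream copy, no re-encode). One way to assemble that argument list is sketched below; `clipArgs` is a hypothetical helper, and the option order and file names are assumptions rather than the project's actual invocation.

```typescript
// Build ffmpeg arguments for a stream-copied clip (no re-encode).
// Placing -ss and -t before -i applies them to the input file.
function clipArgs(
  input: string,
  output: string,
  startSec: number,
  durationSec: number = 30
): string[] {
  if (startSec < 0 || durationSec <= 0) {
    throw new RangeError("start must be >= 0 and duration must be > 0");
  }
  return [
    "-ss", String(startSec),
    "-t", String(durationSec),
    "-i", input,
    "-c", "copy",
    output,
  ];
}
```

Passing the result as an argument array to a process spawner, rather than building a shell string, avoids quoting problems with file names that contain spaces.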
🛠️ In-app multi-model management (new)
Install Ollama, pull a model, and pre-fetch Coqui XTTS without a terminal
(Account → Models card, plan 61).
- Per-platform install state machine (idle → detecting → downloading → installing
→ installed); model-pull consumes Ollama's NDJSON progress stream; bootstraps are
dependency-injectable so tests run offline. New install-coqui.{sh,ps1}.
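Consuming an NDJSON progress stream means buffering partial lines until a newline arrives, then parsing each complete line as JSON. A minimal sketch: `createNdjsonDecoder` and `percent` are hypothetical helpers, and the `status` / `total` / `completed` field names are assumed from Ollama's pull progress objects, so verify them against the Ollama API reference before relying on them.

```typescript
// Assumed progress record shape; treat the field names as illustrative.
interface PullProgress {
  status: string;
  total?: number;
  completed?: number;
}

// Incremental NDJSON decoder: feed it text chunks, get back complete records.
function createNdjsonDecoder<T>() {
  let buffer = "";
  return {
    push(chunk: string): T[] {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? ""; // the last piece may be a partial line
      return lines
        .map((line) => line.trim())
        .filter((line) => line.length > 0)
        .map((line) => JSON.parse(line) as T);
    },
  };
}

// Whole-number percentage, or undefined when the record carries no sizes.
function percent(p: PullProgress): number | undefined {
  if (!p.total || p.completed === undefined) return undefined;
  return Math.floor((p.completed / p.total) * 100);
}
```

Because the decoder only takes strings, a test can replay canned chunks without a running daemon, which fits the offline-test goal stated above.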
🔌 Cross-tab state sync (new)
- Open the same book in two tabs and the analysis / generation pills update in
lockstep via BroadcastChannel, with two-layer echo suppression and a narrow
broadcast scope (only activeStream slots) preserving the single-user contract (plan 63).
🎙️ Voice & cast
- Per-candidate "Play sample" inside the profile drawer: audition voices
against a user-editable sample line, no commit until Save (plan 64).
- Same-book Compare lifted to the global Voices tab; cross-book pairs remain
disabled (plan 65).
- Revision history timeline: a read-only chronological log of accept / reject
events per chapter via the A/B player modal (plan 55).
- The drift modal's Listen button opens the A/B compare player directly;
bulk-apply on confirm-cast ticks both Reuse and Sync in one click.
📖 Ingest & themes
- EPUB series auto-extracted from title parentheticals; MOBI / AZW3 upload
covered by real-binary e2e fixtures; cast-view book cards show a title + series
metadata strip below the cover.
- Dark mode reached stable after a three-pass contrast bundle: full
per-utility override coverage across the white / amber / red / rose ladders + their
alpha + hover variants, a bespoke floating-pill-inverse utility, and corrected
match-detail drawer z-index (plan 42).
🔊 Generation & restructure fixes
- A structural edit (merge / split / reorder) forced a full re-analysis:
Generate halted on "No analysed sentences cached. Re-run analysis first",
burning quota and destroying manual cast tweaks. Now the restructure path
re-derives the cache from manuscript-edits.json in place and the generation
route auto-heals an empty cache. No re-analysis required (plan 70c).
- Long single-speaker chapters froze on the "Worker has gone quiet" banner:
the TTS path folded consecutive same-speaker sentences into one giant synth call
that ran past the watchdog. Now one call per sentence, so the caption advances
continuously and same-speaker prosody drift disappears (plan 70d).
- Bracketed audio tags like [empathic] were spoken aloud literally, since no
engine in this app interprets bracket markup. Now the closed-vocabulary tag list
is stripped at the TTS boundary; arbitrary bracketed prose is preserved (plan 70d).
- Sequential merges on long manuscripts dropped orphaned sentences, surfaced
empty "0 sentences · 00:00" rows, and left stale "Chapter N" titles. Now orphans
are recovered onto the nearest preceding chapter, empty rows auto-pruned, and
generic titles re-derived (plan 70a).
- Several dark-mode surfaces were unreadable: the Halted pill / panel
(red-on-red), the Analysing connection pill + profile-drawer engine tabs
(near-white on near-white bg-white/{40,60,70,95}), the voice-drift banner, the
cast-view selection bar, and the confirm-metadata inputs. All now legible (plan 42).
- The match-detail drawer opened underneath the profile drawer. Now it stacks
correctly in both themes.
- A cover-bearing card hid its title + series (gated on the no-cover
fallback). Now an always-visible metadata strip below the artwork.
- Editorial notes were dropped on save (missing description on the
bookMeta/commitDraft rule). Now both fields round-trip (plan 67).
- The generation pill counters froze when navigating to a different book
mid-run, then flipped to "Stalled" on a healthy run. Now they update cross-book.
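The bracketed-tag fix above strips only a closed vocabulary and leaves other bracketed prose alone. A minimal sketch of that idea follows; `stripAudioTags` is a hypothetical helper, and every tag in `AUDIO_TAGS` other than "empathic" is invented for illustration.

```typescript
// Hypothetical vocabulary: only "empathic" appears in the release notes.
const AUDIO_TAGS = ["empathic", "whispers", "laughs", "sighs"];

// Remove [tag] markers whose content is in the vocabulary; keep all other brackets.
function stripAudioTags(text: string, tags: string[] = AUDIO_TAGS): string {
  const known = new Set(tags.map((t) => t.toLowerCase()));
  return text
    .replace(/\[([^\[\]]+)\]/g, (whole: string, inner: string) =>
      known.has(inner.trim().toLowerCase()) ? "" : whole
    )
    .replace(/[ \t]{2,}/g, " ") // collapse the gap a removed tag leaves behind
    .trim();
}
```

For example, `stripAudioTags("[empathic] I know. [He paused] Go on.")` returns `"I know. [He paused] Go on."`, since "He paused" is not in the vocabulary.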
🏗️ Under the hood
- GitHub Actions runs npm run verify on every PR targeting main, the same
gate the pre-push hook runs locally (plan 62).
- Parallel-session worktrees: scripts/wt-new.mjs / wt-list.mjs spawn
worktrees with non-colliding dev-server ports (plan 59).
- src/views/listen.tsx decomposed 1136 → 319 lines into three orchestrated
sub-components under src/components/listen/, zero spec files modified (plan 60).
- TTS sidecar starts with the server: per-user autoStartSidecar (default
on); npm start brings up frontend + server + sidecar in one shot (plan 43).
- Verify-cache --steps=<csv> subset selection; de-flaked e2e suites
(flakes-per-run 3–5 → 0).
Full changelog: v1.2.2...v1.3.1
Castwright 1.2.2
MOBI/AZW3 ingest, chapter restructure, listening progress, and
release-packaging end-to-end. User-visible features, fixes and infrastructure
since v1.1.0.
⚠️ Upgrade note: v1.2.0 and v1.2.1 were tagged but never published — the
release workflow surfaced cross-platform CI gaps that this release closes.
✨ Headline features
📖 MOBI / AZW3 ingest + chapter restructure (new)
- MOBI / AZW3 upload: Kindle / Calibre files drop directly into the upload
screen; DRM-protected files are rejected up-front with a clear message rather
than failing deep inside the parser (plan 52).
- Chapter restructure panel: merge, split and reorder chapters post-import
without re-uploading or re-running analysis. Sentences remap via (chapterId, id)
rewrite so no analyzer quota is spent and manual cast tweaks are preserved
(plan 51).
📦 Release packaging end-to-end (new)
- Tag-triggered release: node scripts/bump-version.mjs --level minor --notes-file <path>
advances both package.json files in lockstep, regenerates
lockfiles, commits, and writes the annotated tag; pushing the tag fires
release.yml, which verifies on Ubuntu / macOS / Windows, builds the zip +
SHA-256, and publishes the GitHub Release from the tag annotation (plan 49).
🎧 Listening & cast
- Listening progress / resume: close the app mid-chapter and get a "Resume at
H:MM:SS" pill; the mini-player saves with a debounced PUT, refresh rehydrates the
seek position (plan 47).
- Cover Replace / Regenerate from the listen view; per-chapter queue drawer
gains copy / remove on each queued export (plan 18a).
- Bulk-sync on confirm-cast: a "Sync N profiles from library" pill collapses
N per-card ticks into one click, per-card untick preserved (plan 41).
- Rollback fsck + mid-flight Reject toast: a rolled-back revision fsck-walks
the audio directory and reports drift; an in-flight Reject surfaces a toast (plan 20).
🔔 Cross-cutting toast surface
- Transient stream / export failures now surface as a 6s auto-dismissing
notification stack (bottom-right, role="status"); repeated errors dedupe by
key instead of stacking (plan 48).
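Deduping by key with a 6s lifetime can be modelled as a pure update over the toast list. This is a sketch under assumptions: the `Toast` shape and `pushToast` are hypothetical, and refreshing the timer on a repeated key is a design guess, not confirmed behaviour.

```typescript
// Hypothetical toast record for illustration.
interface Toast {
  key: string;
  message: string;
  count: number;
  expiresAt: number;
}

const TOAST_TTL_MS = 6000; // the 6s auto-dismiss window from the notes

// Add a toast, or bump the existing one that shares its key.
function pushToast(
  stack: Toast[],
  key: string,
  message: string,
  now: number
): Toast[] {
  const live = stack.filter((t) => t.expiresAt > now); // drop expired toasts
  if (live.some((t) => t.key === key)) {
    return live.map((t) =>
      t.key === key
        ? { ...t, message, count: t.count + 1, expiresAt: now + TOAST_TTL_MS }
        : t
    );
  }
  return [...live, { key, message, count: 1, expiresAt: now + TOAST_TTL_MS }];
}
```

Taking `now` as a parameter keeps the function deterministic, so expiry can be tested without real timers.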
🎙️ Fixes
- The Gemini analysis stream could hang forever on an idle / dead-write window:
the analyser pill sat on a stale stream with no progress and no error. Now the
stream detects the silence, cancels, and re-issues automatically (plan 06).
- In dark mode, failed / stale sentence-status badges were nearly invisible
(red / rose on dark), as were the streaming progress pill and several
translucent bg-white/{60,70} panels. All now legible (plan 42).
🏗️ Under the hood
- Cross-platform release workflow: installs ffmpeg per-OS (apt / brew / choco),
test:scripts / test:sidecar pick pwsh / powershell.exe; v1.2.0
/ v1.2.1 were the dry-runs that surfaced these gaps (plan 49).
- Production-mode server: npm run start:prod launches API + Vite-built
frontend off one process on :8080 (plan 49).
- UI-managed Gemini API key: a writable Account field; GET /api/user/settings
never echoes the plaintext (redacted apiKeyStatus); env-var still wins (plan 49).
- Verify-cache for cheap retries (per-step input-hash cached) (plan 50); lint +
Prettier + axe-core a11y on four core views (plan 46); PR-title lint + body
template, squash / rebase merges disabled (plan 44); INSTALL.md deployer one-pager.
Full changelog: v1.1.0...v1.2.2
Castwright 1.1.0
Polish, resilience, and the v1 ergonomics gaps. Dark mode, an auto-starting
sidecar, cover framing, sticky analysis, and state.json resilience.
✨ Headline features
🌗 Dark mode (new)
- Light / Dark / System toggle in the top bar with an account-managed first-visit
default; a [data-theme="dark"] token-override block reflects every shipped
surface without per-component dark-class plumbing (plan 42).
🔌 TTS sidecar starts with the server (new)
- Per-user autoStartSidecar; start-app.bat brings up frontend + server +
sidecar in one shot, Node owning the child-process lifecycle (port-9000 probe →
spawn → .run/tts.pid → tree-kill on SIGINT / SIGTERM) (plan 43).
🖼️ Cover artwork framing (new)
- A three-tab CoverPicker (Search / Upload / Frame): drag-pan + zoom keeps the
meaningful part of portrait covers inside square / landscape frames; framing is
metadata-only (applied at render time via object-position + transform,
no re-encode); local-disk upload covers self-pub + non-English titles (plan 40).
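Metadata-only framing means the stored values are turned into CSS at render time while the image file stays untouched. A minimal sketch, assuming the metadata is a focal point in percent plus a zoom factor; `CoverFraming`, `framingToStyle`, and the zoom bounds are hypothetical.

```typescript
// Hypothetical framing metadata: focal point in percent plus a zoom factor.
interface CoverFraming {
  x: number;
  y: number;
  zoom: number;
}

interface FrameStyle {
  objectFit: "cover";
  objectPosition: string;
  transform: string;
}

function clamp(value: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, value));
}

// Convert stored framing into inline style values; the image is never re-encoded.
function framingToStyle(f: CoverFraming): FrameStyle {
  const x = clamp(f.x, 0, 100);
  const y = clamp(f.y, 0, 100);
  const zoom = clamp(f.zoom, 1, 4); // assumed bounds
  return {
    objectFit: "cover",
    objectPosition: `${x}% ${y}%`,
    transform: `scale(${zoom})`,
  };
}
```

Since only three numbers are persisted, reframing a cover is a small metadata write rather than an image operation.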
🎙️ Voice & continuity
- Manual continuity link to a prior series book on the Profile Drawer's "Merge
into" dropdown: closes the nameScore < 0.34 floor gap where the matcher
dropped a legitimate match (e.g. "Dexter Alvin Diznee" vs "Dex") and the
duplicate was unreachable (plan 09).
🩺 Resilience
- Sticky analysis across navigation — re-running a failed sentence subset
survives leaving the analysing view; cold-boot rehydrate scans every book's
analysis-state on boot, with "Paused — resume?" / "Halted — review?" card badges
(plan 32).
- Auto-retry transient TTS failures: per-group bounded retry absorbs sidecar
503s / connection-refused without wedging the queue.
- Rotating state.json backups + torn-read recovery: the single most
valuable file keeps a rolling history; a torn write falls back to the most recent
intact backup. Redux-persist on ui + manuscript restores the last stage / chapter
on refresh (plan 27).
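Torn-read recovery amounts to trying the primary file first and then each backup, newest first, until one parses. The sketch below works on file contents passed in as strings; `recoverState` and `tryParse` are hypothetical names, and "intact" is approximated here as "parses to a JSON object".

```typescript
interface Recovered {
  state: Record<string, unknown>;
  source: "primary" | number; // index into the backup list when not primary
}

// Parse text as a JSON object, or return undefined for torn / invalid content.
function tryParse(text: string): Record<string, unknown> | undefined {
  try {
    const value: unknown = JSON.parse(text);
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      return value as Record<string, unknown>;
    }
  } catch {
    // A truncated write fails to parse; fall through to the next candidate.
  }
  return undefined;
}

// Return the first intact candidate: primary, then backups newest to oldest.
function recoverState(
  primary: string,
  backupsNewestFirst: string[]
): Recovered | undefined {
  const first = tryParse(primary);
  if (first) return { state: first, source: "primary" };
  for (let i = 0; i < backupsNewestFirst.length; i++) {
    const candidate = tryParse(backupsNewestFirst[i]);
    if (candidate) return { state: candidate, source: i };
  }
  return undefined;
}
```

A parse check catches truncation but not a file that is valid JSON with stale or partial data, so a real implementation may want a stronger integrity signal.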
🎧 Fixes
- Reassigning a sentence on chapter 2+ reassigned the same-id sentence in
chapter 1 because the reducer keyed only by id. Now all three manuscript-slice
reducers key by (chapterId, id), so reassignments land on the clicked sentence
(plan 12a).
- Cold-boot rehydrating a book with a live local analysis auto-fired generation:
the analyzer and TTS fought over a single GPU and both halted. Now the
implicit-reconcile seam is gated like explicit TTS-start callsites (plan 32).
- Confirm / Ready could land with empty cast.characters after a Phase 0
cache resume (manuscript hydrated faster than cast). Now the views re-fetch when
cast is empty.
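The first fix in this list comes down to matching on the composite (chapterId, id) key instead of id alone. A minimal sketch of such an update, with a hypothetical `Sentence` shape and `reassignSpeaker` helper rather than the actual reducer:

```typescript
// Hypothetical sentence shape; ids restart per chapter in this illustration.
interface Sentence {
  chapterId: number;
  id: number;
  speaker: string;
}

// Reassign exactly one sentence, matched on both chapterId and id.
function reassignSpeaker(
  sentences: Sentence[],
  chapterId: number,
  id: number,
  speaker: string
): Sentence[] {
  return sentences.map((s) =>
    s.chapterId === chapterId && s.id === id ? { ...s, speaker } : s
  );
}
```

Matching on id alone would also rewrite chapter 1's sentence 1 when the user clicks sentence 1 in chapter 2, which is the bug described above.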
🏗️ Under the hood
- state.json schema seam: a schema field with a v1 → vN migration seam,
stamping schema: 1 now for clean room later (plan 27).
- Five new Playwright specs (golden path, listen playback, per-stage Redux +
refresh, cover framing, manual-continuity link) with light + dark visual
baselines; sidecar pytest pins thread-pool saturation for /synthesize.
- Release-notes conventions documented in CONTRIBUTING.md; a real README.md
landing page with install / run / verify.
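A v1 → vN migration seam usually takes the form of a table of single-step upgrades applied in a loop until the state reaches the current version. The sketch below is hypothetical throughout: the notes only say that schema: 1 is stamped today, so the v2 step, the `tags` field it adds, and treating unstamped files as v1 are all invented for illustration.

```typescript
type State = { schema?: number; [key: string]: unknown };
type Migration = (s: State) => State;

const CURRENT_SCHEMA = 2; // hypothetical future version

// migrations[n] upgrades a schema-n state to schema n + 1.
const migrations: Record<number, Migration> = {
  1: (s) => ({ ...s, tags: [], schema: 2 }), // invented v1 → v2 step
};

// Apply single-step migrations until the state reaches CURRENT_SCHEMA.
function migrate(state: State): State {
  // Assumption: a file without a stamp is treated as v1.
  let cur: State = { ...state, schema: state.schema ?? 1 };
  while ((cur.schema as number) < CURRENT_SCHEMA) {
    const step = migrations[cur.schema as number];
    if (!step) throw new Error(`no migration from schema ${cur.schema}`);
    cur = step(cur);
  }
  return cur;
}
```

Stamping the version early is what makes this loop possible later, since every file on disk then declares where in the chain it starts.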
Full changelog: v1.0.0...v1.1.0
Castwright 1.0.0
Initial release. The full v1 pipeline takes a manuscript from upload to a
chaptered audiobook on disk: parse, attribute each line to a character, audition
voices, generate per-chapter audio via a local TTS sidecar, listen / revise, and
export to M4B or MP3 zip with cover artwork and chapter atoms.
✨ Headline features
📖 Manuscript → audiobook pipeline (new)
- Ingest: upload accepts .md / .txt / .epub / .pdf; chapter names are
extracted at parse time; parse-only import lets the user confirm metadata
(author / series / standalone) before the book lands on disk (plans 02, 12).
- Analysis pipeline: cast detection runs through one of three analyser modes,
Gemini cloud (default for hosted, free-tier rate-limited), local Ollama (default
for self-host, auto-falls back to Gemini when the daemon is unreachable), or
manual file-drop coworking. Sticky analysis survives leaving the analysing view
(plans 06, 29, 32).
- Generation stream: per-chapter SSE (progress / chapter_complete /
chapter_failed / idle); sticky generation survives every navigation except an
explicit Stop or queue drain, with Pause / Resume via POST /pause (plans 16, 32).
🎙️ Local TTS sidecar + voice library (new)
- Sidecar: a local Python FastAPI sidecar hosting two engines, Coqui XTTS v2
(zero-shot cloning) and Kokoro v1 (English-only, eager-loaded, ~1 GB VRAM). The
analyser and Coqui auto-evict each other to free VRAM with an inline banner
(plans 14, 14a, 30).
- Voice library: per-engine catalogs, family grouping (af_* / am_* /
bf_* / bm_*), drag-to-assign onto cast members, per-character overrides scoped
per-engine so a Coqui ↔ Kokoro switch preserves assignments, sample playback
against a user-editable line, and compare-two-cast-members (plan 22a).
🎧 Revisions, export & persistence (new)
- Revisions and drift: A/B audio audition before accept / reject; rollback
preserves the prior MP3 as <slug>.previous.mp3 so it's non-destructive; a
pending-revisions pill + full diff view (plan 20).
- Audiobook export: M4B (embedded cover, per-chapter chap atoms, optional
desc / ldes metadata), MP3 zip (chaptered MP3s for Smart AudioBook Player /
Audiobookshelf), and sync-folder save (drops the M4B into a sync directory). A
LAN download tile generates a QR for sideloading; jobs can be cancelled /
retried (plans 32, 33, 39).
- Workspace persistence: per-book on-disk state (cast.json,
manuscript-edits.json, revisions.json, change-log.json, audio renders),
round-tripped through atomic JSON writes with renameWithRetry for Windows /
OneDrive races (plan 27).
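An atomic JSON write typically writes a temp file and renames it over the target; on Windows or inside a sync folder that rename can fail transiently while another process holds the file. The retry loop below is a sketch of that idea only. It is not the project's renameWithRetry: `retryRename`, its injected `rename` / `sleep` parameters, the error-code list, and the backoff values are all assumptions.

```typescript
// Error codes commonly seen for transient rename failures (assumed list).
const TRANSIENT = new Set(["EPERM", "EBUSY", "EACCES"]);

// Retry `rename` with exponential backoff; resolves with the attempt that succeeded.
async function retryRename(
  rename: () => Promise<void>,
  sleep: (ms: number) => Promise<void>,
  attempts: number = 5,
  baseDelayMs: number = 50
): Promise<number> {
  for (let attempt = 1; ; attempt++) {
    try {
      await rename();
      return attempt;
    } catch (err) {
      const code = (err as { code?: string } | null)?.code;
      // Give up on the last attempt or on errors that are not transient.
      if (attempt >= attempts || !code || !TRANSIENT.has(code)) throw err;
      await sleep(baseDelayMs * Math.pow(2, attempt - 1));
    }
  }
}
```

Injecting `rename` and `sleep` keeps the loop testable without a filesystem or real delays; in production they would wrap the platform's rename call and a timer.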
🏗️ Under the hood
- Five-tier test harness: Vitest frontend + server, pytest sidecar, Pester
for PowerShell helpers, Playwright e2e in mock mode; one-shot via npm run verify (plan 37).
- Three-tier commit gate (husky v9): commit-msg Conventional-Commits
validator, pre-commit verify:fast, pre-push full verify (plan 38).
- OpenAPI as the single source of truth at openapi.yaml; src/lib/api-types.ts
generated via npm run openapi:types (plan 24).
- Mock mode round-trips against an in-memory map for jsdom tests + design
fixtures (VITE_USE_MOCKS=true).
Full changelog: initial release