Releases: dudarenok-maker/Castwright

Castwright v1.8.0

17 Jun 07:37

The open-beta release. Castwright reaches more machines this cycle — an early AMD GPU preview and a one-click Pinokio install — on top of a deep round of analysis honesty, multilingual depth, and GPU-contention resilience that keeps long runs upright on an 8 GB card.


✨ Headline features

🟧 AMD GPU support — early preview (new)

Castwright can reach for an AMD GPU when it finds one, with a safety net under it.

  • Auto-detect with CPU fallback — ROCm / DirectML is detected on its own; if the GPU path isn't ready it quietly falls back to the processor so the app still runs (#818).
  • Accelerator control — an ACCELERATOR knob + in-app picker, with the resolved per-engine profile surfaced on /health. Kokoro stays on CPU under DirectML (a documented DirectML limitation, not a fault). NVIDIA and Apple Silicon remain the smoothest paths.
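The detection and fallback behaviour above can be sketched roughly as follows. This is a minimal illustration, not Castwright's code: the function name resolve_profile, the backend labels, the preference order, and the shape of the returned profile are all assumptions. Only the ACCELERATOR knob, the CPU fallback, and Kokoro staying on CPU under DirectML come from the notes.

```python
# Hypothetical preference order; the notes do not state the real one.
PREFERENCE = ("cuda", "rocm", "directml", "mps")


def resolve_profile(detected, requested="auto"):
    """Pick a default device, then apply per-engine exceptions.

    detected:  backends that probed successfully on this machine
    requested: value of the ACCELERATOR knob ("auto" or a backend name)
    """
    detected = set(detected)
    if requested == "auto":
        device = next((d for d in PREFERENCE if d in detected), "cpu")
    elif requested in detected:
        device = requested                 # explicit pick is available
    else:
        device = "cpu"                     # requested path not ready: fall back
    profile = {"default": device, "kokoro": device}
    if device == "directml":
        profile["kokoro"] = "cpu"          # documented DirectML limitation
    return profile
```

A profile of this kind is the sort of thing a /health endpoint could surface per engine, so a user can see which device each engine actually resolved to.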

📦 One-click install with Pinokio (new)

A self-contained conda install (ops-16) built from the latest published release — no terminal, nothing to install by hand — landing in the same guided first-run as the desktop installers (#821).

🧠 Pick the model that reads your book (new)

The local analyzer is now yours to choose (plan 221).

  • Installed-only model picker — pick any Ollama model you've pulled per run; not-yet-pulled curated models install from the Model Manager (#851, #859, #860).
  • Honest residency + label — warm/residency and the analysing chip key to the model actually doing the reading, not the configured default; an ANALYZER_KEEP_ALIVE knob; per-phase model support.

🩺 Honest engine health + one-tap Repair

The Model Manager stops showing a hopeful green light (plan 220).

  • True per-engine health — package / weights / integrity, with a "Needs repair" badge and a one-click Repair + sidecar restart (#837).
  • Re-tiered engines — Qwen → standard (GPU), Coqui → opt-in, Whisper = base; fail-open readiness + diagnostics.

🌍 Multilingual & attribution (plan 221)

  • Cyrillic, end to end — character names, ids and cross-book keys handle non-Latin scripts (#852).
  • Steadier non-English attribution — a deterministic narrator-default heuristic, a Russian dash-dialogue preamble guard, and script-aware attribution + ASR normalizers (#852, #824).
  • Localized cast review — language-aware minor-cast fold buckets, so a Russian book's grouped roles read in Russian (e.g. Незнакомый Парень / Незнакомая Девушка) instead of English (#856).

📊 Analysing view — honesty & live progress

  • Truthful progress — a per-chapter section sub-bar and counts, live ETA refinement, and a model-label chip that mirrors the server-resolved analyzer model (#841, #864, #826).
  • Reload-proof — a reconnecting bridge so refreshing the page no longer blanks the elapsed timer (#869).
  • Big-chapter handling — Stage-1 chunking for oversized chapters and cast-detection name-fidelity guards (#825, #827).

🎮 GPU residency & resilience (plan 222)

  • Waits its turn instead of crashing: withGpuLoad does an atomic evict+verify+load and refuses on a busy card (409) rather than OOM-crashing; a top-bar "GPU busy · N waiting" pill says why (#840, #841).
  • Smarter residency — a VRAM-threshold policy keeps the analyzer resident across the analysis loop on a GPU; voice-design and generation preload run through the same gated path.
  • VRAM telemetry substrate — passive, env-gated per-engine VRAM sampling (fs-45, record-only; MB-accounting deferred) with a clean-process gate (#861, #863).

🎙️ Voice design & casting

  • A/B compare modal fixed — portaled out of the clip-path drawer so it no longer renders clipped; Play-current resolves the Qwen voice and shows the descriptor on Side A; play errors surface (#832, #834).
  • Age made audible — Qwen voice-design personas describe age acoustically, not just as a label (#831).

🎧 Listening & companion

  • Offline waveform — downloaded chapters persist their peaks, so the phone scrubber stays drawn with no signal.

🏗️ Under the hood

  • Kokoro uses the NVIDIA GPU — forces onnxruntime-gpu via an ORT swap (not kokoro-onnx[gpu]); a failed swap is fatal, so it can't silently run on the CPU (#828).
  • Analyzer tolerates stray model keys instead of failing the run (#839).
  • Test resilience: test:server auto-retries once on a vitest fork-pool worker-crash (#850); the analysis-pipelining rolling-roster CPU-contention timeout flake is quarantined in CI (#875).
  • Release hygiene: .gitignore now covers the renamed castwright-workspace/ (#867); a CodeQL workflow and a pre-push commit-subject guard land (#858).
  • Docs & Help — CODE_OF_CONDUCT, a repo-opening-public checklist, repo legal pointers, plan-221/222 reconciliations, and new offline Help topics for analysis model-reload / "GPU busy" and an engine that reads "Needs repair".

Full changelog: v1.7.0...v1.8.0

Castwright v1.7.0

14 Jun 11:46

The Castwright release. The project grew up this cycle: a new name and identity,
a companion app for your phone and car, guided first-run setup, a one-click sample
book, and first-class support for Apple Silicon Macs — on top of a deep round of
generation-quality and reliability work.

⚠️ Upgrade note: 1.6.0 installs cannot self-upgrade across the rename.
Alpha installs reinstall fresh as Castwright. Your library/data directories move
to the new Castwright paths on first run.


✨ Headline features

📱 A mobile companion app (new)

Castwright now has a native Flutter companion for Android (and iOS), paired to your
desktop server over your home network — no cloud, no account.

  • One-scan pairing — pair by scanning a compact QR code; the channel is
    certificate-pinned and protected by a short-lived pairing code, with per-device
    tokens you can revoke (#562, #565, #567, #679, #696, #591).
  • Take your library offline — delta sync downloads books for offline listening,
    with range-resume, atomic swaps, accounting and automatic eviction (#572, #573,
    #606, #614, #616).
  • Native player — lock-screen / media-key control, per-book resume, two-way
    resume sync between phone and desktop, per-chapter waveforms (#575, #576, #604, #610).
  • Listen in the car — Android Auto / CarPlay in-car browser with a downloaded-only
    2-tab "This Book / Library" layout and current-chapter highlighting (#588, app-9).
  • Browse, search & continue — author → series → book hierarchy, a home shelf with
    "Continue listening", and multi-book switching (#577, #582, #615, #618).
  • Stream over LAN for instant play without a full download (#589).
  • Distribution — signed release APK with an alpha channel, plus a Google Play
    AAB lane (#586, #777); the APK is also offered as a download from the desktop app (#661).

🚀 First-run setup & onboarding (new)

Getting from a fresh clone to a working install is now guided end-to-end.

  • First-run setup wizard (fs-21) — a five-step wizard that checks everything
    Castwright needs, installs the default Kokoro voice engine in-app, bootstraps the
    Python sidecar venv, and runs a two-tier smoke test — with plain-language remediation
    when something's missing (#744, #748, #749, #750, #751).
  • In-app guided tour (fe-38) — a spotlight walkthrough of the core flow, launchable
    any time from the top-bar ? menu (#765, #772).
  • In-app Help / troubleshooting view (fe-29) with a unified, friendly analysis-failure
    taxonomy so errors tell you what to do next (#740, #741).

📖 Try it in one click (new)

A bundled original sample — The Coalfall Commission, a 2-chapter / 14-character
showcase — ships with every voice pre-designed, so you can generate and hear a full-cast
performance before importing anything of your own (#727, #728). You can also replace the
manuscript on an existing book while preserving its designed cast (#724).

🍎 Runs on a Mac

First-class Apple Silicon support — the sidecar auto-detects Metal (mps), with graceful
CUDA → mps → CPU fallback and cross-platform launch scripts. Intel Macs work too (#702, #703).


🎙️ Voice design & casting

  • Design full cast — one click designs a bespoke Qwen voice for every "Needs voice"
    character as a background, reload-resilient job (#637).
  • Single voice design is now background-survivable with honest live progress (#639).
  • Per-book bulk emotion-variant design with a per-character cast-table glyph strip,
    and emotion variants that travel across linked books in a series (#687).
  • Has / Needs emotion-variant filters and per-card "Needs variant" badges (#642, #643).
  • Per-quote emotion loop completed — detect-emotions + UX (fs-33/fs-34) (#596).
  • Cross-book voice reuse hardened — match by stable voice id when names drift, scope
    generic role-names (e.g. "Narrator") to the same series, and stop cross-series mismatches
    on the confirm screen (#634, #681, #689, #693, #694).

🔊 Generation quality & reliability

  • Per-sentence QA gate — every line is acoustically checked and (with srv-31)
    transcript-verified via Whisper ASR before a chapter is assembled, with automatic
    re-record of the broken ones (#513, #526, #531, #646).
  • Generation stall protection — long runs ride out sidecar recycles and recover on
    their own (defense-in-depth, three waves) (#673, #677).
  • Voice-design VRAM contention robustness — engine mutual-exclusion, liveness timeouts,
    honest progress (#685).
  • Attribution coverage guards for large chapters — re-split under-budget stage-2
    chapters, recover dropped speakers, preserve tagged speakers (#516, #520, #532, #609, #678).
  • Golden-audio regression harness (ops-11) — opt-in acoustic regression gate (#527).

🎧 Listening experience (web)

  • Continue-listening rail + reading-stats dashboard (fs-15 / fs-16) — your shelf
    remembers what you were mid-way through; #/stats shows streaks and hours, fed by a
    wall-clock accumulator and offline buffering (#783, #792).
  • Listen download section finalized with truthful, store-level export progress (#675).
  • Real per-chapter waveform bars in the Listen view (fe-33) (#585).

⚙️ Models, settings & covers

  • In-app Model Manager (fs-23) — load/unload engines and per-model Ollama residency
    from the app (#581, #766).
  • In-app Advanced Settings — ~70 model/generation/QA knobs with an env-precedence
    resolver and drift-guarded .env.example (#669).
  • Multi-source cover search — OpenLibrary + Apple + Google, with free-text matching
    and per-source badges (#697).
  • Device ground-truth on /health plus a diagnostics panel (side-14) (#718).

🔌 Sync & server infrastructure

  • LAN security — opt-in shared-secret token guard and per-device tokens with revoke
    (srv-20 / srv-33) (#561, #564, #591).
  • Sync primitives — stable per-chapter UUIDs, a delta-friendly per-chapter sync
    manifest with durations, and guarded listen-progress writes (srv-32/34/35) (#558, #569, #570, #601).
  • Merge journal for deterministic alias un-linking (srv-1) (#793).

🏗️ Under the hood

  • Castwright rebrand end-to-end — package names, release artifact
    (castwright-vX.Y.Z.zip), data dirs, startup banner, in-app /about page, branded
    narrator-credit default, and self-hosted General Sans + Lora fonts (no runtime
    font CDN) (#623, #629, #653, #657, #660, #698, #713).
  • Public-readiness docs + licensing (FSL-1.1-ALv2) (#663, #664).
  • Dependency majors round 3 (#712).

Full changelog: v1.6.0...v1.7.0

Castwright 1.6.0

03 Jun 03:12

Seed release for in-app self-upgrade. From here, future versions install
themselves from a hand-delivered bundle (Account → Application updates), on top
of a round of reliability, observability and listening polish.

⚠️ Upgrade note: the jump into 1.6.0 is manual; 1.6.0 → 1.7.0 is the
first self-upgrade.


✨ Headline features

🚀 Update from inside the app (new)

One-click cross-version upgrades for hand-delivered alpha bundles (fs-1).

  • Versioned-directory install layout: releases/vX.Y.Z/ + a stable
    launch.mjs + shared workspace/ / venv/ / models/ siblings, so the
    running release is never touched and rollback is just not flipping the pointer.
  • Safe migrations — a boot coordinator backs up every workspace JSON before
    migrating; a top-bar version pill + what's-new banner surface the running version.
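The "rollback is just not flipping the pointer" idea can be sketched as below. The pointer file name current.json and both function names are hypothetical; the notes only describe the releases/vX.Y.Z/ layout and a stable launcher. The sketch assumes the pointer is a small file swapped in with an atomic rename.

```python
import json
import os
from pathlib import Path


def flip_pointer(root: Path, version: str) -> None:
    """Point the launcher at releases/<version> using an atomic file swap."""
    target = root / "releases" / version
    if not target.is_dir():
        raise FileNotFoundError(target)          # never point at a missing release
    tmp = root / "current.json.tmp"
    tmp.write_text(json.dumps({"version": version}))
    os.replace(tmp, root / "current.json")       # atomic rename on the same volume


def current_version(root: Path) -> str:
    """Read which release directory the launcher should start."""
    return json.loads((root / "current.json").read_text())["version"]
```

Because a new release is unpacked into its own directory first, a failed upgrade never reaches flip_pointer, and the previous release keeps running untouched.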

🌍 Multi-language — Russian (new)

A book-level BCP-47 language field, end to end (fs-2, language half).

  • Cyrillic auto-detection + a confirm-step selector; designed Qwen voices
    speak the book's language; Cyrillic-aware analyzer token estimates and
    per-language attribution; a Listen language badge.
  • Never-cross-language invariant force-routes non-English books to Qwen
    (Kokoro is English-only) and blocks any silent cross-language fallback. English
    books are byte-identical to before.

🩺 Admin watch console (new)

The former dev-only Worktrees view is now an all-users Admin console at
#/admin (fs-18).

  • Health board — green / amber / red on GPU / VRAM, TTS sidecar + resident
    models, analyzer connectivity, ffmpeg, free disk, from a new GET /api/diagnostics,
    plus generation throughput. The top-bar pill carries a health status dot.

🎧 Listening experience

  • Auto-advance / continuous playback — the mini-player advances to the next
    chapter and keeps playing, behind a default-on toggle (Account → Advanced), so
    a book plays hands-free end to end (fe-23).
  • Skip forward / back — intra-chapter ±15s / ±30s seek in the mini-player
    with rebindable shortcuts (defaults J / L) and configurable deltas (fe-24).

🔊 Generation quality & reliability

  • Post-synthesis audio QA — each finished chapter gets a cheap automated
    check (near-silent / clipped / truncated / runaway duration) and an advisory
    "Suspect" badge with the reason in Generate + Listen, so garbled or empty
    renders are flagged before the listener hits them (srv-27).
  • Pre-flight disk-space guard — free space is checked against an estimate
    before a run or export and a warning is surfaced when it's tight (configurable
    to block) — no more failing 40 chapters into a run (srv-28).
  • Silent generation stalls could leave a chapter showing a misleading
    "Queued" forever. Now a per-chapter no-progress watchdog records a real
    failure, leak-saturated orphan sidecars are no longer adopted, and the
    supervisor health-polls adopted sidecars.
  • A Qwen reload doubled VRAM into the Windows sysmem-spill stall. Now the
    reload no longer doubles VRAM and the recycle watchdog also keys on reserved VRAM.
  • "Design & compare" broke on a missing voice or a design-model race. Now it
    no longer breaks, and the persona prompt aligns with the official VoiceDesign format.
  • Ungenerated chapters masqueraded as playable in the Listen view. Now Play /
    Share are gated on generated audio.
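A cheap post-synthesis check of the kind described in the audio QA bullet might look like the sketch below. The four flag names come from the notes; the function name, the input format, and every threshold are illustrative assumptions, not Castwright's actual values.

```python
def qa_flags(samples, sample_rate, expected_sec):
    """Return advisory flags for one rendered chapter.

    samples:      mono PCM as floats in [-1.0, 1.0]
    expected_sec: rough duration estimate, e.g. derived from text length
    """
    flags = []
    duration = len(samples) / sample_rate
    peak = max((abs(s) for s in samples), default=0.0)
    if peak < 0.01:                                   # illustrative threshold
        flags.append("near-silent")
    clipped = sum(1 for s in samples if abs(s) >= 0.999)
    if samples and clipped / len(samples) > 0.01:     # over 1% at full scale
        flags.append("clipped")
    if duration < 0.5 * expected_sec:
        flags.append("truncated")
    if duration > 2.0 * expected_sec:
        flags.append("runaway")
    return flags
```

An empty list means nothing looked suspicious; any flag would feed an advisory "Suspect" badge rather than block the chapter.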

⚙️ Settings, observability & models

  • Plain-language failure messages — recurring failures (sidecar down, VRAM
    spill, rate-limit, OOM, disk-full, model-not-loaded, timeouts) now show a
    human-readable message plus a "what to do next" line instead of a raw error
    string (fs-19).
  • Per-run resource telemetry — per-chapter RTF, VRAM, host RAM and wall-time
    are logged and charted in a new Admin "Resource trends" panel for
    perf-regression visibility (fs-20).
  • Auto-backup of state.json — scheduled per-book snapshots on a
    configurable cadence with retention + one-click restore from Account (srv-2).
  • Power-user tuning — rebindable shortcuts, accessibility toggles
    (high-contrast + larger text) and an autosave-debounce knob, device-local (fe-2).
  • In-app Coqui XTTS v2 installer, plus an A/B current-vs-proposed voice
    audition in the Qwen voice-design flow.

🏗️ Under the hood

  • Dependency major upgrades — React 18→19, Vite 5→8 (Rolldown), Vitest 2→4,
    react-router 6→7, TypeScript 5→6 (plan 167); Zod 3→4, Express 4→5, pdfjs-dist
    4→5, Tailwind 3→4 (plan 170); GitHub Actions runners off the deprecated Node-20 majors.
  • CI cost — path-filtered per-PR verify, draft-by-default + integration
    batching, a doc-only CI fast-path, and a local pre-push guard that refuses
    force-push / deletion of main.
  • Release pipeline — three-way version lockstep (root + server + sidecar
    version.py), a cross-OS gate fired before tagging, and RELEASE_NOTES.md
    baked into the zip.

Full changelog: v1.5.1...v1.6.0

Castwright 1.5.1

01 Jun 03:19

A stability + hardening release. The bulk of this release hardens the
Qwen3-TTS default from v1.5.0: it drives long-run sidecar memory pressure down
to a survivable, self-recovering state, makes a Qwen→Kokoro fallback loud
instead of silent, finishes the reused-voice / persona consistency work, and
closes a default-mode LAN exposure. No data migration required.


✨ Headline features

🔌 Default-bind to loopback (new)

A security fix that closes the 2026-05-31 review's top findings.

  • In default mode the server now binds 127.0.0.1 only, so the unauthenticated
    API and the /workspace static mount (manuscripts, audio, state.json /
    cast.json) are no longer reachable from other machines on a shared Wi-Fi.
    The opt-in npm run start:lan mobile flow is unchanged; BIND_HOST=0.0.0.0
    restores all-interface HTTP.
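The bind rule above amounts to "loopback unless told otherwise". A minimal sketch, assuming the server reads BIND_HOST from its environment (the helper names are hypothetical):

```python
import os


def resolve_bind_host(env=None):
    """Return the listen address: loopback unless BIND_HOST widens it."""
    env = os.environ if env is None else env
    host = (env.get("BIND_HOST") or "").strip()
    return host or "127.0.0.1"


def is_lan_exposed(host):
    """True when the address is reachable from other machines."""
    return host not in ("127.0.0.1", "::1", "localhost")
```

A startup banner could call is_lan_exposed on the resolved host and print a warning, so widening the listener is always a visible, deliberate choice.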

▶️ Resume generation (new)

  • A one-click way to continue a book whose run was interrupted (queued chapters
    left over, nothing in flight). Opening a book still never auto-starts
    generation — this is the explicit recovery affordance.

🔊 Generation reliability

  • Long Qwen runs climbed host RAM until the server was OOM-killed mid-book.
    Now bounded by host-RAM reclaim on model unload, an RSS / committed watchdog, a
    /debug/memory readout, and a process-recycle keyed on committed-private memory.
  • A crashed sidecar used to drop the in-flight book. Now it respawns on
    unexpected exit, the readiness gate polls through a respawn instead of failing
    fast, and the in-flight + queued chapters started during a recycle drain are
    recovered. The server is authoritative for queue completion.
  • Recycles interrupted a chapter mid-render. Now crossing the memory
    soft-threshold drains and recycles cleanly between chapters, so a long book
    rides out the pressure without a dropped chapter.
  • A Qwen book could silently downgrade to Kokoro. Now a /health handshake
    plus a loud per-chapter fallback gate (with a resident-model pill showing every
    loaded model) means a fallback is always visible, never silent.
  • A local Qwen timeout was misreported as "Gemini rate-limited" and halted
    the whole book. Now a stalled chapter waits on the readiness gate and
    re-renders, then skips non-fatally if it can't recover.
  • Non-narration chapters no longer queue or hang the parallel synth tail.
  • CUDA-fragmentation OOM fixed via expandable_segments.

🎙️ Voice & cast

  • Reused characters lost their designed voice / persona. Now reused
    characters keep their bespoke voice and persona — the voice / voiceStyle are
    denormalised at the link and auto-match write sites, and the designed persona
    is shown for reused characters.

🎧 Listening & recovery

  • A failed chapter forgot its state on reload. Now it shows "Failed · reason"
    with Retry after a reload, plus a per-row "Generate this chapter" escape hatch
    and a "Generated " line on done rows.
  • Per-book state.json auto-backup — a scheduled background sweep snapshots
    each book's state.json (daily / weekly, newest-N retained) with a manual
    restore picker in Account.
  • Crash diagnostics — FATAL crashes and unhandled rejections are captured; a
    startup port collision now prints an actionable message instead of a cryptic stack.
  • Assorted dark-mode contrast and cast-row layout fixes.

🏗️ Under the hood

  • Qwen performance — token-budget batch packing is now the default (cap 32 /
    budget 3600) plus TF32 + high fp32-matmul precision; an overnight full-book run
    held aggregate RTF ≈ 1.04 (~realtime).
  • Pre-commit scope filter + GPU-contention throttle — a staged-diff scope
    filter skips out-of-scope test legs, and a soft nvidia-smi probe lowers test
    concurrency when a run is hammering the box.
  • Test reliability — broke a tts/index ↔ provider import cycle that
    intermittently failed the cross-OS gate; pinned with a re-export identity guard.
    Per-chapter RTF history table in the developer Worktrees view.
  • Archived 56 shipped feature plans; filed a security review and its follow-up backlog.

Full changelog: v1.5.0...v1.5.1

Castwright 1.5.0

29 May 03:36

The largest TTS round since the engine first shipped. v1.5.0 adds Qwen3-TTS
— a bespoke per-character voice engine that designs a unique voice from each
cast member's persona instead of picking from a preset catalogue, caches the
embedding, and reuses it across the book and series for vocal consistency.
Generation moves onto a single persisted cross-book queue, cast management gains
"rebaseline the series" + cross-book duplicate review, and the whole synthesis
path reports live real-time-factor (RTF) telemetry.

⚠️ Upgrade note: the GEN_CHAPTER_CONCURRENCY env var was retired and
renamed to GEN_WORKERS (default 2) — rename it in your server/.env. No
BookStateJson schema change; .queue.json is created on first enqueue;
legacy single-field overrideTtsVoice rows migrate lazily on read. See
INSTALL.md "v1.4.0 → v1.5.0 notes".


✨ Headline features

🎙️ Qwen3-TTS bespoke voices (new)

A new local engine that designs a unique voice per character from the cast
persona rather than picking from a preset catalogue (plan 108).

  • Design → clone → cache → reuse — the VoiceDesign 1.7B model synthesises a
    calibration reference from the persona, the Base 0.6B model auditions and
    renders, and the designed embedding is cached and reused for every line that
    character speaks across the book and series. A Gemini-backed persona generator
    fills Character.voiceStyle to seed the design.
  • Default when installed — Qwen becomes the default engine for new books once
    installed (resolved live; an explicit Account pick is honoured forever). Install
    in one click from Account → Models; ~5 GB of weights download on demand and
    aren't bundled in the zip.
  • Per-character engine mixing — each cast member carries a per-engine
    overrideTtsVoices: { coqui?, kokoro?, gemini?, qwen? } map, so one book can
    mix a designed Qwen principal against a Kokoro narrator.
  • Graceful Kokoro fallback — a Qwen render with no designed voice (or when
    the engine isn't installed / loaded) renders in Kokoro instead of failing,
    shown as "Fallback (Kokoro)" (plan 130).
  • Under the hood — batched forward passes with length-bucketing
    (QWEN_BATCH_SIZE, default 8), an SDPA attention path + prompt cache, an
    optional FlashAttention-2 wheel for Windows, and a VoiceDesign model that frees
    ~4–5 GB on idle (QWEN_DESIGN_IDLE_TTL) (plans 112, 113, 115, 117, 128).
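Per-character engine mixing and the Kokoro fallback can be illustrated together. The overrideTtsVoices key and the "Fallback (Kokoro)" label come from the notes; the function name, the return shape, the voice ids, and the placeholder default voice are assumptions.

```python
def resolve_voice(character, engine, installed):
    """Return (engine, voice, label) for one line spoken by a character.

    character: dict carrying an optional per-engine overrideTtsVoices map
    installed: set of engine names that are installed and loaded
    """
    overrides = character.get("overrideTtsVoices", {})
    voice = overrides.get(engine)
    if engine == "qwen" and (voice is None or "qwen" not in installed):
        # No designed voice, or engine unavailable: fall back instead of failing.
        fallback = overrides.get("kokoro", "default-kokoro-voice")  # placeholder id
        return ("kokoro", fallback, "Fallback (Kokoro)")
    return (engine, voice, None)
```

Returning the label alongside the voice keeps the fallback visible to the UI instead of leaving it as a silent substitution.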

🗂️ Persistent cross-book generation queue (new)

Generation is now driven by a single durable queue that survives restarts
(plans 102, 111).

  • <workspace>/.queue.json is the sole source of truth — crash-orphaned entries
    reset to queued on boot, failed chapters persist as "Failed" with one-click Retry.
  • A bounded worker pool runs N chapters concurrently (GEN_WORKERS, default 2).
  • A global queue modal + top-bar chip show every book's queued / in-progress /
    done / failed rows, with drag-to-reorder, Clear-queue, and a force-remove escape
    hatch for stuck in_progress entries.
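The boot-time recovery described above (crash-orphaned entries reset to queued, failures kept for Retry) can be sketched like this. The file is assumed, for illustration only, to hold a JSON list of entries with a status field; the real .queue.json schema is not given in these notes, and both function names are hypothetical.

```python
import json
from pathlib import Path


def requeue_orphans(entries):
    """Reset entries a crashed process left mid-flight; keep failures as-is."""
    for entry in entries:
        if entry["status"] == "in_progress":
            entry["status"] = "queued"       # nobody is rendering it after a restart
    return entries


def load_queue(path):
    """Read the queue on boot, repairing orphans before any worker starts."""
    path = Path(path)
    if not path.exists():
        return []                            # the file only appears on first enqueue
    entries = requeue_orphans(json.loads(path.read_text()))
    path.write_text(json.dumps(entries, indent=2))
    return entries
```

Writing the repaired list back before workers start keeps the file the single source of truth, so the UI and the worker pool never disagree about what is in flight.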

📊 Live RTF telemetry (new)

  • The synthesis path reports live per-batch real-time-factor up the stack
    (sidecar → server → frontend) so a deployer can watch how fast their GPU is
    rendering — surfaced in the generation UI and structured logs (plan 127).

🎙️ Voice & cast management

  • Rebaseline the series — a modal designs bespoke Qwen voices for the
    principal cast with a current-vs-proposed audition before regenerating,
    collapsing recurring members by name / alias (plans 95, 96, 99, 101).
  • Cross-book duplicate review — hydrates both casts and lets you merge a
    duplicated character / link a shared voice from the Voices pill, with the link +
    aliases carried on the Voice payload so the warning stays gone on reload.
  • Rename + alias promotion — cast members can be renamed and an alias promoted
    to the primary name (logged as a name_change event); aliases are editable with
    a reattribute-lines modal.
  • Voice status leads — a new "Sampled" tier joins Matched / Generated; the
    cast table sorts by line count; toggle chips filter by voice-matching status
    (multi-select OR, live counts), so undesigned voices stop getting lost.
  • One-chapter A/B preview — audition a profile-change regen on a single
    chapter before committing the full regenerate.

🔊 Generation performance

  • Within-chapter sentence parallelism overlaps synthesis across a chapter's
    sentences (plan 107); a GPU-arbitration semaphore (GPU_CONCURRENCY, with
    optional VRAM-weighted budgeting via GPU_VRAM_BUDGET) keeps parallel sessions
    and the analyzer from double-booking an 8 GB card (plans 100, 108).
  • Per-character progress is now monotonic and exact under parallel synth — no
    counter that jumps backwards or double-counts when several chapters are in flight.
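VRAM-weighted budgeting of the kind GPU_VRAM_BUDGET suggests can be modelled as a weighted semaphore: each job declares how much VRAM it expects to use and waits until that much of the budget is free. The class below is a sketch under that assumption; the name VramBudget and its API are hypothetical.

```python
import threading
from contextlib import contextmanager


class VramBudget:
    """Admit jobs only while their declared VRAM weights fit the budget."""

    def __init__(self, budget_mb):
        self.budget_mb = budget_mb
        self.used_mb = 0
        self._cond = threading.Condition()

    @contextmanager
    def reserve(self, weight_mb):
        if weight_mb > self.budget_mb:
            raise ValueError("job can never fit the budget")
        with self._cond:
            # Block until enough of the budget is free, then claim it.
            self._cond.wait_for(lambda: self.used_mb + weight_mb <= self.budget_mb)
            self.used_mb += weight_mb
        try:
            yield
        finally:
            with self._cond:
                self.used_mb -= weight_mb
                self._cond.notify_all()      # wake jobs waiting for headroom
```

A plain counting semaphore treats every job as equal; weighting by VRAM lets two light synthesis sessions share a card while a heavy analyzer load waits for both to finish.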

🩺 Analyzer & ingest

  • Multi-model analysing view — a per-phase model chip shows the effective
    Phase 0 / Phase 1 model with an inline swap control, plus a sticky status bar
    (plan 94). Phase 0a keeps narrator-only named characters so they aren't dropped.
  • Low-confidence triage gains sticky next / previous navigation with a
    per-chapter badge; the Voice Drift Detector can be scoped to a single character.
  • EPUB / MOBI robustness — namespace-prefixed OPF EPUBs recover through a
    raw-zip fallback that also reads NCX titles; DRM-protected MOBI uploads return a
    clean 415 with an actionable message.

🎧 Listening fixes

  • MP3 chapters showed the wrong / zero duration and couldn't be scrubbed because
    libmp3lame wasn't emitting the Xing VBR header. Now chapters are seekable MP3
    with the Xing header, the mini-player trusts the server durationSec, and
    scripts/rexing-existing.mjs re-headers old files (plan 109).
  • A fully-rendered chapter showed 00:00 in the chapter list until reload —
    the chapter_complete duration was dropped by the parallel-chapter coalesce.
    Now chapter_complete carries durationSec directly, with a library-scan backfill.
  • The Loudness Report Card flagged nearly every chapter as ~6 LU off-target
    even though the audio was correctly normalised — the .lufs.json stored the
    pre-normalisation measurement. Now the post-pass measurement is persisted and
    scripts/relufs-existing.mjs refreshes old chapters (plan 77).
  • The Listen row time didn't track the player and a stale "Resume" pill stayed
    visible during playback. Now the row time live-syncs and the resume pill hides.
  • Per-phase analyzer model picks had no effect — the selection was unreachable
    and the chips showed a fabricated name. Now the picks drive each phase and the
    chips show the truly-effective model (plan 88).
  • The "carried in from prior books" pill over-counted (a 4th-book series
    claimed 136 carried-in characters) and bloated the Phase 0a prompt ~4×. Now the
    roster dedupes by name / alias.
  • "Apply all N matches" auto-ticked profile-sync for every match. Now it only
    auto-ticks sync for lower-confidence rows (< 0.9).
  • The low-confidence reassign picker was clipped / closed mid-gesture / invisible
    in dark mode. Now it portals to the body with viewport-aware flipping,
    dismisses on outside-click / Esc, and has a contrasted dark surface (plan 90).
  • The Voice Drift Detector showed a raw workspace slug and listed the same
    chapter multiple times. Now it shows the clean book title and rolls events up
    per chapter with an accurate flagged count (plan 91).
  • Navigating between views left the new view mid-scroll. Now the window scrolls
    to the top on every hash-route change.

🏗️ Under the hood

  • CI cost reduction — doc-only skip + path-filtered per-PR verify +
    cross-OS consolidation into cross-os.yml (plans 101, 103); draft-PRs-by-default,
    --changed-scoped vitest legs, per-job timeouts, integration-branch batching (plan 118).
  • Cross-OS gate on the release cut: bump-version.mjs fires cross-os.yml
    on origin/main and BLOCKS on green before the tag is created; --skip-cross-os
    is the escape hatch (plan 127).
  • Test-harness tiers — new test:server-slow (timeout-prone files pinned to
    one fork) and test:e2e:visual (chromium, --workers=1) so font-hinting drift
    can't race the parallel battery; 42 committed Linux baseline PNGs.
  • Dependency / security bumps: multer 1→2 (upload-route CVE), ESLint 8→9
    flat-config, jsdom / archiver.
  • Build-version footer — every view stamps the running build (e.g.
    v1.5.0 (a1b2c3d)) so a deployer can confirm an upgraded bundle extracted (plan 124).
  • Per-user settings moved to ~/.audiobook-generator/user-settings.json so
    multiple checkouts share one config. New repair scripts under scripts/
    (rexing-existing, relufs-existing, recover-missing-character, …).

Full changelog: v1.4.0...v1.5.0

Castwright 1.4.0

22 May 01:06
5798d42

The largest round since the v1.3.1 listening overhaul. The app now drives on
phone + tablet over LAN HTTPS, chapter audio is loudness-normalised against EBU
R128 with a per-chapter drift report card, AAC/M4A and Opus join MP3 as
first-class codecs, the analyzer pipelines a two-model split that nearly doubles
quota, and generation parallelises across chapters.


✨ Headline features

📱 Mobile + tablet over LAN HTTPS (new)

A six-wave round bringing the app to phone + tablet (plan 81).

  • One-command LAN bootstrap: mkcert drops a local root CA on every dev box;
    npm run install:cert-mobile prints LAN URL + QR + per-OS steps; dev:lan /
    start:lan serve HMR Vite + Node at https://0.0.0.0:5173 / :8443.
  • Three viewport tiers: <640px phone (single-column, drawers + bottom
    sheets, full-screen modals), 640–1024px tablet (two-column, dialog modals,
    right drawer), ≥1024px desktop (three-pane). Every view re-laid out.
  • Touch-equivalence rule — every desktop drag / hover affordance ships a tap
    replacement (tap-to-assign voice pills, PointerEvent manuscript boundaries
    covering mouse + touch + pen); hover labels stay faintly visible via a new
    coarse-pointer: Tailwind variant; controls ≥ 44×44 px per WCAG 2.5.5.

🔊 EBU R128 loudness + new codecs (new)

  • Two-pass loudnorm targeting -16 LUFS / 11 LU / -1.5 dBTP on every
    newly-rendered chapter (AUDIO_LOUDNORM_ENABLED), surfaced as a colour-coded
    per-row drift pill + an expandable report card with sparkline (plans 71, 77).
  • AAC/M4A + Opus join MP3 as first-class chapter codecs via
    BookStateJson.audioFormat, with matching export shapes (plan 72).
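Two-pass loudness normalisation with ffmpeg's loudnorm filter works by measuring first and correcting second. The sketch below only builds the two filter strings for the targets named above (-16 LUFS integrated, 11 LU range, -1.5 dBTP true peak). The helper names are hypothetical, and how Castwright actually invokes ffmpeg is not shown in these notes.

```python
TARGET = {"I": -16.0, "LRA": 11.0, "TP": -1.5}


def first_pass_filter():
    """Measurement pass: ffmpeg prints the loudness stats as JSON on stderr."""
    return "loudnorm=I={I}:LRA={LRA}:TP={TP}:print_format=json".format(**TARGET)


def second_pass_filter(measured):
    """Correction pass, fed with the JSON values measured by the first pass."""
    base = "loudnorm=I={I}:LRA={LRA}:TP={TP}".format(**TARGET)
    feedback = (
        "measured_I={input_i}:measured_LRA={input_lra}:"
        "measured_TP={input_tp}:measured_thresh={input_thresh}:"
        "offset={target_offset}:linear=true"
    ).format(**measured)
    return base + ":" + feedback
```

In use, the first string would go to something like `ffmpeg -i chapter.wav -af <filter> -f null -`, and the JSON block parsed from its stderr would be passed to second_pass_filter for the real encode. Feeding the measurements back is what lets the second pass apply a single linear gain rather than dynamic compression.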

🩺 Pipelined two-model analyzer (new)

  • Phase 0 (cast detection, gemma-4-31b-it) and Phase 1 (attribution,
    gemini-3.1-flash-lite) run in parallel with a configurable min-lag; the two
    phases hit independent rate-limit buckets, so effective quota nearly doubles.
    Legacy single-model path preserved verbatim when unset (plan 88).

📖 Library & manuscript

  • Library search + table view — debounced title / author search, a tag-chip
    filter row (tags persist on BookStateJson.tags), and a card↔table view toggle
    with a series-grouped dense table (plans 73, 76).
  • Portable book bundle: GET /export/portable streams a single .zip
    (state + manuscript + audio + cover + change-log + MANIFEST);
    POST /import/portable accepts it with rename / replace / skip conflict modes (plan 75).
  • Manuscript re-upload diff — a side-by-side sentence-level diff gates
    re-upload before any state mutation, warning when chapter title overrides won't
    match the new content (plans 74, 84).
  • Per-chapter rename — a pencil affordance on every chapter row, persisted with
    a sticky titleOverridden flag that survives heuristic refresh-titles passes (plan 78).
  • Low-confidence triage polish — J / K jump to next / previous misattribution
    and auto-open the inspector; a typeahead picker materialises a missing
    series-mate from the prior roster via POST /cast/add-from-roster (plan 90).

🩺 Cast drift & multi-book

  • Drift modal collapses ~300 events into ~6–18 cards — one card per (book × character × snapshot) instead of one per event (~7,200 DOM nodes → ~200), with
    bulk Regen-all / Dismiss-all / Auto-regen-all (plan 91).
  • Background drift polling — a bulk GET /api/revisions?bookIds=... + a
    two-tier poller (30s active, 120s background) surfaces Book B's drift in Book A's
    modal within ~2 min, active-book latency unchanged (plan 83).
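
The card collapse described above amounts to grouping events by a composite key. The following is a minimal sketch under assumed field names (bookId, characterId, snapshotId, chapterId are hypothetical); the real event shape is not given in these notes.

```typescript
// Hypothetical event and card shapes, for illustration only.
interface DriftEvent {
  bookId: string;
  characterId: string;
  snapshotId: string;
  chapterId: string;
}

interface DriftCard {
  bookId: string;
  characterId: string;
  snapshotId: string;
  events: DriftEvent[];
}

// Collapse a flat event list into one card per (book, character, snapshot).
function groupDriftEvents(events: readonly DriftEvent[]): DriftCard[] {
  const cards = new Map<string, DriftCard>();
  for (const e of events) {
    // JSON.stringify of a tuple gives an unambiguous composite key.
    const key = JSON.stringify([e.bookId, e.characterId, e.snapshotId]);
    let card = cards.get(key);
    if (!card) {
      card = { bookId: e.bookId, characterId: e.characterId, snapshotId: e.snapshotId, events: [] };
      cards.set(key, card);
    }
    card.events.push(e);
  }
  return Array.from(cards.values());
}
```

A bulk action on a card can then iterate card.events, which is what keeps the rendered node count proportional to cards rather than events.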

🔊 Generation

  • Bounded worker pool over chapters via GEN_CHAPTER_CONCURRENCY (default 2),
    with per-chapter SSE tracks kept isolated (plan 87).
  • Export queue Retry + Download wired — failed rows re-fire the original POST;
    done rows without a signed URL stream directly (plan 82).
  • Kokoro stop pill in the top bar frees ~1 GB VRAM for an XTTS warm or a
    heavier analyzer model without restarting the sidecar.
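
A bounded worker pool of the kind mentioned above can be sketched as a fixed number of "lanes" pulling from a shared index. This is a generic pattern, not Castwright's implementation; runPool and its parameters are hypothetical names, and the concurrency argument stands in for GEN_CHAPTER_CONCURRENCY.

```typescript
// Run `worker` over `items` with at most `concurrency` calls in flight.
// Results are returned in input order regardless of completion order.
async function runPool<T, R>(
  items: readonly T[],
  concurrency: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // shared cursor; safe because JS runs lanes on one thread

  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const lanes = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: lanes }, () => lane()));
  return results;
}
```

In this sketch a rejected worker call rejects the whole pool; a real generation queue would more likely catch per-chapter failures and report them individually.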

🎧 Generation fixes

  • Edited speaker reassignments / split sentences were silently dropped on
    regenerate — the analyser cache wasn't rebuilt after a manuscript edit. Now
    regenerate applies the manuscript-edits overlay before synth (plan 80).
  • Chapters had no audible boundary and titles weren't spoken — players
    cross-faded straight from one into the next. Now each chapter opens with its
    title voiced in the narrator voice + a baked-in inter-chapter silence (plan 101).
  • Force-regen on a range re-ran the catch-up replay against the just-chosen
    chapters, racing the new run. Now the replay skips in-scope chapters so the
    user-selected scope wins.
  • The generation pill froze at its last snapshot and never drained after
    completion. Now it drains to zero across excluded + idle gaps.
  • Two tabs on one workspace fanned out idle cross-tab updates. Now the
    broadcast layer diffs snapshots before posting and debounces phase progress.
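
The "diff before posting" idea in the last fix can be illustrated with a shallow comparison that suppresses unchanged snapshots. This is a sketch only: the names are hypothetical, the snapshot is assumed to be a flat object, and the debounce half of the fix is left out.

```typescript
type Snapshot = Record<string, unknown>;

// True when both objects have the same keys with identical (Object.is) values.
function shallowEqual(a: Snapshot, b: Snapshot): boolean {
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every(
    (k) => Object.prototype.hasOwnProperty.call(b, k) && Object.is(a[k], b[k]),
  );
}

// Wrap a post function so that consecutive identical snapshots are dropped.
// Returns true when the snapshot was actually posted.
function createDedupingPoster(post: (s: Snapshot) => void): (next: Snapshot) => boolean {
  let last: Snapshot | undefined;
  return (next: Snapshot): boolean => {
    if (last !== undefined && shallowEqual(last, next)) return false;
    last = next;
    post(next);
    return true;
  };
}
```

In a browser the post callback would typically wrap BroadcastChannel.postMessage.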

🏗️ Under the hood

  • Frontend perf pass — broadcast-middleware shallow-diffs activeStream,
    shallow-equality selector wraps, and route-level React.lazy drop the main
    bundle 410 → 345 kB (gzip 108 → 91 kB) (plan 89); manuscript / confirm-cast /
    listen lists virtualise via @tanstack/react-virtual above their cutoffs (plans 92, 93).
  • Exports moved out of the hidden jail to <bookDir>/exports/<slug>.<ext>,
    with a sync-folder "Test" probe and widened rename retries for Drive / OneDrive (plan 79).
  • CI Node 20 → 22 → 24; scripts/wt-merge.mjs reconciliation helper (plan 85);
    dev-only worktree dashboard at #/worktrees (plan 86); ~14 new e2e specs +
    the responsive harness; LAN-cert bootstrap scripts + Playwright mobile / tablet projects.

Full changelog: v1.3.1...v1.4.0

Castwright 1.3.1

19 May 22:03

The biggest single step since the analysis pipeline was rebuilt in phases.
New capabilities across listening, voice, cast and onboarding; dark mode
stabilised; the install path is now end-to-end in-app for the alpha audience;
restructure + TTS hardened for long structured manuscripts.


✨ Headline features

🎧 Listening surface overhaul (new)

  • Playback speed picker (0.75×–2×) persisted per book, user-placed markers
    (note / rerecord) with a sidebar, and a sleep timer with countdown presets +
    end-of-chapter mode (plan 53).
  • True RMS waveform peaks computed at encode time and persisted (plan 56);
    per-book resume bookmarks (plan 47); collapsible editorial Notes card (plan 67).
  • Share a 30-second clip of any chapter as MP3 (ffmpeg -ss / -t / -c copy,
    no re-encode) (plan 69); mint a slugged streaming link for the whole book's
    M4B (plan 68); M4B + MP3-ZIP download tiles (plan 57).
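
The clip export relies on ffmpeg's stream copy, so the argument list is short. The helper below is a hypothetical sketch of how such a list might be assembled; only the -ss, -t, -i and -c copy flags come from ffmpeg and the notes above, everything else is assumed.

```typescript
// Build ffmpeg arguments that cut `durationSec` seconds starting at `startSec`
// without re-encoding. Placing -ss / -t before -i makes them input options.
function clipArgs(
  input: string,
  output: string,
  startSec: number,
  durationSec: number = 30,
): string[] {
  if (startSec < 0 || durationSec <= 0) {
    throw new RangeError("startSec must be >= 0 and durationSec > 0");
  }
  return [
    "-ss", String(startSec),
    "-t", String(durationSec),
    "-i", input,
    "-c", "copy", // stream copy: no re-encode, cuts land on frame boundaries
    output,
  ];
}
```

The array form is meant for a child-process spawn call, which sidesteps shell quoting of file names.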

🛠️ In-app multi-model management (new)

Install Ollama, pull a model, and pre-fetch Coqui XTTS without a terminal
(Account → Models card, plan 61).

  • Per-platform install state machine (idle → detecting → downloading → installing
    → installed); model-pull consumes Ollama's NDJSON progress stream; bootstraps are
    dependency-injectable so tests run offline. New install-coqui.{sh,ps1}.
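
Consuming an NDJSON progress stream mostly means buffering partial lines between chunks. The sketch below assumes progress records carrying status, total and completed fields, as Ollama's pull endpoint emits; the parser and helper names are hypothetical.

```typescript
// Fields assumed from Ollama's pull progress records; other fields are ignored.
interface PullProgress {
  status: string;
  total?: number;
  completed?: number;
}

// Returns a function that accepts raw text chunks and emits one record per
// complete line. An unfinished trailing line is held until the next chunk.
function createNdjsonParser(onRecord: (r: PullProgress) => void): (chunk: string) => void {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // last element is the incomplete remainder
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed.length === 0) continue;
      onRecord(JSON.parse(trimmed) as PullProgress);
    }
  };
}

// Whole-number percentage, or undefined while sizes are not yet known.
function percent(p: PullProgress): number | undefined {
  if (!p.total || p.completed === undefined) return undefined;
  return Math.round((p.completed / p.total) * 100);
}
```

Because the parser takes plain strings, it can be fed canned chunks in an offline test without any network stream.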

🔌 Cross-tab state sync (new)

  • Open the same book in two tabs and the analysis / generation pills update in
    lockstep via BroadcastChannel, with two-layer echo suppression and a narrow
    broadcast scope (only activeStream slots) preserving the single-user contract (plan 63).

🎙️ Voice & cast

  • Per-candidate "Play sample" inside the profile drawer — audition voices
    against a user-editable sample line, no commit until Save (plan 64).
  • Same-book Compare lifted to the global Voices tab; cross-book pairs remain
    disabled (plan 65).
  • Revision history timeline — a read-only chronological log of accept / reject
    events per chapter via the A/B player modal (plan 55).
  • The drift modal's Listen button opens the A/B compare player directly;
    bulk-apply on confirm-cast ticks both Reuse and Sync in one click.

📖 Ingest & themes

  • EPUB series auto-extracted from title parentheticals; MOBI / AZW3 upload
    covered by real-binary e2e fixtures; cast-view book cards show a title + series
    metadata strip below the cover.
  • Dark mode reached stable after a three-pass contrast bundle — full
    per-utility override coverage across the white / amber / red / rose ladders + their
    alpha + hover variants, a bespoke floating-pill-inverse utility, and corrected
    match-detail drawer z-index (plan 42).

🔊 Generation & restructure fixes

  • A structural edit (merge / split / reorder) forced a full re-analysis —
    Generate halted on "No analysed sentences cached. Re-run analysis first",
    burning quota and destroying manual cast tweaks. Now the restructure path
    re-derives the cache from manuscript-edits.json in place and the generation
    route auto-heals an empty cache. No re-analysis required (plan 70c).
  • Long single-speaker chapters froze on the "Worker has gone quiet" banner —
    the TTS path folded consecutive same-speaker sentences into one giant synth call
    that ran past the watchdog. Now one call per sentence, so the caption advances
    continuously and same-speaker prosody drift disappears (plan 70d).
  • Bracketed audio tags like [empathic] were spoken aloud literally — no
    engine in this app interprets bracket markup. Now the closed-vocabulary tag list
    is stripped at the TTS boundary; arbitrary bracketed prose is preserved (plan 70d).
  • Sequential merges on long manuscripts dropped orphaned sentences, surfaced
    empty 0 sentences · 00:00 rows, and left stale "Chapter N" titles. Now orphans
    are recovered onto the nearest preceding chapter, empty rows auto-pruned, and
    generic titles re-derived (plan 70a).
  • Several dark-mode surfaces were unreadable — the Halted pill / panel
    (red-on-red), the Analysing connection pill + profile-drawer engine tabs
    (near-white on near-white bg-white/{40,60,70,95}), the voice-drift banner, the
    cast-view selection bar, and the confirm-metadata inputs. All now legible (plan 42).
  • The match-detail drawer opened underneath the profile drawer. Now it stacks
    correctly in both themes.
  • A cover-bearing card hid its title + series (gated on the no-cover
    fallback). Now an always-visible metadata strip below the artwork.
  • Editorial notes were dropped on save (missing description on the
    bookMeta/commitDraft rule). Now both fields round-trip (plan 67).
  • The generation pill counters froze when navigating to a different book
    mid-run, then flipped to "Stalled" on a healthy run. Now they update cross-book.
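
The bracketed-tag fix above hinges on a closed vocabulary: only known tags are removed, and any other bracketed text passes through. A minimal sketch follows, with a made-up tag list (only "empathic" appears in these notes) and hypothetical names.

```typescript
// Hypothetical closed vocabulary; the app's real list is not shown in the notes.
const AUDIO_TAGS: ReadonlySet<string> = new Set(["empathic", "whispers", "laughs", "sighs"]);

// Remove known audio tags such as "[empathic]" and tidy the leftover spacing.
// Bracketed text that is not in the vocabulary is left untouched.
function stripAudioTags(text: string): string {
  return text
    .replace(/\[([^\[\]]+)\]/g, (match: string, inner: string) =>
      AUDIO_TAGS.has(inner.trim().toLowerCase()) ? "" : match,
    )
    .replace(/[ \t]{2,}/g, " ") // collapse the gap a removed tag leaves behind
    .trim();
}
```

Note that this sketch also collapses any pre-existing runs of spaces, which is harmless for text headed to a speech engine.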

🏗️ Under the hood

  • GitHub Actions runs npm run verify on every PR targeting main, the same
    gate the pre-push hook runs locally (plan 62).
  • Parallel-session worktrees — scripts/wt-new.mjs / wt-list.mjs spawn
    worktrees with non-colliding dev-server ports (plan 59).
  • src/views/listen.tsx decomposed 1136 → 319 lines into three orchestrated
    sub-components under src/components/listen/, zero spec files modified (plan 60).
  • TTS sidecar starts with the server — per-user autoStartSidecar (default
    on); npm start brings up frontend + server + sidecar in one shot (plan 43).
  • Verify-cache --steps=<csv> subset selection; de-flaked e2e suites
    (flakes-per-run 3–5 → 0).

Full changelog: v1.2.2...v1.3.1

Castwright 1.2.2

18 May 21:07

MOBI/AZW3 ingest, chapter restructure, listening progress, and
release-packaging end-to-end. User-visible features, fixes and infrastructure
since v1.1.0.

⚠️ Upgrade note: v1.2.0 and v1.2.1 were tagged but never published — the
release workflow surfaced cross-platform CI gaps that this release closes.


✨ Headline features

📖 MOBI / AZW3 ingest + chapter restructure (new)

  • MOBI / AZW3 upload — Kindle / Calibre files drop directly into the upload
    screen; DRM-protected files are rejected up-front with a clear message rather
    than failing deep inside the parser (plan 52).
  • Chapter restructure panel — merge, split and reorder chapters post-import
    without re-uploading or re-running analysis. Sentences remap via a
    (chapterId, id) rewrite, so no analyzer quota is spent and manual cast tweaks
    are preserved (plan 51).

📦 Release packaging end-to-end (new)

  • Tag-triggered release — node scripts/bump-version.mjs --level minor
    --notes-file <path> advances both package.json files in lockstep, regenerates
    lockfiles, commits, and writes the annotated tag; pushing the tag fires
    release.yml, which verifies on Ubuntu / macOS / Windows, builds the zip +
    SHA-256, and publishes the GitHub Release from the tag annotation (plan 49).

🎧 Listening & cast

  • Listening progress / resume — close the app mid-chapter and get a "Resume at
    H:MM:SS" pill; the mini-player saves with a debounced PUT, refresh rehydrates the
    seek position (plan 47).
  • Cover Replace / Regenerate from the listen view; per-chapter queue drawer
    gains copy / remove on each queued export (plan 18a).
  • Bulk-sync on confirm-cast — a "Sync N profiles from library" pill collapses
    N per-card ticks into one click, per-card untick preserved (plan 41).
  • Rollback fsck + mid-flight Reject toast — a rolled-back revision fsck-walks
    the audio directory and reports drift; an in-flight Reject surfaces a toast (plan 20).

🔔 Cross-cutting toast surface

  • Transient stream / export failures now surface as a 6s auto-dismissing
    notification stack (bottom-right, role="status"); repeated errors dedupe by
    key instead of stacking (plan 48).
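
Deduping by key can be modelled as a pure update on the toast stack. The sketch below is an assumption-heavy illustration: the Toast shape, the repeat counter, and the choice to restart the 6s timer on a repeat are all hypothetical, since the notes only state the 6s auto-dismiss and the dedupe-by-key behaviour.

```typescript
interface Toast {
  key: string;
  message: string;
  count: number; // how many times this key has fired while visible
  expiresAt: number; // epoch-style milliseconds
}

const TOAST_TTL_MS = 6000; // 6s auto-dismiss, per the release notes

// Return the next stack after an error with `key` fires at time `now`.
// Expired toasts are dropped; a repeated key updates the existing entry.
function pushToast(stack: readonly Toast[], key: string, message: string, now: number): Toast[] {
  const live = stack.filter((t) => t.expiresAt > now);
  if (live.some((t) => t.key === key)) {
    return live.map((t) =>
      t.key === key
        ? { ...t, message, count: t.count + 1, expiresAt: now + TOAST_TTL_MS }
        : t,
    );
  }
  return [...live, { key, message, count: 1, expiresAt: now + TOAST_TTL_MS }];
}
```

Passing the clock in as an argument keeps the function deterministic and easy to test.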

🎙️ Fixes

  • The Gemini analysis stream could hang forever on an idle / dead-write window
    — the analyser pill sat on a stale stream with no progress and no error. Now the
    stream detects the silence, cancels, and re-issues automatically (plan 06).
  • In dark mode, failed / stale sentence-status badges were nearly invisible
    (red / rose on dark), as were the streaming progress pill and several
    translucent bg-white/{60,70} panels. All now legible (plan 42).

🏗️ Under the hood

  • Cross-platform release workflow — installs ffmpeg per-OS (apt / brew /
    choco); test:scripts / test:sidecar pick pwsh / powershell.exe. v1.2.0 /
    v1.2.1 were the dry-runs that surfaced these gaps (plan 49).
  • Production-mode server — npm run start:prod launches API + Vite-built
    frontend off one process on :8080 (plan 49).
  • UI-managed Gemini API key — a writable Account field; GET /api/user/settings
    never echoes the plaintext (redacted apiKeyStatus); env-var still wins (plan 49).
  • Verify-cache for cheap retries (per-step input-hash cached) (plan 50); lint +
    Prettier + axe-core a11y on four core views (plan 46); PR-title lint + body
    template, squash / rebase merges disabled (plan 44); INSTALL.md deployer one-pager.

Full changelog: v1.1.0...v1.2.2

Castwright 1.1.0

17 May 22:36
b3691f5

Polish, resilience, and the v1 ergonomics gaps. Dark mode, an auto-starting
sidecar, cover framing, sticky analysis, and state.json resilience.


✨ Headline features

🌗 Dark mode (new)

  • Light / Dark / System toggle in the top bar with an account-managed first-visit
    default; a [data-theme="dark"] token-override block reflects every shipped
    surface without per-component dark-class plumbing (plan 42).

🔌 TTS sidecar starts with the server (new)

  • Per-user autoStartSidecar; start-app.bat brings up frontend + server +
    sidecar in one shot, Node owning the child-process lifecycle (port-9000 probe →
    spawn → .run/tts.pid → tree-kill on SIGINT / SIGTERM) (plan 43).

🖼️ Cover artwork framing (new)

  • A three-tab CoverPicker (Search / Upload / Frame) — drag-pan + zoom keeps the
    meaningful part of portrait covers inside square / landscape frames; framing is
    metadata-only (applied at render time via object-position + transform,
    no re-encode); local-disk upload covers self-pub + non-English titles (plan 40).
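
Metadata-only framing means the stored pan and zoom values are turned into CSS at render time. The sketch below shows one way that mapping could look; the CoverFraming shape (pan as 0..1 fractions, zoom of at least 1) is an assumption, and only the use of object-position and transform comes from the notes.

```typescript
// Hypothetical persisted framing: pan as fractions of the image, zoom >= 1.
interface CoverFraming {
  panX: number;
  panY: number;
  zoom: number;
}

interface FramingStyle {
  objectFit: "cover";
  objectPosition: string;
  transform: string;
}

// Convert stored framing metadata into inline style values for an <img>.
function framingStyle(f: CoverFraming): FramingStyle {
  const clamp01 = (n: number): number => Math.min(1, Math.max(0, n));
  const x = Math.round(clamp01(f.panX) * 100);
  const y = Math.round(clamp01(f.panY) * 100);
  const zoom = Math.max(1, f.zoom); // never shrink below the frame
  return {
    objectFit: "cover",
    objectPosition: `${x}% ${y}%`,
    transform: `scale(${zoom})`,
  };
}
```

Since the image bytes never change, the same source file can serve square and landscape frames with different framing records.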

🎙️ Voice & continuity

  • Manual continuity link to a prior series book on the Profile Drawer's "Merge
    into" dropdown — closes the nameScore < 0.34 floor gap where the matcher
    dropped a legitimate match (e.g. "Dexter Alvin Diznee" vs "Dex") and the
    duplicate was unreachable (plan 09).

🩺 Resilience

  • Sticky analysis across navigation — re-running a failed sentence subset
    survives leaving the analysing view; cold-boot rehydrate scans every book's
    analysis-state on boot, with "Paused — resume?" / "Halted — review?" card badges
    (plan 32).
  • Auto-retry transient TTS failures — per-group bounded retry absorbs sidecar
    503s / connection-refused without wedging the queue.
  • Rotating state.json backups + torn-read recovery — the single most
    valuable file keeps a rolling history; a torn write falls back to the most recent
    intact backup. Redux-persist on ui + manuscript restores the last stage / chapter
    on refresh (plan 27).
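
Torn-read recovery can be pictured as "try the primary, then each backup, newest first". The sketch below works on already-read file contents so it stays self-contained; the function names and the validator are hypothetical, and the real backup naming and rotation policy are not described in these notes.

```typescript
interface LoadResult<T> {
  value: T;
  source: number; // 0 = primary file, 1.. = position in the backup list
}

// Hypothetical validity check: a state object must carry a numeric schema.
function isState(v: unknown): v is { schema: number } {
  return typeof v === "object" && v !== null &&
    typeof (v as { schema?: unknown }).schema === "number";
}

// Return the first candidate that parses and validates.
// `candidates` is ordered: primary first, then backups from newest to oldest.
function loadFirstIntact<T>(
  candidates: readonly string[],
  isValid: (v: unknown) => v is T,
): LoadResult<T> | undefined {
  for (let i = 0; i < candidates.length; i++) {
    try {
      const parsed: unknown = JSON.parse(candidates[i]);
      if (isValid(parsed)) return { value: parsed, source: i };
    } catch {
      // Torn or truncated JSON: fall through to the next candidate.
    }
  }
  return undefined;
}
```

Returning the source index lets the caller log which backup was used and rewrite the primary from it.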

🎧 Fixes

  • Reassigning a sentence on chapter 2+ reassigned the same-id sentence in
    chapter 1 — the reducer keyed only by id. Now all three manuscript-slice
    reducers key by (chapterId, id), so reassignments land on the clicked sentence
    (plan 12a).
  • Cold-boot rehydrating a book with a live local analysis auto-fired generation
    — the analyzer and TTS fought over a single GPU and both halted. Now the
    implicit-reconcile seam is gated like explicit TTS-start callsites (plan 32).
  • Confirm / Ready could land with empty cast.characters after a Phase 0
    cache resume (manuscript hydrated faster than cast). Now the views re-fetch when
    cast is empty.
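
The first fix above comes down to matching on a composite key. A small sketch of a reducer-style update that keys by (chapterId, id) follows; the Sentence shape and function name are hypothetical.

```typescript
// Hypothetical sentence shape: ids restart in every chapter, so `id` alone
// is not unique across the book.
interface Sentence {
  chapterId: string;
  id: number;
  speaker: string;
}

// Reassign exactly one sentence, matched on both chapterId and id.
function reassignSpeaker(
  sentences: readonly Sentence[],
  chapterId: string,
  id: number,
  speaker: string,
): Sentence[] {
  return sentences.map((s) =>
    s.chapterId === chapterId && s.id === id ? { ...s, speaker } : s,
  );
}
```

Matching on id alone would have updated the chapter 1 sentence in the test case described in the fix, which is the bug the composite key removes.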

🏗️ Under the hood

  • state.json schema seam — a schema field with a v1 → vN migration seam,
    stamping schema: 1 now so later migrations have a clean starting point (plan 27).
  • Five new Playwright specs (golden path, listen playback, per-stage Redux +
    refresh, cover framing, manual-continuity link) with light + dark visual
    baselines; sidecar pytest pins thread-pool saturation for /synthesize.
  • Release-notes conventions documented in CONTRIBUTING.md; a real README.md
    landing page with install / run / verify.
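
A schema migration seam of the kind described in the first bullet is usually a table of step functions applied in order. The sketch below is a generic illustration with hypothetical names; treating a file without a schema field as version 0 is an assumption, not something the notes state.

```typescript
type State = Record<string, unknown>;
type Migration = (state: State) => State;

const CURRENT_SCHEMA = 1;

// One entry per source version; MIGRATIONS[n] upgrades n -> n + 1.
const MIGRATIONS: Record<number, Migration> = {
  // Assumed step: files written before the seam existed get stamped.
  0: (s) => ({ ...s, schema: 1 }),
};

// Walk the state forward one version at a time until it is current.
function migrate(raw: State): State {
  let state = raw;
  const stamped = state["schema"];
  let version = typeof stamped === "number" ? stamped : 0;
  while (version < CURRENT_SCHEMA) {
    const step = MIGRATIONS[version];
    if (!step) throw new Error(`No migration from schema ${version}`);
    state = step(state);
    version += 1;
  }
  return state;
}
```

Adding a future version then means bumping CURRENT_SCHEMA and adding one entry to the table, with no change to the loader.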

Full changelog: v1.0.0...v1.1.0

Castwright 1.0.0

17 May 05:02

Initial release. The full v1 pipeline takes a manuscript from upload to a
chaptered audiobook on disk: parse, attribute each line to a character, audition
voices, generate per-chapter audio via a local TTS sidecar, listen / revise, and
export to M4B or MP3 zip with cover artwork and chapter atoms.


✨ Headline features

📖 Manuscript → audiobook pipeline (new)

  • Ingest — upload accepts .md / .txt / .epub / .pdf; chapter names are
    extracted at parse time; parse-only import lets the user confirm metadata
    (author / series / standalone) before the book lands on disk (plans 02, 12).
  • Analysis pipeline — cast detection runs through one of three analyser modes:
    Gemini cloud (default for hosted, free-tier rate-limited), local Ollama (default
    for self-host, auto-falls back to Gemini when the daemon is unreachable), or
    manual file-drop coworking. Sticky analysis survives leaving the analysing view
    (plans 06, 29, 32).
  • Generation stream — per-chapter SSE (progress / chapter_complete /
    chapter_failed / idle); sticky generation survives every navigation except an
    explicit Stop or queue drain, with Pause / Resume via POST /pause (plans 16, 32).

🎙️ Local TTS sidecar + voice library (new)

  • Sidecar — a local Python FastAPI sidecar hosting two engines: Coqui XTTS v2
    (zero-shot cloning) and Kokoro v1 (English-only, eager-loaded, ~1 GB VRAM). The
    analyser and Coqui auto-evict each other to free VRAM with an inline banner
    (plans 14, 14a, 30).
  • Voice library — per-engine catalogs, family grouping (af_* / am_* /
    bf_* / bm_*), drag-to-assign onto cast members, per-character overrides scoped
    per-engine so a Coqui ↔ Kokoro switch preserves assignments, sample playback
    against a user-editable line, and compare-two-cast-members (plan 22a).
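
Family grouping by the af_* / am_* / bf_* / bm_* prefixes can be sketched as a prefix bucket. The function name and the "other" bucket are hypothetical, and the voice ids in the usage note are only examples of the naming pattern.

```typescript
// Group voice ids by their two-letter prefix (for example "af_" -> "af_*").
// Ids without such a prefix land in an "other" bucket.
function groupByFamily(voiceIds: readonly string[]): Map<string, string[]> {
  const families = new Map<string, string[]>();
  for (const id of voiceIds) {
    const match = /^([a-z]{2})_/.exec(id);
    const family = match ? `${match[1]}_*` : "other";
    const bucket = families.get(family) ?? [];
    bucket.push(id);
    families.set(family, bucket);
  }
  return families;
}
```

For instance, calling it with "af_bella", "am_adam" and "af_sarah" yields an af_* family holding the two af_ voices and an am_* family holding one.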

🎧 Revisions, export & persistence (new)

  • Revisions and drift — A/B audio audition before accept / reject; rollback
    preserves the prior MP3 as <slug>.previous.mp3 so it's non-destructive; a
    pending-revisions pill + full diff view (plan 20).
  • Audiobook export — M4B (embedded cover, per-chapter chap atoms, optional
    desc / ldes metadata), MP3 zip (chaptered MP3s for Smart AudioBook Player /
    Audiobookshelf), and sync-folder save (drops the M4B into a sync directory). A
    LAN download tile generates a QR for sideloading; jobs can be cancelled /
    retried (plans 32, 33, 39).
  • Workspace persistence — per-book on-disk state (cast.json,
    manuscript-edits.json, revisions.json, change-log.json, audio renders),
    round-tripped through atomic JSON writes with renameWithRetry for Windows /
    OneDrive races (plan 27).

🏗️ Under the hood

  • Five-tier test harness — Vitest frontend + server, pytest sidecar, Pester
    for PowerShell helpers, Playwright e2e in mock mode; one-shot via npm run verify (plan 37).
  • Three-tier commit gate (husky v9) — commit-msg Conventional-Commits
    validator, pre-commit verify:fast, pre-push full verify (plan 38).
  • OpenAPI as the single source of truth at openapi.yaml; src/lib/api-types.ts
    generated via npm run openapi:types (plan 24).
  • Mock mode round-trips against an in-memory map for jsdom tests + design
    fixtures (VITE_USE_MOCKS=true).

Full changelog: initial release