Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,13 @@ All notable changes to this project will be documented in this file. The format

### Added

- **First-run UX overhaul.** Three new pieces close the path from `git clone` to a working `dpub convert --transcribe ...` without manual treasure hunts:
- **`dpub doctor`** — read-only diagnostic showing build state (version, GPU acceleration: Metal / CUDA / CPU), every runtime prerequisite (`epubcheck`, `ace`, `ffmpeg`), and Whisper model cache contents. `--json` for CI use; same stable schema pattern as `dpub validate --json`.
- **`dpub setup --whisper-model <size>`** — downloads any of `tiny`, `base`, `small`, `medium`, `large-v3` from huggingface.co/ggerganov/whisper.cpp into `~/.cache/dpub/models/` (or `%LOCALAPPDATA%\dpub\models\` on Windows) with SHA256 verification and an atomic `.partial` rename. Re-running on an already-cached model skips the download after a hash check.
- **`scripts/build.sh`** — host-aware release build that auto-picks `--features metal` on Apple Silicon and `--features cuda` on Linux+nvcc, falling back to CPU-only otherwise. Pre-flights `cmake`. Documented as the recommended build command.
- **Auto-discovery for `--transcribe`.** Calling `dpub convert --transcribe nl` without `--whisper-model` now picks the most-recently-modified `ggml-*.bin` from the cache. The `--whisper-model <path>` override stays for explicit control.
- **Interactive first-run prompt.** When `--transcribe` is used on a TTY with no cached model, dpub offers to download `ggml-medium.bin` instead of failing. Set `DPUB_NONINTERACTIVE=1` (or run in a non-TTY pipe) to suppress the prompt; the failure message points at `dpub setup`.
- **`dpub doctor --install`** — opt-in best-effort installer for missing runtime tools. Uses `brew` on macOS, `apt-get` / `dnf` on Linux (with `sudo`), and prints commands on Windows. Per-tool consent unless `--yes` is passed. Never tries to install Java directly. Whisper models are handled by `dpub setup` rather than the OS package manager.
- **Word-level Media Overlay sync** (M6.5). When `--transcribe` runs and the cleanup path is active, dpub now extracts per-token timestamps from whisper.cpp, coalesces BPE pieces back into whole words via a leading-space rule (with punctuation attachment and degenerate-timing clamping), wraps each word in a `<span id="w-NNN-MMM-KKK">` inside the cleaned `<p id="tx-NNN-MMM">`, and emits one SMIL `<par>` per word — wrapped in nested `<seq epub:textref="...#tx-...">` per paragraph. The result is karaoke-style highlight-along-with-audio in compatible reading systems (Thorium, Readium). Default-on; pass `--no-word-sync` to fall back to per-paragraph sync. Workspace EPUBCheck assertions extended to gate the new overlay shape; reference book stays 0/0/0.
- `dpub-whisper` exposes a public `Word { start_seconds, end_seconds, text }` struct and `Segment.words: Vec<Word>` populated by the new BPE coalescer (`crates/dpub-whisper/src/words.rs`). Eight unit tests cover the BPE coalescing rules.

Expand Down
74 changes: 74 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ uuid = { version = "1", features = ["v4"] }
serde_json = "1"
which = "7"
rayon = "1"
sha2 = "0.10"
walkdir = "2"
whisper-rs = "0.16"
symphonia = { version = "0.5", default-features = false, features = ["mp3", "ogg", "isomp4", "vorbis"] }
Expand Down
26 changes: 25 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,31 @@ The reference toolchain for this conversion is the [DAISY Pipeline 2](https://da

## Quickstart

Build from source (requires `cmake` for the bundled `dpub-whisper` crate):
### First-time setup

Five commands from a fresh clone to a fully-working dpub:

```sh
# macOS — install build + runtime prerequisites
brew install cmake epubcheck ffmpeg
npm install -g @daisy/ace # optional: enables `dpub a11y`

# Build with the right GPU acceleration for the host
git clone https://github.com/11ways/dpub && cd dpub
./scripts/build.sh

# Download a Whisper model (only needed if you'll use --transcribe)
./target/release/dpub setup --whisper-model medium

# Confirm everything's green
./target/release/dpub doctor
```

`./scripts/build.sh` auto-detects Apple Silicon (Metal) / Linux+nvcc (CUDA) / falls back to CPU-only. Power users who want different feature flags call `cargo build --release -p dpub-cli` directly.

`dpub doctor` shows the status of every prerequisite with platform-specific install hints. `dpub setup --whisper-model <size>` downloads a Whisper model into `~/.cache/dpub/models/` with SHA256 verification — `--transcribe` then auto-discovers the most recent cached model so you don't have to thread `--whisper-model <path>` through every invocation. Sizes: `tiny`, `base`, `small`, `medium` (recommended for Dutch), `large-v3`.

### Manual build (if you prefer)

```sh
git clone https://github.com/11ways/dpub
Expand Down
3 changes: 3 additions & 0 deletions crates/dpub-cli/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -25,12 +25,15 @@ cuda = ["dpub-convert/cuda"]
dpub-audio = { path = "../dpub-audio", version = "0.5.0" }
dpub-core = { path = "../dpub-core", version = "0.5.0" }
dpub-convert = { path = "../dpub-convert", version = "0.5.0" }
dpub-meta = { path = "../dpub-meta", version = "0.5.0" }
dpub-validate = { path = "../dpub-validate", version = "0.5.0" }
sha2 = { workspace = true }
clap = { workspace = true }
anyhow = { workspace = true }
rayon = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
which = { workspace = true }
tracing = { workspace = true }
tracing-subscriber = { workspace = true }
walkdir = { workspace = true }
Expand Down
Loading
Loading