
Cover lookup: --cover <path> + --auto-cover via Open Library #11

@roelvangils

Summary

Add cover-image support to dpub convert. Two flags, the second opt-in:

  • `--cover <path>`: embed a local JPEG/PNG as the EPUB cover. Always works, no network. Lands first.
  • `--auto-cover`: best-effort lookup via the Open Library covers API using DAISY metadata (title, creator, ISBN-shaped `dc:identifier`). Free, no API key. Privacy-relevant (leaks metadata to a third party), so it stays opt-in.

EPUB output gains a manifest item with `properties="cover-image"` plus the legacy `<meta name="cover"/>` for older readers. Must stay EPUBCheck-clean.
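As a rough illustration of the two OPF fragments involved, here is a minimal std-only sketch. The function names (`sniff_media_type`, `cover_opf_fragments`) and the example id and href are hypothetical, not existing `epub3-writer` API. It assumes the id and href are already XML-safe and that the media type comes from a JPEG/PNG magic-byte check.

```rust
/// Return the media type for JPEG or PNG data based on its leading magic bytes.
/// Anything else yields `None` so the caller can reject the file.
fn sniff_media_type(bytes: &[u8]) -> Option<&'static str> {
    const PNG_MAGIC: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    const JPEG_MAGIC: [u8; 3] = [0xFF, 0xD8, 0xFF];
    if bytes.starts_with(&PNG_MAGIC) {
        Some("image/png")
    } else if bytes.starts_with(&JPEG_MAGIC) {
        Some("image/jpeg")
    } else {
        None
    }
}

/// Build the two OPF fragments for a cover image:
/// the EPUB 3 manifest item and the legacy `<meta name="cover">` element,
/// whose `content` attribute points at the manifest item's id.
/// Assumption: `id` and `href` need no XML escaping.
fn cover_opf_fragments(id: &str, href: &str, media_type: &str) -> (String, String) {
    let item = format!(
        r#"<item id="{id}" href="{href}" media-type="{media_type}" properties="cover-image"/>"#
    );
    let meta = format!(r#"<meta name="cover" content="{id}"/>"#);
    (item, meta)
}
```

A caller would sniff the image bytes first, abort or warn on `None`, and otherwise pass the returned media type to `cover_opf_fragments` along with whatever id and path the writer chooses.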

Current state

`epub3-writer` has no cover concept at all today. `Publication` carries no cover field; the OPF manifest doesn't emit a cover-image item. So this is a new feature surface end-to-end, not a refactor.

Implementation outline

Detailed plan exists. Highlights:

  • New crate `dpub-meta` with a `network` feature flag (default off) — keeps HTTP/JSON out of the future WASM target and mirrors the per-concern crate pattern (`dpub-validate`, `dpub-audio`, `dpub-whisper`).
  • HTTP via `ureq` 2.x (sync, ~1 MB compiled). Deliberately rejects `reqwest`+tokio: `dpub-convert` is sync end-to-end (rayon for parallelism, blocking I/O for ZIP); adding an async runtime for one HTTP call would dwarf the rest of the dep tree.
  • Open Library lookup: `/search.json?title=…&author=…` then `/b/id/<cover_i>-L.jpg`. ISBN-shaped `dc:identifier` triggers a direct `?isbn=…` query; bare integers (e.g. the "5485" some DAISY producers emit) fall back to title+author. Ambiguity guard: language match + last-name token overlap before accepting a hit.
  • Image handling v1 is pass-through: sniff JPEG/PNG magic, embed as-is. No re-encoding, no resize. Logs the size so the operator notices outliers.
  • `--cover X --auto-cover` rejected at CLI parse time via clap `conflicts_with`.
  • `--cover` fails loudly (missing file → abort). `--auto-cover` fails soft: on a miss (timeout, 0 hits, or low confidence) it logs a warning and continues without a cover.
  • Test gating: `DPUB_TEST_OPENLIBRARY=1` opt-in env var for the network-dependent test, matching the existing `DPUB_TEST_*` pattern. Synthetic-fixture tests use a tiny in-tree PNG so EPUBCheck assertions stay green on every CI run.
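The lookup selection and ambiguity guard described above could be sketched roughly as follows. This is a std-only illustration under stated assumptions, not the planned `dpub-meta` code: every type and function name here is hypothetical, the "ISBN-shaped" test checks only length and character class (no checksum), and the caller is assumed to have normalised language codes to whatever form the search results use. The covers host name comes from Open Library's public covers API rather than from this issue.

```rust
/// Which Open Library query to issue for a given set of DAISY metadata.
#[derive(Debug, PartialEq)]
enum Lookup {
    Isbn(String),
    TitleAuthor { title: String, author: String },
}

/// Strip hyphens and spaces, then accept 13 digits or 9 digits plus digit/X.
/// Bare integers such as "5485" fail the length check and return `None`.
fn isbn_shaped(identifier: &str) -> Option<String> {
    let compact: String = identifier
        .chars()
        .filter(|c| !matches!(c, '-' | ' '))
        .collect();
    let chars: Vec<char> = compact.chars().collect();
    let is_isbn13 = chars.len() == 13 && chars.iter().all(|c| c.is_ascii_digit());
    let is_isbn10 = chars.len() == 10
        && chars[..9].iter().all(|c| c.is_ascii_digit())
        && (chars[9].is_ascii_digit() || chars[9] == 'X' || chars[9] == 'x');
    (is_isbn13 || is_isbn10).then_some(compact)
}

/// Prefer a direct ISBN query; otherwise fall back to title + author.
fn choose_lookup(identifier: Option<&str>, title: &str, author: &str) -> Lookup {
    match identifier.and_then(isbn_shaped) {
        Some(isbn) => Lookup::Isbn(isbn),
        None => Lookup::TitleAuthor {
            title: title.to_owned(),
            author: author.to_owned(),
        },
    }
}

/// Lower-cased last token of the family name.
/// Handles both "Austen, Jane" and "Jane Austen".
fn last_name_token(name: &str) -> Option<String> {
    let family_part = name.split_once(',').map_or(name, |(family, _)| family);
    family_part.split_whitespace().last().map(str::to_lowercase)
}

/// Ambiguity guard: the hit must list the expected language and at least one
/// author whose last-name token matches the DAISY creator's.
fn accept_hit(expected_lang: &str, creator: &str, hit_langs: &[&str], hit_authors: &[&str]) -> bool {
    let language_ok = hit_langs.iter().any(|l| l.eq_ignore_ascii_case(expected_lang));
    let Some(wanted) = last_name_token(creator) else {
        return false;
    };
    let author_ok = hit_authors
        .iter()
        .any(|a| last_name_token(a).as_deref() == Some(wanted.as_str()));
    language_ok && author_ok
}

/// Large-size cover URL for a `cover_i` value taken from a search hit.
fn cover_url(cover_i: u64) -> String {
    format!("https://covers.openlibrary.org/b/id/{cover_i}-L.jpg")
}
```

Keeping these steps as pure functions would let most of the matching logic be tested without the network, leaving only the HTTP call itself behind the `DPUB_TEST_OPENLIBRARY=1` gate.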

Order of implementation

Plan supports a single PR or a clean split:

  1. Writer plumbing + `--cover <path>` (PR-1).
  2. `dpub-meta` crate + `--auto-cover` (PR-2).

Files touched (per plan)

  • `crates/dpub-cli/src/main.rs`
  • `crates/dpub-convert/src/lib.rs` (+ `Cargo.toml`)
  • `crates/epub3-writer/src/{model,writers,zip_assembly}.rs`
  • New `crates/dpub-meta/` (Cargo.toml + src/lib.rs + tests)
  • Workspace `Cargo.toml` (members + `ureq` pin)
  • `CHANGELOG.md`, `README.md`

Open questions to resolve before merge

  • Cover-XHTML wrapper in spine: some readers want this, EPUB 3 doesn't require it. Recommend skipping in v1; confirm.
  • Should DAISY-internal covers (rare but exist on some publications) become a third path `--cover daisy`? Out of scope for this issue but worth a follow-up if encountered.

Companion to #10 (which fixes the per-file Whisper model reload).

Labels: enhancement
