
Add manual verifier UI with jobs.db persistence and request-o-matic check #53

Merged
jakebromberg merged 5 commits into main from verifier-ui
May 12, 2026


@jakebromberg
Member

Summary

  • New static SPA at verifier/ for row-by-row manual verification of pipeline output; each row shows a cropped image strip beside an editable text field, with mark-as-deleted, add-row, and per-page meta editing.
  • POST /api/save persists the session: writes <stem>.verified.json and <stem>.corrections.json to data/verifier/, and (when the bundle carries pdf_path/page_number) records the verification in jobs.db via the new JobStore.mark_verified method. Schema migration adds verified_at, verified_path, corrections_path columns with an idempotent ALTER TABLE pass.
  • The Check artists button looks up each row via request-o-matic's /request endpoint (proxied same-origin). Per-row badges layer three gating signals on top of confidence: postdates (release year > page year), artist-only fallback (search_type=song_as_artist or song_not_found=true), and different artist (zero shared tokens between input and resolved artist). Badge labels prefix resolved releases with album: or sample release: so they can't be mistaken for the flowsheet's Artist - Track entries.
  • Geometry: partition_row_lines_by_quadrant gains a correction pass for the bottom-block hour-jock-cell baseline (_detect_body_mid_y sometimes lands body_mid_y below it, misattributing the line to the top quadrant and shifting bottom-quadrant row crops up by one). _merge_with_spans propagates notes="double_height" to continuation-merged entries.
  • scripts/derive_truth.py produces tests/golden/<stem>.truth.json from a verified PageResult with substring rules pinned by parametrized tests.
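A minimal sketch of the span idea behind `_merge_with_spans` and the row crops, assuming each entry carries a hypothetical `is_continuation` flag and that row lines are sorted y-coordinates with one more value than there are physical rows. `Entry`, `merge_with_spans`, and `row_crops` are illustrative names, not the project's API: a continuation folds into the previous entry, that entry's span grows, and its `notes` becomes `"double_height"`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    raw_text: str
    is_continuation: bool = False  # hypothetical: this line wraps the previous entry
    notes: Optional[str] = None
    span: int = 1  # physical rows covered by this entry


def merge_with_spans(entries: List[Entry]) -> List[Entry]:
    """Fold continuation lines into the preceding entry, widening its span."""
    merged: List[Entry] = []
    for e in entries:
        if e.is_continuation and merged:
            prev = merged[-1]
            prev.raw_text = f"{prev.raw_text} {e.raw_text}"
            prev.span += e.span
            prev.notes = "double_height"  # surface the multi-row entry in the UI
        else:
            # Copy so the caller's entries are never mutated.
            merged.append(Entry(e.raw_text, False, e.notes, e.span))
    return merged


def row_crops(row_lines: List[int], merged: List[Entry]) -> List[Tuple[int, int]]:
    """Map each merged entry to a (top_y, bottom_y) crop across its span.

    row_lines holds sorted y-coordinates of physical row boundaries, so
    len(row_lines) == number_of_physical_rows + 1.
    """
    crops = []
    row = 0
    for e in merged:
        crops.append((row_lines[row], row_lines[row + e.span]))
        row += e.span  # advance past every physical row this entry consumed
    return crops
```

The design point is the cursor: because each crop advances by the entry's span rather than by one, a wrapped entry doesn't push every later crop onto the wrong physical row.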

Closes #52.

Test plan

  • pytest -q — 482 pass, ruff/mypy clean.
  • Smoke: regenerate 5 bundles via scripts/make_verifier_bundle.py; load each in the SPA via python verifier/serve.py; confirm row crops align (the page25 bottom_left row 0 regression was the original symptom), notes="double_height" pre-selects on continuation-merged rows, page-view panel opens by default.
  • Smoke: POST /api/save round-trip — file lands in data/verifier/; db_updated: false for test goldens (no job row), payload validates as PageResult server-side before write.
  • Smoke: Check artists on page25 — Beatnigs lands strong (model right, library hit); Pure Joy → Coldcut flags ⚠ different artist; Beastie Boy flags ⚠ artist-only + ⚠ postdates; Little John → Little Joy flags ⚠ postdates.
  • Round-trip: derive_truth on a generated verified.json parses back via GoldenTruth.load.
  • Manual: hand-verify all 5 goldens through the UI on a second pass, exercising Save + Check artists end-to-end.

Notes for review

  • Bundle SCHEMA_VERSION is 2; the UI rejects unknown versions. Re-generate any in-flight bundles after merge.
  • data/jobs.db migration is idempotent — first init() against a pre-verification DB picks up the columns; no manual intervention.
  • The static SPA assumes python verifier/serve.py (FastAPI), not bare python -m http.server. The Check artists button is the main reason: request-o-matic doesn't send CORS headers, so the browser can only reach it through serve.py's same-origin proxy.
  • Verifier badges intentionally don't fire on row load, since each row costs one LLM call; they run only as explicit per-page batches.

Adds a serverless verifier UI that shows each row's cropped image strip next to its model-detected text in editable fields. Three new pieces wired together:

- scripts/make_verifier_bundle.py — pre-processor: PageResult JSON + page PNG into a bundle.json with per-quadrant and per-row pixel bboxes. Continuation-merged entries get a physical-row span so their crops cover the wrapped lines instead of nudging subsequent rows out of alignment; double_height entries get span=2 inherently.
- scripts/derive_truth.py — verified.json into tests/golden/<stem>.truth.json by extracting short uppercased substrings (whitespace-tokenized date, 4-char jock prefix, 24-char artist portion via parse_artist_track). Substring rules live in Python so they're testable in one place.
- verifier/ — static HTML/JS/CSS SPA (no build step). Loads a bundle via ?bundle= URL param or file picker, canvas-crops each row, lets the user edit raw_text / type_raw / notes / hour_raw / jock_raw / page meta, mark hallucinations (x), and add missed rows (+). Export emits two files: <stem>.verified.json (PageResult-shaped, plugs back into the pipeline) and <stem>.corrections.json (delta vs the immutable bundle snapshot — page/quadrant/row corrections, added_rows, deleted_rows).
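A minimal sketch of the substring rules described for scripts/derive_truth.py. The project has its own `parse_artist_track`; the one here is a hypothetical stand-in that splits on the first `" - "`, and treating the date as a list of uppercased whitespace tokens is an assumption about what "whitespace-tokenized" means. Matching on short uppercased substrings rather than full strings is presumably what keeps the goldens robust to casing and trailing-text noise.

```python
from typing import List, Optional, Tuple

ARTIST_CHARS = 24  # artist-portion length described above
JOCK_CHARS = 4     # jock-prefix length described above


def parse_artist_track(raw_text: str) -> Tuple[str, Optional[str]]:
    """Hypothetical stand-in: split on the first ' - ' separator."""
    artist, sep, track = raw_text.partition(" - ")
    return artist.strip(), (track.strip() if sep else None)


def date_tokens(page_date_raw: str) -> List[str]:
    # Whitespace-tokenized, uppercased date pieces (split() collapses runs of spaces).
    return [tok.upper() for tok in page_date_raw.split()]


def jock_prefix(jock_raw: str) -> str:
    return jock_raw.strip()[:JOCK_CHARS].upper()


def artist_substring(raw_text: str) -> str:
    artist, _track = parse_artist_track(raw_text)
    return artist[:ARTIST_CHARS].upper()
```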

Also lifts a public partition_row_lines_by_quadrant helper out of core/page_layout's private detection internals — same row-line detector, partitioned per quadrant by body_mid_y and column-side ink density.

Bundle layout: data/verifier/<stem>.bundle.json with image_path computed as os.path.relpath to data/pages/<rel-pdf>/<stem>.png so bundles are portable. SCHEMA_VERSION = 1 hardcoded; UI rejects unknown versions.
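A sketch of the portability idea, assuming the bundle path and PNG path are both given relative to the repo root: the image path is stored relative to the bundle's own directory, so moving data/ as a unit keeps bundles valid. The function names are hypothetical, and the version check restates the UI's reject-unknown-versions rule in Python rather than the SPA's JavaScript.

```python
import os
from typing import Any, Dict

SCHEMA_VERSION = 2  # this commit hardcodes 1; later in the PR it becomes 2


def bundle_image_path(bundle_path: str, png_path: str) -> str:
    """Store the PNG location relative to the bundle's own directory."""
    return os.path.relpath(png_path, start=os.path.dirname(bundle_path))


def resolve_image_path(bundle_path: str, image_path: str) -> str:
    """Inverse of bundle_image_path: rebuild a usable path from the bundle."""
    return os.path.normpath(os.path.join(os.path.dirname(bundle_path), image_path))


def check_schema_version(bundle: Dict[str, Any]) -> None:
    """Refuse any bundle whose schema_version isn't the one this code understands."""
    version = bundle.get("schema_version")
    if version != SCHEMA_VERSION:
        raise ValueError(f"unsupported bundle schema_version: {version!r}")
```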

465 tests pass, ruff/mypy clean.
…fixes

DB-backed Save replaces file-download Export. The UI now POSTs to /api/save (verifier/serve.py — a small FastAPI server that also same-origin-proxies request-o-matic and serves the static SPA + data + tests dirs). Save writes <stem>.verified.json and <stem>.corrections.json into data/verifier/ and, when the bundle carries pdf_path + page_number, records the verification in jobs.db via the new JobStore.mark_verified method. jobs.db gains verified_at, verified_path, corrections_path columns plus a partial index on verified_at; init() runs ALTER TABLE for legacy DBs so existing data is preserved. Bundles bump to SCHEMA_VERSION=2 with optional pdf_path/page_number auto-detected when the result JSON lives under data/results/<rel-pdf>/page-NN.json (null for test fixtures, where Save falls back to file-only persistence).
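A minimal sketch of an idempotent migration with the standard-library sqlite3 module. The base table shape (`pdf_path`, `page_number`, `status`) and the TEXT column types are assumptions; only the three new column names, the partial index on `verified_at`, and the `mark_verified` name come from this PR. Checking `PRAGMA table_info` before each `ALTER TABLE ... ADD COLUMN` makes a second `init()` a no-op, and `rowcount` supplies the "no job row matched" signal that maps to `db_updated: false`.

```python
import sqlite3

# Columns added by the verification migration (types are an assumption).
_VERIFICATION_COLUMNS = {
    "verified_at": "TEXT",
    "verified_path": "TEXT",
    "corrections_path": "TEXT",
}


def init(conn: sqlite3.Connection) -> None:
    """Create or upgrade the jobs table; safe to run any number of times."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " pdf_path TEXT NOT NULL,"
        " page_number INTEGER NOT NULL,"
        " status TEXT,"
        " PRIMARY KEY (pdf_path, page_number))"
    )
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(jobs)")}
    for name, decl in _VERIFICATION_COLUMNS.items():
        if name not in existing:  # ALTER only legacy DBs; existing rows survive
            conn.execute(f"ALTER TABLE jobs ADD COLUMN {name} {decl}")
    # Partial index: only rows that have been verified are indexed.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_jobs_verified_at "
        "ON jobs(verified_at) WHERE verified_at IS NOT NULL"
    )
    conn.commit()


def mark_verified(
    conn: sqlite3.Connection,
    pdf_path: str,
    page_number: int,
    verified_path: str,
    corrections_path: str,
) -> bool:
    """Record a verification; returns False when no job row matches."""
    cur = conn.execute(
        "UPDATE jobs SET verified_at = datetime('now'), verified_path = ?, "
        "corrections_path = ? WHERE pdf_path = ? AND page_number = ?",
        (verified_path, corrections_path, pdf_path, page_number),
    )
    conn.commit()
    return cur.rowcount > 0
```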

Check artists button looks each row up via request-o-matic's /request endpoint through the same-origin proxy. Per-row badges show resolved artist + matched release with three gating signals stacked on top of the raw confidence: postdates (release_year > the year parsed from bundle.page_date_raw), artist-only fallback (search_type=song_as_artist or song_not_found=true — request-o-matic found the artist but not the played track, and the release shown is one of theirs picked arbitrarily), and disjoint-artist tokens (no shared tokens after stop-word and trailing-s normalization — catches request-o-matic fuzzy-matching on a track word and returning an unrelated artist, e.g. "Pure Joy - Pieces" -> "Coldcut - More Beats & Pieces"). The badge labels resolved release names with "album:" or "sample release:" prefix so they can't be mistaken for a track-level match (the flowsheet records artist - track, but the library matches at release level).
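A sketch of how the three gating signals could stack, assuming the /request response arrives as a dict. `search_type` and `song_not_found` are the fields named above; `artist` and `release_year` are hypothetical field names, the stop-word list is a placeholder, and the page year is taken from the first 19xx/20xx token in `page_date_raw`. The badges are additive: they sit alongside the raw confidence rather than replacing it.

```python
import re
from typing import Any, Dict, List, Optional, Set

_STOP_WORDS = {"the", "a", "an", "and", "of"}  # hypothetical list
_YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def artist_tokens(name: str) -> Set[str]:
    """Lowercased word tokens with stop words dropped and a trailing 's' trimmed."""
    tokens = set()
    for tok in re.findall(r"[a-z0-9]+", name.lower()):
        if tok in _STOP_WORDS:
            continue
        if len(tok) > 3 and tok.endswith("s"):
            tok = tok[:-1]  # "boys" -> "boy"
        tokens.add(tok)
    return tokens


def page_year(page_date_raw: str) -> Optional[int]:
    m = _YEAR_RE.search(page_date_raw)
    return int(m.group()) if m else None


def gating_badges(input_artist: str, resp: Dict[str, Any], page_date_raw: str) -> List[str]:
    badges = []
    # Artist-only fallback: the artist resolved but the played track did not.
    if resp.get("search_type") == "song_as_artist" or resp.get("song_not_found") is True:
        badges.append("artist-only")
    # Postdates: the matched release is newer than the flowsheet page.
    year = page_year(page_date_raw)
    release_year = resp.get("release_year")  # hypothetical field name
    if year is not None and isinstance(release_year, int) and release_year > year:
        badges.append("postdates")
    # Different artist: no shared tokens after normalization.
    a = artist_tokens(input_artist)
    b = artist_tokens(resp.get("artist") or "")  # hypothetical field name
    if a and b and a.isdisjoint(b):
        badges.append("different artist")
    return badges
```

Skipping the token check when either side normalizes to nothing is a deliberate choice in this sketch: an empty token set can't be judged, so it shouldn't raise a warning.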

partition_row_lines_by_quadrant gets a correction pass for the bottom-block hour-jock-cell baseline. _detect_body_mid_y's gap-by-anchor heuristic sometimes lands body_mid_y BELOW the bottom block's hour-jock baseline, which misattributes that line to the top quadrant — shifting every bottom-quadrant row crop up by one (a quadrant's row 0 crop showed row 1 content). Fix: when the top quadrant's last spacing exceeds 1.3x the median row spacing, that line moves to the corresponding bottom quadrant. _merge_with_spans propagates notes="double_height" to entries that absorbed a continuation, so the notes dropdown reflects multi-physical-row entries instead of showing (none).
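A sketch of the correction pass, assuming the median is taken over the top quadrant's own row spacings (the text above doesn't say which spacings feed the median). The function name is hypothetical; the 1.3 ratio is the value from this PR. A median is used rather than a mean so the single oversized gap can't inflate its own threshold.

```python
from statistics import median
from typing import List, Tuple

# Same role as the PR's _BOTTOM_BASELINE_REATTRIBUTION_RATIO.
_BOTTOM_BASELINE_REATTRIBUTION_RATIO = 1.3


def reattribute_bottom_baseline(
    top_lines: List[float], bottom_lines: List[float]
) -> Tuple[List[float], List[float]]:
    """Move the top quadrant's last row line into the bottom quadrant when its
    spacing is an outlier (likely the bottom block's hour-jock baseline).

    Both lists are sorted y-coordinates, increasing down the page.
    """
    if len(top_lines) < 3:
        return top_lines, bottom_lines  # too few spacings for a meaningful median
    spacings = [b - a for a, b in zip(top_lines, top_lines[1:])]
    if spacings[-1] > _BOTTOM_BASELINE_REATTRIBUTION_RATIO * median(spacings):
        # Prepend: the moved line sits above every existing bottom-quadrant line.
        return top_lines[:-1], [top_lines[-1]] + bottom_lines
    return top_lines, bottom_lines
```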

UI polish: page-view side panel opens by default on bundle load (verifiers need the full-page reference); notes select shows "(none)" instead of blank and the row gets a tinted background when notes is non-null; type_raw is a free-text input matching the schema's str | None (covers doodles like "hand-drawn smiley" and compound values like "O/std"); each row stacks crop above editable field so the layout reflows cleanly when the page-view panel is open.

474 tests, ruff/mypy clean.
Pre-PR review caught three issues. Fixed:

- verifier/README.md bundle-schema example showed schema_version=1 — code requires 2, and the UI rejects any other value. Updated the example to v2, added the new pdf_path and page_number fields, and revised the version-history paragraph.
- README documented a verified_rows key in corrections.json plus a per-row "verified" checkbox with auto-set semantics that were removed in an earlier UX pass. The actual buildCorrectionsExport emits {page_corrections, quadrant_corrections, row_corrections, added_rows, deleted_rows} with no verified_rows. Stripped the stale docs.
- Added tests/unit/test_verifier_serve.py exercising /api/save: missing-field rejection, PageResult validation rejection, path-traversal guard on stem, both files written on success, overwrite semantics, jobs.db updated when a row matches, db_updated=false when no row matches, db_updated=false when jobs.db doesn't exist. Uses httpx ASGITransport for in-process testing, no live server.

Drive-by: response paths now use relative_to(DATA_ROOT.parent) instead of relative_to(REPO_ROOT) so the response works whether DATA_ROOT lives under the repo (production) or a tmp dir (tests).
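Why the drive-by matters, sketched with pathlib: `relative_to` raises `ValueError` when the path isn't under the given base, so anchoring on the repo root breaks as soon as tests point `DATA_ROOT` at a tmp dir. The helper name and paths are hypothetical; `PurePosixPath` keeps the example platform-independent.

```python
from pathlib import PurePosixPath


def response_path(written: PurePosixPath, data_root: PurePosixPath) -> str:
    """Report a saved file relative to DATA_ROOT's parent, e.g. 'data/verifier/x.json'."""
    return written.relative_to(data_root.parent).as_posix()


# Anchoring on a repo root that the tmp dir isn't under fails instead:
# PurePosixPath("/tmp/pytest-1/data/verifier/x.json").relative_to("/home/dev/repo")
# raises ValueError.
```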

482 tests, ruff/mypy clean.
…ic writes

Addresses code review feedback on #53.

HIGH:
- /api/save now writes the Pydantic-validated round-trip (validated.model_dump_json) instead of the raw client dict. A client that leaks bundle-only fields (schema_version, stem, image_path, per-quadrant bbox) no longer pollutes the on-disk verified.json — Pydantic's default extra='ignore' strips them. The on-disk file becomes a canonical representation that bit-matches what core/pipeline.py writes. New test exercises this with deliberately-polluted input.
- /data static mount now honors DATA_ROOT (matching the write side), so a moved DATA_ROOT doesn't cause the UI to read from REPO_ROOT/data while the save endpoint writes elsewhere. StaticFiles uses check_dir=False so an empty fresh DATA_ROOT doesn't blow up at server start.

MEDIUM:
- Atomic file writes via .tmp + os.replace. A failed second write or process kill leaves either both files at their pre-save state or both at the new state, never a half-updated state where verified.json reflects the edit but corrections.json doesn't. New test asserts no .tmp files leak after successful saves.
- page_number now rejects bool. isinstance(x, int) is True for bool in Python, so a malformed page_number: true previously coerced to 1 and looked up the wrong job row. Defensive `not isinstance(page_number, bool)` guard; new test confirms files still write but db_updated stays False on bool input.
- JobStore.init() result cached per (db_path, process). _open_jobs_store re-checks is_file() each call so a DB created mid-session is picked up without restart, but the migration round trip only runs once.
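A minimal sketch of the .tmp + `os.replace` pattern using only the standard library; the helper name is hypothetical. One caveat worth keeping in mind: `os.replace` makes each rename atomic (on POSIX, within one filesystem), but two renames are not atomic as a pair. Writing every temp file before renaming any means a failed write leaves all targets at their pre-save state; only a crash in the brief window between renames can split them. The sketch also skips `fsync`, so it guards against half-written files, not against power loss.

```python
import os
from pathlib import Path
from typing import Dict


def write_files_atomically(contents: Dict[Path, str]) -> None:
    """Write every file via a sibling .tmp, then os.replace each into place."""
    pending = []
    try:
        for path, text in contents.items():
            tmp = path.with_name(path.name + ".tmp")
            pending.append((tmp, path))  # track before writing so cleanup sees it
            tmp.write_text(text, encoding="utf-8")
        # Only reached when every temp file was written successfully.
        for tmp, path in pending:
            os.replace(tmp, path)  # atomic per file on POSIX when on one filesystem
    finally:
        for tmp, _ in pending:
            tmp.unlink(missing_ok=True)  # no-op for temps already renamed
```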

LOW:
- _safe_stem rejects whitespace-only stems (would produce confusing " .verified.json" files). Test extended with empty-string, all-spaces, tab.
- 1.3 magic threshold in partition_row_lines_by_quadrant promoted to module-level _BOTTOM_BASELINE_REATTRIBUTION_RATIO alongside the other detection constants.
- app.js header comment refreshed: references Save not Export, drops the removed _verified flag.
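A sketch of the two input guards, with hypothetical names. The allow-list regex is an assumption: this PR only states that whitespace-only stems and path traversal are rejected, and this pattern is stricter, rejecting any whitespace, any path separator, and any leading dot. The `bool` exclusion follows the page_number bullet: `isinstance(True, int)` is `True`, so the check has to be explicit.

```python
import re
from typing import Any, Optional

# Hypothetical allow-list: letters, digits, '.', '_', '-', and no leading dot.
_STEM_RE = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9._-]*")


def safe_stem(stem: Any) -> str:
    """Reject stems that are empty, contain whitespace, or could escape data/verifier/."""
    if not isinstance(stem, str) or not _STEM_RE.fullmatch(stem):
        raise ValueError(f"invalid stem: {stem!r}")
    return stem


def coerce_page_number(value: Any) -> Optional[int]:
    """Accept real ints only; bool subclasses int, so exclude it explicitly."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return None
```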

485 tests pass; ruff/mypy clean.
CI's install path (.[dev]) didn't pick up fastapi, uvicorn, or httpx: they were transitive in my local venv but not declared as project dependencies. Result: ModuleNotFoundError on `import uvicorn` at import time in verifier/serve.py, cascading to every tests/unit/test_verifier_serve.py case.

These are runtime deps of the verifier feature now that verifier/ is in the repo: fastapi + uvicorn run the dev server, httpx powers the /api/lookup proxy and httpx.ASGITransport in the test suite. Putting them under main `dependencies` mirrors how library-metadata-lookup declares the same trio.

Version floors are chosen to be compatible with the current pydantic v2 and starlette pins.
jakebromberg merged commit 71a0591 into main May 12, 2026
3 checks passed
jakebromberg added a commit that referenced this pull request May 12, 2026
jakebromberg deleted the verifier-ui branch May 12, 2026 15:15

Linked issue: Add manual verifier UI for flowsheet extraction output