
feat: on-device OCR for image items, searchable (P13b-1) #139

Merged
blokzdev merged 1 commit into main from claude/p13b1-ocr on Jun 2, 2026

@blokzdev (Owner) commented Jun 2, 2026

What & why

First half of P13b. Per your call, OCR leads: it de-risks the ML Kit dependency before the more complex translation work. A "Scan text" action on the image item detail screen extracts text with ML Kit's on-device text recognition, caches it, shows it, and makes it full-text searchable.

Critically, it uses the bundled Latin model (no Google Play Services, fully offline), which fits GrabBit's sideloaded, de-Googled posture. OCR runs on demand only; opt-in auto-OCR on download is the planned P13b-3 follow-up (mirroring P13a → P13a-2).

Changes

  • Dependency: google_mlkit_text_recognition (MIT-licensed plugin; the ML Kit binaries are free to use but proprietary, and run on-device). Recorded in docs/SPEC.md.
  • Engine seam: OcrEngine interface + MlKitOcrEngine (Android, bundled Latin model) + UnavailableOcrEngine + factory/provider, mirroring the transcription engine seam. Off Android it degrades gracefully instead of crashing.
  • Schema v11→v12: adds MediaMetadata.ocrText, and media_fts gains an ocr column. Because FTS5 virtual tables can't ALTER TABLE ... ADD COLUMN, the v12 migration drops and rebuilds the FTS table and its triggers, then reruns the backfill (now including ocr) so search covers image text. ocrText is also added (capped) to the embedding doc so semantic search benefits. New MetadataRepository.updateOcrText (upsert/clear).
  • UI: _OcrSection ("Scan text" / "Rescan") on image item detail, hidden when the engine can't run on this host. No opt-in toggle: OCR is free and offline, so it's always available.
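A minimal sketch of the drop-and-rebuild migration pattern described above, written in Python against the standard-library sqlite3 module rather than the project's Dart code. Only the media_fts name, its new ocr column, and the upsert/clear behavior of updateOcrText come from this PR; the media_metadata table, its title/description/ocr_text columns, the trigger names, and the helper functions are hypothetical. It assumes the SQLite build has FTS5 enabled.

```python
import sqlite3
from typing import Optional

# Hypothetical v11 schema: a metadata table plus an FTS5 index kept in sync
# by a trigger. Only `media_fts` is a name taken from the PR.
V11_SCHEMA = """
CREATE TABLE media_metadata (
    id INTEGER PRIMARY KEY,
    title TEXT,
    description TEXT
);
CREATE VIRTUAL TABLE media_fts USING fts5(title, description);
CREATE TRIGGER media_fts_ai AFTER INSERT ON media_metadata BEGIN
    INSERT INTO media_fts(rowid, title, description)
    VALUES (new.id, coalesce(new.title, ''), coalesce(new.description, ''));
END;
"""

# v11 -> v12: FTS5 virtual tables do not support ALTER TABLE ... ADD COLUMN,
# so drop the triggers and the index, add the source column, recreate the
# index with an `ocr` column, recreate the triggers, and backfill.
V12_MIGRATION = """
BEGIN;
DROP TRIGGER IF EXISTS media_fts_ai;
DROP TRIGGER IF EXISTS media_fts_ad;
DROP TRIGGER IF EXISTS media_fts_au;
DROP TABLE IF EXISTS media_fts;
ALTER TABLE media_metadata ADD COLUMN ocr_text TEXT;
CREATE VIRTUAL TABLE media_fts USING fts5(title, description, ocr);
CREATE TRIGGER media_fts_ai AFTER INSERT ON media_metadata BEGIN
    INSERT INTO media_fts(rowid, title, description, ocr)
    VALUES (new.id, coalesce(new.title, ''), coalesce(new.description, ''),
            coalesce(new.ocr_text, ''));
END;
CREATE TRIGGER media_fts_ad AFTER DELETE ON media_metadata BEGIN
    DELETE FROM media_fts WHERE rowid = old.id;
END;
CREATE TRIGGER media_fts_au AFTER UPDATE ON media_metadata BEGIN
    DELETE FROM media_fts WHERE rowid = old.id;
    INSERT INTO media_fts(rowid, title, description, ocr)
    VALUES (new.id, coalesce(new.title, ''), coalesce(new.description, ''),
            coalesce(new.ocr_text, ''));
END;
INSERT INTO media_fts(rowid, title, description, ocr)
SELECT id, coalesce(title, ''), coalesce(description, ''),
       coalesce(ocr_text, '')
FROM media_metadata;
PRAGMA user_version = 12;
COMMIT;
"""


def migrate_to_v12(conn: sqlite3.Connection) -> None:
    """Apply the v12 migration atomically; roll back on any failure."""
    try:
        conn.executescript(V12_MIGRATION)
    except sqlite3.Error:
        if conn.in_transaction:
            conn.rollback()
        raise


def update_ocr_text(conn: sqlite3.Connection, media_id: int,
                    text: Optional[str]) -> None:
    """Upsert OCR text for a row; blank or None clears it (hypothetical
    analogue of MetadataRepository.updateOcrText). Triggers sync media_fts."""
    value = text.strip() if text else ""
    with conn:
        conn.execute(
            "INSERT INTO media_metadata(id, ocr_text) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET ocr_text = excluded.ocr_text",
            (media_id, value or None),
        )


def search(conn: sqlite3.Connection, query: str) -> list:
    """Return ids of rows whose indexed text matches an FTS5 query."""
    rows = conn.execute(
        "SELECT rowid FROM media_fts WHERE media_fts MATCH ? ORDER BY rowid",
        (query,),
    ).fetchall()
    return [row[0] for row in rows]
```

Running the whole rebuild inside one BEGIN/COMMIT means a failure rolls back to the intact v11 index and user_version. Because ocr_text is NULL for every row right after the column is added, the backfill mainly restores the existing title/description entries; OCR text enters the index later, through the update trigger, whenever an updateOcrText-style write happens.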

Tests

dart format clean · flutter analyze: no issues · flutter test: 779 passed, including: an OCR-only word found via full-text search, the updateOcrText round-trip (+ FTS), the v11→v12 migration (asserts the rebuilt media_fts has the ocr column and indexes it), embed-doc inclusion, and engine availability / graceful behavior off Android.

Honest notes

  • Owed APK spot-check (VERIFICATION → P13b-1): scan a real image on-device → text appears, persists, and becomes searchable, offline; non-text image → "no readable text"; video/audio → no OCR section. CI can't run the native ML Kit call or build the APK, so the _OcrSection widget and the native call are verified on the APK only (consistent with the whisper/generation boundary).
  • Bundled (not GMS) text recognition needs no AndroidManifest meta-data; the APK build is what confirms that the bundled model packages correctly, and shows its ~4 MB size impact.
  • Non-Latin scripts (CJK/Devanagari) are deferred to BACKLOG.
  • Process note: I briefly committed to local main before branching; caught it before any push, moved the commit onto claude/p13b1-ocr, and reset local main to origin/main. Nothing reached the remote main.

https://claude.ai/code/session_013JoYmLCosYt5tQ8qwdbL1T


Generated by Claude Code

First half of P13b (OCR-first, maintainer call): a "Scan text" action on
image detail extracts text with ML Kit on-device text recognition, caches
it, surfaces it, and makes it full-text searchable. Uses the bundled Latin
model — no Google Play Services, fully offline — fitting the sideloaded
posture. On-demand only; opt-in auto-OCR-on-download is the P13b-3 follow-up.

- Dep: google_mlkit_text_recognition (MIT plugin; bundled Latin model).
- Engine seam: OcrEngine + MlKitOcrEngine (Android) + UnavailableOcrEngine +
  factory/provider, mirroring the transcription engine; graceful off-Android.
- Schema v11→v12: MediaMetadata.ocrText; media_fts gains an `ocr` column
  (table + triggers + backfill rebuilt in the migration — FTS5 can't ALTER
  ADD COLUMN) so search covers image text; ocrText added (capped) to the
  embed doc for semantic search; MetadataRepository.updateOcrText.
- UI: _OcrSection "Scan text"/"Rescan" on image item detail; hidden when the
  engine can't run here. No opt-in toggle (OCR is free + offline).
- Tests: OCR FTS search, updateOcrText round-trip, v11→v12 migration (incl.
  the FTS `ocr` column), embed-doc inclusion, engine availability.
- Docs: P13-PLAN (OCR-first reorder + P13b-1 status), VERIFICATION P13b-1,
  SPEC dependency row, BACKLOG (non-Latin scripts).
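The engine seam in the bullets above (OcrEngine + MlKitOcrEngine + UnavailableOcrEngine + factory/provider, graceful off-Android) can be illustrated with a small Python sketch. It is not the project's Dart code: the method names, the platform string passed to the factory, the injected native_recognize callable standing in for the ML Kit call, and show_ocr_section are all hypothetical. It shows how the UI can hide _OcrSection without platform checks of its own, by asking the engine whether it can run.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional


class OcrUnavailableError(RuntimeError):
    """Raised when OCR is requested on a host that cannot run it."""


class OcrEngine(ABC):
    """Engine seam: callers depend on this, never on ML Kit directly."""

    @abstractmethod
    def is_available(self) -> bool:
        """Whether this engine can run on the current host."""

    @abstractmethod
    def recognize(self, image_path: str) -> str:
        """Return recognized text ('' when the image has none)."""


class MlKitOcrEngine(OcrEngine):
    """Stand-in for the Android engine; the native ML Kit call is injected
    as a plain callable because it cannot run outside the device."""

    def __init__(self, native_recognize: Callable[[str], str]) -> None:
        self._native_recognize = native_recognize

    def is_available(self) -> bool:
        return True

    def recognize(self, image_path: str) -> str:
        return self._native_recognize(image_path).strip()


class UnavailableOcrEngine(OcrEngine):
    """Fallback for non-Android hosts: reports itself unavailable."""

    def is_available(self) -> bool:
        return False

    def recognize(self, image_path: str) -> str:
        raise OcrUnavailableError("on-device OCR is not available on this host")


def create_ocr_engine(platform: str,
                      native_recognize: Optional[Callable[[str], str]] = None
                      ) -> OcrEngine:
    """Factory: pick the ML Kit-backed engine on Android, else the fallback."""
    if platform == "android" and native_recognize is not None:
        return MlKitOcrEngine(native_recognize)
    return UnavailableOcrEngine()


def show_ocr_section(engine: OcrEngine, media_kind: str) -> bool:
    """The 'Scan text' section appears only for images on a capable host."""
    return media_kind == "image" and engine.is_available()
```

Because the fallback engine reports is_available() as False instead of failing at construction, a non-Android host (or a CI test runner) can build the same provider graph and simply never show the section, which is the "graceful off-Android" behavior the PR describes.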

https://claude.ai/code/session_013JoYmLCosYt5tQ8qwdbL1T
blokzdev merged commit c5233c3 into main on Jun 2, 2026
1 check passed
blokzdev deleted the claude/p13b1-ocr branch on June 2, 2026, 06:50