feat: on-device OCR for image items, searchable (P13b-1) by blokzdev · Pull Request #139 · blokzdev/GrabBit

blokzdev · 2026-06-02T05:08:24Z

What & why

First half of P13b — and per your call, OCR leads (it de-risks the ML Kit dependency before the more complex translation). A "Scan text" action on image item detail extracts text with ML Kit on-device text recognition, caches it, surfaces it, and makes it full-text searchable.

Critically, it uses the bundled Latin model — no Google Play Services, fully offline — which fits GrabBit's sideloaded/de-Googled posture. On-demand only; opt-in auto-OCR-on-download is the planned P13b-3 follow-up (mirroring P13a → P13a-2).

Changes

Dependency: google_mlkit_text_recognition (MIT-licensed plugin; the underlying ML Kit binaries are free to use but proprietary, and run on-device). Recorded in docs/SPEC.md.
Engine seam: OcrEngine interface + MlKitOcrEngine (Android, bundled Latin) + UnavailableOcrEngine + factory/provider — mirrors the transcription engine seam; graceful (no crash) off-Android.
Schema v11→v12: MediaMetadata.ocrText; media_fts gains an ocr column. Since FTS5 can't ALTER ADD COLUMN, the v12 migration drops + rebuilds the FTS table, triggers, and backfill (now including ocr) so search covers image text. ocrText is also added (capped) to the embedding doc so semantic search benefits. MetadataRepository.updateOcrText (upsert/clear).
UI: _OcrSection ("Scan text" / "Rescan") on image item detail; hidden when the engine can't run on this host. No opt-in toggle — OCR is free + offline, so it's an always-available action.
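The engine seam in the list above can be sketched in a language-neutral way. This is a minimal illustration in Python, not the PR's Dart code: the class names OcrEngine and UnavailableOcrEngine come from the description, while the method names, the `create_ocr_engine` factory signature, and the `InjectedOcrEngine` stand-in (used here in place of MlKitOcrEngine, which needs the native plugin) are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional


class OcrEngine(ABC):
    """Seam between the UI and whatever performs text recognition."""

    @abstractmethod
    def is_available(self) -> bool:
        """True when recognition can run on this host."""

    @abstractmethod
    def recognize(self, image_path: str) -> Optional[str]:
        """Return recognized text, or None when nothing readable was found."""


class UnavailableOcrEngine(OcrEngine):
    """Fallback for hosts without a recognizer: never raises, never recognizes."""

    def is_available(self) -> bool:
        return False

    def recognize(self, image_path: str) -> Optional[str]:
        return None


class InjectedOcrEngine(OcrEngine):
    """Hypothetical stand-in for the on-device engine; wraps any callable."""

    def __init__(self, recognizer: Callable[[str], str]) -> None:
        self._recognizer = recognizer

    def is_available(self) -> bool:
        return True

    def recognize(self, image_path: str) -> Optional[str]:
        # Treat whitespace-only output as "no readable text".
        text = self._recognizer(image_path).strip()
        return text or None


def create_ocr_engine(
    platform: str, recognizer: Optional[Callable[[str], str]] = None
) -> OcrEngine:
    """Factory (assumed shape): a real engine on Android, the fallback elsewhere."""
    if platform == "android" and recognizer is not None:
        return InjectedOcrEngine(recognizer)
    return UnavailableOcrEngine()
```

With a seam like this, the UI only needs to ask `is_available()` to decide whether to show the "Scan text" section, which matches the "hidden when the engine can't run on this host" behavior described above.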

Tests

dart format clean · flutter analyze No issues · flutter test 779 passed, including: OCR-only word found via full-text search, updateOcrText round-trip (+ FTS), the v11→v12 migration (asserts the rebuilt media_fts has the ocr column and indexes it), embed-doc inclusion, and engine availability/graceful-off-Android.
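The migration assertion listed above (the rebuilt media_fts has the ocr column and indexes it) can be illustrated with a small sketch. It uses Python's sqlite3 rather than the app's Dart database layer and assumes an SQLite build with FTS5 enabled. Only media_fts and its ocr column come from the description; the media_metadata layout, trigger names, and helper names are assumptions for illustration.

```python
import sqlite3


def create_v11(db: sqlite3.Connection) -> None:
    """Hypothetical v11 schema: FTS table without an ocr column."""
    db.executescript("""
        CREATE TABLE media_metadata(
            id INTEGER PRIMARY KEY, title TEXT, description TEXT);
        CREATE VIRTUAL TABLE media_fts USING fts5(title, description);
        CREATE TRIGGER media_fts_ai AFTER INSERT ON media_metadata BEGIN
            INSERT INTO media_fts(rowid, title, description)
            VALUES (new.id, new.title, new.description);
        END;
        PRAGMA user_version = 11;
    """)


def migrate_v11_to_v12(db: sqlite3.Connection) -> None:
    """Add ocr_text, then drop and rebuild media_fts, its triggers, and its rows."""
    db.executescript("""
        ALTER TABLE media_metadata ADD COLUMN ocr_text TEXT;

        -- An FTS5 table cannot gain a column in place, so rebuild it.
        DROP TRIGGER IF EXISTS media_fts_ai;
        DROP TABLE IF EXISTS media_fts;
        CREATE VIRTUAL TABLE media_fts USING fts5(title, description, ocr);

        CREATE TRIGGER media_fts_ai AFTER INSERT ON media_metadata BEGIN
            INSERT INTO media_fts(rowid, title, description, ocr)
            VALUES (new.id, new.title, new.description, new.ocr_text);
        END;
        CREATE TRIGGER media_fts_au AFTER UPDATE ON media_metadata BEGIN
            DELETE FROM media_fts WHERE rowid = old.id;
            INSERT INTO media_fts(rowid, title, description, ocr)
            VALUES (new.id, new.title, new.description, new.ocr_text);
        END;
        CREATE TRIGGER media_fts_ad AFTER DELETE ON media_metadata BEGIN
            DELETE FROM media_fts WHERE rowid = old.id;
        END;

        -- Backfill existing rows into the rebuilt index.
        INSERT INTO media_fts(rowid, title, description, ocr)
        SELECT id, title, description, ocr_text FROM media_metadata;

        PRAGMA user_version = 12;
    """)


def fts_columns(db: sqlite3.Connection) -> list:
    """Names of the declared columns of media_fts."""
    return [row[1] for row in db.execute("PRAGMA table_info(media_fts)")]


def search(db: sqlite3.Connection, query: str) -> list:
    """Row ids of items whose indexed text matches the FTS5 query."""
    rows = db.execute(
        "SELECT rowid FROM media_fts WHERE media_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return [row[0] for row in rows]
```

A test in this style would create the v11 schema, insert a row, run the migration, and then check both that pre-existing text is still searchable (the backfill) and that a word present only in ocr_text is found after an update.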

Honest notes

Owed APK spot-check (VERIFICATION → P13b-1): scan a real image on-device → text appears, persists, and becomes searchable, offline; non-text image → "no readable text"; video/audio → no OCR section. CI can't run the native ML Kit call or build the APK; the _OcrSection widget + the native call are APK-verified (consistent with the whisper/generation boundary).
Bundled (not GMS) text recognition needs no AndroidManifest meta-data; the APK build confirms the bundled model packs correctly (and its ~4 MB size impact).
Non-Latin scripts (CJK/Devanagari) are deferred to BACKLOG.
Process note: I briefly committed to local main before branching; caught it before any push, moved the commit onto claude/p13b1-ocr, and reset local main to origin/main. Nothing reached the remote main.

https://claude.ai/code/session_013JoYmLCosYt5tQ8qwdbL1T

Generated by Claude Code

First half of P13b (OCR-first, maintainer call): a "Scan text" action on image detail extracts text with ML Kit on-device text recognition, caches it, surfaces it, and makes it full-text searchable. Uses the bundled Latin model (no Google Play Services, fully offline), fitting the sideloaded posture. On-demand only; opt-in auto-OCR-on-download is the P13b-3 follow-up.

- Dep: google_mlkit_text_recognition (MIT plugin; bundled Latin model).
- Engine seam: OcrEngine + MlKitOcrEngine (Android) + UnavailableOcrEngine + factory/provider, mirroring the transcription engine; graceful off-Android.
- Schema v11→v12: MediaMetadata.ocrText; media_fts gains an `ocr` column (table + triggers + backfill rebuilt in the migration, since FTS5 can't ALTER ADD COLUMN) so search covers image text; ocrText added (capped) to the embed doc for semantic search; MetadataRepository.updateOcrText.
- UI: _OcrSection "Scan text"/"Rescan" on image item detail; hidden when the engine can't run here. No opt-in toggle (OCR is free + offline).
- Tests: OCR FTS search, updateOcrText round-trip, v11→v12 migration (incl. the FTS `ocr` column), embed-doc inclusion, engine availability.
- Docs: P13-PLAN (OCR-first reorder + P13b-1 status), VERIFICATION P13b-1, SPEC dependency row, BACKLOG (non-Latin scripts).

https://claude.ai/code/session_013JoYmLCosYt5tQ8qwdbL1T

blokzdev merged commit c5233c3 into main Jun 2, 2026
1 check passed

blokzdev deleted the claude/p13b1-ocr branch June 2, 2026 06:50
