feat: on-device OCR for image items, searchable (P13b-1) #139
Merged
First half of P13b (OCR-first, maintainer call): a "Scan text" action on image detail extracts text with ML Kit on-device text recognition, caches it, surfaces it, and makes it full-text searchable. It uses the bundled Latin model (no Google Play Services, fully offline), which fits the sideloaded posture. On-demand only; opt-in auto-OCR-on-download is the P13b-3 follow-up.

- Dep: `google_mlkit_text_recognition` (MIT plugin; bundled Latin model).
- Engine seam: `OcrEngine` + `MlKitOcrEngine` (Android) + `UnavailableOcrEngine` + factory/provider, mirroring the transcription engine; graceful off-Android.
- Schema v11→v12: `MediaMetadata.ocrText`; `media_fts` gains an `ocr` column (table + triggers + backfill rebuilt in the migration, because FTS5 can't `ALTER ADD COLUMN`) so search covers image text; `ocrText` added (capped) to the embed doc for semantic search; `MetadataRepository.updateOcrText`.
- UI: `_OcrSection` "Scan text"/"Rescan" on image item detail; hidden when the engine can't run here. No opt-in toggle (OCR is free + offline).
- Tests: OCR FTS search, `updateOcrText` round-trip, v11→v12 migration (incl. the FTS `ocr` column), embed-doc inclusion, engine availability.
- Docs: P13-PLAN (OCR-first reorder + P13b-1 status), VERIFICATION P13b-1, SPEC dependency row, BACKLOG (non-Latin scripts).

https://claude.ai/code/session_013JoYmLCosYt5tQ8qwdbL1T
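The FTS5 rebuild in the v11→v12 bullet can be sketched with Python's `sqlite3` module. This is an illustration only, not the PR's Dart migration code. It assumes a hypothetical `media_metadata` content table (`media_id`, `title`, `description`, plus the new `ocr_text` column) that is trigger-synced into a standalone `media_fts` table. The real table, column, and trigger names may differ, and FTS5 must be compiled into the local SQLite build.

```python
import sqlite3
from typing import List, Optional

# Hypothetical schema: table, column, and trigger names are illustrative only.
V11_SCHEMA = """
CREATE TABLE media_metadata (
    media_id    INTEGER PRIMARY KEY,
    title       TEXT,
    description TEXT
);
CREATE VIRTUAL TABLE media_fts USING fts5(title, description);
CREATE TRIGGER media_fts_ai AFTER INSERT ON media_metadata BEGIN
    INSERT INTO media_fts(rowid, title, description)
    VALUES (new.media_id, new.title, new.description);
END;
PRAGMA user_version = 11;
"""

# FTS5 virtual tables reject ALTER TABLE ... ADD COLUMN, so the index and its
# sync triggers are dropped, recreated with an `ocr` column, and backfilled,
# all inside one transaction.
V11_TO_V12 = """
BEGIN;
DROP TRIGGER IF EXISTS media_fts_ai;
DROP TRIGGER IF EXISTS media_fts_ad;
DROP TRIGGER IF EXISTS media_fts_au;
DROP TABLE IF EXISTS media_fts;
ALTER TABLE media_metadata ADD COLUMN ocr_text TEXT;
CREATE VIRTUAL TABLE media_fts USING fts5(title, description, ocr);
CREATE TRIGGER media_fts_ai AFTER INSERT ON media_metadata BEGIN
    INSERT INTO media_fts(rowid, title, description, ocr)
    VALUES (new.media_id, new.title, new.description, new.ocr_text);
END;
CREATE TRIGGER media_fts_ad AFTER DELETE ON media_metadata BEGIN
    DELETE FROM media_fts WHERE rowid = old.media_id;
END;
CREATE TRIGGER media_fts_au AFTER UPDATE ON media_metadata BEGIN
    DELETE FROM media_fts WHERE rowid = old.media_id;
    INSERT INTO media_fts(rowid, title, description, ocr)
    VALUES (new.media_id, new.title, new.description, new.ocr_text);
END;
INSERT INTO media_fts(rowid, title, description, ocr)
    SELECT media_id, title, description, ocr_text FROM media_metadata;
PRAGMA user_version = 12;
COMMIT;
"""


def migrate_v11_to_v12(conn: sqlite3.Connection) -> None:
    """Rebuild media_fts with an `ocr` column; no-op unless the DB is at v11."""
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    if version != 11:
        return
    try:
        # executescript commits any pending transaction first; the script's
        # own BEGIN/COMMIT then make the whole rebuild atomic.
        conn.executescript(V11_TO_V12)
    except sqlite3.Error:
        conn.rollback()  # undo the partial rebuild, leaving the DB at v11
        raise


def update_ocr_text(conn: sqlite3.Connection, media_id: int,
                    text: Optional[str]) -> None:
    """Hypothetical analogue of updateOcrText: upsert, or clear with None."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE media_metadata SET ocr_text = ? WHERE media_id = ?",
            (text, media_id))
        if cur.rowcount == 0:  # no metadata row yet, so insert one
            conn.execute(
                "INSERT INTO media_metadata(media_id, ocr_text) VALUES (?, ?)",
                (media_id, text))


def search(conn: sqlite3.Connection, query: str) -> List[int]:
    """Full-text search across title, description, and OCR text."""
    rows = conn.execute(
        "SELECT rowid FROM media_fts WHERE media_fts MATCH ? ORDER BY rowid",
        (query,))
    return [row[0] for row in rows]
```

The triggers are dropped first so that none still references `media_fts` while it is being replaced. Because the rebuild runs between a single `BEGIN` and `COMMIT`, a failure leaves the database at v11 instead of with a half-built index. Passing `None` to `update_ocr_text` clears the text, and the update trigger keeps `media_fts` in sync in both cases.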
What & why
First half of P13b — and per your call, OCR leads (it de-risks the ML Kit dependency before the more complex translation). A "Scan text" action on image item detail extracts text with ML Kit on-device text recognition, caches it, surfaces it, and makes it full-text searchable.
Critically, it uses the bundled Latin model — no Google Play Services, fully offline — which fits GrabBit's sideloaded/de-Googled posture. On-demand only; opt-in auto-OCR-on-download is the planned P13b-3 follow-up (mirroring P13a → P13a-2).
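The following sketch shows the on-demand "Scan text" flow (extract, cache, surface) together with an engine seam that degrades gracefully off-Android. It is written in Python for brevity rather than the app's Dart. Every name here (`RecognizerOcrEngine`, `create_ocr_engine`, `scan_text`, `ocr_action_label`) is hypothetical. The injected recognizer callable stands in for the ML Kit call, and a plain dict stands in for the cached `MediaMetadata.ocrText` value.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class OcrEngine(ABC):
    """Platform-neutral seam; the UI only talks to this interface."""

    @property
    @abstractmethod
    def is_available(self) -> bool:
        """Whether OCR can run on this host."""

    @abstractmethod
    def recognize(self, image_path: str) -> str:
        """Return the text found in the image (may be empty)."""


class RecognizerOcrEngine(OcrEngine):
    """Hypothetical stand-in for MlKitOcrEngine: wraps a native recognizer."""

    def __init__(self, recognizer: Callable[[str], str]) -> None:
        self._recognizer = recognizer

    @property
    def is_available(self) -> bool:
        return True

    def recognize(self, image_path: str) -> str:
        return self._recognizer(image_path)


class UnavailableOcrEngine(OcrEngine):
    """Used where on-device OCR can't run; reports unavailable up front."""

    @property
    def is_available(self) -> bool:
        return False

    def recognize(self, image_path: str) -> str:
        raise RuntimeError("on-device OCR is not available on this platform")


def create_ocr_engine(platform: str,
                      recognizer: Optional[Callable[[str], str]] = None) -> OcrEngine:
    """Factory: only Android with a native recognizer gets a working engine."""
    if platform == "android" and recognizer is not None:
        return RecognizerOcrEngine(recognizer)
    return UnavailableOcrEngine()


def ocr_action_label(engine: OcrEngine, cached_text: Optional[str]) -> Optional[str]:
    """None hides the action; otherwise "Scan text" first, "Rescan" once cached."""
    if not engine.is_available:
        return None
    return "Scan text" if cached_text is None else "Rescan"


def scan_text(engine: OcrEngine, cache: Dict[int, str], media_id: int,
              image_path: str, rescan: bool = False) -> Optional[str]:
    """On-demand OCR: reuse the cached text unless a rescan is requested."""
    if not engine.is_available:
        return None  # nothing to do; the action is hidden anyway
    if not rescan and media_id in cache:
        return cache[media_id]
    text = engine.recognize(image_path).strip()
    cache[media_id] = text
    return text
```

Putting availability on the engine lets the UI hide the action instead of catching errors, which matches the "hidden when the engine can't run here" behavior. In this sketch, `rescan=True` corresponds to the "Rescan" action.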
Changes
- `google_mlkit_text_recognition` (MIT plugin; ML Kit binaries free/proprietary, on-device). Recorded in `docs/SPEC.md`.
- `OcrEngine` interface + `MlKitOcrEngine` (Android, bundled Latin) + `UnavailableOcrEngine` + factory/provider. This mirrors the transcription engine seam and is graceful (no crash) off-Android.
- `MediaMetadata.ocrText`; `media_fts` gains an `ocr` column. Since FTS5 can't `ALTER ADD COLUMN`, the v12 migration drops and rebuilds the FTS table, triggers, and backfill (now including `ocr`) so search covers image text. `ocrText` is also added (capped) to the embedding doc so semantic search benefits.
- `MetadataRepository.updateOcrText` (upsert/clear).
- `_OcrSection` ("Scan text" / "Rescan") on image item detail; hidden when the engine can't run on this host. No opt-in toggle: OCR is free + offline, so it's an always-available action.

Tests
`dart format` clean · `flutter analyze` No issues · `flutter test` 779 passed, including: OCR-only word found via full-text search, `updateOcrText` round-trip (+ FTS), the v11→v12 migration (asserts the rebuilt `media_fts` has the `ocr` column and indexes it), embed-doc inclusion, and engine availability/graceful-off-Android.

Honest notes
- `_OcrSection` widget + the native call are APK-verified (consistent with the whisper/generation boundary).
- … `main` before branching; caught it before any push, moved the commit onto `claude/p13b1-ocr`, and reset local `main` to `origin/main`. Nothing reached the remote `main`.

https://claude.ai/code/session_013JoYmLCosYt5tQ8qwdbL1T
Generated by Claude Code