blokzdev · Jun 2, 2026
diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
@@ -16,6 +16,15 @@ _(nothing active — pick the next batch from below)_
 - [ ] **OCR — non-Latin scripts.** P13b-1 ships the **bundled Latin** ML Kit recognizer (no Google Play
       Services, offline). Chinese/Japanese/Korean/Devanagari need their own ML Kit script models (extra APK
       size or a download). Add a script choice if users want non-Latin OCR. *(From P13b-1.)*
+- [ ] **Unconditional `--write-thumbnail` for image downloads.** `YtDlpHost.kt` passes
+      `--write-thumbnail --convert-thumbnails jpg` for every download, so an image download wastes a fetch
+      writing a thumbnail of the photo. P13b-3 handles this defensively in Dart (the classifier keeps the
+      largest image as the photo and the smaller as its thumbnail), but a cleaner fix would gate the flag off
+      at request time for image downloads (needs an `isImage`/`writeThumbnail` hint through the Pigeon
+      `DownloadRequest`). *(From P13b-3 sweep.)*
+- [ ] **Image formats outside `mediaTypeForExt`.** `.heic`/`.heif`/`.avif`/`.tiff` aren't in the image set,
+      so such a download is classified as a `video` item. Add them (+ confirm the player/thumbnail handle
+      them) if real downloads produce them. *(From P13b-3 sweep.)*
 - [ ] **Auto-summarize — queue-decoupled background run.** P13a-2 generates the auto-summary **inline** in
       `_persistCompleted` before the next download pumps (gated on "model present" so it can't stall on a
       fetch), exactly like `autoTranscribe`. Generation is heavier than whisper-tiny, so a fuller design

diff --git a/docs/VERIFICATION.md b/docs/VERIFICATION.md
@@ -954,6 +954,20 @@ entries, or verify after P11c lands.)*
       "Couldn't detect the language".
 - [ ] On a host without ML Kit, the **Translate…** action is absent (graceful).
 
+### P13b-3 — Auto-OCR on download (+ image-download fix)  *(install `app-arm64-v8a-debug.apk`)*
+- [ ] **Image download fix:** download a single image (e.g. an Instagram/X photo, or a photo carousel) →
+      it now appears in the library as an **image item** (previously it produced nothing), shows **its own
+      picture as the thumbnail** in the grid/dashboard/collections (not a movie-icon placeholder), and is
+      exactly **one** item even though yt-dlp also writes a thumbnail sidecar. The video case is unchanged
+      (the video is the item; its thumbnail is still a thumbnail).
+- [ ] **Export:** export a downloaded image item to the gallery → it lands in the **Images** collection
+      and opens in the device gallery.
+- [ ] AI & graph settings → enable **Image text (OCR) · Auto-scan new image downloads**. Download an image
+      with legible text → its text becomes **searchable** + a "Text found in image" Activity Inbox entry,
+      **fully offline**.
+- [ ] **Default off:** with the toggle off, image downloads are not auto-scanned (on-demand "Scan text"
+      still works). A **video** download is never auto-OCR'd. The queue still drains normally.
+
 ### P13 (later subphases)
 - [ ] **Transcription / summarization / translation / OCR** each work (capability-gated) and write
       results back to the item.

diff --git a/docs/design/P13-PLAN.md b/docs/design/P13-PLAN.md
@@ -150,12 +150,29 @@ target-language UX + GMS nuance). Measure APK-size impact in the first ML Kit bu
   BCP mapping, `translateReadiness` truth table, controller with a fake engine. **Pending APK spot-check**
   (the native ML Kit translate/language-id + the pack download); the widget flow is APK-verified.
 
-#### `[ ]` P13b-3 — Auto-OCR on download *(follow-up; native; APK)*
+#### `[~]` P13b-3 — Auto-OCR on download (+ image-download fix) *(follow-up; native; APK)*
 - Opt-in (default off) auto-scan of **image** downloads, mirroring P13a-2 auto-summarize: a settings toggle +
   a gated block in `queue_controller._persistCompleted` (runs inline; OCR is cheap + offline) → `updateOcrText`
   → an Activity Inbox entry. Grows search coverage automatically.
-- **Exit / review:** with auto-OCR on, a finished image download is scanned + becomes searchable offline;
-  default-off does nothing; the queue still drains.
+- **Precursor fix (maintainer call):** `classifyDownloadOutputs` routed **all** image extensions to `thumb`,
+  so a single-image download (a photo/carousel) produced **no media item** — auto-OCR would never fire.
+  Fixed: image files are tentative thumbnails, but when a download has **no video/audio**, the images **are**
+  the media (→ `image` items). Reuses `mediaTypeForExt` for consistency. This also fixes image downloads
+  generally (they now appear in the library, with dimensions, OCR, etc.).
+- **Exit / review:** an image-only download becomes an `image` item; with auto-OCR on, it is scanned and becomes
+  searchable offline; with the toggle off, or for video items, nothing is scanned; the queue still drains.
+- **Status:** implemented (CI-green) — classifier fix (+ tests); `autoOcrOnDownload` setting + setter; pure
+  `shouldAutoOcr`; gated auto-OCR block in `_persistCompleted` (`ocrCount` in `_PersistResult`) + an `ai`
+  success inbox entry when text is found; an "Image text (OCR)" auto-scan card in AI settings (shown where ML
+  Kit runs). Tests: classifier image cases, `shouldAutoOcr` truth table, settings round-trip, and queue cases
+  (image+text → `ocrText` + entry; default-off no-op; video skipped). **No schema/deps change.** **Pending
+  APK spot-check** (real image download → image item + searchable text + inbox entry, offline).
+- **Pre-merge sweep refinements (same PR):** (a) `MediaThumb` now falls back to the image **file** for
+  `image` items with a null thumbnail (they were showing a movie-icon placeholder in grid/dashboard/
+  collections/hero/related); (b) the classifier collapses an image + its yt-dlp `--write-thumbnail` sidecar
+  to **one** item (largest = photo, smaller = thumbnail) so a single image download isn't double-counted;
+  (c) quick wins — auto-transcribe skips image items, and `durationSec` is gated to non-image. The
+  unconditional `--write-thumbnail` and non-`mediaTypeForExt` image formats are logged in `BACKLOG.md`.
 
 ### `[ ]` P13c — Smart auto-tagging *(generation; APK)*
 LLM-suggested tags feeding the **existing** tag system — builds directly on the P13a generation patterns.

diff --git a/lib/features/library/presentation/media_grid.dart b/lib/features/library/presentation/media_grid.dart
@@ -241,12 +241,20 @@ class MediaThumb extends StatelessWidget {
     final fallback = ColoredBox(
       color: scheme.surfaceContainerHighest,
       child: Icon(
-        item.type == 'audio' ? Icons.music_note : Icons.movie_outlined,
+        switch (item.type) {
+          'audio' => Icons.music_note,
+          'image' => Icons.image_outlined,
+          _ => Icons.movie_outlined,
+        },
         color: scheme.onSurfaceVariant,
         size: 40,
       ),
     );
-    final thumbPath = item.thumbPath;
+    // Image items often have no separate thumbnail (the photo is its own
+    // thumbnail) — render the image file directly. Everything else needs a
+    // generated thumbnail.
+    final thumbPath =
+        item.thumbPath ?? (item.type == 'image' ? item.filePath : null);
     if (thumbPath == null) return fallback;
     return Image.file(
       File(thumbPath),
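
The thumbnail fallback in the `MediaThumb` hunk above can be read as a small pure function. The following is a minimal sketch only: `Item` and `resolveThumbPath` are hypothetical names for illustration, not part of the PR, and the record stands in for the real library item type.

```dart
/// Hypothetical stand-in for the library item; only the fields used here.
typedef Item = ({String type, String filePath, String? thumbPath});

/// Returns the path to render as the thumbnail, or null when the widget
/// should show the icon fallback. Mirrors the expression in the hunk above.
String? resolveThumbPath(Item item) =>
    item.thumbPath ?? (item.type == 'image' ? item.filePath : null);

void main() {
  // An image item without a separate thumbnail renders its own file.
  final Item photo =
      (type: 'image', filePath: '/dl/a/photo.jpg', thumbPath: null);
  // A video item without a thumbnail falls back to the icon (null path).
  final Item video =
      (type: 'video', filePath: '/dl/b/clip.mp4', thumbPath: null);
  // An explicit thumbnail always wins, whatever the item type.
  final Item withThumb =
      (type: 'image', filePath: '/dl/c/p.jpg', thumbPath: '/dl/c/t.jpg');

  print(resolveThumbPath(photo)); // /dl/a/photo.jpg
  print(resolveThumbPath(video)); // null
  print(resolveThumbPath(withThumb)); // /dl/c/t.jpg
}
```

Keeping the rule in one expression means grid, dashboard, and collections views that share the widget all pick up the same fallback.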

diff --git a/lib/features/library/presentation/ocr.dart b/lib/features/library/presentation/ocr.dart
@@ -0,0 +1,15 @@
+/// Pure, engine-free helper for auto-OCR-on-download (P13b-3). Kept out of the
+/// queue controller so the gating decision is unit-testable in isolation
+/// (mirrors `autoSummaryDecision`).
+library;
+
+/// Whether a freshly downloaded item should be auto-scanned for text now.
+/// [enabled] is `autoOcrOnDownload`; [engineAvailable] is whether ML Kit OCR can
+/// run on this host; [isImage] is whether the item is an image; [alreadyScanned]
+/// is whether OCR text is already stored.
+bool shouldAutoOcr({
+  required bool enabled,
+  required bool engineAvailable,
+  required bool isImage,
+  required bool alreadyScanned,
+}) => enabled && engineAvailable && isImage && !alreadyScanned;
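
The gating rule in `shouldAutoOcr` above is a four-input conjunction, so its truth table has exactly one passing row. The sketch below copies the function from the hunk so it runs on its own and enumerates all sixteen combinations; the `main` harness is illustrative and is not the PR's actual test.

```dart
/// Copied from the hunk above so the sketch is self-contained.
bool shouldAutoOcr({
  required bool enabled,
  required bool engineAvailable,
  required bool isImage,
  required bool alreadyScanned,
}) => enabled && engineAvailable && isImage && !alreadyScanned;

void main() {
  var passing = 0;
  // Each bit of `bits` drives one input, covering all 2^4 combinations.
  for (var bits = 0; bits < 16; bits++) {
    final enabled = (bits & 1) != 0;
    final engineAvailable = (bits & 2) != 0;
    final isImage = (bits & 4) != 0;
    final alreadyScanned = (bits & 8) != 0;
    final result = shouldAutoOcr(
      enabled: enabled,
      engineAvailable: engineAvailable,
      isImage: isImage,
      alreadyScanned: alreadyScanned,
    );
    if (result) {
      passing++;
      print('scan: enabled=$enabled engine=$engineAvailable '
          'image=$isImage scanned=$alreadyScanned');
    }
  }
  print('passing rows: $passing'); // passing rows: 1
}
```

The only row that scans is: toggle on, engine available, item is an image, and no OCR text stored yet.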
diff --git a/lib/features/queue/data/completed_outputs.dart b/lib/features/queue/data/completed_outputs.dart
@@ -1,18 +1,24 @@
 import 'dart:io';
 
+import 'package:grabbit/core/utils/media_type.dart';
+
 /// The classified files produced by a finished download in its per-task folder.
 typedef DownloadOutputs = ({List<File> media, File? thumb, File? info});
 
 const _subtitleExts = {'srt', 'vtt', 'ass', 'ssa', 'lrc', 'sub'};
-const _thumbExts = {'jpg', 'jpeg', 'png', 'webp'};
 
 /// Sorts a download folder's files into the media file(s), the thumbnail, and
 /// the `.info.json` sidecar. Subtitle sidecars (`.srt`/`.vtt`/`.srv*`/…) and
 /// other JSON sidecars are excluded so they're never mistaken for the media —
 /// and multiple media files (yt-dlp `--split-chapters`) are all returned.
+///
+/// Image files are **tentative thumbnails**: alongside a video/audio download an
+/// image is the thumbnail sidecar, but an **image-only** download (a photo or a
+/// carousel of photos) has no video/audio — there the images *are* the media
+/// (P13b-3), so they become image library items rather than being discarded.
 DownloadOutputs classifyDownloadOutputs(Iterable<File> files) {
-  final media = <File>[];
-  File? thumb;
+  final media = <File>[]; // video / audio
+  final images = <File>[]; // image files — thumbnail(s) or the media itself
   File? info;
   for (final f in files) {
     final lower = f.path.toLowerCase();
@@ -23,12 +29,39 @@ DownloadOutputs classifyDownloadOutputs(Iterable<File> files) {
       // Other yt-dlp sidecars (e.g. live chat) — ignore.
     } else if (_subtitleExts.contains(ext) || ext.startsWith('srv')) {
       // Subtitle sidecars — not media.
-    } else if (_thumbExts.contains(ext)) {
-      thumb ??= f;
+    } else if (mediaTypeForExt(ext) == 'image') {
+      images.add(f);
     } else {
       media.add(f);
     }
   }
   media.sort((a, b) => a.path.compareTo(b.path));
-  return (media: media, thumb: thumb, info: info);
+  // Video/audio present → images are thumbnail sidecars (keep the first).
+  if (media.isNotEmpty) {
+    images.sort((a, b) => a.path.compareTo(b.path));
+    return (
+      media: media,
+      thumb: images.isEmpty ? null : images.first,
+      info: info,
+    );
+  }
+  // Image download → the image is the media. A carousel expands to one task
+  // (folder) per photo, so multiple images here means the photo PLUS yt-dlp's
+  // `--write-thumbnail` sidecar — keep the largest as the photo and the next as
+  // its thumbnail (rather than minting a duplicate item).
+  if (images.length <= 1) {
+    return (media: images, thumb: null, info: info);
+  }
+  images.sort((a, b) => _sizeOf(b).compareTo(_sizeOf(a)));
+  return (media: [images.first], thumb: images[1], info: info);
+}
+
+/// File size in bytes, or 0 when it can't be read (e.g. a missing path in a
+/// unit test) — used only to pick the largest image as the media.
+int _sizeOf(File f) {
+  try {
+    return f.lengthSync();
+  } on FileSystemException {
+    return 0;
+  }
 }
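
The classification rules in `classifyDownloadOutputs` above can be exercised without touching the file system. This is a simplified sketch under stated assumptions: it works on in-memory `(name, size)` records instead of `File`, it omits the `.info.json` and subtitle branches, and `_imageExts` is a hypothetical stand-in for `mediaTypeForExt` (the four extensions come from the removed `_thumbExts` set; the real image set is not shown in this diff).

```dart
/// Hypothetical in-memory stand-in for a downloaded file.
typedef Entry = ({String name, int size});
typedef Outputs = ({List<Entry> media, Entry? thumb});

// Assumption: stand-in for `mediaTypeForExt(ext) == 'image'`.
const _imageExts = {'jpg', 'jpeg', 'png', 'webp'};

bool _isImage(Entry e) =>
    _imageExts.contains(e.name.split('.').last.toLowerCase());

Outputs classify(List<Entry> files) {
  final media = files.where((e) => !_isImage(e)).toList()
    ..sort((a, b) => a.name.compareTo(b.name));
  final images = files.where(_isImage).toList();
  // Video/audio present: images are thumbnail sidecars (keep the first).
  if (media.isNotEmpty) {
    images.sort((a, b) => a.name.compareTo(b.name));
    return (media: media, thumb: images.isEmpty ? null : images.first);
  }
  // Image-only download: zero or one image is the media itself.
  if (images.length <= 1) return (media: images, thumb: null);
  // Photo plus thumbnail sidecar: largest is the photo, next is the thumb.
  images.sort((a, b) => b.size.compareTo(a.size));
  return (media: [images.first], thumb: images[1]);
}

void main() {
  final video = classify([
    (name: 'clip.mp4', size: 9000),
    (name: 'clip.jpg', size: 40),
  ]);
  print('${video.media.single.name} / ${video.thumb?.name}');
  // clip.mp4 / clip.jpg

  final photo = classify([
    (name: 'photo.jpg', size: 30),
    (name: 'photo.webp', size: 800),
  ]);
  print('${photo.media.single.name} / ${photo.thumb?.name}');
  // photo.webp / photo.jpg
}
```

Selecting by size rather than by name avoids guessing which file yt-dlp wrote as the sidecar, at the cost of reading each file's length once in the real implementation.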
diff --git a/lib/features/queue/presentation/queue_controller.dart b/lib/features/queue/presentation/queue_controller.dart
@@ -6,6 +6,7 @@ import 'package:drift/drift.dart' show Value;
 import 'package:flutter/widgets.dart' show AppLifecycleState;
 import 'package:flutter_riverpod/flutter_riverpod.dart';
 import 'package:grabbit/core/ai/generation_provider.dart';
+import 'package:grabbit/core/ai/ocr_provider.dart';
 import 'package:grabbit/core/ai/transcription_provider.dart';
 import 'package:grabbit/core/db/database.dart';
 import 'package:grabbit/core/db/database_provider.dart';
@@ -26,6 +27,7 @@ import 'package:grabbit/features/library/data/library_repository.dart';
 import 'package:grabbit/features/library/data/metadata_repository.dart';
 import 'package:grabbit/features/library/data/transcript_service.dart';
 import 'package:grabbit/features/library/presentation/ai_summary.dart';
+import 'package:grabbit/features/library/presentation/ocr.dart';
 import 'package:grabbit/features/notifications/data/notification_enums.dart';
 import 'package:grabbit/features/notifications/data/notifications_repository.dart';
 import 'package:grabbit/features/notifications/data/system_notification_service.dart';
@@ -51,6 +53,8 @@ typedef _PersistResult = ({
   // in but the generation model isn't downloaded → prompt to finish setup.
   int summaryCount,
   bool summaryNeedsModel,
+  // P13b-3: count of image items auto-scanned for text (OCR).
+  int ocrCount,
 });
 
 class QueueConfig {
@@ -418,6 +422,20 @@ class QueueController extends _$QueueController {
         dedupeKey: 'summary_needs_model',
       );
     }
+    // P13b-3: auto-OCR found text in a downloaded image (now searchable).
+    if (result.ocrCount > 0) {
+      await center.post(
+        category: NotificationCategory.ai,
+        severity: NotificationSeverity.success,
+        title: queued.title,
+        body: result.ocrCount > 1
+            ? 'Text found in ${result.ocrCount} images'
+            : 'Text found in image',
+        targetRoute: route,
+        itemId: single ? result.primaryId : null,
+        dedupeKey: 'ocr_$id',
+      );
+    }
     await _maybeNotifyOs(
       taskId: id,
       title: queued.title,
@@ -578,6 +596,7 @@ class QueueController extends _$QueueController {
       transcriptionNeedsModel: false,
       summaryCount: 0,
       summaryNeedsModel: false,
+      ocrCount: 0,
     );
     // Files land in a per-task subfolder (see YtDlpHost `-o`): the task id names
     // the folder, the user's template names the file inside it.
@@ -634,7 +653,9 @@ class QueueController extends _$QueueController {
                 type: type,
                 createdAt: DateTime.now(),
                 storageState: 'private',
-                durationSec: Value(single ? queued.durationSec : null),
+                durationSec: Value(
+                  single && type != 'image' ? queued.durationSec : null,
+                ),
                 sizeBytes: Value(await mediaFile.length()),
                 thumbPath: Value(outputs.thumb?.path),
                 width: Value(width),
@@ -684,6 +705,10 @@ class QueueController extends _$QueueController {
           : null;
       final whisperReady = whisper != null && await whisper.ensureReady();
       for (final (i, mediaFile) in outputs.media.indexed) {
+        // Images have no audio to transcribe — skip (avoids a wasted whisper
+        // transcode of a photo).
+        final ext = mediaFile.path.split('.').last.toLowerCase();
+        if (mediaTypeForExt(ext) == 'image') continue;
         final itemId = single ? id : '${id}__$i';
         final timed = await transcripts.extractTimed(
           mediaFile.path,
@@ -773,13 +798,49 @@ class QueueController extends _$QueueController {
       }
     }
 
+    // P13b-3: auto-scan freshly downloaded images for text (OCR) when opted in,
+    // so they become searchable. On-device + offline (bundled ML Kit, no
+    // download); images only; skips ones already scanned.
+    var ocrCount = 0;
+    if (settings.autoOcrOnDownload) {
+      final ocr = ref.read(ocrEngineProvider);
+      final metadata = ref.read(metadataRepositoryProvider);
+      for (final (i, mediaFile) in outputs.media.indexed) {
+        final itemId = single ? id : '${id}__$i';
+        final ext = mediaFile.path.split('.').last.toLowerCase();
+        final isImage =
+            !queued.request.audioOnly && mediaTypeForExt(ext) == 'image';
+        final meta = await (db.select(
+          db.mediaMetadata,
+        )..where((m) => m.itemId.equals(itemId))).getSingleOrNull();
+        if (!shouldAutoOcr(
+          enabled: settings.autoOcrOnDownload,
+          engineAvailable: ocr.isAvailable,
+          isImage: isImage,
+          alreadyScanned: meta?.ocrText?.trim().isNotEmpty ?? false,
+        )) {
+          continue;
+        }
+        try {
+          final text = (await ocr.recognizeText(mediaFile.path)).trim();
+          if (text.isNotEmpty) {
+            await metadata.updateOcrText(itemId, text);
+            ocrCount++;
+          }
+        } catch (_) {
+          // A per-item OCR failure must not fail the download.
+        }
+      }
+    }
+
     return (
       primaryId: single ? id : '${id}__0',
       itemCount: outputs.media.length,
       transcriptCount: transcriptCount,
       transcriptionNeedsModel: transcriptionNeedsModel,
       summaryCount: summaryCount,
       summaryNeedsModel: summaryNeedsModel,
+      ocrCount: ocrCount,
     );
   }
 

diff --git a/lib/features/settings/data/settings_model.dart b/lib/features/settings/data/settings_model.dart
@@ -97,6 +97,11 @@ abstract class SettingsModel with _$SettingsModel {
     // fetch — mirrors `autoTranscribe`). The on-demand summary on item detail
     // (P13a) works regardless.
     @Default(false) bool autoSummarizeOnDownload,
+    // P13b-3: auto-extract text (OCR) from a newly downloaded image in the
+    // background, so it becomes searchable. Opt-in (defaults off); runs only on
+    // images, on-device + offline (bundled ML Kit, no download). The on-demand
+    // "Scan text" on item detail (P13b-1) works regardless.
+    @Default(false) bool autoOcrOnDownload,
     // On-device speech transcription (P12e). Opt-in (defaults off); the whisper
     // model is downloaded only when the user enables it + picks a model.
     // `selectedTranscriptionModelId` empty = the device-tier recommendation;

diff --git a/lib/features/settings/presentation/ai_settings_screen.dart b/lib/features/settings/presentation/ai_settings_screen.dart
@@ -12,6 +12,7 @@ import 'package:grabbit/core/ai/inference_error.dart';
 import 'package:grabbit/core/ai/model_capability_matrix.dart';
 import 'package:grabbit/core/ai/model_catalog.dart';
 import 'package:grabbit/core/ai/model_download_service.dart';
+import 'package:grabbit/core/ai/ocr_provider.dart';
 import 'package:grabbit/core/ai/transcription_model.dart';
 import 'package:grabbit/core/ai/transcription_provider.dart';
 import 'package:grabbit/core/device/device_profile.dart';
@@ -70,11 +71,57 @@ class AiSettingsScreen extends ConsumerWidget {
         ),
         const _GenerationCard(),
         const _TranscriptionCard(),
+        const _OcrCard(),
       ],
     );
   }
 }
 
+/// On-device image OCR (P13b-3). Image text is always scannable by hand from an
+/// image's detail screen (P13b-1); this card just offers the opt-in to do it
+/// automatically on download. Shown only where ML Kit OCR can run (Android).
+class _OcrCard extends ConsumerWidget {
+  const _OcrCard();
+
+  @override
+  Widget build(BuildContext context, WidgetRef ref) {
+    if (!ref.watch(ocrEngineProvider).isAvailable) {
+      return const SizedBox.shrink();
+    }
+    final auto = ref.watch(
+      settingsControllerProvider.select(
+        (s) => s.value?.autoOcrOnDownload ?? false,
+      ),
+    );
+    return Padding(
+      padding: const EdgeInsets.only(top: 8),
+      child: SettingsCard(
+        children: [
+          SwitchListTile(
+            secondary: const InfoHintButton(
+              InfoHint(
+                title: 'Auto-scan images for text',
+                body:
+                    'Automatically read text inside each downloaded image so '
+                    'you can search for it — all on-device and offline. You can '
+                    'always scan an image by hand from its detail screen.',
+              ),
+            ),
+            title: const Text('Image text (OCR)'),
+            subtitle: const Text(
+              'Scan new image downloads for searchable text',
+            ),
+            value: auto,
+            onChanged: (v) => ref
+                .read(settingsControllerProvider.notifier)
+                .setAutoOcrOnDownload(v),
+          ),
+        ],
+      ),
+    );
+  }
+}
+
 /// Compact banner framing the AI screen with the device's capability tier (P12g)
 /// — so a user understands *why* some AI options are offered or gated. Reads the
 /// live tier (probed at startup); the InfoHint explains on-device scaling.

diff --git a/lib/features/settings/presentation/settings_controller.dart b/lib/features/settings/presentation/settings_controller.dart
@@ -147,6 +147,10 @@ class SettingsController extends _$SettingsController {
   Future<void> setAutoSummarizeOnDownload(bool value) async =>
       _update((await future).copyWith(autoSummarizeOnDownload: value));
 
+  /// Auto-scan newly downloaded images for text (OCR) in the background (P13b-3).
+  Future<void> setAutoOcrOnDownload(bool value) async =>
+      _update((await future).copyWith(autoOcrOnDownload: value));
+
   /// On-device transcription opt-in (P12e).
   Future<void> setTranscriptionEnabled(bool value) async =>
       _update((await future).copyWith(transcriptionEnabled: value));