9 changes: 9 additions & 0 deletions docs/BACKLOG.md
@@ -16,6 +16,15 @@ _(nothing active — pick the next batch from below)_
- [ ] **OCR — non-Latin scripts.** P13b-1 ships the **bundled Latin** ML Kit recognizer (no Google Play
Services, offline). Chinese/Japanese/Korean/Devanagari need their own ML Kit script models (extra APK
size or a download). Add a script choice if users want non-Latin OCR. *(From P13b-1.)*
- [ ] **Unconditional `--write-thumbnail` for image downloads.** `YtDlpHost.kt` passes
`--write-thumbnail --convert-thumbnails jpg` for every download, so an image download wastes a fetch
writing a thumbnail of the photo. P13b-3 handles this defensively in Dart (the classifier keeps the
largest image as the photo and the smaller as its thumbnail), but a cleaner fix would gate the flag off
at request time for image downloads (needs an `isImage`/`writeThumbnail` hint through the Pigeon
`DownloadRequest`). *(From P13b-3 sweep.)*
- [ ] **Image formats outside `mediaTypeForExt`.** `.heic`/`.heif`/`.avif`/`.tiff` aren't in the image set,
so such a download is classified as a `video` item. Add them (+ confirm the player/thumbnail handle
them) if real downloads produce them. *(From P13b-3 sweep.)*
- [ ] **Auto-summarize — queue-decoupled background run.** P13a-2 generates the auto-summary **inline** in
`_persistCompleted` before the next download pumps (gated on "model present" so it can't stall on a
fetch), exactly like `autoTranscribe`. Generation is heavier than whisper-tiny, so a fuller design
14 changes: 14 additions & 0 deletions docs/VERIFICATION.md
@@ -954,6 +954,20 @@ entries, or verify after P11c lands.)*
"Couldn't detect the language".
- [ ] On a host without ML Kit, the **Translate…** action is absent (graceful).

### P13b-3 — Auto-OCR on download (+ image-download fix) *(install `app-arm64-v8a-debug.apk`)*
- [ ] **Image download fix:** download a single image (e.g. an Instagram/X photo, or a photo carousel) →
it now appears in the library as an **image item** (previously it produced nothing), shows **its own
picture as the thumbnail** in the grid/dashboard/collections (not a movie-icon placeholder), and is
exactly **one** item even though yt-dlp also writes a thumbnail sidecar. The video case is unchanged
(the video is the item; its thumbnail is still a thumbnail).
- [ ] **Export:** export a downloaded image item to the gallery → it lands in the **Images** collection
and opens in the device gallery.
- [ ] AI & graph settings → enable **Image text (OCR) · Auto-scan new image downloads**. Download an image
with legible text → its text becomes **searchable** and a "Text found in image" Activity Inbox entry
appears, all **fully offline**.
- [ ] **Default off:** with the toggle off, image downloads are not auto-scanned (on-demand "Scan text"
still works). A **video** download is never auto-OCR'd. The queue still drains normally.

### P13 (later subphases)
- [ ] **Transcription / summarization / translation / OCR** each work (capability-gated) and write
results back to the item.
23 changes: 20 additions & 3 deletions docs/design/P13-PLAN.md
@@ -150,12 +150,29 @@ target-language UX + GMS nuance). Measure APK-size impact in the first ML Kit bu
BCP mapping, `translateReadiness` truth table, controller with a fake engine. **Pending APK spot-check**
(the native ML Kit translate/language-id + the pack download); the widget flow is APK-verified.

-#### `[ ]` P13b-3 — Auto-OCR on download *(follow-up; native; APK)*
#### `[~]` P13b-3 — Auto-OCR on download (+ image-download fix) *(follow-up; native; APK)*
- Opt-in (default off) auto-scan of **image** downloads, mirroring P13a-2 auto-summarize: a settings toggle +
a gated block in `queue_controller._persistCompleted` (runs inline; OCR is cheap + offline) → `updateOcrText`
→ an Activity Inbox entry. Grows search coverage automatically.
-- **Exit / review:** with auto-OCR on, a finished image download is scanned + becomes searchable offline;
-default-off does nothing; the queue still drains.
- **Precursor fix (maintainer call):** `classifyDownloadOutputs` routed **all** image extensions to `thumb`,
so a single-image download (a photo/carousel) produced **no media item** — auto-OCR would never fire.
Fixed: image files are tentative thumbnails, but when a download has **no video/audio**, the images **are**
the media (→ `image` items). Reuses `mediaTypeForExt` for consistency. This also fixes image downloads
generally (they now appear in the library, with dimensions, OCR, etc.).
- **Exit / review:** an image-only download becomes an `image` item; with auto-OCR on, it's scanned + becomes
searchable offline; default-off / video items do nothing; the queue still drains.
- **Status:** implemented (CI-green) — classifier fix (+ tests); `autoOcrOnDownload` setting + setter; pure
`shouldAutoOcr`; gated auto-OCR block in `_persistCompleted` (`ocrCount` in `_PersistResult`) + an `ai`
success inbox entry when text is found; an "Image text (OCR)" auto-scan card in AI settings (shown where ML
Kit runs). Tests: classifier image cases, `shouldAutoOcr` truth table, settings round-trip, and queue cases
(image+text → `ocrText` + entry; default-off no-op; video skipped). **No schema/deps change.** **Pending
APK spot-check** (real image download → image item + searchable text + inbox entry, offline).
- **Pre-merge sweep refinements (same PR):** (a) `MediaThumb` now falls back to the image **file** for
`image` items with a null thumbnail (they were showing a movie-icon placeholder in grid/dashboard/
collections/hero/related); (b) the classifier collapses an image + its yt-dlp `--write-thumbnail` sidecar
to **one** item (largest = photo, smaller = thumbnail) so a single image download isn't double-counted;
(c) quick wins — auto-transcribe skips image items, and `durationSec` is gated to non-image. The
unconditional `--write-thumbnail` and non-`mediaTypeForExt` image formats are logged in `BACKLOG.md`.

### `[ ]` P13c — Smart auto-tagging *(generation; APK)*
LLM-suggested tags feeding the **existing** tag system — builds directly on the P13a generation patterns.
12 changes: 10 additions & 2 deletions lib/features/library/presentation/media_grid.dart
@@ -241,12 +241,20 @@ class MediaThumb extends StatelessWidget {
    final fallback = ColoredBox(
      color: scheme.surfaceContainerHighest,
      child: Icon(
-        item.type == 'audio' ? Icons.music_note : Icons.movie_outlined,
        switch (item.type) {
          'audio' => Icons.music_note,
          'image' => Icons.image_outlined,
          _ => Icons.movie_outlined,
        },
        color: scheme.onSurfaceVariant,
        size: 40,
      ),
    );
-    final thumbPath = item.thumbPath;
    // Image items often have no separate thumbnail (the photo is its own
    // thumbnail) — render the image file directly. Everything else needs a
    // generated thumbnail.
    final thumbPath =
        item.thumbPath ?? (item.type == 'image' ? item.filePath : null);
    if (thumbPath == null) return fallback;
    return Image.file(
      File(thumbPath),
15 changes: 15 additions & 0 deletions lib/features/library/presentation/ocr.dart
@@ -0,0 +1,15 @@
/// Pure, engine-free helper for auto-OCR-on-download (P13b-3). Kept out of the
/// queue controller so the gating decision is unit-testable in isolation
/// (mirrors `autoSummaryDecision`).
library;

/// Whether a freshly downloaded item should be auto-scanned for text now.
/// [enabled] is `autoOcrOnDownload`; [engineAvailable] is whether ML Kit OCR can
/// run on this host; [isImage] is whether the item is an image; [alreadyScanned]
/// is whether OCR text is already stored.
bool shouldAutoOcr({
  required bool enabled,
  required bool engineAvailable,
  required bool isImage,
  required bool alreadyScanned,
}) => enabled && engineAvailable && isImage && !alreadyScanned;
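
The gate above is a pure conjunction, so its truth table can be walked exhaustively. The following is a minimal sketch of that check, not the PR's actual test: `shouldAutoOcr` is copied from the diff so the snippet stands alone, and `countScanningCombinations` is a hypothetical helper introduced only for illustration.

```dart
// Copied from the diff above so this sketch compiles on its own.
bool shouldAutoOcr({
  required bool enabled,
  required bool engineAvailable,
  required bool isImage,
  required bool alreadyScanned,
}) => enabled && engineAvailable && isImage && !alreadyScanned;

/// Hypothetical helper (not part of the PR): walks all 16 input combinations
/// and counts how many of them would trigger an automatic scan.
int countScanningCombinations() {
  const flags = [false, true];
  var count = 0;
  for (final enabled in flags) {
    for (final engineAvailable in flags) {
      for (final isImage in flags) {
        for (final alreadyScanned in flags) {
          if (shouldAutoOcr(
            enabled: enabled,
            engineAvailable: engineAvailable,
            isImage: isImage,
            alreadyScanned: alreadyScanned,
          )) {
            count++;
          }
        }
      }
    }
  }
  return count;
}
```

Only one combination scans: the setting is on, the engine is available, the item is an image, and no OCR text is stored yet. Calling `countScanningCombinations()` therefore returns 1.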
45 changes: 39 additions & 6 deletions lib/features/queue/data/completed_outputs.dart
@@ -1,18 +1,24 @@
import 'dart:io';

import 'package:grabbit/core/utils/media_type.dart';

/// The classified files produced by a finished download in its per-task folder.
typedef DownloadOutputs = ({List<File> media, File? thumb, File? info});

const _subtitleExts = {'srt', 'vtt', 'ass', 'ssa', 'lrc', 'sub'};
-const _thumbExts = {'jpg', 'jpeg', 'png', 'webp'};

/// Sorts a download folder's files into the media file(s), the thumbnail, and
/// the `.info.json` sidecar. Subtitle sidecars (`.srt`/`.vtt`/`.srv*`/…) and
/// other JSON sidecars are excluded so they're never mistaken for the media —
/// and multiple media files (yt-dlp `--split-chapters`) are all returned.
///
/// Image files are **tentative thumbnails**: alongside a video/audio download an
/// image is the thumbnail sidecar, but an **image-only** download (a photo or a
/// carousel of photos) has no video/audio — there the images *are* the media
/// (P13b-3), so they become image library items rather than being discarded.
DownloadOutputs classifyDownloadOutputs(Iterable<File> files) {
-  final media = <File>[];
-  File? thumb;
  final media = <File>[]; // video / audio
  final images = <File>[]; // image files — thumbnail(s) or the media itself
  File? info;
  for (final f in files) {
    final lower = f.path.toLowerCase();
@@ -23,12 +29,39 @@ DownloadOutputs classifyDownloadOutputs(Iterable<File> files) {
      // Other yt-dlp sidecars (e.g. live chat) — ignore.
    } else if (_subtitleExts.contains(ext) || ext.startsWith('srv')) {
      // Subtitle sidecars — not media.
-    } else if (_thumbExts.contains(ext)) {
-      thumb ??= f;
    } else if (mediaTypeForExt(ext) == 'image') {
      images.add(f);
    } else {
      media.add(f);
    }
  }
  media.sort((a, b) => a.path.compareTo(b.path));
-  return (media: media, thumb: thumb, info: info);
  // Video/audio present → images are thumbnail sidecars (keep the first).
  if (media.isNotEmpty) {
    images.sort((a, b) => a.path.compareTo(b.path));
    return (
      media: media,
      thumb: images.isEmpty ? null : images.first,
      info: info,
    );
  }
  // Image download → the image is the media. A carousel expands to one task
  // (folder) per photo, so multiple images here means the photo PLUS yt-dlp's
  // `--write-thumbnail` sidecar — keep the largest as the photo and the next as
  // its thumbnail (rather than minting a duplicate item).
  if (images.length <= 1) {
    return (media: images, thumb: null, info: info);
  }
  images.sort((a, b) => _sizeOf(b).compareTo(_sizeOf(a)));
  return (media: [images.first], thumb: images[1], info: info);
}

/// File size in bytes, or 0 when it can't be read (e.g. a missing path in a
/// unit test) — used only to pick the largest image as the media.
int _sizeOf(File f) {
  try {
    return f.lengthSync();
  } on FileSystemException {
    return 0;
  }
}
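
The classification rule above can be sketched without touching the file system. This is a simplified illustration under stated assumptions, not the PR's code: it works on a map of file names to byte sizes instead of `File` objects, it omits the `.info.json` and subtitle handling, and `_imageExts`, `Classified`, and `classifyBySize` are hypothetical names. In particular `_imageExts` is only a stand-in for the app's `mediaTypeForExt(ext) == 'image'` check, whose exact extension set is not shown in this diff.

```dart
/// Hypothetical stand-in for `mediaTypeForExt(ext) == 'image'` (assumption).
const _imageExts = {'jpg', 'jpeg', 'png', 'webp'};

/// Simplified result: media file names plus an optional thumbnail name.
typedef Classified = ({List<String> media, String? thumb});

/// Classifies a download folder given as {file name: size in bytes}.
Classified classifyBySize(Map<String, int> sizes) {
  final media = <String>[]; // video / audio
  final images = <String>[]; // thumbnail(s) or the media itself
  for (final name in sizes.keys) {
    final ext = name.split('.').last.toLowerCase();
    (_imageExts.contains(ext) ? images : media).add(name);
  }
  media.sort();
  // Video/audio present: images are thumbnail sidecars, keep the first.
  if (media.isNotEmpty) {
    images.sort();
    return (media: media, thumb: images.isEmpty ? null : images.first);
  }
  // Image-only download with at most one file: that image is the media.
  if (images.length <= 1) return (media: images, thumb: null);
  // Photo plus thumbnail sidecar: largest is the photo, next is the thumb.
  images.sort((a, b) => sizes[b]!.compareTo(sizes[a]!));
  return (media: [images.first], thumb: images[1]);
}
```

For example, `classifyBySize({'photo.jpg': 900, 'photo.webp': 40})` yields `photo.jpg` as the only media entry with `photo.webp` as its thumbnail, while `classifyBySize({'clip.mp4': 1000, 'clip.jpg': 50})` keeps `clip.mp4` as the media and `clip.jpg` as the thumbnail. Sorting by size is a heuristic: it relies on the sidecar thumbnail being smaller than the photo itself.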
63 changes: 62 additions & 1 deletion lib/features/queue/presentation/queue_controller.dart
@@ -6,6 +6,7 @@ import 'package:drift/drift.dart' show Value;
import 'package:flutter/widgets.dart' show AppLifecycleState;
import 'package:flutter_riverpod/flutter_riverpod.dart';
import 'package:grabbit/core/ai/generation_provider.dart';
import 'package:grabbit/core/ai/ocr_provider.dart';
import 'package:grabbit/core/ai/transcription_provider.dart';
import 'package:grabbit/core/db/database.dart';
import 'package:grabbit/core/db/database_provider.dart';
@@ -26,6 +27,7 @@ import 'package:grabbit/features/library/data/library_repository.dart';
import 'package:grabbit/features/library/data/metadata_repository.dart';
import 'package:grabbit/features/library/data/transcript_service.dart';
import 'package:grabbit/features/library/presentation/ai_summary.dart';
import 'package:grabbit/features/library/presentation/ocr.dart';
import 'package:grabbit/features/notifications/data/notification_enums.dart';
import 'package:grabbit/features/notifications/data/notifications_repository.dart';
import 'package:grabbit/features/notifications/data/system_notification_service.dart';
@@ -51,6 +53,8 @@ typedef _PersistResult = ({
// in but the generation model isn't downloaded → prompt to finish setup.
int summaryCount,
bool summaryNeedsModel,
// P13b-3: count of image items auto-scanned for text (OCR).
int ocrCount,
});

class QueueConfig {
@@ -418,6 +422,20 @@ class QueueController extends _$QueueController {
dedupeKey: 'summary_needs_model',
);
}
// P13b-3: auto-OCR found text in a downloaded image (now searchable).
if (result.ocrCount > 0) {
  await center.post(
    category: NotificationCategory.ai,
    severity: NotificationSeverity.success,
    title: queued.title,
    body: result.ocrCount > 1
        ? 'Text found in ${result.ocrCount} images'
        : 'Text found in image',
    targetRoute: route,
    itemId: single ? result.primaryId : null,
    dedupeKey: 'ocr_$id',
  );
}
await _maybeNotifyOs(
taskId: id,
title: queued.title,
@@ -578,6 +596,7 @@ class QueueController extends _$QueueController {
transcriptionNeedsModel: false,
summaryCount: 0,
summaryNeedsModel: false,
ocrCount: 0,
);
// Files land in a per-task subfolder (see YtDlpHost `-o`): the task id names
// the folder, the user's template names the file inside it.
@@ -634,7 +653,9 @@ class QueueController extends _$QueueController {
type: type,
createdAt: DateTime.now(),
storageState: 'private',
-durationSec: Value(single ? queued.durationSec : null),
durationSec: Value(
  single && type != 'image' ? queued.durationSec : null,
),
sizeBytes: Value(await mediaFile.length()),
thumbPath: Value(outputs.thumb?.path),
width: Value(width),
@@ -684,6 +705,10 @@ class QueueController extends _$QueueController {
: null;
final whisperReady = whisper != null && await whisper.ensureReady();
for (final (i, mediaFile) in outputs.media.indexed) {
  // Images have no audio to transcribe — skip (avoids a wasted whisper
  // transcode of a photo).
  final ext = mediaFile.path.split('.').last.toLowerCase();
  if (mediaTypeForExt(ext) == 'image') continue;
  final itemId = single ? id : '${id}__$i';
  final timed = await transcripts.extractTimed(
    mediaFile.path,
@@ -773,13 +798,49 @@ class QueueController extends _$QueueController {
}
}

// P13b-3: auto-scan freshly downloaded images for text (OCR) when opted in,
// so they become searchable. On-device + offline (bundled ML Kit, no
// download); images only; skips ones already scanned.
var ocrCount = 0;
if (settings.autoOcrOnDownload) {
  final ocr = ref.read(ocrEngineProvider);
  final metadata = ref.read(metadataRepositoryProvider);
  for (final (i, mediaFile) in outputs.media.indexed) {
    final itemId = single ? id : '${id}__$i';
    final ext = mediaFile.path.split('.').last.toLowerCase();
    final isImage =
        !queued.request.audioOnly && mediaTypeForExt(ext) == 'image';
    final meta = await (db.select(
      db.mediaMetadata,
    )..where((m) => m.itemId.equals(itemId))).getSingleOrNull();
    if (!shouldAutoOcr(
      enabled: settings.autoOcrOnDownload,
      engineAvailable: ocr.isAvailable,
      isImage: isImage,
      alreadyScanned: meta?.ocrText?.trim().isNotEmpty ?? false,
    )) {
      continue;
    }
    try {
      final text = (await ocr.recognizeText(mediaFile.path)).trim();
      if (text.isNotEmpty) {
        await metadata.updateOcrText(itemId, text);
        ocrCount++;
      }
    } catch (_) {
      // A per-item OCR failure must not fail the download.
    }
  }
}

return (
  primaryId: single ? id : '${id}__0',
  itemCount: outputs.media.length,
  transcriptCount: transcriptCount,
  transcriptionNeedsModel: transcriptionNeedsModel,
  summaryCount: summaryCount,
  summaryNeedsModel: summaryNeedsModel,
  ocrCount: ocrCount,
);
}

5 changes: 5 additions & 0 deletions lib/features/settings/data/settings_model.dart
@@ -97,6 +97,11 @@ abstract class SettingsModel with _$SettingsModel {
// fetch — mirrors `autoTranscribe`). The on-demand summary on item detail
// (P13a) works regardless.
@Default(false) bool autoSummarizeOnDownload,
// P13b-3: auto-extract text (OCR) from a newly downloaded image in the
// background, so it becomes searchable. Opt-in (defaults off); runs only on
// images, on-device + offline (bundled ML Kit, no download). The on-demand
// "Scan text" on item detail (P13b-1) works regardless.
@Default(false) bool autoOcrOnDownload,
// On-device speech transcription (P12e). Opt-in (defaults off); the whisper
// model is downloaded only when the user enables it + picks a model.
// `selectedTranscriptionModelId` empty = the device-tier recommendation;
47 changes: 47 additions & 0 deletions lib/features/settings/presentation/ai_settings_screen.dart
@@ -12,6 +12,7 @@ import 'package:grabbit/core/ai/inference_error.dart';
import 'package:grabbit/core/ai/model_capability_matrix.dart';
import 'package:grabbit/core/ai/model_catalog.dart';
import 'package:grabbit/core/ai/model_download_service.dart';
import 'package:grabbit/core/ai/ocr_provider.dart';
import 'package:grabbit/core/ai/transcription_model.dart';
import 'package:grabbit/core/ai/transcription_provider.dart';
import 'package:grabbit/core/device/device_profile.dart';
@@ -70,11 +71,57 @@ class AiSettingsScreen extends ConsumerWidget {
),
const _GenerationCard(),
const _TranscriptionCard(),
const _OcrCard(),
],
);
}
}

/// On-device image OCR (P13b-3). Image text is always scannable by hand from an
/// image's detail screen (P13b-1); this card just offers the opt-in to do it
/// automatically on download. Shown only where ML Kit OCR can run (Android).
class _OcrCard extends ConsumerWidget {
  const _OcrCard();

  @override
  Widget build(BuildContext context, WidgetRef ref) {
    if (!ref.watch(ocrEngineProvider).isAvailable) {
      return const SizedBox.shrink();
    }
    final auto = ref.watch(
      settingsControllerProvider.select(
        (s) => s.value?.autoOcrOnDownload ?? false,
      ),
    );
    return Padding(
      padding: const EdgeInsets.only(top: 8),
      child: SettingsCard(
        children: [
          SwitchListTile(
            secondary: const InfoHintButton(
              InfoHint(
                title: 'Auto-scan images for text',
                body:
                    'Automatically read text inside each downloaded image so '
                    'you can search for it — all on-device and offline. You can '
                    'always scan an image by hand from its detail screen.',
              ),
            ),
            title: const Text('Image text (OCR)'),
            subtitle: const Text(
              'Scan new image downloads for searchable text',
            ),
            value: auto,
            onChanged: (v) => ref
                .read(settingsControllerProvider.notifier)
                .setAutoOcrOnDownload(v),
          ),
        ],
      ),
    );
  }
}

/// Compact banner framing the AI screen with the device's capability tier (P12g)
/// — so a user understands *why* some AI options are offered or gated. Reads the
/// live tier (probed at startup); the InfoHint explains on-device scaling.
4 changes: 4 additions & 0 deletions lib/features/settings/presentation/settings_controller.dart
@@ -147,6 +147,10 @@ class SettingsController extends _$SettingsController {
Future<void> setAutoSummarizeOnDownload(bool value) async =>
    _update((await future).copyWith(autoSummarizeOnDownload: value));

/// Auto-scan newly downloaded images for text (OCR) in the background (P13b-3).
Future<void> setAutoOcrOnDownload(bool value) async =>
    _update((await future).copyWith(autoOcrOnDownload: value));

/// On-device transcription opt-in (P12e).
Future<void> setTranscriptionEnabled(bool value) async =>
    _update((await future).copyWith(transcriptionEnabled: value));