
Add native OCR to screenshot editor #1799

Merged: richiemcilroy merged 13 commits into main from codex/screenshot-editor-ocr on May 12, 2026


richiemcilroy (Member) commented May 11, 2026

Adds native OCR to the screenshot editor using macOS Vision and Windows Media OCR.
Selected regions are cropped from the source image and processed off the UI path before copying recognized text.
Validated with Rust, Biome, diff checks, and a Windows OCR API compile check.

Greptile Summary

This PR introduces native OCR to the screenshot editor, automatically running macOS Vision or Windows Media OCR on the full visible image region and overlaying transparent, selectable text spans so users can copy text directly from screenshots. It also ships annotation copy/paste (Cmd+C/V with cascading 16 px offsets), pinch-to-zoom gesture support, a calculateImageTransform refactor that correctly propagates aspectRatio through the layout pipeline, and size-aware CameraIssueOverlay improvements.
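The cascading paste behaviour can be sketched roughly as below. This is a minimal illustration, not the code in Editor.tsx: the `Annotation` shape, the `AnnotationClipboard` class, and the choice to restart the cascade whenever new annotations are copied are all assumptions; only the 16 px step comes from the summary.

```typescript
// Hypothetical annotation shape; the editor's real type is not shown in this PR.
interface Annotation {
  id: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

const PASTE_OFFSET_PX = 16;

// Tracks how many times the current clipboard contents have been pasted,
// so each paste lands a further 16 px down and to the right.
class AnnotationClipboard {
  private items: Annotation[] = [];
  private pasteCount = 0;

  copy(selected: readonly Annotation[]): void {
    this.items = selected.map((a) => ({ ...a }));
    this.pasteCount = 0; // assumption: new clipboard contents restart the cascade
  }

  paste(makeId: () => string): Annotation[] {
    this.pasteCount += 1;
    const offset = PASTE_OFFSET_PX * this.pasteCount;
    return this.items.map((a) => ({
      ...a,
      id: makeId(),
      x: a.x + offset,
      y: a.y + offset,
    }));
  }
}
```

Offsetting each paste keeps duplicates from stacking exactly on top of the original, so the user can see that something was pasted.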

  • OCR backend (screenshot_editor.rs): crops the RGBA source buffer to BGRA in a checked, row-by-row loop, then dispatches off-thread to Vision (cidre::vn) on macOS or Windows.Media.Ocr on Windows; WinRT is initialised and uninitialised per call via an RAII guard; the line bounds returned by each engine are offset back into full-image coordinates before serialisation.
  • OCR frontend (OcrSelectionOverlay.tsx): reacts to crop changes via a memoised region key, discards stale responses with an incrementing requestId, and renders each recognised line as an absolutely-positioned transparent <span> scaled to fit its bounding box; pointer-events are enabled only in "select" tool mode.
  • Layout refactor (layout.ts): calculateImageTransform / getImageRect now accept an AspectRatio | null parameter; an early-exit guard for zero-size inputs and two separate auto vs. fixed-aspect paths were added; all call sites in context.tsx and Preview.tsx were updated.
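The crop step and the coordinate offset described in the backend bullet can be illustrated with a short sketch. The real code is Rust in screenshot_editor.rs; this TypeScript version only mirrors the logic, and `PixelRect`, `cropRgbaToBgra`, and `cropBoxToImage` are hypothetical names. It assumes a tightly packed RGBA8 source (stride = width × 4) and top-left-origin pixel boxes.

```typescript
// Hypothetical rectangle type in image pixels (top-left origin).
interface PixelRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Copies `region` out of a tightly packed RGBA8 image into a new, tightly
// packed BGRA8 buffer, row by row, validating every bound up front.
function cropRgbaToBgra(
  src: Uint8Array,
  srcWidth: number,
  srcHeight: number,
  region: PixelRect,
): Uint8Array {
  const { x, y, width, height } = region;
  const allInts = [srcWidth, srcHeight, x, y, width, height].every(Number.isInteger);
  if (!allInts || width <= 0 || height <= 0 || x < 0 || y < 0) {
    throw new RangeError("invalid crop region");
  }
  if (x + width > srcWidth || y + height > srcHeight) {
    throw new RangeError("crop region exceeds image bounds");
  }
  if (src.length < srcWidth * srcHeight * 4) {
    throw new RangeError("source buffer is smaller than width * height * 4");
  }

  const out = new Uint8Array(width * height * 4);
  for (let row = 0; row < height; row++) {
    const srcRow = ((y + row) * srcWidth + x) * 4;
    const dstRow = row * width * 4;
    for (let col = 0; col < width * 4; col += 4) {
      out[dstRow + col] = src[srcRow + col + 2]; // B (index 2 in RGBA)
      out[dstRow + col + 1] = src[srcRow + col + 1]; // G stays in place
      out[dstRow + col + 2] = src[srcRow + col]; // R (index 0 in RGBA)
      out[dstRow + col + 3] = src[srcRow + col + 3]; // A stays in place
    }
  }
  return out;
}

// OCR boxes come back relative to the crop; shift them into full-image space.
function cropBoxToImage(box: PixelRect, region: PixelRect): PixelRect {
  return { ...box, x: box.x + region.x, y: box.y + region.y };
}
```

Note that the engines do not necessarily report pixel boxes: Vision's recognised-text observations use normalised coordinates with a bottom-left origin, so a real implementation would scale by the crop size and flip y before applying this offset.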

Confidence Score: 4/5

Safe to merge on macOS; the Windows OCR path has a known alpha-mode mismatch (flagged in a prior review thread) that may degrade OCR quality on screenshots with semi-transparent pixels.

The Rust OCR implementation, TypeScript overlay component, layout refactor, and copy/paste additions are all structurally sound. The one outstanding concern is on the Windows path in screenshot_editor.rs: the pixel data is straight (non-premultiplied) RGBA converted to BGRA, but SoftwareBitmap is told it is premultiplied, which can cause the OCR engine to mis-interpret colour values for semi-transparent pixels. This was raised in a prior review and is still unaddressed in the current code.
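The arithmetic behind this concern can be shown with a small sketch. It only illustrates why straight-alpha data labelled as premultiplied can inflate colour values when a consumer un-premultiplies it; whether Windows.Media.Ocr actually performs that conversion internally is not established here, and `premultiply`/`unpremultiply` are hypothetical helpers on 8-bit channels.

```typescript
// Straight (non-premultiplied) 8-bit channel -> premultiplied.
function premultiply(channel: number, alpha: number): number {
  return Math.round((channel * alpha) / 255);
}

// Premultiplied -> straight, as a consumer would do if told the data is
// premultiplied. Clamped because mislabelled input can exceed 255.
function unpremultiply(channel: number, alpha: number): number {
  if (alpha === 0) return 0;
  return Math.min(255, Math.round((channel * 255) / alpha));
}

// A half-transparent pixel with straight red = 100 and alpha = 128.
const roundTrip = unpremultiply(premultiply(100, 128), 128); // 100
const mislabelled = unpremultiply(100, 128); // 199: straight data read as premultiplied
const opaque = unpremultiply(100, 255); // 100: fully opaque pixels are unaffected
```

Because fully opaque pixels come out unchanged, the mismatch only matters for screenshots that contain semi-transparent pixels, which matches the hedged wording of the review.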

File needing attention: apps/desktop/src-tauri/src/screenshot_editor.rs, specifically the SoftwareBitmap alpha mode used in the Windows OCR path.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/desktop/src-tauri/src/screenshot_editor.rs | Adds ~460 lines of OCR logic: RGBA→BGRA crop extraction, macOS Vision dispatch via cidre, and Windows Media OCR dispatch; the BitmapAlphaMode::Premultiplied mismatch on the Windows path (previously flagged) is still present |
| apps/desktop/src/routes/screenshot-editor/OcrSelectionOverlay.tsx | New component; automatically runs OCR on the full visible region on mount and on every crop change, overlaying transparent selectable text spans; uses raw invoke instead of the generated typed binding (previously flagged) |
| apps/desktop/src/routes/screenshot-editor/layout.ts | Significant refactor of calculateImageTransform/getImageRect to add an aspectRatio parameter; guards for zero-size inputs, correct even-rounding helpers, and separate auto vs. fixed-aspect layout paths look correct |
| apps/desktop/src/routes/screenshot-editor/Editor.tsx | Adds annotation copy/paste (Cmd+C/V) with a per-paste 16 px offset; correctly gates clipboard copy on annotation state and lets native browser copy fall through when OCR text is selected |
| apps/desktop/src/routes/screenshot-editor/Preview.tsx | Adds pinch-to-zoom gesture support via gesturestart/change/end events, normalised wheel delta handling, and a pan-drag capture overlay; mounts OcrSelectionOverlay; moves the wheel listener to an imperative addEventListener to support passive: false |
| apps/desktop/src/routes/camera.tsx | Refactors CameraIssueOverlay to accept size-aware text metrics and an optional top offset and border radius; moves the overlay inside the camera frame for LegacyCameraPreviewPage; extracts the magic number 56 into a CAMERA_TOOLBAR_HEIGHT constant |
| apps/desktop/src/routes/screenshot-editor/AnnotationLayer.tsx | Adds e.preventDefault() and selection clearing on drag start; flips the SVG container to pointer-events: none in select mode (without an active drag) so OCR text spans underneath can be selected; raises z-index from 10 to 20 |
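The layout refactor and the OCR span positioning both come down to fitting a frame into the preview and mapping image-space boxes onto it. The sketch below is not the layout.ts code: `fitFrame` and `imageBoxToScreen` are hypothetical, and it assumes the aspect ratio is width / height (or null for "auto", meaning the image's own ratio) and that in the auto case the image exactly fills the fitted frame.

```typescript
// Hypothetical types; the real layout.ts signatures are not shown in the PR.
interface Size {
  width: number;
  height: number;
}
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

const EMPTY: Rect = { x: 0, y: 0, width: 0, height: 0 };

// Contain-fits a frame of the given aspect ratio (width / height, or null
// for the image's own ratio) into the container, centred.
function fitFrame(container: Size, image: Size, aspect: number | null): Rect {
  // Early exit for zero-size inputs, mirroring the guard the summary mentions.
  if (container.width <= 0 || container.height <= 0 || image.width <= 0 || image.height <= 0) {
    return EMPTY;
  }
  const ratio = aspect ?? image.width / image.height;
  if (!Number.isFinite(ratio) || ratio <= 0) return EMPTY;

  let width = container.width;
  let height = width / ratio;
  if (height > container.height) {
    height = container.height;
    width = height * ratio;
  }
  return { x: (container.width - width) / 2, y: (container.height - height) / 2, width, height };
}

// Maps a box in source-image pixels to screen pixels, assuming the image
// exactly fills `frame` (the auto-aspect case). An OCR span could be
// absolutely positioned with the result.
function imageBoxToScreen(box: Rect, image: Size, frame: Rect): Rect {
  const sx = frame.width / image.width;
  const sy = frame.height / image.height;
  return { x: frame.x + box.x * sx, y: frame.y + box.y * sy, width: box.width * sx, height: box.height * sy };
}
```

With a fixed aspect ratio the image would sit somewhere inside the frame (for example with padding), so the mapping would need that extra inner offset; how the editor places it is not described in the PR.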

Comments Outside Diff (1)

  1. apps/desktop/src/routes/screenshot-editor/OcrSelectionOverlay.tsx, lines 777-780

    P2: Raw `invoke` is used instead of the generated type-safe wrapper. `recognize_screenshot_text` is registered with `#[specta::specta]` and added to the invoke handler, so the binding generator should produce a typed `commands.recognizeScreenshotText(...)`. Using raw `invoke` means `ScreenshotOcrResult` is duplicated locally in the component, so any signature change on the Rust side won't be caught at compile time. The same file already imports `commands` from `~/utils/tauri` for `writeClipboardString`; the bindings need to be regenerated to include the new command and then used here.


Reviews (2): Last reviewed commit: "feat(camera): improve preview issue over..."

richiemcilroy marked this pull request as ready for review on May 11, 2026 at 19:38
brin-security-scanner (bot) added the labels contributor:verified (Contributor passed trust analysis.) and pr:verified (PR passed security analysis.) on May 11, 2026
Comment on lines +1099 to +1105

```rust
    let bitmap = SoftwareBitmap::CreateCopyWithAlphaFromBuffer(
        &buffer,
        BitmapPixelFormat::Bgra8,
        width,
        height,
        BitmapAlphaMode::Premultiplied,
    )
```
Contributor

P2: The BGRA pixel data copied from the source uses straight (non-premultiplied) alpha, but `BitmapAlphaMode::Premultiplied` tells Windows that the RGB channels have already been multiplied by alpha. If the screenshot contains any semi-transparent pixels, the OCR engine may internally un-premultiply the RGB values (dividing by alpha), producing inflated, incorrect colour values. `BitmapAlphaMode::Straight` (or `BitmapAlphaMode::Ignore`, since OCR doesn't need transparency) is the correct choice.

Suggested change:

```suggestion
    let bitmap = SoftwareBitmap::CreateCopyWithAlphaFromBuffer(
        &buffer,
        BitmapPixelFormat::Bgra8,
        width,
        height,
        BitmapAlphaMode::Straight,
    )
```

richiemcilroy (Member, Author) commented:

Please re-review the PR, @greptileai. We redid it so that the OCR text is now fully selectable and clickable on the actual screenshot.

Comment on lines +86 to +95

```tsx
	createEffect(() => {
		const key = sourceRegionKey();
		const region = sourceRegion();
		requestId += 1;
		const currentRequestId = requestId;

		if (!key || !region) {
			setOcrResult(null);
			return;
		}
```

OCR currently runs every time `sourceRegionKey()` changes, even when the user has a drawing tool selected. If selectable OCR text is only needed in "select" mode, gating the effect on the active tool avoids a lot of redundant native OCR work, especially while the crop is being dragged.

Suggested change:

```suggestion
	createEffect(() => {
		const tool = activeTool();
		const key = sourceRegionKey();
		const region = sourceRegion();
		requestId += 1;
		const currentRequestId = requestId;

		if (tool !== "select") {
			setOcrResult(null);
			return;
		}

		if (!key || !region) {
			setOcrResult(null);
			return;
		}
```
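The stale-response handling and the tool gating discussed here can be sketched outside Solid as a plain "latest request wins" guard. This is an illustration under assumptions, not the component's code: `LatestRequestGuard` and `requestOcr` are hypothetical, and `recognize`/`apply` stand in for the native OCR call and `setOcrResult`.

```typescript
// Tracks the newest request so older, slower responses can be ignored.
class LatestRequestGuard {
  private current = 0;

  begin(): number {
    this.current += 1;
    return this.current;
  }

  isCurrent(id: number): boolean {
    return id === this.current;
  }
}

// Runs OCR for `region` only when the select tool is active; results from
// superseded requests are dropped instead of overwriting newer ones.
async function requestOcr<R, T>(
  guard: LatestRequestGuard,
  tool: string,
  region: R | null,
  recognize: (region: R) => Promise<T>,
  apply: (result: T | null) => void,
): Promise<void> {
  // Bumping the id first also invalidates any request still in flight.
  const id = guard.begin();
  if (tool !== "select" || region === null) {
    apply(null);
    return;
  }
  try {
    const result = await recognize(region);
    if (guard.isCurrent(id)) apply(result);
  } catch {
    if (guard.isCurrent(id)) apply(null);
  }
}
```

Incrementing the id before the early return matters: switching to a drawing tool must also invalidate an OCR call that is still running, otherwise its late result would reappear over the new tool.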


```tsx
		void (async () => {
			try {
				const result = await invoke<ScreenshotOcrResult>(
```

Minor maintainability thing: since recognize_screenshot_text is a new specta command, it would be nice to avoid raw invoke + duplicated ScreenshotOcr* types here. If bindings can be regenerated, prefer going through the generated commands API (and generated types); otherwise consider extracting a small wrapper (and these types) into a non-generated module so signature/type changes don’t silently drift.
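If regenerating the bindings is not an option, the "small wrapper in a non-generated module" idea might look like the sketch below. Everything except the command name `recognize_screenshot_text` is hypothetical: the result fields, the argument object, and the `InvokeFn` type (which a caller would satisfy with Tauri's `invoke`). Injecting `invoke` keeps the module free of Tauri imports, and the runtime check makes a drifted Rust payload fail loudly at the boundary.

```typescript
// Hypothetical payload shape; the real ScreenshotOcrResult fields are not shown in the PR.
interface OcrLine {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
}
interface ScreenshotOcrResult {
  lines: OcrLine[];
}

type InvokeFn = <T>(cmd: string, args?: Record<string, unknown>) => Promise<T>;

const isFiniteNumber = (v: unknown): v is number =>
  typeof v === "number" && Number.isFinite(v);

// Validates the raw IPC payload so a changed Rust type throws here instead
// of silently producing undefined fields in the overlay.
function parseOcrResult(raw: unknown): ScreenshotOcrResult {
  const lines = (raw as { lines?: unknown } | null)?.lines;
  if (!Array.isArray(lines)) throw new TypeError("OCR payload has no lines array");
  return {
    lines: lines.map((item: unknown, i) => {
      const { text, x, y, width, height } = (item ?? {}) as Record<string, unknown>;
      if (typeof text !== "string" || ![x, y, width, height].every(isFiniteNumber)) {
        throw new TypeError(`OCR line ${i} has an unexpected shape`);
      }
      return { text, x: x as number, y: y as number, width: width as number, height: height as number };
    }),
  };
}

// The command name lives in exactly one place; callers never use raw invoke.
function recognizeScreenshotText(
  invoke: InvokeFn,
  args: Record<string, unknown>,
): Promise<ScreenshotOcrResult> {
  return invoke<unknown>("recognize_screenshot_text", args).then(parseOcrResult);
}
```

This still duplicates the types by hand, so it is a fallback; the generated `commands` binding remains the better option because it catches drift at compile time rather than at runtime.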

richiemcilroy merged commit eb8c22e into main on May 12, 2026
15 checks passed