Add native OCR to screenshot editor by richiemcilroy · Pull Request #1799 · CapSoftware/Cap

richiemcilroy · 2026-05-11T19:37:41Z

Adds native OCR to the screenshot editor using macOS Vision and Windows Media OCR.
Selected regions are cropped from the source image and processed off the UI path before copying recognized text.
Validated with Rust, Biome, diff checks, and a Windows OCR API compile check.

Greptile Summary

This PR introduces native OCR to the screenshot editor, automatically running macOS Vision or Windows Media OCR on the full visible image region and overlaying transparent, selectable text spans so users can copy text directly from screenshots. It also ships annotation copy/paste (Cmd+C/V with cascading 16 px offsets), pinch-to-zoom gesture support, a calculateImageTransform refactor that correctly propagates aspectRatio through the layout pipeline, and size-aware CameraIssueOverlay improvements.
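The cascading paste offset mentioned above can be pictured with a small sketch. It is illustrative only: the `Annotation` shape, `createPaster`, and `makeId` are hypothetical names rather than the editor's actual types, and the only value taken from the PR is the 16 px step.

```typescript
// Hypothetical annotation shape; the real editor type is not shown on this page.
interface Annotation {
  id: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

const PASTE_OFFSET_PX = 16; // per-paste offset named in the summary

// Returns a paste function that counts how often the same copied annotation
// has been pasted, so each new copy lands 16 px further down and to the right.
function createPaster(copied: Annotation, makeId: () => string) {
  let pasteCount = 0;
  return (): Annotation => {
    pasteCount += 1;
    const shift = PASTE_OFFSET_PX * pasteCount;
    return { ...copied, id: makeId(), x: copied.x + shift, y: copied.y + shift };
  };
}
```

Keeping the counter next to the copied annotation means a fresh copy would naturally restart the cascade, which is one plausible reading of "cascading" here.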

OCR backend (screenshot_editor.rs): crops the RGBA source buffer to BGRA in a checked, row-by-row loop, then dispatches off-thread to Vision (cidre::vn) on macOS or Windows.Media.Ocr on Windows; WinRT is initialised/uninitialised per call via a RAII guard; the line-bounds returned by each engine are offset back into full-image coordinates before serialisation.
OCR frontend (OcrSelectionOverlay.tsx): reacts to crop changes via a memoised region key, discards stale responses with an incrementing requestId, and renders each recognised line as an absolutely-positioned transparent <span> scaled to fit its bounding box; pointer-events are enabled only in "select" tool mode.
Layout refactor (layout.ts): calculateImageTransform / getImageRect now accept an AspectRatio | null parameter; an early-exit guard for zero-size inputs and two separate auto vs. fixed-aspect paths were added; all call sites in context.tsx and Preview.tsx were updated.
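As a rough illustration of the backend steps listed above (a checked, row-by-row RGBA to BGRA crop, then shifting line bounds back into full-image coordinates), here is a TypeScript sketch. The actual implementation is Rust in screenshot_editor.rs and is not reproduced here; `Region`, `cropRgbaToBgra`, and `toImageCoords` are hypothetical names, and a tightly packed source buffer is assumed.

```typescript
interface Region { x: number; y: number; width: number; height: number; }

// Copies `region` out of a tightly packed RGBA buffer, swapping R and B so the
// result is BGRA. Returns null when the region does not fit inside the image.
function cropRgbaToBgra(
  src: Uint8Array, srcWidth: number, srcHeight: number, region: Region,
): Uint8Array | null {
  const { x, y, width, height } = region;
  const allInts = [x, y, width, height, srcWidth, srcHeight].every(Number.isInteger);
  if (!allInts || x < 0 || y < 0 || width <= 0 || height <= 0) return null;
  if (x + width > srcWidth || y + height > srcHeight) return null;
  if (src.length < srcWidth * srcHeight * 4) return null;

  const out = new Uint8Array(width * height * 4);
  for (let row = 0; row < height; row++) {
    const srcStart = ((y + row) * srcWidth + x) * 4; // first byte of this source row
    const dstStart = row * width * 4;                // first byte of this output row
    for (let col = 0; col < width; col++) {
      const s = srcStart + col * 4;
      const d = dstStart + col * 4;
      out[d] = src[s + 2];     // B
      out[d + 1] = src[s + 1]; // G
      out[d + 2] = src[s];     // R
      out[d + 3] = src[s + 3]; // A
    }
  }
  return out;
}

// Moves a box reported relative to the crop back into full-image coordinates.
function toImageCoords(box: Region, crop: Region): Region {
  return { ...box, x: box.x + crop.x, y: box.y + crop.y };
}
```

Validating the region before the loop keeps every index inside the source buffer, which is the point of a "checked" crop.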

Confidence Score: 4/5

Safe to merge on macOS; the Windows OCR path has a known alpha-mode mismatch (flagged in a prior review thread) that may degrade OCR quality on screenshots with semi-transparent pixels.

The Rust OCR implementation, TypeScript overlay component, layout refactor, and copy/paste additions are all structurally sound. The one outstanding concern is on the Windows path in screenshot_editor.rs: the pixel data is straight (non-premultiplied) RGBA converted to BGRA, but SoftwareBitmap is told it is premultiplied, which can cause the OCR engine to mis-interpret colour values for semi-transparent pixels. This was raised in a prior review and is still unaddressed in the current code.

apps/desktop/src-tauri/src/screenshot_editor.rs — specifically the SoftwareBitmap alpha mode used in the Windows OCR path.
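To make the alpha-mode concern concrete, the arithmetic below shows what happens if a consumer un-premultiplies data that was never premultiplied. It is a standalone numeric sketch, not the Windows implementation: whether the OCR engine actually performs this step is not confirmed on this page, and the helper names are hypothetical.

```typescript
// Premultiplied storage: each colour channel is already scaled by alpha.
function premultiply(channel: number, alpha: number): number {
  return Math.round((channel * alpha) / 255);
}

// What a consumer does to recover straight colour from premultiplied data.
function unpremultiply(channel: number, alpha: number): number {
  if (alpha === 0) return 0;
  return Math.min(255, Math.round((channel * 255) / alpha));
}

// One channel of a straight-alpha pixel at roughly 50% opacity.
const straight = { r: 100, a: 128 };

// Correctly labelled data: premultiply, then un-premultiply, round-trips to 100.
const roundTrip = unpremultiply(premultiply(straight.r, straight.a), straight.a);

// Mislabelled data: the straight value is divided by alpha anyway and becomes 199.
const mislabelled = unpremultiply(straight.r, straight.a);
```

For fully opaque pixels (alpha 255) both paths return the channel unchanged, so any effect would be limited to semi-transparent pixels.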

Important Files Changed

Filename	Overview
apps/desktop/src-tauri/src/screenshot_editor.rs	Adds ~460 lines of OCR logic: RGBA→BGRA crop extraction, macOS Vision dispatch via cidre, and Windows Media OCR dispatch; the BitmapAlphaMode::Premultiplied mismatch on the Windows path (previously flagged) is still present
apps/desktop/src/routes/screenshot-editor/OcrSelectionOverlay.tsx	New component; automatically runs OCR on the full visible region on mount and on every crop change, overlaying transparent selectable text spans; uses raw invoke instead of the generated typed binding (previously flagged)
apps/desktop/src/routes/screenshot-editor/layout.ts	Significant refactor of calculateImageTransform/getImageRect to add an aspectRatio parameter; guards for zero-size inputs, correct even-rounding helpers, and separate auto vs. fixed-aspect layout paths look correct
apps/desktop/src/routes/screenshot-editor/Editor.tsx	Adds annotation copy/paste (Cmd+C/V) with per-paste 16px offset; correctly gates clipboard copy on annotation state and lets native browser copy fall through when OCR text is selected
apps/desktop/src/routes/screenshot-editor/Preview.tsx	Adds pinch-to-zoom gesture support via gesturestart/change/end events, normalised wheel delta handling, and a pan-drag capture overlay; mounts OcrSelectionOverlay; moves wheel listener to an imperative addEventListener to support passive:false
apps/desktop/src/routes/camera.tsx	Refactors CameraIssueOverlay to accept size-aware text metrics, optional top offset and border-radius; moves the overlay inside the camera frame for LegacyCameraPreviewPage; extracts magic number 56 into CAMERA_TOOLBAR_HEIGHT constant
apps/desktop/src/routes/screenshot-editor/AnnotationLayer.tsx	Adds e.preventDefault() and selection clearing on drag start; flips SVG container to pointer-events:none in select mode (without an active drag) so OCR text spans underneath can be selected; raises z-index from 10 to 20
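The layout.ts row above describes zero-size guards, even-rounding, and separate auto vs. fixed-aspect paths. A minimal sketch of that kind of function follows, assuming a simple "contain" fit. The real calculateImageTransform is not shown on this page, so `fitImage`, `roundEven`, and the `AspectRatio` shape are hypothetical.

```typescript
interface Size { width: number; height: number; }
interface Rect extends Size { x: number; y: number; }
// Hypothetical shape; the real AspectRatio type in layout.ts is not shown here.
type AspectRatio = { width: number; height: number };

// Rounds to the nearest even integer (an assumed meaning of "even-rounding").
const roundEven = (v: number): number => Math.round(v / 2) * 2;

function fitImage(container: Size, image: Size, aspectRatio: AspectRatio | null): Rect | null {
  // Early exit for zero-size or negative inputs, which would divide by zero.
  if (container.width <= 0 || container.height <= 0) return null;
  if (image.width <= 0 || image.height <= 0) return null;
  if (aspectRatio && (aspectRatio.width <= 0 || aspectRatio.height <= 0)) return null;

  // Auto path uses the image's own ratio; fixed path uses the requested one.
  const ratio = aspectRatio
    ? aspectRatio.width / aspectRatio.height
    : image.width / image.height;

  // "Contain" fit: fill the width first, fall back to the height if too tall.
  let width = container.width;
  let height = width / ratio;
  if (height > container.height) {
    height = container.height;
    width = height * ratio;
  }
  width = roundEven(width);
  height = roundEven(height);

  // Centre the fitted rectangle inside the container.
  return { x: (container.width - width) / 2, y: (container.height - height) / 2, width, height };
}
```

Returning null for degenerate inputs lets callers skip rendering instead of propagating NaN through later transforms.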

Comments Outside Diff (1)

apps/desktop/src/routes/screenshot-editor/OcrSelectionOverlay.tsx, line 777-780

Raw invoke used instead of the generated type-safe wrapper. recognize_screenshot_text is registered with #[specta::specta] and added to the invoke handler, so the binding generator should produce a typed commands.recognizeScreenshotText(...). Using raw invoke means ScreenshotOcrResult is duplicated locally in the component — any signature change on the Rust side won't be caught at compile time. The same file already imports commands from ~/utils/tauri for writeClipboardString; the bindings need to be regenerated to include the new command and then used here.


Reviews (2): Last reviewed commit: "feat(camera): improve preview issue over..."

greptile-apps · 2026-05-11T19:42:39Z

+    let bitmap = SoftwareBitmap::CreateCopyWithAlphaFromBuffer(
+        &buffer,
+        BitmapPixelFormat::Bgra8,
+        width,
+        height,
+        BitmapAlphaMode::Premultiplied,
+    )


The BGRA pixel data copied from the source is straight (non-premultiplied) alpha, but BitmapAlphaMode::Premultiplied tells Windows the RGB channels have already been multiplied by alpha. If the screenshot contains any semi-transparent pixels, the OCR engine may internally un-premultiply the RGB values (dividing by alpha), producing inflated and incorrect colour values. BitmapAlphaMode::Straight (or BitmapAlphaMode::Ignore since OCR doesn't need transparency) is the correct choice.

Suggested change

-    let bitmap = SoftwareBitmap::CreateCopyWithAlphaFromBuffer(
-        &buffer,
-        BitmapPixelFormat::Bgra8,
-        width,
-        height,
-        BitmapAlphaMode::Premultiplied,
-    )
+    let bitmap = SoftwareBitmap::CreateCopyWithAlphaFromBuffer(
+        &buffer,
+        BitmapPixelFormat::Bgra8,
+        width,
+        height,
+        BitmapAlphaMode::Straight,
+    )

Path: apps/desktop/src-tauri/src/screenshot_editor.rs
Line: 1099-1105

richiemcilroy · 2026-05-12T11:47:54Z

Please re-review the PR @greptileai. We re-did it so that the OCR text is now fully selectable/clickable on the actual screenshot.

tembo · 2026-05-12T11:49:55Z

+	createEffect(() => {
+		const key = sourceRegionKey();
+		const region = sourceRegion();
+		requestId += 1;
+		const currentRequestId = requestId;
+
+		if (!key || !region) {
+			setOcrResult(null);
+			return;
+		}


OCR currently runs every time sourceRegionKey() changes even when the user is in a drawing tool. If you only need selectable OCR text in "select" mode, gating the effect avoids a bunch of redundant native OCR work (especially while the crop is being dragged).

Suggested change

-	createEffect(() => {
-		const key = sourceRegionKey();
-		const region = sourceRegion();
-		requestId += 1;
-		const currentRequestId = requestId;
-
-		if (!key || !region) {
-			setOcrResult(null);
-			return;
-		}
+	createEffect(() => {
+		const tool = activeTool();
+		const key = sourceRegionKey();
+		const region = sourceRegion();
+		requestId += 1;
+		const currentRequestId = requestId;
+
+		if (tool !== "select") {
+			setOcrResult(null);
+			return;
+		}
+		if (!key || !region) {
+			setOcrResult(null);
+			return;
+		}
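The request-id guard in this effect, combined with the suggested tool gate, can be sketched outside of any UI framework. This is an assumption-laden illustration rather than the component's code: `createLatestOnly`, `run`, `publish`, and the `OcrResult` shape are hypothetical, and the reactive `createEffect` wiring is left out.

```typescript
type OcrResult = { text: string }; // hypothetical result shape

// Wraps an async OCR call so that only the newest request may publish a result.
function createLatestOnly(
  run: (key: string) => Promise<OcrResult>,
  publish: (result: OcrResult | null) => void,
) {
  let requestId = 0;
  return async (key: string | null, tool: string): Promise<void> => {
    requestId += 1; // invalidates any request still in flight
    const current = requestId;
    if (tool !== "select" || !key) {
      publish(null); // nothing to recognise: clear the overlay
      return;
    }
    try {
      const result = await run(key);
      if (current === requestId) publish(result); // drop stale responses
    } catch {
      if (current === requestId) publish(null);
    }
  };
}
```

Incrementing the counter before the early return matters: switching away from "select" also invalidates a request that is still in flight, so a late response cannot repopulate the overlay.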

tembo · 2026-05-12T11:49:56Z

+
+		void (async () => {
+			try {
+				const result = await invoke<ScreenshotOcrResult>(


Minor maintainability thing: since recognize_screenshot_text is a new specta command, it would be nice to avoid raw invoke + duplicated ScreenshotOcr* types here. If bindings can be regenerated, prefer going through the generated commands API (and generated types); otherwise consider extracting a small wrapper (and these types) into a non-generated module so signature/type changes don’t silently drift.
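The fallback suggested here (a small wrapper module when bindings cannot be regenerated) might look roughly like the sketch below. Everything apart from the command name `recognize_screenshot_text` is an assumption: the fields of `ScreenshotOcrResult`, the `region` argument name, and `createOcrClient` are hypothetical, and `invoke` is injected rather than imported so the sketch runs without the Tauri runtime.

```typescript
// Hypothetical types: the real ScreenshotOcrResult fields are not shown in this thread.
interface ScreenshotOcrLine {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
}
interface ScreenshotOcrResult { lines: ScreenshotOcrLine[]; }
interface OcrRegion { x: number; y: number; width: number; height: number; }

// Mirrors the general shape of an invoke<T>(cmd, args) call; passed in so the
// module has a single place where the command name and types are declared.
type InvokeFn = <T>(cmd: string, args?: Record<string, unknown>) => Promise<T>;

function createOcrClient(invoke: InvokeFn) {
  return {
    // The only place that knows the raw command string and argument layout.
    recognizeScreenshotText(region: OcrRegion): Promise<ScreenshotOcrResult> {
      return invoke<ScreenshotOcrResult>("recognize_screenshot_text", { region });
    },
  };
}
```

This does not restore compile-time checking against the Rust signature the way generated bindings would; it only confines the duplicated types to one module so drift is easier to spot.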

feat: add native OCR to screenshot editor

d1cfa16

richiemcilroy marked this pull request as ready for review May 11, 2026 19:38

brin-security-scanner Bot added the contributor:verified (Contributor passed trust analysis.) and pr:verified (PR passed security analysis.) labels May 11, 2026

greptile-apps Bot reviewed May 11, 2026


richiemcilroy added 9 commits May 12, 2026 12:43

feat(screenshot-editor): add aspect-ratio-aware image layout math

0100bb6

feat(screenshot-editor): propagate aspect ratio in editor transforms

734d815

feat(screenshot-editor): full-crop OCR with selectable overlay

db7daa0

fix(screenshot-editor): remove copy-text toolbar tool

43c9ead

fix(screenshot-editor): tighten annotation SVG hit targeting

b59e273

feat(screenshot-editor): add annotation clipboard shortcuts

7013afe

fix(screenshot-editor): respect annotation copy keyboard handling

1430623

feat(screenshot-editor): gesture zoom and preview layer stacking

da757ab

feat(camera): improve preview issue overlay and toolbar height

bc70b22

tembo Bot reviewed May 12, 2026


richiemcilroy added 3 commits May 12, 2026 13:01

fmt

7173a3d

fmt

c10b5f3

fmt

063f6ac

richiemcilroy merged commit eb8c22e into main May 12, 2026
15 checks passed

richiemcilroy merged 13 commits into main from codex/screenshot-editor-ocr
