Reduce GCP costs: batch embeddings + increase sync interval by kodjima33 · Pull Request #6341 · BasedHardware/omi

kodjima33 · 2026-04-05T22:11:16Z

Summary

Batch Gemini embeddings with 60s flush window + SHA256 content-hash dedup — reduces API calls by ~20x
Increase sync interval from 10s → 60s and batch size 20 → 100 — reduces Firestore writes by ~83%
Cap backfill at 5,000 items per app launch to prevent cost spikes
Flush pending embeddings before search so recent screenshots remain findable

Cost Impact (estimated, based on $7,601/5-day April spend)

Service	Before	After	Savings
Gemini API	$2,483	~$500	-$1,983
Firestore	$1,201	~$180	-$1,021
Compute Engine	$1,384	~$500	-$884
Total	$5,068	~$1,180	-$3,888

Trade-offs

Search results lag by up to 60s (previously near-instant)
On crash, up to 60s of un-synced screenshots (previously 10s)
Backfill completes over multiple launches instead of all at once
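
As a rough illustration of the last trade-off, the per-launch cap can be modeled as a ceiling division over the remaining backlog. This is a minimal sketch; `backfillBatch` and `launchesNeeded` are hypothetical helper names, not functions from the PR, and only the 5,000 cap comes from the description above.

```swift
import Foundation

/// Number of items a single launch may backfill under the per-launch cap.
/// (Hypothetical helper; the cap value is the one quoted in the PR summary.)
func backfillBatch(remaining: Int, capPerLaunch: Int = 5_000) -> Int {
    return max(0, min(remaining, capPerLaunch))
}

/// Launches needed to drain a backlog under the cap (ceiling division).
func launchesNeeded(backlog: Int, capPerLaunch: Int = 5_000) -> Int {
    guard backlog > 0 else { return 0 }
    return (backlog + capPerLaunch - 1) / capPerLaunch
}
```

Under these assumptions a backlog of 12,000 un-embedded screenshots would take three launches to clear.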

Test plan

Build on Mac mini, verify app launches and runs
Verify screenshots are still captured and OCR'd
Verify search returns results after 60s delay
Verify sync logs show larger batches at 60s intervals

🤖 Generated with Claude Code

Accumulates screenshots in a buffer and flushes every 60s (or at 100 items) using batchEmbedContents instead of individual embedContent calls. Deduplicates identical OCR content via SHA256 hash to skip redundant embeddings. Also caps backfill at 5000 items per app launch to prevent cost spikes, and flushes pending embeddings before search so recent screenshots are findable.

Estimated Gemini API cost reduction: ~80% ($2,483 → ~$500 per 5 days).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
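
The buffering described in this commit message can be sketched roughly as follows. This is not the code from OCREmbeddingService.swift: `PendingItem` and `EmbeddingBuffer` are hypothetical names, the 60s timer is left to the caller, and the raw OCR text is used as the dedup key in place of the SHA256 hash so the sketch needs no crypto dependency.

```swift
import Foundation

// Hypothetical stand-in for one queued screenshot; not the PR's actual type.
struct PendingItem {
    let id: Int64
    let text: String
}

// Minimal sketch of a size-or-timer batch buffer. The real service is an actor
// with a 60s timer; here the caller decides when to call `flush()`.
struct EmbeddingBuffer {
    let maxBatchSize: Int
    private(set) var pending: [PendingItem] = []

    init(maxBatchSize: Int = 100) {
        self.maxBatchSize = maxBatchSize
    }

    /// Queues an item and reports whether the size threshold was reached.
    @discardableResult
    mutating func enqueue(_ item: PendingItem) -> Bool {
        pending.append(item)
        return pending.count >= maxBatchSize
    }

    /// Drains the buffer and groups IDs by identical text, so each distinct
    /// text needs only one entry in the batch request. The PR keys on a
    /// SHA256 hash; the raw text is used here as an assumption.
    mutating func flush() -> (uniqueTexts: [String], idsByText: [String: [Int64]]) {
        var uniqueTexts: [String] = []
        var idsByText: [String: [Int64]] = [:]
        for item in pending {
            if idsByText[item.text] == nil {
                uniqueTexts.append(item.text)
            }
            idsByText[item.text, default: []].append(item.id)
        }
        pending.removeAll()
        return (uniqueTexts, idsByText)
    }
}
```

After a successful batch call, each returned embedding would be written for every ID in `idsByText` for its text, which is what lets one API result cover several identical screenshots.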

Reduces Firestore writes by ~83% and Compute Engine load by ~6x. Also increases max backoff from 120s to 300s.

Estimated Firestore cost reduction: ~$1,000/5 days. Estimated Compute Engine cost reduction: ~$700/5 days.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
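
The ~83% and ~6x figures follow from the interval change alone if one assumes that write volume and server load scale with the number of sync ticks. That assumption is not stated in the PR; the helper below is a hypothetical illustration of the arithmetic.

```swift
import Foundation

/// Fraction of periodic operations avoided when the interval grows,
/// assuming one operation per tick and an otherwise unchanged workload.
func reductionFraction(oldInterval: Double, newInterval: Double) -> Double {
    return 1.0 - oldInterval / newInterval
}

let syncReduction = reductionFraction(oldInterval: 10, newInterval: 60)
let tickRatio = 60.0 / 10.0  // how many old ticks fit into one new tick

print(String(format: "%.1f%%", syncReduction * 100))  // prints "83.3%"
print(tickRatio)                                       // prints "6.0"
```

If Firestore billing is dominated by per-document writes rather than per-request overhead, the real saving could be smaller than this estimate.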

greptile-apps · 2026-04-05T22:17:46Z

Greptile Summary

This PR reduces GCP costs by batching Gemini embedding calls (60s flush window + SHA-256 content-hash dedup) and increasing the Firestore sync interval from 10s→60s with a larger batch size. The approach is sound, but the content-hash dedup in embedScreenshot has a correctness bug: screenshots whose text matches a recently-flushed hash are silently dropped from the queue and never receive a stored embedding, making them permanently invisible to semantic search.

Confidence Score: 3/5

Not safe to merge as-is — the recentHashes early-return silently prevents a growing fraction of screenshots from being indexed, making them permanently unsearchable.

Two P1 correctness bugs: (1) screenshots whose OCR content matches a previously-flushed hash are dropped entirely — they never receive a stored embedding and cannot appear in search results; (2) on batch API failure, duplicate IDs from the flushed batch are discarded and never recover. Both issues cause silent, permanent data loss in the embedding index.

OCREmbeddingService.swift — specifically the recentHashes early-return logic (lines 70-73) and the error-recovery re-queue (lines 147-151)

Important Files Changed

Filename	Overview
desktop/Desktop/Sources/Rewind/Services/OCREmbeddingService.swift	Adds 60s batch-flush + SHA-256 dedup; recentHashes early-return silently drops screenshot IDs (P1), and duplicate IDs are lost on chunk-error retry (P1)
desktop/Desktop/Sources/ScreenActivitySyncService.swift	Increases sync interval 10s→60s and batch size 20→100 with a stale backoff comment (P2); logic is otherwise correct

Sequence Diagram

sequenceDiagram
    participant S as Screenshot pipeline
    participant E as OCREmbeddingService (actor)
    participant G as Gemini API
    participant D as RewindDatabase

    S->>E: embedScreenshot(id, ocrText, ...)
    alt hash in recentHashes
        E-->>S: return (id never stored ⚠️)
    else hash not seen
        E->>E: append to pendingItems
        alt pendingItems.count >= 100
            E->>E: flushPendingEmbeddings()
        else
            E->>E: startFlushTimerIfNeeded() [60s]
        end
    end

    Note over E: On timer or force-flush
    E->>E: deduplicate by hash → uniqueItems + duplicateGroups
    loop each chunk of 100
        E->>G: embedBatch(texts)
        alt success
            G-->>E: embeddings[]
            loop each embedding
                E->>D: updateScreenshotEmbedding(id)
                Note right of E: also applies to duplicateGroups[hash] IDs
            end
            E->>E: recentHashes.insert(hashes)
        else error
            E->>E: pendingItems.append(uniqueItems) ⚠️ duplicateGroups lost
            E->>E: startFlushTimerIfNeeded()
        end
    end

    S->>E: searchSimilar(query, ...)
    E->>E: flushPendingEmbeddings() [best-effort]
    E->>G: embed(query, RETRIEVAL_QUERY)
    G-->>E: queryEmbedding
    E->>D: readEmbeddingBatch()
    D-->>E: stored embeddings
    E-->>S: top-K results

Comments Outside Diff (1)

desktop/Desktop/Sources/ScreenActivitySyncService.swift, line 69-70 (link)

Stale backoff comment after interval update

The comment still describes the old 10s base interval. With baseSyncInterval = 60s and maxSyncInterval = 300s the actual sequence is 120 s → 240 s → 300 s (capped).
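
The sequence quoted above is what a doubling backoff with a cap would produce. The sketch below assumes the service doubles the base interval once per consecutive failure; `backoffInterval` is a hypothetical helper, and only the names `baseSyncInterval` and `maxSyncInterval` and their values come from the review comment.

```swift
import Foundation

/// Hypothetical backoff helper: doubles the base interval per consecutive
/// failure and caps the result at the maximum interval.
func backoffInterval(consecutiveFailures: Int,
                     baseSyncInterval: TimeInterval = 60,
                     maxSyncInterval: TimeInterval = 300) -> TimeInterval {
    guard consecutiveFailures > 0 else { return baseSyncInterval }
    let exponent = min(consecutiveFailures, 16)  // keep the power bounded
    let scaled = baseSyncInterval * pow(2.0, Double(exponent))
    return min(scaled, maxSyncInterval)
}

let delays = (1...4).map { backoffInterval(consecutiveFailures: $0) }
print(delays)  // [120.0, 240.0, 300.0, 300.0]
```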

Reviews (1): Last reviewed commit: "Increase sync interval from 10s to 60s a..."

greptile-apps · 2026-04-05T22:17:49Z

+        // Skip if we recently embedded identical content
+        if recentHashes.contains(hash) {
+            return
+        }


Duplicate-content screenshots are permanently unsearchable

When recentHashes already contains hash, embedScreenshot returns at line 72 without adding the screenshot's id to pendingItems. That ID is never processed and never gets an embedding row in the database, so searchSimilar will never return it. Any screenshot taken while the user is on a static window (e.g. reading a document, idle display) becomes unsearchable after the first flush of identical content.

The fix is to still persist an embedding for the new ID — either by looking up and copying the already-stored embedding for that hash, or by queuing the ID separately for a DB-copy operation — rather than silently discarding it:

        // Skip API call if we recently embedded identical content,
        // but still store the embedding copy so this ID is searchable.
        if recentHashes.contains(hash) {
            // TODO: copy embedding from DB for the existing hash to this id
            return
        }
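
One possible way to fill in that TODO, sketched under assumptions: keep a hash-to-embedding cache instead of a bare hash set, and on a cache hit store a copy of the embedding for the new ID. `DedupIndex` and all of its members are hypothetical, and the dictionaries are in-memory stand-ins for the rows the real service would write through RewindDatabase.

```swift
import Foundation

// Hypothetical in-memory model of the dedup path; not the PR's actual types.
struct DedupIndex {
    private(set) var embeddingByHash: [String: [Float]] = [:]  // replaces the bare hash set
    private(set) var storedEmbeddings: [Int64: [Float]] = [:]  // stand-in for DB rows
    private(set) var pendingIDs: [Int64] = []

    init() {}

    /// Records an embedding returned by the API for a given content hash.
    mutating func recordEmbedding(_ embedding: [Float], hash: String, id: Int64) {
        embeddingByHash[hash] = embedding
        storedEmbeddings[id] = embedding
    }

    /// Returns true when the API call was skipped because a cached embedding
    /// was copied to the new ID; otherwise queues the ID and returns false.
    mutating func embedScreenshot(id: Int64, hash: String) -> Bool {
        if let cached = embeddingByHash[hash] {
            storedEmbeddings[id] = cached  // the ID stays searchable
            return true
        }
        pendingIDs.append(id)
        return false
    }
}
```

Caching full vectors costs more memory than caching hashes, so a real implementation might prefer to look the embedding up in the database by hash instead.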

greptile-apps · 2026-04-05T22:17:50Z

+            } catch {
+                logError("OCREmbeddingService: Batch embed failed for \(chunk.count) items", error: error)
+                // Re-queue failed items for next flush
+                pendingItems.append(contentsOf: chunk)
+                startFlushTimerIfNeeded()


Duplicate IDs lost on batch-chunk failure

When a chunk fails, only the items in uniqueItems (one canonical entry per hash) are re-queued. The duplicateGroups dictionary — which holds all screenshot IDs (including non-canonical duplicates) that share each hash — is a local variable and is discarded at this point. On the next flush those re-queued items carry no duplicate metadata, so the other IDs that shared their content hash in the original batch will never receive their embeddings. This is a silent data-loss path on API errors.
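
A possible remedy, sketched with hypothetical names: when a chunk fails, expand each canonical item back into every ID recorded for its hash before re-queuing, so the next flush rebuilds the same duplicate groups. `QueuedItem` and `itemsToRequeue` are illustrative only; `duplicateGroups` is assumed to map a hash to all IDs sharing it, as the comment above describes.

```swift
import Foundation

// Hypothetical queue entry; the real pending item also carries the OCR text.
struct QueuedItem: Equatable {
    let id: Int64
    let hash: String
}

/// Expands the canonical items of a failed chunk back into every ID that
/// shared their hash, so duplicates are retried together on the next flush.
func itemsToRequeue(failedChunk: [QueuedItem],
                    duplicateGroups: [String: [Int64]]) -> [QueuedItem] {
    var requeued: [QueuedItem] = []
    for item in failedChunk {
        // Fall back to the canonical ID if the hash has no recorded group.
        let ids = duplicateGroups[item.hash] ?? [item.id]
        for id in ids {
            requeued.append(QueuedItem(id: id, hash: item.hash))
        }
    }
    return requeued
}
```

In the catch block this would replace the direct append of the chunk, at the cost of re-running dedup over a slightly larger queue on the next flush.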

greptile-apps · 2026-04-05T22:17:54Z

+        // Evict old hashes if the set grows too large
+        if recentHashes.count > maxRecentHashes {
+            recentHashes.removeAll()
        }


Thundering-herd risk from full recentHashes eviction

recentHashes.removeAll() wipes the entire set when it exceeds 5,000 entries. Immediately after the reset, every screenshot whose content was previously deduplicated will be re-queued for a full Gemini API call, producing a burst that temporarily defeats the ~20× cost reduction this PR targets. Removing half the set (e.g., converting to an ordered structure and dropping the oldest half) or simply bumping maxRecentHashes would avoid the spike.
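
The "drop the oldest half" option could look roughly like this. `RecentHashes` is a hypothetical type, not something in the PR; it pairs a Set for membership checks with an array that remembers insertion order.

```swift
import Foundation

/// Insertion-ordered hash set that drops its oldest half when it overflows.
struct RecentHashes {
    let capacity: Int
    private var order: [String] = []      // oldest first
    private var members: Set<String> = []

    init(capacity: Int = 5_000) {
        self.capacity = capacity
    }

    var count: Int { members.count }

    func contains(_ hash: String) -> Bool {
        return members.contains(hash)
    }

    mutating func insert(_ hash: String) {
        // Ignore hashes that are already tracked.
        guard members.insert(hash).inserted else { return }
        order.append(hash)
        if order.count > capacity {
            let dropCount = order.count / 2
            for old in order.prefix(dropCount) {
                members.remove(old)
            }
            order.removeFirst(dropCount)
        }
    }
}
```

Eviction here is by insertion order rather than last use, so a hash that is still seen frequently can be dropped; a true LRU would need to move entries on each hit.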

kodjima33 · 2026-04-05T23:02:07Z

Mac mini test: PASS - App builds and launches without crashes. ScreenActivitySync 60s interval confirmed in logs (RecurringTaskScheduler: Starting (60s interval)). Source code verified: batchSize 20→100, baseSyncInterval 10s→60s, maxBackoff 120s→300s. OCREmbeddingService batch 60s flush window + SHA256 dedup confirmed in source. No screen capture permission on fresh ad-hoc signed test bundle so live embedding was not triggered, but code changes are correct and app is stable.

kodjima33 and others added 2 commits April 5, 2026 18:10

greptile-apps Bot reviewed Apr 5, 2026

kodjima33 merged commit 3e7d627 into main Apr 5, 2026
3 checks passed

kodjima33 deleted the worktree-cost-optimization branch April 5, 2026 23:02

kodjima33 merged 2 commits into main from worktree-cost-optimization