Reduce GCP costs: batch embeddings + increase sync interval #6341
Accumulates screenshots in a buffer and flushes every 60s (or at 100 items) using batchEmbedContents instead of individual embedContent calls. Deduplicates identical OCR content via SHA256 hash to skip redundant embeddings. Also caps backfill at 5000 items per app launch to prevent cost spikes, and flushes pending embeddings before search so recent screenshots are findable. Estimated Gemini API cost reduction: ~80% ($2,483 → ~$500 per 5 days). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
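The buffering and in-flush dedup described in this commit can be sketched as a small value type. This is a hypothetical illustration, not the PR's code. `EmbeddingBuffer`, `PendingItem` and `drain()` are made-up names, and the 60s timer trigger is left out. Identical texts are also grouped by the string itself rather than by a SHA-256 digest, so the sketch needs no crypto dependency.

```swift
// Hypothetical sketch: buffer OCR texts, flush on a size trigger, and
// collapse identical texts so each distinct text is embedded once.
// The PR keys on a SHA-256 hash; this sketch keys on the text itself.
struct PendingItem {
    let id: Int64
    let text: String
}

struct EmbeddingBuffer {
    let flushThreshold: Int   // 100 items in the PR
    let maxBatchSize: Int     // chunk size per batch call (100 in the PR's diagram)
    private(set) var pending: [PendingItem] = []

    init(flushThreshold: Int = 100, maxBatchSize: Int = 100) {
        self.flushThreshold = flushThreshold
        self.maxBatchSize = maxBatchSize
    }

    /// Appends an item and reports whether the size trigger has fired.
    /// (The 60s timer trigger is not modeled here.)
    mutating func append(_ item: PendingItem) -> Bool {
        pending.append(item)
        return pending.count >= flushThreshold
    }

    /// Empties the buffer. Returns the distinct texts split into API-sized
    /// chunks, plus every screenshot ID that shares each text.
    mutating func drain() -> (chunks: [[String]], idsByText: [String: [Int64]]) {
        var uniqueTexts: [String] = []
        var idsByText: [String: [Int64]] = [:]
        for item in pending {
            if idsByText[item.text] == nil { uniqueTexts.append(item.text) }
            idsByText[item.text, default: []].append(item.id)
        }
        pending.removeAll()
        let chunks = stride(from: 0, to: uniqueTexts.count, by: maxBatchSize).map {
            Array(uniqueTexts[$0..<min($0 + maxBatchSize, uniqueTexts.count)])
        }
        return (chunks, idsByText)
    }
}
```

After a flush, the caller would send each chunk as one batch request and write each returned vector to every ID in `idsByText[text]`. That way, duplicates within a single flush still get their own rows.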
Reduces Firestore writes by ~83% and Compute Engine load by ~6x. Also increases max backoff from 120s to 300s. Estimated Firestore cost reduction: ~$1,000/5 days. Estimated Compute Engine cost reduction: ~$700/5 days. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
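The backoff change can be pictured as capped exponential backoff. In this sketch, only the two cap values (120s before, 300s after) come from the commit message. The doubling policy, the 10s base used in the example, and the function name `backoffDelay` are assumptions.

```swift
// Hypothetical sketch of capped exponential backoff. Only the caps
// (120 s before, 300 s after) come from the commit; the rest is assumed.
func backoffDelay(attempt: Int, base: Double, cap: Double) -> Double {
    // Double the delay for each consecutive failure, but never exceed `cap`.
    // Clamp the exponent so the shift stays well inside Int's range.
    let exponent = max(0, min(attempt, 30))
    return min(cap, base * Double(1 << exponent))
}
```

With a 10s base, the delays run 10, 20, 40, 80, 160 seconds and so on. The old 120s cap already binds on the fifth retry (attempt index 4). The 300s cap lets the delay keep doubling one step longer.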
Greptile Summary
This PR reduces GCP costs by batching Gemini embedding calls (60s flush window + SHA-256 content-hash dedup) and increasing the Firestore sync interval from 10s→60s with a larger batch size. The approach is sound, but the content-hash dedup in …
Confidence Score: 3/5
Not safe to merge as-is: the recentHashes early-return silently prevents a growing fraction of screenshots from being indexed, making them permanently unsearchable.
Two P1 correctness bugs: (1) screenshots whose OCR content matches a previously flushed hash are dropped entirely, so they never receive a stored embedding and cannot appear in search results; (2) on batch API failure, duplicate IDs from the flushed batch are discarded and never recover. Both issues cause silent, permanent data loss in the embedding index.
OCREmbeddingService.swift: specifically the recentHashes early-return logic (lines 70-73) and the error-recovery re-queue (lines 147-151)
Sequence Diagram
sequenceDiagram
participant S as Screenshot pipeline
participant E as OCREmbeddingService (actor)
participant G as Gemini API
participant D as RewindDatabase
S->>E: embedScreenshot(id, ocrText, ...)
alt hash in recentHashes
E-->>S: return (id never stored ⚠️)
else hash not seen
E->>E: append to pendingItems
alt pendingItems.count >= 100
E->>E: flushPendingEmbeddings()
else
E->>E: startFlushTimerIfNeeded() [60s]
end
end
Note over E: On timer or force-flush
E->>E: deduplicate by hash → uniqueItems + duplicateGroups
loop each chunk of 100
E->>G: embedBatch(texts)
alt success
G-->>E: embeddings[]
loop each embedding
E->>D: updateScreenshotEmbedding(id)
Note right of E: also applies to duplicateGroups[hash] IDs
end
E->>E: recentHashes.insert(hashes)
else error
E->>E: pendingItems.append(uniqueItems) ⚠️ duplicateGroups lost
E->>E: startFlushTimerIfNeeded()
end
end
S->>E: searchSimilar(query, ...)
E->>E: flushPendingEmbeddings() [best-effort]
E->>G: embed(query, RETRIEVAL_QUERY)
G-->>E: queryEmbedding
E->>D: readEmbeddingBatch()
D-->>E: stored embeddings
E-->>S: top-K results
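The last steps of the diagram are: embed the query, read the stored embeddings, and return the top-K. Together they amount to a nearest-neighbour search. This is a minimal sketch that assumes cosine similarity as the score and an in-memory list of vectors. `StoredEmbedding` and `topK` are hypothetical names, and the PR's actual scoring code is not shown in this review.

```swift
// Hypothetical sketch of the final search step: rank stored embeddings by
// cosine similarity to the query embedding and keep the top K.
// Cosine similarity is an assumption; the PR's scoring code is not shown.
struct StoredEmbedding {
    let screenshotId: Int64
    let vector: [Float]
}

func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embedding dimensions must match")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    // Treat zero vectors as unrelated rather than dividing by zero.
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

func topK(query: [Float], candidates: [StoredEmbedding], k: Int) -> [(id: Int64, score: Float)] {
    let scored = candidates.map { (id: $0.screenshotId, score: cosineSimilarity(query, $0.vector)) }
    // A full sort is fine for a sketch; a bounded heap avoids O(n log n) on large stores.
    return Array(scored.sorted { $0.score > $1.score }.prefix(k))
}
```

This step also explains why the diagram flushes pending embeddings first. A screenshot still sitting in the buffer has no stored vector, so it cannot be scored.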
// Skip if we recently embedded identical content
if recentHashes.contains(hash) {
    return
}
Duplicate-content screenshots are permanently unsearchable
When recentHashes already contains hash, embedScreenshot returns at line 72 without adding the screenshot's id to pendingItems. That ID is never processed and never gets an embedding row in the database, so searchSimilar will never return it. Any screenshot taken while the user is on a static window (e.g. reading a document, idle display) becomes unsearchable after the first flush of identical content.
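One way to keep such screenshots searchable without another API call is to remember the embedding vector for each recent hash, not only the hash, and write a copy for every new ID. The sketch below is hypothetical. `DedupingEmbedder`, `didEmbed(hash:vector:)` and the `store` closure are illustrative names; `store` stands in for a call like RewindDatabase's `updateScreenshotEmbedding`. The approach trades memory (one vector per cached hash) for not having to read the vector back from the database.

```swift
// Hypothetical sketch: keep the vector per content hash (not just the
// hash) so a duplicate screenshot still gets a stored embedding row.
// `store` stands in for the database write; its shape is assumed.
final class DedupingEmbedder {
    private var vectorByHash: [String: [Float]] = [:]
    private(set) var pending: [(id: Int64, hash: String)] = []
    let store: (Int64, [Float]) -> Void

    init(store: @escaping (Int64, [Float]) -> Void) {
        self.store = store
    }

    func embedScreenshot(id: Int64, hash: String) {
        if let cached = vectorByHash[hash] {
            // Identical content was already embedded: reuse the vector
            // instead of returning early, so this ID is searchable too.
            store(id, cached)
            return
        }
        pending.append((id: id, hash: hash))
    }

    /// Called after a successful batch call, once per distinct hash.
    func didEmbed(hash: String, vector: [Float]) {
        vectorByHash[hash] = vector
        for item in pending where item.hash == hash {
            store(item.id, vector)
        }
        pending.removeAll { $0.hash == hash }
    }
}
```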
The fix is to still persist an embedding for the new ID — either by looking up and copying the already-stored embedding for that hash, or by queuing the ID separately for a DB-copy operation — rather than silently discarding it:
// Skip API call if we recently embedded identical content,
// but still store the embedding copy so this ID is searchable.
if recentHashes.contains(hash) {
    // TODO: copy embedding from DB for the existing hash to this id
    return
}

} catch {
    logError("OCREmbeddingService: Batch embed failed for \(chunk.count) items", error: error)
    // Re-queue failed items for next flush
    pendingItems.append(contentsOf: chunk)
    startFlushTimerIfNeeded()
Duplicate IDs lost on batch-chunk failure
When a chunk fails, only the items in uniqueItems (one canonical entry per hash) are re-queued. The duplicateGroups dictionary — which holds all screenshot IDs (including non-canonical duplicates) that share each hash — is a local variable and is discarded at this point. On the next flush those re-queued items carry no duplicate metadata, so the other IDs that shared their content hash in the original batch will never receive their embeddings. This is a silent data-loss path on API errors.
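A minimal fix is to expand each failed canonical item back into one queue entry per ID recorded in duplicateGroups before re-queuing. This is a sketch under assumptions. `QueuedScreenshot` and `itemsToRequeue` are hypothetical names, and duplicateGroups is assumed to map a hash to every ID that shares it, canonical ID included.

```swift
// Hypothetical sketch: on a failed chunk, re-queue one item per original
// screenshot ID (duplicates included), not one canonical item per hash.
struct QueuedScreenshot {
    let id: Int64
    let hash: String
    let text: String
}

func itemsToRequeue(failedChunk: [QueuedScreenshot],
                    duplicateGroups: [String: [Int64]]) -> [QueuedScreenshot] {
    var requeued: [QueuedScreenshot] = []
    for canonical in failedChunk {
        // Fall back to the canonical ID alone if no group was recorded.
        let ids = duplicateGroups[canonical.hash] ?? [canonical.id]
        for id in ids {
            requeued.append(QueuedScreenshot(id: id, hash: canonical.hash, text: canonical.text))
        }
    }
    return requeued
}
```

The next flush deduplicates by hash again, so re-queuing the duplicates adds no API calls. It only restores the fan-out to every ID.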
// Evict old hashes if the set grows too large
if recentHashes.count > maxRecentHashes {
    recentHashes.removeAll()
}
Thundering-herd risk from full recentHashes eviction
recentHashes.removeAll() wipes the entire set when it exceeds 5 000 entries. Immediately after the reset, every screenshot whose content was previously deduplicated will be re-queued for a full Gemini API call, producing a burst that temporarily defeats the ~20× cost reduction this PR targets. Removing half the set (e.g., converting to an ordered structure and dropping the oldest half) or simply bumping maxRecentHashes would avoid the spike.
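The "drop the oldest half" option can be sketched with a set plus an insertion-order array. All names here (`RecentHashes`, `capacity`) are hypothetical. This is FIFO eviction, so a hash that keeps recurring is not refreshed on a hit the way an LRU cache would refresh it.

```swift
// Hypothetical sketch of a bounded "recent hashes" set that evicts the
// oldest half when it overflows, instead of calling removeAll().
struct RecentHashes {
    let capacity: Int
    private var members: Set<String> = []
    private var insertionOrder: [String] = []   // oldest first

    init(capacity: Int) { self.capacity = capacity }

    var count: Int { members.count }

    func contains(_ hash: String) -> Bool { members.contains(hash) }

    mutating func insert(_ hash: String) {
        // Already present: no reordering (FIFO, not LRU).
        guard members.insert(hash).inserted else { return }
        insertionOrder.append(hash)
        if insertionOrder.count > capacity {
            // Drop the oldest half; the newest entries keep deduplicating.
            let dropCount = insertionOrder.count / 2
            for old in insertionOrder.prefix(dropCount) { members.remove(old) }
            insertionOrder.removeFirst(dropCount)
        }
    }
}
```

`Array.removeFirst(_:)` is O(n), but it runs only after roughly `capacity / 2` new insertions, so its cost amortizes. After each eviction, half the recent hashes still suppress duplicate API calls.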
Mac mini test: PASS. App builds and launches without crashes. ScreenActivitySync 60s interval confirmed in logs (…
…dware#6341)

## Summary
- **Batch Gemini embeddings** with 60s flush window + SHA256 content-hash dedup: reduces API calls by ~20x
- **Increase sync interval** from 10s → 60s and batch size 20 → 100: reduces Firestore writes by ~83%
- **Cap backfill** at 5,000 items per app launch to prevent cost spikes
- **Flush pending embeddings before search** so recent screenshots remain findable

## Cost Impact (estimated, based on $7,601/5-day April spend)
| Service | Before | After | Savings |
|---------|--------|-------|---------|
| Gemini API | $2,483 | ~$500 | -$1,983 |
| Firestore | $1,201 | ~$180 | -$1,021 |
| Compute Engine | $1,384 | ~$500 | -$884 |
| **Total** | **$5,068** | **~$1,180** | **~$3,888** |

## Trade-offs
- Search results lag by up to 60s (previously near-instant)
- On crash, up to 60s of un-synced screenshots (previously 10s)
- Backfill completes over multiple launches instead of all at once

## Test plan
- [ ] Build on Mac mini, verify app launches and runs
- [ ] Verify screenshots are still captured and OCR'd
- [ ] Verify search returns results after 60s delay
- [ ] Verify sync logs show larger batches at 60s intervals

🤖 Generated with [Claude Code](https://claude.com/claude-code)
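The totals in the cost table, and the ~83% / ~6x figures for the sync change, can be checked with a few lines of arithmetic. The dollar figures are the PR's own estimates. Treating Firestore writes and Compute Engine load as proportional to the number of sync cycles is an assumption; under it, moving from a 10s to a 60s interval gives the ~83% and ~6x figures.

```swift
// Arithmetic check of the PR's estimates. The dollar values are the PR's;
// "load scales with sync cycles" is an assumption of this sketch.
func syncCyclesPerHour(intervalSeconds: Double) -> Double {
    3600 / intervalSeconds
}

let before = syncCyclesPerHour(intervalSeconds: 10)   // 360 cycles/hour
let after = syncCyclesPerHour(intervalSeconds: 60)    // 60 cycles/hour
let writeReduction = 1 - after / before               // 5/6, about 83%

let costs: [(service: String, before: Int, after: Int)] = [
    ("Gemini API", 2_483, 500),
    ("Firestore", 1_201, 180),
    ("Compute Engine", 1_384, 500),
]
let totalBefore = costs.reduce(0) { $0 + $1.before }   // 5,068
let totalAfter = costs.reduce(0) { $0 + $1.after }     // 1,180
let totalSavings = totalBefore - totalAfter             // 3,888
```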