Add clipboard relevance filter and line-level distillation #210
Merged
Clipboard content was blindly injected into every autocomplete prompt, even when the user had long moved on from whatever they copied. This adds a heuristic gate (ClipboardRelevanceFilter) that checks three signals before allowing clipboard into the prompt:

1. Staleness: drop if clipboard unchanged for >5 minutes
2. App affinity: always keep if copied in the same app
3. Token overlap: drop if clipboard shares no words with current text

Zero latency, fully testable (injectable dateProvider), no model calls.
Even when the relevance filter keeps clipboard content, the raw text can contain noisy lines (email signatures, import blocks, boilerplate) that waste prompt tokens. ClipboardContentDistiller extracts only the lines whose tokens overlap with the user's prefix text, keeping the prompt focused on what's actually relevant to the current completion. Short clipboard (≤3 lines) passes through unchanged. When no individual line overlaps, the first 300 characters are kept as a head-biased fallback.

Also extracts the shared tokenizer into PromptContextSanitizer so both ClipboardRelevanceFilter and ClipboardContentDistiller use the same logic.
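The distillation rules in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `DistillerSketch`, its constant names, and the tokenizer body are assumptions, and only the thresholds (3 lines, 300 characters) come from the description.

```swift
import Foundation

/// Hypothetical sketch of line-level distillation. The type name and
/// constant names are illustrative; only the limits come from the PR text.
struct DistillerSketch {
    static let passthroughLineLimit = 3
    static let fallbackCharacterLimit = 300

    /// Assumed tokenizer: lowercase, then split on non-alphanumeric characters.
    static func tokens(from text: String) -> Set<String> {
        Set(text.lowercased()
            .components(separatedBy: CharacterSet.alphanumerics.inverted)
            .filter { !$0.isEmpty })
    }

    static func distill(clipboard: String, prefixText: String) -> String {
        let lines = clipboard.components(separatedBy: "\n")
        // Short clipboard passes through unchanged.
        guard lines.count > passthroughLineLimit else { return clipboard }

        let prefixTokens = tokens(from: prefixText)
        // Keep only lines that share at least one token with the prefix.
        let relevant = lines.filter {
            !tokens(from: $0).isDisjoint(with: prefixTokens)
        }

        // No overlapping line: fall back to the head of the clipboard.
        guard !relevant.isEmpty else {
            return String(clipboard.prefix(fallbackCharacterLimit))
        }
        return relevant.joined(separator: "\n")
    }
}

let clipboard = """
import Foundation
func deploy() {
    print("starting")
}
Sent from my phone
"""
let distilled = DistillerSketch.distill(
    clipboard: clipboard,
    prefixText: "the deploy is running"
)
print(distilled) // func deploy() {
```

In this example only the second line shares a token ("deploy") with the prefix, so the import line, the body, and the signature-style trailer are dropped.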
83ad0ea to 9afea0c
The clipboard contained "func deployServer() {" but the prefix was
"the deploy is running". significantTokens splits only on non-alphanumeric
boundaries, so deployServer lowercases to a single "deployserver" token
that does not match "deploy". The line was dropped and the test expected
it to be kept. Rename to "func deploy() {" so the line genuinely shares
a token with the prefix.
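
The mismatch described above can be reproduced with a small sketch. It assumes the tokenizer does nothing more than lowercase the text and split on non-alphanumeric characters; `sketchTokens` is a hypothetical stand-in, and any stop-word or length filtering in the real significantTokens is left out.

```swift
import Foundation

/// Hypothetical stand-in for PromptContextSanitizer.significantTokens:
/// lowercases the text and splits on every non-alphanumeric character.
/// camelCase boundaries are NOT split, which is the point of this example.
func sketchTokens(from text: String) -> Set<String> {
    let pieces = text.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty }
    return Set(pieces)
}

let prefix = sketchTokens(from: "the deploy is running")
let before = sketchTokens(from: "func deployServer() {")
let after = sketchTokens(from: "func deploy() {")

// "deployserver" is a single token, so it never matches "deploy".
print(before.isDisjoint(with: prefix)) // true
// After the rename the line shares the "deploy" token with the prefix.
print(after.isDisjoint(with: prefix))  // false
```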
Fix four review findings on the relevance filter and its wiring:

- Drop the same-app affinity heuristic. We only ever observe the typing app, not the actual copier, so recording it as the source granted unconditional injection in apps where the user merely typed. The filter now relies on staleness and token overlap only.
- Gate the staleness clock on a real baseline. lastKnownChangeCount starts as Optional. The first observation records the baseline without stamping a date, so pre-launch clipboard content stays out of the prompt until the user actually copies again while Cotabby is running.
- Align the filter and the distiller on the same prefix window. The coordinator now computes truncatedPromptPrefix once and feeds it to both, preventing the gate from passing on tokens that get truncated before the distiller sees them, which previously caused the head-fallback to inject 300 chars of unrelated clipboard.
- Replace the concrete ClipboardRelevanceFilter dependency on the coordinator with a ClipboardRelevanceFiltering protocol, matching the rest of the coordinator's collaborators.

Tests:

- Drop the app-affinity cases and add baseline-gating coverage so the first observation never injects and a subsequent change does.
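
The baseline gating and staleness behaviour described in this commit might look roughly like the sketch below. It is written under assumptions: `RelevanceFilterSketch`, `FakeClock`, and the exact property layout are hypothetical, the tokenizer is a simplified stand-in, and the @MainActor isolation mentioned in the review is omitted to keep the example short.

```swift
import Foundation

/// Hypothetical sketch of the baseline-gated relevance filter.
/// Names and signatures are illustrative, not the PR's actual API.
final class RelevanceFilterSketch {
    static let stalenessLimit: TimeInterval = 5 * 60

    private let dateProvider: () -> Date
    private var lastKnownChangeCount: Int? // nil until the first observation
    private var lastChangeDate: Date?      // nil until a copy is seen while running

    init(dateProvider: @escaping () -> Date = { Date() }) {
        self.dateProvider = dateProvider
    }

    /// Assumed tokenizer: lowercase, then split on non-alphanumeric characters.
    static func tokens(from text: String) -> Set<String> {
        Set(text.lowercased()
            .components(separatedBy: CharacterSet.alphanumerics.inverted)
            .filter { !$0.isEmpty })
    }

    func filter(clipboard: String?, pasteboardChangeCount: Int,
                precedingText: String) -> String? {
        let now = dateProvider()
        guard let known = lastKnownChangeCount else {
            // First observation: record the baseline without stamping a date.
            lastKnownChangeCount = pasteboardChangeCount
            return nil
        }
        if pasteboardChangeCount != known {
            // A fresh copy happened while running: start the staleness clock.
            lastKnownChangeCount = pasteboardChangeCount
            lastChangeDate = now
        }
        // Drop when there is no clipboard, no observed copy, or it is stale.
        guard let clipboard = clipboard,
              let changedAt = lastChangeDate,
              now.timeIntervalSince(changedAt) <= Self.stalenessLimit
        else { return nil }

        // Drop when the clipboard shares no tokens with the preceding text.
        let overlaps = !Self.tokens(from: clipboard)
            .isDisjoint(with: Self.tokens(from: precedingText))
        return overlaps ? clipboard : nil
    }
}

/// Test-only clock so the example is deterministic.
final class FakeClock { var now = Date(timeIntervalSince1970: 0) }

let clock = FakeClock()
let relevance = RelevanceFilterSketch(dateProvider: { clock.now })
let text = "the deploy is running"

// Pre-launch clipboard: baseline only, nothing injected.
let first = relevance.filter(clipboard: "func deploy() {",
                             pasteboardChangeCount: 7, precedingText: text)
// The user copies again while running: fresh and overlapping, so kept.
let second = relevance.filter(clipboard: "func deploy() {",
                              pasteboardChangeCount: 8, precedingText: text)
// More than five minutes later with no new copy: stale, so dropped.
clock.now = clock.now.addingTimeInterval(301)
let third = relevance.filter(clipboard: "func deploy() {",
                             pasteboardChangeCount: 8, precedingText: text)
```

Injecting the clock as a closure is what lets the staleness cases run without sleeping, which matches the "swappable dateProvider" the review mentions.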
…ilter # Conflicts: # Cotabby.xcodeproj/project.pbxproj
Summary
Clipboard content was blindly injected into every autocomplete prompt, even when stale or noisy. This PR adds two layers of filtering:

- Relevance filter (ClipboardRelevanceFilter) gates clipboard injection on three zero-latency heuristics: staleness (>5 min unchanged → drop), app affinity (same-app copy always passes), and token overlap (no shared words with prefix → drop).
- Content distillation (ClipboardContentDistiller): for clipboard that passes the relevance gate, extracts only the lines whose tokens overlap with the user's prefix text. Short clipboard (≤3 lines) passes through unchanged. When no individual line overlaps, the first 300 characters are kept as a head-biased fallback.

Also extracts shared tokenization logic into PromptContextSanitizer.significantTokens() so both components use the same tokenizer.
Validation
Linked issues
None.
Risk / rollout notes
- ClipboardContextProviding protocol gains a currentChangeCount property. Any test doubles conforming to this protocol will need updating.
- static let on their respective types.

Greptile Summary
This PR adds two layers of clipboard filtering to the autocomplete pipeline: a ClipboardRelevanceFilter that gates injection on staleness and token overlap, and a ClipboardContentDistiller that trims long clipboard to only the lines that share tokens with the user's prefix. It also addresses three previously identified bugs (the boot-time staleness clock reset, the erroneous app-affinity bypass, and the filter/distiller window mismatch) by adopting a nil-sentinel baseline, removing app-affinity logic entirely, and computing the truncated prefix once in the coordinator before passing it to both components.

- ClipboardRelevanceFilter: state machine keyed on NSPasteboard.changeCount; the first observation silently records the baseline without starting the staleness clock, so content copied before Cotabby launched is never injected until the user performs a fresh copy.
- ClipboardContentDistiller: passes short (≤3 line) content through unchanged; longer content is filtered to token-overlapping lines with a 300-char head fallback when no line matches.
- SuggestionRequestFactory: truncatedPromptPrefix promoted to internal and prefixText threaded through to the distiller so the filter and distiller evaluate the same bounded window.

Confidence Score: 5/5
Safe to merge. The change is purely additive filtering — clipboard content that was previously injected may now be dropped or trimmed, but no new content is introduced into prompts.
All three bugs flagged in prior review rounds have been addressed: the nil-sentinel baseline prevents the boot-time clock reset, app-affinity logic is removed entirely, and the coordinator now computes the truncated prefix once and passes it to both the filter and the distiller. New types are @MainActor-isolated, injected via narrow protocols consistent with AGENTS.md conventions, and covered by deterministic unit tests with a swappable dateProvider.
The project.pbxproj has a LlamaSwift in Frameworks build file entry that appears unrelated to this feature and has no matching Frameworks build phase addition in the diff — worth confirming it is intentional.
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant SC as SuggestionCoordinator
    participant CCP as ClipboardContextProvider
    participant SRF as SuggestionRequestFactory
    participant CRF as ClipboardRelevanceFilter
    participant CCD as ClipboardContentDistiller
    participant PSan as PromptContextSanitizer
    SC->>CCP: currentContext()
    CCP-->>SC: rawClipboard (String?)
    SC->>CCP: currentChangeCount
    CCP-->>SC: pasteboardChangeCount (Int)
    SC->>SRF: truncatedPromptPrefix(from: rawContext.precedingText)
    SRF-->>SC: truncatedPrefix
    SC->>CRF: filter(clipboard:, pasteboardChangeCount:, precedingText: truncatedPrefix)
    note over CRF: 1st call → record baseline, return nil
    CRF->>PSan: significantTokens(from: clipboard)
    CRF->>PSan: significantTokens(from: precedingText)
    PSan-->>CRF: "Set<String>"
    CRF-->>SC: clipboardContext (String? — nil if stale or no overlap)
    SC->>SRF: buildRequest(context:, clipboardContext:, ...)
    SRF->>SRF: truncatedPromptPrefix(from: context.precedingText)
    SRF->>PSan: sanitize(rawContext)
    PSan-->>SRF: sanitizedContext
    SRF->>CCD: distill(clipboard: sanitizedContext, prefixText:)
    CCD->>PSan: significantTokens(from: prefixText)
    CCD->>PSan: significantTokens(from: each line)
    PSan-->>CCD: "Set<String>"
    CCD-->>SRF: distilled (relevant lines or head fallback)
    SRF->>SRF: clippedText(distilled, max: 1200)
    SRF-->>SC: SuggestionRequestBuildResult

Reviews (6): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..."