
Add clipboard relevance filter and line-level distillation#210

Merged
FuJacob merged 5 commits into main from clipboard-relevance-filter
May 25, 2026



@FuJacob FuJacob commented May 25, 2026

Summary

Clipboard content was blindly injected into every autocomplete prompt, even when stale
or noisy. This PR adds two layers of filtering:

  1. Relevance filter (ClipboardRelevanceFilter) — gates clipboard injection on three
    zero-latency heuristics: staleness (>5 min unchanged → drop), app affinity (same-app
    copy always passes), and token overlap (no shared words with prefix → drop).

  2. Content distillation (ClipboardContentDistiller) — for clipboard that passes the
    relevance gate, extracts only the lines whose tokens overlap with the user's prefix
    text. Short clipboard (≤3 lines) passes through unchanged. When no individual line
    overlaps, the first 300 characters are kept as a head-biased fallback.

Also extracts shared tokenization logic into PromptContextSanitizer.significantTokens()
so both components use the same tokenizer.
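As a rough illustration of the shared tokenizer, the sketch below lowercases the text, splits on non-alphanumeric characters, and drops tokens shorter than three characters. The free function and its `minimumLength` parameter are assumptions for illustration; the real `PromptContextSanitizer.significantTokens()` may have a different signature or apply further filtering.

```swift
import Foundation

// Hypothetical stand-in for PromptContextSanitizer.significantTokens();
// the real signature and filtering rules may differ.
func significantTokens(from text: String, minimumLength: Int = 3) -> Set<String> {
    let pieces = text
        .lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
    // Drop empty fragments and tokens shorter than the minimum length.
    return Set(pieces.filter { $0.count >= minimumLength })
}

// "to" is dropped (two characters); "prod-east" splits into "prod" and "east".
let tokens = significantTokens(from: "Deploy the API to prod-east")
```

Returning a `Set` keeps the overlap check a simple set operation for both the filter and the distiller.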

Validation

xcodebuild -project tabby.xcodeproj -scheme tabby -destination 'platform=macOS' build
# ** BUILD SUCCEEDED **

xcodebuild -project tabby.xcodeproj -scheme tabby -destination 'platform=macOS' build-for-testing
# ** TEST BUILD SUCCEEDED **

swiftlint lint --quiet
# No new warnings

Linked issues

None.

Risk / rollout notes

  • Users with clipboard context enabled will see filtered/distilled clipboard instead of raw. This is purely additive filtering — content that was previously injected may now be dropped or trimmed, but nothing new is added.
  • The ClipboardContextProviding protocol gains a currentChangeCount property. Any test doubles conforming to this protocol will need updating.
  • Tuning constants: staleness threshold (5 min), minimum token length (3 chars), compact line threshold (3 lines), head fallback (300 chars) — all static let on their respective types.
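For the test-double note above, a conforming stub might look like the sketch below. The protocol shape is inferred from the calls named in this PR (`currentContext()` and `currentChangeCount`); the real declaration is `@MainActor`-isolated and may carry other requirements. `StubClipboardContextProvider` and `simulateCopy` are hypothetical names.

```swift
// Assumed protocol shape, inferred from the calls in this PR. The real
// declaration is @MainActor-isolated; that is omitted here for brevity.
protocol ClipboardContextProviding {
    func currentContext() -> String?
    var currentChangeCount: Int { get }
}

// Hypothetical test double: tests control both values directly.
final class StubClipboardContextProvider: ClipboardContextProviding {
    var stubbedContext: String?
    var currentChangeCount: Int = 0

    func currentContext() -> String? { stubbedContext }

    // Simulates a user copy: new content plus a bumped change count.
    func simulateCopy(_ text: String) {
        stubbedContext = text
        currentChangeCount += 1
    }
}

let stub = StubClipboardContextProvider()
stub.simulateCopy("hello")
```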

Greptile Summary

This PR adds two layers of clipboard filtering to the autocomplete pipeline: a ClipboardRelevanceFilter that gates injection on staleness and token overlap, and a ClipboardContentDistiller that trims long clipboard to only the lines that share tokens with the user's prefix. It also addresses three previously identified bugs — the boot-time staleness clock reset, the erroneous app-affinity bypass, and the filter/distiller window mismatch — by adopting a nil-sentinel baseline, removing app-affinity logic entirely, and computing the truncated prefix once in the coordinator before passing it to both components.

  • ClipboardRelevanceFilter: state machine keyed on NSPasteboard.changeCount; first observation silently records the baseline without starting the staleness clock, so content copied before Cotabby launched is never injected until the user performs a fresh copy.
  • ClipboardContentDistiller: passes short (≤3-line) content through unchanged; longer content is filtered to token-overlapping lines with a 300-char head fallback when no line matches.
  • SuggestionRequestFactory: truncatedPromptPrefix promoted to internal and prefixText threaded through to the distiller so the filter and distiller evaluate the same bounded window.
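The distillation rules described above can be sketched as follows. This is a minimal illustration, not the merged implementation: the free functions, parameter names, and the inline tokenizer are assumptions, while the constants (3 lines, 300 characters) come from the PR description.

```swift
import Foundation

// Hypothetical tokenizer standing in for the shared significantTokens().
func significantTokens(from text: String) -> Set<String> {
    Set(text.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { $0.count >= 3 })
}

// Sketch of the distillation rules; constants mirror the PR text.
func distill(clipboard: String, prefixText: String,
             compactLineThreshold: Int = 3,
             headFallbackLength: Int = 300) -> String {
    let lines = clipboard.components(separatedBy: "\n")
    // Short clipboard passes through unchanged.
    if lines.count <= compactLineThreshold { return clipboard }

    let prefixTokens = significantTokens(from: prefixText)
    // Keep only lines that share at least one token with the prefix.
    let relevant = lines.filter {
        !significantTokens(from: $0).isDisjoint(with: prefixTokens)
    }
    // No line overlaps: keep only the head of the clipboard.
    if relevant.isEmpty { return String(clipboard.prefix(headFallbackLength)) }
    return relevant.joined(separator: "\n")
}

let clip = "import Foundation\nfunc deploy() {\n    print(\"done\")\n}"
let distilled = distill(clipboard: clip, prefixText: "the deploy is running")
```

Here only the `func deploy() {` line shares a token with the prefix, so the import, the print call, and the closing brace are dropped.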

Confidence Score: 5/5

Safe to merge. The change is purely additive filtering — clipboard content that was previously injected may now be dropped or trimmed, but no new content is introduced into prompts.

All three bugs flagged in prior review rounds have been addressed: the nil-sentinel baseline prevents the boot-time clock reset, app-affinity logic is removed entirely, and the coordinator now computes the truncated prefix once and passes it to both the filter and the distiller. New types are @MainActor-isolated, injected via narrow protocols consistent with AGENTS.md conventions, and covered by deterministic unit tests with a swappable dateProvider.
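A minimal sketch of the nil-sentinel baseline and the injectable clock, under stated assumptions: the class name, stored property names other than `lastKnownChangeCount`, and the order of checks are hypothetical, and the real type is `@MainActor`-isolated behind the `ClipboardRelevanceFiltering` protocol.

```swift
import Foundation

// Hypothetical tokenizer standing in for the shared significantTokens().
func significantTokens(from text: String) -> Set<String> {
    Set(text.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { $0.count >= 3 })
}

// Sketch only; actor isolation and protocol conformance are omitted.
final class RelevanceFilterSketch {
    static let stalenessThreshold: TimeInterval = 5 * 60

    private let dateProvider: () -> Date
    private var lastKnownChangeCount: Int?  // nil until the first observation
    private var lastChangeDate: Date?       // nil until a copy follows the baseline

    init(dateProvider: @escaping () -> Date = { Date() }) {
        self.dateProvider = dateProvider
    }

    func filter(clipboard: String?, pasteboardChangeCount: Int,
                precedingText: String) -> String? {
        let now = dateProvider()
        if let known = lastKnownChangeCount {
            if known != pasteboardChangeCount {
                lastKnownChangeCount = pasteboardChangeCount
                lastChangeDate = now  // a fresh copy restarts the staleness clock
            }
        } else {
            // First observation: record the baseline without stamping a date.
            lastKnownChangeCount = pasteboardChangeCount
        }
        guard let clipboard = clipboard, let changedAt = lastChangeDate else { return nil }
        guard now.timeIntervalSince(changedAt) <= Self.stalenessThreshold else { return nil }
        let overlaps = !significantTokens(from: clipboard)
            .isDisjoint(with: significantTokens(from: precedingText))
        return overlaps ? clipboard : nil
    }
}

var now = Date(timeIntervalSince1970: 0)
let filter = RelevanceFilterSketch(dateProvider: { now })
// Pre-launch content: baseline recorded, nothing injected.
let first = filter.filter(clipboard: "deploy script", pasteboardChangeCount: 7,
                          precedingText: "run the deploy")
// Change count moved: a fresh copy, so the content passes.
let second = filter.filter(clipboard: "deploy script", pasteboardChangeCount: 8,
                           precedingText: "run the deploy")
now = now.addingTimeInterval(301)
// More than five minutes without a new copy: dropped as stale.
let third = filter.filter(clipboard: "deploy script", pasteboardChangeCount: 8,
                          precedingText: "run the deploy")
```

Injecting the clock as a closure is what lets the staleness assertions run without real waiting.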

The project.pbxproj has a LlamaSwift in Frameworks build file entry that appears unrelated to this feature and has no matching Frameworks build phase addition in the diff — worth confirming it is intentional.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Cotabby/Support/ClipboardRelevanceFilter.swift | New filter type addressing three previously filed bugs: removes false app-affinity bypass, uses nil-sentinel baseline to avoid the boot-time staleness clock reset, and evaluates token overlap against the same bounded window the distiller sees. Logic, state transitions, and test coverage all look correct. |
| Cotabby/Support/ClipboardContentDistiller.swift | New distiller correctly passes short (≤3 line) clipboard through unchanged, filters longer content to token-overlapping lines, and falls back to the first 300 chars when no line matches. Uses the shared significantTokens tokenizer for consistency. |
| Cotabby/App/Coordinators/SuggestionCoordinator+Prediction.swift | Computes the truncated prefix once and passes it to the relevance filter so the filter and the downstream distiller evaluate overlap against the same bounded window. Correctly replaces the old unconditional clipboard injection. |
| Cotabby/Support/SuggestionRequestFactory.swift | Promotes truncatedPromptPrefix from private to internal so the coordinator can use it, threads prefixText into activeClipboardContext, and adds the ClipboardContentDistiller call before the existing character cap. Changes are minimal and non-breaking. |
| Cotabby/Models/SuggestionSubsystemContracts.swift | Adds currentChangeCount to ClipboardContextProviding and introduces the ClipboardRelevanceFiltering protocol, both @MainActor-scoped and consistent with existing contract style. |
| CotabbyTests/ClipboardRelevanceFilterTests.swift | Good coverage of nil input, baseline gating, token overlap (case-insensitive, short-token exclusion), staleness expiry, and clock reset on new copy. Uses injectable dateProvider to make time-dependent assertions deterministic. |
| CotabbyTests/ClipboardContentDistillerTests.swift | Covers the short-passthrough, partial-overlap, no-overlap head-fallback, case-insensitive, short-token, and empty-prefix cases. All assertions match the documented behaviour. |

Sequence Diagram

sequenceDiagram
    participant SC as SuggestionCoordinator
    participant CCP as ClipboardContextProvider
    participant SRF as SuggestionRequestFactory
    participant CRF as ClipboardRelevanceFilter
    participant CCD as ClipboardContentDistiller
    participant PSan as PromptContextSanitizer

    SC->>CCP: currentContext()
    CCP-->>SC: rawClipboard (String?)
    SC->>CCP: currentChangeCount
    CCP-->>SC: pasteboardChangeCount (Int)
    SC->>SRF: truncatedPromptPrefix(from: rawContext.precedingText)
    SRF-->>SC: truncatedPrefix
    SC->>CRF: filter(clipboard:, pasteboardChangeCount:, precedingText: truncatedPrefix)
    note over CRF: 1st call → record baseline, return nil
    CRF->>PSan: significantTokens(from: clipboard)
    CRF->>PSan: significantTokens(from: precedingText)
    PSan-->>CRF: "Set<String>"
    CRF-->>SC: clipboardContext (String? — nil if stale or no overlap)
    SC->>SRF: buildRequest(context:, clipboardContext:, ...)
    SRF->>SRF: truncatedPromptPrefix(from: context.precedingText)
    SRF->>PSan: sanitize(rawContext)
    PSan-->>SRF: sanitizedContext
    SRF->>CCD: distill(clipboard: sanitizedContext, prefixText:)
    CCD->>PSan: significantTokens(from: prefixText)
    CCD->>PSan: significantTokens(from: each line)
    PSan-->>CCD: "Set<String>"
    CCD-->>SRF: distilled (relevant lines or head fallback)
    SRF->>SRF: clippedText(distilled, max: 1200)
    SRF-->>SC: SuggestionRequestBuildResult


Reviews (6): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..."

Context used:

  • AGENTS.md

Comment thread Cotabby/Support/ClipboardRelevanceFilter.swift Outdated
Comment thread Cotabby/App/Coordinators/SuggestionCoordinator.swift Outdated
@FuJacob FuJacob changed the title Add clipboard relevance filter to drop stale context Add clipboard relevance filter and line-level distillation May 25, 2026
Comment thread Cotabby/Support/ClipboardRelevanceFilter.swift Outdated
FuJacob added 2 commits May 24, 2026 21:26
Clipboard content was blindly injected into every autocomplete prompt,
even when the user had long moved on from whatever they copied. This
adds a heuristic gate (ClipboardRelevanceFilter) that checks three
signals before allowing clipboard into the prompt:

1. Staleness — drop if clipboard unchanged for >5 minutes
2. App affinity — always keep if copied in the same app
3. Token overlap — drop if clipboard shares no words with current text

Zero latency, fully testable (injectable dateProvider), no model calls.

Even when the relevance filter keeps clipboard content, the raw text can
contain noisy lines (email signatures, import blocks, boilerplate) that
waste prompt tokens. ClipboardContentDistiller extracts only the lines
whose tokens overlap with the user's prefix text, keeping the prompt
focused on what's actually relevant to the current completion.

Short clipboard (≤3 lines) passes through unchanged. When no individual
line overlaps, the first 300 characters are kept as a head-biased
fallback.

Also extracts the shared tokenizer into PromptContextSanitizer so both
ClipboardRelevanceFilter and ClipboardContentDistiller use the same logic.
@FuJacob FuJacob force-pushed the clipboard-relevance-filter branch from 83ad0ea to 9afea0c May 25, 2026 04:27
The clipboard contained "func deployServer() {" but the prefix was
"the deploy is running". significantTokens splits only on non-alphanumeric
boundaries, so deployServer lowercases to a single "deployserver" token
that does not match "deploy". The line was dropped and the test expected
it to be kept. Rename to "func deploy() {" so the line genuinely shares
a token with the prefix.
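The mismatch described in this commit message can be reproduced with a small worked example. The tokenizer below is a hypothetical stand-in that follows the stated behaviour (lowercase, split on non-alphanumeric boundaries, minimum three characters), not the project's actual code.

```swift
import Foundation

// Hypothetical tokenizer standing in for the shared significantTokens().
func significantTokens(from text: String) -> Set<String> {
    Set(text.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { $0.count >= 3 })
}

let prefixTokens = significantTokens(from: "the deploy is running")

// camelCase is not split, so this yields "func" and "deployserver".
let camelCase = significantTokens(from: "func deployServer() {")
// After the rename the line yields "func" and "deploy".
let renamed = significantTokens(from: "func deploy() {")

let camelCaseOverlaps = !camelCase.isDisjoint(with: prefixTokens)
let renamedOverlaps = !renamed.isDisjoint(with: prefixTokens)
```

Only the renamed line overlaps with the prefix, which is why the test fixture was changed rather than the tokenizer.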
Comment thread Cotabby/App/Coordinators/SuggestionCoordinator+Prediction.swift
FuJacob added 2 commits May 24, 2026 22:18
Fix four review findings on the relevance filter and its wiring:

- Drop the same-app affinity heuristic. We only ever observe the typing
  app, not the actual copier, so recording it as the source granted
  unconditional injection in apps where the user merely typed. The
  filter now relies on staleness and token overlap only.

- Gate the staleness clock on a real baseline. lastKnownChangeCount
  starts as Optional. The first observation records the baseline without
  stamping a date, so pre-launch clipboard content stays out of the
  prompt until the user actually copies again while Cotabby is running.

- Align the filter and the distiller on the same prefix window. The
  coordinator now computes truncatedPromptPrefix once and feeds it to
  both, preventing the gate from passing on tokens that get truncated
  before the distiller sees them — which previously caused the
  head-fallback to inject 300 chars of unrelated clipboard.

- Replace the concrete ClipboardRelevanceFilter dependency on the
  coordinator with a ClipboardRelevanceFiltering protocol, matching the
  rest of the coordinator's collaborators.

Tests:
- Drop the app-affinity cases and add baseline-gating coverage so the
  first observation never injects and a subsequent change does.
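
The prefix-window fix listed above can be illustrated with a sketch. It assumes, purely for illustration, that truncation keeps only the trailing characters of the preceding text; the real `truncatedPromptPrefix` may bound the window differently, and the `limit` parameter and `overlaps` helper are hypothetical.

```swift
import Foundation

// Hypothetical tokenizer standing in for the shared significantTokens().
func significantTokens(from text: String) -> Set<String> {
    Set(text.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { $0.count >= 3 })
}

// Assumed truncation: keep only the trailing `limit` characters.
func truncatedPromptPrefix(from text: String, limit: Int) -> String {
    String(text.suffix(limit))
}

// True when the two texts share at least one significant token.
func overlaps(_ a: String, _ b: String) -> Bool {
    !significantTokens(from: a).isDisjoint(with: significantTokens(from: b))
}

let clipboard = "kubernetes rollout notes"
let precedingText = "kubernetes was mentioned earlier. "
    + String(repeating: "unrelated filler text ", count: 20)
    + "now writing about lunch"
let window = truncatedPromptPrefix(from: precedingText, limit: 60)

// Gating on the full text passes because "kubernetes" appears early on.
let gateOnFullText = overlaps(clipboard, precedingText)
// Gating on the truncated window agrees with what the distiller will see.
let gateOnWindow = overlaps(clipboard, window)
```

When the gate and the distiller read different windows, the first check can pass while no line overlaps downstream, which is the path that triggers the head fallback.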
…ilter

# Conflicts:
#	Cotabby.xcodeproj/project.pbxproj
@FuJacob FuJacob merged commit 9287a0c into main May 25, 2026
3 checks passed
