Fix mid-word completion inserting a stray space on accept#622

Merged
FuJacob merged 1 commit into
mainfrom
fix/621-trust-model-word-boundary
Jun 6, 2026

Conversation

@FuJacob
Owner

@FuJacob FuJacob commented Jun 6, 2026

Summary

On Tab, a completion that continued the user's partial word (caret at the end of after, model suggests noon) committed as after noon instead of afternoon. The accept path synthesized a word boundary the ghost text never showed, so the inserted text disagreed with the overlay. This removes that synthesis and trusts the model's own leading-space signal, so what Tab inserts matches what the ghost showed.

The base-model prompt ends at a clean boundary (BaseCompletionPromptRenderer trims trailing whitespace), so the model's first token already encodes intent: a leading space means "new word", none means "continue the current word". insertionChunk now types the chunk verbatim and lets that decide the boundary, instead of injecting a separator whenever a word character met a word character.
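The rule described above can be sketched as follows. This is a minimal illustration, not the project's actual source: the parameter labels come from the regression test quoted in this PR, while the free-function form and the definition of horizontal whitespace (space or tab) are assumptions.

```swift
/// Sketch of the post-change boundary rule (assumed shape, not the real implementation).
func insertionChunk(forAcceptedChunk chunk: String, precedingText: String) -> String {
    // Assumption: "horizontal whitespace" means a space or a tab.
    func isHorizontalWhitespace(_ c: Character) -> Bool { c == " " || c == "\t" }

    // Dedup: the live field already ends in whitespace, so drop the chunk's
    // leading whitespace to avoid a double space.
    if let last = precedingText.last, isHorizontalWhitespace(last) {
        return String(chunk.drop(while: isHorizontalWhitespace))
    }

    // Otherwise type the chunk verbatim. A leading space from the model means
    // "new word"; no leading space means "continue the current word".
    return chunk
}
```

Under these assumptions, "after" + "noon" yields "noon" (committing afternoon), "Hello" + " World" keeps its space, and "Hello " + " World" drops the redundant one.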

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build \
  -derivedDataPath build/DerivedData CODE_SIGNING_ALLOWED=NO
# ** BUILD SUCCEEDED **

xcodebuild ... test-without-building \
  -only-testing:CotabbyTests/SuggestionSessionReconcilerTests \
  -only-testing:CotabbyTests/SuggestionTextNormalizerTests \
  -only-testing:CotabbyTests/SuggestionCoordinatorAcceptanceTests
# SuggestionSessionReconcilerTests: 72 tests, 0 failures
# Normalizer + CoordinatorAcceptance: 29 tests, 0 failures

swiftlint lint --quiet <changed files>
# exit 0

Added a regression test pinning the reported case: insertionChunk(forAcceptedChunk: "noon", precedingText: "after") == "noon" (was " noon"). Manually confirmed in-app that mid-word completions now accept without a stray space, and that genuine new-word completions still arrive with their space.

Linked issues

Master tracking issue: #623

Fixes #621
Fixes #620
Fixes #617
Fixes #615
Fixes #614
Fixes #559
Fixes #557
Fixes #553
Fixes #548
Fixes #547
Fixes #525
Fixes #491
Fixes #479
Fixes #395
Fixes #549
Fixes #543
Fixes #507

Risk / rollout notes

  • Behavior change on the accept path only. insertionChunk no longer synthesizes a separator when a word-character chunk meets a word-character prefix; it types the chunk verbatim. The model's leading space (already carried into the first chunk by nextAcceptanceChunk) is what now marks a new word.
  • The leading-space dedup (drop a leading space when the live field already ends in whitespace) is unchanged, so between-words completions do not double-space. This dedup is load-bearing because the prompt is trimmed before generation, so the model always emits a leading space for a new word.
  • Tradeoff: if the model omits a space it should have emitted (Hello + World), the words now glue into HelloWorld rather than being separated. This is WYSIWYG: the ghost text showed the glue, and that is strictly less confusing than the previous silent injection, which produced after noon from a ghost reading afternoon.
  • Possible follow-up: a SymSpell dictionary tiebreak (the 82k-word corrector is already wired into the coordinator) to split genuine two-word cases while keeping real compounds like afternoon glued. The data to tune it lives in llm-io.jsonl.
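One way the dictionary tiebreak from the last bullet might look is sketched below. Everything here is hypothetical: `shouldSplit` is an invented name, a plain `Set<String>` stands in for the SymSpell dictionary (whose API is not shown in this PR), and the decision rule is only one plausible choice.

```swift
/// Hypothetical tiebreak: decide whether a glued prefix + chunk should be split.
/// `dictionary` is a stand-in for the real word list, not the SymSpell API.
func shouldSplit(prefixWord: String, chunk: String, dictionary: Set<String>) -> Bool {
    let joined = (prefixWord + chunk).lowercased()

    // A known compound such as "afternoon" stays glued.
    if dictionary.contains(joined) { return false }

    // Split only when both halves are known words on their own.
    return dictionary.contains(prefixWord.lowercased())
        && dictionary.contains(chunk.lowercased())
}

// Tiny illustrative word list (assumed data).
let sampleWords: Set<String> = ["after", "noon", "afternoon", "hello", "world"]
```

With this sample list, after + noon stays glued and Hello + World would be split. Anything the dictionary does not recognize falls back to the verbatim behavior, which keeps the accept path WYSIWYG by default.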

Greptile Summary

Fixes a bug where mid-word Tab completions committed with a stray synthesized space — after + noon became after noon instead of afternoon. The fix removes the old word-boundary synthesis in insertionChunk, leaving only the double-space dedup, so the accepted text now matches exactly what the ghost overlay showed (WYSIWYG).

  • SuggestionSessionReconciler.insertionChunk: Old Rule 2 (inject a space whenever chunk-starts-with-word-char meets prefix-ends-in-word-char) is dropped entirely. The function now either strips a redundant leading space (when the live field already ends in whitespace) or types the chunk verbatim, trusting the model's own leading-space signal.
  • Tests: Three tests are renamed and their expected values inverted to pin the new no-synthesis contract; a new test_insertionChunk_continuesPartialWordWhenModelOmitsLeadingSpace regression case locks in the exact after/noon → afternoon scenario from issue #621 ([Bug] Auto-Complete Bug).
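For contrast, the removed Rule 2 can be sketched roughly as below. This is a reconstruction from the description in this PR, not the deleted code: the function name `legacyInsertionChunk` and the definition of a word character (letter or digit) are assumptions, and the dedup rule (Rule 1) is left out for brevity.

```swift
/// Sketch of the removed behaviour (old Rule 2), reconstructed from the PR text.
func legacyInsertionChunk(forAcceptedChunk chunk: String, precedingText: String) -> String {
    // Assumption: a "word character" is a letter or a digit.
    func isWordCharacter(_ c: Character) -> Bool { c.isLetter || c.isNumber }

    // Old Rule 2: a word character meeting a word character got a synthesized space,
    // even though the ghost overlay never showed one.
    if let last = precedingText.last, let first = chunk.first,
       isWordCharacter(last), isWordCharacter(first) {
        return " " + chunk
    }
    return chunk
}
```

Under these assumptions the sketch reproduces the reported bug: "after" + "noon" returns " noon", so the field ends up reading after noon.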

Confidence Score: 4/5

Safe to merge; change is confined to the accept path and makes committed text match what the ghost overlay showed.

The removal of word-boundary synthesis is clearly correct and matches the documented WYSIWYG intent. The doc comment for insertionChunk grounds its reasoning in BaseCompletionPromptRenderer's prompt-trimming behaviour, but the function is also exercised by the Foundation Model backend whose prompt renderer does not trim trailing whitespace — the contract still holds, but the stated rationale only covers one of the two code paths.

The doc comment in SuggestionSessionReconciler.insertionChunk (lines 415–420) deserves a second look to confirm it accurately describes both the llama and Foundation Model backends.

Important Files Changed

Filename | Overview
Cotabby/Support/SuggestionSessionReconciler.swift | Removes the word-boundary synthesis (old Rule 2) from insertionChunk, leaving only the double-space dedup (old Rule 1); logic is clean and well-commented, with a minor doc-comment gap for the Foundation Model path
CotabbyTests/SuggestionSessionReconcilerTests.swift | Renames and rewrites three tests to pin the new trust-the-model contract; adds the specific "#621" regression case; all assertions align with the updated behavior

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["insertionChunk(chunk, precedingText)"] --> B{precedingText ends\nin horizontal whitespace?}
    B -- Yes --> C["Drop leading horizontal\nwhitespace from chunk"]
    C --> D["Return stripped chunk\n(prevents double-space)"]
    B -- No --> E["Return chunk verbatim\n(trust model's leading-space signal)"]
    E --> F{Did model include\nleading space?}
    F -- "Yes, e.g. ' World'" --> G["'Hello' + ' World'\n→ 'Hello World' ✓"]
    F -- "No, e.g. 'noon'" --> H["'after' + 'noon'\n→ 'afternoon' ✓ (was 'after noon')"]

Reviews (1): Last reviewed commit: "Trust the model's word boundary on sugge..."

Greptile also left 1 inline comment on this PR.

insertionChunk synthesized a separator whenever a completion's first word met the user's partial word, so "after" + model "noon" committed as "after noon" even though the ghost text showed "afternoon". The space was injected only at accept time, so it never appeared in the overlay: a non-WYSIWYG surprise on every mid-word accept.

The base-model prompt ends at a clean boundary, so the model's first token already encodes intent: a leading space means new word, none means continue the current word. Honor it by removing the boundary synthesis (Rule 2) and typing the chunk verbatim. The leading-space dedup (Rule 1) stays, because the prompt is trimmed before generation so the model always emits a leading space for a new word that would otherwise double against whitespace the field already provides.
Comment thread Cotabby/Support/SuggestionSessionReconciler.swift
@FuJacob FuJacob merged commit 9a64106 into main Jun 6, 2026
4 checks passed