Add fill-in-middle prompting for mid-line completions #521

Merged: FuJacob merged 1 commit into main from feat/fill-in-middle on Jun 2, 2026

@FuJacob FuJacob commented Jun 2, 2026

Summary

Adds fill-in-middle (FIM) prompting so a base model can infill at the caret conditioned on the text
that comes after it, not just before. Today the Open Source path always builds a forward base prompt
(prefix last), so a mid-line completion has no idea what follows the cursor and tends to duplicate or
ignore it. FIM wraps the before- and after-cursor text in the model's FIM marker tokens
([prefix] <before> [suffix] <after> [middle]) and lets the model generate the missing middle.
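The layout described above can be sketched at the token level. This is a minimal illustration rather than the PR's code: the `FIMMarkers` field names, the `fimLayout` helper, and the token ids are all assumptions chosen for the example, and no budget trimming is applied.

```swift
// Hypothetical marker ids; real ids come from the model's vocabulary.
struct FIMMarkers {
    let prefix: Int32
    let suffix: Int32
    let middle: Int32
}

/// Lays out [prefix] <before> [suffix] <after> [middle] without any trimming.
func fimLayout(before: [Int32], after: [Int32], markers: FIMMarkers) -> [Int32] {
    var tokens: [Int32] = [markers.prefix]
    tokens.append(contentsOf: before)
    tokens.append(markers.suffix)
    tokens.append(contentsOf: after)
    // The sequence ends on the middle marker, so generation starts at the caret.
    tokens.append(markers.middle)
    return tokens
}

let layoutMarkers = FIMMarkers(prefix: 100, suffix: 101, middle: 102)
let layout = fimLayout(before: [1, 2], after: [3], markers: layoutMarkers)
// layout is [100, 1, 2, 101, 3, 102]
```

Because the middle marker has to stay last, trimming the tail of this sequence would remove the token that tells the model where to start infilling.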

  • FillInMiddlePolicy (pure, tested): detects the FIM marker token ids by scanning the vocabulary
    for the <|fim_prefix|> / <|fim_suffix|> / <|fim_middle|> strings, and assembles the
    prefix-suffix-middle token sequence, trimming each side toward the caret to fit a token budget.
  • The runtime detects and caches the markers per model (under autocompleteLock), and when a FIM
    request is present builds the FIM prompt in place of the forward prompt; FIM bypasses KV-prefix
    reuse and is never suffix-trimmed (which would drop the trailing [middle] marker). If the model
    lacks the markers, it transparently falls back to the base prompt.
  • The suggestion engine supplies a FIM request only when the caret has text after it (a real mid-line
    completion).
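As a rough sketch of the marker detection in the first bullet: scan every token id, compare its text with the three marker strings, and stop once all three are found. The `textFor` closure is an assumption made for readability (the review's sequence diagram shows a `bytesFor:` parameter instead), and the body is illustrative rather than the merged implementation.

```swift
struct FIMMarkers: Equatable {
    let prefix: Int32
    let suffix: Int32
    let middle: Int32
}

/// Scans token ids 0..<vocabSize for the three FIM marker strings.
/// `textFor` is an assumed closure returning the vocabulary text of a token id.
func detectMarkers(vocabSize: Int, textFor: (Int32) -> String) -> FIMMarkers? {
    var prefix: Int32?
    var suffix: Int32?
    var middle: Int32?
    for id in 0..<Int32(vocabSize) {
        switch textFor(id) {
        case "<|fim_prefix|>": prefix = id
        case "<|fim_suffix|>": suffix = id
        case "<|fim_middle|>": middle = id
        default: break
        }
        // Early exit: stop scanning as soon as all three markers are known.
        if let p = prefix, let s = suffix, let m = middle {
            return FIMMarkers(prefix: p, suffix: s, middle: m)
        }
    }
    // At least one marker is missing, so the caller falls back to the forward prompt.
    return nil
}

// Toy vocabulary, for illustration only.
let vocab = ["let", "<|fim_prefix|>", "x", "<|fim_suffix|>", "<|fim_middle|>"]
let found = detectMarkers(vocabSize: vocab.count) { vocab[Int($0)] }
let missing = detectMarkers(vocabSize: 3) { vocab[Int($0)] }
// found has ids 1, 3, 4; missing is nil because only one marker is in range
```

Returning an optional keeps the capability check and the fallback in one place: a `nil` result means "use the base prompt".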

Gated behind a new cotabbyFillInMiddleEnabled developer flag (default off), and additionally
capability-gated: it only activates on models that actually ship the FIM markers (e.g. the Qwen
tiers). The forward base prompt is unchanged otherwise.
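The two gates can be sketched as a pair of small functions. Everything here is hypothetical scaffolding: the `FillInMiddleRequest` fields, the `PromptMode` enum, and both function signatures are assumptions, and whether "text after the caret" ignores whitespace is not stated in the PR, so this sketch only checks for an empty string.

```swift
struct FillInMiddleRequest: Equatable {
    let prefix: String
    let suffix: String
}

enum PromptMode: Equatable {
    case forward
    case fillInMiddle(FillInMiddleRequest)
}

/// Gate 1 (engine side): the developer flag is on and the caret has text after it.
func fillInMiddleRequest(flagEnabled: Bool, before: String, after: String) -> FillInMiddleRequest? {
    guard flagEnabled, !after.isEmpty else { return nil }
    return FillInMiddleRequest(prefix: before, suffix: after)
}

/// Gate 2 (runtime side): the loaded model actually ships the FIM markers.
func promptMode(request: FillInMiddleRequest?, modelHasMarkers: Bool) -> PromptMode {
    guard let request = request, modelHasMarkers else { return .forward }
    return .fillInMiddle(request)
}

let midLine = fillInMiddleRequest(flagEnabled: true, before: "let x = ", after: ")")
let endOfLine = fillInMiddleRequest(flagEnabled: true, before: "let x = ", after: "")
let flagOff = fillInMiddleRequest(flagEnabled: false, before: "let x = ", after: ")")
// midLine is a request; endOfLine and flagOff are nil
```

With either gate closed the result is `.forward`, which matches the claim that shipped behavior is unchanged.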

Validation

xcodebuild ... test ... CODE_SIGNING_ALLOWED=NO CODE_SIGNING_REQUIRED=NO \
  -only-testing:CotabbyTests/FillInMiddlePolicyTests \
  -only-testing:CotabbyTests/LlamaSuggestionEngineCancellationTests \
  -only-testing:CotabbyTests/ConstrainedBeamSearchTests
# ** TEST SUCCEEDED **
#   FillInMiddlePolicyTests: marker detection (all-present vs missing), prefix-suffix-middle ordering,
#     trim-toward-caret over budget, markers kept under a tiny budget
#   runtime tests unchanged / green (generate() restructure compiles and behaves)

swiftlint --strict --quiet   # exit 0 (clean)
xcodegen generate            # registered the two new files (CI drift guard passes)

Linked issues

None. Phase-3 parity: native FIM for mid-line completions.

Risk / rollout notes

  • Default off and capability-gated. With the flag off (or on a model without FIM markers), behavior
    is byte-for-byte the existing forward base prompt. No change to shipped behavior.
  • FIM uses the raw before/after-cursor text (it is document-structure infilling), so it does not carry
    the persona/custom-rules conditioning the forward prompt adds; that is the standard FIM trade-off and
    can be revisited.
  • Detecting markers scans the vocabulary once per model (cached, under the lock) the first time a FIM
    completion is requested. Enabling FIM by default still needs on-device quality evaluation on a
    FIM-capable model.

Greptile Summary

Adds fill-in-middle (FIM) prompting for mid-line completions so the model can condition on text after the cursor, not just before it. The feature is default-off and capability-gated (falls back silently to the existing forward prompt on models without FIM marker tokens).

  • FillInMiddlePolicy (new, pure, tested) scans the vocabulary once per model to detect the three FIM marker token IDs and assembles a prefix-suffix-middle token sequence trimmed to fit the token budget.
  • LlamaRuntimeCore caches the detected markers per model under autocompleteLock, replaces the forward prompt tokens with the FIM sequence when requested, and clears the KV-reuse hint accordingly.
  • LlamaSuggestionEngine builds a FillInMiddleRequest only when the developer flag is on and the caret has non-empty trailing text.

Confidence Score: 4/5

Safe to merge: the feature is default-off and the existing forward-prompt path is untouched, so there is no change in shipped behavior.

The implementation is well-structured and the fallback path is robust. One functional limitation stands out in assemblePromptTokens: suffix context is hard-capped at half the token budget even when the prefix is short or empty, meaning the model sees less post-cursor content than the window allows in those cases. This would matter most for mid-line completions near the top of a document. There is also minor wasted work tokenizing the forward prompt before the FIM override, which is harmless now but worth cleaning up before the flag is enabled by default.

Cotabby/Support/FillInMiddlePolicy.swift — the budget-split logic in assemblePromptTokens; Cotabby/Services/Runtime/LlamaRuntimeCore.swift — the forward tokenize happens unconditionally before the FIM override.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Cotabby/Support/FillInMiddlePolicy.swift | New pure FIM policy: marker detection (early-exit scan) and prompt assembly. Budget allocation gives suffix a hard cap of budget/2 even when prefix is empty, leaving slots unused. |
| Cotabby/Services/Runtime/LlamaRuntimeCore.swift | Integrates FIM prompt building under autocompleteLock with per-model marker cache; forward prompt is still fully tokenized before being discarded when FIM overrides it. |
| Cotabby/Services/Runtime/LlamaSuggestionEngine.swift | Correctly gates FIM on the developer flag and non-empty trailing text; passes FillInMiddleRequest to generation options. |
| Cotabby/Models/LlamaRuntimeModels.swift | Adds FillInMiddleRequest struct and fillInMiddle option to LlamaGenerationOptions; well-isolated, excluded from SamplingFingerprint as documented. |
| CotabbyTests/FillInMiddlePolicyTests.swift | Covers marker detection, ordering, trim-toward-caret, and tiny-budget behaviour; all cases are deterministic and runtime-free. |
| Cotabby.xcodeproj/project.pbxproj | Registers FillInMiddlePolicy.swift and FillInMiddlePolicyTests.swift in both the app and test targets; no issues. |

Sequence Diagram

sequenceDiagram
    participant SE as LlamaSuggestionEngine
    participant RC as LlamaRuntimeCore
    participant FP as FillInMiddlePolicy

    SE->>SE: "fillInMiddleRequest(for:)<br/>(flag on & trailingText non-empty)"
    SE->>RC: "generate(prompt:options:)<br/>options.fillInMiddle = FillInMiddleRequest"

    RC->>RC: "tokenize(forwardPrompt)<br/>(always runs, discarded if FIM active)"
    RC->>RC: autocompleteLock.lock()

    RC->>RC: "fimMarkers()<br/>(check per-model URL cache)"
    alt markers not yet cached
        RC->>FP: detectMarkers(vocabSize, bytesFor:)
        FP-->>RC: FIMMarkers? (nil if model lacks them)
        RC->>RC: cache markers + modelURL
    end

    alt "FIM active (request non-nil & markers found)"
        RC->>FP: assemblePromptTokens(prefix, suffix, markers, maxTokens)
        FP-->>RC: [Int32] token sequence
        RC->>RC: "promptTokens = fimTokens"
    else FIM inactive (flag off, no request, or model lacks markers)
        RC->>RC: use forward prompt tokens as-is
    end

    RC->>RC: obtainAutocompleteSequence(promptTokens, ...)
    RC-->>SE: generated text
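
The per-model marker cache in the diagram could look roughly like the sketch below. It is an assumption-heavy illustration: `MarkerCache` is a hypothetical type that owns its own `NSLock`, whereas the PR caches inside the runtime under the existing `autocompleteLock`, and caching a `nil` result (model lacks markers) is a choice made here, not something the PR text confirms.

```swift
import Foundation

struct FIMMarkers: Equatable {
    let prefix: Int32
    let suffix: Int32
    let middle: Int32
}

/// Hypothetical cache keyed by the model URL; re-scans only when the model changes.
final class MarkerCache {
    private let lock = NSLock()
    private var cachedModelURL: URL?
    private var cachedMarkers: FIMMarkers?

    func markers(for modelURL: URL, detect: () -> FIMMarkers?) -> FIMMarkers? {
        lock.lock()
        defer { lock.unlock() }
        // Cache hit: same model as last time, including a cached "no markers" result.
        if cachedModelURL == modelURL { return cachedMarkers }
        let detected = detect()
        cachedModelURL = modelURL
        cachedMarkers = detected
        return detected
    }
}

final class ScanCounter { var count = 0 }
let scanCounter = ScanCounter()
let markerCache = MarkerCache()
// Illustrative paths only.
let modelA = URL(fileURLWithPath: "/tmp/model-a.gguf")
let modelB = URL(fileURLWithPath: "/tmp/model-b.gguf")

let scanWithMarkers: () -> FIMMarkers? = {
    scanCounter.count += 1
    return FIMMarkers(prefix: 1, suffix: 2, middle: 3)
}
let scanWithoutMarkers: () -> FIMMarkers? = {
    scanCounter.count += 1
    return nil
}

let firstLookup = markerCache.markers(for: modelA, detect: scanWithMarkers)
let secondLookup = markerCache.markers(for: modelA, detect: scanWithMarkers)
let otherModel = markerCache.markers(for: modelB, detect: scanWithoutMarkers)
// Two scans in total: one for modelA, one for modelB.
```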

Comments Outside Diff (1)

  1. Cotabby/Services/Runtime/LlamaRuntimeCore.swift, line 150-162 (link)

    P2 Wasted forward-prompt tokenization when FIM overrides it. tokenize(prompt) (the full forward base prompt) is called unconditionally before the FIM override under the lock. When FIM is active the resulting allPromptTokens and all the trimming work are discarded. Since options.fillInMiddle is known before the lock, a guard options.fillInMiddle == nil || … early-path could skip the forward tokenize when FIM will definitely take over. Minor for now with the flag off, but worth addressing before enabling FIM by default.
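
One possible shape for that cleanup, sketched under assumptions: decide which prompt is needed first, and tokenize the forward prompt only on the fallback path. `buildPromptTokens`, the `tokenize` closure, and the toy byte-level tokenizer are hypothetical, budget trimming is left out, and the real fix would also have to respect where the lock is taken.

```swift
struct FIMMarkers {
    let prefix: Int32
    let suffix: Int32
    let middle: Int32
}

struct FillInMiddleRequest {
    let prefix: String
    let suffix: String
}

/// Tokenizes only what the chosen prompt needs; the forward prompt is touched
/// only when FIM is not taking over.
func buildPromptTokens(
    forwardPrompt: String,
    fim: FillInMiddleRequest?,
    markers: FIMMarkers?,
    tokenize: (String) -> [Int32]
) -> [Int32] {
    if let fim = fim, let markers = markers {
        var tokens: [Int32] = [markers.prefix]
        tokens.append(contentsOf: tokenize(fim.prefix))
        tokens.append(markers.suffix)
        tokens.append(contentsOf: tokenize(fim.suffix))
        tokens.append(markers.middle)
        return tokens
    }
    return tokenize(forwardPrompt)
}

// Toy tokenizer: one token per UTF-8 byte, recording every input it sees.
final class TokenizeLog { var inputs: [String] = [] }
let tokenizeLog = TokenizeLog()
let toyTokenize: (String) -> [Int32] = { text in
    tokenizeLog.inputs.append(text)
    return text.utf8.map { Int32($0) }
}

let fimTokens = buildPromptTokens(
    forwardPrompt: "FORWARD",
    fim: FillInMiddleRequest(prefix: "a", suffix: "b"),
    markers: FIMMarkers(prefix: 1, suffix: 2, middle: 3),
    tokenize: toyTokenize
)
// fimTokens is [1, 97, 2, 98, 3]; "FORWARD" was never tokenized
```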


Reviews (1): Last reviewed commit: "Add fill-in-middle prompting for mid-lin..."

Greptile also left 1 inline comment on this PR.

The Open Source path always built a forward base prompt, so a mid-line completion had no signal about the text after the caret. Add FIM: wrap the before/after-cursor text in the model's FIM marker tokens ([prefix] before [suffix] after [middle]) and let the model generate the missing middle.

FillInMiddlePolicy (pure, tested) detects the marker token ids from the vocabulary and assembles the prefix-suffix-middle sequence, trimming each side toward the caret to fit a budget. The runtime caches the markers per model under the lock, builds the FIM prompt in place of the forward one (bypassing KV reuse, never suffix-trimmed), and falls back to the base prompt when markers are absent.

Gated behind cotabbyFillInMiddleEnabled (default off) and capability-gated to models that ship FIM markers; shipped behavior is unchanged.
@FuJacob FuJacob merged commit 0abe365 into main Jun 2, 2026
4 checks passed
@FuJacob FuJacob deleted the feat/fill-in-middle branch June 2, 2026 02:02
Comment on lines +72 to +73
let suffixKept = Array(suffixTokens.prefix(min(suffixTokens.count, budget / 2)))
let prefixKept = Array(prefixTokens.suffix(max(0, budget - suffixKept.count)))

P2 Unused prefix budget is not reclaimed by the suffix. suffixKept is hard-capped at budget / 2 regardless of how few prefix tokens are actually available. When the cursor is near the start of a document (short or empty prefix), the suffix can only fill half the budget even though the other half goes unused — so the model sees far less post-cursor context than the window allows. The fix is to compute each side's actual allocation and let the other side claim the slack.

Suggested change
- let suffixKept = Array(suffixTokens.prefix(min(suffixTokens.count, budget / 2)))
- let prefixKept = Array(prefixTokens.suffix(max(0, budget - suffixKept.count)))
+ let prefixAvail = min(prefixTokens.count, budget - min(suffixTokens.count, budget / 2))
+ let prefixKept = Array(prefixTokens.suffix(prefixAvail))
+ let suffixKept = Array(suffixTokens.prefix(min(suffixTokens.count, budget - prefixKept.count)))
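
To make the difference concrete, the two allocations can be compared on token counts alone. This is a sketch under assumptions: `mergedSplit` and `suggestedSplit` are hypothetical helpers that reproduce only the arithmetic of the two snippets, and `budget` is assumed to be non-negative.

```swift
/// Split as in the reviewed lines: suffix capped at budget / 2, prefix takes the rest.
func mergedSplit(prefixCount: Int, suffixCount: Int, budget: Int) -> (prefix: Int, suffix: Int) {
    let suffixKept = min(suffixCount, budget / 2)
    let prefixKept = min(prefixCount, max(0, budget - suffixKept))
    return (prefixKept, suffixKept)
}

/// Split as in the suggestion: the suffix reclaims whatever the prefix leaves unused.
func suggestedSplit(prefixCount: Int, suffixCount: Int, budget: Int) -> (prefix: Int, suffix: Int) {
    let prefixKept = min(prefixCount, budget - min(suffixCount, budget / 2))
    let suffixKept = min(suffixCount, budget - prefixKept)
    return (prefixKept, suffixKept)
}

// Caret at the very start of a document: empty prefix, long suffix, budget of 8.
let mergedAtTop = mergedSplit(prefixCount: 0, suffixCount: 10, budget: 8)
let suggestedAtTop = suggestedSplit(prefixCount: 0, suffixCount: 10, budget: 8)
// mergedAtTop keeps 0 prefix + 4 suffix tokens; suggestedAtTop keeps 0 + 8

// Both sides long: the two splits agree.
let mergedBalanced = mergedSplit(prefixCount: 10, suffixCount: 10, budget: 8)
let suggestedBalanced = suggestedSplit(prefixCount: 10, suffixCount: 10, budget: 8)
// both keep 4 prefix + 4 suffix tokens
```

The two versions differ only when the prefix is shorter than its share of the budget, which is the near-top-of-document case the comment calls out.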

