Add fill-in-middle prompting for mid-line completions by FuJacob · Pull Request #521 · FuJacob/cotabby

FuJacob · 2026-06-02T01:59:13Z

Summary

Adds fill-in-middle (FIM) prompting so a base model can infill at the caret conditioned on the text
that comes after it, not just before. Today the Open Source path always builds a forward base prompt
(prefix last), so a mid-line completion has no idea what follows the cursor and tends to duplicate or
ignore it. FIM wraps the before- and after-cursor text in the model's FIM marker tokens
([prefix] <before> [suffix] <after> [middle]) and lets the model generate the missing middle.
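As a rough illustration of the layout described above, the sketch below builds the FIM prompt at the string level. It is not the PR's implementation, which assembles token ids rather than strings; the function name `fimPromptText` is hypothetical, and the marker strings are the ones the PR description says it scans for.

```swift
// Hypothetical helper: shows the [prefix] <before> [suffix] <after> [middle] layout.
// The real policy operates on token ids; strings are used here only for readability.
func fimPromptText(before: String, after: String) -> String {
    return "<|fim_prefix|>" + before + "<|fim_suffix|>" + after + "<|fim_middle|>"
}

// Caret sits between "let total = " and " + tax"; the model is asked for the middle.
let example = fimPromptText(before: "let total = ", after: " + tax")
print(example)
// prints "<|fim_prefix|>let total = <|fim_suffix|> + tax<|fim_middle|>"
```

The prompt ends with the middle marker, so the model's continuation is the text that belongs at the caret.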

FillInMiddlePolicy (pure, tested): detects the FIM marker token ids by scanning the vocabulary
for the <|fim_prefix|> / <|fim_suffix|> / <|fim_middle|> strings, and assembles the
prefix-suffix-middle token sequence, trimming each side toward the caret to fit a token budget.
The runtime detects and caches the markers per model (under autocompleteLock), and when a FIM
request is present builds the FIM prompt in place of the forward prompt; FIM bypasses KV-prefix
reuse and is never suffix-trimmed (which would drop the trailing [middle] marker). If the model
lacks the markers, it transparently falls back to the base prompt.
The suggestion engine supplies a FIM request only when the caret has text after it (a real mid-line
completion).
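The marker detection step could look roughly like the sketch below. It assumes a `textFor` closure that returns the text of a token id (the sequence diagram shows a `bytesFor:` parameter instead, so the exact shape is a guess), and the field names on `FIMMarkers` are hypothetical. It returns `nil` when any marker is missing, which is what lets the runtime fall back to the forward prompt.

```swift
// Hypothetical shape; the PR names FIMMarkers but its fields are not shown.
struct FIMMarkers: Equatable {
    let prefix: Int32
    let suffix: Int32
    let middle: Int32
}

// Scans the vocabulary once and stops early when all three markers are found.
// `textFor` is an assumed accessor that maps a token id to its text.
func detectMarkers(vocabSize: Int32, textFor: (Int32) -> String?) -> FIMMarkers? {
    var prefix: Int32?
    var suffix: Int32?
    var middle: Int32?
    for id in 0..<max(0, vocabSize) {
        guard let text = textFor(id) else { continue }
        switch text {
        case "<|fim_prefix|>": prefix = id
        case "<|fim_suffix|>": suffix = id
        case "<|fim_middle|>": middle = id
        default: break
        }
        if let p = prefix, let s = suffix, let m = middle {
            return FIMMarkers(prefix: p, suffix: s, middle: m)
        }
    }
    return nil // model does not ship all three markers
}

// Toy vocabulary standing in for a real model's token table.
let toyVocab = ["a", "<|fim_prefix|>", "b", "<|fim_suffix|>", "<|fim_middle|>"]
let found = detectMarkers(vocabSize: Int32(toyVocab.count)) { toyVocab[Int($0)] }
```

Because the result depends only on the model, caching it per model (as the PR does under `autocompleteLock`) means the scan runs once rather than on every completion.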

Gated behind a new cotabbyFillInMiddleEnabled developer flag (default off), and additionally
capability-gated: it only activates on models that actually ship the FIM markers (e.g. the Qwen
tiers). The forward base prompt is unchanged otherwise.
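The first half of the gating can be sketched as a pure function. This is an assumption-laden illustration: the field names on `FillInMiddleRequest` and the function signature are hypothetical, and whether whitespace-only trailing text counts as empty is not stated in the PR, so the sketch uses a plain `isEmpty` check.

```swift
// Hypothetical fields; the PR adds a FillInMiddleRequest struct but does not show its members.
struct FillInMiddleRequest: Equatable {
    let prefix: String
    let suffix: String
}

// Returns a request only when the developer flag is on and there is text after the caret.
// Returning nil keeps the caller on the existing forward base prompt.
func fillInMiddleRequest(flagEnabled: Bool, before: String, after: String) -> FillInMiddleRequest? {
    guard flagEnabled, !after.isEmpty else { return nil }
    return FillInMiddleRequest(prefix: before, suffix: after)
}

let midLine = fillInMiddleRequest(flagEnabled: true, before: "foo(", after: ")")
let endOfLine = fillInMiddleRequest(flagEnabled: true, before: "foo(", after: "")
```

The capability gate (whether the model ships the markers) is a separate check made later in the runtime, so a non-nil request here does not guarantee that FIM is used.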

Validation

xcodebuild ... test ... CODE_SIGNING_ALLOWED=NO CODE_SIGNING_REQUIRED=NO \
  -only-testing:CotabbyTests/FillInMiddlePolicyTests \
  -only-testing:CotabbyTests/LlamaSuggestionEngineCancellationTests \
  -only-testing:CotabbyTests/ConstrainedBeamSearchTests
# ** TEST SUCCEEDED **
#   FillInMiddlePolicyTests: marker detection (all-present vs missing), prefix-suffix-middle ordering,
#     trim-toward-caret over budget, markers kept under a tiny budget
#   runtime tests unchanged / green (generate() restructure compiles and behaves)

swiftlint --strict --quiet   # exit 0 (clean)
xcodegen generate            # registered the two new files (CI drift guard passes)

Linked issues

None. Phase-3 parity: native FIM for mid-line completions.

Risk / rollout notes

Default off and capability-gated. With the flag off (or on a model without FIM markers), behavior
is byte-for-byte the existing forward base prompt. No change to shipped behavior.
FIM uses the raw before/after-cursor text (it is document-structure infilling), so it does not carry
the persona/custom-rules conditioning the forward prompt adds; that is the standard FIM trade-off and
can be revisited.
Detecting markers scans the vocabulary once per model (cached, under the lock) the first time a FIM
completion is requested. Enabling FIM by default still needs on-device quality evaluation on a
FIM-capable model.

Greptile Summary

Adds fill-in-middle (FIM) prompting for mid-line completions so the model can condition on text after the cursor, not just before it. The feature is default-off and capability-gated (falls back silently to the existing forward prompt on models without FIM marker tokens).

FillInMiddlePolicy (new, pure, tested) scans the vocabulary once per model to detect the three FIM marker token IDs and assembles a prefix-suffix-middle token sequence trimmed to fit the token budget.
LlamaRuntimeCore caches the detected markers per model under autocompleteLock, replaces the forward prompt tokens with the FIM sequence when requested, and clears the KV-reuse hint accordingly.
LlamaSuggestionEngine builds a FillInMiddleRequest only when the developer flag is on and the caret has non-empty trailing text.
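A minimal sketch of the assembly step follows. The two allocation lines are taken from the diff quoted in the inline review comment; everything else is assumed, in particular that three slots of `maxTokens` are reserved for the markers (suggested by the "markers kept under a tiny budget" test, but not confirmed) and the `FIMMarkers` field names.

```swift
// Hypothetical shape; field names are not shown in the PR.
struct FIMMarkers {
    let prefix: Int32
    let suffix: Int32
    let middle: Int32
}

func assemblePromptTokens(
    prefixTokens: [Int32],
    suffixTokens: [Int32],
    markers: FIMMarkers,
    maxTokens: Int
) -> [Int32] {
    // Assumption: the three marker tokens are always kept, so they come out of the budget first.
    let budget = max(0, maxTokens - 3)
    // As merged: the suffix keeps its head (nearest the caret), capped at half the budget.
    let suffixKept = Array(suffixTokens.prefix(min(suffixTokens.count, budget / 2)))
    // The prefix keeps its tail (nearest the caret) in whatever budget remains.
    let prefixKept = Array(prefixTokens.suffix(max(0, budget - suffixKept.count)))
    return [markers.prefix] + prefixKept + [markers.suffix] + suffixKept + [markers.middle]
}

let demoMarkers = FIMMarkers(prefix: 100, suffix: 101, middle: 102)
let assembled = assemblePromptTokens(
    prefixTokens: [1, 2, 3, 4, 5],
    suffixTokens: [6, 7, 8, 9],
    markers: demoMarkers,
    maxTokens: 7
)
// assembled == [100, 4, 5, 101, 6, 7, 102]
```

Both sides are trimmed toward the caret: the prefix loses its oldest tokens and the suffix loses its farthest ones, so the text adjacent to the insertion point survives.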

Confidence Score: 4/5

Safe to merge: the feature is default-off and the existing forward-prompt path is untouched, so there is no change in shipped behavior.

The implementation is well-structured and the fallback path is robust. One functional limitation stands out in assemblePromptTokens: suffix context is hard-capped at half the token budget even when the prefix is short or empty, meaning the model sees less post-cursor content than the window allows in those cases. This would matter most for mid-line completions near the top of a document. There is also minor wasted work tokenizing the forward prompt before the FIM override, which is harmless now but worth cleaning up before the flag is enabled by default.

Cotabby/Support/FillInMiddlePolicy.swift — the budget-split logic in assemblePromptTokens; Cotabby/Services/Runtime/LlamaRuntimeCore.swift — the forward tokenize happens unconditionally before the FIM override.

Important Files Changed

Filename	Overview
Cotabby/Support/FillInMiddlePolicy.swift	New pure FIM policy: marker detection (early-exit scan) and prompt assembly. Budget allocation gives suffix a hard cap of budget/2 even when prefix is empty, leaving slots unused.
Cotabby/Services/Runtime/LlamaRuntimeCore.swift	Integrates FIM prompt building under autocompleteLock with per-model marker cache; forward prompt is still fully tokenized before being discarded when FIM overrides it.
Cotabby/Services/Runtime/LlamaSuggestionEngine.swift	Correctly gates FIM on the developer flag and non-empty trailing text; passes FillInMiddleRequest to generation options.
Cotabby/Models/LlamaRuntimeModels.swift	Adds FillInMiddleRequest struct and fillInMiddle option to LlamaGenerationOptions; well-isolated, excluded from SamplingFingerprint as documented.
CotabbyTests/FillInMiddlePolicyTests.swift	Covers marker detection, ordering, trim-toward-caret, and tiny-budget behaviour; all cases are deterministic and runtime-free.
Cotabby.xcodeproj/project.pbxproj	Registers FillInMiddlePolicy.swift and FillInMiddlePolicyTests.swift in both the app and test targets; no issues.

Sequence Diagram

sequenceDiagram
    participant SE as LlamaSuggestionEngine
    participant RC as LlamaRuntimeCore
    participant FP as FillInMiddlePolicy

    SE->>SE: "fillInMiddleRequest(for:)<br/>(flag on & trailingText non-empty)"
    SE->>RC: "generate(prompt:options:)<br/>options.fillInMiddle = FillInMiddleRequest"

    RC->>RC: "tokenize(forwardPrompt)<br/>(always runs, discarded if FIM active)"
    RC->>RC: autocompleteLock.lock()

    RC->>RC: "fimMarkers()<br/>(check per-model URL cache)"
    alt markers not yet cached
        RC->>FP: detectMarkers(vocabSize, bytesFor:)
        FP-->>RC: FIMMarkers? (nil if model lacks them)
        RC->>RC: cache markers + modelURL
    end

    alt "FIM active (request non-nil & markers found)"
        RC->>FP: assemblePromptTokens(prefix, suffix, markers, maxTokens)
        FP-->>RC: [Int32] token sequence
        RC->>RC: "promptTokens = fimTokens"
    else FIM inactive (flag off, no request, or model lacks markers)
        RC->>RC: use forward prompt tokens as-is
    end

    RC->>RC: obtainAutocompleteSequence(promptTokens, ...)
    RC-->>SE: generated text

Comments Outside Diff (1)

Cotabby/Services/Runtime/LlamaRuntimeCore.swift, lines 150-162

Wasted forward-prompt tokenization when FIM overrides it. tokenize(prompt) (the full forward base prompt) is called unconditionally before the FIM override under the lock. When FIM is active the resulting allPromptTokens and all the trimming work are discarded. Since options.fillInMiddle is known before the lock, a guard options.fillInMiddle == nil || … early-path could skip the forward tokenize when FIM will definitely take over. Minor for now with the flag off, but worth addressing before enabling FIM by default.
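One way to read this suggestion is to make the forward tokenization lazy, so it only runs on the path that needs it. The sketch below is hypothetical and much simpler than the real `generate()`: the function and parameter names are invented, and it ignores the fact that marker availability is only known after the lock is taken.

```swift
// Hypothetical helper: both prompt builders are passed as closures so that
// only the one actually needed is evaluated.
func selectPromptTokens(
    fimActive: Bool,
    tokenizeForward: () -> [Int32],
    buildFIM: () -> [Int32]
) -> [Int32] {
    return fimActive ? buildFIM() : tokenizeForward()
}

// Count how often the (expensive) forward tokenizer runs.
var forwardTokenizeCalls = 0
let chosen = selectPromptTokens(
    fimActive: true,
    tokenizeForward: {
        forwardTokenizeCalls += 1
        return [1, 2, 3]
    },
    buildFIM: { return [9, 8, 7] }
)
// With FIM active, the forward tokenizer is never invoked.
```

In the real runtime the fallback still has to work when a request is present but the model lacks markers, so the forward tokenization would need to remain reachable on that path.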

Greptile also left 1 inline comment on this PR.

The Open Source path always built a forward base prompt, so a mid-line completion had no signal about the text after the caret. Add FIM: wrap the before/after-cursor text in the model's FIM marker tokens ([prefix] before [suffix] after [middle]) and let the model generate the missing middle. FillInMiddlePolicy (pure, tested) detects the marker token ids from the vocabulary and assembles the prefix-suffix-middle sequence, trimming each side toward the caret to fit a budget. The runtime caches the markers per model under the lock, builds the FIM prompt in place of the forward one (bypassing KV reuse, never suffix-trimmed), and falls back to the base prompt when markers are absent. Gated behind cotabbyFillInMiddleEnabled (default off) and capability-gated to models that ship FIM markers; shipped behavior is unchanged.

greptile-apps · 2026-06-02T02:03:23Z

+        let suffixKept = Array(suffixTokens.prefix(min(suffixTokens.count, budget / 2)))
+        let prefixKept = Array(prefixTokens.suffix(max(0, budget - suffixKept.count)))


Unused prefix budget is not reclaimed by the suffix. suffixKept is hard-capped at budget / 2 regardless of how few prefix tokens are actually available. When the cursor is near the start of a document (short or empty prefix), the suffix can only fill half the budget even though the other half goes unused — so the model sees far less post-cursor context than the window allows. The fix is to compute each side's actual allocation and let the other side claim the slack.

Suggested change

-        let suffixKept = Array(suffixTokens.prefix(min(suffixTokens.count, budget / 2)))
-        let prefixKept = Array(prefixTokens.suffix(max(0, budget - suffixKept.count)))
+        let prefixAvail = min(prefixTokens.count, budget - min(suffixTokens.count, budget / 2))
+        let prefixKept = Array(prefixTokens.suffix(prefixAvail))
+        let suffixKept = Array(suffixTokens.prefix(min(suffixTokens.count, budget - prefixKept.count)))
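To make the difference concrete, the sketch below reduces both allocations to token counts and compares them. The arithmetic mirrors the merged lines and the suggested lines; the function names are hypothetical, and `budget` is assumed to be non-negative and to exclude the marker tokens.

```swift
// Counts kept by the merged code: suffix capped at budget / 2, prefix takes the rest.
func mergedSplit(prefixCount: Int, suffixCount: Int, budget: Int) -> (prefix: Int, suffix: Int) {
    let suffixKept = min(suffixCount, budget / 2)
    // Array.suffix(n) never returns more than the array holds, hence the extra min.
    let prefixKept = min(prefixCount, max(0, budget - suffixKept))
    return (prefixKept, suffixKept)
}

// Counts kept by the suggested change: the suffix reclaims slots the prefix cannot fill.
func suggestedSplit(prefixCount: Int, suffixCount: Int, budget: Int) -> (prefix: Int, suffix: Int) {
    let prefixKept = min(prefixCount, budget - min(suffixCount, budget / 2))
    let suffixKept = min(suffixCount, budget - prefixKept)
    return (prefixKept, suffixKept)
}

// Caret near the top of a document: 2 prefix tokens, 100 suffix tokens, budget 20.
let mergedNearTop = mergedSplit(prefixCount: 2, suffixCount: 100, budget: 20)
// merged keeps 2 prefix + 10 suffix tokens, leaving 8 slots unused
let suggestedNearTop = suggestedSplit(prefixCount: 2, suffixCount: 100, budget: 20)
// suggested keeps 2 prefix + 18 suffix tokens, filling the budget
```

When both sides exceed the budget, the two versions agree (10 and 10 for a budget of 20), so the suggestion only changes behavior when the prefix is short.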

FuJacob merged commit 0abe365 into main Jun 2, 2026
4 checks passed

FuJacob deleted the feat/fill-in-middle branch June 2, 2026 02:02

greptile-apps Bot reviewed Jun 2, 2026
