Add fill-in-middle prompting for mid-line completions #521
Merged
The Open Source path always built a forward base prompt, so a mid-line completion had no signal about the text after the caret. Add FIM: wrap the before/after-cursor text in the model's FIM marker tokens ([prefix] before [suffix] after [middle]) and let the model generate the missing middle. FillInMiddlePolicy (pure, tested) detects the marker token ids from the vocabulary and assembles the prefix-suffix-middle sequence, trimming each side toward the caret to fit a budget. The runtime caches the markers per model under the lock, builds the FIM prompt in place of the forward one (bypassing KV reuse, never suffix-trimmed), and falls back to the base prompt when markers are absent. Gated behind cotabbyFillInMiddleEnabled (default off) and capability-gated to models that ship FIM markers; shipped behavior is unchanged.
Comment on lines +72 to +73
    let suffixKept = Array(suffixTokens.prefix(min(suffixTokens.count, budget / 2)))
    let prefixKept = Array(prefixTokens.suffix(max(0, budget - suffixKept.count)))
Contributor
Unused prefix budget is not reclaimed by the suffix.
suffixKept is hard-capped at budget / 2 regardless of how few prefix tokens are actually available. When the cursor is near the start of a document (short or empty prefix), the suffix can only fill half the budget even though the other half goes unused — so the model sees far less post-cursor context than the window allows. The fix is to compute each side's actual allocation and let the other side claim the slack.
Suggested change
-   let suffixKept = Array(suffixTokens.prefix(min(suffixTokens.count, budget / 2)))
-   let prefixKept = Array(prefixTokens.suffix(max(0, budget - suffixKept.count)))
+   let prefixAvail = min(prefixTokens.count, budget - min(suffixTokens.count, budget / 2))
+   let prefixKept = Array(prefixTokens.suffix(prefixAvail))
+   let suffixKept = Array(suffixTokens.prefix(min(suffixTokens.count, budget - prefixKept.count)))
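The allocation the comment describes can be sketched in isolation. This is a minimal illustration, not the PR's code: `splitBudget` and `trimTowardCaret` are hypothetical helper names, and the clamp on a negative budget is an added assumption about how callers behave.

```swift
// Hypothetical helper (not code from this PR): decide how many tokens to keep
// on each side of the caret so that neither side leaves budget unused.
func splitBudget(prefixCount: Int, suffixCount: Int, budget: Int) -> (prefix: Int, suffix: Int) {
    let total = max(0, budget)                             // assumption: treat a negative budget as empty
    let suffixShare = min(suffixCount, total / 2)          // the suffix is first offered at most half
    let prefixKept = min(prefixCount, total - suffixShare) // the prefix takes what it can actually use
    let suffixKept = min(suffixCount, total - prefixKept)  // the suffix reclaims whatever the prefix left
    return (prefixKept, suffixKept)
}

// Trim toward the caret: keep the END of the prefix and the START of the suffix.
func trimTowardCaret(prefixTokens: [Int32], suffixTokens: [Int32], budget: Int)
    -> (prefix: [Int32], suffix: [Int32]) {
    let split = splitBudget(prefixCount: prefixTokens.count,
                            suffixCount: suffixTokens.count,
                            budget: budget)
    return (Array(prefixTokens.suffix(split.prefix)),
            Array(suffixTokens.prefix(split.suffix)))
}
```

With an empty prefix and a budget of 40, this split lets the suffix keep up to 40 tokens, where the hard cap at budget / 2 would stop at 20. When both sides are long, the result is the same even split as before.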
Summary
Adds fill-in-middle (FIM) prompting so a base model can infill at the caret conditioned on the text
that comes after it, not just before. Today the Open Source path always builds a forward base prompt
(prefix last), so a mid-line completion has no idea what follows the cursor and tends to duplicate or
ignore it. FIM wraps the before- and after-cursor text in the model's FIM marker tokens
([prefix] <before> [suffix] <after> [middle]) and lets the model generate the missing middle.
- FillInMiddlePolicy (pure, tested): detects the FIM marker token ids by scanning the vocabulary for the <|fim_prefix|> / <|fim_suffix|> / <|fim_middle|> strings, and assembles the prefix-suffix-middle token sequence, trimming each side toward the caret to fit a token budget.
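- LlamaRuntimeCore caches the detected markers per model (under autocompleteLock), and when a FIM request is present builds the FIM prompt in place of the forward prompt; FIM bypasses KV-prefix reuse and is never suffix-trimmed (which would drop the trailing [middle] marker). If the model lacks the markers, it transparently falls back to the base prompt.
- LlamaSuggestionEngine builds a FillInMiddleRequest only when the developer flag is on and the caret has non-empty trailing text (a mid-line completion).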
Gated behind a new cotabbyFillInMiddleEnabled developer flag (default off), and additionally capability-gated: it only activates on models that actually ship the FIM markers (e.g. the Qwen tiers). The forward base prompt is unchanged otherwise.
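The detect-then-assemble flow can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's FillInMiddlePolicy: the vocabulary is modelled as a plain array of token strings indexed by token id, and `SketchMarkers`, `sketchDetectMarkers`, and `sketchAssemble` are hypothetical names. Budget trimming is left out to keep the token ordering visible.

```swift
// Hypothetical stand-in for the three FIM marker token ids.
struct SketchMarkers: Equatable {
    let prefix: Int32
    let suffix: Int32
    let middle: Int32
}

// Assumption: `vocabulary[i]` is the text of token id `i`.
// Returns nil when any marker is missing, which is the signal to fall back
// to the forward base prompt.
func sketchDetectMarkers(in vocabulary: [String]) -> SketchMarkers? {
    guard let p = vocabulary.firstIndex(of: "<|fim_prefix|>"),
          let s = vocabulary.firstIndex(of: "<|fim_suffix|>"),
          let m = vocabulary.firstIndex(of: "<|fim_middle|>") else {
        return nil
    }
    return SketchMarkers(prefix: Int32(p), suffix: Int32(s), middle: Int32(m))
}

// Builds [prefix] <before> [suffix] <after> [middle]; the model then
// generates the text that belongs at the caret.
func sketchAssemble(before: [Int32], after: [Int32], markers: SketchMarkers) -> [Int32] {
    var tokens: [Int32] = [markers.prefix]
    tokens.append(contentsOf: before)
    tokens.append(markers.suffix)
    tokens.append(contentsOf: after)
    tokens.append(markers.middle)   // must stay last, so the sequence is never suffix-trimmed
    return tokens
}
```

Because the [middle] marker has to be the final token, any trimming needs to happen on the before and after slices prior to assembly, not on the finished sequence.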
Validation
Linked issues
None. Phase-3 parity: native FIM for mid-line completions.
Risk / rollout notes
With the flag off (the default), the prompt is byte-for-byte the existing forward base prompt. No change to shipped behavior.
The FIM prompt does not carry the persona/custom-rules conditioning the forward prompt adds; that is the standard FIM trade-off and can be revisited.
completion is requested. Enabling FIM by default still needs on-device quality evaluation on a
FIM-capable model.
Greptile Summary
Adds fill-in-middle (FIM) prompting for mid-line completions so the model can condition on text after the cursor, not just before it. The feature is default-off and capability-gated (falls back silently to the existing forward prompt on models without FIM marker tokens).
- FillInMiddlePolicy (new, pure, tested) scans the vocabulary once per model to detect the three FIM marker token IDs and assembles a prefix-suffix-middle token sequence trimmed to fit the token budget.
- LlamaRuntimeCore caches the detected markers per model under autocompleteLock, replaces the forward prompt tokens with the FIM sequence when requested, and clears the KV-reuse hint accordingly.
- LlamaSuggestionEngine builds a FillInMiddleRequest only when the developer flag is on and the caret has non-empty trailing text.
Confidence Score: 4/5
Safe to merge: the feature is default-off and the existing forward-prompt path is untouched, so there is no change in shipped behavior.
The implementation is well-structured and the fallback path is robust. One functional limitation stands out in assemblePromptTokens: suffix context is hard-capped at half the token budget even when the prefix is short or empty, meaning the model sees less post-cursor content than the window allows in those cases. This would matter most for mid-line completions near the top of a document. There is also minor wasted work tokenizing the forward prompt before the FIM override, which is harmless now but worth cleaning up before the flag is enabled by default.
Cotabby/Support/FillInMiddlePolicy.swift — the budget-split logic in assemblePromptTokens; Cotabby/Services/Runtime/LlamaRuntimeCore.swift — the forward tokenize happens unconditionally before the FIM override.
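The per-model caching described in the summary might look roughly like the sketch below. It is an illustration only: `SketchMarkerCache` is a hypothetical type, the markers are reduced to an optional array of token ids, and keying the cache on the model's file URL is an assumption taken from the "per-model URL cache" wording in the sequence diagram.

```swift
import Foundation

// Hypothetical cache (not the PR's LlamaRuntimeCore): remembers the result of
// the vocabulary scan for the currently loaded model, guarded by a lock.
final class SketchMarkerCache {
    private let lock = NSLock()
    private var cachedModel: URL?
    private var cachedMarkers: [Int32]?   // nil means "this model has no FIM markers"
    private(set) var scanCount = 0        // exposed only so the sketch can be checked

    // `scan` stands in for the expensive vocabulary scan; it runs at most once
    // per model URL, including for models that turn out to lack the markers.
    func markers(for model: URL, scan: () -> [Int32]?) -> [Int32]? {
        lock.lock()
        defer { lock.unlock() }
        if cachedModel == model {
            return cachedMarkers
        }
        scanCount += 1
        cachedMarkers = scan()
        cachedModel = model
        return cachedMarkers
    }
}
```

Caching the negative result matters as much as the positive one: a model without markers would otherwise trigger a full vocabulary scan on every request before falling back to the forward prompt.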
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant SE as LlamaSuggestionEngine
    participant RC as LlamaRuntimeCore
    participant FP as FillInMiddlePolicy
    SE->>SE: "fillInMiddleRequest(for:)<br/>(flag on & trailingText non-empty)"
    SE->>RC: "generate(prompt:options:)<br/>options.fillInMiddle = FillInMiddleRequest"
    RC->>RC: "tokenize(forwardPrompt)<br/>(always runs, discarded if FIM active)"
    RC->>RC: autocompleteLock.lock()
    RC->>RC: "fimMarkers()<br/>(check per-model URL cache)"
    alt markers not yet cached
        RC->>FP: detectMarkers(vocabSize, bytesFor:)
        FP-->>RC: FIMMarkers? (nil if model lacks them)
        RC->>RC: cache markers + modelURL
    end
    alt "FIM active (request non-nil & markers found)"
        RC->>FP: assemblePromptTokens(prefix, suffix, markers, maxTokens)
        FP-->>RC: [Int32] token sequence
        RC->>RC: "promptTokens = fimTokens"
    else FIM inactive (flag off, no request, or model lacks markers)
        RC->>RC: use forward prompt tokens as-is
    end
    RC->>RC: obtainAutocompleteSequence(promptTokens, ...)
    RC-->>SE: generated text
Comments Outside Diff (1)
Cotabby/Services/Runtime/LlamaRuntimeCore.swift, lines 150-162
tokenize(prompt) (the full forward base prompt) is called unconditionally before the FIM override under the lock. When FIM is active the resulting allPromptTokens and all the trimming work are discarded. Since options.fillInMiddle is known before the lock, a guard options.fillInMiddle == nil || … early-path could skip the forward tokenize when FIM will definitely take over. Minor for now with the flag off, but worth addressing before enabling FIM by default.
Reviews (1): Last reviewed commit: "Add fill-in-middle prompting for mid-lin..."