Reframe FM instructions positively and trim few-shots to two #342
Merged
FuJacob added a commit that referenced this pull request on May 28, 2026
- Few-shot count assertion now scopes the Continuation: split to the examples section so injected language hints or custom rules cannot inflate the count.
- Output-contract assertion pins each forbidden-content token individually so a future wording change can't silently drop one.
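A minimal sketch of the scoped few-shot count described in the first bullet, under stated assumptions. The header strings ("Examples:", "Your style preferences:"), the "Text:" label, and the helper name fewShotCount are hypothetical; only the "Continuation:" marker comes from the commit message, and the real test may be structured differently.

```swift
import Foundation

// Hypothetical header strings; the renderer's actual wording is not shown in this PR.
let examplesHeader = "Examples:"
let customRulesHeader = "Your style preferences:"
let continuationMarker = "Continuation:"

/// Counts few-shot pairs using only the text between the examples header
/// and the custom-rules header (or the end of the string if there are no custom rules).
func fewShotCount(in instructions: String) -> Int {
    guard let headerRange = instructions.range(of: examplesHeader) else { return 0 }
    var section = instructions[headerRange.upperBound...]
    if let rulesRange = section.range(of: customRulesHeader) {
        section = section[..<rulesRange.lowerBound]
    }
    // N occurrences of the marker split the section into N + 1 pieces.
    return section.components(separatedBy: continuationMarker).count - 1
}

// Illustrative instructions string: the language hint and the custom rule both
// mention the marker, which would inflate an unscoped count.
let rendered = """
You complete partially-typed text.
Language hint: a hint that mentions Continuation: must not count.
Examples:
Text: Hi Sarah,
Continuation: thanks for the update.
Text: let total =
Continuation: items.count
Your style preferences:
Never write Continuation: labels.
"""

let scopedCount = fewShotCount(in: rendered)
let unscopedCount = rendered.components(separatedBy: continuationMarker).count - 1
```

With this sample, the scoped count sees only the two example pairs, while the unscoped split also picks up the marker inside the language hint and the custom rule.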
Local FM eval run on the full stack (with #336 bounded to single-turn sessions) showed two cases (codeComment "// This is a workaround for the bug in ", prose "The Swift compiler enforces optionals because ") that the model echoed verbatim instead of continuing. The normalizer correctly strips the echo, but the user-visible result is an empty suggestion.

These cases passed when this PR was first measured because the unconditional session reuse left a growing transcript of prior (continue-do-not-echo) demonstrations on every later request: implicit in-context learning that masked the rule removal. Once the engine is bounded to single-turn sessions (#336 follow-up), the rule has to be in the instructions channel for every request, not implicit in transcript history.

The new rule pairs positive framing ("Continue from the position immediately after the existing text") with the explicit prohibition that was removed, keeping the spirit of WWDC25's positive-identity guidance while restoring the hard constraint. Eval after this change: drift=3, midword=10, empty=0, noise=0, the same shape as the #335 baseline. A new test_sessionInstructions_forbidEchoingExistingText assertion pins both clauses so a future rewrite cannot silently drop them again.
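To illustrate why a verbatim echo surfaces as an empty suggestion, here is a hedged sketch of an echo-stripping step. The function stripEcho is a hypothetical stand-in, not the project's actual normalizer, and the continuation text is made up for the example.

```swift
import Foundation

/// Hypothetical normalizer step: drop a leading copy of the text the user
/// already typed, then trim surrounding whitespace.
func stripEcho(of typedPrefix: String, from modelOutput: String) -> String {
    let body = modelOutput.hasPrefix(typedPrefix)
        ? String(modelOutput.dropFirst(typedPrefix.count))
        : modelOutput
    return body.trimmingCharacters(in: .whitespacesAndNewlines)
}

// One of the two prefixes reported in the eval run.
let typed = "// This is a workaround for the bug in "

// Case 1: the model echoes the prefix and adds nothing, so nothing is left to suggest.
let echoedOnly = stripEcho(of: typed, from: typed)

// Case 2: the model echoes the prefix and then continues; the continuation survives.
let echoedThenContinued = stripEcho(of: typed, from: typed + "the layout pass.")

// Case 3: the model continues directly, which is what the restored rule asks for.
let continuedDirectly = stripEcho(of: typed, from: "the layout pass.")
```

The strip itself behaves correctly in all three cases; the first case shows that the fix has to happen in the instructions, because the normalizer can only remove the echo, not supply the missing continuation.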
6f1afe5 to
3a129d0
Summary
Apple's WWDC25 prompt-design guidance is to use a short positive identity plus a small number of demonstrations rather than a long list of prohibitions, because the chat-tuned system model responds more reliably to "this is who you are" than to "do not do X." The previous FM session instructions were 10 lines of mostly negative rules with a 5-pair few-shot block. This rewrite collapses the rules to four positive lines and trims the few-shot block to two pairs (one prose, one code), shrinking the instruction prefix that lands in Apple's 4096-token shared context and re-pointing the chat prior at the actual task.
Stacked on feat/fm-streaming. Use the eval suite from the parent stack PR to compare drift / mid-word truncation rates against the previous instruction text.
Validation
- xcodebuild test ... -only-testing:CotabbyTests/FoundationModelPromptRendererTests -only-testing:CotabbyTests/SuggestionEngineRouterTests CODE_SIGNING_ALLOWED=NO → Executed 9 tests, with 0 failures (0 unexpected)
- swiftlint lint --quiet → no violations.
- Manual: pick prefixes that historically drifted ("Hey Jacob, ", "Thanks for ", "Hi Sarah,\n\n") and verify the suggestion continues the sentence in voice instead of greeting the user back.
Linked issues
Refs the FM-quality investigation.
Risk / rollout notes
- A new test (test_sessionInstructions_includeExactlyTwoContinuationExamples) pins the few-shot count to two so the trim cannot silently regrow.
- The instruction-prefix cache (perf/fm-session-reuse-prewarm) is keyed on the rendered instructions string. After this lands, the first request on each session rebuilds the cache once with the shorter instructions; subsequent requests reuse it. No special migration needed.
Greptile Summary
This PR replaces the FM session instruction block with a shorter, positively-framed identity following Apple's WWDC25 prompt-design guidance, and trims the few-shot demonstration set from five pairs to two (one prose salutation, one code). No request shape, engine wiring, or output-normalizer logic changes.
- FoundationModelPromptRenderer.swift: the base rules array shrinks from 10 lines to 5 positive identity lines; continuationExampleLines shrinks from 10 strings (5 pairs) to 4 strings (2 pairs), keeping the prose-salutation and code cases that cover the two principal failure modes.
- CotabbyTests/PromptPolicyTests.swift: three tests are renamed and rewritten to match the new instruction text, and a new test_sessionInstructions_includeExactlyTwoContinuationExamples is added to pin the few-shot count so a silent regrowth would fail fast.
Confidence Score: 5/5
Safe to merge — this is a pure prompt-text change with no logic, protocol, or wiring modifications.
Every changed line is a string constant or a test assertion that mirrors it. The instruction-prefix cache rebuild on first request after landing is acknowledged and requires no migration. The new tests are accurate, and the few-shot trim is deliberate and well-reasoned.
No files require special attention.
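The cache behaviour mentioned in the rollout notes (one rebuild on the first request after the instructions change, reuse afterwards) can be sketched as follows. PrewarmedSession and SessionCache are hypothetical names for illustration; the real cache in perf/fm-session-reuse-prewarm is not shown in this PR and may work differently.

```swift
/// Hypothetical stand-in for a prewarmed FM session.
final class PrewarmedSession {
    let instructions: String
    init(instructions: String) { self.instructions = instructions }
}

/// Keeps one prewarmed session and rebuilds it only when the rendered
/// instructions string (the cache key) changes.
final class SessionCache {
    private var cached: PrewarmedSession?
    private(set) var rebuildCount = 0

    func session(for instructions: String) -> PrewarmedSession {
        if let existing = cached, existing.instructions == instructions {
            return existing // Same key: reuse the prewarmed session.
        }
        let fresh = PrewarmedSession(instructions: instructions)
        cached = fresh
        rebuildCount += 1
        return fresh
    }
}

let cache = SessionCache()
_ = cache.session(for: "old, longer instructions")       // initial build
let first = cache.session(for: "new, shorter instructions")  // key changed: one rebuild
let second = cache.session(for: "new, shorter instructions") // same key: reused
```

Because the key is the rendered string itself, shipping new instruction text invalidates the old entry without any explicit migration step.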
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[SuggestionRequest] --> B[sessionInstructions]
    B --> C[5 positive identity lines]
    C --> C1["1. Positive identity: 'You complete partially-typed text'"]
    C --> C2["2. Output contract: no greeting/sign-off/quotes/markdown/labels/explanation"]
    C --> C3["3. Anti-echo guard: Continue from position after existing text"]
    C --> C4["4. Style match: language, register, casing, punctuation"]
    C --> C5["5. Context usage: clipboard/screen only when helpful"]
    B --> D{languageInstruction?}
    D -- yes --> E[Append language hint]
    D -- no --> F[Skip]
    E --> G["Examples header + 2 few-shot pairs"]
    F --> G
    G --> G1["Pair 1: prose salutation (anti-restart)"]
    G --> G2["Pair 2: code prefix → code continuation"]
    G --> H{customRules?}
    H -- yes --> I["'Your style preferences:' + rules + subordination line"]
    H -- no --> J[Skip]
    I --> K[lines.joined separator newline]
    J --> K
    K --> L[Instructions string → Apple FM channel]
Reviews (4): Last reviewed commit: "Restore no-echo rule to prevent empty co..."
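The assembly order in the flowchart can be sketched in Swift roughly as below. This is an illustration under assumptions, not the code in FoundationModelPromptRenderer.swift: the function name, parameter names, the "Examples:" header text, and the placeholder rule and example strings are all hypothetical. Only the ordering, the "Your style preferences:" header, and the newline join are taken from the diagram.

```swift
/// Builds the instructions string in the order shown in the flowchart:
/// base rules, optional language hint, examples, optional custom rules.
func renderInstructions(
    baseRules: [String],
    languageInstruction: String?,
    exampleLines: [String],
    customRules: [String],
    subordinationLine: String
) -> String {
    var lines = baseRules
    if let hint = languageInstruction {
        lines.append(hint)
    }
    lines.append("Examples:") // Hypothetical header wording.
    lines.append(contentsOf: exampleLines)
    if !customRules.isEmpty {
        lines.append("Your style preferences:")
        lines.append(contentsOf: customRules)
        lines.append(subordinationLine)
    }
    return lines.joined(separator: "\n")
}

// Placeholder inputs; the real rule and example text is not reproduced here.
let base = ["rule 1", "rule 2"]
let examples = ["Text: a", "Continuation: b"]

let plain = renderInstructions(
    baseRules: base,
    languageInstruction: nil,
    exampleLines: examples,
    customRules: [],
    subordinationLine: "placeholder subordination line"
)

let full = renderInstructions(
    baseRules: base,
    languageInstruction: "language hint",
    exampleLines: examples,
    customRules: ["custom rule"],
    subordinationLine: "placeholder subordination line"
)
```

Keeping the custom rules after the examples, followed by a subordination line, matches the diagram's ordering, in which user preferences are appended last rather than mixed into the base rules.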