Reframe FM instructions positively and trim few-shots to two by FuJacob · Pull Request #342 · FuJacob/cotabby

FuJacob · 2026-05-28T08:42:32Z

Summary

Apple's WWDC25 prompt-design guidance is to use a short positive identity plus a small number of demonstrations rather than a long list of prohibitions, because the chat-tuned system model responds more reliably to "this is who you are" than to "do not do X." The previous FM session instructions were 10 lines of mostly negative rules with a 5-pair few-shot block. This rewrite collapses the rules to four positive lines (a fifth, the no-echo rule, was later restored; see the eval notes below) and trims the few-shot block to two pairs (one prose, one code). That shrinks the instruction prefix, which counts against the 4096-token context window the on-device model shares across instructions, prompt, and response, and re-points the chat prior at the actual task.

Stacked on feat/fm-streaming. Use the eval suite from the parent stack PR to compare drift / mid-word truncation rates against the previous instruction text.

Validation

xcodebuild test ... -only-testing:CotabbyTests/FoundationModelPromptRendererTests -only-testing:CotabbyTests/SuggestionEngineRouterTests CODE_SIGNING_ALLOWED=NO
→ Executed 9 tests, with 0 failures (0 unexpected)

swiftlint lint --quiet → no violations.

Manual: pick prefixes that historically drifted ("Hey Jacob, ", "Thanks for ", "Hi Sarah,\n\n") and verify the suggestion continues the sentence in voice instead of greeting the user back.
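Part of the manual drift check could be automated. The sketch below is a hypothetical heuristic, not something this PR adds: it flags a suggestion that opens with a greeting or thank-you word instead of continuing the sentence. The function name and the opener list are assumptions chosen for illustration.

```swift
import Foundation

// Hypothetical heuristic, not part of Cotabby: flag a suggestion that opens a
// new greeting or thank-you instead of continuing the user's sentence.
// The opener list below is illustrative only.
let greetingOpeners = ["hi", "hey", "hello", "thanks", "thank you", "dear"]

func looksLikeGreetingDrift(_ suggestion: String) -> Bool {
    let text = suggestion
        .trimmingCharacters(in: .whitespacesAndNewlines)
        .lowercased()
    return greetingOpeners.contains { opener in
        guard text.hasPrefix(opener) else { return false }
        // Require a word boundary so "his team..." is not mistaken for "hi".
        guard let next = text.dropFirst(opener.count).first else { return true }
        return next == " " || next == "," || next == "!"
    }
}

// A drifted reply to "Hey Jacob, " versus an in-voice continuation.
print(looksLikeGreetingDrift("Hi Jacob! How can I help?"))    // true
print(looksLikeGreetingDrift("just checking in on the PR."))  // false
```

A keyword check like this only catches the obvious greeting-back failure; judging whether a continuation stays "in voice" still needs a human or the eval suite.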

Linked issues

Refs the FM-quality investigation.

Risk / rollout notes

Pure prompt-policy change. No protocol, request shape, or engine wiring change.
Three test assertions changed shape: the old ones pinned on phrases ("text-continuation engine", "Do not repeat or quote the existing text.") that no longer appear. The new ones lock in the positive identity ("complete partially-typed text"), the output contract anchor ("Output the continuation only:"), specific forbidden-content tokens ("no greeting", "no markdown"), and the style line.
A new test (test_sessionInstructions_includeExactlyTwoContinuationExamples) pins the few-shot count to two so the trim cannot silently regrow.
The instruction-prefix cache from PR 2 (perf/fm-session-reuse-prewarm) is keyed on the rendered instructions string. After this lands, the first request on each session rebuilds the cache once with the shorter instructions; subsequent requests reuse it. No special migration needed.
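A minimal sketch of why a string-keyed cache needs no migration, assuming one single-entry cache per session. The class and method names are hypothetical, not the API from perf/fm-session-reuse-prewarm: a changed instructions string misses once, and every later request with the same string reuses the cached prefix.

```swift
// Minimal sketch, assuming one cache per session that holds a single prefix.
// InstructionPrefixCache and prepare(instructions:) are hypothetical names,
// not the API from perf/fm-session-reuse-prewarm.
final class InstructionPrefixCache {
    private var cachedInstructions: String?
    private(set) var rebuildCount = 0

    /// Reuses the cached prefix when the rendered string is unchanged;
    /// otherwise rebuilds it once for the new string.
    func prepare(instructions: String) {
        guard cachedInstructions != instructions else { return }
        rebuildCount += 1
        cachedInstructions = instructions
    }
}

let cache = InstructionPrefixCache()
cache.prepare(instructions: "old, longer instructions")            // built before this PR
cache.prepare(instructions: "You complete partially-typed text.")  // first request after landing: rebuild
cache.prepare(instructions: "You complete partially-typed text.")  // later requests: reuse
print(cache.rebuildCount)  // 2
```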

Greptile Summary

This PR replaces the FM session instruction block with a shorter, positively-framed identity following Apple's WWDC25 prompt-design guidance, and trims the few-shot demonstration set from five pairs to two (one prose salutation, one code). No request shape, engine wiring, or output-normalizer logic changes.

FoundationModelPromptRenderer.swift: the base rules array shrinks from 10 lines to 5 positive identity lines; continuationExampleLines shrinks from 10 strings (5 pairs) to 4 strings (2 pairs), keeping the prose-salutation and code cases that cover the two principal failure modes.
CotabbyTests/PromptPolicyTests.swift: three tests are renamed and rewritten to match the new instruction text, and a new test_sessionInstructions_includeExactlyTwoContinuationExamples is added to pin the few-shot count so a silent regrowth would fail fast.

Confidence Score: 5/5

Safe to merge — this is a pure prompt-text change with no logic, protocol, or wiring modifications.

Every changed line is a string constant or a test assertion that mirrors it. The instruction-prefix cache rebuild on first request after landing is acknowledged and requires no migration. The new tests are accurate, and the few-shot trim is deliberate and well-reasoned.

No files require special attention.

Important Files Changed

Filename	Overview
Cotabby/Support/FoundationModelPromptRenderer.swift	Rewrites FM session instructions from 10 negative prohibition lines + 5 few-shot pairs to 5 positive identity lines + 2 few-shot pairs; production logic is unchanged — the diff is purely string constants.
CotabbyTests/PromptPolicyTests.swift	Three test renames/rewrites plus a new count-pinning test; assertions are accurate for the new instruction text, and the coverage split across tests is appropriate.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[SuggestionRequest] --> B[sessionInstructions]
    B --> C[5 positive identity lines]
    C --> C1["1. Positive identity: 'You complete partially-typed text'"]
    C --> C2["2. Output contract: no greeting/sign-off/quotes/markdown/labels/explanation"]
    C --> C3["3. Anti-echo guard: Continue from position after existing text"]
    C --> C4["4. Style match: language, register, casing, punctuation"]
    C --> C5["5. Context usage: clipboard/screen only when helpful"]
    B --> D{languageInstruction?}
    D -- yes --> E[Append language hint]
    D -- no --> F[Skip]
    E --> G["Examples header + 2 few-shot pairs"]
    F --> G
    G --> G1["Pair 1: prose salutation (anti-restart)"]
    G --> G2["Pair 2: code prefix → code continuation"]
    G --> H{customRules?}
    H -- yes --> I["'Your style preferences:' + rules + subordination line"]
    H -- no --> J[Skip]
    I --> K[lines.joined separator newline]
    J --> K
    K --> L[Instructions string → Apple FM channel]
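The flowchart's assembly order can be sketched in Swift as below. This is illustrative, not the renderer's code: only the fragments quoted in this PR ("You complete partially-typed text", "Output the continuation only:", "Continue from the position immediately after the existing text", "Continuation:", "Your style preferences:") come from the source. The remaining rule wording, the "Examples:" and "Text:" labels, the two example pairs, and the subordination line are placeholders.

```swift
import Foundation

// Hypothetical sketch of the composition in the flowchart. Only the quoted
// fragments named in the PR come from the source; all other wording and the
// example pairs are illustrative placeholders.
func sessionInstructions(languageInstruction: String?, customRules: [String]) -> String {
    var lines = [
        "You complete partially-typed text.",
        "Output the continuation only: no greeting, no sign-off, no quotes, no markdown, no labels, no explanation.",
        "Continue from the position immediately after the existing text; never repeat the existing text.",
        "Match the existing text's language, register, casing, and punctuation.",
        "Use clipboard or screen context only when it helps the continuation.",
    ]
    if let hint = languageInstruction {
        lines.append(hint)
    }
    // Exactly two few-shot pairs: one prose salutation, one code prefix.
    let examples: [(text: String, continuation: String)] = [
        ("Hi Sarah, thanks for ", "sending the draft over."),
        ("let total = items.reduce(0, ", "+)"),
    ]
    lines.append("Examples:")
    for example in examples {
        lines.append("Text: \(example.text)")
        lines.append("Continuation: \(example.continuation)")
    }
    if !customRules.isEmpty {
        lines.append("Your style preferences:")
        lines.append(contentsOf: customRules.map { "- \($0)" })
        // Subordination line: preferences never override the output contract.
        lines.append("Follow these preferences only where they do not conflict with the rules above.")
    }
    return lines.joined(separator: "\n")
}

print(sessionInstructions(languageInstruction: nil, customRules: ["Prefer British spelling."]))
```

Appending custom rules last, behind a subordination line, keeps user preferences from displacing the output contract that the tests pin.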

Reviews (4): Last reviewed commit: "Restore no-echo rule to prevent empty co..."

- Few-shot count assertion now scopes the Continuation: split to the examples section, so injected language hints or custom rules cannot inflate the count.
- Output-contract assertion pins each forbidden-content token individually, so a future wording change can't silently drop one.
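A sketch of both hardened assertions, assuming the examples section opens with an "Examples:" header and ends where "Your style preferences:" begins. The header spellings and helper names are assumptions; only the "Continuation:" label and the "no greeting" / "no markdown" tokens come from this PR. The sample is deliberately adversarial: a language hint and a custom rule each contain "Continuation:", yet only the two example pairs are counted.

```swift
import Foundation

// Hypothetical sketch of the two hardened assertions. The "Examples:" header,
// section boundary, and sample text are assumptions for illustration.

/// Counts "Continuation:" labels only between the examples header and the
/// style-preferences section, so a language hint or custom rule that happens
/// to contain the label cannot inflate the count.
func continuationCount(in instructions: String) -> Int {
    guard let start = instructions.range(of: "Examples:") else { return 0 }
    var section = instructions[start.upperBound...]
    if let end = section.range(of: "Your style preferences:") {
        section = section[..<end.lowerBound]
    }
    return section.components(separatedBy: "Continuation:").count - 1
}

/// Returns every forbidden-content token the instructions no longer mention,
/// checking each token on its own rather than one combined phrase.
func missingForbiddenTokens(in instructions: String,
                            tokens: [String] = ["no greeting", "no markdown"]) -> [String] {
    tokens.filter { !instructions.contains($0) }
}

let sample = """
You complete partially-typed text.
Output the continuation only: no greeting, no markdown.
Language hint: write every Continuation: in English.
Examples:
Text: Hi Sarah, thanks for
Continuation: sending the draft over.
Text: let total = items.reduce(0,
Continuation: +)
Your style preferences:
- Label drafts with "Continuation:" in notes.
"""

print(continuationCount(in: sample))       // 2
print(missingForbiddenTokens(in: sample))  // []
```

Returning the list of missing tokens, rather than a single Bool, makes a failing assertion name exactly which clause a rewrite dropped.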

Local FM eval run on the full stack (with #336 bounded to single-turn sessions) showed two cases (codeComment "// This is a workaround for the bug in ", prose "The Swift compiler enforces optionals because ") that the model echoed verbatim instead of continuing. The normalizer correctly strips the echo, but the user-visible result is an empty suggestion. These cases passed when this PR was first measured because the unconditional session reuse left a growing transcript of prior (continue-do-not-echo) demonstrations on every later request — implicit in-context learning that masked the rule removal. Once the engine is bounded to single-turn sessions (#336 follow-up), the rule has to be in the instructions channel for every request, not implicit in transcript history. The new rule pairs positive framing ("Continue from the position immediately after the existing text") with the explicit prohibition that was removed, keeping the spirit of WWDC25's positive-identity guidance while restoring the hard constraint. Eval after this change: drift=3, midword=10, empty=0, noise=0 — same shape as the #335 baseline. A new test_sessionInstructions_forbidEchoingExistingText assertion pins both clauses so a future rewrite cannot silently drop them again.
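To make the empty-suggestion failure concrete, here is a hypothetical prefix-echo stripper (not Cotabby's actual normalizer, whose logic this PR does not show): when the model output is nothing but the existing text, the stripped result is empty, which is what the user saw for the two eval cases above.

```swift
// Hypothetical sketch, not Cotabby's output normalizer: drop a verbatim echo of
// the existing text from the start of the model output. When the model returns
// only the echo, nothing remains, which is the empty suggestion described above.
func stripEcho(of existing: String, from output: String) -> String {
    if output.hasPrefix(existing) {
        return String(output.dropFirst(existing.count))
    }
    // The echo may have lost the user's trailing space; match without it.
    var core = existing
    while core.last?.isWhitespace == true { core.removeLast() }
    if !core.isEmpty, output.hasPrefix(core) {
        let rest = output.dropFirst(core.count)
        return String(rest.drop(while: { $0.isWhitespace }))
    }
    return output
}

let prefix = "The Swift compiler enforces optionals because "
print(stripEcho(of: prefix, from: "The Swift compiler enforces optionals because").isEmpty)  // true
print(stripEcho(of: prefix, from: "nil dereferences crash at runtime."))  // nil dereferences crash at runtime.
```

Matching the echo both with and without the user's trailing whitespace matters because both eval prefixes end in a space that the model may drop when it repeats them.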

greptile-apps Bot reviewed May 28, 2026


Review comments on CotabbyTests/PromptPolicyTests.swift (one thread marked outdated)

FuJacob added 3 commits May 28, 2026 03:15

Reframe FM instructions positively and trim few-shots to two

d932b4d

FuJacob force-pushed the refactor/fm-positive-frame branch from 6f1afe5 to 3a129d0 May 28, 2026 10:16

FuJacob changed the base branch from feat/fm-streaming to main May 28, 2026 10:16

FuJacob merged commit d991110 into main May 28, 2026
4 checks passed
