Reframe FM instructions positively and trim few-shots to two #342

Merged: FuJacob merged 3 commits into main from refactor/fm-positive-frame on May 28, 2026
@FuJacob FuJacob commented May 28, 2026

Summary

Apple's WWDC25 prompt-design guidance is to use a short positive identity plus a small number of demonstrations rather than a long list of prohibitions, because the chat-tuned system model responds more reliably to "this is who you are" than to "do not do X." The previous FM session instructions were 10 lines of mostly negative rules with a 5-pair few-shot block. This rewrite collapses the rules to four positive lines and trims the few-shot block to two pairs (one prose, one code), shrinking the instruction prefix that lands in Apple's 4096-token shared context and re-pointing the chat prior at the actual task.

Stacked on feat/fm-streaming. Use the eval suite from the parent stack PR to compare drift / mid-word truncation rates against the previous instruction text.

Validation

xcodebuild test ... -only-testing:CotabbyTests/FoundationModelPromptRendererTests -only-testing:CotabbyTests/SuggestionEngineRouterTests CODE_SIGNING_ALLOWED=NO
Executed 9 tests, with 0 failures (0 unexpected)

swiftlint lint --quiet → no violations.

Manual: pick prefixes that historically drifted ("Hey Jacob, ", "Thanks for ", "Hi Sarah,\n\n") and verify the suggestion continues the sentence in voice instead of greeting the user back.

Linked issues

Refs the FM-quality investigation.

Risk / rollout notes

  • Pure prompt-policy change. No protocol, request shape, or engine wiring change.
  • Three test assertions changed shape: the old ones pinned on phrases ("text-continuation engine", "Do not repeat or quote the existing text.") that no longer appear. The new ones lock in the positive identity ("complete partially-typed text"), the output contract anchor ("Output the continuation only:"), specific forbidden-content tokens ("no greeting", "no markdown"), and the style line.
  • A new test (test_sessionInstructions_includeExactlyTwoContinuationExamples) pins the few-shot count to two so the trim cannot silently regrow.
  • The instruction-prefix cache from PR 2 (perf/fm-session-reuse-prewarm) is keyed on the rendered instructions string. After this lands, the first request on each session rebuilds the cache once with the shorter instructions; subsequent requests reuse it. No special migration needed.
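
The cache note in the last bullet can be illustrated with a minimal sketch. It assumes a dictionary keyed on the rendered instructions string; `InstructionPrefixCache`, `prefix(for:build:)`, and the sample strings are hypothetical names for illustration, not Cotabby's actual types, and the real cache lives per session rather than in a single global object.

```swift
// Hypothetical sketch: a cache keyed on the rendered instructions string.
// None of these names come from the Cotabby codebase.
final class InstructionPrefixCache<Prefix> {
    private var storage: [String: Prefix] = [:]
    private(set) var buildCount = 0

    /// Returns the cached prefix for `instructions`, building it once on a miss.
    func prefix(for instructions: String, build: (String) -> Prefix) -> Prefix {
        if let cached = storage[instructions] {
            return cached
        }
        buildCount += 1
        let built = build(instructions)
        storage[instructions] = built
        return built
    }
}

let cache = InstructionPrefixCache<Int>()
let oldInstructions = "You are a text-continuation engine. (long form)"
let newInstructions = "You complete partially-typed text."

// Stand-in for the expensive prewarm work: here just the character count.
_ = cache.prefix(for: oldInstructions) { $0.count }
_ = cache.prefix(for: newInstructions) { $0.count }  // miss: the text changed, so it is rebuilt once
_ = cache.prefix(for: newInstructions) { $0.count }  // hit: reused
print(cache.buildCount)  // prints 2
```

Because the key is the full string, shortening the instructions simply produces a new key. That is why no migration step is needed in this model.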

Greptile Summary

This PR replaces the FM session instruction block with a shorter, positively-framed identity following Apple's WWDC25 prompt-design guidance, and trims the few-shot demonstration set from five pairs to two (one prose salutation, one code). No request shape, engine wiring, or output-normalizer logic changes.

  • FoundationModelPromptRenderer.swift: the base rules array shrinks from 10 lines to 5 positive identity lines; continuationExampleLines shrinks from 10 strings (5 pairs) to 4 strings (2 pairs), keeping the prose-salutation and code cases that cover the two principal failure modes.
  • CotabbyTests/PromptPolicyTests.swift: three tests are renamed and rewritten to match the new instruction text, and a new test_sessionInstructions_includeExactlyTwoContinuationExamples is added to pin the few-shot count so a silent regrowth would fail fast.

Confidence Score: 5/5

Safe to merge — this is a pure prompt-text change with no logic, protocol, or wiring modifications.

Every changed line is a string constant or a test assertion that mirrors it. The instruction-prefix cache rebuild on first request after landing is acknowledged and requires no migration. The new tests are accurate, and the few-shot trim is deliberate and well-reasoned.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| Cotabby/Support/FoundationModelPromptRenderer.swift | Rewrites FM session instructions from 10 negative prohibition lines + 5 few-shot pairs to 5 positive identity lines + 2 few-shot pairs; production logic is unchanged, and the diff is purely string constants. |
| CotabbyTests/PromptPolicyTests.swift | Three test renames/rewrites plus a new count-pinning test; assertions are accurate for the new instruction text, and the coverage split across tests is appropriate. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[SuggestionRequest] --> B[sessionInstructions]
    B --> C[5 positive identity lines]
    C --> C1["1. Positive identity: 'You complete partially-typed text'"]
    C --> C2["2. Output contract: no greeting/sign-off/quotes/markdown/labels/explanation"]
    C --> C3["3. Anti-echo guard: Continue from position after existing text"]
    C --> C4["4. Style match: language, register, casing, punctuation"]
    C --> C5["5. Context usage: clipboard/screen only when helpful"]
    B --> D{languageInstruction?}
    D -- yes --> E[Append language hint]
    D -- no --> F[Skip]
    E --> G["Examples header + 2 few-shot pairs"]
    F --> G
    G --> G1["Pair 1: prose salutation (anti-restart)"]
    G --> G2["Pair 2: code prefix → code continuation"]
    G --> H{customRules?}
    H -- yes --> I["'Your style preferences:' + rules + subordination line"]
    H -- no --> J[Skip]
    I --> K[lines.joined separator newline]
    J --> K
    K --> L[Instructions string → Apple FM channel]
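
The assembly order in the flowchart can be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped renderer: `InstructionInputs`, `renderInstructions`, the `Text:` label, the example pairs, and most of the rule wording are placeholders. Only the quoted anchors mentioned in this PR ("complete partially-typed text", "Output the continuation only:", "Your style preferences:") come from the source.

```swift
// Hypothetical sketch of the assembly order shown in the flowchart.
// Strings are placeholders, not the actual instruction text.
struct InstructionInputs {
    var languageInstruction: String? = nil
    var customRules: [String] = []
}

func renderInstructions(_ inputs: InstructionInputs) -> String {
    // Five base lines: identity, output contract, anti-echo, style, context.
    var lines: [String] = [
        "You complete partially-typed text.",
        "Output the continuation only: no greeting, no sign-off, no quotes, no markdown, no labels, no explanation.",
        "Continue from the position immediately after the existing text. Do not repeat it.",
        "Match the language, register, casing, and punctuation of the existing text.",
        "Use clipboard or screen context only when it helps.",
    ]

    // Optional language hint.
    if let hint = inputs.languageInstruction {
        lines.append(hint)
    }

    // Examples header plus two few-shot pairs (one prose, one code).
    lines.append("Examples:")
    lines += [
        "Text: Hi Sarah, thanks for",
        "Continuation:  sending the notes over.",
        "Text: let total = items.reduce(0) {",
        "Continuation:  $0 + $1.price }",
    ]

    // Optional custom rules, followed by a subordination line.
    if !inputs.customRules.isEmpty {
        lines.append("Your style preferences:")
        lines += inputs.customRules
        lines.append("Style preferences never override the rules above.")
    }

    return lines.joined(separator: "\n")
}

let rendered = renderInstructions(
    InstructionInputs(languageInstruction: "Write in English.",
                      customRules: ["Prefer short sentences."])
)
print(rendered)
```

Keeping the optional sections as appends to one array makes the order explicit: the base rules always come first, and the custom rules always come last.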

Reviews (4): Last reviewed commit: "Restore no-echo rule to prevent empty co..."

FuJacob added a commit that referenced this pull request May 28, 2026
- Few-shot count assertion now scopes the Continuation: split to the
  examples section so injected language hints or custom rules cannot
  inflate the count.
- Output-contract assertion pins each forbidden-content token
  individually so a future wording change can't silently drop one.
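
A minimal sketch of the two assertion ideas in this commit message, written as plain functions rather than XCTest cases. The section headers, the `Text:` label, and the helper names (`fewShotPairCount`, `missingTokens`) are assumptions for illustration; the real tests may delimit the examples section differently.

```swift
import Foundation

/// Counts "Continuation:" markers only between the examples header and the
/// custom-rules header, so text outside the examples section cannot inflate the count.
func fewShotPairCount(in instructions: String,
                      examplesHeader: String = "Examples:",
                      customRulesHeader: String = "Your style preferences:") -> Int {
    guard let start = instructions.range(of: examplesHeader) else { return 0 }
    var section = instructions[start.upperBound...]
    if let end = section.range(of: customRulesHeader) {
        section = section[..<end.lowerBound]
    }
    return section.components(separatedBy: "Continuation:").count - 1
}

/// Returns every required token that is absent, so each one is pinned individually.
func missingTokens(in text: String, required: [String]) -> [String] {
    required.filter { !text.contains($0) }
}

let sample = """
You complete partially-typed text.
Output the continuation only: no greeting, no sign-off, no markdown.
Language hint that happens to mention Continuation: before the examples.
Examples:
Text: Hi Sarah, thanks for
Continuation:  sending the notes over.
Text: let total = items.reduce(0) {
Continuation:  $0 + $1.price }
Your style preferences:
Never write the word Continuation: in a reply.
"""

let naiveCount = sample.components(separatedBy: "Continuation:").count - 1
print(naiveCount)                        // prints 4
print(fewShotPairCount(in: sample))      // prints 2
print(missingTokens(in: sample, required: ["no greeting", "no markdown"]))  // prints []
```

The naive whole-string count picks up the language hint and the custom rule, which is the inflation the scoped version avoids.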
FuJacob added 3 commits May 28, 2026 03:15
Local FM eval run on the full stack (with #336 bounded to single-turn
sessions) showed two cases (codeComment "// This is a workaround for the
bug in ", prose "The Swift compiler enforces optionals because ") that
the model echoed verbatim instead of continuing. The normalizer correctly
strips the echo, but the user-visible result is an empty suggestion.

These cases passed when this PR was first measured because the
unconditional session reuse left a growing transcript of prior
(continue-do-not-echo) demonstrations on every later request — implicit
in-context learning that masked the rule removal. Once the engine is
bounded to single-turn sessions (#336 follow-up), the rule has to be in
the instructions channel for every request, not implicit in transcript
history.

The new rule pairs positive framing ("Continue from the position
immediately after the existing text") with the explicit prohibition that
was removed, keeping the spirit of WWDC25's positive-identity guidance
while restoring the hard constraint. Eval after this change: drift=3,
midword=10, empty=0, noise=0 — same shape as the #335 baseline.

A new test_sessionInstructions_forbidEchoingExistingText assertion pins
both clauses so a future rewrite cannot silently drop them again.
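
To make the failure mode concrete, here is a hedged sketch of how stripping an echo can leave an empty suggestion. `stripEcho(of:from:)` is a hypothetical stand-in for the normalizer step, which is not shown in this PR, and the continuation text is made up.

```swift
/// Hypothetical normalizer step: if the model's output starts by repeating the
/// existing text, drop that repeated prefix and keep only what follows.
func stripEcho(of existingText: String, from output: String) -> String {
    guard !existingText.isEmpty, output.hasPrefix(existingText) else {
        return output
    }
    return String(output.dropFirst(existingText.count))
}

let existing = "// This is a workaround for the bug in "

// Case 1: the model echoes the input verbatim and adds nothing.
let echoedOnly = existing
print(stripEcho(of: existing, from: echoedOnly).isEmpty)  // prints true

// Case 2: the model continues from the cursor position, so nothing is stripped.
let continued = "the layout pass"
print(stripEcho(of: existing, from: continued))  // prints the layout pass
```

Case 1 is the user-visible problem described above: the normalizer behaves correctly, but there is nothing left to show, which is why the rule has to be in the instructions rather than left to cleanup.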
@FuJacob FuJacob force-pushed the refactor/fm-positive-frame branch from 6f1afe5 to 3a129d0 Compare May 28, 2026 10:16
@FuJacob FuJacob changed the base branch from feat/fm-streaming to main May 28, 2026 10:16
@FuJacob FuJacob merged commit d991110 into main May 28, 2026
4 checks passed