Polish: menu-bar language picker, Apple Intelligence drift fix, drop beta labels #244

Merged: FuJacob merged 3 commits into main from ai-polish-followup on May 25, 2026

@FuJacob FuJacob commented May 25, 2026

Summary

Follow-up to #239 carrying three commits that landed on the branch after that PR was squash-merged, so they never reached main:

  • Menu-bar response-language picker (parity with the Settings picker).
  • Reduce Apple Intelligence (Foundation Models) assistant drift: reframe instructions as a text-continuation engine, drop the user's name from FM instructions (top trigger for "Jacob, how are you"; llama still personalizes), forbid refusals/apologies, and add few-shot continuation examples. Adds a flag-gated live eval (FoundationModelDriftEvalTests) that is -skip-testing'd in CI.
  • Drop the [BETA]/[EXPERIMENTAL] labels and the "does not perform as well" disclaimer for Apple Intelligence (app + README).

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build   # ** BUILD SUCCEEDED **
xcodebuild test ... -only-testing:CotabbyTests/FoundationModelPromptRendererTests             # ** TEST SUCCEEDED **

Local FM drift eval (RUN_FM_EVAL compile flag): assistant drift down to 1/10; the holdout is an Apple guardrail refusal on an empty-body greeting, not a prompt issue. The eval needs the on-device model and is non-deterministic, so it self-skips without the flag and is -skip-testing'd in CI.
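
The compile-flag half of that gating could look roughly like the sketch below. It is not the PR's actual test file: `DriftEvalSketchTests`, `liveEvalEnabled`, and `driftRate` are hypothetical names, and canned strings stand in for live model output so the block stays self-contained. Only the `RUN_FM_EVAL` flag name comes from the PR.

```swift
import XCTest

// True only when the target is built with the RUN_FM_EVAL compilation condition.
// (Flag name from the PR; this constant itself is a hypothetical helper.)
#if RUN_FM_EVAL
let liveEvalEnabled = true
#else
let liveEvalEnabled = false
#endif

/// Fraction of outputs that a caller-supplied predicate classifies as drift.
func driftRate(of outputs: [String], isDrift: (String) -> Bool) -> Double {
    guard !outputs.isEmpty else { return 0 }
    return Double(outputs.filter(isDrift).count) / Double(outputs.count)
}

final class DriftEvalSketchTests: XCTestCase {
    func testReportsDriftRate() throws {
        // Self-skip unless the flag is set: the real eval needs the on-device
        // model and is non-deterministic, so it should never gate CI.
        try XCTSkipUnless(liveEvalEnabled, "Build with RUN_FM_EVAL to run the live eval")

        // Assumption: canned outputs stand in for live model responses.
        let outputs = ["the report by Friday.", "Hi Jacob, how are you?"]
        let rate = driftRate(of: outputs) { $0.lowercased().hasPrefix("hi ") }
        print("drift rate: \(rate)")
    }
}
```

The second gate is independent of this one: the PR excludes the real test class in CI with -skip-testing:CotabbyTests/FoundationModelDriftEvalTests, so the eval stays out of CI even if the flag were set by accident.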

Linked issues

Refs #239.

Risk / rollout notes

  • Prompt-only change for Apple Intelligence behavior; llama path unchanged (still uses the name).
  • New test file registered in project.pbxproj; excluded from CI via -skip-testing + a compile-flag gate.

Greptile Summary

This PR delivers three follow-up changes after #239: a Language picker in the menu bar (matching the existing Settings picker), a prompt-engineering overhaul for Apple Intelligence to reduce assistant drift (text-continuation identity + five few-shot examples, dropping the username injection), and removal of the [BETA]/[EXPERIMENTAL] labels and performance disclaimer. A flag-gated local drift eval harness (FoundationModelDriftEvalTests) is also added and correctly excluded from CI.

  • Menu-bar language picker: mirrors the Settings picker pattern exactly; the new selectedLanguageBinding follows the same Binding(get:set:) shape as all other bindings in MenuBarView.
  • FM prompt rewrite: replaces negative prohibitions with a positive text-continuation identity and five few-shot examples targeting the specific failure modes (greeting prefixes, pleasantries, re-starting sentences); username injection is removed with a clear comment linking to why and where personalization still lives.
  • Project changes outside the stated scope: CODE_SIGN_IDENTITY is changed from ad-hoc ("-") to "Apple Development" in both Debug and Release configs, and LSApplicationCategoryType is added. The signing change could affect local builds for contributors without an Apple Developer certificate.
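
For readers unfamiliar with the Binding(get:set:) shape mentioned in the first bullet, here is a minimal sketch. `SettingsSketch` and `makeLanguageBinding` are hypothetical stand-ins, and the language is modeled as a plain String purely for illustration; the app's real model and value type may differ.

```swift
import SwiftUI

/// Hypothetical stand-in for the app's settings model.
final class SettingsSketch {
    private(set) var responseLanguage = "English"

    func setResponseLanguage(_ language: String) {
        responseLanguage = language
    }
}

/// Builds a binding that reads straight from the model and routes every
/// write through the model's setter instead of mutating state directly.
func makeLanguageBinding(for settings: SettingsSketch) -> Binding<String> {
    Binding(
        get: { settings.responseLanguage },
        set: { settings.setResponseLanguage($0) }
    )
}
```

A picker would take such a binding as its selection argument. Routing writes through the setter keeps any persistence or validation in one place, which is presumably why the menu bar and Settings pickers share the pattern.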

Confidence Score: 4/5

Safe to merge; the prompt and UI changes are well-scoped, the new eval test is correctly double-gated, and CI is unaffected by any of the changes.

The functional changes (language picker, prompt rewrite, label removal) are clean and well-tested. The only concerns are a false-negative edge case in the local-only drift eval harness and an unrelated signing identity change in the project file that could surprise contributors building locally without an Apple Developer certificate.

Cotabby.xcodeproj/project.pbxproj — the CODE_SIGN_IDENTITY change is unrelated to the stated PR scope and worth a quick second look. CotabbyTests/FoundationModelDriftEvalTests.swift — the isDrift helper has a minor false-negative edge case in the greeting-opener check.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Cotabby.xcodeproj/project.pbxproj | Registers FoundationModelDriftEvalTests.swift; also changes CODE_SIGN_IDENTITY from ad-hoc to Apple Development in both Debug and Release, and adds LSApplicationCategoryType. Both are unrelated to the PR scope and potentially breaking for contributors without Apple Developer certs. |
| Cotabby/Support/FoundationModelPromptRenderer.swift | Rewrites session instructions with a text-continuation identity, drops username injection, and appends five few-shot examples targeting observed drift failure modes; well-commented and backed by new unit tests. |
| Cotabby/UI/MenuBarView.swift | Adds a Language picker row (MenuBarPickerRow + selectedLanguageBinding) matching the existing Settings picker pattern; implementation is clean and consistent with the surrounding binding style. |
| CotabbyTests/FoundationModelDriftEvalTests.swift | New flag-gated local eval harness for FM drift; correctly double-gated and excluded from CI, but isDrift has a false-negative edge case when the prefix starts with a greeting opener and uses contains instead of hasPrefix for the exclusion check. |
| CotabbyTests/PromptPolicyTests.swift | Tests updated to reflect new prompt contract: checks for text-continuation identity, asserts username is omitted, and guards presence of few-shot examples; all three new/modified tests are precise and meaningful. |
| .github/workflows/tests.yml | Adds -skip-testing:CotabbyTests/FoundationModelDriftEvalTests with an explanatory comment; CI intent is clear and the change is correct. |
| Cotabby/Models/SuggestionEngineModels.swift | Removes [BETA] suffix from Apple Intelligence display label; one-line change, no issues. |
| Cotabby/UI/WelcomeView.swift | Removes [BETA] from the engine card title; clean one-line change. |
| README.md | Drops [EXPERIMENTAL] label and does-not-perform-as-well disclaimer for Apple Intelligence; accurate given the prompt improvements. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[SuggestionRequest] --> B[sessionInstructions]
    B --> C[Base rules block\ntext-continuation identity\nno greet / no refuse / no repeat]
    C --> D{Language override?}
    D -- yes --> E[Append languageInstruction]
    D -- no --> F[Skip]
    E --> G[Append few-shot\ncontinuationExampleLines]
    F --> G
    G --> H{Custom style rules?}
    H -- yes --> I[Append style rules\nsubordinated to contract]
    H -- no --> J[Skip]
    I --> K[lines.joined separator newline]
    J --> K
    K --> L[Instructions string to Apple Foundation Models]

    M[MenuBarView] --> N[Language Picker]
    N --> O[selectedLanguageBinding\nget: suggestionSettings.responseLanguage\nset: setResponseLanguage]
    O --> P[SuggestionSettingsModel]
    P --> Q[Persisted to UserDefaults]
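
The instruction-assembly half of the flowchart can be sketched as follows. `RequestSketch` is a hypothetical type, and the rule and example strings are placeholders rather than the PR's real prompt wording; only the ordering (base rules, optional language instruction, few-shot examples, optional style rules, newline join) is taken from the chart.

```swift
/// Hypothetical request type carrying only the fields the flow needs.
struct RequestSketch {
    var languageInstruction: String?   // nil when no language override is set
    var styleRules: [String] = []      // empty when the user has no custom rules
}

// Placeholder wording; the PR's real instruction text is not reproduced here.
let baseRules = [
    "You are a text-continuation engine.",
    "Continue the user's text; do not greet, refuse, apologize, or repeat it.",
]
let continuationExampleLines = [
    "Example: \"Thanks for the\" -> \" quick reply.\"",
]

/// Assembles the instruction lines in the order the flowchart shows.
func sessionInstructions(for request: RequestSketch) -> String {
    var lines = baseRules
    if let language = request.languageInstruction {
        lines.append(language)
    }
    lines.append(contentsOf: continuationExampleLines)
    if !request.styleRules.isEmpty {
        // Style rules come last and are framed as subordinate to the contract.
        lines.append("Style preferences (never override the rules above):")
        lines.append(contentsOf: request.styleRules)
    }
    return lines.joined(separator: "\n")
}
```

Appending the style rules after the few-shot examples, under an explicit subordination line, is one way to keep user preferences from overriding the continuation contract.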

Reviews (1): Last reviewed commit: "Drop Apple Intelligence beta/experimenta..."

Greptile also left 2 inline comments on this PR.

FuJacob added 3 commits May 25, 2026 04:08
The chat-tuned system model kept breaking character — greeting the user by
name, adding pleasantries ('Hope it's going well'), and replying like an
assistant. Reshape the Foundation Models prompt and add a measurement harness:

- Reframe instructions as a text-continuation engine (not 'assist the user'),
  drop the user's name (the top trigger for 'Jacob, how are you'; llama keeps
  it), forbid refusals/apologies, and add few-shot continuation examples.
- FoundationModelDriftEvalTests: flag-gated live eval over known-bad prefixes,
  reporting drift rate. Needs the on-device model + is non-deterministic, so
  it's gated behind the RUN_FM_EVAL compile flag and -skip-testing in CI.

Local eval: assistant drift down to 1/10 (the holdout is an Apple guardrail
refusal on an empty-body greeting, not a prompt issue).

Remove the [BETA] tag from the engine label (onboarding + settings/menu) and
the [EXPERIMENTAL] tag plus the 'does not perform as well' disclaimer from the
README. Present Apple Intelligence as a first-class engine.
@FuJacob FuJacob merged commit 143e5a3 into main May 25, 2026
3 checks passed
@FuJacob FuJacob deleted the ai-polish-followup branch May 25, 2026 11:10
Comment on lines +100 to +103
let prefixLower = prefix.lowercased()
return openers.contains { opener in
    trimmed.hasPrefix(opener) && !prefixLower.contains(opener)
}

P2 Drift detection has false negatives for greeting-opener prefixes

prefixLower.contains(opener) is used to exempt continuations when the prefix already includes the opener, but contains is broader than hasPrefix. For the test case "Hey Jacob, ", prefixLower.contains("hey ") is true, so a re-greeting output that starts with "hey " (e.g. "hey there, how are you") bypasses the opener check entirely and is only caught if its content happens to match a driftTells phrase. The exclusion was intended to handle "finishing Hi Sa… → rah", which only requires checking whether the prefix starts with the opener. Switching to prefixLower.hasPrefix(opener) would correctly allow completion of an in-progress greeting while still flagging a fresh one.

A second minor inconsistency: "hello" in openers has no trailing space, while all other entries ("hi ", "hey ", "dear ", etc.) do. This means a continuation like "helloooo" would be flagged as drift, which is a false positive unlikely to occur in practice but asymmetric with the rest of the list.
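
A minimal sketch of the suggested hasPrefix variant follows, under stated assumptions: `startsWithFreshGreeting` is a hypothetical name, the `openers` list is illustrative (with "hello " given a trailing space for symmetry), and the driftTells check from the real helper is left out.

```swift
import Foundation

// Assumed list for illustration; the PR's real `openers` array may differ.
let openers = ["hi ", "hey ", "hello ", "dear "]

/// Returns true when `output` opens with a greeting that the prefix did not
/// itself begin with.
func startsWithFreshGreeting(prefix: String, output: String) -> Bool {
    let trimmed = output.trimmingCharacters(in: .whitespaces).lowercased()
    let prefixLower = prefix.lowercased()
    return openers.contains { opener in
        // hasPrefix instead of contains: an opener buried mid-prefix
        // no longer exempts a greeting at the start of the output.
        trimmed.hasPrefix(opener) && !prefixLower.hasPrefix(opener)
    }
}
```

With this variant, a prefix such as "I said hey to Jacob, " no longer exempts an output that starts with "hey there", whereas the contains check would let it through. A prefix that itself begins with the opener, such as "Hey Jacob, ", is still exempted under either check, so a re-greeting in that case would have to be caught by the driftTells phrases or by a stricter rule.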

Comment on lines 439 to 443
  buildSettings = {
      ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon;
      ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor;
-     "CODE_SIGN_IDENTITY[sdk=macosx*]" = "-";
+     "CODE_SIGN_IDENTITY[sdk=macosx*]" = "Apple Development";
      CODE_SIGN_STYLE = Automatic;

P2 Unrelated signing identity change may break local builds without Apple Developer cert

Both Debug and Release configurations for the Cotabby app target are changed from ad-hoc signing ("-") to "Apple Development". This change is unrelated to the PR's stated scope and, with CODE_SIGN_STYLE = Automatic, will cause Xcode to look up an Apple Developer identity in the keychain; contributors without a paid or free Apple Developer account enrolled in Xcode would see a "No signing certificate found" error. CI is unaffected because CODE_SIGNING_ALLOWED=NO is passed in tests.yml. If this change is intentional (e.g. to support entitlements that require a provisioning profile), a brief note in the PR description or a comment in the .pbxproj would help future contributors understand the requirement.
