Polish: menu-bar language picker, Apple Intelligence drift fix, drop beta labels #244
The chat-tuned system model kept breaking character — greeting the user by
name, adding pleasantries ('Hope it's going well'), and replying like an
assistant. Reshape the Foundation Models prompt and add a measurement harness:
- Reframe instructions as a text-continuation engine (not 'assist the user'),
drop the user's name (the top trigger for 'Jacob, how are you'; llama keeps
it), forbid refusals/apologies, and add few-shot continuation examples.
- FoundationModelDriftEvalTests: flag-gated live eval over known-bad prefixes,
reporting drift rate. Needs the on-device model + is non-deterministic, so
it's gated behind the RUN_FM_EVAL compile flag and -skip-testing in CI.
Local eval: assistant drift down to 1/10 (the holdout is an Apple guardrail
refusal on an empty-body greeting, not a prompt issue).
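The drift rate reported above is the fraction of known-bad prefixes whose completion gets flagged. A minimal sketch of that measurement follows, assuming a synchronous `generate` closure standing in for the on-device model and a caller-supplied `isDrift` check; `driftRate`, `stubModel`, and the sample prefixes are hypothetical names, not the PR's actual harness. In the real test this logic would sit behind the `RUN_FM_EVAL` compile flag.

```swift
import Foundation

/// Runs `generate` over each prefix and returns the fraction of outputs
/// that `isDrift` flags. All parameters are illustrative stand-ins.
func driftRate(
    prefixes: [String],
    generate: (String) -> String,
    isDrift: (_ output: String, _ prefix: String) -> Bool
) -> Double {
    // An empty prefix list has no drift by definition; avoids dividing by zero.
    guard !prefixes.isEmpty else { return 0 }
    let drifted = prefixes.filter { prefix in
        isDrift(generate(prefix), prefix)
    }.count
    return Double(drifted) / Double(prefixes.count)
}

// Hypothetical known-bad prefixes and a stub in place of the live model.
let knownBadPrefixes = ["Hey Jacob, ", "Hi team, ", "Dear Sam, "]
let stubModel: (String) -> String = { prefix in
    prefix == "Hi team, "
        ? "Hello! How can I help you today?"
        : "just checking in on the report."
}

// Flag any output containing an assistant-style phrase (placeholder rule).
let rate = driftRate(prefixes: knownBadPrefixes, generate: stubModel) { output, _ in
    output.lowercased().contains("how can i help")
}
```

Because the live model is non-deterministic, a harness like this is better suited to reporting a rate than to asserting an exact value.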
Remove the [BETA] tag from the engine label (onboarding + settings/menu) and the [EXPERIMENTAL] tag plus the 'does not perform as well' disclaimer from the README. Present Apple Intelligence as a first-class engine.
let prefixLower = prefix.lowercased()
return openers.contains { opener in
    trimmed.hasPrefix(opener) && !prefixLower.contains(opener)
}
Drift detection has false negatives for greeting-opener prefixes
prefixLower.contains(opener) is used to exempt continuations when the prefix already includes the opener, but contains is broader than hasPrefix. For the test case "Hey Jacob, ", prefixLower.contains("hey ") is true, so a re-greeting output that starts with "hey " (e.g. "hey there, how are you") bypasses the opener check entirely and is only caught if its content happens to match a driftTells phrase. The exclusion was intended to handle "finishing Hi Sa… → rah", which only requires checking whether the prefix starts with the opener. Switching to prefixLower.hasPrefix(opener) would correctly allow completion of an in-progress greeting while still flagging a fresh one.
A second minor inconsistency: "hello" in openers has no trailing space, while all other entries ("hi ", "hey ", "dear ", etc.) do. This means a continuation like "helloooo" would be flagged as drift, which is a false positive unlikely to occur in practice but asymmetric with the rest of the list.
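The `hasPrefix` variant suggested above can be sketched as follows. This is an illustration under assumptions, not the PR's `isDrift`: the opener list is a hypothetical subset (with `"hello "` given a trailing space), and `startsWithFreshGreeting` is an invented name. One caveat worth checking before adopting the change: a prefix that itself begins with the opener, such as "Hey Jacob, ", is still exempted under `hasPrefix`, so the switch only closes the gap for prefixes that contain the opener somewhere after the start.

```swift
import Foundation

// Hypothetical opener list; the PR's full `openers` array is not shown.
let openers = ["hi ", "hey ", "hello ", "dear "]

/// Returns true when `output` starts with a greeting opener and the prefix
/// does not itself begin with that opener.
func startsWithFreshGreeting(output: String, prefix: String) -> Bool {
    let trimmed = output
        .trimmingCharacters(in: .whitespacesAndNewlines)
        .lowercased()
    let prefixLower = prefix.lowercased()
    return openers.contains { opener in
        // hasPrefix instead of contains: an opener that appears later in
        // the prefix no longer exempts a greeting at the start of the output.
        trimmed.hasPrefix(opener) && !prefixLower.hasPrefix(opener)
    }
}

// Opener appears mid-prefix: flagged here, exempted by the `contains` version.
let midPrefix = startsWithFreshGreeting(
    output: "Hey there, how are you",
    prefix: "I said hey to her. "
)

// Prefix begins with the opener: still exempted under `hasPrefix`.
let greetingPrefix = startsWithFreshGreeting(
    output: "hey there, how are you",
    prefix: "Hey Jacob, "
)
```

Catching a re-greeting after a prefix like "Hey Jacob, " would need an additional rule, for example comparing the output against the whole prefix rather than the opener alone.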
  buildSettings = {
  ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon;
  ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor;
- "CODE_SIGN_IDENTITY[sdk=macosx*]" = "-";
+ "CODE_SIGN_IDENTITY[sdk=macosx*]" = "Apple Development";
  CODE_SIGN_STYLE = Automatic;
Unrelated signing identity change may break local builds without Apple Developer cert
Both Debug and Release configurations for the Cotabby app target are changed from ad-hoc signing ("-") to "Apple Development". This change is unrelated to the PR's stated scope and, with CODE_SIGN_STYLE = Automatic, will cause Xcode to look up an Apple Developer identity in the keychain; contributors without a paid or free Apple Developer account enrolled in Xcode would see a "No signing certificate found" error. CI is unaffected because CODE_SIGNING_ALLOWED=NO is passed in tests.yml. If this change is intentional (e.g. to support entitlements that require a provisioning profile), a brief note in the PR description or a comment in the .pbxproj would help future contributors understand the requirement.
Summary
Follow-up to #239 carrying three commits that landed on the branch after that PR was squash-merged, so they never reached main:
FoundationModelDriftEvalTests) that is -skip-testing'd in CI.
[BETA]/[EXPERIMENTAL] labels and the "does not perform as well" disclaimer for Apple Intelligence (app + README).
Validation
Local FM drift eval (RUN_FM_EVAL compile flag): assistant drift down to 1/10; the holdout is an Apple guardrail refusal on an empty-body greeting, not a prompt issue. The eval needs the on-device model and is non-deterministic, so it self-skips without the flag and is -skip-testing'd in CI.
Linked issues
Refs #239.
Risk / rollout notes
project.pbxproj; excluded from CI via -skip-testing + a compile-flag gate.
Greptile Summary
This PR delivers three follow-up changes after #239: a Language picker in the menu bar (matching the existing Settings picker), a prompt-engineering overhaul for Apple Intelligence to reduce assistant drift (text-continuation identity + five few-shot examples, dropping the username injection), and removal of the [BETA]/[EXPERIMENTAL] labels and performance disclaimer. A flag-gated local drift eval harness (FoundationModelDriftEvalTests) is also added and correctly excluded from CI.
- selectedLanguageBinding follows the same Binding(get:set:) shape as all other bindings in MenuBarView.
- CODE_SIGN_IDENTITY is changed from ad-hoc ("-") to "Apple Development" in both Debug and Release configs, and LSApplicationCategoryType is added; the signing change could affect local builds for contributors without an Apple Developer certificate.
Confidence Score: 4/5
Safe to merge; the prompt and UI changes are well-scoped, the new eval test is correctly double-gated, and CI is unaffected by any of the changes.
The functional changes (language picker, prompt rewrite, label removal) are clean and well-tested. The only concerns are a false-negative edge case in the local-only drift eval harness and an unrelated signing identity change in the project file that could surprise contributors building locally without an Apple Developer certificate.
Cotabby.xcodeproj/project.pbxproj — the CODE_SIGN_IDENTITY change is unrelated to the stated PR scope and worth a quick second look. CotabbyTests/FoundationModelDriftEvalTests.swift — the isDrift helper has a minor false-negative edge case in the greeting-opener check.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[SuggestionRequest] --> B[sessionInstructions]
    B --> C[Base rules block\ntext-continuation identity\nno greet / no refuse / no repeat]
    C --> D{Language override?}
    D -- yes --> E[Append languageInstruction]
    D -- no --> F[Skip]
    E --> G[Append few-shot\ncontinuationExampleLines]
    F --> G
    G --> H{Custom style rules?}
    H -- yes --> I[Append style rules\nsubordinated to contract]
    H -- no --> J[Skip]
    I --> K[lines.joined separator newline]
    J --> K
    K --> L[Instructions string to Apple Foundation Models]
    M[MenuBarView] --> N[Language Picker]
    N --> O[selectedLanguageBinding\nget: suggestionSettings.responseLanguage\nset: setResponseLanguage]
    O --> P[SuggestionSettingsModel]
    P --> Q[Persisted to UserDefaults]
Reviews (1): Last reviewed commit: "Drop Apple Intelligence beta/experimenta..."
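The instruction-assembly path in the flowchart (base rules, optional language instruction, few-shot examples, optional style rules, joined with newlines) could look roughly like the sketch below. Everything here is assumed for illustration: `SketchRequest`, `sketchSessionInstructions`, and all of the rule and example wording are placeholders, since the real `SuggestionRequest` and prompt text are not shown in this PR page.

```swift
import Foundation

// Hypothetical request shape; the real SuggestionRequest is not shown.
struct SketchRequest {
    var languageInstruction: String?  // nil when there is no language override
    var customStyleRules: [String]    // empty when the user set none
}

func sketchSessionInstructions(for request: SketchRequest) -> String {
    // Base rules block: placeholder wording for the continuation contract.
    var lines = [
        "You are a text-continuation engine. Continue the text you are given.",
        "Do not greet, refuse, apologize, or repeat the given text."
    ]

    // Language override is appended only when present.
    if let languageInstruction = request.languageInstruction {
        lines.append(languageInstruction)
    }

    // Few-shot continuation examples (placeholder content).
    lines.append(contentsOf: [
        "Example: \"Thanks for the\" -> \" quick reply.\"",
        "Example: \"See you\" -> \" tomorrow at ten.\""
    ])

    // Custom style rules come last and are framed as subordinate.
    if !request.customStyleRules.isEmpty {
        lines.append("Style rules (these never override the rules above):")
        lines.append(contentsOf: request.customStyleRules)
    }

    return lines.joined(separator: "\n")
}

let plain = sketchSessionInstructions(
    for: SketchRequest(languageInstruction: nil, customStyleRules: [])
)
let full = sketchSessionInstructions(
    for: SketchRequest(
        languageInstruction: "Write the continuation in French.",
        customStyleRules: ["Prefer short sentences."]
    )
)
```

Appending the style rules after the contract and examples mirrors the "subordinated to contract" step in the flowchart: user preferences can shape tone without displacing the continuation rules.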