Switch 6 low-complexity Gemini consumers from Pro to Flash #6101
Memory extraction is text+vision with JSON schema response, no tool loop. Flash handles this pattern well at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Text-only task re-ranking by relevance. Runs every 5min. No vision, no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Text-only task pair comparison for duplicates. Hourly with cooldown. No vision, no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Text-only goal normalization from user input. On-demand, rare invocation. No vision, no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Daily profile synthesis from local data. Runs once/day. Text-only with no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Text-only note generation from transcription segments. Simple prompt with 3-10 word output. No vision, no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…init Reviewer feedback: the model constant + GeminiClient(model: model) broke the consistency rule. Use default GeminiClient() like all other migrated consumers. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reviewer feedback: log string would drift if default model changes. Use generic 'default (Flash)' instead. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Greptile Summary
This PR switches 6 Gemini AI consumers from

Confidence Score: 5/5
Safe to merge: all changes are model-string-only with no logic, API, or UI impact. The single finding is a P2 style inconsistency in AIUserProfileService (hardcoded model string vs. default initializer); functionally the correct model is used. All remaining changes are clean and low-risk.
desktop/Desktop/Sources/ProactiveAssistants/Services/AIUserProfileService.swift: minor inconsistency with the stated drift-prevention approach.

Important Files Changed

Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
subgraph Flash["✅ Switched to Flash (gemini-3-flash-preview)"]
LN["LiveNotesMonitor\ntext-only, on transcription"]
GA["GoalsAIService\ntext-only, on-demand"]
MA["MemoryAssistant\ntext+vision, no tool loop"]
TD["TaskDeduplicationService\ntext-only, hourly"]
TP["TaskPrioritizationService\ntext-only, every 5 min"]
UP["AIUserProfileService\ntext-only, daily"]
end
subgraph Pro["🔒 Kept on Pro (gemini-pro-latest)"]
AA["AdviceAssistant\nmulti-turn SQL loop + vision"]
TA["TaskAssistant\nmulti-turn, 5 tools, up to 5 iter"]
end
GC["GeminiClient\ndefault = gemini-3-flash-preview"]
LN -->|"GeminiClient()"| GC
GA -->|"GeminiClient()"| GC
MA -->|"GeminiClient(apiKey:)"| GC
TD -->|"GeminiClient()"| GC
TP -->|"GeminiClient()"| GC
UP -->|"GeminiClient(model: hardcoded ⚠️)"| GC
AA -->|"GeminiClient(model: 'gemini-pro-latest')"| GC
TA -->|"GeminiClient(model: 'gemini-pro-latest')"| GC
Reviews (1): Last reviewed commit: "Switch LiveNotesMonitor from Gemini Pro ..."
static let shared = AIUserProfileService()

private let model = "gemini-pro-latest"
private let maxProfileLength = 10000
Hardcoded model string contradicts stated approach
The PR description explicitly states: "uses the default GeminiClient() initializer (which defaults to gemini-3-flash-preview) instead of hardcoding the Flash model string, preventing model drift if the default changes."
However, AIUserProfileService is the only one of the 6 migrated files that still hardcodes the model name as a property and passes it explicitly via GeminiClient(model: model) (line 239). If GeminiClient's default changes later, this service will silently diverge while the other five automatically follow.
For consistency with the rest of the PR and to benefit from the drift-prevention rationale, drop the model property and call GeminiClient() directly at the call site:
private let maxProfileLength = 10000
Then at line 239:
let gemini = try GeminiClient()
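
The drift-prevention argument above can be illustrated with a minimal sketch. The `GeminiClient` below is a hypothetical stand-in, not the project's actual type: the `defaultModel` constant, the `ClientError` enum, and the placeholder API key are assumptions made only so the example compiles on its own. The throwing initializer and the `apiKey:` and `model:` labels mirror the call sites quoted in this review.

```swift
// Hypothetical stand-in for the app's GeminiClient; the real type is not shown in this PR.
struct GeminiClient {
    enum ClientError: Error { case missingAPIKey }

    // Assumed single source of truth for the default model.
    static let defaultModel = "gemini-3-flash-preview"

    let apiKey: String
    let model: String

    // Throwing initializer, mirroring `try GeminiClient()` in the suggestion.
    // The default API key is a placeholder so the sketch runs without configuration.
    init(apiKey: String = "placeholder-key",
         model: String = GeminiClient.defaultModel) throws {
        guard !apiKey.isEmpty else { throw ClientError.missingAPIKey }
        self.apiKey = apiKey
        self.model = model
    }
}

do {
    // Migrated consumers omit `model:` and therefore follow the default.
    let profileClient = try GeminiClient()
    // Consumers that must stay on Pro opt out explicitly.
    let adviceClient = try GeminiClient(model: "gemini-pro-latest")
    print(profileClient.model)
    print(adviceClient.model)
} catch {
    print("Failed to create client: \(error)")
}
```

With this shape, changing `defaultModel` in one place moves every consumer that calls `GeminiClient()`, while a service that stores its own model string and passes `GeminiClient(model: model)` would keep the old value.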
Live Test Evidence (CP9A/CP9B)

Changed-Path Coverage

L1: Build on Mac Mini (M4, macOS 26.3.1)
All 6 changed files compiled and linked successfully:

L2: Integration
Backend Rust proxy (

L1/L2 Synthesis
All 7 changed paths (P1-P7) proven at L1 via successful compilation and linking. Changes are model-string-only with no logic, API, or UI modifications. L2 integration is satisfied by the proxy being a pass-through and Flash already in production use by other consumers through the same code path.

by AI for @beastoin
Summary
Switches 6 Gemini AI consumers from gemini-pro-latest ($0.018/call) to gemini-3-flash-preview ($0.003/call), a 6x cost reduction for these call paths.

Migrated to Flash (6 files):
MemoryAssistant: text+vision, JSON schema response, no tool loop
TaskPrioritizationService: text-only task re-ranking, runs every 5 min
TaskDeduplicationService: text-only duplicate comparison, hourly
GoalsAIService: text-only goal normalization, on-demand
AIUserProfileService: text-only daily profile synthesis
LiveNotesMonitor: text-only note generation from transcription

Kept on Pro (2 files, unchanged):
AdviceAssistant: multi-turn SQL investigation loop (3-7 iterations) + vision. Primary cost driver, needs Pro reasoning.
TaskAssistant: multi-turn tool loop with 5 tools + vision, up to 5 iterations. Needs Pro for reliable tool selection.

Approach: Per Codex consultation, uses the default GeminiClient() initializer (which defaults to gemini-3-flash-preview) instead of hardcoding the Flash model string, preventing model drift if the default changes.

Cost Impact
Based on 3-day metrics (Mar 25-27):
$957/day ($29K/month), 68% Pro calls
$100-150/day ($3-4.5K/month)

Testing
GeminiClient default verified: gemini-3-flash-preview (line 229)

Risks
Closes #6098 (L1 phase)
by AI for @beastoin