Switch 6 low-complexity Gemini consumers from Pro to Flash by beastoin · Pull Request #6101 · BasedHardware/omi

beastoin · 2026-03-28T04:01:12Z

Summary

Switches 6 Gemini AI consumers from gemini-pro-latest ($0.018/call) to gemini-3-flash-preview ($0.003/call) — a 6x cost reduction for these call paths.

Migrated to Flash (6 files):

MemoryAssistant — text+vision, JSON schema response, no tool loop
TaskPrioritizationService — text-only task re-ranking, runs every 5min
TaskDeduplicationService — text-only duplicate comparison, hourly
GoalsAIService — text-only goal normalization, on-demand
AIUserProfileService — text-only daily profile synthesis
LiveNotesMonitor — text-only note generation from transcription

Kept on Pro (2 files, unchanged):

AdviceAssistant — multi-turn SQL investigation loop (3-7 iterations) + vision. Primary cost driver, needs Pro reasoning.
TaskAssistant — multi-turn tool loop with 5 tools + vision, up to 5 iterations. Needs Pro for reliable tool selection.

Approach: Per Codex consultation, uses the default GeminiClient() initializer (which defaults to gemini-3-flash-preview) instead of hardcoding the Flash model string, preventing model drift if the default changes.

Cost Impact

Based on 3-day metrics (Mar 25-27):

Current: ~~$957/day (~~$29K/month), 68% Pro calls
These 6 consumers represent ~15-20% of Pro calls
Expected savings: ~~$100-150/day (~~$3-4.5K/month)
Phase 1 of 3 (L2: rate limits, L3: backend migration follow)

Testing

Swift build passes on Mac Mini (M4, macOS 26.3.1)
All changes are model-only — no logic, no API, no UI changes
Each consumer verified: text-only or simple vision, no tool loops
GeminiClient default verified: gemini-3-flash-preview (line 229)

Risks

Flash may produce slightly lower quality output for memory extraction and goal normalization. Acceptable trade-off given 6x cost reduction and these being non-critical-path features.
No rollback risk — changing model string back is trivial if quality degrades.

Closes #6098 (L1 phase)

by AI for @beastoin

Memory extraction is text+vision with JSON schema response, no tool loop. Flash handles this pattern well at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Text-only task re-ranking by relevance. Runs every 5min. No vision, no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Text-only task pair comparison for duplicates. Hourly with cooldown. No vision, no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Text-only goal normalization from user input. On-demand, rare invocation. No vision, no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Daily profile synthesis from local data. Runs once/day. Text-only with no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Text-only note generation from transcription segments. Simple prompt with 3-10 word output. No vision, no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…init Reviewer feedback: the model constant + GeminiClient(model: model) broke the consistency rule. Use default GeminiClient() like all other migrated consumers. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Reviewer feedback: log string would drift if default model changes. Use generic 'default (Flash)' instead. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps · 2026-03-28T04:04:03Z

Greptile Summary

This PR switches 6 Gemini AI consumers from gemini-pro-latest to gemini-3-flash-preview (via the default GeminiClient() initializer) to achieve a ~6x cost reduction on lower-complexity call paths while retaining Pro for the two multi-turn tool-loop assistants (AdviceAssistant, TaskAssistant).

5 of 6 files correctly adopt GeminiClient() with no explicit model argument, leveraging the default and benefiting from drift-prevention if the default ever changes.
AIUserProfileService is the outlier: it hardcodes "gemini-3-flash-preview" as a private let model property and passes it explicitly to GeminiClient(model: model), which is inconsistent with the PR's own stated approach and means this service won't automatically follow a future default change.
All other aspects of the change are model-string-only — no logic, API, or UI changes — and the selected services (text-only or simple text+vision, no tool loops) are appropriate candidates for Flash.

Confidence Score: 5/5

Safe to merge — all changes are model-string-only with no logic, API, or UI impact.

The single finding is a P2 style inconsistency in AIUserProfileService (hardcoded model string vs. default initializer); functionally the correct model is used. All remaining changes are clean and low-risk.

desktop/Desktop/Sources/ProactiveAssistants/Services/AIUserProfileService.swift — minor inconsistency with the stated drift-prevention approach.

Important Files Changed

Filename	Overview
desktop/Desktop/Sources/ProactiveAssistants/Services/AIUserProfileService.swift	Hardcodes "gemini-3-flash-preview" in a private let model property and passes it explicitly — inconsistent with the PR's stated approach of using the default initializer to avoid model drift.
desktop/Desktop/Sources/ProactiveAssistants/Assistants/MemoryExtraction/MemoryAssistant.swift	Drops explicit model: arg from GeminiClient(apiKey: apiKey, model: "gemini-pro-latest") — now relies on default Flash. Text+vision use case is Flash-compatible.
desktop/Desktop/Sources/LiveNotes/LiveNotesMonitor.swift	Switches from GeminiClient(model: "gemini-pro-latest") to GeminiClient() (default Flash). Text-only, no tool loop — straightforward and consistent with the stated approach.
desktop/Desktop/Sources/ProactiveAssistants/Assistants/Goals/GoalsAIService.swift	Switches client init to GeminiClient() and updates the log string. Clean, consistent change.
desktop/Desktop/Sources/ProactiveAssistants/Assistants/TaskExtraction/TaskDeduplicationService.swift	One-line change: drops model: "gemini-pro-latest" from the GeminiClient init. Text-only deduplication is a good fit for Flash.
desktop/Desktop/Sources/ProactiveAssistants/Assistants/TaskExtraction/TaskPrioritizationService.swift	One-line change mirroring TaskDeduplicationService. Text-only ranking every 5 min is well-suited for Flash.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Flash["✅ Switched to Flash (gemini-3-flash-preview)"]
        LN["LiveNotesMonitor\ntext-only, on transcription"]
        GA["GoalsAIService\ntext-only, on-demand"]
        MA["MemoryAssistant\ntext+vision, no tool loop"]
        TD["TaskDeduplicationService\ntext-only, hourly"]
        TP["TaskPrioritizationService\ntext-only, every 5 min"]
        UP["AIUserProfileService\ntext-only, daily"]
    end

    subgraph Pro["🔒 Kept on Pro (gemini-pro-latest)"]
        AA["AdviceAssistant\nmulti-turn SQL loop + vision"]
        TA["TaskAssistant\nmulti-turn, 5 tools, up to 5 iter"]
    end

    GC["GeminiClient\ndefault = gemini-3-flash-preview"]

    LN -->|"GeminiClient()"| GC
    GA -->|"GeminiClient()"| GC
    MA -->|"GeminiClient(apiKey:)"| GC
    TD -->|"GeminiClient()"| GC
    TP -->|"GeminiClient()"| GC
    UP -->|"GeminiClient(model: hardcoded ⚠️)"| GC

    AA -->|"GeminiClient(model: 'gemini-pro-latest')"| GC
    TA -->|"GeminiClient(model: 'gemini-pro-latest')"| GC

_{Reviews (1): Last reviewed commit: "Switch LiveNotesMonitor from Gemini Pro ..." | Re-trigger Greptile}

greptile-apps · 2026-03-28T04:04:06Z

    static let shared = AIUserProfileService()

-    private let model = "gemini-pro-latest"
    private let maxProfileLength = 10000


Hardcoded model string contradicts stated approach

The PR description explicitly states: "uses the default GeminiClient() initializer (which defaults to gemini-3-flash-preview) instead of hardcoding the Flash model string, preventing model drift if the default changes."

However, AIUserProfileService is the only one of the 6 migrated files that still hardcodes the model name as a property and passes it explicitly via GeminiClient(model: model) (line 239). If GeminiClient's default changes later, this service will silently diverge while the other five automatically follow.

For consistency with the rest of the PR and to benefit from the drift-prevention rationale, drop the model property and call GeminiClient() directly at the call site:

Suggested change

private let maxProfileLength = 10000

private let maxProfileLength = 10000

Then at line 239:

let gemini = try GeminiClient()

beastoin · 2026-03-28T04:11:00Z

Live Test Evidence (CP9A/CP9B)

Changed-Path Coverage

Path ID	Changed path	Happy-path test	L1 result
P1	MemoryAssistant.swift:62 — `GeminiClient(apiKey:)` default init	Compiled + linked	PASS
P2	TaskPrioritizationService.swift:49 — `GeminiClient()` default init	Compiled + linked	PASS
P3	TaskDeduplicationService.swift:32 — `GeminiClient()` default init	Compiled + linked	PASS
P4	GoalsAIService.swift:31 — `GeminiClient()` default init	Compiled + linked	PASS
P5	GoalsAIService.swift:273 — log string update	Compiled + linked	PASS
P6	AIUserProfileService.swift:37-238 — removed model constant, default init	Compiled + linked	PASS
P7	LiveNotesMonitor.swift:92 — `GeminiClient()` default init	Compiled + linked	PASS

L1 — Build on Mac Mini (M4, macOS 26.3.1)

All 6 changed files compiled and linked successfully:

[4/13] Compiling Omi_Computer MemoryAssistant.swift
[5/13] Compiling Omi_Computer AIUserProfileService.swift
[6/13] Compiling Omi_Computer GoalsAIService.swift
[8/13] Compiling Omi_Computer LiveNotesMonitor.swift
[9/13] Compiling Omi_Computer TaskDeduplicationService.swift
[10/13] Compiling Omi_Computer TaskPrioritizationService.swift
[10/12] Linking Omi Computer
[11/12] Applying Omi Computer
Build complete! (20.82s)

L2 — Integration

Backend Rust proxy (proxy.rs) is unchanged — it's a pass-through that forwards whatever model the client requests. Flash model is already used in production by FocusAssistant and OnboardingChatView through the same proxy path.

L1/L2 Synthesis

All 7 changed paths (P1-P7) proven at L1 via successful compilation and linking. Changes are model-string-only with no logic, API, or UI modifications. L2 integration is satisfied by the proxy being a pass-through and Flash already in production use by other consumers through the same code path.

by AI for @beastoin

beastoin

lgtm

…ware#6101)

beastoin and others added 8 commits March 28, 2026 04:00

Switch MemoryAssistant from Gemini Pro to Flash (default)

69b1812

Memory extraction is text+vision with JSON schema response, no tool loop. Flash handles this pattern well at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Switch TaskPrioritizationService from Gemini Pro to Flash (default)

e940bd0

Text-only task re-ranking by relevance. Runs every 5min. No vision, no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Switch TaskDeduplicationService from Gemini Pro to Flash (default)

a453c5c

Text-only task pair comparison for duplicates. Hourly with cooldown. No vision, no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Switch GoalsAIService from Gemini Pro to Flash (default)

3cfd8ee

Text-only goal normalization from user input. On-demand, rare invocation. No vision, no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Switch AIUserProfileService from Gemini Pro to Flash

44d0ff8

Daily profile synthesis from local data. Runs once/day. Text-only with no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Switch LiveNotesMonitor from Gemini Pro to Flash (default)

d99bc93

Text-only note generation from transcription segments. Simple prompt with 3-10 word output. No vision, no tool loop — Flash-safe at 6x lower cost. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix GoalsAIService log to not hardcode model name

9966980

Reviewer feedback: log string would drift if default model changes. Use generic 'default (Flash)' instead. Part of #6098 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps Bot reviewed Mar 28, 2026

View reviewed changes

beastoin commented Mar 28, 2026

View reviewed changes

beastoin merged commit 1b863d6 into main Mar 28, 2026
2 checks passed

beastoin deleted the fix/gemini-flash-migration-6098 branch March 28, 2026 04:16

This was referenced Mar 28, 2026

Add tiered Gemini rate limiting with Pro-to-Flash degradation #6105

Merged

Server-side proactive AI: WebSocket /v1/proactive replaces desktop Gemini proxy #6153

Closed

beastoin added a commit that referenced this pull request Mar 31, 2026

Switch 6 low-complexity Gemini consumers from Pro to Flash (#6101)

d04b3b2

Glucksberg pushed a commit to Glucksberg/omi-local that referenced this pull request Apr 28, 2026

Switch 6 low-complexity Gemini consumers from Pro to Flash (BasedHard…

83e4b27

…ware#6101)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch 6 low-complexity Gemini consumers from Pro to Flash#6101

Switch 6 low-complexity Gemini consumers from Pro to Flash#6101
beastoin merged 8 commits into
mainfrom
fix/gemini-flash-migration-6098

beastoin commented Mar 28, 2026

Uh oh!

greptile-apps Bot commented Mar 28, 2026

Uh oh!

greptile-apps Bot Mar 28, 2026

Uh oh!

beastoin commented Mar 28, 2026

Uh oh!

beastoin left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	private let maxProfileLength = 10000
	private let maxProfileLength = 10000

Conversation

beastoin commented Mar 28, 2026

Summary

Cost Impact

Testing

Risks

Uh oh!

greptile-apps Bot commented Mar 28, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

beastoin commented Mar 28, 2026

Live Test Evidence (CP9A/CP9B)

Changed-Path Coverage

L1 — Build on Mac Mini (M4, macOS 26.3.1)

L2 — Integration

L1/L2 Synthesis

Uh oh!

beastoin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant