Floating bar chat latency: 6.6-11.1s per query — bottleneck analysis with logs #6981

@beastoin

Summary

Floating bar chat takes 6.6-11.1s (avg 8.6s) to respond to "what do you see on my screen?" queries. Tested 7 runs on Omi Beta v0.11.358 (production build, Mac Mini M4, ACP/Claude OAuth).

Sub-second is the UX target. Current architecture makes 3-4s achievable with quick wins; sub-second requires on-device vision.

Test Setup

  • App: Omi Beta v0.11.358 (production)
  • Query: "what do you see on my screen?" × 7 runs via floating bar
  • Provider: ACP (Claude OAuth) → claude-sonnet-4-6
  • Machine: Mac Mini M4, 24GB RAM, macOS Tahoe
  • System prompt: 27,387 chars; logged usage is ~36K tokens per request (the whole request, not the system prompt alone)

Results

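| Run | Total | Screenshot | Quota Check | Session | LLM API | Save/Sync | Cache |
|---|---|---|---|---|---|---|---|
| 1 | 9,520ms | 36ms | 1,629ms | 689ms | 6,600ms | 566ms | MISS |
| 2 | 8,986ms | 136ms | 1,215ms | 1ms | 5,917ms | 1,717ms | MISS |
| 3 | 7,918ms | 133ms | 1,885ms | 1ms | 5,242ms | 657ms | HIT |
| 4 | 7,724ms | 137ms | 1,762ms | 2ms | 5,324ms | 499ms | MISS |
| 5 | 8,077ms | 138ms | 1,794ms | 1ms | 4,765ms | 1,379ms | HIT |
| 6 | 11,114ms | 2ms | 2,608ms | 1ms | 6,450ms | 2,053ms | HIT |
| 7 | 6,563ms | 1ms | 714ms | 1ms | 5,381ms | 466ms | HIT |
| AVG | 8,557ms | 83ms | 1,658ms | 99ms | 5,668ms | 1,048ms | |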
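
The averages can be re-derived from the per-run stage timings. A minimal sketch: the `Run` struct and its field names are illustrative, not from the Omi codebase, and the Run 5 LLM and Save/Sync values (4,765ms and 1,379ms) are computed from the Run 5 timestamps under Raw Logs.

```swift
// Per-run stage timings in milliseconds, from the Results table.
// `Run` and its field names are hypothetical, for illustration only.
struct Run {
    let screenshot, quota, session, llm, save: Int
    var total: Int { screenshot + quota + session + llm + save }
}

let runs = [
    Run(screenshot: 36,  quota: 1629, session: 689, llm: 6600, save: 566),
    Run(screenshot: 136, quota: 1215, session: 1,   llm: 5917, save: 1717),
    Run(screenshot: 133, quota: 1885, session: 1,   llm: 5242, save: 657),
    Run(screenshot: 137, quota: 1762, session: 2,   llm: 5324, save: 499),
    Run(screenshot: 138, quota: 1794, session: 1,   llm: 4765, save: 1379), // from Run 5 log timestamps
    Run(screenshot: 2,   quota: 2608, session: 1,   llm: 6450, save: 2053),
    Run(screenshot: 1,   quota: 714,  session: 1,   llm: 5381, save: 466),
]

// Arithmetic mean of a list of millisecond values.
func mean(_ values: [Int]) -> Double {
    Double(values.reduce(0, +)) / Double(values.count)
}

let avgTotal = mean(runs.map(\.total))
let avgLLM = mean(runs.map(\.llm))
let avgSave = mean(runs.map(\.save))
print(Int(avgTotal.rounded()), Int(avgLLM.rounded()), Int(avgSave.rounded()))
// prints "8557 5668 1048"
```

With Run 5 included, LLM inference averages about 5.7s and Save/Sync about 1.0s, and the five stages of each run sum to that run's Total.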

Pipeline Waterfall (best case — Run 7, 6.6s)

T+0ms     Screenshot capture (CGDisplayCreateImage → WebP 134KB)
T+1ms     Query dispatched to ChatProvider.sendMessage()
T+1ms     AgentBridge.query() → await fetchChatUsageQuota()  ← BLOCKS HERE
T+715ms   Quota OK → JSON serialized (base64 image + 27K system prompt)
T+716ms   Sent to Node.js bridge via stdin pipe
T+716ms   ACP session reused (key=floating, pre-warmed)
T+716ms   → Claude Sonnet 4.6 API call starts  ← BLOCKS HERE
T+6097ms  LLM response complete (5,381ms inference)
T+6097ms  Save to backend + Firebase sync + analytics
T+6563ms  DONE

3 Bottlenecks

1. LLM Inference: 4.8-6.6s (82% of best-case time)

Each request is ~36K tokens. The system prompt alone is 27,387 chars, broken down as:

| Component | Size | Needed for floating bar? |
|---|---|---|
| base_template | 9,542 chars | Yes (persona, instructions) |
| schema | 12,280 chars (45 tables) | No (floating bar doesn't use SQL tools) |
| context/memories | 2,619 chars | Partial |
| ai_profile | 2,162 chars | Yes |
| tasks | 601 chars | Maybe |
| goals | 183 chars | Maybe |

Plus 1920×1080 screenshot (120-134 KB WebP → ~160-179 KB base64) for vision processing.

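Prompt cache hit rate: 57% (4/7 runs). In these runs, cache misses averaged ~0.5s more inference time (5.9s vs 5.5s) and cost about 8× more ($0.24 vs $0.03 per query). The misses in Runs 2 and 4 each came more than 5 minutes after the previous request, which is consistent with a 5-minute prompt cache lifetime.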

Code: System prompt built at ChatProvider.swift:857, cached in cachedMainSystemPrompt. Same prompt used for main chat and floating bar — no differentiation.
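
One way to differentiate the two prompts is to assemble them from named sections and let each surface pick its own set. The sketch below is an assumption about structure, not how ChatProvider.swift is organized: `PromptSection`, `Surface`, and `promptLength(for:)` are hypothetical names, and the character counts are the ones from the prompt breakdown log line.

```swift
// Hypothetical prompt sections with the sizes logged in the prompt breakdown.
enum PromptSection: CaseIterable {
    case baseTemplate, schema, context, aiProfile, tasks, goals

    var chars: Int {
        switch self {
        case .baseTemplate: return 9542
        case .schema:       return 12280
        case .context:      return 2619
        case .aiProfile:    return 2162
        case .tasks:        return 601
        case .goals:        return 183
        }
    }
}

// Hypothetical surfaces: main chat keeps everything, floating bar drops the SQL schema.
enum Surface {
    case mainChat, floatingBar

    var sections: [PromptSection] {
        switch self {
        case .mainChat:    return PromptSection.allCases
        case .floatingBar: return PromptSection.allCases.filter { $0 != .schema }
        }
    }
}

// Total prompt size in characters for a given surface.
func promptLength(for surface: Surface) -> Int {
    surface.sections.reduce(0) { $0 + $1.chars }
}

print(promptLength(for: .mainChat))     // 27387
print(promptLength(for: .floatingBar))  // 15107
```

Dropping only the schema removes 12,280 of 27,387 characters from the system prompt. How much that shortens inference would need to be measured, since the system prompt is only part of the ~36K-token request.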

2. Quota Check: 0.7-2.6s (11-24% of time)

Sequential blocking await before any query is sent:

// AgentBridge.swift:422
if let quota = await APIClient.shared.fetchChatUsageQuota(), !quota.allowed {
    throw BridgeError.quotaExceeded(...)
}

Endpoint: GET api.omi.me/v1/users/me/usage-quota

Not parallelized with screenshot capture, JSON serialization, or anything else. Pure serial wait.
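
One way to take this round trip off the critical path is to reuse a recent quota result instead of fetching on every query. A minimal sketch, assuming the backend remains the source of truth for enforcement: `QuotaCache`, the simplified `Quota` struct, and the 60-second TTL are all hypothetical, not existing Omi types or settings.

```swift
import Foundation

// Simplified stand-in for the real quota response.
struct Quota { let allowed: Bool }

// Hypothetical cache: reuse the last quota result for `ttl` seconds.
struct QuotaCache {
    let ttl: TimeInterval
    private var cached: (quota: Quota, fetchedAt: Date)?

    init(ttl: TimeInterval) { self.ttl = ttl }

    // Returns the cached quota if it is younger than `ttl`, else nil.
    func fresh(at now: Date = Date()) -> Quota? {
        guard let entry = cached,
              now.timeIntervalSince(entry.fetchedAt) < ttl else { return nil }
        return entry.quota
    }

    // Records a quota result fetched at `now`.
    mutating func store(_ quota: Quota, at now: Date = Date()) {
        cached = (quota: quota, fetchedAt: now)
    }
}

var cache = QuotaCache(ttl: 60)
let t0 = Date(timeIntervalSince1970: 0)
cache.store(Quota(allowed: true), at: t0)

let within = cache.fresh(at: t0.addingTimeInterval(30))   // still valid
let expired = cache.fresh(at: t0.addingTimeInterval(90))  // nil, needs a refetch
print(within?.allowed == true, expired == nil)
```

A caller would send the query immediately when `fresh()` returns an allowed quota and refresh the cache in the background. The trade-off is that a user can exceed the limit by however many queries fit inside the TTL.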

3. Save/Sync: 0.5-2.1s (6-19% of time)

After LLM response is already rendered to the user:

  • POST api.omi.me/v2/desktop/messages (response persistence)
  • AgentSync push (Firebase)
  • PostHog event tracking
  • GoalsAI progress check

Code: ChatProvider.swift:2523-2570
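
Since the response is already on screen at this point, the persistence work could run in a detached task so the query is reported as done immediately. A minimal sketch: `saveAndSync` and `finishQuery` are hypothetical stand-ins for the code at the location above, and the 300ms sleep is a placeholder for the network calls.

```swift
// Hypothetical stand-in for message persistence, Firebase sync, and analytics.
func saveAndSync(_ response: String) async {
    try? await Task.sleep(nanoseconds: 300_000_000)
    print("saved: \(response)")
}

// Returns as soon as the background task is scheduled, without waiting for the save.
@discardableResult
func finishQuery(response: String) -> Task<Void, Never> {
    Task.detached(priority: .utility) {
        await saveAndSync(response)
    }
}

let saveTask = finishQuery(response: "I see a code editor")
print("response shown to user")

// Awaited here only so this script does not exit before the save finishes.
await saveTask.value
```

In the app the task handle would normally be dropped or kept only for cancellation. Failures in the background work would need their own retry or logging path, since nothing awaits the result.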

Current vs Optimal Pipeline

CURRENT (sequential):
  Screenshot → Send → [WAIT quota 1.7s] → [WAIT LLM 5.7s] → [WAIT save 1.0s] → Done
                                                                                  8.6s avg

OPTIMAL (parallel + slim prompt):
  Screenshot ──┐
  Quota check ─┤ (parallel)
               └─► [LLM ~3-4s with slim prompt] → Done
                                                    └─► Save (background)
                                                    3-4s estimated
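
The parallel stage of the optimal pipeline could be expressed with `async let`. This is a sketch under assumptions: `captureScreenshot`, the simplified `Quota` and `BridgeError` types, and the sleep durations are placeholders, not the real implementations.

```swift
// Simplified stand-ins for the real types.
struct Quota { let allowed: Bool }
enum BridgeError: Error { case quotaExceeded }

// Placeholder for screenshot capture; returns a few dummy bytes.
func captureScreenshot() async -> [UInt8] {
    try? await Task.sleep(nanoseconds: 100_000_000)
    return [0x52, 0x49, 0x46, 0x46]
}

// Placeholder for the quota request.
func fetchChatUsageQuota() async -> Quota? {
    try? await Task.sleep(nanoseconds: 200_000_000)
    return Quota(allowed: true)
}

// Starts both operations at once, then waits for each result.
func prepareQuery() async throws -> [UInt8] {
    async let screenshot = captureScreenshot()
    async let quota = fetchChatUsageQuota()

    if let q = await quota, !q.allowed {
        throw BridgeError.quotaExceeded
    }
    return await screenshot
}

let image = try await prepareQuery()
print("payload ready, \(image.count) bytes")
```

Because screenshot capture averaged only 83ms in these runs, overlapping it with the quota check recovers little by itself. Most of the quota saving would come from not waiting on the network call at all.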

Proposed Fixes

| # | Change | Savings | Effort | Code Location |
|---|---|---|---|---|
| 1 | Optimistic/cached quota check | 1.0-1.8s | Low | AgentBridge.swift:422 |
| 2 | Background save/sync | 0.5-1.5s | Low | ChatProvider.swift:2523-2570 |
| 3 | Slim floating bar prompt (drop schema, skills) | 1.0-2.0s | Medium | ChatProvider.swift:857 |
| 4 | Half-resolution screenshot (960×540) | 0.5-1.0s | Low | ScreenCaptureManager.swift |
| Total | | 3.0-6.3s | | |

Estimated new time: ~2-4s (down from 6.6-11.1s)

Sub-second requires on-device vision (Apple Vision + CoreML) or pre-computed responses — not achievable with cloud LLM.

Token Usage & Cost

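| Run | Total Tokens | CacheRead | CacheWrite | Cost |
|---|---|---|---|---|
| 1 | 36,759 | 0 | 36,692 | $0.231 |
| 2 | 38,388 | 0 | 38,331 | $0.241 |
| 3 | 40,004 | 38,331 | 1,629 | $0.030 |
| 4 | 41,613 | 0 | 41,576 | $0.261 |
| 5 | 43,227 | 41,576 | 1,609 | $0.032 |
| 6 | 44,843 | 43,185 | 1,614 | $0.033 |
| 7 | 46,449 | 44,799 | 1,616 | $0.033 |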

Cache miss: ~$0.24/query. Cache hit: ~$0.03/query (8× cheaper).
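
The per-query figures follow from the cost table. A small sketch of the arithmetic, using the seven logged costs (the tuple layout and the `average` helper are just for illustration):

```swift
// Cost in USD and cache outcome for each of the seven runs, from the table above.
let queries: [(cost: Double, cacheHit: Bool)] = [
    (cost: 0.231, cacheHit: false),
    (cost: 0.241, cacheHit: false),
    (cost: 0.030, cacheHit: true),
    (cost: 0.261, cacheHit: false),
    (cost: 0.032, cacheHit: true),
    (cost: 0.033, cacheHit: true),
    (cost: 0.033, cacheHit: true),
]

// Arithmetic mean of a list of costs.
func average(_ xs: [Double]) -> Double {
    xs.reduce(0, +) / Double(xs.count)
}

let hitCost = average(queries.filter { $0.cacheHit }.map { $0.cost })
let missCost = average(queries.filter { !$0.cacheHit }.map { $0.cost })
let blendedCost = average(queries.map { $0.cost })

print("hit:", hitCost, "miss:", missCost)
print("miss/hit ratio:", missCost / hitCost)
print("blended per query:", blendedCost)
```

Hits average about $0.032 and misses about $0.244, a ratio of roughly 7.6 that rounds to the 8× quoted above. At the observed 4/7 hit rate the blended cost is about $0.12 per query.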

Raw Logs

Run 1 — Cold start (9.5s)
[15:31:32.827] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 120 KB
[15:31:32.863] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:31:32.864] [app] ChatProvider loaded 50 memories from local DB
[15:31:32.865] [app] ChatProvider loaded 4 goals from local DB
[15:31:32.866] [app] ChatProvider loaded 8 tasks for context
[15:31:32.867] [app] ChatProvider loaded AI profile (generated 2026-04-23 12:59:33 +0000)
[15:31:32.868] [app] ChatProvider loaded schema for 45 tables
[15:31:32.869] [app] AgentBridge: starting with node=...node (exists=true), bridge=...index.js (exists=true)
[15:31:32.920] [app] AgentBridge stderr: [agent] Bridge main() starting (pid=55280, node=v22.14.0)
[15:31:32.920] [app] AgentBridge stderr: [agent] Harness mode: acp
[15:31:32.922] [app] AgentBridge: bridge ready (sessionId=)
[15:31:32.926] [app] ChatProvider: prompt built — schema: yes, goals: 4, tasks: 8, ai_profile: yes, memories: 50, history: none, prompt_length: 27387 chars
[15:31:32.927] [app] ChatProvider: prompt breakdown — base_template:9542c, context:2619c, goals:183c, tasks:601c, ai_profile:2162c, schema:12280c
[15:31:32.928] [app] AgentBridge stderr: [agent] Warmup requested (cwd=default, sessions=main, floating)
[15:31:33.011] [app] AgentBridge stderr: [agent] ACP initialized
[15:31:34.492] [app] APIClient: Quota plan=Operator unit=questions used=277.0 limit=500.0 allowed=true
[15:31:34.494] [app] AgentBridge stderr: [agent] Query mode: act
[15:31:35.181] [app] AgentBridge stderr: [agent] Pre-warmed session: d8343433-... (key=floating, model=claude-sonnet-4-6)
[15:31:35.212] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:31:41.781] [app] AgentBridge stderr: [agent] Usage: model=claude-sonnet-4-6, cost=$0.23094, cacheWrite=36692, cacheRead=0, total=36759
[15:31:41.781] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:31:42.347] [app] PostHog: Tracked event 'chat_agent_query_completed'
Run 2 — Warm bridge, cache miss (9.0s)
[15:36:55.356] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 122 KB
[15:36:55.492] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:36:56.707] [app] APIClient: Quota plan=Operator unit=questions used=279.0 limit=500.0 allowed=true
[15:36:56.708] [app] AgentBridge stderr: [agent] Query mode: act
[15:36:56.708] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:37:02.624] [app] AgentBridge stderr: [agent] Usage: cost=$0.24093, cacheWrite=38331, cacheRead=0, total=38388
[15:37:02.625] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:37:04.342] [app] PostHog: Tracked event 'chat_agent_query_completed'
Run 3 — Cache hit (7.9s)
[15:38:05.801] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 124 KB
[15:38:05.934] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:38:07.819] [app] APIClient: Quota plan=Operator unit=questions used=281.0 limit=500.0 allowed=true
[15:38:07.820] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:38:13.062] [app] AgentBridge stderr: [agent] Usage: cost=$0.03039, cacheWrite=1629, cacheRead=38331, total=40004
[15:38:13.062] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:38:13.719] [app] PostHog: Tracked event 'chat_agent_query_completed'
Run 4 — Cache miss (7.7s)
[15:43:51.787] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 130 KB
[15:43:51.924] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:43:53.686] [app] APIClient: Quota plan=Operator unit=questions used=283.0 limit=500.0 allowed=true
[15:43:53.688] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:43:59.012] [app] AgentBridge stderr: [agent] Usage: cost=$0.26072, cacheWrite=41576, cacheRead=0, total=41613
[15:43:59.012] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:43:59.511] [app] PostHog: Tracked event 'chat_agent_query_completed'
Run 5 — Cache hit (8.1s)
[15:45:00.514] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 130 KB
[15:45:00.652] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:45:02.446] [app] APIClient: Quota plan=Operator unit=questions used=285.0 limit=500.0 allowed=true
[15:45:02.447] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:45:07.212] [app] AgentBridge stderr: [agent] Usage: cost=$0.03183, cacheWrite=1609, cacheRead=41576, total=43227
[15:45:08.591] [app] PostHog: Tracked event 'chat_agent_query_completed'
Run 6 — Worst case (11.1s)
[15:47:55.272] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 134 KB
[15:47:55.274] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:47:57.882] [app] APIClient: Quota plan=Operator unit=questions used=287.0 limit=500.0 allowed=true
[15:47:57.883] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:48:04.333] [app] AgentBridge stderr: [agent] Usage: cost=$0.03272, cacheWrite=1614, cacheRead=43185, total=44843
[15:48:04.333] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:48:06.386] [app] PostHog: Tracked event 'chat_agent_query_completed'
Run 7 — Best case (6.6s)
[15:49:44.856] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 134 KB
[15:49:44.857] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:49:45.571] [app] APIClient: Quota plan=Operator unit=questions used=289.0 limit=500.0 allowed=true
[15:49:45.572] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:49:50.953] [app] AgentBridge stderr: [agent] Usage: cost=$0.03329, cacheWrite=1616, cacheRead=44799, total=46449
[15:49:50.953] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:49:51.419] [app] PostHog: Tracked event 'chat_agent_query_completed'

Tested by ren (AI agent) for @beastoin — 2026-04-23

Metadata

Labels: bug, desktop, p1 (Priority: Critical, score 22-29)