Skip to content

Send raw error to analytics for chat agent failures#5984

Merged
kodjima33 merged 1 commit into
mainfrom
worktree-fix-onboarding-restart-recovery
Mar 24, 2026
Merged

Send raw error to analytics for chat agent failures#5984
kodjima33 merged 1 commit into
mainfrom
worktree-fix-onboarding-restart-recovery

Conversation

@kodjima33
Copy link
Copy Markdown
Collaborator

Summary

Previously chat_agent_error PostHog events only contained the sanitized user-friendly message (e.g., "AI service is temporarily unavailable"), making it impossible to diagnose failures remotely. Now sends the raw BridgeError alongside (e.g., agentError("Internal error: Invalid API key")) so we can see the actual cause in analytics.

This was discovered investigating Miles Feldstein's failed onboarding — we could see the error happened but couldn't determine the root cause because the raw error was stripped.

🤖 Generated with Claude Code

Previously only the sanitized user-friendly message was sent to PostHog,
making it impossible to diagnose failures remotely. Now sends the raw
BridgeError alongside so we can see the actual cause (API key errors,
connection issues, etc.) in analytics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kodjima33 kodjima33 merged commit c550d01 into main Mar 24, 2026
1 check passed
@kodjima33 kodjima33 deleted the worktree-fix-onboarding-restart-recovery branch March 24, 2026 05:10
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 24, 2026

Greptile Summary

This PR improves observability for chat agent failures by forwarding the raw BridgeError string to PostHog and Mixpanel analytics alongside the existing sanitized user-facing message. The AnalyticsManager change is clean and backward-compatible (optional parameter, 500-char cap, deduplication guard).

Key concern:

  • The rawError for BridgeError.agentError is produced via String(describing:), which emits the full associated value (e.g. agentError("Error: Invalid key …")). BridgeError.errorDescription was explicitly written to redact auth/key-related content before any exposure — that sanitization is bypassed here. If the upstream AI provider embeds a credential value in an error message, it would flow unsanitized to two third-party analytics services.

Confidence Score: 3/5

  • Not safe to merge until the raw error is sanitized before being sent to third-party analytics.
  • The change has a clear, legitimate purpose and the AnalyticsManager side is well implemented. However, ChatProvider.swift forwards potentially sensitive content (auth/key error details) to PostHog and Mixpanel by bypassing the sanitization that BridgeError.errorDescription was explicitly built to enforce. Because this touches a security boundary — credential data possibly reaching external services — a targeted fix is needed before merging.
  • desktop/Desktop/Sources/Providers/ChatProvider.swift — the rawError construction for BridgeError.agentError needs the same keyword-based redaction that already exists in BridgeError.errorDescription.

Important Files Changed

Filename Overview
desktop/Desktop/Sources/AnalyticsManager.swift Adds optional rawError parameter to chatAgentError, truncates to 500 chars, and only includes it when it differs from the user-friendly error. Logic is clean and backward-compatible.
desktop/Desktop/Sources/Providers/ChatProvider.swift Uses String(describing: bridgeError) for BridgeError.agentError, which exposes the raw associated-value string (potentially containing API keys or auth tokens) to PostHog and Mixpanel, bypassing the sanitization built into BridgeError.errorDescription.

Sequence Diagram

sequenceDiagram
    participant Bridge as ACPBridge (Node)
    participant CP as ChatProvider
    participant AM as AnalyticsManager
    participant PH as PostHog
    participant MX as Mixpanel

    Bridge->>CP: throw BridgeError.agentError(rawMsg)
    CP->>CP: error.localizedDescription → sanitized user string
    CP->>CP: String(describing: bridgeError) → raw enum string
    Note over CP: ⚠️ rawMsg may contain sensitive data
    CP->>AM: chatAgentError(error: sanitized, rawError: raw)
    AM->>PH: track("chat_agent_error", {error, raw_error})
    AM->>MX: track("Chat Agent Error", {error, raw_error})
    Note over PH,MX: raw_error bypasses BridgeError.errorDescription sanitization
Loading

Reviews (1): Last reviewed commit: "fix(desktop): send raw error to analytic..." | Re-trigger Greptile

Comment on lines +2196 to +2200
if let bridgeError = error as? BridgeError {
rawError = String(describing: bridgeError)
} else {
rawError = "\(error)"
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Raw error bypasses security sanitization before reaching analytics

String(describing: bridgeError) for BridgeError.agentError produces the raw enum representation including its full associated string value. This gets forwarded to PostHog and Mixpanel, but the BridgeError.errorDescription implementation was specifically written to redact auth/key-related content (checking for "api key", "api_key", "leaked", "unauthorized", "forbidden", etc.) before ever exposing the message. That sanitization is now bypassed for analytics.

If the node bridge surfaces an error from the upstream AI provider that embeds a credential value in its message text, the raw string would be sent to third-party analytics in full. The 500-char truncation is not a reliable mitigation because sensitive content typically appears at the beginning of such messages.

A safer approach is to apply the same redaction before sending to analytics:

case .agentError(let msg):
    let lower = msg.lowercased()
    let sensitiveKeywords = ["api key", "api_key", "leaked", "unauthorized",
                             "permission denied", "invalid key", "forbidden"]
    let isAuthError = sensitiveKeywords.contains(where: lower.contains)
    rawError = isAuthError
        ? "agentError([redacted auth/key error])"
        : "agentError(\(String(msg.prefix(200))))"

Glucksberg pushed a commit to Glucksberg/omi-local that referenced this pull request Apr 28, 2026
## Summary
Previously `chat_agent_error` PostHog events only contained the
sanitized user-friendly message (e.g., "AI service is temporarily
unavailable"), making it impossible to diagnose failures remotely. Now
sends the raw `BridgeError` alongside (e.g., `agentError("Internal
error: Invalid API key")`) so we can see the actual cause in analytics.

This was discovered investigating Miles Feldstein's failed onboarding —
we could see the error happened but couldn't determine the root cause
because the raw error was stripped.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant