feat(insights): investigation pipeline overhaul#459
…ation
- Extract getOAuthToken + createCachedTokenFn into shared oauth-token.ts, replacing identical copy-pasted token-fetch logic in search-console.ts, github-tools.ts, and generation.ts
- Extract getUserProviderToken in RPC integrations router, deduplicating the token lookup in checkSearchConsoleAccess and listGitHubRepos
- Extract useOAuthConnect hook in dashboard integrations UI, deduplicating the linkSocial mutation between GSC and GitHub rows
- Extract querySearchAnalytics as a standalone testable function
- Move ALWAYS_ON_TOOLS set to module scope in generation.ts
- Add 8 unit tests for search console query mapping, error handling, and request shape
Start the Slack stream eagerly with a task_update chunk (status: in_progress) before the agent produces output, so users see the animated thinking card instead of just a reaction emoji. Resolves thinking to "complete" on first text flush and "error" on failures. Also desloppified respond.ts: removed dead lazy-start logic (tryStartStream, shouldStream, streamStartAttempted), extracted flushAndStop/recoverFromError/logStreamError helpers, added early bail-out for pre-aborted signals, and added a streaming-unavailable fallback test.
Replace BAD_EVENTS_COLUMN_REPLACEMENTS blocklist with proper AGENT_TABLE_COLUMNS schema map. Validates alias.column patterns against the actual table schema — catches cross-table misuse like es.browser_name on error_spans before it hits ClickHouse.
…eadability
- Extract prompt builders and data fetchers into prompts.ts (266 lines)
- generation.ts: 1076 → 804 lines
- detection.ts: replace if/else filter chain with METRIC_FILTERS lookup
- detection.ts: deduplicate 5 signals.push() blocks with makeWowSignal()
- detection.ts: 584 → 510 lines
- validate.ts: replace inline array with SENTIMENT_DIVERGENCE_TYPES Set
- Share OrgWebsiteRow interface between generation and prompts
- Extract isEnabled() for tool filter readability
…grade
- createInvestigationTools() shared across insights pipeline and chat agent
- Chat agent now has scrape, GSC, and GitHub tools (was scrape only)
- AgentContext gains organizationId for OAuth token lookups
- Add actions schema to insights (fix_goal, add_custom_event, etc.)
- Add actions column to analytics_insights table
- Chat agent prompt updated with investigation tool guidance
Add InsightAction type and render action buttons on insight cards. Actions include fix_goal, add_custom_event, create_funnel, etc. Currently shows a toast on click — wire to actual mutations next.
- createToolkit() assembles tools from capabilities (analytics, investigation, mutations, memory, dashboard)
- Chat agent simplified from 12 imports to one createToolkit() call
- Insights pipeline gains mutation tools (create_annotation, fix_goal, create_funnel) for executing actions during investigation
- Fix all lint errors (block statements, formatting)
…actions
- Add session query types to web_metrics description (interesting_sessions, session_list, session_flow, session_pages)
- Investigation prompt now guides agent to use session-level queries for user behavior analysis (step 6 in strategy)
- Agent told it can execute mutations directly (create_annotation with confirmed=true, update_goal for target mismatches)
- Verified: agent generates specific add_custom_event actions with exact event names, elements, and pages
- Add code_fix action type with prompt, file_hint, error_message params
- Dashboard copies code_fix prompt to clipboard on click
- investigate_further also copies prompt to clipboard
- Agent generates cursor-ready prompts with exact files, changes, and error context from GitHub search results
Deduplicate period-pair query boilerplate in enrichSegments and enrichErrors. Both functions called queryFn 4 times with nearly identical params differing only in date range. The helper creates a curried function that handles the current/previous split.
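The period-pair helper described in this commit could look roughly like the sketch below. It is an illustration only, not the code from the PR: `makePeriodPairQuery`, `QueryFn`, `QueryParams`, and `DateRange` are hypothetical names, and the real `queryFn` signature may differ.

```typescript
// Hypothetical shapes; the PR's actual queryFn signature is not shown in this thread.
interface DateRange {
  from: string;
  to: string;
}

interface QueryParams extends DateRange {
  type: string;
  limit?: number;
}

type Row = Record<string, unknown>;
type QueryFn = (params: QueryParams) => Promise<Row[]>;

interface PeriodPair {
  current: Row[];
  previous: Row[];
}

// Binds the two date ranges once and returns a function that runs the same
// query for both periods in parallel, so call sites only pass what varies.
function makePeriodPairQuery(
  queryFn: QueryFn,
  current: DateRange,
  previous: DateRange
): (type: string, limit?: number) => Promise<PeriodPair> {
  return async (type, limit) => {
    const [cur, prev] = await Promise.all([
      queryFn({ type, limit, ...current }),
      queryFn({ type, limit, ...previous }),
    ]);
    return { current: cur, previous: prev };
  };
}
```

With a helper of this shape, a function such as enrichSegments would call the returned function once per query type instead of issuing two near-identical `queryFn` calls for each.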
Greptile Summary
This PR overhauled the investigation pipeline end-to-end: detection now covers errors, revenue, web vitals, and custom events via WoW signals on top of the existing z-score path; enrichment adds a vitals context block and a …
Confidence Score: 3/5
Not safe to merge as-is: insight generation writes will break in production without a DB migration for the new … Two independent issues affect the production path: the …
Important Files Changed
Sequence Diagram
sequenceDiagram
participant W as Insights Worker
participant D as detection.ts
participant E as enrichment.ts
participant G as generation.ts
participant TK as toolkit.ts
participant AI as LLM (smart routing)
participant DB as PostgreSQL
participant CH as ClickHouse
W->>D: detectSignals(websiteId)
D->>CH: z-score (events_by_date)
D->>CH: WoW x10 parallel (summary, errors, revenue, vitals, custom_events)
D-->>W: DetectedSignal[]
W->>E: enrichSignals(signals)
E->>CH: segments + errors + vitals + annotations (parallel per signal)
E->>DB: GitHub commits/PRs (via OAuth token)
E-->>W: EnrichedSignal[]
W->>G: analyzeWebsite(enrichedSignals)
G->>DB: annotations + recentInsights + dismissedPatterns + siteContext
G->>TK: createToolkit(investigation + mutations)
TK-->>G: "scrape_page + search_console + github_* + annotation/goal tools"
G->>AI: ToolLoopAgent (model by tier + critical-signal flag)
AI->>CH: web_metrics / execute_sql
AI->>TK: "search_console / scrape_page / github_*"
AI-->>G: emit_insight (rootCause, evidence, actions)
G->>DB: INSERT/UPDATE analyticsInsights (with actions column)
|
investigationDepth: text("investigation_depth").$type<
  "surface" | "investigated" | "deep"
>(),
New actions column has no accompanying migration
analyticsInsights gains a jsonb actions column. Without a Drizzle migration applied before deploying, every INSERT/UPDATE that references this column (which insightRow() in generation.ts now always does via actions: insight.actions ?? null) will throw a PostgreSQL "column does not exist" error and break insight generation entirely. Ensure the migration is generated and included in the deployment runbook.
    return;
  }
  if (action.type === "investigate_further" && action.params.prompt) {
    navigator.clipboard.writeText(action.params.prompt);
    toast.success("Copied investigation prompt");
    return;
  }
  toast.info(`${action.label}`);
};

return (
  <button
    className="inline-flex items-center gap-1 rounded-md border border-border/60 bg-background px-2 py-1 text-foreground/80 text-xs transition-colors hover:bg-accent hover:text-foreground"
    onClick={handleClick}
    type="button"
  >
    {ACTION_ICONS[action.type]}
    {action.label}
  </button>
);
Unimplemented action types silently no-op
fix_goal, create_funnel, add_custom_event, create_annotation, update_config, and add_tracking all fall through to toast.info(action.label) with no actual effect. Users who click these pills will see a toast that implies action but nothing will happen. Either wire them to real handlers or guard against rendering them until they're implemented.
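One way to follow the second suggestion (do not render pills that have no handler yet) is an explicit allow-list. The sketch below is an assumption-laden illustration: `IMPLEMENTED_ACTIONS` and `renderableActions` are hypothetical names, and the set of implemented types is inferred from this thread (only `code_fix` and `investigate_further` are described as doing something on click).

```typescript
// Action types named in this review thread.
type ActionType =
  | "code_fix"
  | "investigate_further"
  | "fix_goal"
  | "create_funnel"
  | "add_custom_event"
  | "create_annotation"
  | "update_config"
  | "add_tracking";

interface InsightAction {
  type: ActionType;
  label: string;
  params: Record<string, unknown>;
}

// Assumption: only these two types currently have a real click handler.
const IMPLEMENTED_ACTIONS: ReadonlySet<ActionType> = new Set<ActionType>([
  "code_fix",
  "investigate_further",
]);

// Drops actions that would otherwise fall through to a no-op toast.
function renderableActions(actions: InsightAction[]): InsightAction[] {
  return actions.filter((action) => IMPLEMENTED_ACTIONS.has(action.type));
}
```

As handlers are wired to real mutations, their types would be added to the set, which keeps the UI from promising behavior that does not exist yet.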
const targetCount = Math.max(
  1,
  Math.min(10, config.maxInsightsPerWebsite || 2)
);
config.maxInsightsPerWebsite || 2 treats 0 as falsy and silently replaces it with 2. Use ?? so only null/undefined trigger the fallback.
- const targetCount = Math.max(
-   1,
-   Math.min(10, config.maxInsightsPerWebsite || 2)
- );
+ const targetCount = Math.max(
+   1,
+   Math.min(10, config.maxInsightsPerWebsite ?? 2)
+ );
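To make the difference concrete, here is a small standalone comparison of the two expressions. The function names are hypothetical and exist only for this illustration; the clamping logic is copied from the snippet above.

```typescript
// Original behavior: any falsy value (0, NaN, null, undefined) becomes 2.
function targetCountWithOr(max: number | null | undefined): number {
  return Math.max(1, Math.min(10, max || 2));
}

// Suggested behavior: only null and undefined fall back to 2.
function targetCountWithNullish(max: number | null | undefined): number {
  return Math.max(1, Math.min(10, max ?? 2));
}

console.log(targetCountWithOr(0)); // 2: the explicit 0 is replaced by the default
console.log(targetCountWithNullish(0)); // 1: 0 is kept, then clamped up by Math.max(1, ...)
console.log(targetCountWithNullish(undefined)); // 2: fallback still applies
console.log(targetCountWithNullish(50)); // 10: upper clamp is unchanged
```

Two details are worth keeping in mind. Because of the outer `Math.max(1, ...)`, an explicit 0 still yields at least one insight; it just no longer jumps to 2. And `??` lets `NaN` through where `||` replaced it, so a config value that could be `NaN` would need separate validation.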
}

export function createCachedTokenFn(
  providerId: string,
  organizationId: string,
  preferUserId?: string
): () => Promise<string | null> {
  let cached: string | undefined;
  return async () => {
    if (cached !== undefined) {
      return cached;
    }
    const token = await getOAuthToken(providerId, organizationId, preferUserId);
    if (token) {
      cached = token;
    }
    return token;
  };
}
createCachedTokenFn caches tokens without a TTL
The closure caches the first non-null token indefinitely. OAuth access tokens (Google's in particular) expire in ~1 hour. If the same tool-instance is ever reused across agent runs that span the token's lifetime, all subsequent Search Console calls will receive 401 errors that surface as unhelpful tool-level failures. Consider storing a fetch-time timestamp and re-fetching after a safe TTL (e.g. 50 minutes).
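A TTL-based variant could be sketched as follows. This is not the implementation that landed in the PR: `createTtlCachedTokenFn`, the injected `fetchToken` and `now` parameters, and the `CacheEntry` shape are hypothetical. The default durations mirror the 45-minute positive and 5-minute negative TTLs mentioned in the follow-up commits, and a fixed TTL is itself an assumption (a provider-supplied expiry would be more precise).

```typescript
type TokenFetcher = () => Promise<string | null>;

interface CacheEntry {
  token: string | null;
  expiresAt: number; // epoch milliseconds
}

function createTtlCachedTokenFn(
  fetchToken: TokenFetcher,
  ttlMs: number = 45 * 60 * 1000,
  negativeTtlMs: number = 5 * 60 * 1000,
  now: () => number = Date.now
): TokenFetcher {
  let entry: CacheEntry | undefined;
  return async () => {
    const currentTime = now();
    // Serve from cache while the entry is still fresh (token or "no account").
    if (entry !== undefined && currentTime < entry.expiresAt) {
      return entry.token;
    }
    const token = await fetchToken();
    // Null results are cached too, but for a shorter time, so a newly
    // connected account is picked up without hammering the database.
    entry = {
      token,
      expiresAt: currentTime + (token ? ttlMs : negativeTtlMs),
    };
    return token;
  };
}
```

Injecting `now` keeps the expiry logic testable with a fake clock. Note that this sketch does not deduplicate concurrent calls: two callers arriving on a cold cache would each trigger a fetch.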
13 issues found across 43 files
Confidence score: 2/5
- There is high merge risk due to a concrete auth-scoping bug in packages/ai/src/ai/tools/utils/oauth-token.ts: token lookup is not organization-scoped, which can return another org’s user token and is user-impacting/security-sensitive.
- apps/insights/src/prompts.ts introduces prompt-safety risk because user-provided annotation text is embedded without sanitization, and dismissed-pattern limiting lacks explicit ordering, so prompt behavior can be manipulated or inconsistent.
- Several medium-severity runtime/reliability issues increase regression risk, including expired-token reuse in packages/ai/src/ai/tools/utils/oauth-token.ts, over-limit truncation in packages/ai/src/ai/insights/validate.ts, and schema mismatch in packages/ai/src/ai/schemas/smart-insights-output.ts that can reject valid non-string action params.
- Pay close attention to packages/ai/src/ai/tools/utils/oauth-token.ts, apps/insights/src/prompts.ts, packages/ai/src/ai/schemas/smart-insights-output.ts, and packages/ai/src/ai/tools/toolkit.ts - these contain the highest-impact auth, prompt-safety, validation, and capability-gating risks.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/insights/src/prompts.ts">
<violation number="1" location="apps/insights/src/prompts.ts:49">
P1: Sanitize annotation text before embedding it in prompts to reduce prompt-injection/control-content risk from user-provided annotations.</violation>
</file>
<file name="packages/ai/src/ai/tools/github-tools.ts">
<violation number="1" location="packages/ai/src/ai/tools/github-tools.ts:34">
P2: Using `createCachedTokenFn` here drops negative-result caching, causing repeated DB token lookups when no GitHub account is connected.</violation>
</file>
<file name="packages/ai/src/ai/tools/search-console.ts">
<violation number="1" location="packages/ai/src/ai/tools/search-console.ts:11">
P2: Enforce `YYYY-MM-DD` date format in the input schema instead of relying on description text only.</violation>
</file>
<file name="packages/ai/src/ai/tools/toolkit.ts">
<violation number="1" location="packages/ai/src/ai/tools/toolkit.ts:58">
P2: Fail fast when `investigation` capability is requested without required `domain`/`organizationId` instead of silently omitting those tools.</violation>
</file>
<file name="packages/ai/src/ai/schemas/smart-insights-output.ts">
<violation number="1" location="packages/ai/src/ai/schemas/smart-insights-output.ts:148">
P2: `actions.params` is typed as `Record<string, string>`, but documented and existing action payloads need non-string values (e.g. funnel steps arrays, booleans). This makes valid actions fail schema validation.</violation>
</file>
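For the Search Console date-format violation listed above, the check itself can be sketched as a plain predicate. This is a hedged illustration rather than the project's schema code: `isValidIsoDate` is a hypothetical name, and how it would be attached to the tool's input schema is left open.

```typescript
const ISO_DATE_PATTERN = /^\d{4}-\d{2}-\d{2}$/;

// True only for strings shaped like YYYY-MM-DD that also name a real calendar day.
function isValidIsoDate(value: string): boolean {
  if (!ISO_DATE_PATTERN.test(value)) {
    return false;
  }
  const parsed = new Date(`${value}T00:00:00Z`);
  if (Number.isNaN(parsed.getTime())) {
    return false;
  }
  // Round-trip comparison rejects dates an engine might roll over
  // (for example a nonexistent day in February).
  return parsed.toISOString().slice(0, 10) === value;
}
```

A predicate like this could back a schema-level refinement so that malformed dates are rejected before a request is sent, instead of relying on the field description alone.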
Architecture diagram
sequenceDiagram
participant UI as Dashboard UI
participant API as API Route
participant Agent as Agent Config
participant Toolkit as createToolkit()
participant Detection as detection.ts
participant Enrichment as enrichment.ts
participant Generation as generation.ts
participant LLM as LLM (GPT/Sonnet/Opus)
participant DB as Database
participant Ext as External Services
participant Slack as Slack Client
Note over UI,Slack: Investigation Pipeline Overhaul - Runtime Flow
UI->>API: Fetch insights for website
API->>Generation: generateInsights(orgId, websiteId, config)
Generation->>DB: fetchDismissedPatterns()
DB-->>Generation: patterns marked "not helpful"
Generation->>DB: fetchRecentAnnotations()
DB-->>Generation: annotation timeline
Generation->>DB: fetchRecentInsightsForPrompt()
DB-->>Generation: recent insight titles
Note over Detection,Generation: Detection Phase
Generation->>Detection: detectSignals(params, queryFn)
Detection->>Detection: z-score analysis on daily data
Detection->>Detection: WoW analysis on summary_metrics
Detection->>Detection: WoW error_summary, revenue_overview
Detection->>Detection: WoW vitals_overview p75 values
Detection->>Detection: WoW custom_events_discovery
Detection->>Detection: Filter by MIN_THRESHOLD constants
Detection-->>Generation: DetectedSignal[]
Note over Enrichment,Generation: Enrichment Phase
Generation->>Enrichment: enrichSignals(signals, queryFn, annotationQueryFn)
Enrichment->>Enrichment: computeWindow() - full lookback for WoW
Enrichment->>Enrichment: enrichSegments() - top movers by dimension
Enrichment->>Enrichment: enrichErrors() - top new error types
Enrichment->>Enrichment: enrichVitals() - p75 comparisons
Enrichment->>Enrichment: enrichAnnotations() - timeline events
Enrichment->>Ext: enrichGitHub() - commits/PRs matching keywords
Enrichment-->>Generation: EnrichedSignal[]
Note over Generation,Slack: Investigation & Generation Phase
Generation->>LLM: Build investigation prompt with signals, org context
Generation->>Toolkit: createToolkit({capabilities: [...]})
Toolkit->>Toolkit: Compose tools by capability flags
Toolkit->>Toolkit: Add analytics tools (get_data, execute_sql)
Toolkit->>Toolkit: Add investigation tools (scrape_page, search_console, github)
Toolkit->>Toolkit: Add mutation tools (create_annotation, create_funnel)
Toolkit->>Toolkit: Add memory/dashboard tools
LLM->>Agent: Execute toolkit tools during generation
Agent->>Agent: Model routing: quick/balanced/deep based on config + critical signals
alt hasCriticalSignals AND tier is balanced
Agent->>Agent: Upgrade to Opus (deep model)
end
Agent->>Ext: scrape_page() via Firecrawl API
Agent->>Ext: search_console() via Google Search Console API
Agent->>Ext: github_commits/PRs via GitHub API
Ext-->>Agent: Results
Agent->>Agent: Call emit_insight with rootCause, evidence, investigationDepth
Agent->>Agent: Call create_annotation to mark deploys
Agent-->>Generation: ParsedInsight[] with actions
Note over Generation,UI: Validation & Storage
Generation->>Generation: validateInsight() - truncates description/suggestion >300 chars
Generation->>DB: Store insight with actions, evidence, rootCause
DB-->>Generation: insight ID
Generation->>UI: Return insight data
UI->>UI: toInsightCardViewModel() - separate metrics vs investigationEvidence
UI->>UI: Render action pills (code_fix, investigate_further, etc.)
alt code_fix action clicked
UI->>UI: navigator.clipboard.writeText(prompt)
UI->>UI: toast("Copied for Cursor/Claude Code")
end
Note over Slack,Agent: Slack Thinking Experience
Slack->>Agent: streamAgentToSlack(run)
Agent->>Slack: startThinkingStream() - immediate "thinking" card
Slack->>Slack: Show task_update "in_progress" with plan mode
Agent->>Agent: Stream agent response chunks
alt Thinking resolved
Agent->>Slack: appendStream() with chunk + task_update "complete"
else Error encountered
Agent->>Slack: appendStream() with task_update "error" + error text
end
Agent->>Slack: stopStream() with final markdown_text
alt Streaming unavailable
Agent->>Slack: say() fallback with agent error message
end
Note over UI,DB: Settings Integration
UI->>API: checkSearchConsoleAccess()
API->>Ext: GET /webmasters/v3/sites
Ext-->>API: 200 OK
API-->>UI: { hasAccess: true }
UI->>API: linkSocial("google", GSC_SCOPES)
API->>API: OAuth flow with Google Search Console scope
API-->>UI: Connection established
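The model-routing branch in the diagram above (upgrade to the deep model when critical signals are present and the configured tier is balanced) can be expressed as a small pure function. This is a sketch based only on the diagram: `Tier` and `resolveTier` are hypothetical names, and leaving the quick tier untouched is an assumption drawn from the diagram's condition.

```typescript
type Tier = "quick" | "balanced" | "deep";

// Only the balanced tier is upgraded; quick and deep are returned unchanged.
function resolveTier(configured: Tier, hasCriticalSignals: boolean): Tier {
  if (configured === "balanced" && hasCriticalSignals) {
    return "deep";
  }
  return configured;
}
```

Keeping the decision in a pure function makes the upgrade rule easy to unit test separately from the agent setup.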
Shadow auto-approve: would not auto-approve because issues were found.
const lines = rows.map((row) => {
  const date = dayjs(row.xValue).format("YYYY-MM-DD");
  const tags = row.tags?.length ? ` [${row.tags.join(", ")}]` : "";
  return `- ${date}: ${row.text}${tags}`;
P1: Sanitize annotation text before embedding it in prompts to reduce prompt-injection/control-content risk from user-provided annotations.
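A minimal sketch of such sanitization is shown below. It is one possible approach, not the fix applied in the PR: `sanitizeAnnotationText`, `formatAnnotationLine`, and the length limits are hypothetical. It also only reduces the risk (no multi-line payloads, bounded length, text quoted as data); it does not make prompt injection impossible.

```typescript
const MAX_ANNOTATION_CHARS = 200; // hypothetical limit

// Flattens user text to a single bounded line.
function sanitizeAnnotationText(
  text: string,
  maxChars: number = MAX_ANNOTATION_CHARS
): string {
  const cleaned = text
    .replace(/[\u0000-\u001f\u007f]+/g, " ") // control characters, including newlines and tabs
    .replace(/\s+/g, " ")
    .trim();
  return cleaned.length > maxChars
    ? `${cleaned.slice(0, maxChars)}…`
    : cleaned;
}

// Builds one prompt line per annotation, mirroring the format in the snippet above.
function formatAnnotationLine(
  date: string,
  text: string,
  tags: string[] = []
): string {
  const safeTags = tags
    .map((tag) => sanitizeAnnotationText(tag, 40))
    .filter((tag) => tag.length > 0);
  const tagSuffix = safeTags.length > 0 ? ` [${safeTags.join(", ")}]` : "";
  // JSON.stringify wraps the text in quotes so it reads as data, not as instructions.
  return `- ${date}: ${JSON.stringify(sanitizeAnnotationText(text))}${tagSuffix}`;
}
```

For example, an annotation containing blank lines followed by instruction-like text is collapsed onto the same quoted line as the rest of the annotation, so it cannot start a new section of the prompt.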
  Object.assign(tools, ANALYTICS_TOOLS);
}

if (caps.has("investigation") && params.domain && params.organizationId) {
P2: Fail fast when investigation capability is requested without required domain/organizationId instead of silently omitting those tools.
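A fail-fast check along these lines could run before any tools are assembled. The sketch assumes a simplified parameter shape: `ToolkitParams`, `Capability`, and `validateToolkitParams` are hypothetical names, and the real `createToolkit()` options may differ.

```typescript
// Capability names as listed in the createToolkit() commit message.
type Capability =
  | "analytics"
  | "investigation"
  | "mutations"
  | "memory"
  | "dashboard";

// Hypothetical, simplified parameter shape.
interface ToolkitParams {
  capabilities: Capability[];
  domain?: string;
  organizationId?: string;
}

// Throws instead of silently omitting the investigation tools.
function validateToolkitParams(params: ToolkitParams): void {
  if (!params.capabilities.includes("investigation")) {
    return;
  }
  const missing: string[] = [];
  if (!params.domain) {
    missing.push("domain");
  }
  if (!params.organizationId) {
    missing.push("organizationId");
  }
  if (missing.length > 0) {
    throw new Error(
      `createToolkit: "investigation" capability requires ${missing.join(" and ")}`
    );
  }
}
```

Throwing here turns a misconfigured caller into an immediate, named error instead of an agent that quietly runs without scrape, Search Console, or GitHub tools.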
  ]),
label: z.string().describe("Button label (e.g. 'Fix goal target')"),
params: z
  .record(z.string(), z.string())
P2: actions.params is typed as Record<string, string>, but documented and existing action payloads need non-string values (e.g. funnel steps arrays, booleans). This makes valid actions fail schema validation.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/ai/src/ai/schemas/smart-insights-output.ts, line 148:
<comment>`actions.params` is typed as `Record<string, string>`, but documented and existing action payloads need non-string values (e.g. funnel steps arrays, booleans). This makes valid actions fail schema validation.</comment>
<file context>
@@ -121,21 +123,41 @@ export const insightSchema = z.object({
+ ]),
+ label: z.string().describe("Button label (e.g. 'Fix goal target')"),
+ params: z
+ .record(z.string(), z.string())
+ .describe(
+ "Action-specific parameters. code_fix: {prompt, file_hint, error_message} — generates a cursor/claude-code-ready prompt."
</file context>
- .record(z.string(), z.string())
+ .record(z.string(), z.unknown())
- OAuth token: scope preferUserId lookup to org membership (P1)
- OAuth token: add 45-min TTL to cached tokens (P2)
- Error detection: catch 0→N error spikes (P2)
- Prompts: use ?? instead of || for maxInsights (P2)
- Prompts: order dismissed patterns by recency (P2)
- Validate: fix truncation overflow (P2)
- Dashboard: handle clipboard write failures (P2)
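The "0→N error spikes" item in this commit concerns a common gap in week-over-week detection: a percentage change is undefined when the previous period is zero, so a jump from no errors to many can be skipped. A minimal sketch of handling that case follows. `evaluateWow`, `WowResult`, and both thresholds are hypothetical and are not the PR's `MIN_THRESHOLD` constants.

```typescript
interface WowResult {
  changePercent: number | null; // null when there is no baseline to divide by
  isSignal: boolean;
}

function evaluateWow(
  current: number,
  previous: number,
  minChangePercent: number = 50, // hypothetical relative threshold
  minNewCount: number = 10 // hypothetical absolute threshold for a zero baseline
): WowResult {
  if (previous === 0) {
    // No baseline: fall back to an absolute count instead of a ratio.
    return { changePercent: null, isSignal: current >= minNewCount };
  }
  const changePercent = ((current - previous) / previous) * 100;
  return {
    changePercent,
    isSignal: Math.abs(changePercent) >= minChangePercent,
  };
}
```

The absolute threshold keeps a move from 0 to 1 or 2 errors from being flagged while still surfacing a real spike from nothing.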
34446f4 to 031291c
- Vitals enrichment: check previous-period sample size too (P2)
- OAuth cache: negative-result caching with 5-min TTL to avoid repeated DB lookups when no account connected (P2)
0 issues found across 3 files (changes from recent commits).
Shadow auto-approve: would not auto-approve. Auto-approval blocked by 5 unresolved issues from previous reviews.
Summary
- createToolkit() across chat agent, insights pipeline, and MCP
- code_fix for cursor/claude-code, investigate_further, add_custom_event, create_annotation
Test plan
- bun test in apps/insights (61 tests)
- bun test in packages/db (43 SQL validation tests)
Summary by cubic
Overhauled the insights investigation pipeline with stronger detection, richer enrichment, and shared investigation tools. Adds GSC/GitHub/scrape support, actionable insight pills, org-scoped OAuth, and an instant Slack “thinking” card.
New Features
- createToolkit() with scrape, search_console, and GitHub tools for both chat and insights; includes mutation tools (annotations/goals/funnels).
- code_fix, investigate_further, add_custom_event, create_annotation; dashboard shows action pills, copies prompts, and includes root cause + investigation evidence.
Refactors
- createToolkit(); updated DB/RPC schemas (insight actions, evidence, rootCause, investigationDepth) and tightened SQL validator with per-table alias/column checks.
- getOAuthToken/createCachedTokenFn with org-scoped lookup, 45‑min cache, and 5‑min negative-result cache when no account is connected; deduped GitHub/GSC/RPC; agent context now includes organizationId.
- ai to ^6.0.188; added Search Console tests and refreshed detection/enrichment tests, including a fix to check previous-period sample size for vitals.
Written for commit 6ab5526. Summary will update on new commits.
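The per-table alias/column check mentioned in the refactor notes (and in the AGENT_TABLE_COLUMNS commit earlier in this thread) can be illustrated with a deliberately simple sketch. Everything here is an assumption made for illustration: `TABLE_COLUMNS` and its column lists are invented, `findInvalidColumnRefs` is a hypothetical name, and the regex-based parsing is far cruder than a real SQL validator (it ignores subqueries, quoted identifiers, and string literals).

```typescript
// Hypothetical schema map; the real AGENT_TABLE_COLUMNS contents are not shown in this thread.
const TABLE_COLUMNS = new Map<string, ReadonlySet<string>>([
  ["events", new Set(["browser_name", "path", "time"])],
  ["error_spans", new Set(["message", "filename", "timestamp"])],
]);

// Words that can follow a table name without being an alias.
const NON_ALIAS_KEYWORDS = new Set([
  "where", "on", "join", "inner", "left", "right",
  "group", "order", "limit", "using",
]);

// Returns alias.column references whose column is not in the aliased table's schema.
function findInvalidColumnRefs(sql: string): string[] {
  const aliasToTable = new Map<string, string>();
  const aliasRe = /\b(?:from|join)\s+(?:\w+\.)?(\w+)\s+(?:as\s+)?(\w+)/gi;
  let match: RegExpExecArray | null;
  while ((match = aliasRe.exec(sql)) !== null) {
    const table = match[1].toLowerCase();
    const alias = match[2].toLowerCase();
    if (NON_ALIAS_KEYWORDS.has(alias)) {
      // The captured word was a keyword (e.g. JOIN); rewind so it can start the next match.
      aliasRe.lastIndex -= alias.length;
      continue;
    }
    if (TABLE_COLUMNS.has(table)) {
      aliasToTable.set(alias, table);
    }
  }

  const invalid: string[] = [];
  const refRe = /\b(\w+)\.(\w+)\b/g;
  while ((match = refRe.exec(sql)) !== null) {
    const table = aliasToTable.get(match[1].toLowerCase());
    const columns = table === undefined ? undefined : TABLE_COLUMNS.get(table);
    if (columns !== undefined && !columns.has(match[2].toLowerCase())) {
      invalid.push(`${match[1]}.${match[2]}`);
    }
  }
  return invalid;
}
```

With this invented schema, a query that selects `es.browser_name` while `es` aliases `error_spans` is reported before it reaches ClickHouse, which is the cross-table misuse the commit message describes.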