chore: Tweaks to Haiku prompt to improve repo selection.#60092
Conversation
d0fb56e to
47c14f7
Compare
There was a problem hiding this comment.
Pull request overview
This PR tunes the Slack repo-selection classifier so Haiku is less likely to filter out tasks that should proceed to repository discovery, especially SDK/app crashes, team-owned site performance issues, and tracking-related wrong-data reports.
Changes:
- Removes
"trace"from the no-repo heuristic terms to avoid matching “stack trace.” - Expands the Haiku prompt with own-code vs PostHog SaaS guidance and a wrong-data tracking exception.
- Updates the local Slack repo-selection eval with new/updated boundary cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
products/slack_app/backend/api.py |
Adjusts heuristic terms and classifier prompt logic. |
posthog/temporal/ai/eval_slack_repo_selection.py |
Updates expected eval outcomes and adds regression cases for routing boundaries. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "the team's code → no_repo. Important exception: 'wrong data', 'missing events', or " | ||
| "'numbers look off' in PostHog usually means the team's tracking code is broken (wrong " | ||
| "event names, identification logic, SDK setup) — that's a code fix in their repo → " | ||
| "needs_repo. When in doubt, lean needs_repo=true — the discovery agent can still report " |
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
posthog/temporal/ai/eval_slack_repo_selection.py:169-192
These two cases are now `agent/found` outcomes but still sit under the `# --- Haiku gate short-circuits (heuristic + LLM) ---` section comment. Now that they represent the fixed (pass-through-to-agent) behaviour they would be easier to find alongside the other agent-path cases below the `# --- Agent path ---` marker.
```suggestion
Case(
name="posthog_product_hang",
```
Reviews (1): Last reviewed commit: "Merge branch 'master' into repo-selectio..." | Re-trigger Greptile |
| Case( | ||
| name="marketing_site", | ||
| description="Haiku LLM filters as ops/perf rather than code change.", | ||
| description="Perf complaint about a site the team likely owns code for — agent should route to docs/marketing repo.", | ||
| text_template="@PostHog the docs site loads really slowly on mobile", | ||
| thread_messages=[ | ||
| {"user": "tester", "text": "@PostHog the docs site loads really slowly on mobile"}, | ||
| {"user": "other", "text": "yeah I noticed the same on /docs/getting-started"}, | ||
| ], | ||
| expected_stage="haiku", | ||
| expected_outcome="no_repo", | ||
| note=( | ||
| "Ideally the agent would route this to a docs/marketing repo if connected. " | ||
| "Haiku LLM treats it as a perf/CDN question instead. Out of this PR's scope; " | ||
| "candidate for a follow-up Haiku-tuning PR backed by this eval." | ||
| ), | ||
| expected_stage="agent", | ||
| expected_outcome="found", | ||
| ), | ||
| Case( | ||
| name="sdk_specific_trace_trip", | ||
| description="Haiku heuristic trips on 'trace' in 'stack trace'.", | ||
| description="App/SDK crash with stack trace — agent should route to the relevant SDK repo (e.g. posthog-ios).", | ||
| text_template="@PostHog the iOS SDK is crashing on app launch after upgrade to 3.19", | ||
| thread_messages=[ | ||
| {"user": "tester", "text": "@PostHog the iOS SDK is crashing on app launch after upgrade to 3.19"}, | ||
| {"user": "other", "text": "stack trace shows PostHogReplay.start() failing"}, | ||
| ], | ||
| expected_stage="agent", | ||
| expected_outcome="found", | ||
| ), | ||
| Case( | ||
| name="posthog_product_hang", |
There was a problem hiding this comment.
These two cases are now
agent/found outcomes but still sit under the # --- Haiku gate short-circuits (heuristic + LLM) --- section comment. Now that they represent the fixed (pass-through-to-agent) behaviour they would be easier to find alongside the other agent-path cases below the # --- Agent path --- marker.
| Case( | |
| name="marketing_site", | |
| description="Haiku LLM filters as ops/perf rather than code change.", | |
| description="Perf complaint about a site the team likely owns code for — agent should route to docs/marketing repo.", | |
| text_template="@PostHog the docs site loads really slowly on mobile", | |
| thread_messages=[ | |
| {"user": "tester", "text": "@PostHog the docs site loads really slowly on mobile"}, | |
| {"user": "other", "text": "yeah I noticed the same on /docs/getting-started"}, | |
| ], | |
| expected_stage="haiku", | |
| expected_outcome="no_repo", | |
| note=( | |
| "Ideally the agent would route this to a docs/marketing repo if connected. " | |
| "Haiku LLM treats it as a perf/CDN question instead. Out of this PR's scope; " | |
| "candidate for a follow-up Haiku-tuning PR backed by this eval." | |
| ), | |
| expected_stage="agent", | |
| expected_outcome="found", | |
| ), | |
| Case( | |
| name="sdk_specific_trace_trip", | |
| description="Haiku heuristic trips on 'trace' in 'stack trace'.", | |
| description="App/SDK crash with stack trace — agent should route to the relevant SDK repo (e.g. posthog-ios).", | |
| text_template="@PostHog the iOS SDK is crashing on app launch after upgrade to 3.19", | |
| thread_messages=[ | |
| {"user": "tester", "text": "@PostHog the iOS SDK is crashing on app launch after upgrade to 3.19"}, | |
| {"user": "other", "text": "stack trace shows PostHogReplay.start() failing"}, | |
| ], | |
| expected_stage="agent", | |
| expected_outcome="found", | |
| ), | |
| Case( | |
| name="posthog_product_hang", | |
| Case( | |
| name="posthog_product_hang", |
Prompt To Fix With AI
This is a comment left during a code review.
Path: posthog/temporal/ai/eval_slack_repo_selection.py
Line: 169-192
Comment:
These two cases are now `agent/found` outcomes but still sit under the `# --- Haiku gate short-circuits (heuristic + LLM) ---` section comment. Now that they represent the fixed (pass-through-to-agent) behaviour they would be easier to find alongside the other agent-path cases below the `# --- Agent path ---` marker.
```suggestion
Case(
name="posthog_product_hang",
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
<!-- Authoring rules (agents: read).
Title: <type>(<scope>): <description> — type=feat|fix|chore, scope required, lowercase, no period, <72 chars.
✅ feat(insights): add retention graph export
❌ feat: Added retention export. (capitalized, period, no scope)
Description: high-level rationale, not a step-by-step replay.
Public OSS repo: no internal customers, incidents, or operational metrics.
-->
## Problem
- The Haiku gate was filtering out things it shouldn't — SDK crashes (the heuristic's `"trace"` substring tripped on `"stack trace"`), perf complaints about team-owned sites (the LLM treated them as ops/CDN issues), and "wrong data / missing events" reports (which are almost always tracking bugs in the team's own code, not PostHog).
<!-- Who are we building for, what are their needs, why is this important? -->
<!-- Does this fix an issue? Uncomment the line below with the issue ID to automatically close it when merged -->
<!-- Closes #ISSUE_ID -->
## Changes
- Dropped `"trace"` from the heuristic and tightened the Haiku prompt with principled own-code-vs-PostHog-SaaS guidance
- Plus an explicit "wrong data = tracking bug → needs_repo" exception; added 3 eval cases (`posthog_product_hang`, `sdk_instrumentation_on_own_site`, `wrong_data_tracking_bug`) as boundary regression guards.
<!-- If there are frontend changes, please include screenshots. -->
<!-- If a reference design was involved, include a link to the relevant Figma frame! -->
## How did you test this code?
<!-- Describe steps to reproduce and verify the changes, and what the expected behavior is. -->
<!-- Include automated tests if possible, otherwise describe the manual testing routine. -->
<!-- Agents: do NOT claim manual testing you haven't done. State that you're an agent and list only the automated tests you actually ran. -->
👉 _Stay up-to-date with_ [_PostHog coding conventions_](https://posthog.com/docs/contribute/coding-conventions) _for a smoother review._
## Publish to changelog?
<!-- For features only -->
<!-- If publishing, you must provide changelog details in the #changelog Slack channel. You will receive a follow-up PR comment or notification. -->
<!-- If not, write "no" or "do not publish to changelog" to explicitly opt-out of posting to #changelog. Removing this entire section will not prevent posting. -->
## Docs update
<!-- Add the `skip-inkeep-docs` label if this PR should not trigger an automatic docs update from the Inkeep agent. -->
## 🤖 Agent context
<!-- Fill this section if an agent co-authored or authored this PR. Remove it for fully human-authored PRs. -->
<!-- Include:
- tools/agent used and link to session. List the agent and tool names used, but do not include tool call results.
- decisions made along the way (what was tried, rejected, chosen, and why)
- anything else that helps reviewers
Write reviewer-facing prose. Do not paste user prompts verbatim — paraphrase the intent in your own words.
DO NOT INCLUDE sensitive data that may have been shared in an agent session.
-->
<!-- Rules for agent-authored PRs:
- All PRs must be attributable to a human author, even if agent-assisted.
- Do not add a human Co-authored-by just for the sake of attribution — if no human was involved in the changes, own it as agent-authored.
- Agent-authored PRs always require human review — do not self-merge or auto-approve.
- Do NOT claim manual testing you haven't done.
- GitHub PR descriptions render markdown, not fixed-width text. Do not hard-wrap prose at a column width or use space-aligned tables — use real markdown tables, headings, and fenced code blocks, and let GitHub flow the text.
-->
<!-- Authoring rules (agents: read).
Title: <type>(<scope>): <description> — type=feat|fix|chore, scope required, lowercase, no period, <72 chars.
✅ feat(insights): add retention graph export
❌ feat: Added retention export. (capitalized, period, no scope)
Description: high-level rationale, not a step-by-step replay.
Public OSS repo: no internal customers, incidents, or operational metrics.
-->
## Problem
- The Haiku gate was filtering out things it shouldn't — SDK crashes (the heuristic's `"trace"` substring tripped on `"stack trace"`), perf complaints about team-owned sites (the LLM treated them as ops/CDN issues), and "wrong data / missing events" reports (which are almost always tracking bugs in the team's own code, not PostHog).
<!-- Who are we building for, what are their needs, why is this important? -->
<!-- Does this fix an issue? Uncomment the line below with the issue ID to automatically close it when merged -->
<!-- Closes #ISSUE_ID -->
## Changes
- Dropped `"trace"` from the heuristic and tightened the Haiku prompt with principled own-code-vs-PostHog-SaaS guidance
- Plus an explicit "wrong data = tracking bug → needs_repo" exception; added 3 eval cases (`posthog_product_hang`, `sdk_instrumentation_on_own_site`, `wrong_data_tracking_bug`) as boundary regression guards.
<!-- If there are frontend changes, please include screenshots. -->
<!-- If a reference design was involved, include a link to the relevant Figma frame! -->
## How did you test this code?
<!-- Describe steps to reproduce and verify the changes, and what the expected behavior is. -->
<!-- Include automated tests if possible, otherwise describe the manual testing routine. -->
<!-- Agents: do NOT claim manual testing you haven't done. State that you're an agent and list only the automated tests you actually ran. -->
👉 _Stay up-to-date with_ [_PostHog coding conventions_](https://posthog.com/docs/contribute/coding-conventions) _for a smoother review._
## Publish to changelog?
<!-- For features only -->
<!-- If publishing, you must provide changelog details in the #changelog Slack channel. You will receive a follow-up PR comment or notification. -->
<!-- If not, write "no" or "do not publish to changelog" to explicitly opt-out of posting to #changelog. Removing this entire section will not prevent posting. -->
## Docs update
<!-- Add the `skip-inkeep-docs` label if this PR should not trigger an automatic docs update from the Inkeep agent. -->
## 🤖 Agent context
<!-- Fill this section if an agent co-authored or authored this PR. Remove it for fully human-authored PRs. -->
<!-- Include:
- tools/agent used and link to session. List the agent and tool names used, but do not include tool call results.
- decisions made along the way (what was tried, rejected, chosen, and why)
- anything else that helps reviewers
Write reviewer-facing prose. Do not paste user prompts verbatim — paraphrase the intent in your own words.
DO NOT INCLUDE sensitive data that may have been shared in an agent session.
-->
<!-- Rules for agent-authored PRs:
- All PRs must be attributable to a human author, even if agent-assisted.
- Do not add a human Co-authored-by just for the sake of attribution — if no human was involved in the changes, own it as agent-authored.
- Agent-authored PRs always require human review — do not self-merge or auto-approve.
- Do NOT claim manual testing you haven't done.
- GitHub PR descriptions render markdown, not fixed-width text. Do not hard-wrap prose at a column width or use space-aligned tables — use real markdown tables, headings, and fenced code blocks, and let GitHub flow the text.
-->

Problem
"trace"substring tripped on"stack trace"), perf complaints about team-owned sites (the LLM treated them as ops/CDN issues), and "wrong data / missing events" reports (which are almost always tracking bugs in the team's own code, not PostHog).Changes
"trace"from the heuristic and tightened the Haiku prompt with principled own-code-vs-PostHog-SaaS guidanceposthog_product_hang,sdk_instrumentation_on_own_site,wrong_data_tracking_bug) as boundary regression guards.How did you test this code?
👉 Stay up-to-date with PostHog coding conventions for a smoother review.
Publish to changelog?
Docs update
🤖 Agent context