Skip to content

chore: Tweaks to Haiku prompt to improve repo selection.#60092

Merged
sortafreel merged 3 commits into
masterfrom
repo-selection-haiku-fixes
May 29, 2026
Merged

chore: Tweaks to Haiku prompt to improve repo selection.#60092
sortafreel merged 3 commits into
masterfrom
repo-selection-haiku-fixes

Conversation

@sortafreel
Copy link
Copy Markdown
Contributor

@sortafreel sortafreel commented May 26, 2026

Problem

  • The Haiku gate was filtering out things it shouldn't — SDK crashes (the heuristic's "trace" substring tripped on "stack trace"), perf complaints about team-owned sites (the LLM treated them as ops/CDN issues), and "wrong data / missing events" reports (which are almost always tracking bugs in the team's own code, not PostHog).

Changes

  • Dropped "trace" from the heuristic and tightened the Haiku prompt with principled own-code-vs-PostHog-SaaS guidance
  • Plus an explicit "wrong data = tracking bug → needs_repo" exception; added 3 eval cases (posthog_product_hang, sdk_instrumentation_on_own_site, wrong_data_tracking_bug) as boundary regression guards.

How did you test this code?

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Publish to changelog?

Docs update

🤖 Agent context

Copy link
Copy Markdown
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@sortafreel sortafreel force-pushed the repo-selection-haiku-fixes branch from d0fb56e to 47c14f7 Compare May 26, 2026 14:08
@sortafreel sortafreel requested review from a team May 26, 2026 15:03
@sortafreel sortafreel marked this pull request as ready for review May 28, 2026 09:43
Copilot AI review requested due to automatic review settings May 28, 2026 09:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tunes the Slack repo-selection classifier so Haiku is less likely to filter out tasks that should proceed to repository discovery, especially SDK/app crashes, team-owned site performance issues, and tracking-related wrong-data reports.

Changes:

  • Removes "trace" from the no-repo heuristic terms to avoid matching “stack trace.”
  • Expands the Haiku prompt with own-code vs PostHog SaaS guidance and a wrong-data tracking exception.
  • Updates the local Slack repo-selection eval with new/updated boundary cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
products/slack_app/backend/api.py Adjusts heuristic terms and classifier prompt logic.
posthog/temporal/ai/eval_slack_repo_selection.py Updates expected eval outcomes and adds regression cases for routing boundaries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1074 to +1077
"the team's code → no_repo. Important exception: 'wrong data', 'missing events', or "
"'numbers look off' in PostHog usually means the team's tracking code is broken (wrong "
"event names, identification logic, SDK setup) — that's a code fix in their repo → "
"needs_repo. When in doubt, lean needs_repo=true — the discovery agent can still report "
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 28, 2026

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
posthog/temporal/ai/eval_slack_repo_selection.py:169-192
These two cases are now `agent/found` outcomes but still sit under the `# --- Haiku gate short-circuits (heuristic + LLM) ---` section comment. Now that they represent the fixed (pass-through-to-agent) behaviour they would be easier to find alongside the other agent-path cases below the `# --- Agent path ---` marker.

```suggestion
    Case(
        name="posthog_product_hang",
```

Reviews (1): Last reviewed commit: "Merge branch 'master' into repo-selectio..." | Re-trigger Greptile

Comment on lines 169 to +192
Case(
name="marketing_site",
description="Haiku LLM filters as ops/perf rather than code change.",
description="Perf complaint about a site the team likely owns code for — agent should route to docs/marketing repo.",
text_template="@PostHog the docs site loads really slowly on mobile",
thread_messages=[
{"user": "tester", "text": "@PostHog the docs site loads really slowly on mobile"},
{"user": "other", "text": "yeah I noticed the same on /docs/getting-started"},
],
expected_stage="haiku",
expected_outcome="no_repo",
note=(
"Ideally the agent would route this to a docs/marketing repo if connected. "
"Haiku LLM treats it as a perf/CDN question instead. Out of this PR's scope; "
"candidate for a follow-up Haiku-tuning PR backed by this eval."
),
expected_stage="agent",
expected_outcome="found",
),
Case(
name="sdk_specific_trace_trip",
description="Haiku heuristic trips on 'trace' in 'stack trace'.",
description="App/SDK crash with stack trace — agent should route to the relevant SDK repo (e.g. posthog-ios).",
text_template="@PostHog the iOS SDK is crashing on app launch after upgrade to 3.19",
thread_messages=[
{"user": "tester", "text": "@PostHog the iOS SDK is crashing on app launch after upgrade to 3.19"},
{"user": "other", "text": "stack trace shows PostHogReplay.start() failing"},
],
expected_stage="agent",
expected_outcome="found",
),
Case(
name="posthog_product_hang",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 These two cases are now agent/found outcomes but still sit under the # --- Haiku gate short-circuits (heuristic + LLM) --- section comment. Now that they represent the fixed (pass-through-to-agent) behaviour they would be easier to find alongside the other agent-path cases below the # --- Agent path --- marker.

Suggested change
Case(
name="marketing_site",
description="Haiku LLM filters as ops/perf rather than code change.",
description="Perf complaint about a site the team likely owns code for — agent should route to docs/marketing repo.",
text_template="@PostHog the docs site loads really slowly on mobile",
thread_messages=[
{"user": "tester", "text": "@PostHog the docs site loads really slowly on mobile"},
{"user": "other", "text": "yeah I noticed the same on /docs/getting-started"},
],
expected_stage="haiku",
expected_outcome="no_repo",
note=(
"Ideally the agent would route this to a docs/marketing repo if connected. "
"Haiku LLM treats it as a perf/CDN question instead. Out of this PR's scope; "
"candidate for a follow-up Haiku-tuning PR backed by this eval."
),
expected_stage="agent",
expected_outcome="found",
),
Case(
name="sdk_specific_trace_trip",
description="Haiku heuristic trips on 'trace' in 'stack trace'.",
description="App/SDK crash with stack trace — agent should route to the relevant SDK repo (e.g. posthog-ios).",
text_template="@PostHog the iOS SDK is crashing on app launch after upgrade to 3.19",
thread_messages=[
{"user": "tester", "text": "@PostHog the iOS SDK is crashing on app launch after upgrade to 3.19"},
{"user": "other", "text": "stack trace shows PostHogReplay.start() failing"},
],
expected_stage="agent",
expected_outcome="found",
),
Case(
name="posthog_product_hang",
Case(
name="posthog_product_hang",
Prompt To Fix With AI
This is a comment left during a code review.
Path: posthog/temporal/ai/eval_slack_repo_selection.py
Line: 169-192

Comment:
These two cases are now `agent/found` outcomes but still sit under the `# --- Haiku gate short-circuits (heuristic + LLM) ---` section comment. Now that they represent the fixed (pass-through-to-agent) behaviour they would be easier to find alongside the other agent-path cases below the `# --- Agent path ---` marker.

```suggestion
    Case(
        name="posthog_product_hang",
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@sortafreel sortafreel merged commit 0073089 into master May 29, 2026
228 checks passed
@sortafreel sortafreel deleted the repo-selection-haiku-fixes branch May 29, 2026 09:29
@deployment-status-posthog
Copy link
Copy Markdown

deployment-status-posthog Bot commented May 29, 2026

Deploy status

Environment Status Deployed At Workflow
dev ✅ Deployed 2026-05-29 09:52 UTC Run
prod-us ✅ Deployed 2026-05-29 10:04 UTC Run
prod-eu ✅ Deployed 2026-05-29 10:07 UTC Run

webjunkie pushed a commit that referenced this pull request May 29, 2026
<!-- Authoring rules (agents: read).
     Title: <type>(<scope>): <description> — type=feat|fix|chore, scope required, lowercase, no period, <72 chars.
       ✅ feat(insights): add retention graph export
       ❌ feat: Added retention export.   (capitalized, period, no scope)
     Description: high-level rationale, not a step-by-step replay.
     Public OSS repo: no internal customers, incidents, or operational metrics.
-->

## Problem

- The Haiku gate was filtering out things it shouldn't — SDK crashes (the heuristic's `"trace"` substring tripped on `"stack trace"`), perf complaints about team-owned sites (the LLM treated them as ops/CDN issues), and "wrong data / missing events" reports (which are almost always tracking bugs in the team's own code, not PostHog).

<!-- Who are we building for, what are their needs, why is this important? -->

<!-- Does this fix an issue? Uncomment the line below with the issue ID to automatically close it when merged -->

<!-- Closes #ISSUE_ID -->

## Changes

- Dropped `"trace"` from the heuristic and tightened the Haiku prompt with principled own-code-vs-PostHog-SaaS guidance 
- Plus an explicit "wrong data = tracking bug → needs_repo" exception; added 3 eval cases (`posthog_product_hang`, `sdk_instrumentation_on_own_site`, `wrong_data_tracking_bug`) as boundary regression guards.

<!-- If there are frontend changes, please include screenshots. -->

<!-- If a reference design was involved, include a link to the relevant Figma frame! -->

## How did you test this code?

<!-- Describe steps to reproduce and verify the changes, and what the expected behavior is. -->

<!-- Include automated tests if possible, otherwise describe the manual testing routine. -->

<!-- Agents: do NOT claim manual testing you haven't done. State that you're an agent and list only the automated tests you actually ran. -->

👉 _Stay up-to-date with_ [_PostHog coding conventions_](https://posthog.com/docs/contribute/coding-conventions) _for a smoother review._

## Publish to changelog?

<!-- For features only -->

<!-- If publishing, you must provide changelog details in the #changelog Slack channel. You will receive a follow-up PR comment or notification. -->

<!-- If not, write "no" or "do not publish to changelog" to explicitly opt-out of posting to #changelog. Removing this entire section will not prevent posting. -->

## Docs update

<!-- Add the `skip-inkeep-docs` label if this PR should not trigger an automatic docs update from the Inkeep agent. -->

## 🤖 Agent context

<!-- Fill this section if an agent co-authored or authored this PR. Remove it for fully human-authored PRs. -->

<!-- Include:
     - tools/agent used and link to session. List the agent and tool names used, but do not include tool call results.
     - decisions made along the way (what was tried, rejected, chosen, and why)
     - anything else that helps reviewers
     Write reviewer-facing prose. Do not paste user prompts verbatim — paraphrase the intent in your own words.
     DO NOT INCLUDE sensitive data that may have been shared in an agent session.
-->

<!-- Rules for agent-authored PRs:
     - All PRs must be attributable to a human author, even if agent-assisted.
     - Do not add a human Co-authored-by just for the sake of attribution — if no human was involved in the changes, own it as agent-authored.
     - Agent-authored PRs always require human review — do not self-merge or auto-approve.
     - Do NOT claim manual testing you haven't done.
     - GitHub PR descriptions render markdown, not fixed-width text. Do not hard-wrap prose at a column width or use space-aligned tables — use real markdown tables, headings, and fenced code blocks, and let GitHub flow the text.
-->
mayteio pushed a commit that referenced this pull request May 29, 2026
<!-- Authoring rules (agents: read).
     Title: <type>(<scope>): <description> — type=feat|fix|chore, scope required, lowercase, no period, <72 chars.
       ✅ feat(insights): add retention graph export
       ❌ feat: Added retention export.   (capitalized, period, no scope)
     Description: high-level rationale, not a step-by-step replay.
     Public OSS repo: no internal customers, incidents, or operational metrics.
-->

## Problem

- The Haiku gate was filtering out things it shouldn't — SDK crashes (the heuristic's `"trace"` substring tripped on `"stack trace"`), perf complaints about team-owned sites (the LLM treated them as ops/CDN issues), and "wrong data / missing events" reports (which are almost always tracking bugs in the team's own code, not PostHog).

<!-- Who are we building for, what are their needs, why is this important? -->

<!-- Does this fix an issue? Uncomment the line below with the issue ID to automatically close it when merged -->

<!-- Closes #ISSUE_ID -->

## Changes

- Dropped `"trace"` from the heuristic and tightened the Haiku prompt with principled own-code-vs-PostHog-SaaS guidance 
- Plus an explicit "wrong data = tracking bug → needs_repo" exception; added 3 eval cases (`posthog_product_hang`, `sdk_instrumentation_on_own_site`, `wrong_data_tracking_bug`) as boundary regression guards.

<!-- If there are frontend changes, please include screenshots. -->

<!-- If a reference design was involved, include a link to the relevant Figma frame! -->

## How did you test this code?

<!-- Describe steps to reproduce and verify the changes, and what the expected behavior is. -->

<!-- Include automated tests if possible, otherwise describe the manual testing routine. -->

<!-- Agents: do NOT claim manual testing you haven't done. State that you're an agent and list only the automated tests you actually ran. -->

👉 _Stay up-to-date with_ [_PostHog coding conventions_](https://posthog.com/docs/contribute/coding-conventions) _for a smoother review._

## Publish to changelog?

<!-- For features only -->

<!-- If publishing, you must provide changelog details in the #changelog Slack channel. You will receive a follow-up PR comment or notification. -->

<!-- If not, write "no" or "do not publish to changelog" to explicitly opt-out of posting to #changelog. Removing this entire section will not prevent posting. -->

## Docs update

<!-- Add the `skip-inkeep-docs` label if this PR should not trigger an automatic docs update from the Inkeep agent. -->

## 🤖 Agent context

<!-- Fill this section if an agent co-authored or authored this PR. Remove it for fully human-authored PRs. -->

<!-- Include:
     - tools/agent used and link to session. List the agent and tool names used, but do not include tool call results.
     - decisions made along the way (what was tried, rejected, chosen, and why)
     - anything else that helps reviewers
     Write reviewer-facing prose. Do not paste user prompts verbatim — paraphrase the intent in your own words.
     DO NOT INCLUDE sensitive data that may have been shared in an agent session.
-->

<!-- Rules for agent-authored PRs:
     - All PRs must be attributable to a human author, even if agent-assisted.
     - Do not add a human Co-authored-by just for the sake of attribution — if no human was involved in the changes, own it as agent-authored.
     - Agent-authored PRs always require human review — do not self-merge or auto-approve.
     - Do NOT claim manual testing you haven't done.
     - GitHub PR descriptions render markdown, not fixed-width text. Do not hard-wrap prose at a column width or use space-aligned tables — use real markdown tables, headings, and fenced code blocks, and let GitHub flow the text.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants