feat(experiments/mcp): Write diagnosing-experiment-results skill and Evals by mp-hog · Pull Request #59029 · PostHog/posthog

mp-hog · 2026-05-19T14:50:35Z

Problem

Users frequently ask "why does my experiment look wrong / biased / empty?". There was no symptom-driven path from the user's complaint to the right diagnostic, and no eval coverage for this class of question.

Changes

New skill `diagnosing-experiment-results`, with a table mapping symptoms to diagnostic logic and five reference files covering:
- bias & skew
- empty experiments
- significance / interpretation traps
- PostHog-vs-SQL discrepancies
- mid-run-change surprises
Sandboxed evals for each diagnostic group, plus shared seeders and scorers (including an anti-bundle check that fails the agent if it collapses co-occurring findings into a single conclusion).
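Here is a minimal sketch of what an anti-bundle check could look like. It assumes the agent's answer has already been parsed into discrete findings, each tagged with the diagnostic groups it attributes the problem to. `Finding`, `anti_bundle_score`, and the group labels are hypothetical illustrations, not the scorer shipped in this PR, which may well rely on an LLM judge rather than structured tags.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One conclusion from the agent's answer (hypothetical structure)."""
    groups: frozenset[str]  # diagnostic groups this conclusion attributes the problem to
    summary: str


def anti_bundle_score(findings: list[Finding], expected_groups: set[str]) -> float | None:
    """Score 1.0 only if each co-occurring expected group is reported as a separate finding.

    Returns None when fewer than two groups are expected, because there is
    nothing to bundle in that case.
    """
    if len(expected_groups) < 2:
        return None
    covered: set[str] = set()
    for finding in findings:
        hits = finding.groups & expected_groups
        if len(hits) > 1:
            return 0.0  # several co-occurring issues collapsed into one conclusion
        covered |= hits
    # Every expected group must surface in a finding of its own.
    return 1.0 if covered == expected_groups else 0.0
```

Returning `None` for single-issue cases keeps the check from penalizing answers where there was never anything to bundle.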

How did you test this code?

The knowledge is based on actual support interactions, the docs, and the code itself.
The content then went through many sanitization iterations:
- Reduce to actual issues (not hallucinated ones)
- Clean up content to be generally applicable (not too custom, not overfitted)
- Make it actionable and supportive
- Validate claims against the code and SQL
New sandboxed Evals
Agent simulation against PostHog's own experiments: ran the agent on some of our own experiments and checked whether its answers matched the expected diagnosis.

Publish to changelog?

no

Docs update

No

🤖 Agent context

Authored with Claude Code across many different sessions

github-actions · 2026-05-19T15:03:12Z

🎭 Playwright didn't run on this PR — your changes touch code that could affect E2E behavior, but Playwright is opt-in via label now to keep CI cost down.

Add the run-playwright label if you want an E2E sweep before merging — CI will pick it up automatically.

Most PRs don't need this. Real regressions still get caught on master and fix-forward.

github-actions · 2026-05-19T15:10:11Z

Size Change: -7.37 kB (-0.01%)

Total Size: 118 MB

📦 View Changed

Filename	Size	Change
`frontend/dist/SupportTicketScene`	26.4 kB	-7.37 kB (-21.8%)	🎉

ℹ️ View Unchanged

Filename	Size	Change
`frontend/dist/368Hedgehogs`	5.47 kB	0 B
`frontend/dist/abap`	14.2 kB	0 B
`frontend/dist/AccountConnected`	2.99 kB	0 B
`frontend/dist/Action`	24.8 kB	0 B
`frontend/dist/Actions`	1.2 kB	0 B
`frontend/dist/AdvancedActivityLogsScene`	39.9 kB	0 B
`frontend/dist/AgenticAuthorize`	5.7 kB	0 B
`frontend/dist/apex`	3.99 kB	0 B
`frontend/dist/ApprovalDetail`	16.5 kB	0 B
`frontend/dist/architecture-7EHR7CIX`	372 B	0 B
`frontend/dist/architectureDiagram-3BPJPVTR`	151 kB	0 B
`frontend/dist/array.full.es5.js`	347 kB	0 B
`frontend/dist/array.full.js`	427 kB	0 B
`frontend/dist/array.js`	191 kB	0 B
`frontend/dist/AsyncMigrations`	13.4 kB	0 B
`frontend/dist/AuthenticatedShell`	176 kB	0 B
`frontend/dist/AuthorizationStatus`	969 B	0 B
`frontend/dist/azcli`	885 B	0 B
`frontend/dist/bat`	1.88 kB	0 B
`frontend/dist/BatchExportScene`	60.8 kB	0 B
`frontend/dist/bicep`	2.59 kB	0 B
`frontend/dist/Billing`	731 B	0 B
`frontend/dist/BillingSection`	21 kB	0 B
`frontend/dist/blockDiagram-GPEHLZMM`	72.5 kB	0 B
`frontend/dist/BoxPlot`	5.29 kB	+35 B (+0.67%)
`frontend/dist/browserAll-0QZMN1W2`	37.4 kB	0 B
`frontend/dist/BusinessKnowledgeScene`	18.9 kB	0 B
`frontend/dist/ButtonPrimitives`	796 B	+34 B (+4.46%)
`frontend/dist/c4Diagram-AAUBKEIU`	70.8 kB	0 B
`frontend/dist/CalendarHeatMap`	9 kB	0 B
`frontend/dist/cameligo`	2.23 kB	0 B
`frontend/dist/changeRequestsLogic`	748 B	0 B
`frontend/dist/classDiagram-4FO5ZUOK`	1.28 kB	0 B
`frontend/dist/classDiagram-v2-Q7XG4LA2`	1.28 kB	0 B
`frontend/dist/CLIAuthorize`	11.6 kB	0 B
`frontend/dist/CLILive`	4.26 kB	0 B
`frontend/dist/clojure`	9.68 kB	0 B
`frontend/dist/CodeEditorInline`	730 B	+34 B (+4.89%)	🔍
`frontend/dist/coffee`	3.63 kB	0 B
`frontend/dist/Cohort`	27.7 kB	0 B
`frontend/dist/CohortCalculationHistory`	6.47 kB	0 B
`frontend/dist/Cohorts`	9.64 kB	0 B
`frontend/dist/ConfirmOrganization`	4.76 kB	0 B
`frontend/dist/conversations.js`	67.3 kB	0 B
`frontend/dist/cose-bilkent-S5V4N54A`	82.8 kB	0 B
`frontend/dist/Coupons`	963 B	0 B
`frontend/dist/cpp`	5.33 kB	0 B
`frontend/dist/Create`	898 B	0 B
`frontend/dist/crisp-chat-integration.js`	1.97 kB	0 B
`frontend/dist/csharp`	4.56 kB	0 B
`frontend/dist/csp`	1.45 kB	0 B
`frontend/dist/css`	4.54 kB	0 B
`frontend/dist/cssMode`	4.2 kB	0 B
`frontend/dist/CustomCssScene`	3.8 kB	0 B
`frontend/dist/CustomerAnalyticsConfigurationScene`	2.27 kB	-36 B (-1.56%)
`frontend/dist/CustomerAnalyticsScene`	28.1 kB	0 B
`frontend/dist/CustomerJourneyBuilderScene`	2.08 kB	0 B
`frontend/dist/CustomerJourneyTemplatesScene`	7.72 kB	0 B
`frontend/dist/customizations.full.js`	18 kB	0 B
`frontend/dist/CyclotronJobInputAssignee`	1.57 kB	0 B
`frontend/dist/CyclotronJobInputBusinessHours`	2.92 kB	0 B
`frontend/dist/CyclotronJobInputTicketTags`	954 B	0 B
`frontend/dist/cypher`	3.42 kB	0 B
`frontend/dist/dagre-BM42HDAG`	11.9 kB	0 B
`frontend/dist/dart`	4.29 kB	0 B
`frontend/dist/Dashboard`	1.38 kB	0 B
`frontend/dist/Dashboards`	21.8 kB	0 B
`frontend/dist/DashboardTemplateCopyScene`	5.95 kB	0 B
`frontend/dist/DataManagementScene`	884 B	0 B
`frontend/dist/DataPipelinesNewScene`	2.55 kB	0 B
`frontend/dist/DataWarehouseScene`	46.5 kB	0 B
`frontend/dist/Deactivated`	1.37 kB	0 B
`frontend/dist/dead-clicks-autocapture.js`	14.3 kB	0 B
`frontend/dist/DeadLetterQueue`	5.63 kB	0 B
`frontend/dist/DebugScene`	20.2 kB	0 B
`frontend/dist/decompressionWorker`	2.85 kB	0 B
`frontend/dist/decompressionWorker.js`	2.85 kB	0 B
`frontend/dist/DecompressionWorkerManager`	329 B	0 B
`frontend/dist/DefinitionEdit`	8.82 kB	0 B
`frontend/dist/DefinitionView`	24.3 kB	0 B
`frontend/dist/Deployment`	3.94 kB	+35 B (+0.9%)
`frontend/dist/DeploymentProject`	5.43 kB	0 B
`frontend/dist/Deployments`	9.17 kB	0 B
`frontend/dist/DestinationsScene`	2.92 kB	0 B
`frontend/dist/diagram-2AECGRRQ`	6.66 kB	0 B
`frontend/dist/diagram-5GNKFQAL`	3.61 kB	0 B
`frontend/dist/diagram-KO2AKTUF`	11.5 kB	0 B
`frontend/dist/diagram-LMA3HP47`	5.02 kB	0 B
`frontend/dist/diagram-OG6HWLK6`	11.8 kB	0 B
`frontend/dist/dist`	643 B	0 B
`frontend/dist/dockerfile`	1.91 kB	0 B
`frontend/dist/EarlyAccessFeature`	991 B	0 B
`frontend/dist/EarlyAccessFeatures`	3.05 kB	0 B
`frontend/dist/ecl`	5.38 kB	0 B
`frontend/dist/EditorScene`	1.38 kB	0 B
`frontend/dist/elixir`	10.3 kB	0 B
`frontend/dist/elk.bundled`	1.44 MB	0 B
`frontend/dist/EmailMFAVerify`	3.26 kB	0 B
`frontend/dist/EndpointScene`	39.9 kB	0 B
`frontend/dist/EndpointsScene`	23.9 kB	0 B
`frontend/dist/erDiagram-TEJ5UH35`	27.7 kB	0 B
`frontend/dist/ErrorTrackingIssueFingerprintsScene`	7.26 kB	0 B
`frontend/dist/ErrorTrackingIssueScene`	101 kB	+979 B (+0.98%)
`frontend/dist/ErrorTrackingScene`	27.1 kB	+75 B (+0.28%)
`frontend/dist/EvaluationTemplates`	779 B	0 B
`frontend/dist/eventmodeling-FCH6USID`	375 B	0 B
`frontend/dist/EventsScene`	2.81 kB	0 B
`frontend/dist/exception-autocapture.js`	11.8 kB	0 B
`frontend/dist/Experiment`	211 kB	0 B
`frontend/dist/Experiments`	19.8 kB	0 B
`frontend/dist/exporter`	19 kB	0 B
`frontend/dist/exporter.js`	19 kB	0 B
`frontend/dist/ExporterDashboardScene`	1.78 kB	0 B
`frontend/dist/ExporterHeatmapScene`	19.5 kB	0 B
`frontend/dist/ExporterInsightScene`	2.88 kB	0 B
`frontend/dist/ExporterInterviewScene`	309 kB	0 B
`frontend/dist/ExporterNotebookScene`	2.71 MB	0 B
`frontend/dist/ExporterRecordingScene`	995 B	0 B
`frontend/dist/exporterSharedChunkAnchors`	288 kB	0 B
`frontend/dist/exporterSharedChunkAnchors.js`	288 kB	0 B
`frontend/dist/ExportsScene`	4.23 kB	0 B
`frontend/dist/FeatureFlag`	134 kB	0 B
`frontend/dist/FeatureFlags`	844 B	0 B
`frontend/dist/FeatureFlagTemplatesScene`	7.28 kB	0 B
`frontend/dist/FlappyHog`	5.99 kB	0 B
`frontend/dist/flow9`	1.85 kB	0 B
`frontend/dist/flowDiagram-I6XJVG4X`	61.6 kB	0 B
`frontend/dist/freemarker2`	16.7 kB	0 B
`frontend/dist/fsharp`	3.02 kB	0 B
`frontend/dist/ganttDiagram-6RSMTGT7`	50.9 kB	0 B
`frontend/dist/gitGraph-WXDBUCRP`	360 B	0 B
`frontend/dist/gitGraphDiagram-PVQCEYII`	30.2 kB	0 B
`frontend/dist/go`	2.69 kB	0 B
`frontend/dist/graphql`	2.3 kB	0 B
`frontend/dist/Group`	15.1 kB	0 B
`frontend/dist/Groups`	4.15 kB	0 B
`frontend/dist/GroupsNew`	7.58 kB	0 B
`frontend/dist/handlebars`	7.38 kB	0 B
`frontend/dist/hcl`	3.63 kB	0 B
`frontend/dist/HealthCategoryDetailScene`	7.48 kB	0 B
`frontend/dist/HealthScene`	12.4 kB	0 B
`frontend/dist/HeatmapNewScene`	5.27 kB	0 B
`frontend/dist/HeatmapRecordingScene`	4.25 kB	0 B
`frontend/dist/HeatmapScene`	6.8 kB	0 B
`frontend/dist/HeatmapsScene`	4.13 kB	0 B
`frontend/dist/hls`	394 kB	0 B
`frontend/dist/HogFunctionScene`	59.5 kB	0 B
`frontend/dist/hogql_parser_wasm_browser`	1.53 MB	0 B
`frontend/dist/HogRepl`	7.61 kB	0 B
`frontend/dist/html`	5.62 kB	0 B
`frontend/dist/htmlMode`	4.65 kB	0 B
`frontend/dist/image-blob-reduce.esm`	49.5 kB	0 B
`frontend/dist/InboxScene`	59.9 kB	0 B
`frontend/dist/index`	60.7 kB	0 B
`frontend/dist/index.js`	60.7 kB	0 B
`frontend/dist/info-J43DQDTF`	348 B	0 B
`frontend/dist/infoDiagram-5YYISTIA`	1.32 kB	0 B
`frontend/dist/ini`	1.14 kB	0 B
`frontend/dist/InsightQuickStart`	5.67 kB	0 B
`frontend/dist/InsightScene`	34.6 kB	0 B
`frontend/dist/IntegrationsRedirect`	976 B	0 B
`frontend/dist/intercom-integration.js`	2.03 kB	0 B
`frontend/dist/InviteSignup`	15.2 kB	0 B
`frontend/dist/ishikawaDiagram-YF4QCWOH`	18 kB	0 B
`frontend/dist/java`	3.26 kB	0 B
`frontend/dist/javascript`	1.02 kB	0 B
`frontend/dist/journeyDiagram-JHISSGLW`	24 kB	0 B
`frontend/dist/jsonMode`	13.9 kB	0 B
`frontend/dist/julia`	7.26 kB	0 B
`frontend/dist/kanban-definition-UN3LZRKU`	21.2 kB	0 B
`frontend/dist/katex`	266 kB	0 B
`frontend/dist/kotlin`	3.44 kB	0 B
`frontend/dist/lazy`	146 kB	0 B
`frontend/dist/LegacyPluginScene`	20.9 kB	0 B
`frontend/dist/LegalDocumentNewScene`	59.6 kB	-37 B (-0.06%)
`frontend/dist/LegalDocumentsScene`	5.21 kB	0 B
`frontend/dist/LemonTextAreaMarkdown`	706 B	0 B
`frontend/dist/less`	3.93 kB	0 B
`frontend/dist/lexon`	2.47 kB	0 B
`frontend/dist/lib`	2.25 kB	0 B
`frontend/dist/Link`	733 B	0 B
`frontend/dist/LinkScene`	25 kB	-36 B (-0.14%)
`frontend/dist/LinksScene`	4.4 kB	0 B
`frontend/dist/liquid`	4.57 kB	0 B
`frontend/dist/LiveDebugger`	19.4 kB	0 B
`frontend/dist/LiveEventsTable`	5.47 kB	0 B
`frontend/dist/LLMAnalyticsClusterScene`	21.5 kB	+40 B (+0.19%)
`frontend/dist/LLMAnalyticsClustersScene`	54.8 kB	0 B
`frontend/dist/LLMAnalyticsDatasetScene`	20.7 kB	-34 B (-0.16%)
`frontend/dist/LLMAnalyticsDatasetsScene`	3.49 kB	0 B
`frontend/dist/LLMAnalyticsEvaluation`	59.6 kB	0 B
`frontend/dist/LLMAnalyticsEvaluationsScene`	28 kB	-34 B (-0.12%)
`frontend/dist/LLMAnalyticsPlaygroundScene`	37.5 kB	0 B
`frontend/dist/LLMAnalyticsScene`	117 kB	-345 B (-0.29%)
`frontend/dist/LLMAnalyticsSessionScene`	13.6 kB	-41 B (-0.3%)
`frontend/dist/LLMAnalyticsTag`	27.3 kB	-36 B (-0.13%)
`frontend/dist/LLMAnalyticsTagsScene`	7.15 kB	0 B
`frontend/dist/LLMAnalyticsTraceScene`	130 kB	0 B
`frontend/dist/LLMAnalyticsUsers`	730 B	-34 B (-4.45%)
`frontend/dist/LLMASessionFeedbackDisplay`	5.04 kB	0 B
`frontend/dist/LLMPromptScene`	17.8 kB	0 B
`frontend/dist/LLMPromptsScene`	4.68 kB	0 B
`frontend/dist/LLMSkillScene`	827 B	0 B
`frontend/dist/LLMSkillsScene`	810 B	0 B
`frontend/dist/Login`	8.86 kB	0 B
`frontend/dist/Login2FA`	4.49 kB	0 B
`frontend/dist/logs.js`	38.9 kB	0 B
`frontend/dist/LogsAlertDetailScene`	17.2 kB	0 B
`frontend/dist/LogsSamplingDetailScene`	4.63 kB	0 B
`frontend/dist/LogsSamplingNewScene`	2.15 kB	+40 B (+1.89%)
`frontend/dist/LogsScene`	17.7 kB	0 B
`frontend/dist/lua`	2.16 kB	0 B
`frontend/dist/m3`	2.85 kB	0 B
`frontend/dist/main`	819 kB	0 B
`frontend/dist/ManagedMigration`	14.4 kB	0 B
`frontend/dist/markdown`	3.83 kB	0 B
`frontend/dist/MarketingAnalyticsScene`	40.3 kB	0 B
`frontend/dist/MaterializedColumns`	10.4 kB	0 B
`frontend/dist/Max`	888 B	0 B
`frontend/dist/mdx`	5.43 kB	0 B
`frontend/dist/memlens.lib.bundle`	27.9 kB	0 B
`frontend/dist/mermaid.core`	28.5 kB	0 B
`frontend/dist/MermaidDiagram`	2.11 kB	0 B
`frontend/dist/MessageTemplate`	16.5 kB	0 B
`frontend/dist/MetricsScene`	1.04 kB	-34 B (-3.16%)
`frontend/dist/mindmap-definition-RKZ34NQL`	24.8 kB	0 B
`frontend/dist/mips`	2.62 kB	0 B
`frontend/dist/ModelsScene`	18.9 kB	0 B
`frontend/dist/MonacoDiffEditor`	471 B	0 B
`frontend/dist/monacoEditorWorker`	288 kB	0 B
`frontend/dist/monacoEditorWorker.js`	288 kB	0 B
`frontend/dist/monacoJsonWorker`	419 kB	0 B
`frontend/dist/monacoJsonWorker.js`	419 kB	0 B
`frontend/dist/monacoTsWorker`	7.02 MB	0 B
`frontend/dist/monacoTsWorker.js`	7.02 MB	0 B
`frontend/dist/MoveToPostHogCloud`	4.7 kB	0 B
`frontend/dist/msdax`	4.95 kB	0 B
`frontend/dist/mysql`	11.3 kB	0 B
`frontend/dist/NavTabChat`	7.41 kB	0 B
`frontend/dist/NewSourceScene`	980 B	0 B
`frontend/dist/NewTabScene`	1.72 kB	0 B
`frontend/dist/NodeDetailScene`	16.9 kB	0 B
`frontend/dist/NotebookCanvasScene`	3.55 kB	0 B
`frontend/dist/NotebookPanel`	5.51 kB	0 B
`frontend/dist/NotebookScene`	8.75 kB	0 B
`frontend/dist/NotebooksScene`	7.84 kB	0 B
`frontend/dist/OAuthAuthorize`	844 B	0 B
`frontend/dist/objective-c`	2.44 kB	0 B
`frontend/dist/Onboarding`	771 kB	0 B
`frontend/dist/OnboardingCouponRedemption`	1.44 kB	0 B
`frontend/dist/packet-YPE3B663`	354 B	0 B
`frontend/dist/pascal`	3.03 kB	0 B
`frontend/dist/pascaligo`	2.04 kB	0 B
`frontend/dist/passkeyLogic`	722 B	0 B
`frontend/dist/PasswordReset`	4.6 kB	0 B
`frontend/dist/PasswordResetComplete`	3.23 kB	0 B
`frontend/dist/PendingDeletion`	2.38 kB	0 B
`frontend/dist/perl`	8.29 kB	0 B
`frontend/dist/PersonScene`	18.5 kB	0 B
`frontend/dist/PersonsScene`	5.91 kB	0 B
`frontend/dist/pgsql`	13.5 kB	0 B
`frontend/dist/php`	8.06 kB	0 B
`frontend/dist/pie-LRSECV5Y`	345 B	0 B
`frontend/dist/pieDiagram-4H26LBE5`	4.92 kB	0 B
`frontend/dist/PipelineStatusScene`	9.35 kB	0 B
`frontend/dist/pla`	1.72 kB	0 B
`frontend/dist/posthog`	146 kB	0 B
`frontend/dist/postiats`	7.9 kB	0 B
`frontend/dist/powerquery`	17 kB	0 B
`frontend/dist/powershell`	3.31 kB	0 B
`frontend/dist/PreflightCheck`	5.81 kB	0 B
`frontend/dist/product-tours.js`	115 kB	0 B
`frontend/dist/ProductTour`	275 kB	0 B
`frontend/dist/ProductTours`	4.92 kB	0 B
`frontend/dist/ProjectHomepage`	19.9 kB	0 B
`frontend/dist/protobuf`	9.09 kB	0 B
`frontend/dist/pug`	4.86 kB	0 B
`frontend/dist/python`	4.8 kB	0 B
`frontend/dist/qsharp`	3.23 kB	0 B
`frontend/dist/quadrantDiagram-W4KKPZXB`	34.4 kB	0 B
`frontend/dist/QueryPerformance`	8.86 kB	0 B
`frontend/dist/r`	3.16 kB	0 B
`frontend/dist/radar-GUYGQ44K`	351 B	0 B
`frontend/dist/razor`	9.38 kB	0 B
`frontend/dist/react-json-view`	121 kB	0 B
`frontend/dist/recorder-v2.js`	98.6 kB	0 B
`frontend/dist/recorder.js`	98.6 kB	0 B
`frontend/dist/redis`	3.59 kB	0 B
`frontend/dist/redshift`	11.8 kB	0 B
`frontend/dist/RegionMap`	29.6 kB	0 B
`frontend/dist/render-query`	26.7 MB	0 B
`frontend/dist/render-query.js`	26.7 MB	0 B
`frontend/dist/ReplayLens`	21.8 kB	0 B
`frontend/dist/ReplayLensesScene`	12.2 kB	+36 B (+0.3%)
`frontend/dist/requirementDiagram-4Y6WPE33`	31.9 kB	0 B
`frontend/dist/ResourceTransfer`	9.42 kB	0 B
`frontend/dist/restructuredtext`	3.94 kB	0 B
`frontend/dist/RevenueAnalyticsScene`	25.8 kB	0 B
`frontend/dist/ruby`	8.54 kB	0 B
`frontend/dist/rust`	4.2 kB	0 B
`frontend/dist/sankeyDiagram-5OEKKPKP`	24 kB	0 B
`frontend/dist/SavedInsights`	902 B	0 B
`frontend/dist/sb`	1.86 kB	0 B
`frontend/dist/scala`	7.36 kB	0 B
`frontend/dist/schema`	720 kB	0 B
`frontend/dist/SchemaScene`	21.2 kB	-36 B (-0.17%)
`frontend/dist/scheme`	1.8 kB	0 B
`frontend/dist/scss`	6.45 kB	0 B
`frontend/dist/SdkDoctorScene`	9.65 kB	0 B
`frontend/dist/sequenceDiagram-3UESZ5HK`	117 kB	0 B
`frontend/dist/SessionAttributionExplorerScene`	6.87 kB	0 B
`frontend/dist/SessionGroupSummariesTable`	4.84 kB	0 B
`frontend/dist/SessionGroupSummaryScene`	17.2 kB	+36 B (+0.21%)
`frontend/dist/SessionProfileScene`	15.3 kB	0 B
`frontend/dist/SessionRecordingDetail`	2 kB	0 B
`frontend/dist/SessionRecordingFilePlaybackScene`	4.71 kB	0 B
`frontend/dist/SessionRecordings`	980 B	0 B
`frontend/dist/SessionRecordingsKiosk`	10.2 kB	0 B
`frontend/dist/SessionRecordingsPlaylistScene`	5.28 kB	0 B
`frontend/dist/SessionRecordingsSettingsScene`	2.14 kB	0 B
`frontend/dist/SessionsScene`	4.22 kB	0 B
`frontend/dist/SettingsScene`	3.33 kB	0 B
`frontend/dist/sharedChunkAnchors`	236 kB	0 B
`frontend/dist/sharedChunkAnchors.js`	236 kB	0 B
`frontend/dist/SharedMetric`	6.03 kB	0 B
`frontend/dist/SharedMetrics`	787 B	0 B
`frontend/dist/shell`	3.11 kB	0 B
`frontend/dist/SignupContainer`	28.4 kB	0 B
`frontend/dist/Site`	1.43 kB	0 B
`frontend/dist/solidity`	18.6 kB	0 B
`frontend/dist/sophia`	2.8 kB	0 B
`frontend/dist/SourceScene`	962 B	0 B
`frontend/dist/SourcesScene`	6.17 kB	0 B
`frontend/dist/sparql`	2.59 kB	0 B
`frontend/dist/sql`	10.3 kB	0 B
`frontend/dist/SqlVariableEditScene`	7.49 kB	0 B
`frontend/dist/st`	7.44 kB	0 B
`frontend/dist/StartupProgram`	21.4 kB	0 B
`frontend/dist/stateDiagram-AJRCARHV`	11.3 kB	0 B
`frontend/dist/stateDiagram-v2-BHNVJYJU`	1.18 kB	0 B
`frontend/dist/StripeConfirmInstall`	3.78 kB	0 B
`frontend/dist/SubscriptionScene`	14.4 kB	0 B
`frontend/dist/SubscriptionsScene`	5.42 kB	0 B
`frontend/dist/SupportSettingsScene`	1.68 kB	+35 B (+2.13%)
`frontend/dist/SupportTicketsScene`	971 B	+34 B (+3.63%)
`frontend/dist/Survey`	1.12 kB	0 B
`frontend/dist/SurveyFormBuilder`	1.78 kB	0 B
`frontend/dist/Surveys`	26.6 kB	0 B
`frontend/dist/surveys.js`	94.7 kB	0 B
`frontend/dist/SurveyWizard`	71.5 kB	0 B
`frontend/dist/swift`	5.3 kB	0 B
`frontend/dist/SystemStatus`	17.3 kB	0 B
`frontend/dist/systemverilog`	7.65 kB	0 B
`frontend/dist/TaskDetailScene`	23.3 kB	0 B
`frontend/dist/TaskTracker`	14.5 kB	0 B
`frontend/dist/tcl`	3.61 kB	0 B
`frontend/dist/TextCardMarkdownEditor`	11.2 kB	+28 B (+0.25%)
`frontend/dist/timeline-definition-PNZ67QCA`	31.3 kB	0 B
`frontend/dist/toolbar`	15 MB	0 B
`frontend/dist/toolbar.js`	15 MB	0 B
`frontend/dist/ToolbarLaunch`	2.71 kB	0 B
`frontend/dist/tracing-headers.js`	1.74 kB	0 B
`frontend/dist/TracingScene`	53.9 kB	-39 B (-0.07%)
`frontend/dist/TransformationsScene`	6.4 kB	0 B
`frontend/dist/treemap-LRROVOQU`	357 B	0 B
`frontend/dist/treeView-BLDUP644`	360 B	0 B
`frontend/dist/TrendsBarChart`	6.84 kB	0 B
`frontend/dist/TrendsLineChart`	6.84 kB	0 B
`frontend/dist/tsMode`	24 kB	0 B
`frontend/dist/twig`	6.01 kB	0 B
`frontend/dist/TwoFactorReset`	4.27 kB	0 B
`frontend/dist/typescript`	274 B	0 B
`frontend/dist/typespec`	2.86 kB	0 B
`frontend/dist/Unsubscribe`	1.9 kB	0 B
`frontend/dist/UserInterview`	6.25 kB	0 B
`frontend/dist/UserInterviewResponse`	5.54 kB	0 B
`frontend/dist/UserInterviews`	3.77 kB	0 B
`frontend/dist/vb`	5.83 kB	0 B
`frontend/dist/vennDiagram-CIIHVFJN`	41.6 kB	0 B
`frontend/dist/VercelConnect`	5.23 kB	0 B
`frontend/dist/VercelLinkError`	2.5 kB	0 B
`frontend/dist/VerifyEmail`	5.02 kB	0 B
`frontend/dist/vimMode`	211 kB	0 B
`frontend/dist/VisualReviewIndexScene`	2.41 kB	0 B
`frontend/dist/VisualReviewRunScene`	44.3 kB	0 B
`frontend/dist/VisualReviewRunsScene`	7.21 kB	0 B
`frontend/dist/VisualReviewSettingsScene`	11 kB	-35 B (-0.32%)
`frontend/dist/VisualReviewSnapshotHistoryScene`	12.4 kB	+37 B (+0.3%)
`frontend/dist/VisualReviewSnapshotOverviewScene`	17.5 kB	-36 B (-0.21%)
`frontend/dist/wardley-L42UT6IY`	352 B	0 B
`frontend/dist/wardleyDiagram-YWT4CUSO`	26.2 kB	0 B
`frontend/dist/web-vitals-with-attribution.js`	11.8 kB	0 B
`frontend/dist/web-vitals.js`	6.39 kB	0 B
`frontend/dist/WebAnalyticsScene`	9.59 kB	0 B
`frontend/dist/WebGLRenderer-DYjOwNoG`	60.4 kB	0 B
`frontend/dist/WebGPURenderer-B_wkl_Ja`	36.3 kB	0 B
`frontend/dist/WebScriptsScene`	2.78 kB	0 B
`frontend/dist/WebVitals`	7.41 kB	0 B
`frontend/dist/WebVitalsPathBreakdown`	3.89 kB	0 B
`frontend/dist/webworkerAll-puPV1rBA`	397 B	0 B
`frontend/dist/wgsl`	7.38 kB	0 B
`frontend/dist/Wizard`	4.7 kB	0 B
`frontend/dist/WorkflowScene`	110 kB	0 B
`frontend/dist/WorkflowsScene`	60.1 kB	0 B
`frontend/dist/WorldMap`	1.04 MB	0 B
`frontend/dist/xml`	3.02 kB	0 B
`frontend/dist/xychartDiagram-2RQKCTM6`	39.6 kB	0 B
`frontend/dist/yaml`	4.64 kB	0 B

compressed-size-action

stamphog

Large PR (3176 lines, 16 files) with zero reviews that exceeded the size gate ceiling. The changes span new eval test infrastructure, extensive skill documentation, and auto-generated API type updates across two products — needs a human reviewer to validate correctness and scope.

greptile-apps · 2026-05-19T15:26:29Z

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
ee/hogai/eval/sandboxed/experiments/eval_midrun_changes.py:85-135
**Behavioral check bundled into `diagnosis_group` — violates the anti-bundling pattern established in `eval_interpretation_traps.py`**

The `ship_variant_under_uncertainty` case explicitly tests "should I ship?" and the case description states the correct answer is "not yet / not on this evidence." However, the behavioral check ("advises AGAINST") is embedded inside the `diagnosis_group` description text rather than enforced with a separate `AdvisesAgainstShipping` scorer + `"advises_against_shipping": True` in `expected`.

`eval_interpretation_traps.py` contains an explicit comment explaining exactly why this bundling is a regression vector:
> "Both are needed: the previous version of this case relied on `diagnosis_group` alone, where the LLM judge's 'must recommend waiting' criterion was bundled into the diagnosis description — splitting them separates content from behavior and makes regressions in either dimension visible independently."

The same failure mode applies here: the agent can identify the diagnostic correctly but still recommend shipping, and that behavioral regression would not be caught by `CitesDiagnosticGroup` alone. `AdvisesAgainstShipping` should be added to the scorers list, and `"advises_against_shipping": True` should be added to this case's `expected` dict — mirroring the pattern in `eval_interpretation_traps.py`.
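The split Greptile suggests can be sketched in plain Python. One scorer checks content (`diagnosis_group`) and another checks behavior (`advises_against_shipping`). Each returns `None` when a case's `expected` dict does not request it, so a regression in either dimension shows up on its own. The functions below are hypothetical stand-ins for `CitesDiagnosticGroup` and `AdvisesAgainstShipping`. The keyword matching is a deliberately crude placeholder for an LLM judge, and none of this reflects PostHog's actual eval API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EvalCase:
    """Hypothetical eval case: a name plus the `expected` dict scorers read."""
    name: str
    expected: dict


def cites_diagnostic_group(answer: str, expected: dict) -> float | None:
    """Content check: does the answer name the expected diagnostic group?"""
    group = expected.get("diagnosis_group")
    if group is None:
        return None  # case does not request this check
    return 1.0 if group.lower() in answer.lower() else 0.0


# Crude placeholder phrases; a real scorer would more likely use an LLM judge.
AGAINST_SHIPPING_PHRASES = ("not yet", "do not ship", "don't ship", "hold off")


def advises_against_shipping(answer: str, expected: dict) -> float | None:
    """Behavior check: does the answer advise against shipping right now?"""
    if not expected.get("advises_against_shipping"):
        return None  # case does not request this check
    text = answer.lower()
    return 1.0 if any(p in text for p in AGAINST_SHIPPING_PHRASES) else 0.0


SCORERS = {
    "cites_diagnostic_group": cites_diagnostic_group,
    "advises_against_shipping": advises_against_shipping,
}


def score_case(answer: str, case: EvalCase) -> dict[str, float]:
    """Run every scorer and keep only the ones that apply to this case."""
    results: dict[str, float] = {}
    for name, scorer in SCORERS.items():
        score = scorer(answer, case.expected)
        if score is not None:
            results[name] = score
    return results


case = EvalCase(
    name="ship_variant_under_uncertainty",
    expected={"diagnosis_group": "mid-run change", "advises_against_shipping": True},
)
```

With both scorers attached, an answer that names the right diagnostic but still recommends shipping scores 1.0 on content and 0.0 on behavior, rather than passing a single bundled check.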

Reviews (1): Last reviewed commit: "Fix CI"

jurajmajerik

Great job! 🙌


deployment-status-posthog · 2026-05-20T15:33:21Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-05-20 15:33 UTC	Run
prod-us	✅ Deployed	2026-05-20 15:48 UTC	Run
prod-eu	✅ Deployed	2026-05-20 15:55 UTC	Run

Write diagnosing-experiment-results skill and Evals

d2129f5

mp-hog added the skip-inkeep-docs label May 19, 2026

chore: update OpenAPI generated types

27780da

Fix CI

6eb222f

mp-hog marked this pull request as ready for review May 19, 2026 15:23

mp-hog added the stamphog label May 19, 2026

assign-reviewers-posthog Bot requested review from a team, TueHaulund, arnohillen, fasyy612, ksvat and nicowaltz and removed request for a team May 19, 2026 15:23

stamphog Bot reviewed May 19, 2026


stamphog Bot removed the stamphog label May 19, 2026

greptile-apps Bot reviewed May 19, 2026


Comment thread ee/hogai/eval/sandboxed/experiments/eval_midrun_changes.py

Fix CI AI feedback

ae85126

jurajmajerik approved these changes May 20, 2026


Merge branch 'master' into experiments-mcp/diagnosing-experiment-results-skill

151b395

mp-hog merged commit 36f166e into master May 20, 2026
217 of 219 checks passed

mp-hog deleted the experiments-mcp/diagnosing-experiment-results-skill branch May 20, 2026 15:00

mp-hog merged 5 commits into master from experiments-mcp/diagnosing-experiment-results-skill