feat(hogql): tag user-supplied hogql in queries + prometheus label by sampennington · Pull Request #60214 · PostHog/posthog

sampennington · 2026-05-27T10:22:51Z

Problem

When ClickHouse returns an error, we have no easy way to separate failures we caused (a platform bug in how we built the SQL) from failures the user caused (a syntax error or bad column reference in HogQL they wrote). The query_type tag tells us a query came from HogQLQueryRunner, but it doesn't surface user HogQL embedded inside otherwise platform-generated queries — math_hogql on a trend, an HogQLPropertyFilter on a funnel, a pathsHogQLExpression, etc.

Same distinction is missing from real-time observability: the posthog_query_execution_total Prometheus counter labels failures as category="user_error" but lumps user-HogQL failures together with structured-input builder bugs.

Changes

Adds a contains_user_hogql: bool field to QueryTags (serialized into system.query_log.log_comment) and a small helper tag_contains_user_hogql() that flips it. The helper is called at every canonical site where a user-controlled HogQL string is handed to parse_expr/parse_order_expr/parse_select:

Schema field	Site
`HogQLQuery.query` (SQL editor)	`posthog/hogql_queries/hogql_query_runner.py`
`HogQLPropertyFilter.key`	`posthog/hogql/property.py`
`EventsNode.math_hogql`	`posthog/hogql_queries/insights/trends/aggregation_operations.py`
`BreakdownFilter.breakdown` (when `breakdown_type='hogql'`)	`posthog/hogql_queries/insights/trends/breakdown.py`
`EventsQuery.select / where / orderBy`	`posthog/hogql_queries/events_query_runner.py`
`FunnelsFilter.funnelAggregateByHogQL`	`posthog/hogql_queries/insights/funnels/funnel_event_query.py`, paths runner
`PathsFilter.pathsHogQLExpression`	`products/product_analytics/backend/hogql_queries/paths/paths_query_runner.py`
`LifecycleDataWarehouseNode.{timestamp,aggregation_target,created_at}_field`	`posthog/hogql_queries/insights/lifecycle/lifecycle_query_runner.py`
`ExperimentDataWarehouseNode.{data_warehouse_join_key, math_property}` and experiment `math_hogql`	`posthog/hogql_queries/experiments/metric_source.py`, `base_query_utils.py`

The flag is optional and excluded from JSON when unset, so platform-only queries are unaffected. Once landed, log triage queries like

SELECT
    countIf(JSONExtractBool(log_comment, 'contains_user_hogql')) AS user,
    count() - countIf(JSONExtractBool(log_comment, 'contains_user_hogql')) AS platform
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'

become possible.

Also adds a has_user_authored_hogql label to the posthog_query_execution_total Prometheus counter, sourced from the same QueryTags.contains_user_hogql flag. This lets observability target builder bugs precisely in real time — category="error", has_user_authored_hogql="false" is a high-confidence builder-bug signal — without a separate detector to maintain. (Folds in the work from the now-closed #58715.)

How did you test this code?

I'm an agent. I did not run tests in this environment (no Python venv with project deps was available). I did:

syntax-check every edited file with python3 -c "import ast; ast.parse(open(...).read())"
add unit tests for the new field and helper (pure-Python, no ClickHouse needed)
add two integration tests in TestQueryTaggingSourceInQueryLog confirming the flag lands in system.query_log for a HogQL query and is absent for a platform-built sync_execute
add a parameterized integration test in test_query_runner.py confirming the Prometheus label reflects the QueryTags flag for both success and error paths

Please run hogli test posthog/clickhouse/test/test_query_tagging.py posthog/hogql_queries/test/test_query_runner.py before merge.

Publish to changelog?

no

🤖 Agent context

Authored by Claude (Opus 4.7) via the Claude Code agent SDK in response to a request to "tag queries that contain custom hogql so we can tell our errors from users'."
Approach: explored the query-tagging architecture and enumerated every user-HogQL parse site rather than trying to auto-detect inside parse_expr. Considered using cache_origin=CacheOrigin.USER as the trigger (since the parse cache already classifies user vs builtin), but only two sites currently pass CacheOrigin.USER; making it implicit would either miss sites or require touching all of them anyway. Explicit tag_contains_user_hogql() calls are grep-able and conservative.
Why a single boolean: starting simple. If we later want user_hogql_kinds: list[str] to attribute by which fragment failed (filter vs breakdown vs math), the helper is the one place to extend.
Skipped instrumenting breakdown_type == "event_metadata": that branch parses the property name string, but the name comes from a constrained taxonomy picker, not free-form HogQL.
The Prometheus label was originally a separate PR (feat(insights): label query metric with has_user_authored_hogql #58715) that walked query.model_dump() schema-side. Folded in here because the Prometheus and system.query_log surfaces need to stay in lockstep and the QueryTags flag is a strictly better detection source (covers more sites: events query select/where/orderBy, paths expression, lifecycle/experiment warehouse fields — all missed by the schema walker).

Adds a `contains_user_hogql` field to `QueryTags` that's serialized into ClickHouse `system.query_log.log_comment`. It lights up whenever a query includes a HogQL string that originated from the end user — full SQL editor queries, HogQL property filters, custom `math_hogql`/`breakdown` expressions, `EventsQuery` select/where/orderBy strings, `funnelAggregateByHogQL`, `pathsHogQLExpression`, and the field-expression strings on `DataWarehouseNode` / `ExperimentDataWarehouseNode`. Triage motivation: when ClickHouse returns an error, this flag separates queries we built end-to-end (likely platform bugs) from queries that embed user-authored HogQL (likely user input issues). The tag is set at the canonical parse sites where user strings hit `parse_expr`/`parse_select`. Generated-By: PostHog Code Task-Id: 0d95bc6d-f72d-4511-ad52-408b5fcea5a5

github-actions · 2026-05-27T10:23:43Z

🎭 Playwright didn't run on this PR — your changes touch code that could affect E2E behavior, but Playwright is opt-in via label now to keep CI cost down.

Add the run-playwright label if you want an E2E sweep before merging — CI will pick it up automatically.

Most PRs don't need this. Real regressions still get caught on master and fix-forward.

greptile-apps · 2026-05-27T10:27:49Z

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
posthog/hogql_queries/events_query_runner.py:208-211
**Unconditional tag causes false positives in internal callers**

`tag_contains_user_hogql()` fires every time `to_query()` is called, but `EventsQueryRunner` is also invoked by platform code that constructs `EventsQuery` programmatically. For example, `hogql_cohort_query.py` (line 200–233) builds `EventsQuery(select=["person_id", "count()"], where=[f"count() >= {count}"])` entirely in platform code and calls `events_query_runner.to_query()` directly. The resulting ClickHouse query will carry `contains_user_hogql: true` even though no user-supplied HogQL is involved, so any platform error from that path will be mis-attributed as a user error — exactly the opposite of what the feature is meant to prevent.

Moving the call to `_calculate()` instead of `to_query()` would fix the false positive for callers that use `to_query()` as a sub-query helper, since those callers never invoke `_calculate()`.

_{Reviews (1): Last reviewed commit: "feat(hogql): tag clickhouse queries that..." | Re-trigger Greptile}

Two follow-ups on top of the initial commit: 1. Ruff import-sort failures (CI: Python code quality). Sorted the new `tag_contains_user_hogql` imports into the project's own `posthog.clickhouse.*` block in `posthog/hogql/property.py`, `posthog/hogql_queries/experiments/base_query_utils.py`, `posthog/hogql_queries/experiments/metric_source.py`, `posthog/hogql_queries/insights/trends/breakdown.py`, and the inline import inside the new test in `posthog/clickhouse/test/test_query_tagging.py`. 2. Greptile flagged that tagging in `EventsQueryRunner.to_query()` causes false positives for platform callers that use it as a sub-query helper (e.g. `hogql_cohort_query.py` builds `EventsQuery(select=["person_id", "count()"], where=[f"count() >= {count}"])` from platform constants and calls `to_query()` directly, never `_calculate()`). Moved the `tag_contains_user_hogql()` call to `_calculate()` so it only fires when the runner is the top-level execution path — which is exactly where `EventsQuery.select / where / orderBy` came from the end user. Generated-By: PostHog Code Task-Id: 0d95bc6d-f72d-4511-ad52-408b5fcea5a5

sampennington

🤖 Automated review

Verdict: Needs discussion

No correctness or security issues — this is observability-only and the helper is well-shaped. Main concerns:

Coverage symmetry. A grep over parse_expr(user_field) surfaces ~3-4 sites in other files that look identical to the ones tagged here (funnels-over-DW, customer-analytics usage metrics, ActorsQuery.select/orderBy, possibly SessionsQuery). Either tag them too or document why they're out of scope.
Performance. tag_contains_user_hogql() calls tag_queries(...) which does a model_copy(deep=True) of a ~100-field QueryTags. Several call sites (@property methods in lifecycle_query_runner.py, the breakdown loops, recursive property_to_expr) re-fire on every access. One-line idempotency short-circuit in the helper fixes everything at once.
Comment/docstring duplication. The field-level comment and the helper docstring enumerate the same caller list in two places; both will rot.

Good call already on moving the EventsQuery tag into _calculate() instead of to_query() — the comment explaining the hogql_cohort_query.py sub-query pitfall is exactly the kind of "why" worth keeping.

Non-anchored findings

Missed parse sites (Should fix or document):

posthog/hogql_queries/insights/funnels/funnel_event_query.py:~252 — parse_expr(table_entity.aggregation_target_field) is the same DW-field pattern that's tagged in lifecycle_query_runner.py (timestamp_field/target_field/created_at_field). Asymmetric with the rest of the PR.
products/customer_analytics/backend/hogql_queries/usage_metrics_query_runner.py:~218 — parse_expr(timestamp_field) from GroupUsageMetric.Source.DATA_WAREHOUSE. Same shape.
posthog/hogql_queries/actors_query_runner.py:~349 / :~404 — parse_expr(self.query.select) and parse_order_expr(self.query.orderBy). ActorsQuery is the persons-explorer sibling of EventsQuery (which is tagged); leaving it out is inconsistent.
posthog/hogql_queries/sessions_query_runner.py:~137/~301/~527 — same select/where/orderBy pattern for SessionsQuery. If that query node is user-facing, should be tagged; if not, worth a one-line comment ruling it out.

Architectural alternative (Suggestion): posthog/hogql/parser.py already has a CacheOrigin.USER classification that property.py even passes explicitly. Setting contains_user_hogql=True inside _parse_cached whenever cache_origin resolves to USER would replace all 12 scattered call sites with one source of truth, and would automatically catch the missed sites above (and any future ones). Sibling tags like has_joins / has_json_operations already follow this "derive centrally" pattern in HogQLQuery._execute_clickhouse_query. Heavier lift than this PR — fine as a follow-up.

Reviewers: code-reviewer, personal, reuse, quality, efficiency. Inline comments above are tagged with the reviewer that raised them and the level of concern.

…rcuit Addresses the PR-author review (coverage symmetry, performance, comment duplication) — no logic changes to the helper or how the tag is consumed, just wider, cheaper, and de-duplicated. Coverage symmetry — tag the remaining user-HogQL parse sites that mirror ones already tagged: - `posthog/hogql_queries/insights/funnels/funnel_event_query.py` — `parse_expr(table_entity.aggregation_target_field)` for funnels over a DataWarehouseNode (same shape as the lifecycle DW field tags). - `products/customer_analytics/backend/hogql_queries/usage_metrics_query_runner.py` — `timestamp_field`, `math_property`, and `key_field` parses from `GroupUsageMetric` DW sources. - `posthog/hogql_queries/actors_query_runner.py` — `ActorsQuery.select` / `orderBy`. Tag in `_calculate()` (same reason as `EventsQueryRunner`: `hogql_cohort_query._actors_query_from_source` calls `to_query()` with a platform-constant `select=["id"]` and must not false-positive). - `posthog/hogql_queries/sessions_query_runner.py` — `SessionsQuery.select` / `where` / `orderBy`. Performance — `tag_contains_user_hogql()` now short-circuits when the flag is already set, avoiding the `model_copy(deep=True)` inside `tag_queries` on every subsequent call. Matters for hot paths: recursive `property_to_expr`, breakdown loops, `@property` accessors in `lifecycle_query_runner.py`. New test verifies the short-circuit returns the same `QueryTags` object. Comment dedupe — the caller list lived in two places (the field-level comment on `contains_user_hogql` and the helper docstring) and would rot on every new site. Kept the canonical list in `tag_contains_user_hogql` and shortened the field comment to point at the helper. Generated-By: PostHog Code Task-Id: 0d95bc6d-f72d-4511-ad52-408b5fcea5a5

- Shorten contains_user_hogql field comment and helper docstring to one line — the enumeration drifted vs the actual call sites. - breakdown.py: flatten the duplicated `if breakdown_type == "hogql"` branch so the tag call isn't visually buried. - metric_source.py: add a one-liner naming the user-HogQL field on ExperimentDataWarehouseNode, since the tag isn't beside its parse_expr call. - test_query_tagging.py: drop narrating comments above the two contains_user_hogql tests — the test names say it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sampennington · 2026-05-27T15:46:08Z

Quick follow-up on the coverage symmetry concern from the "Needs discussion" verdict.

The four sites flagged as look-alike gaps are all already tagged on HEAD — looks like the verdict was based on the initial commit (bdb4a30d633) before the coverage extension commit (507da8f9b58) landed:

Flagged	Tagged at
`ActorsQuery.select`/`orderBy`	`actors_query_runner.py:258` (in `_calculate()`, same shape as EventsQuery)
`SessionsQuery`	`sessions_query_runner.py:614`
funnels-over-DW (`FunnelsDataWarehouseNode.{timestamp,aggregation_target}_field`)	`funnel_event_query.py:248` and `:563`
customer-analytics usage metrics	`usage_metrics_query_runner.py`

I also swept the other plausible call sites — cdp/filters.py, session_recordings/queries/, experiments/hogql_aggregation_utils.py, experiments/exposure_query_logic.py, experiments/experiment_query_builder.py — and every parse_expr/parse_select in those files is either a template string with placeholders (platform-built, doesn't need tagging) or routes through an already-tagged entry point (e.g. from_source in metric_source.py).

So the coverage-symmetry concern is resolved on HEAD. The other two concerns from the verdict (perf early-return + comment/docstring duplication) are also addressed: early-return at query_tagging.py:551-552, and the field comment + helper docstring are now both one-liners after 3fcd1b0759e.

A failed query carries a ClickHouse error code that cannot say whether the user (a bad HogQL expression they wrote) or our query builder (structured input, but invalid SQL generated) caused it. The existing `category` label lumps both together. This labels the `posthog_query_execution_total` counter with `has_user_authored_hogql` so observability can target builder bugs precisely: `category="error", has_user_authored_hogql="false"` is a high-confidence builder-bug signal. The label reads `QueryTags.contains_user_hogql` — the canonical flag set by `tag_contains_user_hogql()` at every HogQL parse site (see `posthog.clickhouse.query_tagging`). Reusing that flag keeps the Prometheus and `system.query_log` surfaces in lockstep and inherits the broader site coverage (events query select/where/orderBy, paths expression, lifecycle warehouse fields, experiment warehouse fields) without a separate detector to maintain. Stacked on the user-HogQL tagging work in PR #60214. Generated-By: PostHog Code Task-Id: a72d30b8-cbf7-4b97-834f-212596a51d6e

…ser-hogql

Rename the Prometheus label on `posthog_query_execution_total` from `has_user_authored_hogql` to `contains_user_hogql` so it matches the `QueryTags.contains_user_hogql` field that already lands in `system.query_log.log_comment`. One name everywhere makes log-vs-metric triage queries copy-paste cleanly. Generated-By: PostHog Code Task-Id: 3664bd5c-c410-40e4-966f-a051ad518227

…nflict Drops the prom-label additions from this file so the GitHub merges API can fast-merge master in cleanly. Re-applied after the merge. Generated-By: PostHog Code Task-Id: 3664bd5c-c410-40e4-966f-a051ad518227

…ser-hogql

Restores the `contains_user_hogql` Prometheus label and `_contains_user_hogql_label()` helper that were temporarily removed to let the GitHub merges API auto-merge master in. Generated-By: PostHog Code Task-Id: 3664bd5c-c410-40e4-966f-a051ad518227

deployment-status-posthog · 2026-05-28T16:35:02Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-05-28 16:34 UTC	Run
prod-us	✅ Deployed	2026-05-28 17:00 UTC	Run
prod-eu	✅ Deployed	2026-05-28 17:02 UTC	Run

sampennington requested a review from a team as a code owner May 27, 2026 10:22

assign-reviewers-posthog Bot requested a review from a team May 27, 2026 10:23

greptile-apps Bot reviewed May 27, 2026

View reviewed changes

Comment thread posthog/hogql_queries/events_query_runner.py Outdated

sampennington commented May 27, 2026

View reviewed changes

sampennington and others added 2 commits May 27, 2026 12:02

sampennington mentioned this pull request May 28, 2026

feat(insights): label query metric with has_user_authored_hogql #58715

Closed

sampennington changed the title ~~feat(hogql): tag clickhouse queries that contain user-supplied hogql~~ feat(hogql): tag user-supplied hogql in queries + prometheus label May 28, 2026

Merge branch 'master' into posthog-code/tag-clickhouse-queries-with-u…

1bd4b36

…ser-hogql

gesh approved these changes May 28, 2026

View reviewed changes

assign-reviewers-posthog Bot requested a review from a team May 28, 2026 14:03

sampennington force-pushed the posthog-code/tag-clickhouse-queries-with-user-hogql branch from c3ebc65 to 3f5b962 Compare May 28, 2026 14:21

sampennington added 3 commits May 28, 2026 15:21

Merge branch 'master' into posthog-code/tag-clickhouse-queries-with-u…

1ccd09d

…ser-hogql

sampennington removed the request for review from a team May 28, 2026 14:30

georgemunyoro approved these changes May 28, 2026

View reviewed changes

sampennington merged commit 24ec393 into master May 28, 2026
227 of 229 checks passed

sampennington deleted the posthog-code/tag-clickhouse-queries-with-user-hogql branch May 28, 2026 16:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(hogql): tag user-supplied hogql in queries + prometheus label#60214

feat(hogql): tag user-supplied hogql in queries + prometheus label#60214
sampennington merged 10 commits into
masterfrom
posthog-code/tag-clickhouse-queries-with-user-hogql

sampennington commented May 27, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 27, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 27, 2026

Uh oh!

Uh oh!

sampennington left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sampennington commented May 27, 2026

Uh oh!

Uh oh!

deployment-status-posthog Bot commented May 28, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

sampennington commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

How did you test this code?

Publish to changelog?

🤖 Agent context

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps Bot commented May 27, 2026

Uh oh!

Uh oh!

sampennington left a comment

Choose a reason for hiding this comment

🤖 Automated review

Non-anchored findings

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sampennington commented May 27, 2026

Uh oh!

Uh oh!

deployment-status-posthog Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploy status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sampennington commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading

deployment-status-posthog Bot commented May 28, 2026 •

edited

Loading