Skip to content

feat(hogql): tag user-supplied hogql in queries + prometheus label#60214

Merged
sampennington merged 10 commits into
masterfrom
posthog-code/tag-clickhouse-queries-with-user-hogql
May 28, 2026
Merged

feat(hogql): tag user-supplied hogql in queries + prometheus label#60214
sampennington merged 10 commits into
masterfrom
posthog-code/tag-clickhouse-queries-with-user-hogql

Conversation

@sampennington
Copy link
Copy Markdown
Contributor

@sampennington sampennington commented May 27, 2026

Problem

When ClickHouse returns an error, we have no easy way to separate failures we caused (a platform bug in how we built the SQL) from failures the user caused (a syntax error or bad column reference in HogQL they wrote). The query_type tag tells us a query came from HogQLQueryRunner, but it doesn't surface user HogQL embedded inside otherwise platform-generated queries — math_hogql on a trend, an HogQLPropertyFilter on a funnel, a pathsHogQLExpression, etc.

Same distinction is missing from real-time observability: the posthog_query_execution_total Prometheus counter labels failures as category="user_error" but lumps user-HogQL failures together with structured-input builder bugs.

Changes

Adds a contains_user_hogql: bool field to QueryTags (serialized into system.query_log.log_comment) and a small helper tag_contains_user_hogql() that flips it. The helper is called at every canonical site where a user-controlled HogQL string is handed to parse_expr/parse_order_expr/parse_select:

Schema field Site
HogQLQuery.query (SQL editor) posthog/hogql_queries/hogql_query_runner.py
HogQLPropertyFilter.key posthog/hogql/property.py
EventsNode.math_hogql posthog/hogql_queries/insights/trends/aggregation_operations.py
BreakdownFilter.breakdown (when breakdown_type='hogql') posthog/hogql_queries/insights/trends/breakdown.py
EventsQuery.select / where / orderBy posthog/hogql_queries/events_query_runner.py
FunnelsFilter.funnelAggregateByHogQL posthog/hogql_queries/insights/funnels/funnel_event_query.py, paths runner
PathsFilter.pathsHogQLExpression products/product_analytics/backend/hogql_queries/paths/paths_query_runner.py
LifecycleDataWarehouseNode.{timestamp,aggregation_target,created_at}_field posthog/hogql_queries/insights/lifecycle/lifecycle_query_runner.py
ExperimentDataWarehouseNode.{data_warehouse_join_key, math_property} and experiment math_hogql posthog/hogql_queries/experiments/metric_source.py, base_query_utils.py

The flag is optional and excluded from JSON when unset, so platform-only queries are unaffected. Once landed, log triage queries like

SELECT
    countIf(JSONExtractBool(log_comment, 'contains_user_hogql')) AS user,
    count() - countIf(JSONExtractBool(log_comment, 'contains_user_hogql')) AS platform
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'

become possible.

Also adds a has_user_authored_hogql label to the posthog_query_execution_total Prometheus counter, sourced from the same QueryTags.contains_user_hogql flag. This lets observability target builder bugs precisely in real time — category="error", has_user_authored_hogql="false" is a high-confidence builder-bug signal — without a separate detector to maintain. (Folds in the work from the now-closed #58715.)

How did you test this code?

I'm an agent. I did not run tests in this environment (no Python venv with project deps was available). I did:

  • syntax-check every edited file with python3 -c "import ast; ast.parse(open(...).read())"
  • add unit tests for the new field and helper (pure-Python, no ClickHouse needed)
  • add two integration tests in TestQueryTaggingSourceInQueryLog confirming the flag lands in system.query_log for a HogQL query and is absent for a platform-built sync_execute
  • add a parameterized integration test in test_query_runner.py confirming the Prometheus label reflects the QueryTags flag for both success and error paths

Please run hogli test posthog/clickhouse/test/test_query_tagging.py posthog/hogql_queries/test/test_query_runner.py before merge.

Publish to changelog?

no

🤖 Agent context

  • Authored by Claude (Opus 4.7) via the Claude Code agent SDK in response to a request to "tag queries that contain custom hogql so we can tell our errors from users'."
  • Approach: explored the query-tagging architecture and enumerated every user-HogQL parse site rather than trying to auto-detect inside parse_expr. Considered using cache_origin=CacheOrigin.USER as the trigger (since the parse cache already classifies user vs builtin), but only two sites currently pass CacheOrigin.USER; making it implicit would either miss sites or require touching all of them anyway. Explicit tag_contains_user_hogql() calls are grep-able and conservative.
  • Why a single boolean: starting simple. If we later want user_hogql_kinds: list[str] to attribute by which fragment failed (filter vs breakdown vs math), the helper is the one place to extend.
  • Skipped instrumenting breakdown_type == "event_metadata": that branch parses the property name string, but the name comes from a constrained taxonomy picker, not free-form HogQL.
  • The Prometheus label was originally a separate PR (feat(insights): label query metric with has_user_authored_hogql #58715) that walked query.model_dump() schema-side. Folded in here because the Prometheus and system.query_log surfaces need to stay in lockstep and the QueryTags flag is a strictly better detection source (covers more sites: events query select/where/orderBy, paths expression, lifecycle/experiment warehouse fields — all missed by the schema walker).

Adds a `contains_user_hogql` field to `QueryTags` that's serialized into
ClickHouse `system.query_log.log_comment`. It lights up whenever a query
includes a HogQL string that originated from the end user — full SQL editor
queries, HogQL property filters, custom `math_hogql`/`breakdown` expressions,
`EventsQuery` select/where/orderBy strings, `funnelAggregateByHogQL`,
`pathsHogQLExpression`, and the field-expression strings on
`DataWarehouseNode` / `ExperimentDataWarehouseNode`.

Triage motivation: when ClickHouse returns an error, this flag separates
queries we built end-to-end (likely platform bugs) from queries that embed
user-authored HogQL (likely user input issues). The tag is set at the
canonical parse sites where user strings hit `parse_expr`/`parse_select`.

Generated-By: PostHog Code
Task-Id: 0d95bc6d-f72d-4511-ad52-408b5fcea5a5
@sampennington sampennington requested a review from a team as a code owner May 27, 2026 10:22
@assign-reviewers-posthog assign-reviewers-posthog Bot requested a review from a team May 27, 2026 10:23
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

🎭 Playwright didn't run on this PR — your changes touch code that could affect E2E behavior, but Playwright is opt-in via label now to keep CI cost down.

Add the run-playwright label if you want an E2E sweep before merging — CI will pick it up automatically.

Most PRs don't need this. Real regressions still get caught on master and fix-forward.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 27, 2026

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
posthog/hogql_queries/events_query_runner.py:208-211
**Unconditional tag causes false positives in internal callers**

`tag_contains_user_hogql()` fires every time `to_query()` is called, but `EventsQueryRunner` is also invoked by platform code that constructs `EventsQuery` programmatically. For example, `hogql_cohort_query.py` (line 200–233) builds `EventsQuery(select=["person_id", "count()"], where=[f"count() >= {count}"])` entirely in platform code and calls `events_query_runner.to_query()` directly. The resulting ClickHouse query will carry `contains_user_hogql: true` even though no user-supplied HogQL is involved, so any platform error from that path will be mis-attributed as a user error — exactly the opposite of what the feature is meant to prevent.

Moving the call to `_calculate()` instead of `to_query()` would fix the false positive for callers that use `to_query()` as a sub-query helper, since those callers never invoke `_calculate()`.

Reviews (1): Last reviewed commit: "feat(hogql): tag clickhouse queries that..." | Re-trigger Greptile

Comment thread posthog/hogql_queries/events_query_runner.py Outdated
Two follow-ups on top of the initial commit:

1. Ruff import-sort failures (CI: Python code quality). Sorted the new
   `tag_contains_user_hogql` imports into the project's own
   `posthog.clickhouse.*` block in `posthog/hogql/property.py`,
   `posthog/hogql_queries/experiments/base_query_utils.py`,
   `posthog/hogql_queries/experiments/metric_source.py`,
   `posthog/hogql_queries/insights/trends/breakdown.py`, and the inline
   import inside the new test in `posthog/clickhouse/test/test_query_tagging.py`.

2. Greptile flagged that tagging in `EventsQueryRunner.to_query()` causes
   false positives for platform callers that use it as a sub-query helper
   (e.g. `hogql_cohort_query.py` builds `EventsQuery(select=["person_id",
   "count()"], where=[f"count() >= {count}"])` from platform constants and
   calls `to_query()` directly, never `_calculate()`). Moved the
   `tag_contains_user_hogql()` call to `_calculate()` so it only fires when
   the runner is the top-level execution path — which is exactly where
   `EventsQuery.select / where / orderBy` came from the end user.

Generated-By: PostHog Code
Task-Id: 0d95bc6d-f72d-4511-ad52-408b5fcea5a5
Copy link
Copy Markdown
Contributor Author

@sampennington sampennington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Automated review

Verdict: Needs discussion

No correctness or security issues — this is observability-only and the helper is well-shaped. Main concerns:

  1. Coverage symmetry. A grep over parse_expr(user_field) surfaces ~3-4 sites in other files that look identical to the ones tagged here (funnels-over-DW, customer-analytics usage metrics, ActorsQuery.select/orderBy, possibly SessionsQuery). Either tag them too or document why they're out of scope.
  2. Performance. tag_contains_user_hogql() calls tag_queries(...) which does a model_copy(deep=True) of a ~100-field QueryTags. Several call sites (@property methods in lifecycle_query_runner.py, the breakdown loops, recursive property_to_expr) re-fire on every access. One-line idempotency short-circuit in the helper fixes everything at once.
  3. Comment/docstring duplication. The field-level comment and the helper docstring enumerate the same caller list in two places; both will rot.

Good call already on moving the EventsQuery tag into _calculate() instead of to_query() — the comment explaining the hogql_cohort_query.py sub-query pitfall is exactly the kind of "why" worth keeping.

Non-anchored findings

Missed parse sites (Should fix or document):

  • posthog/hogql_queries/insights/funnels/funnel_event_query.py:~252parse_expr(table_entity.aggregation_target_field) is the same DW-field pattern that's tagged in lifecycle_query_runner.py (timestamp_field/target_field/created_at_field). Asymmetric with the rest of the PR.
  • products/customer_analytics/backend/hogql_queries/usage_metrics_query_runner.py:~218parse_expr(timestamp_field) from GroupUsageMetric.Source.DATA_WAREHOUSE. Same shape.
  • posthog/hogql_queries/actors_query_runner.py:~349 / :~404parse_expr(self.query.select) and parse_order_expr(self.query.orderBy). ActorsQuery is the persons-explorer sibling of EventsQuery (which is tagged); leaving it out is inconsistent.
  • posthog/hogql_queries/sessions_query_runner.py:~137/~301/~527 — same select/where/orderBy pattern for SessionsQuery. If that query node is user-facing, should be tagged; if not, worth a one-line comment ruling it out.

Architectural alternative (Suggestion): posthog/hogql/parser.py already has a CacheOrigin.USER classification that property.py even passes explicitly. Setting contains_user_hogql=True inside _parse_cached whenever cache_origin resolves to USER would replace all 12 scattered call sites with one source of truth, and would automatically catch the missed sites above (and any future ones). Sibling tags like has_joins / has_json_operations already follow this "derive centrally" pattern in HogQLQuery._execute_clickhouse_query. Heavier lift than this PR — fine as a follow-up.


Reviewers: code-reviewer, personal, reuse, quality, efficiency. Inline comments above are tagged with the reviewer that raised them and the level of concern.

Comment thread posthog/clickhouse/query_tagging.py Outdated
Comment thread posthog/clickhouse/query_tagging.py
Comment thread posthog/clickhouse/query_tagging.py Outdated
Comment thread posthog/hogql_queries/insights/trends/breakdown.py Outdated
Comment thread posthog/hogql_queries/experiments/metric_source.py
Comment thread posthog/clickhouse/test/test_query_tagging.py Outdated
Comment thread posthog/clickhouse/test/test_query_tagging.py Outdated
sampennington and others added 2 commits May 27, 2026 12:02
…rcuit

Addresses the PR-author review (coverage symmetry, performance, comment
duplication) — no logic changes to the helper or how the tag is consumed,
just wider, cheaper, and de-duplicated.

Coverage symmetry — tag the remaining user-HogQL parse sites that mirror
ones already tagged:
- `posthog/hogql_queries/insights/funnels/funnel_event_query.py` —
  `parse_expr(table_entity.aggregation_target_field)` for funnels over a
  DataWarehouseNode (same shape as the lifecycle DW field tags).
- `products/customer_analytics/backend/hogql_queries/usage_metrics_query_runner.py`
  — `timestamp_field`, `math_property`, and `key_field` parses from
  `GroupUsageMetric` DW sources.
- `posthog/hogql_queries/actors_query_runner.py` — `ActorsQuery.select` /
  `orderBy`. Tag in `_calculate()` (same reason as `EventsQueryRunner`:
  `hogql_cohort_query._actors_query_from_source` calls `to_query()` with
  a platform-constant `select=["id"]` and must not false-positive).
- `posthog/hogql_queries/sessions_query_runner.py` — `SessionsQuery.select`
  / `where` / `orderBy`.

Performance — `tag_contains_user_hogql()` now short-circuits when the flag
is already set, avoiding the `model_copy(deep=True)` inside `tag_queries`
on every subsequent call. Matters for hot paths: recursive
`property_to_expr`, breakdown loops, `@property` accessors in
`lifecycle_query_runner.py`. New test verifies the short-circuit returns
the same `QueryTags` object.

Comment dedupe — the caller list lived in two places (the field-level
comment on `contains_user_hogql` and the helper docstring) and would rot
on every new site. Kept the canonical list in `tag_contains_user_hogql`
and shortened the field comment to point at the helper.

Generated-By: PostHog Code
Task-Id: 0d95bc6d-f72d-4511-ad52-408b5fcea5a5
- Shorten contains_user_hogql field comment and helper docstring to
  one line — the enumeration drifted vs the actual call sites.
- breakdown.py: flatten the duplicated `if breakdown_type == "hogql"`
  branch so the tag call isn't visually buried.
- metric_source.py: add a one-liner naming the user-HogQL field on
  ExperimentDataWarehouseNode, since the tag isn't beside its
  parse_expr call.
- test_query_tagging.py: drop narrating comments above the two
  contains_user_hogql tests — the test names say it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sampennington
Copy link
Copy Markdown
Contributor Author

Quick follow-up on the coverage symmetry concern from the "Needs discussion" verdict.

The four sites flagged as look-alike gaps are all already tagged on HEAD — looks like the verdict was based on the initial commit (bdb4a30d633) before the coverage extension commit (507da8f9b58) landed:

Flagged Tagged at
ActorsQuery.select/orderBy actors_query_runner.py:258 (in _calculate(), same shape as EventsQuery)
SessionsQuery sessions_query_runner.py:614
funnels-over-DW (FunnelsDataWarehouseNode.{timestamp,aggregation_target}_field) funnel_event_query.py:248 and :563
customer-analytics usage metrics usage_metrics_query_runner.py

I also swept the other plausible call sites — cdp/filters.py, session_recordings/queries/, experiments/hogql_aggregation_utils.py, experiments/exposure_query_logic.py, experiments/experiment_query_builder.py — and every parse_expr/parse_select in those files is either a template string with placeholders (platform-built, doesn't need tagging) or routes through an already-tagged entry point (e.g. from_source in metric_source.py).

So the coverage-symmetry concern is resolved on HEAD. The other two concerns from the verdict (perf early-return + comment/docstring duplication) are also addressed: early-return at query_tagging.py:551-552, and the field comment + helper docstring are now both one-liners after 3fcd1b0759e.

A failed query carries a ClickHouse error code that cannot say whether the
user (a bad HogQL expression they wrote) or our query builder (structured
input, but invalid SQL generated) caused it. The existing `category` label
lumps both together.

This labels the `posthog_query_execution_total` counter with
`has_user_authored_hogql` so observability can target builder bugs
precisely: `category="error", has_user_authored_hogql="false"` is a
high-confidence builder-bug signal.

The label reads `QueryTags.contains_user_hogql` — the canonical flag set
by `tag_contains_user_hogql()` at every HogQL parse site (see
`posthog.clickhouse.query_tagging`). Reusing that flag keeps the
Prometheus and `system.query_log` surfaces in lockstep and inherits the
broader site coverage (events query select/where/orderBy, paths
expression, lifecycle warehouse fields, experiment warehouse fields)
without a separate detector to maintain.

Stacked on the user-HogQL tagging work in PR #60214.

Generated-By: PostHog Code
Task-Id: a72d30b8-cbf7-4b97-834f-212596a51d6e
@sampennington sampennington changed the title feat(hogql): tag clickhouse queries that contain user-supplied hogql feat(hogql): tag user-supplied hogql in queries + prometheus label May 28, 2026
Rename the Prometheus label on `posthog_query_execution_total` from
`has_user_authored_hogql` to `contains_user_hogql` so it matches the
`QueryTags.contains_user_hogql` field that already lands in
`system.query_log.log_comment`. One name everywhere makes log-vs-metric
triage queries copy-paste cleanly.

Generated-By: PostHog Code
Task-Id: 3664bd5c-c410-40e4-966f-a051ad518227
@assign-reviewers-posthog assign-reviewers-posthog Bot requested a review from a team May 28, 2026 14:03
@sampennington sampennington force-pushed the posthog-code/tag-clickhouse-queries-with-user-hogql branch from c3ebc65 to 3f5b962 Compare May 28, 2026 14:21
…nflict

Drops the prom-label additions from this file so the GitHub merges API can fast-merge master in cleanly. Re-applied after the merge.

Generated-By: PostHog Code
Task-Id: 3664bd5c-c410-40e4-966f-a051ad518227
Restores the `contains_user_hogql` Prometheus label and `_contains_user_hogql_label()` helper that were temporarily removed to let the GitHub merges API auto-merge master in.

Generated-By: PostHog Code
Task-Id: 3664bd5c-c410-40e4-966f-a051ad518227
@sampennington sampennington removed the request for review from a team May 28, 2026 14:30
@sampennington sampennington merged commit 24ec393 into master May 28, 2026
227 of 229 checks passed
@sampennington sampennington deleted the posthog-code/tag-clickhouse-queries-with-user-hogql branch May 28, 2026 16:04
@deployment-status-posthog
Copy link
Copy Markdown

deployment-status-posthog Bot commented May 28, 2026

Deploy status

Environment Status Deployed At Workflow
dev ✅ Deployed 2026-05-28 16:34 UTC Run
prod-us ✅ Deployed 2026-05-28 17:00 UTC Run
prod-eu ✅ Deployed 2026-05-28 17:02 UTC Run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants