feat(hogql): warn on unknown taxonomy references#60695
Conversation
Query snapshots: Backend query snapshots updatedChanges: 1 snapshots (1 modified, 0 added, 0 deleted) What this means:
Next steps:
|
Wire validate_taxonomy_references into the execute_sql MCP tool so unknown event/property references are flagged on the path agents actually use, instead of only via the (v2-unexposed) query-validate tool. Warnings are prepended to the result; the query still runs. Also flip the validator to an indexed name__in existence check so the hot path no longer materializes the whole team taxonomy, and allowlist dynamic property patterns ($feature/*, $feature_enrollment/*, $survey_responded/*, $survey_dismissed/*) to avoid guaranteed false positives.
The quick-fix `fix` text replaces the whole marked range, which spans the quotes (event literals, property keys) or the `properties.` prefix (property field access). Returning the bare suggested name therefore stripped them — `event = '$pagevisit'` became `event = $pageview`. Build the replacement from a per-reference template so it rebuilds the full token; the message keeps the bare name. Nested `properties.a.b` warns without a fix rather than mangle the suffix.
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
posthog/hogql/metadata.py:104
**Taxonomy DB error sets `isValid=False` on valid queries**
`validate_taxonomy_references` now issues live Django ORM queries (`EventDefinition.objects.filter` / `PropertyDefinition.objects.filter`) from inside the broad `except Exception` handler that sets `response.isValid = False`. A transient DB error — table lock, query timeout, connection blip — would mark a syntactically valid HogQL query as invalid and surface an opaque `Unexpected OperationalError` to the caller. The `mcp_tool.py` path already protects against this with its own inner `try/except`, but this call site is unguarded. Wrapping the call in a narrow `try/except Exception` here and appending a non-fatal warning (similar to what `_get_taxonomy_warnings` does on failure) would preserve the existing validity semantics.
### Issue 2 of 2
ee/hogai/tools/execute_sql/mcp_tool.py:61-64
The comment says "Reuses the already-validated query's AST parse" but the method actually re-parses the query string with a fresh `parse_select` call — it doesn't share the AST from `_validate_hogql_query`. The second sentence ("no ClickHouse round-trip") is accurate, but the first is misleading.
```suggestion
@database_sync_to_async(thread_sensitive=False)
def _get_taxonomy_warnings(self, query: str) -> list[HogQLNotice]:
# Re-parse the (already-validated) query string — no ClickHouse round-trip needed. Any parse
# failure is already surfaced by _validate_hogql_query, so swallow it here rather than double-report.
```
Reviews (1): Last reviewed commit: "fix(hogql): keep quotes/prefix in taxono..." | Re-trigger Greptile |
…put) Address review feedback: - Fail open on DB errors: the advisory taxonomy lookup is wrapped so a transient DatabaseError no longer bubbles into the metadata `except` that flips isValid=False (and no longer breaks the execute_sql tool call). - Sanitize agent-facing names: event/property names are externally writable, so strip control chars/newlines and cap length before embedding them in the execute_sql <taxonomy_warnings> block to prevent prompt injection. - Correct a misleading comment: _get_taxonomy_warnings re-parses the query string (cheap; avoids threading the mutated AST out of the shared validator).
…-taxonomy-validation # Conflicts: # ee/hogai/tools/execute_sql/mcp_tool.py # ee/hogai/tools/execute_sql/test/test_mcp_tool.py
Query snapshots: Backend query snapshots updatedChanges: 1 snapshots (1 modified, 0 added, 0 deleted) What this means:
Next steps:
|
PR overviewAll previously flagged issues have been addressed. No open security concerns remain on this pull request. Security reviewNo open security issues remain on this pull request. Fixed/addressed: 3 · PR risk: 0/10 |
Stripping control characters alone left a crafted event/property name able to embed the literal `</taxonomy_warnings>` and close the wrapper early. Also strip angle brackets so names stay contained as data inside the delimited block.
|
🎭 Playwright report · View test results →
These issues are not necessarily caused by your changes. |
Query snapshots: Backend query snapshots updatedChanges: 1 snapshots (1 modified, 0 added, 0 deleted) What this means:
Next steps:
|
Stripping control chars/brackets stops tag breakout but not plain-text influence, since taxonomy names are externally writable. Explicitly tell the agent the embedded names are user-supplied data to compare against, never instructions to follow — keeps the suggestion useful while reducing injection.
ClickHouse migration SQL per cloud environment
|
Query snapshots: Backend query snapshots updatedChanges: 1 snapshots (1 modified, 0 added, 0 deleted) What this means:
Next steps:
|
mariusandra
left a comment
There was a problem hiding this comment.
I see from the screenshots that the warnings are shifted by 1 character. Worth looking into perhaps? Otherwise, one comment inline, rest looks good 👍
A suggested event/property name is user-controlled taxonomy data and can contain quotes, backticks, dots or spaces. Splice it back into the marked range via escape_hogql_string / escape_hogql_identifier so the one-click fix can never produce broken HogQL (e.g. an event named o'brien).
The single-statement branch hardcoded the metadata marker offset to 0 instead of the statement's start, so leading whitespace/newlines shifted every squiggle by that many characters. Use the statement start, matching the multi-statement branch.
|
On the 1-character shift: the backend offsets are correct — for |
|
Size Change: 0 B Total Size: 81 MB ℹ️ View Unchanged
|
Problem
HogQL metadata validation catches invalid schema references, such as unknown tables or fields, but it has not warned on common taxonomy-backed references. That makes it easy to write syntactically valid queries that silently produce misleading results:
WHERE event = 'purchase'returns zero rows ifpurchaseis not a known event, andproperties.country_codecollapses to a null bucket if that property does not exist.This is especially risky for agent-assisted querying, where a valid-looking empty or null result gets reported as a real product insight instead of a likely typo or hallucinated taxonomy name. It is also the one judgment signal the SQL editor lacked relative to a typed schema: identifiers (tables, columns) already get red squiggles and "did you mean", but event/property values did not.
Changes
Adds a warning-only HogQL taxonomy validation pass and surfaces it on the two paths people actually author HogQL through: the metadata endpoint (SQL editor) and the
execute_sqlMCP tool (agents).Partial-match in a list (only the bad names get flagged)

Validation pass (
posthog/hogql/taxonomy_validation.py, wired intoget_hogql_metadata): traverses the parsed AST, collects obvious event literals and directproperties.*references, and emits dedupedHogQLNoticewarnings while keepingisValid=True. Supported shapes:event = 'x','x' = event,event IN ('x', 'y'),properties.foo,properties['foo']. Includes a best-effortfixsuggestion with a$-prefix special case (pageview→$pageview).SQL editor: because warnings flow through the existing metadata path, the editor renders them as yellow squiggles with a hover message and a one-click "Replace with…" quick-fix — no frontend change required.
execute_sqlMCP tool (ee/hogai/tools/execute_sql/mcp_tool.py): runs the same validation on the already-parsed AST (no extra ClickHouse round-trip) and prepends a<taxonomy_warnings>block to the result. This is the agent-facing surface —query-validateis not exposed in the v2 MCP, so without this the warnings would never reach an agent. The query still runs; the warning is advisory.Performance: the validator does an indexed
name__inexistence check over just the referenced names (usually 1–5) instead of materializing the whole team taxonomy. When every name is valid it returns there; the full name list is fetched only on the rarer "a name was unknown" path, where it is needed for fuzzy suggestions (and doubles as the empty-taxonomy guard). Queries with no event/property literals pay zero DB cost.Noise control: dynamic property patterns (
$feature/*,$feature_enrollment/*,$survey_responded/*,$survey_dismissed/*) are allowlisted so legitimate dynamic props never false-positive.Quick-fix correctness: the marked range covers the whole token (quotes for a literal,
properties.prefix for a field), so thefixtext rebuilds the whole token rather than returning the bare name — otherwise applying the quick-fix stripped the quotes (event = '$pagevisit'→event = $pageview). Nestedproperties.a.bwarns without offering a fix rather than mangle the suffix.Robustness & safety: taxonomy validation is advisory, so it fails open — a transient
DatabaseErrorduring the lookup is swallowed (logged) and skips the warnings rather than bubbling into the metadataexceptthat setsisValid=Falseor breaking theexecute_sqlcall. Because event/property names are externally writable, the names embedded in the agent-facing<taxonomy_warnings>block are sanitized (control chars/newlines stripped, length capped) to prevent a crafted taxonomy name from injecting instructions into agent context.Intentionally out of scope: hard errors (warning-only), property value validation, event-scoped property validation, dynamic-expression analysis, query-generation repair loops, ClickHouse execution-error normalization, and the in-product Max
execute_sqltool (tool.py) — only the MCP variant is wired so far.How did you test this code?
I'm an agent (Claude Code). Automated tests I ran:
New tests cover: unknown event/property warnings, close-match suggestions, known names producing no warning, the dynamic-property allowlist, the hot path not fetching the full taxonomy, two round-trip tests that apply the quick-fix replacement exactly as Monaco does and assert valid HogQL, fail-open on a simulated
DatabaseError, and sanitization of injected newlines/control chars in agent-facing warnings. Pre-commit hooks (ruff lint + format,tytype check) pass.Manual verification against the local dev stack (the Hedgebox demo project, not production): exercised the local MCP
execute-sqltool and the metadata function directly, and confirmed warnings fire for unknown/typo'd names with suggestions, stay silent for valid names, suppress allowlisted dynamic props, and that the quick-fix replacement round-trips to valid quoted HogQL.Publish to changelog?
no
Docs update
No docs update needed — this adds backend validation warnings and editor/agent surfacing, with no new user-facing docs.
🤖 Agent context
Agent-assisted, human-directed across two iterations (the initial validation pass, then this iteration's
execute_sqlsurfacing, performance flip, allowlist, and quick-fix fix). The decisions and caveats below are the parts worth a reviewer's attention.Key decisions:
execute_sql, not justquery-validate.query-validateis a v1 tool flaggednew_mcp: falseand is not exposed by the v2 MCP, andexecute_sqldid not run taxonomy validation — so the warnings had no agent-facing surface. Wiring it intoexecute_sqlputs the signal on the path agents actually call, unconditionally, rather than behind an opt-in validate step they would not take.name__inflip keeps the common (all-valid) case to a small indexed lookup, with the full list fetched only when a name is actually unknown.fixreplaces the whole marked range, so returning a bare name dropped the quotes/prefix; it now reconstructs the token so the replacement stays valid.Caveats / follow-ups for reviewers:
events, not full per-field semantic resolution — a derived/warehouse column namedevent/propertiesin a mixed-scope join could mis-warn. Warning-only, so low harm.e.event,e.properties.x) are not matched — a safe false-negative.execute_sqltool is wired; the in-productexecute_sqltool is not, so parity is a deliberate open question.Requires human review before merge.