feat(web-analytics): emit cache_key_hash on query and lazy precompute logs#60378
Merged
Conversation
…lines)" This reverts commit 5f3be93.
…plicit log metadata
Contributor
|
🎭 Playwright report · View test results →
These issues are not necessarily caused by your changes. |
…m CH tags; add skill
Contributor
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
products/web_analytics/backend/hogql_queries/test/test_web_lazy_precompute_common.py:204-216
**Test name directly contradicts its assertion**
`test_property_order_does_not_fragment_key` asserts `!=`, meaning property order *does* fragment the key. The name implies the opposite. A future developer looking for the test that enforces "order is irrelevant" will find this one, see `!=`, and be confused — or worse, flip the assertion back to `==` while renaming without realising the intent was to document current behaviour, not desired behaviour. The comment inside the test explains the distinction well, but the method name overrides that at first glance.
The name should match what the assertion actually proves, e.g. `test_property_order_currently_fragments_key` or `test_property_order_is_not_canonicalized`.
### Issue 2 of 2
products/web_analytics/skills/evaluating-precompute-eligibility/SKILL.md:339-344
The last two bullets reference personal local paths (`~/notes/work/posthog/...`) that only exist on the author's machine. Other engineers using this SKILL will hit dead links.
```suggestion
- The rollout plan of record (cohort definitions, Cache-A / B / C / D tier
framework) — ask the web-analytics team for the current location.
- The `q/key` methodology note (cost model and break-even-at-1.0 derivation) —
ask the web-analytics team for the current location.
```
Reviews (1): Last reviewed commit: "Merge branch 'master' into lricoy/web-an..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7eeb33a50f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
jordanm-posthog
approved these changes
May 28, 2026
arthurdedeus
approved these changes
May 28, 2026
jabahamondes
approved these changes
May 28, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Our q/key (queries-per-distinct-cache-key) analysis for the lazy precompute rollout currently relies on reconstructing the cache key from
system.query_logproperties JSON. Loki/Posthog logs have the structuredweb_analytics_querylog line on a 14-day retention horizon — but it only carriesfilter_count, not the dimensions that actually fragment the cache key (filter values, dateRange, breakdownBy).We also can't currently join "this query went to precompute" (logged by the analytics_platform framework) against "this query came from web analytics" (logged by the runner) — they share no stable identifier across log streams.
Changes
Adds a stable
cache_key_hashfield to two structured log lines:web_analytics_query(inWebAnalyticsQueryRunner.calculate'sfinallyblock). Computed bycompute_query_cache_key_hash(self.query, self.team.timezone)inweb_lazy_precompute_common.py. SHA-256 of canonical JSON over the user-facing query schema (kind + properties with values + dateRange + breakdownBy + conversionGoal + sampling + interval + compareFilter + filterTestAccounts + timezone), stripping fields that don't influence the logical cache key (useWebAnalyticsPrecompute,modifiers,version,tags,response,limit/offset,limitBy).lazy_computation.executed(inLazyComputationExecutor.execute). Accepts an optionalcache_key_hashparameter onexecute()andensure_precomputed(). All six web analytics precompute callers (overview,stats,paths,frustration,goals,vitals_paths) compute and pass it.Same hashing mechanic (SHA-256 / canonical JSON) as the existing
compute_query_hashin the lazy framework. Distinct from the framework's internalquery_hashbecause that one hashes the post-build INSERT AST: this one hashes the input schema, so it stays stable across runner-code changes that don't alter the logical cache key, and can be computed regardless of whether the query was eligible for the precompute path.Hash computation is wrapped in
try/exceptat every call site so a hash-time failure cannot affect the request.How did you test this code?
I'm an agent. Automated tests I ran locally:
products/web_analytics/backend/hogql_queries/test/test_web_lazy_precompute_common.py— 12 cases covering stability across calls, fragmentation by query kind / date range / breakdown / filter value / filter operator / timezone / compareFilter, and exclusion of the per-query opt-in toggle. Also asserts hash format (64-char hex).test_web_analytics_query_runner.py(18 passed),test_lazy_computation_executor.py(111 passed), full lazy-precompute integration suites for overview / goals / stats / paths / frustration / vitals_paths (145 passed, 17 skipped).ruff check+ruff formatclean.No manual prod traffic verification has been done — the log line lands in Loki once deployed.
Publish to changelog?
no — observability/internal-tooling change only.
Docs update
Not needed.
🤖 Agent context
Authored end-to-end by Claude Code (Opus 4.7) in an ad-hoc session focused on improving observability of the web analytics lazy precompute rollout.
Decisions worth noting for reviewers:
query_hash. The framework's hash is computed over the post-build precompute AST. It is the right identity for the precompute job table but unsuitable for cross-path attribution — it doesn't exist when a query isn't eligible for precompute, and it changes when the INSERT AST construction changes even if the user query is logically identical. The new field is documented as a logical (not numerical) sibling of the framework hash.useWebAnalyticsPrecompute,modifiers,version,tags,response,limit,offset,limitBy). Same query with the precompute opt-in toggled should share a cache key; same query with different pagination should share a cache key (limit/offset are applied at read time). Other dropped fields are non-cache-key metadata.cache_key_hashas a caller-provided opaque string rather than recomputing it fromQueryInfo. The framework is generic over query types (not web-analytics-specific), so the input-schema hash conceptually belongs to the caller.