Skip to content

chore(hogql): tune parse_*_seconds histogram buckets for sub-ms parses#60414

Merged
robbie-c merged 1 commit into
masterfrom
claude/sleepy-kalam-cd7393
May 29, 2026
Merged

chore(hogql): tune parse_*_seconds histogram buckets for sub-ms parses#60414
robbie-c merged 1 commit into
masterfrom
claude/sleepy-kalam-cd7393

Conversation

@robbie-c
Copy link
Copy Markdown
Member

Problem

#60201 shipped per-backend parse timings on the existing Prometheus histograms parse_expr_seconds, parse_order_expr_seconds, parse_select_seconds, and parse_full_template_string_seconds in posthog/hogql/parser.py. Looking at the live metrics, the default Prometheus buckets (lowest bound 5ms) are far too coarse for these parses. rust-py runs around 10μs, cpp is typically under 1ms, and pathological queries can take seconds. With the default buckets, every typical parse lands in the lowest bucket and histogram_quantile is effectively useless at this scale.

Means (_sum / _count) are fine, and the current dashboards use those, but quantiles are not usable.

Changes

Add an explicit buckets= tuple on the four parse_*_seconds histograms: a 1-2-5 progression from 5μs through 10s (14 buckets). This is the entire change; no other files are touched.

_PARSE_DURATION_BUCKETS = (5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1, 1, 5, 10)

How did you test this code?

Agent-authored (Claude Code); requires human review. This is a histogram bucket config change with no logic touches. Python syntax check on posthog/hogql/parser.py passes. No tests assert on bucket values; CI will run the full backend suite. No manual or UI testing was done.

Publish to changelog?

no, internal observability tuning.

Docs update

No user-facing changes.

🤖 Agent context

Source: observation from #60201 once the shadow-parser rollout reached Grafana. The existing histograms were created with prometheus_client's default buckets, which start at 5ms; with rust-py parses around 10μs and cpp typically sub-ms, p50/p90/p99 all collapsed into the lowest bucket. The fix sets buckets= on the same Histogram instances and leaves everything else untouched.

Behavior note for the rollout window: quantile queries that span the bucket-change boundary will look odd for a short period (old samples have only the 5ms-and-up buckets, new samples have the full 5μs-through-10s range). Mean-based dashboards (_sum / _count) are unaffected.

Cardinality: modest. 14 buckets × 4 rules × small number of backends × small number of versions, which is well within typical Prometheus sizing.

Tools: Claude Code (Edit, Bash, gh).

The Histogram instances for parse_expr_seconds, parse_order_expr_seconds,
parse_select_seconds, and parse_full_template_string_seconds were created
with the default Prometheus buckets, whose lowest bound is 5ms. Real parses
run an order of magnitude or more below that: rust-py around 10μs, cpp
typically under 1ms. Every typical parse landed in the lowest bucket, so
histogram_quantile was useless at this scale.

Replace with a 1-2-5 progression from 5μs through 10s, which keeps usable
resolution across the full range while still capturing pathological queries
that take seconds.

Means (_sum / _count) are unaffected, so existing dashboards built on those
keep working.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@robbie-c robbie-c requested a review from a team as a code owner May 28, 2026 11:14
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 28, 2026

Reviews (1): Last reviewed commit: "chore(hogql): tune parse_*_seconds histo..." | Re-trigger Greptile

@github-actions
Copy link
Copy Markdown
Contributor

🎭 Playwright report · View test results →

⚠️ 1 flaky test:

  • Split a person with multiple distinct IDs (chromium)

These issues are not necessarily caused by your changes.
Annoyed by this comment? Help fix flakies and failures and it'll disappear!

@robbie-c robbie-c added the stamphog Request AI review from stamphog label May 28, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pure observability config change — adds finer-grained histogram buckets to existing Prometheus metrics. No logic, no data model, no API contract changes.

@robbie-c robbie-c enabled auto-merge (squash) May 28, 2026 12:19
@robbie-c robbie-c merged commit 1f61663 into master May 29, 2026
236 of 238 checks passed
@robbie-c robbie-c deleted the claude/sleepy-kalam-cd7393 branch May 29, 2026 10:28
mayteio pushed a commit that referenced this pull request May 29, 2026
#60414)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@deployment-status-posthog
Copy link
Copy Markdown

deployment-status-posthog Bot commented May 29, 2026

Deploy status

Environment Status Deployed At Workflow
dev ✅ Deployed 2026-05-29 10:52 UTC Run
prod-us ✅ Deployed 2026-05-29 11:15 UTC Run
prod-eu ✅ Deployed 2026-05-29 11:18 UTC Run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stamphog Request AI review from stamphog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants