goodhistogram: add QueryQuantiles for alloc-free live quantile reads by kyle-a-wong · Pull Request #5 · cockroachdb/goodhistogram

kyle-a-wong · 2026-05-11T13:24:26Z

Summary

Adds Histogram.QueryQuantiles(dst, qs []float64) []float64, an alloc-free quantile-read path that reads atomic counters in place rather than materializing a Snapshot. The result reflects the same eventual consistency Snapshot() already accepts.

This targets hot-path consumers that read quantiles per recorded sample (e.g. an "is this query slow?" detector keyed by SQL fingerprint), where the existing Snapshot + ValuesAtQuantiles path's per-call allocations dominate the cost.

Approach

Two-pass walk over the bucket array. Pass 1 sums in-range counts to a total; pass 2 walks forward with a 3-bucket sliding window of (prev, curr, next) counts to compute trapezoidal boundary densities on the fly, avoiding the avgDensity / boundaryDensity scratch slices used by ValuesAtQuantiles.
qs must be sorted ascending. With a stack-backed dst (e.g. var buf [4]float64; h.QueryQuantiles(buf[:0], qs)) the call is fully alloc-free.
Boundary-density behavior at the rightmost bucket (dR=0) matches the existing ValuesAtQuantiles for parity. Likely a separate, pre-existing bug; preserving it here keeps results bit-for-bit identical and is best fixed in its own PR.
Record is unchanged; no contention regression.

Numbers (Apple M3 Pro)

3 quantiles, n=10k populated:

| Method | ns/op | B/op | allocs/op |
|---|---|---|---|
| `Snapshot` + `ValuesAtQuantiles` | 815 | 2912 | 10 |
| `QueryQuantiles` | 288 | 0 | 0 |

2 quantiles (same numbers at both n=1k and n=100k): 779 → 282 ns/op, 2840 → 0 B/op, 9 → 0 allocs/op.

Test plan

TestQueryQuantilesAgreesWithSnapshot — 12 distributions × 10 quantiles, exact agreement with Snapshot.ValuesAtQuantiles
TestQueryQuantilesEdges — empty, only-underflow, only-overflow, empty-qs
TestQueryQuantilesAllocFree — verifies 0 allocs/op when dst has cap
BenchmarkQueryPath and BenchmarkQueryPathThreeQuantiles for the numbers above
Full go test ./... passes

cockroachlabs-cla-agent · 2026-05-11T13:24:40Z

All committers have signed the CLA.

Add Histogram.QueryQuantiles(dst, qs), which estimates quantile values by reading atomic counters in place rather than copying them into a Snapshot. The result reflects the same eventual consistency Snapshot already accepts (counters are read independently and may observe a slightly inconsistent total).

This targets hot-path consumers that read quantiles per recorded sample (e.g. an "is this query slow?" detector keyed by SQL fingerprint), where the existing Snapshot + ValuesAtQuantiles path's per-call allocations dominate the cost.

The walk uses a 3-bucket sliding window of (prev, curr, next) counts to compute trapezoidal boundary densities on the fly, avoiding the n-sized avgDensity / boundaryDensity scratch slices used by ValuesAtQuantiles. The qs slice must be sorted ascending; with a stack-backed dst, the call is fully alloc-free.

Boundary-density behavior at the rightmost bucket (dR=0) matches the existing ValuesAtQuantiles for parity; this is preserved deliberately so results agree across the two methods.

The agreement test covers all 12 distributions in the existing benchmark suite.

Per-call benchmark on Apple M3 Pro, n=10k populated, 3 quantiles:

    Snapshot+ValuesAtQuantiles   815 ns/op   2912 B/op   10 allocs/op
    QueryQuantiles               288 ns/op      0 B/op    0 allocs/op

angles-n-daemons · 2026-05-11T14:58:48Z

This is a question I haven't thought much about, but do you think there's any reason we need snapshots to do quantile estimation?

If snapshots are inherently inconsistent, and there are no locks on the structure, should we just move fully over to direct quantile querying? Does this affect the API that Prometheus users are familiar with in any meaningful way?

kyle-a-wong · 2026-05-11T17:23:54Z

but do you think there's any reason we need snapshots to do quantile estimation?

I'm not sure we need snapshots to do quantile estimation, but I'd assume we still want to support quantile estimation on snapshots, right?

angles-n-daemons · 2026-05-13T14:30:47Z

Yeah that seems to make sense to me

angles-n-daemons

Nice cleanup — sliding window is a clean improvement over the scratch-slice version. A few thoughts inline. Two whole-PR notes:

gofmt -l . flags both new files (spaces instead of tabs) — gofmt -w should sort it.
No -race-friendly test exercising Record and QueryQuantiles concurrently. Even a smoke test would help lock in the lock-free contract against future regressions.

angles-n-daemons · 2026-05-15T19:47:11Z

+// qs MUST be sorted in ascending order. dst must have cap >= len(qs); pass a
+// stack-backed slice (e.g. var buf [4]float64; h.QueryQuantiles(buf[:0], qs))
+// to make the call fully alloc-free.
+func (h *Histogram) QueryQuantiles(dst, qs []float64) []float64 {


Could you rename to fit alongside the existing ValueAtQuantile / ValuesAtQuantiles? Something like ValuesAtQuantilesInto(dst, qs) keeps the relationship to the snapshot version obvious and follows the Go convention of *Into for caller-provided buffers. Two different verbs for the same conceptual op makes the API harder to discover.

Done in 47a1510 — renamed to ValuesAtQuantilesInto. Agreed the verb mismatch made the relationship hard to find in godoc.
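The closest standard-library precedent for the dst pattern is the strconv Append family (e.g. `strconv.AppendFloat(dst []byte, ...) []byte`): the function appends to dst and returns the extended slice. One consequence is worth stating in the godoc. The result aliases the caller's buffer, so reusing one buffer across calls overwrites earlier results. Here is a small sketch with a hypothetical `valuesInto` that produces placeholder estimates, not real quantiles:

```go
package main

import "fmt"

// valuesInto is a hypothetical dst-accepting function in the style of
// ValuesAtQuantilesInto: it appends one value per q to dst and returns the
// extended slice, which shares dst's backing array while capacity allows.
func valuesInto(dst, qs []float64) []float64 {
	for _, q := range qs {
		dst = append(dst, q*100) // placeholder estimate
	}
	return dst
}

func main() {
	var buf [4]float64
	first := valuesInto(buf[:0], []float64{0.5, 0.99})
	second := valuesInto(buf[:0], []float64{0.1, 0.2})
	// Both results alias buf, so the second call overwrote the first.
	fmt.Println(first, second, &first[0] == &buf[0])
}
```

Callers that need to keep results across calls should copy them out, or use separate buffers.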

angles-n-daemons · 2026-05-15T19:47:11Z

+                }
+                dR = (currD + nextD) / 2.0
+            }
+            // dR remains 0 at i==n-1 to match existing behavior.


Worth fixing in the same PR rather than mirroring it. The dR=0 quirk biases the reported value low by ~20% of bucket width or more in the rightmost bucket — and p99 in long-tailed latency distributions almost always lands exactly there. If the natural near-term consumer is something like AnomalyDetector (per-statement comparison against reported p99), low-biased p99 means more false-positive slow-query flags in production.

One-line fix in quantile.go (add boundaryDensity[n] = avgDensity[n-1] after the loop), plus mirroring it here (set dR = currD when i == n-1, parallel to the existing i == 0 → dL = currD case). The "exact parity" tests would need to update, but they're effectively pinned to the wrong invariant.

Fixed in f4b1ed8 (root fix in quantile.go for both ValueAtQuantile and ValuesAtQuantiles, mirrored in ValuesAtQuantilesInto). Worth noting the existing snapshot quantile tests in histogram_test.go didn't actually pin the old behavior — they all still pass — so the "tests pinned to the wrong invariant" concern didn't materialize. Nothing in the tree was depending on the bias.

angles-n-daemons · 2026-05-15T19:47:11Z

+            got := h.QueryQuantiles(buf[:0], qs)
+
+            for i, q := range qs {
+                if math.Abs(got[i]-want[i]) > 1e-6*math.Max(1, math.Abs(want[i])) {


Tolerance is 1e-6 here, but the PR description claims bit-for-bit parity. Both paths feed the same arguments to trapezoidalSolve in the same order, so equality should hold exactly — would tighten this to got[i] != want[i]. (Or if you keep the tolerance, soften the parity claim in the description.)

Done in 40fad13 — tightened to got[i] != want[i]. Confirmed both paths still agree exactly after the dR fix landed in f4b1ed8.

…ation

The boundary-density loops in ValueAtQuantile/ValuesAtQuantiles never set boundaryDensity[n]: the case n: branch was unreachable because for i := range n only iterates 0..n-1. As a result, the right edge of the rightmost bucket was treated as having density zero, biasing interpolated values low (~20% of bucket width or more) right where p99 of long-tailed latency distributions lands.

Set boundaryDensity[n] = avgDensity[n-1] explicitly, parallel to the existing boundaryDensity[0] = avgDensity[0]. Mirror the same fix in ValuesAtQuantilesInto (dR = currD at i == n-1) so the two paths stay in parity.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

Fits the existing Snapshot API surface (ValueAtQuantile, ValuesAtQuantiles) and follows the Go convention of an *Into suffix for functions that write into a caller-provided buffer. Two different verbs for the same conceptual operation made the relationship harder to find in godoc.

Also gofmt the test file (spaces -> tabs); the rename touches enough of it that bundling the format fix here keeps later commits clean.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

Both ValuesAtQuantilesInto and Snapshot.ValuesAtQuantiles call trapezoidalSolve with identical arguments in the same order, so the parity test can assert exact equality instead of an epsilon tolerance.

Add TestValuesAtQuantilesIntoConcurrentWithRecord: 4 writer goroutines and 4 reader goroutines running for 100ms, intended for -race. Pins the lock-free contract so a future regression (e.g. accidentally sharing scratch state across callers) gets caught by CI.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

kyle-a-wong · 2026-05-18T14:18:02Z

Both whole-PR notes addressed. gofmt -w landed naturally as part of f4b1ed8 / 47a1510 (the renames and edits touched the whole files). Added TestValuesAtQuantilesIntoConcurrentWithRecord in 40fad13 — 4 writers + 4 readers for 100ms, passes under go test -race ./....

Note: gofmt -l . still flags benchmark_test.go, but that's pre-existing on main and outside this PR's scope — happy to fix separately.

angles-n-daemons

Nice, walked through the commits — all four look good, tests pass under -race.

Ah, good to know on the dR test invariant. I was assuming the parity tests had been pinned to the buggy behavior, but you're right that they weren't tight enough to lock it in. Fix landed clean.

Opened #6 for the benchmark_test.go gofmt fix to keep it out of this PR's scope.

Approving.

kyle-a-wong requested a review from angles-n-daemons May 11, 2026 13:26

kyle-a-wong force-pushed the kwong/query-quantiles branch from f26b5e3 to 8b8db39 Compare May 11, 2026 13:29

angles-n-daemons reviewed May 15, 2026


kyle-a-wong and others added 3 commits May 18, 2026 10:13

kyle-a-wong requested a review from angles-n-daemons May 18, 2026 14:18

angles-n-daemons mentioned this pull request May 18, 2026

goodhistogram: gofmt benchmark_test.go #6

Merged


angles-n-daemons approved these changes May 18, 2026


kyle-a-wong merged commit 92f4812 into cockroachdb:main May 18, 2026
3 checks passed

kyle-a-wong deleted the kwong/query-quantiles branch May 18, 2026 15:02

