goodhistogram: add QueryQuantiles for alloc-free live quantile reads#5
Conversation
Add Histogram.QueryQuantiles(dst, qs), which estimates quantile values by reading atomic counters in place rather than copying them into a Snapshot. The result reflects the same eventual consistency Snapshot already accepts (counters read independently and may observe a slightly inconsistent total). This targets hot-path consumers that read quantiles per recorded sample (e.g. an "is this query slow?" detector keyed by SQL fingerprint), where the existing Snapshot + ValuesAtQuantiles path's per-call allocations dominate the cost. The walk uses a 3-bucket sliding window of (prev, curr, next) counts to compute trapezoidal boundary densities on the fly, avoiding the n-sized avgDensity / boundaryDensity scratch slices used by ValuesAtQuantiles. The qs slice must be sorted ascending; with a stack-backed dst, the call is fully alloc-free. Boundary-density behavior at the rightmost bucket (dR=0) matches the existing ValuesAtQuantiles for parity; this is preserved deliberately so results agree across the two methods. The agreement test covers all 12 distributions in the existing benchmark suite. Per-call benchmark on Apple M3 Pro, n=10k populated, 3 quantiles: Snapshot+ValuesAtQuantiles 815 ns/op 2912 B/op 10 allocs/op QueryQuantiles 288 ns/op 0 B/op 0 allocs/op
f26b5e3 to
8b8db39
Compare
|
This is a question I haven't thought much about - but do you think there's any reason we need snapshots to do quantile estimation? If snapshots are inherently inconsistent, and there's no locks on the structure, should we just move fully over to direct quantile querying? Does this affect the API that prometheus users are familiar with in any meaningful way? |
Im not sure if we need snapshots to do quantile estimations, but i would assume we still want to support doing quantile estimations of snapshots right? |
|
Yeah that seems to make sense to me |
angles-n-daemons
left a comment
There was a problem hiding this comment.
Nice cleanup — sliding window is a clean improvement over the scratch-slice version. A few thoughts inline. Two whole-PR notes:
gofmt -l .flags both new files (spaces instead of tabs) —gofmt -wshould sort it.- No
-race-friendly test exercisingRecordandQueryQuantilesconcurrently. Even a smoke test would help lock in the lock-free contract against future regressions.
| // qs MUST be sorted in ascending order. dst must have cap >= len(qs); pass a | ||
| // stack-backed slice (e.g. var buf [4]float64; h.QueryQuantiles(buf[:0], qs)) | ||
| // to make the call fully alloc-free. | ||
| func (h *Histogram) QueryQuantiles(dst, qs []float64) []float64 { |
There was a problem hiding this comment.
Could you rename to fit alongside the existing ValueAtQuantile / ValuesAtQuantiles? Something like ValuesAtQuantilesInto(dst, qs) keeps the relationship to the snapshot version obvious and follows the Go convention of *Into for caller-provided buffers. Two different verbs for the same conceptual op makes the API harder to discover.
There was a problem hiding this comment.
Done in 47a1510 — renamed to ValuesAtQuantilesInto. Agreed the verb mismatch made the relationship hard to find in godoc.
| } | ||
| dR = (currD + nextD) / 2.0 | ||
| } | ||
| // dR remains 0 at i==n-1 to match existing behavior. |
There was a problem hiding this comment.
Worth fixing in the same PR rather than mirroring it. The dR=0 quirk biases the reported value low by ~20% of bucket width or more in the rightmost bucket — and p99 in long-tailed latency distributions almost always lands exactly there. If the natural near-term consumer is something like AnomalyDetector (per-statement comparison against reported p99), low-biased p99 means more false-positive slow-query flags in production.
One-line fix in quantile.go (add boundaryDensity[n] = avgDensity[n-1] after the loop), plus mirroring it here (set dR = currD when i == n-1, parallel to the existing i == 0 → dL = currD case). The "exact parity" tests would need to update, but they're effectively pinned to the wrong invariant.
There was a problem hiding this comment.
Fixed in f4b1ed8 (root fix in quantile.go for both ValueAtQuantile and ValuesAtQuantiles, mirrored in ValuesAtQuantilesInto). Worth noting the existing snapshot quantile tests in histogram_test.go didn't actually pin the old behavior — they all still pass — so the "tests pinned to the wrong invariant" concern didn't materialize. Nothing in the tree was depending on the bias.
| got := h.QueryQuantiles(buf[:0], qs) | ||
|
|
||
| for i, q := range qs { | ||
| if math.Abs(got[i]-want[i]) > 1e-6*math.Max(1, math.Abs(want[i])) { |
There was a problem hiding this comment.
Tolerance is 1e-6 here, but the PR description claims bit-for-bit parity. Both paths feed the same arguments to trapezoidalSolve in the same order, so equality should hold exactly — would tighten this to got[i] != want[i]. (Or if you keep the tolerance, soften the parity claim in the description.)
…ation The boundary-density loops in ValueAtQuantile/ValuesAtQuantiles never set boundaryDensity[n] — the case n: branch was unreachable because for i := range n only iterates 0..n-1. As a result, the right edge of the rightmost bucket was treated as having density zero, biasing interpolated values low (~20% of bucket width or more) right where p99 of long-tailed latency distributions lands. Set boundaryDensity[n] = avgDensity[n-1] explicitly, parallel to the existing boundaryDensity[0] = avgDensity[0]. Mirror the same fix in ValuesAtQuantilesInto (dR = currD at i == n-1) so the two paths stay in parity. Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Fits the existing Snapshot API surface (ValueAtQuantile, ValuesAtQuantiles) and follows the Go convention of an *Into suffix for functions that write into a caller-provided buffer. Two different verbs for the same conceptual operation made the relationship harder to find in godoc. Also gofmt the test file (spaces -> tabs) — the rename touches enough of it that bundling the format fix here keeps later commits clean. Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Both ValuesAtQuantilesInto and Snapshot.ValuesAtQuantiles call trapezoidalSolve with identical arguments in the same order, so the parity test can assert exact equality instead of an epsilon tolerance. Add TestValuesAtQuantilesIntoConcurrentWithRecord: 4 writer goroutines and 4 reader goroutines running for 100ms, intended for -race. Pins the lock-free contract so a future regression (e.g. accidentally sharing scratch state across callers) gets caught by CI. Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
|
Both whole-PR notes addressed. Note: |
angles-n-daemons
left a comment
There was a problem hiding this comment.
Nice, walked through the commits — all four look good, tests pass under -race.
Ah good to know on the dR test invariant — i was assuming the parity tests had been pinned to the buggy behavior, but you're right that they weren't tight enough to lock it in. Fix landed clean.
Opened #6 for the benchmark_test.go gofmt fix to keep it out of this PR's scope.
Approving.
Summary
Adds
Histogram.QueryQuantiles(dst, qs []float64) []float64, an alloc-free quantile-read path that reads atomic counters in place rather than materializing aSnapshot. The result reflects the same eventual consistencySnapshot()already accepts.This targets hot-path consumers that read quantiles per recorded sample (e.g. an "is this query slow?" detector keyed by SQL fingerprint), where the existing
Snapshot+ValuesAtQuantilespath's per-call allocations dominate the cost.Approach
(prev, curr, next)counts to compute trapezoidal boundary densities on the fly, avoiding theavgDensity/boundaryDensityscratch slices used byValuesAtQuantiles.qsmust be sorted ascending. With a stack-backeddst(e.g.var buf [4]float64; h.QueryQuantiles(buf[:0], qs)) the call is fully alloc-free.dR=0) matches the existingValuesAtQuantilesfor parity. Likely a separate, pre-existing bug; preserving it here keeps results bit-for-bit identical and is best fixed in its own PR.Recordis unchanged; no contention regression.Numbers (Apple M3 Pro)
3 quantiles, n=10k populated:
Snapshot+ValuesAtQuantilesQueryQuantiles2 quantiles, n=1k and n=100k both: 779 → 282 ns/op, 2840 → 0 B/op, 9 → 0 allocs.
Test plan
TestQueryQuantilesAgreesWithSnapshot— 12 distributions × 10 quantiles, exact agreement withSnapshot.ValuesAtQuantilesTestQueryQuantilesEdges— empty, only-underflow, only-overflow, empty-qsTestQueryQuantilesAllocFree— verifies 0 allocs/op whendsthas capBenchmarkQueryPathandBenchmarkQueryPathThreeQuantilesfor the numbers abovego test ./...passes