fix: add classic histogram buckets to perf insights metric#3027
fix: add classic histogram buckets to perf insights metric#3027miparnisari merged 7 commits intoauthzed:mainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. ❌ Your project status has failed because the head coverage (73.57%) is below the target coverage (75.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #3027 +/- ##
==========================================
- Coverage 73.62% 73.57% -0.04%
==========================================
Files 497 497
Lines 59888 59888
==========================================
- Hits 44085 44059 -26
- Misses 12630 12648 +18
- Partials 3173 3181 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| NativeHistogramBucketFactor: 1.1, | ||
| NativeHistogramMaxBucketNumber: 100, | ||
| Buckets: []float64{ | ||
| .001, .003, .006, .010, .018, .024, .032, .042, .056, .075, .100, .178, .316, .562, 1, 5, |
There was a problem hiding this comment.
did you copy these numbers from somewhere? if so, can you leave a comment?
There was a problem hiding this comment.
Yes, they are from
spicedb/pkg/cmd/server/defaults.go
Line 508 in c069f97
52c7667 to
1b0788a
Compare
|
Note that native histograms are enabled in the docker-compose file: Line 132 in bd7cb9c |
miparnisari
left a comment
There was a problem hiding this comment.
LGTM. Tested with --enable-feature=native-histograms commented out.
b590e1b to
ec44dcd
Compare
|
@ivanauth @miparnisari these metrics have very high cardinality and are expensive to run in Prometheus. It was already an issue for native histograms, despite our knowingly accepting it because native histograms lead to fewer time series than classic. Now we have the worst of both worlds, and it's going to lead to a fundamental increase in time-series and added pressure to prometheus. I don't understand this change; it doesn't seem correct to me. Why would one need to add both native and classic histograms? What would be the point of native histograms then? Native Histograms do support quantile computations. EDIT: I'm actually unsure whether both will be emitted by default - better double-check. If what this is trying to solve is to support environments that have no native-histogram support (older prom versions), I'd argue this has to be opt-in, since again, it's going to be a very expensive histogram to store. |
Description
Perf insights histogram was defined with
Buckets: nil+NativeHistogramBucketFactor: 1.1, which produces no classic buckets - onlyle="+Inf". This brokehistogram_quantilequeries and the Perf Insights UI.Adds explicit classic bucket boundaries matching
defaults.goand updates the test to assert buckets are populated.Testing
Start the stack
docker compose up --build -d
Wait for healthy
docker compose ps
Both spicedb-1 and spicedb-2 should show (healthy).
Write a schema and generate traffic
zed context set local localhost:50051 foobar --insecure
zed schema write schema.zed --insecure
//send some check requests
for i in $(seq 1 20); do zed permission check document:1 view user:1 --insecure; done
Verify classic buckets on metrics endpoint
curl -s http://localhost:$(docker compose port spicedb-1 9090 | cut -d: -f2)/metrics | grep
api_shape_latency_seconds_bucket
Should show 20 le= boundaries (0.001 through 10) plus +Inf.
Verify histogram_quantile works in Prometheus (localhost:9091)
histogram_quantile(0.99, sum(rate(spicedb_perf_insights_api_shape_latency_seconds_bucket[5m])) by (le,
api_kind))
Should return real values (e.g. ~0.009 for CheckPermission), not NaN or 5.
Verify native histograms aren't broken
Check Prometheus logs for errors:
docker compose logs prometheus | grep -i error
Should be clean — no histogram-related errors.
Cleanup
docker compose down