fix: add classic histogram buckets to perf insights metric #3027

Merged
miparnisari merged 7 commits into authzed:main from ivanauth:fix-perfinsights-classic-buckets
Apr 14, 2026

@ivanauth ivanauth commented Apr 8, 2026

Description

The perf insights histogram was defined with Buckets: nil and NativeHistogramBucketFactor: 1.1, which produces no classic buckets, only le="+Inf". This broke histogram_quantile queries and the Perf Insights UI.

Adds explicit classic bucket boundaries matching defaults.go and updates the test to assert buckets are populated.

Testing

  1. Start the stack
    docker compose up --build -d

  2. Wait for healthy
    docker compose ps
    Both spicedb-1 and spicedb-2 should show (healthy).

  3. Write a schema and generate traffic
    zed context set local localhost:50051 foobar --insecure
    zed schema write schema.zed --insecure
    # send some check requests
    for i in $(seq 1 20); do zed permission check document:1 view user:1 --insecure; done

  4. Verify classic buckets on metrics endpoint
    curl -s http://localhost:$(docker compose port spicedb-1 9090 | cut -d: -f2)/metrics | grep api_shape_latency_seconds_bucket
    Should show 20 le= boundaries (0.001 through 10) plus +Inf.

  5. Verify histogram_quantile works in Prometheus (localhost:9091)
    histogram_quantile(0.99, sum(rate(spicedb_perf_insights_api_shape_latency_seconds_bucket[5m])) by (le, api_kind))
    Should return real values (e.g. ~0.009 for CheckPermission), not NaN or 5.

  6. Verify native histograms aren't broken
    Check Prometheus logs for errors:
    docker compose logs prometheus | grep -i error
    Should be clean: no histogram-related errors.

  7. Cleanup
    docker compose down
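
The manual check in step 4 can also be sketched as a small program that collects the distinct le values from the text exposition. This is a minimal sketch using only the Go standard library: the helper names leValue and leBoundaries and the sample lines are hypothetical, and the parsing is deliberately naive (it does not handle escaped quotes inside label values).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// leValue extracts the value of the le label from one exposition line.
// It matches only a label named exactly "le", not one that ends in "le".
func leValue(line string) (string, bool) {
	for _, key := range []string{`{le="`, `,le="`} {
		if start := strings.Index(line, key); start >= 0 {
			rest := line[start+len(key):]
			if end := strings.Index(rest, `"`); end >= 0 {
				return rest[:end], true
			}
		}
	}
	return "", false
}

// leBoundaries returns the set of distinct le values seen for the given
// bucket metric across all label combinations.
func leBoundaries(exposition, metric string) map[string]bool {
	seen := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, metric+"{") {
			continue
		}
		if v, ok := leValue(line); ok {
			seen[v] = true
		}
	}
	return seen
}

func main() {
	const metric = "spicedb_perf_insights_api_shape_latency_seconds_bucket"
	// Hypothetical sample lines; real output comes from the /metrics endpoint.
	sample := strings.Join([]string{
		metric + `{api_kind="CheckPermission",le="0.001"} 3`,
		metric + `{api_kind="CheckPermission",le="0.003"} 12`,
		metric + `{api_kind="CheckPermission",le="+Inf"} 20`,
	}, "\n")
	bounds := leBoundaries(sample, metric)
	fmt.Println("distinct le values:", len(bounds), "has +Inf:", bounds["+Inf"])
}
```

Against the real endpoint, the set should contain the finite boundaries described in step 4 plus +Inf; a set containing only +Inf would indicate the original bug.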

@github-actions github-actions bot added the area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) label Apr 8, 2026

codecov bot commented Apr 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.57%. Comparing base (bd7cb9c) to head (72cc8c8).
⚠️ Report is 1 commits behind head on main.

❌ Your project status has failed because the head coverage (73.57%) is below the target coverage (75.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3027      +/-   ##
==========================================
- Coverage   73.62%   73.57%   -0.04%     
==========================================
  Files         497      497              
  Lines       59888    59888              
==========================================
- Hits        44085    44059      -26     
- Misses      12630    12648      +18     
- Partials     3173     3181       +8     


NativeHistogramBucketFactor: 1.1,
NativeHistogramMaxBucketNumber: 100,
Buckets: []float64{
.001, .003, .006, .010, .018, .024, .032, .042, .056, .075, .100, .178, .316, .562, 1, 5,

Did you copy these numbers from somewhere? If so, can you leave a comment?

Yes, they are from this line:

.001, .003, .006, .010, .018, .024, .032, .042, .056, .075, .100, .178, .316, .562, 1, 5,

Added the comment.

@ivanauth ivanauth requested a review from a team as a code owner April 8, 2026 16:54
miparnisari previously approved these changes Apr 8, 2026
@ivanauth ivanauth force-pushed the fix-perfinsights-classic-buckets branch from 52c7667 to 1b0788a Compare April 8, 2026 22:41

miparnisari commented Apr 13, 2026

Note that native histograms are enabled in the docker-compose file:

- "--enable-feature=native-histograms"

Comment thread CHANGELOG.md Outdated
miparnisari previously approved these changes Apr 13, 2026
@miparnisari miparnisari left a comment

LGTM. Tested with --enable-feature=native-histograms commented out.

@tstirrat15 tstirrat15 force-pushed the fix-perfinsights-classic-buckets branch from b590e1b to ec44dcd Compare April 13, 2026 19:46
@miparnisari miparnisari enabled auto-merge (squash) April 14, 2026 15:39
@miparnisari miparnisari merged commit 9a27f75 into authzed:main Apr 14, 2026
43 of 45 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 14, 2026

vroldanbet commented Apr 15, 2026

@ivanauth @miparnisari these metrics have very high cardinality and are expensive to run in Prometheus. That was already an issue with native histograms, though we knowingly accepted it because native histograms produce fewer time series than classic ones. Now we have the worst of both worlds: this is going to significantly increase the number of time series and put added pressure on Prometheus.

I don't understand this change; it doesn't seem correct to me. Why would one need both native and classic histograms? What would be the point of native histograms then? Native histograms do support quantile computations.

EDIT: I'm actually unsure whether both will be emitted by default, so it's better to double-check. If this is trying to support environments that have no native-histogram support (older Prometheus versions), I'd argue it has to be opt-in, since, again, this is going to be a very expensive histogram to store.
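
The series-count concern can be made concrete with a little arithmetic. A classic histogram exposes one _bucket series per le boundary (including +Inf) plus _sum and _count for every label combination, while a native histogram is a single series per label combination. The sketch below assumes the 20 finite boundaries mentioned in the testing steps; the function names and the count of 1,000 label combinations are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// classicSeries counts the series a classic histogram exposes: one _bucket
// series per finite boundary, one for +Inf, plus _sum and _count, for each
// label combination.
func classicSeries(finiteBuckets, labelSets int) int {
	return (finiteBuckets + 1 + 2) * labelSets
}

// nativeSeries counts the series for a native histogram: one per label
// combination.
func nativeSeries(labelSets int) int {
	return labelSets
}

func main() {
	const finiteBuckets = 20 // from the testing steps in this PR
	const labelSets = 1000   // hypothetical number of label combinations
	fmt.Println("classic:", classicSeries(finiteBuckets, labelSets))
	fmt.Println("native:", nativeSeries(labelSets))
}
```

Whether both representations end up stored depends on how the scraping Prometheus is configured, which is the point the comment above asks to double-check.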
