use prom server registry for load generator & adjust buckets #4581

Merged (2 commits into main, Sep 13, 2022)

Conversation

@longbowlu (Contributor) commented Sep 12, 2022:

Use the prom server registry for the load generator, so that NetworkAuthorityClient's metrics can be reported to the prom server.

Also adjust the buckets again, following https://prometheus.io/docs/practices/histograms/#errors-of-quantile-estimation.
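For reference, a minimal sketch of what registering the load generator's latency histogram against the prom server's shared Prometheus registry can look like; the function, metric, and label names here are illustrative, not the PR's exact code:

```rust
use prometheus::{HistogramOpts, HistogramVec, Registry};

const LATENCY_SEC_BUCKETS: &[f64] = &[
    0.01, 0.05, 0.1, 0.5, 1., 2.5, 5., 10., 20., 30., 40., 50., 60., 90.,
];

// Hypothetical helper: build the latency histogram with explicit buckets and
// register it with the registry the prom server scrapes, instead of the
// default/global registry.
fn register_bench_metrics(registry: &Registry) -> HistogramVec {
    let latency_s = HistogramVec::new(
        HistogramOpts::new("latency_s", "Transaction latency in seconds")
            .buckets(LATENCY_SEC_BUCKETS.to_vec()),
        &["workload"],
    )
    .unwrap();
    registry
        .register(Box::new(latency_s.clone()))
        .unwrap();
    latency_s
}
```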

@longbowlu changed the title from "use prom server registry for load generator" to "use prom server registry for load generator & adjust buckets" on Sep 13, 2022
@andll (Contributor) left a comment:

Thanks for updating buckets @longbowlu!

```diff
@@ -43,7 +43,9 @@ pub struct BenchMetrics {
     pub latency_s: HistogramVec,
 }
 
-const LATENCY_SEC_BUCKETS: &[f64] = &[0.01, 0.1, 1., 2., 3., 5., 10., 20., 30., 60., 180.];
+const LATENCY_SEC_BUCKETS: &[f64] = &[
+    0.01, 0.05, 0.1, 0.5, 1., 2.5, 5., 10., 20., 30., 40., 50., 60., 90.,
+];
```
A contributor commented:

  1. Are we sure we don't need a bucket beyond 90?
  2. I think it should be sufficient to use the same buckets as for 1-10, i.e. 10, 25, 50, 100.
  3. Technically, if you want to minimize the amount of error, the buckets should be strictly geometric.
     It doesn't really matter exactly where the bucket boundaries are, because Prom will linearly interpolate anyway.

The geometric series for 4 buckets between 10 and 100 would be:
10, 18, 32, 56, 100

The reason is that mathematically 10-20 is a much bigger gap (2x) than 50-60, so there will be a much higher rate of relative error in the 10-20 bucket.
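For illustration, a minimal sketch of how such a strictly geometric series can be computed; the helper is hypothetical and not part of this PR:

```rust
/// Geometric bucket boundaries: each boundary is a constant factor times the
/// previous one, so every bucket has the same relative width and the relative
/// error of quantile estimation is spread evenly across buckets.
fn geometric_buckets(lo: f64, hi: f64, n_buckets: usize) -> Vec<f64> {
    let ratio = (hi / lo).powf(1.0 / n_buckets as f64);
    (0..=n_buckets).map(|i| lo * ratio.powi(i as i32)).collect()
}

// geometric_buckets(10.0, 100.0, 4) ≈ [10.0, 17.8, 31.6, 56.2, 100.0],
// i.e. roughly the 10, 18, 32, 56, 100 series suggested above. The
// `prometheus` crate's `exponential_buckets(start, factor, count)` helper
// produces the same kind of series.
```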

A contributor commented:

Also, a global const would be good for LATENCY_SEC_BUCKETS. Maybe we should put some shared bucket definitions in Mysten-infra; maybe I'll do this.

A contributor commented:

We may be interested in different buckets for different metrics.
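As a hedged sketch of what a shared definition with per-metric overrides could look like (the module path and the second constant are hypothetical, purely for illustration):

```rust
// Hypothetical shared module, e.g. somewhere in mysten-infra.
pub mod histogram_buckets {
    /// Default latency buckets in seconds, reusable across binaries.
    pub const LATENCY_SEC: &[f64] = &[
        0.01, 0.05, 0.1, 0.5, 1., 2.5, 5., 10., 20., 30., 40., 50., 60., 90.,
    ];

    /// Metrics with a different dynamic range can still declare their own buckets.
    pub const CHECKPOINT_LAG_SEC: &[f64] = &[1., 5., 10., 30., 60., 300.];
}
```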

A contributor commented:

I thought a little bit about it and yes, I would even say we don't need buckets beyond ~10s.

However, I also think that it is hard to come up with a decent set of buckets. I do have a PR in the works that will try to calculate real percentiles rather than requiring buckets to be specified up front. It is not going to be super trivial, but I think it should work. In the meantime, I thought it would be ok to merge some improvement to the bucket list, in case we get another deployment that gives us some better numbers.

A contributor commented:

Note that we already have automatic histograms for all spans, so we shouldn't need to explicitly add histograms where we already have spans - we should instead just add more spans. The span histograms already use exponential buckets.

This is preferred because we will need spans for tracing anyway.

I'll go through and clean this up when I have a chance.
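A minimal sketch of what "just add more spans" could look like with the `tracing` crate; the layer that turns span durations into exponential-bucket histograms is assumed to be wired up elsewhere in the codebase, and the function here is hypothetical:

```rust
use tracing::instrument;

// With the span-to-histogram layer installed, wrapping the work in a span is
// enough to get a latency histogram; no hand-rolled HistogramVec is needed.
#[instrument(name = "load_gen_submit", skip_all)]
async fn timed_submit(payload: Vec<u8>) {
    // Placeholder for the actual work, e.g. handing the payload to the
    // network client.
    let _ = payload;
}
```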

@longbowlu enabled auto-merge (squash) September 13, 2022 20:59
@longbowlu merged commit 09e052a into main Sep 13, 2022
@longbowlu deleted the use-registry-for-load-gen branch September 13, 2022 21:20