
[metrics] Add a more precise histogram implementation #4601

Merged: 6 commits into main from andrey-23, Sep 16, 2022
Conversation

@andll (Contributor) commented Sep 13, 2022

The Prometheus histogram implementation requires specifying buckets in advance, so the reported histogram is not very precise.

The implementation in this PR takes a different approach and calculates a 'true' histogram by sorting the data points.

This approach is more expensive, but it provides better data and does not require specifying buckets in advance. For things like aggregating network latencies this seems like a better trade-off: it shows outliers better (e.g. there is no 'cut-off' on the histogram) and calculates more exact values.

There are more details on caveats and potential usages in the documentation for the HistogramVec struct.

This PR also replaces the histogram in only one place (SafeClient) for now; hopefully we can see some better data from that.

The result of this PR is that the latency histogram becomes much more granular and trustworthy.
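To make the approach concrete, here is a minimal sketch (an illustration of the idea only, not this PR's actual code; the function name is made up) of computing an exact percentile by sorting the recorded points:

```rust
/// Returns the exact value at the given percentile (0..=100) by sorting
/// all recorded points, i.e. the 'true histogram' idea described above.
fn exact_percentile(mut data: Vec<u64>, pct: u32) -> Option<u64> {
    if data.is_empty() {
        return None;
    }
    data.sort_unstable();
    // Index of the requested percentile, clamped to the last element.
    let idx = ((data.len() * pct as usize) / 100).min(data.len() - 1);
    data.get(idx).copied()
}
```

The answer is exact for the captured samples, at the cost of buffering and sorting every point; this is the trade-off discussed in the review below.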

Previously we've seen a problem where the real p90 was largely skewed by an unfortunate bucket choice in the Prometheus histogram:

[image: latency graph with the old bucketed Prometheus histogram]

Note how all latencies on this graph gravitate to constant values.

With the adapted histograms (this PR):

[image: the same latency graph, now much more granular]

@sadhansood (Contributor) left a comment:

Looks great, thanks!

@andll disabled auto-merge September 15, 2022 22:27
@andll enabled auto-merge (squash) September 15, 2022 22:27
@velvia (Contributor) left a comment:

Hey @andll, I like the innovation you show by having a solution that dynamically calculates the quantiles. The obvious disadvantage is that fixed percentiles cannot be aggregated accurately, which means we can't get meaningful aggregated percentiles across nodes.

I don't like that all the labels are cloned per point, which makes this solution very, very expensive.

To see how viable this is, I'm wondering how long the cycle interval is, and the reporting interval.

It's quite common to measure latencies over minutes. So for example, at 10k TPS (yeah, I know, I'm thinking of the future) with a 5-minute sampling interval, you are looking at 5 × 60 × 10000 = 3 million points. A 100k cap would hold only 1/30th of the samples in the above scenario, making this approach not viable.

What I suggest is that, instead of keeping all the samples, we use a digest structure that is explicitly designed to measure distributions. There are several of these, but the t-digest is a widely used one that I can recommend.

I also suggest using Prometheus summaries instead of disparate gauges, since this is the exact use case for summaries.
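A tiny, self-contained illustration of the aggregation problem mentioned above (the numbers are made up for the example):

```rust
// p90 of a single node's samples, by sorting (panics on empty input).
fn p90(mut v: Vec<u64>) -> u64 {
    v.sort_unstable();
    v[(v.len() * 90 / 100).min(v.len() - 1)]
}

fn main() {
    let node_a = vec![1, 1, 1, 1, 1, 1, 1, 1, 1, 100];
    let node_b = vec![50; 10];
    let avg_of_p90s = (p90(node_a.clone()) + p90(node_b.clone())) / 2;
    // The true p90 of the pooled samples differs from any simple
    // combination of the per-node p90s.
    let pooled: Vec<u64> = node_a.into_iter().chain(node_b).collect();
    println!("avg of p90s = {}, pooled p90 = {}", avg_of_p90s, p90(pooled));
    // Prints: avg of p90s = 75, pooled p90 = 50
}
```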

```rust
/// Unlike the histogram from the prometheus crate, this histogram does not require specifying buckets.
/// It works by calculating a 'true' histogram, aggregating and sorting the values.
///
/// The values are reported into a Prometheus gauge with the requested labels and an additional dimension
```
Reviewer comment:

We should use the Prometheus summary, which is designed for this exact purpose: reporting percentiles:
https://www.robustperception.io/how-does-a-prometheus-summary-work/

The major downsides are that percentiles cannot be properly aggregated (for example, we cannot aggregate these percentiles from individual nodes into one for all nodes; there isn't a good way to combine percentiles like that) and that the percentiles are fixed.

```rust
///
/// On the bright side, this histogram exports less data to Prometheus compared to prometheus::Histogram:
/// it exports each requested percentile into a separate Prometheus gauge, while the original
/// implementation creates a gauge per bucket.
```
Reviewer comment:

More accurately, Prom histograms are a set of counters in two dimensions. This is an important difference: Prom counters are designed to be resilient to restarts.
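To make the "set of counters" point concrete, here is a minimal sketch (a simplified model for intuition, not the prometheus crate's actual API) of a bucketed histogram as monotonically increasing per-bucket counters:

```rust
/// Simplified model of a Prometheus-style histogram: one cumulative
/// counter per `le` bucket boundary, plus a running sum and count.
struct BucketHistogram {
    upper_bounds: Vec<f64>, // bucket boundaries, sorted ascending
    counts: Vec<u64>,       // per-bucket counts; they only ever increase
    sum: f64,
    count: u64,
}

impl BucketHistogram {
    fn observe(&mut self, v: f64) {
        self.sum += v;
        self.count += 1;
        // Buckets are cumulative: an observation increments every bucket
        // whose upper bound it falls under.
        for (bound, c) in self.upper_bounds.iter().zip(self.counts.iter_mut()) {
            if v <= *bound {
                *c += 1;
            }
        }
    }
}
```

Because every field is monotonic between restarts, Prometheus can apply rate() and sum these across scrapes and restarts, which plain gauges do not allow.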


```rust
impl Histogram {
    pub fn report(&self, v: Point) {
        match self.channel.try_send((self.labels.clone(), v)) {
```
Reviewer comment:

Can we avoid cloning the labels on each report and instead send a u64 ID or something? Cloning means several allocations totalling 500 bytes to 1 KB per point, which is slow and also memory-hungry.

@andll (Author) replied:

Label is an Arc, so this is essentially sending a u64.
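For context, a small generic-Rust illustration (not this PR's types; the label names are made up) of why cloning an Arc is cheap: it bumps an atomic reference count and copies a pointer rather than deep-copying the labels:

```rust
use std::sync::Arc;

fn main() {
    let labels: Arc<Vec<String>> = Arc::new(vec!["method".into(), "address".into()]);
    // `clone` only increments the refcount; the strings themselves are shared.
    let sent = labels.clone();
    assert!(Arc::ptr_eq(&labels, &sent));
}
```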

```rust
        }
    }
}
if count > MAX_POINTS {
```
Reviewer comment:

We should consider using a rotating/ring buffer here. Newer data automatically overwrites older data, which makes for a nice sliding window. There is no need to error here, because in the long run we will always overflow, unless there is some periodic reset, which I didn't see.
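A minimal sketch of the ring-buffer idea suggested here (illustrative only; the type and method names are made up):

```rust
/// Fixed-capacity buffer where new samples overwrite the oldest,
/// giving a sliding window that never needs to error on overflow.
struct SampleRing {
    buf: Vec<u64>,
    next: usize,
    filled: bool,
}

impl SampleRing {
    fn with_capacity(cap: usize) -> Self {
        SampleRing { buf: vec![0; cap], next: 0, filled: false }
    }

    fn push(&mut self, v: u64) {
        self.buf[self.next] = v; // overwrite the oldest slot
        self.next = (self.next + 1) % self.buf.len();
        if self.next == 0 {
            self.filled = true;
        }
    }

    /// Samples currently in the window (not ordered by arrival time).
    fn samples(&self) -> &[u64] {
        if self.filled { &self.buf } else { &self.buf[..self.next] }
    }
}
```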

Reviewer follow-up:

Oh never mind, it resets every cycle.

```rust
data.sort_unstable();
for pct1000 in self.percentiles.iter() {
    let index = Self::pct1000_index(data.len(), *pct1000);
    let point = *data.get(index).unwrap();
```
Reviewer comment:

Isn't data a Vec? I don't see a get method for data.
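(For reference, `Vec<T>` auto-dereferences to the slice type `[T]`, which is where `get` comes from, so the call does compile:)

```rust
fn main() {
    let data: Vec<u64> = vec![10, 20, 30];
    // `get` is a slice method reached through Deref, returning Option<&u64>.
    assert_eq!(data.get(1), Some(&20));
    assert_eq!(data.get(9), None);
}
```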

```rust
self.known_labels.insert(label.clone());
reset_labels.remove(&label);
assert!(!data.is_empty());
data.sort_unstable();
```
Reviewer comment:

How often will report get called?

@andll (Author) commented Sep 16, 2022:

> I don't like that all the labels are cloned per point, which makes this solution very, very expensive.

As I replied in other comments, labels are Arc<>, so I don't think this is expensive.

For the time series format: yes, the format changes, as mentioned in the PR description. To be honest, I don't see how this is a practical problem; we can add other percentiles to the defaults if needed, and adding additional percentiles is not expensive.

@velvia I understand that there are probably endless ways to improve histograms.

I believe that what I use here for safe_client latencies is better than what we currently have with fixed Prometheus buckets: it provides more precise results, and with the load we have on those counters the solution works just fine. Do you see any practical issues with this implementation for this particular use case?

Perhaps it is not a universal solution, and regular Prometheus histograms are still available for other use cases where we have a lot of data points or other reasons preventing usage of the histogram from this PR. People can choose what they want; I don't even mind if Prometheus histograms remain the default choice.

Since this seems to be a working solution that is better than what we had before and solves a practical problem we encountered (misunderstanding latencies on Prometheus graphs), I propose to just move forward with it. When someone has time to implement better histograms we can do that, and I don't mind if this code gets deleted in a month. Working beats perfect.

@velvia (Contributor) left a comment:

I approve it for now, but we should acknowledge and document the drawbacks of the approach. Namely:

  1. percentiles cannot be accurately aggregated across nodes
  2. accuracy will degrade once more than the sample limit is accumulated

@changeset-bot commented Sep 16, 2022:

⚠️ No Changeset found

Latest commit: dda9f45

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


@andll merged commit 279bfe6 into main Sep 16, 2022
@andll deleted the andrey-23 branch September 16, 2022 17:22