transient stats prototype #3504
@mrtracy definitely needs to take a look at this. Review status: 0 of 18 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. server/status.go, line 94 [r1]; util/metric/metric.go, line 41 [r1]: After looking at the rest of the code, I'm wondering why the name isn't attached to the
That would also resolve the issue with util/metric/metric.go, line 265 [r1]; util/metric/metric.go, line 326 [r1].
Review status: 0 of 18 files reviewed at latest revision, 16 unresolved discussions, some commit checks failed. server/node.go, line 472 [r1]; server/status/feed_test.go, line 305 [r1]; server/status/monitor.go, line 59 [r1]; server/status/monitor.go, line 217 [r1]; util/metric/metric.go, line 19 [r1]; util/metric/metric.go, line 54 [r1]; util/metric/metric.go, line 91 [r1]; util/metric/metric.go, line 97 [r1]; util/metric/metric.go, line 177 [r1]; util/metric/metric.go, line 179 [r1]; util/metric/metric.go, line 263 [r1]; util/metric/metric.go, line 326 [r1]; util/metric/metric.go, line 327 [r1].
Review status: 0 of 18 files reviewed at latest revision, 15 unresolved discussions. server/node.go, line 472 [r1]; server/status.go, line 94 [r1]; server/status/monitor.go, line 59 [r1]; server/status/monitor.go, line 217 [r1]; util/metric/metric.go, line 19 [r1]; util/metric/metric.go, line 41 [r1]; util/metric/metric.go, line 54 [r1]; util/metric/metric.go, line 91 [r1]; util/metric/metric.go, line 97 [r1]; util/metric/metric.go, line 177 [r1]; util/metric/metric.go, line 179 [r1]; util/metric/metric.go, line 263 [r1]; util/metric/metric.go, line 265 [r1]; util/metric/metric.go, line 326 [r1]; util/metric/metric.go, line 327 [r1].
Review status: 0 of 18 files reviewed at latest revision, 13 unresolved discussions. server/status/monitor.go, line 59 [r1]; server/status/monitor.go, line 217 [r1]; util/metric/metric.go, line 41 [r1].
Review status: 0 of 19 files reviewed at latest revision, 13 unresolved discussions. util/metric/metric.go, line 97 [r1].
Reviewed 5 of 18 files at r1, 4 of 10 files at r2, 10 of 10 files at r3. sql/executor.go, line 235 [r3]:

	defer func(start time.Time) {
		e.latency.RecordValue(time.Since(start).Nanoseconds())
	}(time.Now())

util/metric/registry.go, line 46 [r3]; util/metric/registry.go, line 48 [r3]; util/metric/registry.go, line 69 [r3].
Review status: 15 of 19 files reviewed at latest revision, 5 unresolved discussions. server/status/monitor.go, line 59 [r1]; server/status/monitor.go, line 217 [r1]; util/metric/metric.go, line 179 [r1].
Review status: 8 of 19 files reviewed at latest revision, 5 unresolved discussions. server/status/monitor.go, line 217 [r1]; util/metric/metric.go, line 179 [r1].
Branch updated from b2c8370 to adce909.
LGTM. Review status: 8 of 22 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed. util/metric/metric.go, line 126 [r5].
LGTM. Review status: 8 of 22 files reviewed at latest revision, 4 unresolved discussions. server/status/monitor.go, line 58 [r6].
Branch updated from b59dbb9 to 1553899.
Reviewed 2 of 18 files at r1, 2 of 10 files at r2, 4 of 10 files at r3, 12 of 13 files at r5, 1 of 1 files at r6, 1 of 1 files at r7. server/status/monitor.go, line 58 [r6].

	@@ -276,8 +279,14 @@ func TestNodeStatusRecorder(t *testing.T) {
			generateStoreData(2, "capacity.available", 100, 75),

			// Node stats.
			generateNodeData(1, "calls.success", 100, 2),
			generateNodeData(1, "calls.error", 100, 1),
			generateNodeData(1, "exec.successcount", 100, 2),
These names need a delimiter, and I still think the metric package should add one. It doesn't need to be the same delimiter if that would alleviate your concern about codifying our naming conventions: `exec.success@count`, `exec.success@1h`.
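To make the delimiter argument concrete, here is a minimal sketch, assuming a hypothetical `@` separator between a metric's base name and the suffix of each derived series (the reviewer's `exec.success@count` example). `seriesName`, `splitSeriesName` and `exportedNames` are illustrative helpers, not part of `util/metric`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// suffixDelimiter separates a base metric name from the suffix naming one
// derived series. The "@" choice follows the review comment; the constant
// itself is hypothetical.
const suffixDelimiter = "@"

// seriesName joins a base metric name and a suffix with the delimiter.
func seriesName(base, suffix string) string {
	return base + suffixDelimiter + suffix
}

// splitSeriesName recovers the base name and suffix. ok is false when the
// name carries no delimiter, in which case the whole name is returned as base.
func splitSeriesName(name string) (base, suffix string, ok bool) {
	i := strings.LastIndex(name, suffixDelimiter)
	if i < 0 {
		return name, "", false
	}
	return name[:i], name[i+len(suffixDelimiter):], true
}

// exportedNames lists the series one metric would export for the given
// suffixes, sorted for stable output. The suffix set is illustrative only.
func exportedNames(base string, suffixes []string) []string {
	names := make([]string, 0, len(suffixes))
	for _, s := range suffixes {
		names = append(names, seriesName(base, s))
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(exportedNames("exec.success", []string{"count", "1h"}))
	base, suffix, ok := splitSeriesName("exec.success@1h")
	fmt.Println(base, suffix, ok)
}
```

Because the base name may itself contain dots, a distinct separator lets a consumer split `exec.success@count` unambiguously, which is not possible for `exec.successcount`.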
Added a few comments, although I do approve of the direction. I finally had a chance to dig into the go-metrics Histogram, and it does seem incorrect: while exponential decay seems more appealing in principle than the hard cliffs of the windowed histogram, the exponential system in go-metrics could probabilistically lose max/min samples, which are very important. This might be something to eventually communicate back to the go-metrics team. I think your commit message is no longer correct, as ticking does not require a goroutine in your latest iteration.

LGTM. Review status: 19 of 22 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed. server/status/monitor.go, line 59 [r1]; server/status/monitor.go, line 58 [r6].

server/status/recorder.go, line 196 [r8]: Given the windowed nature of this, I don't know if count makes a lot of sense. This could be paired with a rate or cumulative counter if that information is needed.

server/status/recorder_test.go, line 282 [r8].

util/metric/registry.go, line 103 [r8]: Even without support for rollups, you probably at least want support for recording at variable time scales before adding this; all time series are currently recorded every ten seconds, but that's going to be of extremely dubious value for a 1h sample window, especially when old sample windows will be dropped on precise 15-minute boundaries.
Review status: 19 of 22 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed. server/status/monitor.go, line 58 [r6]; server/status/recorder.go, line 196 [r8]; server/status/recorder_test.go, line 282 [r8]; util/metric/registry.go, line 103 [r8].
Thanks for the review, everyone. @mrtracy, I'm going to leave this open for you since I took care of the TODO about recording histograms.
Review status: 15 of 22 files reviewed at latest revision, 7 unresolved discussions. server/status/monitor.go, line 59 [r1]: Googling for "metric naming schemes" turns up three different schemes in the top four results: Librato and Graphite use
LGTM. Review status: 15 of 22 files reviewed at latest revision, 7 unresolved discussions. util/metric/registry.go, line 103 [r8].
This is a prototype for the collection of transient performance statistics. It does not yet expose "useful" runtime statistics, but is meant to lay the groundwork for doing so across different parts of the system.

We were previously using the `go-metrics` package, which provides the concept of a `Registry` bundling different metrics. Unfortunately, various limitations, a certain amount of interface bloat (which made it annoying to provide custom implementations) and various design choices made this unwieldy to work with without upstream adaptations. Instead, this introduces the `util/metric` package, which replaces the registry functionality and provides light wrapper implementations around metrics.

Metric registries can be nested, and in fact we do so, keeping one "global" metrics registry and various others in it. In particular, Node- and Store-level metrics have individual registries. This is useful because it allows their metrics to be stored to their corresponding time series with simple iterations over a registry, with correct names both globally and locally.

For now, we wrap `Counter`, `Gauge` and `EWMA` types and provide a windowed histogram (based on Gil Tene's `HDRHistogram`s). Others (such as reservoir-backed histograms) can be added as required.
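The nesting described above can be sketched as follows, assuming a hypothetical minimal `Metric` interface and a `Registry` that maps local names to metrics and prefixes to child registries. This is not the `util/metric` API; the prefixes `node.1` and `store.2` are invented for illustration. Iterating a child registry yields local names, while iterating the global registry yields the same metrics with prefixed names.

```go
package main

import (
	"fmt"
	"sort"
)

// Metric is a hypothetical minimal interface for this sketch.
type Metric interface {
	Value() int64
}

// Counter is a toy counter (not safe for concurrent use).
type Counter struct{ n int64 }

func (c *Counter) Inc(d int64)  { c.n += d }
func (c *Counter) Value() int64 { return c.n }

// Registry maps local names to metrics and prefixes to child registries.
type Registry struct {
	metrics  map[string]Metric
	children map[string]*Registry
}

func NewRegistry() *Registry {
	return &Registry{metrics: map[string]Metric{}, children: map[string]*Registry{}}
}

// Add registers m under a local name.
func (r *Registry) Add(name string, m Metric) { r.metrics[name] = m }

// AddChild nests c under prefix; its metrics appear as "prefix.name".
func (r *Registry) AddChild(prefix string, c *Registry) { r.children[prefix] = c }

// Each calls f for every metric reachable from r, with names qualified by
// the prefixes of the child registries passed through. Order is sorted.
func (r *Registry) Each(f func(name string, m Metric)) {
	r.each("", f)
}

func (r *Registry) each(prefix string, f func(name string, m Metric)) {
	names := make([]string, 0, len(r.metrics))
	for n := range r.metrics {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		f(prefix+n, r.metrics[n])
	}
	prefixes := make([]string, 0, len(r.children))
	for p := range r.children {
		prefixes = append(prefixes, p)
	}
	sort.Strings(prefixes)
	for _, p := range prefixes {
		r.children[p].each(prefix+p+".", f)
	}
}

func main() {
	global, node, store := NewRegistry(), NewRegistry(), NewRegistry()
	global.AddChild("node.1", node)
	global.AddChild("store.2", store)
	calls := &Counter{}
	node.Add("exec.success", calls)
	store.Add("ranges", &Counter{n: 3})
	calls.Inc(2)
	// Global view: fully qualified names.
	global.Each(func(name string, m Metric) { fmt.Println(name, m.Value()) })
	// Store view: local names only.
	store.Each(func(name string, m Metric) { fmt.Println(name, m.Value()) })
}
```

Because a metric is registered once and only the walk decides the prefix, the same counter can be written to a store-local time series and appear in a node-wide or global listing without duplicating registration.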
Give a human being some insight into how we might have gotten to this state based on feedback from cockroachdb#3504.
This is a prototype for the collection of transient performance statistics.
It does not yet expose many "useful" runtime statistics, but is meant to lay
the groundwork for doing so across different parts of the system.

We were previously using the `go-metrics` package, which provides the concept of
a `Registry` bundling different metrics. Unfortunately, various limitations,
a certain amount of interface bloat (which made it annoying to provide custom
implementations) and various design choices made this unwieldy to work with
without upstream adaptations.

Instead, this introduces the `util/metric` package, which replaces the registry
functionality and provides light wrapper implementations around metrics.
Each registry object owns its metrics' goroutines and terminates them using
a closer channel; this allows for easy integration with `util.Stopper`.

Metric registries can be nested, and in fact we do so, keeping one "global"
metrics registry and various others in it. In particular, Node- and Store-level
metrics have individual registries. This is useful because it allows their
metrics to be stored to their corresponding time series with simple iterations
over a registry and with correct names globally and locally.

For now, we wrap `Counter`, `Gauge` and `EWMA` types and provide a windowed
histogram (based on Gil Tene's `HDRHistogram`s). Others (such as reservoir-backed
histograms) can be added as required.
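Of the wrapped types, the EWMA is the least self-explanatory. The sketch below is a generic exponentially weighted moving average with a fixed smoothing factor `alpha`, seeded with the first sample; it is an assumption-laden illustration, not the go-metrics `EWMA` (which computes rates over tick intervals) nor the `util/metric` wrapper.

```go
package main

import (
	"fmt"
	"sync"
)

// EWMA is a hypothetical exponentially weighted moving average: each Update
// blends the new sample into the running value with weight alpha.
type EWMA struct {
	mu     sync.Mutex
	alpha  float64
	value  float64
	seeded bool
}

// NewEWMA returns an EWMA with smoothing factor alpha in (0, 1]. Larger
// alpha reacts faster; smaller alpha smooths more.
func NewEWMA(alpha float64) *EWMA {
	if alpha <= 0 || alpha > 1 {
		panic("alpha must be in (0, 1]")
	}
	return &EWMA{alpha: alpha}
}

// Update folds one sample into the average.
func (e *EWMA) Update(sample float64) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if !e.seeded {
		e.value, e.seeded = sample, true // seed with the first sample
		return
	}
	e.value = e.alpha*sample + (1-e.alpha)*e.value
}

// Value returns the current average.
func (e *EWMA) Value() float64 {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.value
}

func main() {
	e := NewEWMA(0.5)
	for _, s := range []float64{10, 20, 20} {
		e.Update(s)
	}
	fmt.Println(e.Value())
}
```

With `alpha = 0.5`, the samples 10, 20, 20 give 10, then 0.5·20 + 0.5·10 = 15, then 0.5·20 + 0.5·15 = 17.5; older samples never drop out entirely but their weight shrinks geometrically.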