Telemetry stats #515
Conversation
Only some small things.
Force-pushed from 7071dbd to 892605f.
    ## Collected metrics

    ### spicedb_telemetry_info (Gauge)
As I was reviewing I kept coming back to this doc to make sure it matched the implementation; maybe we should throw something on the backlog to automate it.
    	Interval = time.Hour
    )

    func writeTimeSeries(ctx context.Context, client *http.Client, endpoint string, ts []*prompb.TimeSeries) error {
Discussed offline, but I tried to find a spec for the remote write API and failed. The closest I got was https://github.com/prometheus/prometheus/blob/main/storage/remote/client.go#L191
Maybe we could use that implementation, or at least link back to it, so that if there are changes between remote write API versions we know where to look.
Is there a reason we wouldn't want to use this actual client?
internal/telemetry/reporter.go (outdated)
    	Msg("telemetry reporter scheduled")

    // Fire off the first at start-up.
    if err := discoverAndWriteMetrics(ctx, endpoint); err != nil {
should there be jitter here so a large cluster starting up doesn't thunderously herd the write endpoint?
pkg/cmd/server/server.go (outdated)
    @@ -369,6 +383,8 @@ func (c *completedServerConfig) Run(ctx context.Context) error {
    	g.Go(c.dashboardServer.ListenAndServe)
    	g.Go(stopOnCancel(c.dashboardServer.Close))

    	g.Go(func() error { return telemetry.ReportForever(ctx, c.telemetryEndpoint) })
I don't have a strong opinion here, but I like the model that the grpc/http servers follow, where if they're disabled the objects get replaced with "noop" servers that just log and stop.
internal/telemetry/reporter.go (outdated)
    	Err(err).
    	Str("endpoint", endpoint).
    	Msg("failed to push telemetry metric")
    nextPush = backoffInterval.NextBackOff()
what happens if the backoff interval becomes longer than the initial interval?
I don't understand the question.
The normal interval is 1hr; can the exponential backoff grow until the retry delay is 2hrs? Don't we want a base frequency of 1hr regardless of backoff?
internal/telemetry/reporter.go (outdated)
    	})
    }

    switch *fam.Type {
I think you can just use https://pkg.go.dev/github.com/prometheus/common/expfmt#ExtractSamples to avoid this, but I didn't test it
Looked into it and this looks usable. We'd iterate over those samples and create the prompb.
internal/telemetry/reporter.go (outdated)
    	"github.com/golang/snappy"
    	dto "github.com/prometheus/client_model/go"
    	"github.com/prometheus/common/model"
    	"github.com/prometheus/prometheus/prompb"
We can drop importing all of prometheus if we use this: `go.buf.build/protocolbuffers/go/prometheus/prometheus`
LGTM
Force-pushed from 8e10a0b to 667f2be.
LGTM
just had one thing we might want to think about for the future
    return func(ctx context.Context) error {
    	// Smear the startup delay out over 10% of the reporting interval
    	startupDelay := time.Duration(rand.Int63n(int64(interval.Seconds()/10))) * time.Second
Since that could be up to ~6m after startup (10% of the 1h interval), maybe we should attempt to write on shutdown if we haven't written at all yet and the server has at least successfully started up. And actually, maybe we want separate telemetry on startup error rates.
I realize this is a fairly involved change for an uncommon edge case; maybe we just make an issue and do it later.