Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: Measure inter-node round trip latency of heartbeats #13533

Merged
merged 2 commits into from Feb 15, 2017

Conversation

a-robinson
Copy link
Contributor

@a-robinson a-robinson commented Feb 10, 2017

Will be used to inform locality-based leaseholder placement, as
described in docs/RFCs/leaseholder_locality.md

@tamird @petermattis


This change is Reviewable

@a-robinson
Copy link
Contributor Author

cc #13232

@tamird
Copy link
Contributor

tamird commented Feb 11, 2017

:lgtm:


Reviewed 2 of 4 files at r1.
Review status: 2 of 4 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/rpc/clock_offset.go, line 67 at r1 (raw file):

		syncutil.Mutex
		offsets   map[string]RemoteOffset
		latencies map[string]ewma.MovingAverage

this name should indicate the units


pkg/rpc/clock_offset.go, line 142 at r1 (raw file):

			r.mu.latencies[addr] = latencyAvg
		}
		latencyAvg.Add(float64(roundTripLatency))

.Nanoseconds() for clarity


pkg/rpc/clock_offset.go, line 203 at r1 (raw file):

			return err
		}
		r.metrics.LatencyMeanNanos.Update(int64(meanLatency))

move this below the error check below


pkg/rpc/clock_offset_test.go, line 232 at r1 (raw file):

		monitor.mu.Lock()
		if l, ok := monitor.mu.latencies[key]; !ok {
			t.Errorf("missing latency measurement for test %q", key)

prefix with %q: for consistency with the next case or use subtests


Comments from Reviewable

@petermattis
Copy link
Collaborator

:lgtm:


Review status: 2 of 4 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/rpc/clock_offset.go, line 41 at r1 (raw file):

	ClockOffsetStdDevNanos *metric.Gauge
	LatencyMeanNanos       *metric.Gauge
	LatencyStdDevNanos     *metric.Gauge

I wonder how useful having metrics for the mean and stddev for latency will be. The per-remote latency numbers seem useful, but the average across all the remotes seems strange, especially in multi-datacenter deployments.


Comments from Reviewable

Will be used to inform locality-based leaseholder placement, as
described in docs/RFCs/leaseholder_locality.md
@a-robinson
Copy link
Contributor Author

Review status: 0 of 4 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/rpc/clock_offset.go, line 41 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I wonder how useful having metrics for the mean and stddev for latency will be. The per-remote latency numbers seem useful, but the average across all the remotes seems strange, especially in multi-datacenter deployments.

It's true that it won't be particularly meaningful in multi-DC deployments, but our metrics package isn't flexible enough to do anything significantly more informative here. We could use a histogram to get somewhat more fidelity, but it won't indicate which nodes are responsible for the which measurements.

I've removed the metric for now. I'd switch to a Histogram, but plumbing the recently-added HistogramWindowInterval down in all the tests that set up an rpc.Context gets ugly really quick. It wouldn't be so bad if it was part of the base.Config (instead of server.Config), which already gets passed down into the rpc package. cc @mrtracy for an opinion on moving it or on other suggestions, since it's not clear how permanent the HistogramWindowInterval option is.


pkg/rpc/clock_offset.go, line 67 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this name should indicate the units

Done.


pkg/rpc/clock_offset.go, line 142 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

.Nanoseconds() for clarity

Done.


pkg/rpc/clock_offset.go, line 203 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

move this below the error check below

Done for the clock offset mean. Not very relevant for latency after removing the metrics due to @petermattis's comment.


pkg/rpc/clock_offset_test.go, line 232 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

prefix with %q: for consistency with the next case or use subtests

Done.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Feb 13, 2017

Reviewed 2 of 4 files at r1, 2 of 2 files at r2.
Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@a-robinson
Copy link
Contributor Author

Review status: 2 of 6 files reviewed at latest revision, 1 unresolved discussion.


pkg/rpc/clock_offset.go, line 41 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

It's true that it won't be particularly meaningful in multi-DC deployments, but our metrics package isn't flexible enough to do anything significantly more informative here. We could use a histogram to get somewhat more fidelity, but it won't indicate which nodes are responsible for the which measurements.

I've removed the metric for now. I'd switch to a Histogram, but plumbing the recently-added HistogramWindowInterval down in all the tests that set up an rpc.Context gets ugly really quick. It wouldn't be so bad if it was part of the base.Config (instead of server.Config), which already gets passed down into the rpc package. cc @mrtracy for an opinion on moving it or on other suggestions, since it's not clear how permanent the HistogramWindowInterval option is.

Reworked to add in a latency histogram after confirming with @mrtracy that HistogramWindowInterval isn't permanent and should only be around for about a month longer.


Comments from Reviewable

This is a bit clunky, but it's really just a workaround until @mrtracy
can rework how our histograms work and remove HistogramWindowInterval.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants