rpc: Measure inter-node round trip latency of heartbeats #13533
cc #13232
Reviewed 2 of 4 files at r1.

pkg/rpc/clock_offset.go, line 67 at r1 (raw file):
this name should indicate the units

pkg/rpc/clock_offset.go, line 142 at r1 (raw file):
.Nanoseconds() for clarity

pkg/rpc/clock_offset.go, line 203 at r1 (raw file):
move this below the error check below

pkg/rpc/clock_offset_test.go, line 232 at r1 (raw file):
prefix with %q: for consistency with the next case or use subtests

Comments from Reviewable
Review status: 2 of 4 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.

pkg/rpc/clock_offset.go, line 41 at r1 (raw file):
I wonder how useful having metrics for the mean and stddev for latency will be. The per-remote latency numbers seem useful, but the average across all the remotes seems strange, especially in multi-datacenter deployments.
Will be used to inform locality-based leaseholder placement, as described in docs/RFCs/leaseholder_locality.md
Review status: 0 of 4 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.

pkg/rpc/clock_offset.go, line 41 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
It's true that it won't be particularly meaningful in multi-DC deployments, but our metrics package isn't flexible enough to do anything significantly more informative here. We could use a histogram to get somewhat more fidelity, but it won't indicate which nodes are responsible for which measurements. I've removed the metric for now. I'd switch to a Histogram, but plumbing the recently-added

pkg/rpc/clock_offset.go, line 67 at r1 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
Done.

pkg/rpc/clock_offset.go, line 142 at r1 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
Done.

pkg/rpc/clock_offset.go, line 203 at r1 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
Done for the clock offset mean. Not very relevant for latency after removing the metrics due to @petermattis's comment.

pkg/rpc/clock_offset_test.go, line 232 at r1 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
Done.
Reviewed 2 of 4 files at r1, 2 of 2 files at r2.
Review status: 2 of 6 files reviewed at latest revision, 1 unresolved discussion.

pkg/rpc/clock_offset.go, line 41 at r1 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
Reworked to add in a latency histogram after confirming with @mrtracy that
This is a bit clunky, but it's really just a workaround until @mrtracy can rework how our histograms work and remove HistogramWindowInterval.
@tamird @petermattis