-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: add replicas.uninitialized metric #73975
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet turnaround time! And congrats on your first PR! In addition to the comments below, let's squash these changes into a single commit. We have this wiki-page on good commit hygiene that you might find interesting. For here I'd just suggest rewording the title to read something similar to what you have for the PR: "kvserver: add metric for uninitialized replicas".
pkg/kv/kvserver/store.go
Outdated
@@ -3011,6 +3012,9 @@ func (s *Store) updateReplicationGauges(ctx context.Context) error { | |||
if metrics.Quiescent { | |||
quiescentCount++ | |||
} | |||
|
|||
uninitializedCount = int64(len(s.mu.uninitReplicas)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird, github reviews don't show where this map is being updated, but I could see it locally. In any case, some notes from when we looked at this offline:
- The per-replica visitor above (
newStoreReplicaVisitor
) also iterates over uninitialized replicas. Instead of tracking+maintaining the set of uninitialized replicas in a separate map (probably more error prone), using the existing replica iteration + checking if the current replica is initialized is I think sufficient. - We'd want a test. We looked at another test to steal from where we could block the reception of a snapshot on a second node, ensure that our uninitialized count is as we'd expect, unblock the replica initialization, and ensure it's back to 0.
Note on labels: for most of them (mod |
413b9ff
to
82f95fb
Compare
c5d9d32
to
78acdf2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test! I only have a few minor suggestions to trim it down and we should be all set. The CI failures also suggest an update to TestChartCatalogMetrics.
78acdf2
to
6fb857f
Compare
6fb857f
to
e72f50f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @nvanbenschoten)
pkg/kv/kvserver/store.go, line 3016 at r4 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Weird, github reviews don't show where this map is being updated, but I could see it locally. In any case, some notes from when we looked at this offline:
- The per-replica visitor above (
newStoreReplicaVisitor
) also iterates over uninitialized replicas. Instead of tracking+maintaining the set of uninitialized replicas in a separate map (probably more error prone), using the existing replica iteration + checking if the current replica is initialized is I think sufficient.- We'd want a test. We looked at another test to steal from where we could block the reception of a snapshot on a second node, ensure that our uninitialized count is as we'd expect, unblock the replica initialization, and ensure it's back to 0.
Done. We resolved to use the original method and reuse the test.
pkg/kv/kvserver/client_replica_test.go, line 3912 at r5 (raw file):
Previously, irfansharif (irfan sharif) wrote…
I think we can get rid of all usages of
stickyEngineRegistry
,raftConfig
, andstickySpecTestServerArgs
. We're not restarting an existing test server and don't need server state to be persisted across the restarts, which is what all this machinery sets out to do. Trying it out locally, all we really need is the following:tc := testcluster.StartTestCluster(t, 1, base.TestClusterArgs{ ReplicationMode: base.ReplicationManual, ServerArgs: base.TestServerArgs{Insecure: true}, })
And below:
tc.AddAndStartServer(t, base.TestServerArgs{ Insecure: true,
Done.
pkg/kv/kvserver/client_replica_test.go, line 3945 at r5 (raw file):
Previously, irfansharif (irfan sharif) wrote…
While we technically can split out a separate range using a tenant prefix, we can do something more direct:
scratchKey := tc.ScratchRange(t) _, repl := getFirstStoreReplica(t, tc.Server(0), scratchKey)
Done.
pkg/kv/kvserver/client_replica_test.go, line 3949 at r5 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Since there's only one sub-test, let's get rid of this
t.Run
outdent and make it part of the main test itself. We typically use sub-tests when testing minor alterations of a single test run (perhaps with shared setup), which isn't quite the case here.
Done.
pkg/kv/kvserver/client_replica_test.go, line 3960 at r5 (raw file):
Previously, irfansharif (irfan sharif) wrote…
[nit] Given these are unused,
s/request_type/_
ands/strings/_
.
Done.
pkg/kv/kvserver/client_replica_test.go, line 3997 at r5 (raw file):
Previously, irfansharif (irfan sharif) wrote…
s/iota/0
Done.
pkg/kv/kvserver/client_replica_test.go, line 4001 at r5 (raw file):
Previously, irfansharif (irfan sharif) wrote…
[nit] Inline things here and below:
require.NoError(t, uninitStore.ComputeMetrics(ctx, 0)) // ... require.Equal(t, int64(1), uninitStore.Metrics().UninitializedCount.Value()) // ... require.NoError(t, uninitStore.ComputeMetrics(ctx, 0)) // ... require.Equal(t, int64(0), uninitStore.Metrics().UninitializedCount.Value()) // now initialized
Done.
pkg/ui/workspaces/cluster-ui/src/store/nodes/nodes.fixtures.ts, line 891 at r5 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Is this diff still relevant given we're not adding it to the cluster UI? (anything in addition to the advanced debug page i.e.)
You are right, I thought it might be necessary to declare a default value. I tested without this line and it still shows for advanced debugging. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@irfansharif thanks for the comments. I've adopted your suggestions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @nvanbenschoten)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet, LGTM! For this kind of test I'd typically stress it locally to shake out any flakes, but because it's a new test our CI should already do that for you. For posterity:
$ dev test pkg/kv/kvserver -f=TestUninitializedMetric --stress
INFO: Found 1 test target
16 runs so far, 0 failures, over 5s
59 runs so far, 0 failures, over 10s
95 runs so far, 0 failures, over 15s
125 runs so far, 0 failures, over 20s
...
Had to first rebase on master to pick up #73982. Could also use make stress PKG=./pkg/kv/kvserver TESTS=TestUninitializedMetric
.
@@ -3901,6 +3901,89 @@ func TestTenantID(t *testing.T) { | |||
|
|||
} | |||
|
|||
// TestUninitializedMetric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TestUninitializedMetric ensures that uninitialized replicas are reflected in the UninitializedCount store metric.
forceMetricTick++ | ||
|
||
require.NoError(t, <-addReplicaErr) | ||
require.NoError(t, targetStore.ComputeMetrics(ctx, forceMetricTick)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, instead of maintaining the forceMetricCounter, just passing 0 directly in ComputeMetrics also works (looking at other usages that's what we seem to do).
e72f50f
to
24bb005
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 5 files at r5, 1 of 2 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @kvoli)
-- commits, line 5 at r7:
You'll want to link this to #73368 here and the in PR description. You have "Addresses" in the PR description, but unfortunately, that is not one of the supported keywords. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword.
pkg/kv/kvserver/metrics.go, line 73 at r4 (raw file):
Previously, kvoli (Austen) wrote…
This is from: #73368. @nvanbenschoten mentions
uninitialized replicas can lie dormant in persistent state without any corresponding in-memory representation
Right, there's a small amount of state (typically just the HardState
) that can be persisted by an uninitialized replica. We don't look for this on process startup and don't instantiate a Replica
object for these unless that replica hears a new message. Undercounting here is mostly fine, as this persisted state is tiny and we're mostly interested in the cost of these in-memory Replica
objects.
There's some discussion about this in here and here.
pkg/kv/kvserver/metrics.go, line 1249 at r7 (raw file):
RaftLeaderNotLeaseHolderCount *metric.Gauge LeaseHolderCount *metric.Gauge UninitializedCount *metric.Gauge
nit: keep the ordering consistent. So move this below QuiescentCount
.
pkg/kv/kvserver/store.go, line 3016 at r4 (raw file):
Previously, kvoli (Austen) wrote…
Done. We resolved to use the original method and reuse the test.
An alternative to this approach is to add a new WithUninitialized()
method (similar to InOrder
) to storeReplicaVisitor
which avoids filtering out uninitialized replicas in storeReplicaVisitor.Visit
. We could then add a small branch at the top of this visitor function that checks if the current replica is uninitialized ((*Replica).IsInitialized
). If it is, increment uninitializedCount
and return true
.
But what we have here works too.
pkg/kv/kvserver/store.go, line 3019 at r7 (raw file):
quiescentCount++ }
nit: now that you've addressed Irfan's comment, there's a stray line here.
1acc811
to
b51b14c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @kvoli, and @nvanbenschoten)
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
You'll want to link this to #73368 here and the in PR description. You have "Addresses" in the PR description, but unfortunately, that is not one of the supported keywords. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword.
Updated with a supported keyword.
pkg/kv/kvserver/metrics.go, line 73 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Right, there's a small amount of state (typically just the
HardState
) that can be persisted by an uninitialized replica. We don't look for this on process startup and don't instantiate aReplica
object for these unless that replica hears a new message. Undercounting here is mostly fine, as this persisted state is tiny and we're mostly interested in the cost of these in-memoryReplica
objects.
Noted, thanks for the context.
pkg/kv/kvserver/metrics.go, line 1249 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: keep the ordering consistent. So move this below
QuiescentCount
.
Done.
pkg/kv/kvserver/store.go, line 3016 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
An alternative to this approach is to add a new
WithUninitialized()
method (similar toInOrder
) tostoreReplicaVisitor
which avoids filtering out uninitialized replicas instoreReplicaVisitor.Visit
. We could then add a small branch at the top of this visitor function that checks if the current replica is uninitialized ((*Replica).IsInitialized
). If it is, incrementuninitializedCount
andreturn true
.But what we have here works too.
So it is being filtered out here. I wasn't certain. I'll keep it as is atm, since I don't know what else it might be necessary for yet.
pkg/kv/kvserver/client_replica_test.go, line 3904 at r6 (raw file):
Previously, irfansharif (irfan sharif) wrote…
// TestUninitializedMetric ensures that uninitialized replicas are reflected in the UninitializedCount store metric.
Done.
pkg/kv/kvserver/client_replica_test.go, line 3981 at r6 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Btw, instead of maintaining the forceMetricCounter, just passing 0 directly in ComputeMetrics also works (looking at other usages that's what we seem to do).
Adopted this change.
9f2d01d
to
db1e944
Compare
This change adds a metric `replica.uninitialized` counting the number of uninitialized replicas within a store. resolves cockroachdb#73368 Release note (ops change): Added a metric `replica.uninitialized` which tracks the number of `Uninitialized` replicas in a store.
db1e944
to
2e2ecd9
Compare
bors r+ |
This PR was included in a batch that was canceled, it will be automatically retried |
Build succeeded: |
This change adds
replicas.unitialized
metric, tracking the number of non-init replicas on a store.Note that this metric doesn't give the exact number of uninitialized replicas because, after a process restart, uninitialized replicas can remain on disk without an in memory representation.
resolves #73368