mmaprototype: move rebalanceStores to its own file #157224
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.
Reviewer notes: this PR was generated with the help of AI. Concretely, I scoped out a step-by-step plan for the refactor and had Cursor work through it step by step. The commit messages are Cursor-generated as well, and I reviewed everything as it went in (occasionally telling Cursor to backtrack and solve things differently). Each individual commit should be easy to review; if that is not the case, please flag it so that I can figure out what went wrong.
This is step 1 in extracting the shedStore logic into a standalone function.
Convert return and continue statements to boolean flags (shouldReturnEarly, shouldContinue) that are checked after the block. This prepares the code for extraction into a function by moving all loop control structure outside the block that will become shedStore.
Convert the shedStore block to an IIFE that returns (bool, bool) for shouldReturnEarly and shouldContinue. Replace all flag assignments with return statements. This prepares the code for adding more return values and parameters.
Introduce rebalanceState struct that contains *clusterState and tracks counters and outcomes of rebalanceStores invocation. Refactor the shedStore function to write into *rebalanceState instead of returning values. This prepares for passing *rebalanceState as a parameter in a future change.
Change the shedStore IIFE to accept *rebalanceState as a parameter instead of capturing it from the outer scope. This prepares the function for extraction to a standalone method on *rebalanceState.
Add store sheddingStore as a parameter to the shedStore function instead of capturing it from the loop. Remove idx parameter as it was only used for logging and is not needed.
Move rng and dsm fields into the rebalanceState struct instead of passing them as parameters. This reduces the parameter list and keeps related state together.
…vars to rebalanceState Add ctx, localStoreID, and now as parameters to the shedStore function. Move scratch variables (disj, storesToExclude, storesToExcludeForRange, scratchNodes, scratchStores) into a nested scratch struct within rebalanceState for better organization.
Extract the shedStore immediately invoked function expression (IIFE) into a standalone rebalanceStore method on *rebalanceState. Move sheddingStore type definition to top level. The method now has a clean signature with rs as receiver and takes store, ctx, localStoreID, and now as parameters.
Remove the shouldReturnEarly flag and instead check the condition directly in the loop. The loop now checks if rangeMoveCount or leaseTransferCount has reached their limits before processing each store, and breaks if either condition is met.
Remove the shouldContinue flag since the function returns normally when it wants to skip to the next store, and the loop naturally continues to the next iteration.
wenyihu6
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 344 at r3 (raw file):
```go
if leaseTransferCount >= maxLeaseTransferCount {
	log.KvDistribution.VInfof(ctx, 2, "reached max lease transfer count %d, returning", maxLeaseTransferCount)
	return true, false
}
```
nit: not needed now - let's squeeze this in in the future: can we add comments like `return true /* shouldReturnEarly */, false /* shouldContinue */`?
update: nvm later commits changed it
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 380 at r2 (raw file):
```go
log.KvDistribution.VInfof(ctx, 2, "skipping remote store s%d: in lease shedding grace period", store.StoreID)
shouldContinue = true
break
```
I found this intermediate state confusing - am I seeing this right that we break out of the loop over sheddingStores here? Do we want that?
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 582 at r2 (raw file):
```go
	return changes
}
if shouldContinue {
```
when we added the `shouldContinue` flag here, is it correct that we don't actually need it, since the loop would naturally go to the next iteration, like you did in the last commit?
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 167 at r4 (raw file):
```go
// See the long comment where rangeState.lastFailedChange is declared.
const lastFailedChangeDelayDuration time.Duration = 60 * time.Second
rs := &rebalanceState{
```
nit: not needed now, but can we rename this `rs`? It confuses me with `rangeState`.
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 606 at r4 (raw file):
```go
	}
}()
if rs.shouldReturnEarly {
```
nit: can we add an assertion that `rs.shouldReturnEarly != rs.shouldContinue`?
update: nvm your later commits got rid of it
sumeerbhola
left a comment
@sumeerbhola reviewed 1 of 1 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @tbg)
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 346 at r2 (raw file):
```go
if leaseTransferCount >= maxLeaseTransferCount {
	log.KvDistribution.VInfof(ctx, 2, "reached max lease transfer count %d, returning", maxLeaseTransferCount)
	shouldReturnEarly = true
```
I can't correctly code review this piece-by-piece refactor without having the code open in my IDE and looking at the nested loops and what loop is being exited with a break.
One didn't need to think through the nesting with a `return` statement. And it looks like the final state has the returns again, so I am just looking at the full thing as a single diff.
tbg
left a comment
TFTRs!
bors r+
@tbg dismissed @sumeerbhola from a discussion.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola and @wenyihu6)
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 346 at r2 (raw file):
Previously, sumeerbhola wrote…
I can't correctly code review this piece-by-piece refactor without having the code open in my IDE and looking at the nested loops and what loop is being exited with a break.
One didn't need to think through the nesting with a `return` statement. And it looks like the final state has the returns again, so I am just looking at the full thing as a single diff.
Yeah, single-diff is the way to go here, in hindsight. This will be different in follow-up PRs, where large chunks of code will actually have to move. I'll need to think of a better way to "ease out" of a nested for loop - I agree that the review of this is a real pain (not to mention I introduced a bug that lasted a few commits until it was patched up).
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 380 at r2 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
I found this intermediate state confusing - am I seeing this right that we break out of the loop over sheddingStores here? Do we want that?
Yeah, this looks wrong - good catch, I missed this during self-review. It's hard to track whether one is in the inner or outer loop, and this is in the outer one. This is fixed by the end, fortunately, and all tests passed (lol).
I'll think of a better strategy for easing out of such loops in a more reviewable way.
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 582 at r2 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
when we added the `shouldContinue` flag here, is it correct that we don't actually need it, since the loop would naturally go to the next iteration, like you did in the last commit?
Yeah, the `if shouldContinue { continue }` isn't needed. I think the LLM was being unnecessarily verbose and I didn't catch it.
pkg/kv/kvserver/allocator/mmaprototype/cluster_state_rebalance_stores.go line 167 at r4 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
nit: not needed now but can we rename this rs - it confuses me with rangeState
Good point. Queued up for a follow-up.
This PR refactors `rebalanceStores` by extracting the store-shedding logic into a standalone `rebalanceStore` method on a new `rebalanceState` struct. The struct encapsulates rebalancing state (cluster state references, counters, configuration, and scratch variables), reducing the method's cyclomatic complexity and improving organization. More follow-up refactors are planned to straighten this up further and decompose it more, but this seemed like a good checkpoint.
There should be zero behavior changes in this PR.
Informs #157757.
Epic: CRDB-55052