-
Notifications
You must be signed in to change notification settings - Fork 4k
mmaprototype: comment on the lifecycle of pending changes #157032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
I have a remaining question that arose while writing this PR. I’ll continue the discussion here so that it doesn’t block the other PR from merging. From our convo this morning (pls correct me if i misinterpreted), my mental model is now that when there’s a partial intermediate state and we observe something we’d like to undo, the idea is that the changes have already been subsumed in some sense. We’re not really dropping a pending change, just ignoring the remaining portion that has already been added to the state. What I’m unsure about is whether we should undo the load change here
processStoreLoadMsg
|
6d37831 to
b8e08f7
Compare
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
@tbg reviewed 4 of 11 files at r1, 1 of 3 files at r5, 5 of 6 files at r6, 1 of 1 files at r9, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 942 at r9 (raw file):
// // A pending change is removed from tracking in one of three ways: // 1. Marked as enacted successfully: remove the pending changes, but the
I'm still a little shaky on this. When the change is "enacted", does it get dropped from pendingChanges? And what keeps track of the fact that there's still adjusted load? Do we just leave the load adjustment around (but there's no more pending change to tie it to) and we also track when the last change was made (for the timeout mechanism)?
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 953 at r9 (raw file):
// - The leaseholder of the range has changed. This is a special case where // the leaseholder of the range has moved to a different store, and the // rangeMag no longer contains the range. We assume that the pending change
rangeMag typo
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 956 at r9 (raw file):
// has been enacted in this case. // // 2. Undone as failed: corresponding replica and load change is rolled back.
Sumeer's example from yesterday was interesting:
- initially {s1*,s2,s3}
- decision: add s4 and transfer the lease to it
- this produces one pending change (new replica s4 with lease) - iiuc
- StoreLeaseholderMsg arrives that shows {s1*,s2,s3,s4}
- pending change is still pending (b/c s4 doesn't have the lease yet)
- if nothing else happens (store leaseholder msgs stay the same) eventually the pending change gets GC'ed, and range state reverts to
{s1*,s2,s3}(even though it should be{s1*,s2,s3,s4}as seen in recent leaseholder msgs)
Now of course the next leaseholder msg would roll the state forward again to the correct `{s1*,s2,s3,s4}. But this behavior is weird.
Sumeer seems to want to fix it, so hopefully it won't remain for long. I'll defer to @sumeerbhola what should be documented now. Personally I'd add the example and then Sumeer can update it when he updates the rest of this comment as part of the second round of re-work.
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 979 at r9 (raw file):
// conflicting state. See preCheckOnApplyReplicaChanges for details on how // compatibility between the pending change and the new range state is // determined. When incompatibility is detected, the pending change is
And here again all pending changes for one logical change are rolled back, right? Might be worth saying explicitly.
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
Rebased on top of master to pick up the PR. fyi - planning to force push on this one since its short. |
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbg reviewed 11 of 11 files at r10, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @wenyihu6)
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dragging the matrix from r10 to r11 should give the recent change. I still want to discuss #157032 (comment) so I will hold off merging for now.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 942 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I'm still a little shaky on this. When the change is "enacted", does it get dropped from pendingChanges? And what keeps track of the fact that there's still adjusted load? Do we just leave the load adjustment around (but there's no more pending change to tie it to) and we also track when the last change was made (for the timeout mechanism)?
Added more details for clarity: when a change is enacted, it is removed from the range and cluster state's pending changes, and enactedAtTime is set here. The load adjustment remains at the store level ss.adjusted.loadPendingChanges. When processStoreLoadMsg sees that lagForChangeReflectedInLoad has elapsed since enactedAtTime here, the pending load is no longer applied. It isn’t dropped similar to how we treat pending replica changes - we just treat it as already reflected in the load message and avoid applying it again.
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 953 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
rangeMag typo
Fixed,
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 956 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Sumeer's example from yesterday was interesting:
- initially {s1*,s2,s3}
- decision: add s4 and transfer the lease to it
- this produces one pending change (new replica s4 with lease) - iiuc
- StoreLeaseholderMsg arrives that shows {s1*,s2,s3,s4}
- pending change is still pending (b/c s4 doesn't have the lease yet)
- if nothing else happens (store leaseholder msgs stay the same) eventually the pending change gets GC'ed, and range state reverts to
{s1*,s2,s3}(even though it should be{s1*,s2,s3,s4}as seen in recent leaseholder msgs)Now of course the next leaseholder msg would roll the state forward again to the correct `{s1*,s2,s3,s4}. But this behavior is weird.
Sumeer seems to want to fix it, so hopefully it won't remain for long. I'll defer to @sumeerbhola what should be documented now. Personally I'd add the example and then Sumeer can update it when he updates the rest of this comment as part of the second round of re-work.
This example looks the same as
cockroach/pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go
Lines 962 to 973 in 204dfc1
| // and leaseholder to s4. This will be modeled with two pending changes, one | |
| // that removes the replica and leaseholder from s3, and another that adds | |
| // the replica and leaseholder to s4. An intermediate state that can be | |
| // observed is {s1, s2, s3, s4} with the lease still at s3. But the pending | |
| // change for adding s4 includes both that it has a replica, and it has the | |
| // lease, so we will not mark it done, and keep pretending that the whole | |
| // change is pending. Since lease transfers are fast, we accept this | |
| // imperfect modeling fidelity. One consequence of this imperfect modeling | |
| // is that if in this example there are no further changes observed until | |
| // GC, the allocator will undo both changes and go back to the state {s1, | |
| // s2, s3} with s3 as the leaseholder. That is, it has forgotten that s4 was | |
| // added. This is unavoidable and will be fixed by the first |
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 979 at r9 (raw file):
Previously, tbg (Tobias Grieger) wrote…
And here again all pending changes for one logical change are rolled back, right? Might be worth saying explicitly.
This made realize that my comment was off - I don't think we roll back replica pending changes but just load adjustment. When incompatibility is detected,
| if err != nil { |
cockroach/pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go
Lines 1445 to 1450 in 204dfc1
| rs.removePendingChangeTracking(change.ChangeID) | |
| delete(cs.stores[change.target.StoreID].adjusted.loadPendingChanges, change.ChangeID) | |
| delete(cs.pendingChanges, change.ChangeID) | |
| cs.undoChangeLoadDelta(change.ReplicaChange) | |
| } | |
| if n := len(rs.pendingChanges); n > 0 { |
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r10 to r13 now
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 979 at r9 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
This made realize that my comment was off - I don't think we roll back replica pending changes but just load adjustment. When incompatibility is detected,
, we just discard the remaining changes
if err != nil { . It is possible that some of that logical action may have already been enacted, so this does not necessarily include all pending changes for a single logical change. But when we revert, it should always include all pending changes for one logical change (which only happens when we gc things and when AdjustPendingChangesDisposition(failed)).cockroach/pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go
Lines 1445 to 1450 in 204dfc1
rs.removePendingChangeTracking(change.ChangeID) delete(cs.stores[change.target.StoreID].adjusted.loadPendingChanges, change.ChangeID) delete(cs.pendingChanges, change.ChangeID) cs.undoChangeLoadDelta(change.ReplicaChange) } if n := len(rs.pendingChanges); n > 0 {
Hm, perhaps we are saying the same thing - it is rolled back in the sense that we just don't apply the pending changes again to the new leaseholder range state.
|
Pulled the discussion over on slack. Will bors this in. bors r+ |
157032: mmaprototype: comment on the lifecycle of pending changes r=wenyihu6 a=wenyihu6 Epic: CRDB-55052 Release note: None Co-authored-by: wenyihu6 <wenyi@cockroachlabs.com>
|
Build failed: |
|
Rebased. bors r+ |
Epic: CRDB-55052
Release note: None