storage: transfer away leases when draining #13792
Currently working on some code reuse in
Reviewed 4 of 6 files at r1.
pkg/storage/store.go, line 969 at r1 (raw file):
"SetDraining causes incoming lease transfers to be rejected and prevents ..." instead of the second sentence, which is confusing since it's not clear who is attempting the transfer.
pkg/storage/store.go, line 991 at r1 (raw file):
this doesn't seem ok - i think you need all this jazz to be done under the same lock as the stuff going on in
pkg/storage/store.go, line 1002 at r1 (raw file):
pkg/storage/store.go, line 1008 at r1 (raw file):
extract a single context and annotate it. also, this needs to stop executing, i think.
pkg/storage/store.go, line 1016 at r1 (raw file):
why doesn't this use
pkg/storage/store.go, line 1021 at r1 (raw file):
also needs to stop execution?
Reviewed 5 of 6 files at r1.
pkg/storage/store.go, line 975 at r1 (raw file):
What is the goal of this TODO? What will we do with this error if we don't abort the draining process? (In general, aborting the draining process seems like a bad idea: in most cases you'd rather exit gracelessly than have the process stay alive when you're trying to take it down.)
pkg/storage/store.go, line 985 at r1 (raw file):
Transferring all leases serially will take a long time; we need to be able to have multiple transfers in flight at once.
Addressed comments and moved the tests into
Review status: 5 of 6 files reviewed at latest revision, 8 unresolved discussions.
pkg/storage/store.go, line 969 at r1 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done.
pkg/storage/store.go, line 975 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
The goal is to provide the caller with more information as to the state that SetDraining left the store in. I agree, the draining process shouldn't be aborted on error, which is why I suggest changing the upper-level draining logic. I don't have strong feelings about this though and it might be better to not return errors since this is a best-effort attempt.
pkg/storage/store.go, line 985 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done.
pkg/storage/store.go, line 991 at r1 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done.
pkg/storage/store.go, line 1002 at r1 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done.
pkg/storage/store.go, line 1008 at r1 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Why do you think it needs to stop executing? I think that if a zone config does not exist, it's better to transfer the lease with no constraints than not to.
pkg/storage/store.go, line 1016 at r1 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done. Was a mistake.
pkg/storage/store.go, line 1021 at r1 (raw file): Previously, tamird (Tamir Duberstein) wrote…
If a replica fails to transfer away a lease, I don't think we should stop other replicas from attempting to transfer away their leases.
Reviewed 3 of 4 files at r2.
pkg/storage/store.go, line 993 at r2 (raw file):
why is this function necessary? this defer doesn't seem to buy much:
Reviewed 1 of 4 files at r2.
pkg/storage/client_replica_test.go, line 531 at r2 (raw file):
pkg/storage/client_replica_test.go, line 574 at r2 (raw file):
ditto
pkg/storage/client_replica_test.go, line 685 at r2 (raw file):
why not use the error returned?
pkg/storage/client_replica_test.go, line 688 at r2 (raw file):
please consider what someone would do with this message. which replica is the raft leader? in general, error messages should be useful, and this one is not.
pkg/storage/client_replica_test.go, line 697 at r2 (raw file):
ditto. include the map?
pkg/storage/client_replica_test.go, line 700 at r2 (raw file):
ditto, include something relevant?
Reviewed 4 of 4 files at r2.
pkg/storage/client_replica_test.go, line 697 at r2 (raw file): Previously, tamird (Tamir Duberstein) wrote…
What map? It's nil/undefined if
pkg/storage/client_replica_test.go, line 700 at r2 (raw file): Previously, tamird (Tamir Duberstein) wrote…
This seems fine to me; the exact values are unlikely to be helpful.
pkg/storage/store.go, line 975 at r1 (raw file): Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
It's probably better to do that in some other way than returning an error. Look at the way the FreezeCluster stuff works, by returning status information as the freeze progresses.
pkg/storage/store.go, line 985 at r1 (raw file): Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Transferring every lease at once is probably also a bad idea. Use Stopper.RunLimitedAsyncTask instead.
Review status: all files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.
pkg/storage/client_replica_test.go, line 697 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
"the map" I was referring to is
Review status: all files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.
pkg/storage/client_replica_test.go, line 531 at r2 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done.
pkg/storage/client_replica_test.go, line 574 at r2 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done.
pkg/storage/client_replica_test.go, line 685 at r2 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done.
pkg/storage/client_replica_test.go, line 688 at r2 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done.
pkg/storage/client_replica_test.go, line 697 at r2 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Added the map.
pkg/storage/store.go, line 975 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Thanks for that, that will probably be a good thing to add to the whole draining process. Removed the error and TODO.
pkg/storage/store.go, line 985 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done.
pkg/storage/store.go, line 993 at r2 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done. Thanks.
Reviewed 6 of 6 files at r3.
pkg/storage/client_raft_test.go, line 1035 at r3 (raw file):
seems that you can make this a plain
pkg/storage/client_replica_test.go, line 685 at r2 (raw file): Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
use
When a Store is put into draining mode, it now attempts to transfer away any leases owned by its Replicas. This avoids the need to wait for leases to expire while maintaining availability.
Fixed a failing interactive tcl test.
Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.
pkg/storage/client_raft_test.go, line 1035 at r3 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done.
pkg/storage/client_replica_test.go, line 685 at r2 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done.
Maybe @knz wants to look
Reviewed 8 of 8 files at r4.
Reviewed 1 of 8 files at r4.
pkg/cli/interactive_tests/test_server_sig.tcl, line 48 at r4 (raw file):
👍