storage: enforce gc ttl strictly #45826
Conversation
Force-pushed from 513fdfc to 3e5c6f5
Reviewed 16 of 16 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/kv/kvserver/client_replica_test.go, line 2951 at r1 (raw file):
defer leaktest.AfterTest(t)()
// The unfortunate thing about this test is that the gcttl is in seconds and
How long does it take on average?
pkg/kv/kvserver/client_replica_test.go, line 3064 at r1 (raw file):
t.Run("disable strict enforcement", func(t *testing.T) {
	sqlDB.Exec(t, `SET CLUSTER SETTING kv.gc_ttl.strict_enforcement = false`)
	defer sqlDB.Exec(t, `SET CLUSTER SETTING kv.gc_ttl.strict_enforcement = DEFAULT`)
Do you think it's worth structuring this test so that it will work regardless of whether kv.gc_ttl.strict_enforcement is defaulted to true or not (i.e. set to true at the beginning and avoid DEFAULT)?
pkg/kv/kvserver/client_replica_test.go, line 3104 at r1 (raw file):
// Let's now release the protected timestamp, refresh past that, then
// make sure that we've got enforcement. Then let's add a new one.
// That one should not see itself respect because we haven't verified.
Were you intending to extend this test?
pkg/kv/kvserver/replica.go, line 458 at r1 (raw file):
draining bool

// protectedTimestampCache provides the state of the protected timestamp
s/protectedTimestampCache/cachedProtectedTS/
pkg/kv/kvserver/replica.go, line 737 at r1 (raw file):
impliedThreshold := gc.CalculateThreshold(now, *r.mu.zone.GC)
if threshold.Less(impliedThreshold) {
	threshold = impliedThreshold
nit: consider using .Forward() in here in both places.
pkg/kv/kvserver/replica_protected_timestamp.go, line 44 at r1 (raw file):
// makeUpdateFunc returns a function which will attempt to update the Replica's
// cached with cpts if it is non-empty. It assumes that the Replica is not
"with cpts" looks stale. Same issue below. I think you mean "ts".
pkg/kv/kvserver/replica_protected_timestamp.go, line 46 at r1 (raw file):
// cached with cpts if it is non-empty. It assumes that the Replica is not
// locked when the returned function is called.
func (ts *cachedProtectedTimestampState) makeUpdateFunc(r *Replica) func() {
I feel like this would be cleaner as a method on Replica instead of a closure capturing the cachedProtectedTimestampState reference and the Replica reference, but I guess this is fine if you prefer this structure.
pkg/kv/kvserver/replica_protected_timestamp.go, line 60 at r1 (raw file):
// clearIfNotNewer clears the state in cpts if it is not newer than the passed
// ts. This will enable the fastpath in the update func.
Could you make it clear that this is an optimization and describe how it works? It's pretty subtle that we're including this in the rlock path to avoid hitting the wlock path in makeUpdateFunc.
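As an illustration of the fastpath under discussion, the pattern amounts to the classic check-under-read-lock, recheck-under-write-lock dance. This is a hedged sketch, not the actual kvserver code; the type and field names here are invented:

```go
package main

import (
	"fmt"
	"sync"
)

// tsCache is a hypothetical stand-in for the cached protected timestamp
// state; readAt plays the role of the ratcheted timestamp.
type tsCache struct {
	mu     sync.RWMutex
	readAt int64
}

// maybeUpdate checks staleness under the read lock so that the common
// case never touches the write lock, then rechecks under the write lock
// before mutating, since the state may have changed between the two locks.
func (c *tsCache) maybeUpdate(newReadAt int64) {
	c.mu.RLock()
	stale := c.readAt < newReadAt
	c.mu.RUnlock()
	if !stale {
		return // fastpath: nothing to do, write lock never taken
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.readAt < newReadAt { // recheck under the write lock
		c.readAt = newReadAt
	}
}

func main() {
	c := &tsCache{readAt: 5}
	c.maybeUpdate(10) // advances readAt
	c.maybeUpdate(3)  // stale update is a no-op on the fastpath
	fmt.Println(c.readAt)
}
```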
pkg/kv/kvserver/replica_protected_timestamp.go, line 105 at r1 (raw file):
earliestValidRecord
Bad copy-paste?
pkg/kv/kvserver/replica_protected_timestamp.go, line 123 at r1 (raw file):
	seen = true
}
return !seen
We lost the early return here. Don't we want that? It shouldn't be hard to add it back in.
pkg/kv/kvserver/replica_protected_timestamp.go, line 166 at r1 (raw file):
// here. If we do so, update the replica's cache of its state. The guarantee
// we provide is that if a record is successfully verified then the state will
// be updated to include it or some protected timestamp which precedes it.
Precedes it or follows it? I thought we're ratcheting the timestamp in cachedProtectedTimestampState up.
Force-pushed from 3e5c6f5 to 672b452
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/client_replica_test.go, line 2951 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
How long does it take on average?
--- PASS: TestStrictGCEnforcement (1.75s)
--- PASS: TestStrictGCEnforcement/strict_enforcement (0.00s)
--- PASS: TestStrictGCEnforcement/disable_strict_enforcement (0.04s)
--- PASS: TestStrictGCEnforcement/zone_config_changes_are_respected (0.08s)
--- PASS: TestStrictGCEnforcement/system_ranges_are_unaffected (0.02s)
--- PASS: TestStrictGCEnforcement/protected_timestamps_are_respected (0.04s)
pkg/kv/kvserver/client_replica_test.go, line 3064 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do you think it's worth structuring this test so that it will work regardless of whether kv.gc_ttl.strict_enforcement is defaulted to true or not (i.e. set to true at the beginning and avoid DEFAULT)?
Sure, done.
pkg/kv/kvserver/client_replica_test.go, line 3104 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Were you intending to extend this test?
I think I decided that there wasn't a lot of added value in this additional work.
pkg/kv/kvserver/replica.go, line 737 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: consider using .Forward() in here in both places.
Done here. The one on the other side is not a forward; it's a backward, if you will. I added a comment.
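For readers unfamiliar with the ratcheting semantics being discussed: a forward only ever moves a timestamp up, and its backward counterpart only ever moves it down. A minimal sketch using a simplified stand-in for hlc.Timestamp (the type and the Backward name here are illustrative, not necessarily the real API):

```go
package main

import "fmt"

// Timestamp is a simplified stand-in for hlc.Timestamp, used only to
// illustrate ratcheting semantics.
type Timestamp struct {
	WallTime int64
}

// Less reports whether t precedes u.
func (t Timestamp) Less(u Timestamp) bool { return t.WallTime < u.WallTime }

// Forward ratchets t up to u if u is later; it never moves t backwards.
func (t *Timestamp) Forward(u Timestamp) {
	if t.Less(u) {
		*t = u
	}
}

// Backward ratchets t down to u if u is earlier; the mirror operation.
func (t *Timestamp) Backward(u Timestamp) {
	if u.Less(*t) {
		*t = u
	}
}

func main() {
	threshold := Timestamp{WallTime: 5}
	threshold.Forward(Timestamp{WallTime: 10}) // moves up
	threshold.Forward(Timestamp{WallTime: 3})  // no-op: never regresses
	fmt.Println(threshold.WallTime)
}
```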
pkg/kv/kvserver/replica_protected_timestamp.go, line 46 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I feel like this would be cleaner as a method on Replica instead of a closure capturing the cachedProtectedTimestampState reference and the Replica reference, but I guess this is fine if you prefer this structure.
That's better.
pkg/kv/kvserver/replica_protected_timestamp.go, line 123 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We lost the early return here. Don't we want that? It shouldn't be hard to add it back in.
Yes, that was intentional. If we early return we may miss a different earliest valid record and we wouldn't be able to reliably update the state.
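The point about the early return can be sketched in isolation: to reliably cache the earliest valid record, every record has to be visited. This is a hedged illustration with invented names, using bare int64 timestamps rather than the real record types:

```go
package main

import "fmt"

// earliestValid scans every record rather than returning on the first
// valid one; an early return could report a valid record that is not
// the earliest, making any cached state derived from it unreliable.
func earliestValid(records []int64, isValid func(int64) bool) (earliest int64, seen bool) {
	for _, ts := range records {
		if !isValid(ts) {
			continue
		}
		if !seen || ts < earliest {
			earliest = ts
		}
		seen = true
	}
	return earliest, seen
}

func main() {
	valid := func(ts int64) bool { return ts%2 == 0 }
	// An early return on the first valid record (8) would miss the
	// earlier valid record (4) that appears later in the slice.
	e, ok := earliestValid([]int64{9, 8, 4, 7}, valid)
	fmt.Println(e, ok)
}
```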
pkg/kv/kvserver/replica_protected_timestamp.go, line 166 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Precedes it or follows it? I thought we're ratcheting the timestamp in cachedProtectedTimestampState up.
Well, we're ratcheting readAt, but we're not obviously changing the earliest record. Hence the confusion.
I've tried to make the comment clearer.
Force-pushed from 75de851 to 8d488fa
@nvanbenschoten this is RFAL
Force-pushed from 8d488fa to 60fdd29
Reviewed 9 of 9 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/kv/kvserver/helpers_test.go, line 473 at r2 (raw file):
}

func (r *Replica) ReadProtectedTimestamps() {
Should this take a context.Context?
Force-pushed from 1fc3bd0 to a1fa14b
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/kv/kvserver/helpers_test.go, line 473 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should this take a context.Context?
Done.
Force-pushed from b99c9a1 to 984e50a
Reviewed 3 of 3 files at r3, 2 of 2 files at r4, 3 of 3 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/ccl/changefeedccl/changefeed_test.go, line 2325 at r5 (raw file):
) *roachpb.Error {
	if r, ok := ba.GetArg(roachpb.AdminVerifyProtectedTimestamp); ok {
nit: stray line
pkg/ccl/changefeedccl/changefeed_test.go, line 2327 at r5 (raw file):
verifyRequestCh <- r.(*roachpb.AdminVerifyProtectedTimestampRequest)
return roachpb.NewError(errors.New("boom"))
What error does AdminVerifyProtectedTimestamp return when it fails? Consider returning that here instead of this fake error so that readers of the test are clearer about why this might fail.
pkg/ccl/changefeedccl/changefeed_test.go, line 2350 at r5 (raw file):
pts := cfg.ProtectedTimestampProvider

// Make sure that the canceled job gets moved through its OnFailOrCancel
// phase.
"phase and removes its protected timestamp."
pkg/jobs/jobs.go, line 819 at r4 (raw file):
func (sj *StartableJob) Cancel(ctx context.Context) error {
	defer sj.registry.unregister(*sj.ID())
	return sj.registry.CancelRequested(ctx, nil, *sj.id)
nit: why the sj.ID() vs sj.id?
This is important to unhook the job from the job registry. Release justification: bug fixes and low-risk updates to new functionality Release note: None
This is important to fix the ProtectedTimestamp tests in the face of a strict GC policy. We should send off an async verification request when creating protected timestamps due to scan boundaries but that can wait for a follow up PR. I was hoping to avoid this one too but it seems important. Release justification: bug fixes and low-risk updates to new functionality Release note: None
Force-pushed from 984e50a to 2e0fa12
Thanks!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/ccl/changefeedccl/changefeed_test.go, line 2325 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: stray line
Done.
pkg/ccl/changefeedccl/changefeed_test.go, line 2327 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What error does AdminVerifyProtectedTimestamp return when it fails? Consider returning that here instead of this fake error so that readers of the test are clearer about why this might fail.
Done.
pkg/ccl/changefeedccl/changefeed_test.go, line 2350 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"phase and removes its protected timestamp."
Done.
pkg/jobs/jobs.go, line 819 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: why the sj.ID() vs sj.id?
no reason, made it consistent with the rest of the file.
bors r=nvanbenschoten
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
Build failed
Release justification. bors r=nvanbenschoten
Build failed
Flaked in something with protos and js? bors r+
Build failed
flaked on #28033 bors r+
Build succeeded
Indeed!
This PR creates a cluster setting to control whether the GC TTL is enforced
"strictly". Strict GC enforcement is not a strong guarantee that we will
serve no user reads whatsoever for timestamps that are older than the current
reading of the node's clock less the GC TTL. Rather, it is a setting to enable
the behavior that most of the time, most requests will see an error if they
use a timestamp which is older than the GC TTL.
The main case in which the user will see an error is if the only reason a
read would be allowed at a given timestamp is a protected timestamp.
The process of verifying a protected timestamp will prevent this case.
If the users of protected timestamps verify the protected timestamp before
they operate, or they use admin commands, they'll be fine.
In the presence of a protected timestamp, all requests can read back to
that protected timestamp.
Lease transfers are another possible source of data being visible at the
GC threshold rather than the TTL.
Strict mode is controlled via a cluster setting.
Strict mode does not apply to admin commands or to system ranges.
The interesting thing with this all is that the problem which prompted this
setting in the first place was due to backups being run at a period longer
than the GC TTL and pretty much never failing.
Backups in 20.1 (in theory) are going to use protected timestamps. If they
successfully verify the protected timestamp then they'll be able to still
succeed an even higher percentage of the time! If we want to discourage this
behavior we could
* Have the protected timestamp subsystem enforce that clients can't save
  data they currently view as expired. This is probably a bad idea because,
  interestingly enough, the protected timestamp subsystem ignores zone
  configs completely.
* Have the higher levels try to do it. This seems probably better. The backup
  pulls the table descriptors and knows exactly the timestamp ranges it's
  backing up. In a lot of cases I can imagine wanting to back up data that
  is expired. Making that hard would probably be worse than the mistakes it
  would prevent. It probably also has zone configs. I might instead just
  warn loudly in this case.
What this will do is bring greater awareness of the GC TTL. If you don't
update or delete data frequently the GC heuristic won't kick in for quite
some time.
Release justification: bug fixes and low-risk updates to new functionality
Release note (general change): GC TTLs will now be enforced by default.
This enforcement can be controlled by disabling
`kv.gc_ttl.strict_enforcement`.